[MidoNet-dev] Draft proposal: Ping through NAT support (#435)
Pino de Candia
gdecandia at midokura.com
Fri Feb 15 16:12:31 UTC 2013
On Friday, February 15, 2013 at 3:40 PM, Navarro, Galo wrote:
> Hello dev force!
Galo, thanks for the write-up!
> I'm sending below a draft proposal for issue #435. Thanks in advance
> for any comments / doubts / corrections / suggestions / improvements.
> # Use case
> Currently it's impossible to ping from a private network behind NAT to
> an external address. Translation is not applied to ICMP because ICMP
> has no ports: we'd need to match on additional ICMP-specific fields
> that are supported by neither OpenFlow nor OVS, so in practice we can't
> route ICMP replies back to the correct sender.
> This feature is typically supported by iptables and commodity routers,
> so we'd like to make MidoNet able to circumvent OVS/OpenFlow's
> limitations. This will involve:
> 1. Supporting ICMP messages in NAT rules
> 2. Forcing user-space processing for ICMP echo requests/replies
> ## Support ICMP messages in NAT rules
> For ICMP messages, the identifier would act as transport source /
> destination (see [RFC3022], esp. sections 2.2 and 4.1, as well as
> [RFC5508]). The NatLeaseManager would treat these as ports, thus
> being able to discriminate origins of ICMP echo requests sent to the
> same destination:
> Suppose both A and B send an ICMP(src,dst,type,id) to Z across a router R:
> - A sends ICMP(A,Z,req,x), R translates to ICMP(R,Z,req,x')
> - B sends ICMP(B,Z,req,y), R translates to ICMP(R,Z,req,y')
> NAT mappings are: (A,x,Z,x') and (B,y,Z,y'). Thus:
I asked Galo about the exact key-value pairs we would write to Cassandra and he explained (for the ICMP from A):
forward-key:   (A, id, Z, id), i.e. (ip_src, tp_src, ip_dst, tp_dst); I think that's the order in the code.
forward-value: (R, id), or (R, id') if we're translating the identifier
return-key:    (R, id, Z, id), or (R, id', Z, id') if we're translating the identifier
return-value:  (A, id)
But we probably will NOT translate the identifier - and just rely on ICMP's random choice of identifier to avoid conflicts/collisions.
> - Z sends ICMP(Z,R,rep,x'), R translates to ICMP(Z,A,rep,x)
> - Z sends ICMP(Z,R,rep,y'), R translates to ICMP(Z,B,rep,y)
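To make the key/value layout above concrete, here is a minimal in-memory sketch in Python. It assumes we do NOT translate the identifier, and it uses plain dicts where the real code would write to Cassandra. The class and method names (IcmpNatTable, translate_request, translate_reply) are made up for illustration and are not the NatMapping API. It also does no collision handling at all.

```python
from typing import Dict, Optional, Tuple

# (ip_src, tp_src, ip_dst, tp_dst); for ICMP echo both "ports" hold the identifier.
Key = Tuple[str, int, str, int]
Value = Tuple[str, int]  # (ip, tp) to rewrite to


class IcmpNatTable:
    """Hypothetical stand-in for the forward/return entries, kept in memory."""

    def __init__(self, router_ip: str) -> None:
        self.router_ip = router_ip
        self.forward: Dict[Key, Value] = {}
        self.reverse: Dict[Key, Value] = {}

    def translate_request(self, src: str, dst: str, ident: int) -> Tuple[str, str, int]:
        """Echo request leaving the private network: rewrite the source to R."""
        fwd_key = (src, ident, dst, ident)
        if fwd_key not in self.forward:
            # Identifier is kept as-is (assumption: no identifier translation).
            self.forward[fwd_key] = (self.router_ip, ident)
            self.reverse[(self.router_ip, ident, dst, ident)] = (src, ident)
        new_src, new_ident = self.forward[fwd_key]
        return (new_src, dst, new_ident)

    def translate_reply(self, src: str, dst: str, ident: int) -> Optional[Tuple[str, str, int]]:
        """Echo reply from Z addressed to R: restore the original sender."""
        hit = self.reverse.get((dst, ident, src, ident))
        if hit is None:
            return None  # no mapping, nothing to reverse-translate
        orig_ip, orig_ident = hit
        return (src, orig_ip, orig_ident)
```

With this, a request from A with identifier 7 leaves as (R, Z, 7), and the reply (Z, R, 7) is mapped back to (Z, A, 7).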
> One benefit of this approach is that we'd be able to reuse most of the
> NatMapping code, rather than writing separate mapping for ICMP messages
> alone. The main insufficiency of the current implementation comes from
> the possibility of clashes between ICMP identifiers and port numbers
> used by other applications. This is solved in practice by including the
> protocol in the mapping criteria (found references to this in  and
> This solution would require adding the protocol to the current
> NatLeaseManager + NatMapping. The logic to allocate a NAT lease would
> work like this:
> - TCP/UDP: identical to current implementation
> - ICMP: lease the ICMP identifier as both source and destination "port"
> This leaves one chance of collision: both A and B may send an ICMP req
> with the same identifier, in which case R won't be able to
> reverse-translate. To get around this we can:
> 1. Drop an ICMP echo request if the identifier is already leased. These
> leases should probably have a low TTL.
> 2. Make the NatLeaseManager hold a separate list of free identifiers and
> assign them the same way as is done for ports.
> (1) provides less complexity at the cost of some dropped ICMP requests
In our chat we agreed to avoid any 'reservation' of identifiers that looks like SNAT today.
(We're trying to deprecate the SNAT block leases in favor of randomly selecting the SNAT port)
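A rough sketch of option (1) with the protocol added to the lease key, so an ICMP identifier can't clash with a TCP/UDP port of the same value. Everything here is an assumption for illustration: LeaseTable and acquire are hypothetical names, the 5-second TTL is an arbitrary "low" value, and the injectable clock is only there so the expiry can be exercised without sleeping.

```python
import time
from typing import Callable, Dict, Tuple

ICMP, TCP, UDP = 1, 6, 17  # IANA protocol numbers

# (protocol, nat_ip, tp, remote_ip); tp is a port, or the echo identifier for ICMP.
LeaseKey = Tuple[int, str, int, str]


class LeaseTable:
    """Hypothetical lease table; not the NatLeaseManager API."""

    def __init__(self, ttl_seconds: float = 5.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl_seconds  # assumed "low TTL"; the value is arbitrary
        self.clock = clock
        self.leases: Dict[LeaseKey, Tuple[str, float]] = {}  # key -> (owner, expiry)

    def acquire(self, proto: int, nat_ip: str, tp: int,
                remote_ip: str, owner_ip: str) -> bool:
        """Return True if owner_ip holds the lease; False means drop the packet."""
        key = (proto, nat_ip, tp, remote_ip)
        now = self.clock()
        held = self.leases.get(key)
        if held is not None and held[1] > now and held[0] != owner_ip:
            return False  # live lease owned by someone else: collision
        # New lease, expired lease, or a refresh by the same owner.
        self.leases[key] = (owner_ip, now + self.ttl)
        return True
```

If A holds identifier 7 towards Z, an echo request from B with the same identifier gets dropped until A's lease expires, while a TCP flow using port 7 is unaffected.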
> ## Force userspace processing of ICMP messages
> This is necessary because ODP does not parse the identifier fields of
> the ICMP messages, which is the only way to map src and dst across a
> 1. ICMP messages should be simulated normally, but never
> produce installed flows.
> 2. Make ForwardNatRule and ReverseNatRule deal with ICMP by themselves.
> Rules are only provided with context that may be relevant for installed
> flows, so the ICMP identifier is not there and NAT rules cannot perform
> the translation. There are various options to solve this:
> - Make the Router artificially set the WildcardMatch's transport dst and
> src to the ICMP identifier before they enter chain.apply. This
> approach is problematic considering that non-NAT rules will suddenly
> need to deal with a NAT-specific hack. Also, this solution will simply
> not work to support ICMP errors since, as will be explained further
> down, we'll need rules to examine the contents of the ICMP payload.
> - Extend WildcardMatch to include the original packet so
> that each rule can simply examine the contents freely.
> - Extend WildcardMatch adding the ICMP source/dst identifier.
Here's an idea that came up while we were chatting:
Extend WildcardMatch like this:
- it has a new field 'icmp_echo_identifier'
- when parsing a packet to make a WMatch, if it's an ICMP echo request or reply,
fill the icmp_echo_identifier field
Now, when we simulate an ICMP traversing a NAT, after we decide the mapping,
we have enough expressive power in the WMatch that we can make wildcarded
flows that deal with ICMP of a specific identifier.
So, all ICMPs will come up to Midolman, but only the first one in the flow will
need to be simulated. Subsequent ones will be matched in the Wildcard Flow
Table and immediately result in an 'emit' command to the datapath (because as
you said we shouldn't install kernel flows or we'll get incorrect behavior - won't be
> The last option seems better because it makes it easy to reflect that,
> in fact, we're extending the packet parsing capabilities of ODP.
> WildcardMatch would start including "ODP-supported" fields, and
> "ODP-unsupported" fields (ICMP id and payload would be part of these).
> MM will emit the modified packet, but when it comes to installing new
> flows the FlowController will simply ignore those that involve
> unsupported matching fields. If further versions of ODP start parsing
> unsupported fields, the corresponding flows can start being installed.
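As a sketch of the idea, here is roughly what filling an icmp_echo_identifier field and checking for "ODP-unsupported" fields could look like. This is Python for brevity, not the actual WildcardMatch class; WMatch, kernel_installable, parse_echo_identifier and make_match are hypothetical names. Only the echo identifier is modelled; the ICMP payload would be a second unsupported field. The header layout (type, code, checksum, identifier, sequence number) is the standard ICMP echo format.

```python
import struct
from dataclasses import dataclass
from typing import Optional

ICMP_ECHO_REPLY, ICMP_ECHO_REQUEST = 0, 8
IPPROTO_ICMP = 1


@dataclass(frozen=True)
class WMatch:
    """Hypothetical, heavily reduced stand-in for WildcardMatch."""
    ip_src: str
    ip_dst: str
    ip_proto: int
    icmp_echo_identifier: Optional[int] = None  # "ODP-unsupported" field

    def kernel_installable(self) -> bool:
        # A match that relies on a field the datapath cannot parse must stay
        # in user space: emit the packet, but do not install a kernel flow.
        return self.icmp_echo_identifier is None


def parse_echo_identifier(icmp: bytes) -> Optional[int]:
    """Return the identifier of an echo request/reply, else None."""
    if len(icmp) < 8:
        return None
    icmp_type, _code, _csum, ident, _seq = struct.unpack("!BBHHH", icmp[:8])
    if icmp_type in (ICMP_ECHO_REQUEST, ICMP_ECHO_REPLY):
        return ident
    return None


def make_match(ip_src: str, ip_dst: str, ip_proto: int, l4: bytes) -> WMatch:
    """Build a match from the IP addresses, protocol and raw L4 bytes."""
    ident = parse_echo_identifier(l4) if ip_proto == IPPROTO_ICMP else None
    return WMatch(ip_src, ip_dst, ip_proto, ident)
```

The FlowController side then only has to ask the match whether it is installable before pushing a flow to the datapath.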
> ## Further support for ICMP error messages (#513)
> As Jacob mentions, one of the most important ICMP error messages to
> support across NAT would be ICMP Destination Unreachable, esp.
> fragmentation required, etc. An ICMP error from Z to A triggered in
> response to a TCP packet NAT'ed by R would look like this:
> ICMP(Z,R,dest-unreachable,frag-reqd, (A,x',Z,y'))
> By examining the payload, R would be able to look up the reverse
> mapping and deliver the ICMP error to A:
> ICMP(Z,A,dest-unreachable,frag-reqd, (A,x,Z,y))
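A sketch of the payload inspection this would need. It assumes IPv4, a TCP or UDP packet embedded in the error, and that the embedded header carries the translated tuple, which is then used as the key into a reverse table. The function names and the dict-based reverse table are hypothetical. The offsets are the standard ones: 8 bytes of ICMP header, then the original IP header, then at least the first 8 bytes of the original datagram, which start with the source and destination ports.

```python
import socket
import struct
from typing import Dict, Optional, Tuple

FlowKey = Tuple[str, int, str, int]  # (ip_src, tp_src, ip_dst, tp_dst)


def embedded_flow(icmp_error: bytes) -> Optional[FlowKey]:
    """Extract the flow tuple of the packet embedded in an ICMP error."""
    inner = icmp_error[8:]  # skip type, code, checksum and 4 type-specific bytes
    if len(inner) < 20 or inner[0] >> 4 != 4:
        return None  # too short, or not IPv4
    ihl = (inner[0] & 0x0F) * 4  # embedded IP header length in bytes
    if ihl < 20 or len(inner) < ihl + 4:
        return None
    if inner[9] not in (6, 17):  # only TCP and UDP carry ports
        return None
    src = socket.inet_ntoa(inner[12:16])
    dst = socket.inet_ntoa(inner[16:20])
    sport, dport = struct.unpack("!HH", inner[ihl:ihl + 4])
    return (src, sport, dst, dport)


def reverse_lookup(icmp_error: bytes,
                   reverse: Dict[FlowKey, Tuple[str, int]]) -> Optional[Tuple[str, int]]:
    """Return the original (ip, port) the error should be delivered to, if any."""
    flow = embedded_flow(icmp_error)
    if flow is None:
        return None
    return reverse.get(flow)
```

The actual rewrite would also have to patch the embedded header and fix the checksums, which is left out here.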
> ## API changes
> Initially this should not necessarily involve any API changes, since most
> of the work is localized in core code. However, since part of the proposal
> involves adding the protocol to NAT rules, we may consider exposing
> this field in the API as part of this issue.
> ## References
> [RFC3022]: <http://tools.ietf.org/html/rfc3022>
> [RFC5508]: <http://tools.ietf.org/html/rfc5508#page-6>
> : <http://hasenstein.com/linux-ip-nat/diplom/node6.html>
> : <http://superuser.com/questions/135094/how-does-a-nat-server-forward-ping-icmp-echo-reply-packets-to-users#135098>