[MidoNet-dev] Draft proposal: Ping through NAT support (#435)

Navarro, Galo galo at midokura.com
Mon Feb 18 11:36:35 UTC 2013


Hi, thanks for your comments.

I've copied the proposal to a wiki page, incorporating your comments,
additions and some amendments.

The URL is: https://sites.google.com/a/midokura.jp/wiki/icmp-echo-support-through-nat

@Pino: do you want me to link to it from the features page?

Cheers,
/g

On 18 February 2013 09:32, Sblendido, Rossella <rossella at midokura.com> wrote:
> Hi guys,
>
> some comments inline...
>
> Thanks,
>
> Rossella
>
> On 15 Feb 2013 17:12, "Pino de Candia" <gdecandia at midokura.com>
> wrote:
>
>
>>
>> On Friday, February 15, 2013 at 3:40 PM, Navarro, Galo wrote:
>>>
>>> Hello dev force!
>>
>> Galo, thanks for the write-up!
>>>
>>>
>>> I'm sending below a draft proposal for issue #435. Thanks in advance
>>> for any comments / doubts / corrections / suggestions / improvements.
>>>
>>> # Use case
>>>
>>> Currently it's impossible to ping from a private network behind NAT to
>>> an external address. Translation is not applied to ICMP because ICMP
>>> has no ports: we'd need to match on additional ICMP-specific fields
>>> that are supported by neither OpenFlow nor OVS, so in practice we can't
>>> route ICMP replies back to the correct sender.
>>>
>>> This feature is typically supported by iptables and commodity routers,
>>> so we'd like to make MidoNet able to circumvent OVS/OpenFlow's
>>> limitations. This will involve:
>>>
>>> 1. Supporting ICMP messages in NAT rules
>>> 2. Forcing user-space processing of ICMP echo requests/replies
>>>
>>> ## Support ICMP messages in NAT rules
>>>
>>> For ICMP messages, the identifier would act as transport source /
>>> destination (see [RFC3022][1], esp. sections 2.2 and 4.1, as well as
>>> [RFC5508][2]). The NatLeaseManager would treat these as ports, thus
>>> being able to discriminate origins of ICMP echo requests sent to the
>>> same destination:
>>>
>>> Suppose both A and B send an ICMP(src,dst,type,id) to Z across a router R:
>>>
>>> - A sends ICMP(A,Z,req,x), R translates to ICMP(R,Z,req,x')
>>> - B sends ICMP(B,Z,req,y), R translates to ICMP(R,Z,req,y')
>>>
>>> The NAT mappings are: (A,x,Z,x') and (B,y,Z,y'). Thus:
>>
>> I asked Galo about the exact key-value pairs we would write to Cassandra
>> and he explained (for the ICMP from A):
>> - forward-key: (A, id, Z, id), i.e. ip_src, tp_src, ip_dst, tp_dst (I think
>>   that's the order in the code)
>> - forward-value: (R, id), or (R, id') if we're translating the identifier
>> - return-key: (R, id, Z, id), or (R, id', Z, id') if we're translating the
>>   identifier
>> - return-value: (A, id)
>>
>> But we probably will NOT translate the identifier, and will just rely on
>> ICMP's random choice of identifier to avoid conflicts/collisions.
>
> +1 ICMP's identifiers should be random enough.
>
>
>>>
>>>
>>> - Z sends ICMP(Z,R,rep,x'), R translates to ICMP(Z,A,rep,x)
>>> - Z sends ICMP(Z,R,rep,y'), R translates to ICMP(Z,B,rep,y)
>>>
>>> One benefit of this approach is that we'd be able to reuse most of the
>>> NatMapping code, rather than writing a separate mapping for ICMP messages
>>> alone. The main insufficiency of the current implementation comes from
>>> the possibility of clashes between ICMP identifiers and port numbers
>>> used by other applications. This is solved in practice by including the
>>> protocol in the mapping criteria (found references to this in [3] and
>>> [4]).
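
The mapping scheme above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses a plain in-memory dictionary rather than the real NatLeaseManager / NatMapping code, and the names `FlowKey`, `NatTable`, `translate_out` and `translate_in`, as well as the identifier values, are hypothetical. It shows the ICMP identifier standing in for both "ports", and the protocol in the key keeping an ICMP identifier apart from a TCP port with the same value.

```python
from typing import Dict, NamedTuple, Optional


class FlowKey(NamedTuple):
    """Hypothetical NAT key: protocol plus the usual 4-tuple."""
    proto: str   # "tcp", "udp" or "icmp"
    ip_src: str
    tp_src: int  # port, or ICMP echo identifier
    ip_dst: str
    tp_dst: int  # port, or ICMP echo identifier


class NatTable:
    """Toy source-NAT table, not the actual NatMapping implementation."""

    def __init__(self, public_ip: str) -> None:
        self.public_ip = public_ip
        self.forward: Dict[FlowKey, FlowKey] = {}
        self.reverse: Dict[FlowKey, FlowKey] = {}

    def translate_out(self, key: FlowKey, new_tp_src: int) -> FlowKey:
        """Record a mapping for an outgoing packet and return the translated key."""
        # For ICMP the identifier plays the role of both source and destination port.
        tp_dst = new_tp_src if key.proto == "icmp" else key.tp_dst
        out = FlowKey(key.proto, self.public_ip, new_tp_src, key.ip_dst, tp_dst)
        self.forward[key] = out
        # The reply arrives with source and destination swapped.
        reply_in = FlowKey(key.proto, out.ip_dst, out.tp_dst, out.ip_src, out.tp_src)
        reply_out = FlowKey(key.proto, key.ip_dst, key.tp_dst, key.ip_src, key.tp_src)
        self.reverse[reply_in] = reply_out
        return out

    def translate_in(self, key: FlowKey) -> Optional[FlowKey]:
        """Return the reverse-translated key for a reply, or None if unknown."""
        return self.reverse.get(key)


nat = NatTable("R")
a_out = nat.translate_out(FlowKey("icmp", "A", 7, "Z", 7), new_tp_src=1001)
b_out = nat.translate_out(FlowKey("icmp", "B", 9, "Z", 9), new_tp_src=1002)
# A TCP flow that happens to use the same numbers as A's ICMP identifier.
tcp_out = nat.translate_out(FlowKey("tcp", "A", 5000, "Z", 1001), new_tp_src=1001)
```

Passing the original identifier as `new_tp_src` corresponds to the variant in which the identifier is not translated at all.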
>>>
>>> This solution would require adding the protocol to the current
>>> NatLeaseManager + NatMapping. The logic to allocate a NAT lease would
>>> work like this:
>>>
>>> - TCP/UDP: identical to current implementation
>>> - ICMP: lease the ICMP identifier as both source and destination "port"
>>
>> lease->use?
>>>
>>>
>>> This leaves one chance of collision: if both A and B send an ICMP req
>>> with the same identifier, R won't be able to reverse-translate. To get
>>> around this we can:
>>> 1. Drop an ICMP echo request if the identifier is already leased. These
>>> leases should probably have a low TTL.
>>> 2. Make the NatLeaseManager hold a separate list of free identifiers and
>>> assign them as is done for ports.
>>>
>>> (1) is simpler, at the cost of some dropped ICMP requests.
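
Option (1) could look roughly like the following Python sketch. Everything here is an assumption made for illustration: the class `IcmpIdLeases`, its `try_lease` method, the choice to key leases on (destination, identifier), and the 5 second TTL are not taken from the MidoNet code. The clock is injectable so the expiry behaviour can be shown without sleeping.

```python
import time
from typing import Callable, Dict, Tuple

LeaseKey = Tuple[str, int]  # (destination IP, ICMP echo identifier)


class IcmpIdLeases:
    """Hypothetical lease table for option (1): drop on identifier conflict."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl_seconds
        self.clock = clock
        # key -> (owner IP, expiry timestamp)
        self.leases: Dict[LeaseKey, Tuple[str, float]] = {}

    def try_lease(self, src_ip: str, dst_ip: str, ident: int) -> bool:
        """Return True if the echo request may be forwarded, False to drop it."""
        now = self.clock()
        key = (dst_ip, ident)
        held = self.leases.get(key)
        if held is not None and held[1] > now and held[0] != src_ip:
            return False  # another sender holds this identifier towards dst_ip
        self.leases[key] = (src_ip, now + self.ttl)  # take or refresh the lease
        return True


fake_now = [0.0]  # mutable stand-in for a clock
leases = IcmpIdLeases(ttl_seconds=5.0, clock=lambda: fake_now[0])
first = leases.try_lease("A", "Z", 42)  # A takes the lease
clash = leases.try_lease("B", "Z", 42)  # B collides while the lease is live
fake_now[0] = 6.0                       # the lease has expired by now
retry = leases.try_lease("B", "Z", 42)  # B can now take it
```

A short TTL keeps the window in which B's pings are dropped small, which is the trade-off described above.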
>>
>> In our chat we agreed to avoid any 'reservation' of identifiers that looks
>> like SNAT today.
>> (We're trying to deprecate the SNAT block leases in favor of randomly
>> selecting the SNAT port)
>
> I remember we discussed how to improve that...didn't we write it up?
>
>
>>>
>>>
>>> ## Force userspace processing of ICMP messages
>>>
>>> This is necessary because ODP does not parse the identifier field of
>>> ICMP messages, which is the only way to map src and dst across a
>>> NAT.
>>>
>>> 1. ICMP messages should be simulated normally, but never produce
>>> installed flows.
>>> 2. Make ForwardNatRule and ReverseNatRule deal with ICMP by themselves.
>>>
>>> Rules are only provided with context that may be relevant for installed
>>> flows, so the ICMP identifier is not there and NAT rules cannot perform
>>> the translation. There are various options to solve this:
>>>
>>> - Make the Router artificially set the WildcardMatch's transport dst and
>>> src to the ICMP identifier before they enter chain.apply. This
>>> approach is problematic because non-NAT rules would suddenly
>>> need to deal with a NAT-specific hack. Also, this solution would simply
>>> not work for ICMP errors since, as explained further
>>> down, we'll need rules to examine the contents of the ICMP payload.
>>> - Extend WildcardMatch to include the original packet so
>>> that each rule can examine its contents freely.
>>> - Extend WildcardMatch by adding the ICMP source/dst identifier.
>>
>> Here's an idea that came up while we were chatting:
>>
>> Extend WildcardMatch like this:
>> - it has a new field 'icmp_echo_identifier'
>> - when parsing a packet to make a WMatch, if it's an ICMP echo request or
>> reply, fill the icmp_echo_identifier field
>>
>> Now, when we simulate an ICMP traversing a NAT, after we decide the
>> mapping, we have enough expressive power in the WMatch that we can make
>> wildcarded flows that deal with ICMP of a specific identifier.
>>
>> So, all ICMPs will come up to Midolman, but only the first one in the flow
>> will need to be simulated. Subsequent ones will be matched in the Wildcard
>> Flow Table and immediately result in an 'emit' command to the datapath
>> (because as you said we shouldn't install kernel flows or we'll get
>> incorrect behavior - won't be identifier specific).
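
As a rough illustration of the parsing step in that idea, the Python sketch below pulls the identifier out of an ICMP header using the standard layout (type, code, checksum, identifier, sequence number) and returns it only for echo requests (type 8) and echo replies (type 0). The function name `icmp_echo_identifier` is hypothetical and simply mirrors the proposed field name; it is not existing MidoNet code.

```python
import struct
from typing import Optional

ICMP_ECHO_REPLY = 0
ICMP_ECHO_REQUEST = 8


def icmp_echo_identifier(icmp_bytes: bytes) -> Optional[int]:
    """Return the identifier of an ICMP echo request/reply, else None.

    `icmp_bytes` is expected to start at the ICMP header, i.e. after
    the IP header has been stripped.
    """
    if len(icmp_bytes) < 8:
        return None  # too short to hold an echo header
    icmp_type, _code, _checksum, ident, _seq = struct.unpack(
        "!BBHHH", icmp_bytes[:8])
    if icmp_type in (ICMP_ECHO_REQUEST, ICMP_ECHO_REPLY):
        return ident
    return None  # other ICMP types carry no echo identifier


echo_request = struct.pack("!BBHHH", ICMP_ECHO_REQUEST, 0, 0, 0x1234, 1)
dest_unreachable = struct.pack("!BBHHH", 3, 4, 0, 0, 1500)
```

For any non-echo message the field would stay unset, so such packets would match exactly as they do today.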
>>
>> thanks!
>> Pino
>>>
>>>
>>> The last option seems better because it makes it easy to reflect that,
>>> in fact, we're extending the packet parsing capabilities of ODP.
>>>
>>> WildcardMatch would start including "ODP-supported" fields, and
>>> "ODP-unsupported" fields (ICMP id and payload would be part of these).
>>> MM will emit the modified packet, but when it comes to installing new
>>> flows the FlowController will simply ignore those that involve
>>> unsupported matching fields. If further versions of ODP start parsing
>>> unsupported fields, the corresponding flows can start being installed.
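
The split between "ODP-supported" and "ODP-unsupported" fields could be sketched as below. This is a hedged illustration only: `WildcardMatchSketch`, `ODP_UNSUPPORTED` and `can_install_in_datapath` are made-up names, the field list is a tiny subset chosen for the example, and the real WildcardMatch and FlowController are structured differently.

```python
from dataclasses import dataclass, fields
from typing import Optional, Set

# Assumed for illustration: fields the datapath cannot match on today.
ODP_UNSUPPORTED = frozenset({"icmp_echo_identifier", "icmp_payload"})


@dataclass(frozen=True)
class WildcardMatchSketch:
    """Hypothetical stand-in for WildcardMatch; None means 'wildcarded'."""
    ip_src: Optional[str] = None
    ip_dst: Optional[str] = None
    ip_proto: Optional[int] = None
    icmp_echo_identifier: Optional[int] = None
    icmp_payload: Optional[bytes] = None

    def used_fields(self) -> Set[str]:
        """Names of the fields this match actually constrains."""
        return {f.name for f in fields(self)
                if getattr(self, f.name) is not None}


def can_install_in_datapath(match: WildcardMatchSketch) -> bool:
    """True only if every field the match uses is parsed by the datapath."""
    return not (match.used_fields() & ODP_UNSUPPORTED)


tcp_match = WildcardMatchSketch(ip_src="10.0.0.2", ip_dst="192.0.2.1", ip_proto=6)
ping_match = WildcardMatchSketch(ip_src="10.0.0.2", ip_dst="192.0.2.1",
                                 ip_proto=1, icmp_echo_identifier=42)
```

If a later datapath version learns to parse one of these fields, removing its name from the unsupported set would be enough for the corresponding flows to become installable in this sketch.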
>>>
>>> ## Further support for ICMP error messages (#513)
>>>
>>> As Jacob mentions, one of the most important ICMP error messages to
>>> support across NAT would be ICMP Destination Unreachable, esp.
>>> fragmentation required, etc. An ICMP error from Z to A triggered in
>>> response to a TCP packet NAT'ed by R would look like this:
>>>
>>> ICMP(Z,R,dest-unreachable,frag-reqd, (A,x',Z,y'))
>>>
>>> By examining the payload, R would be able to apply the reverse
>>> mapping and deliver the ICMP error to A:
>>>
>>> ICMP(Z,A,dest-unreachable,frag-reqd, (A,x,Z,y))
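
A simplified Python sketch of that reverse translation follows. It makes several assumptions that are not in the proposal: packets are modelled as tuples rather than bytes, the embedded header is taken to be the translated tuple as it left R, the reverse mapping is a plain dictionary, and the names `IcmpError` and `reverse_translate_error` are hypothetical. Type 3 / code 4 is Destination Unreachable with "fragmentation needed".

```python
from typing import Dict, NamedTuple, Optional, Tuple

Tuple4 = Tuple[str, int, str, int]  # (ip_src, tp_src, ip_dst, tp_dst)


class IcmpError(NamedTuple):
    """Hypothetical, simplified view of an ICMP error packet."""
    ip_src: str
    ip_dst: str
    icmp_type: int
    icmp_code: int
    embedded: Tuple4  # header of the packet that triggered the error


def reverse_translate_error(err: IcmpError,
                            reverse_map: Dict[Tuple4, Tuple4]
                            ) -> Optional[IcmpError]:
    """Rewrite an ICMP error using the mapping found for its embedded header."""
    original = reverse_map.get(err.embedded)
    if original is None:
        return None  # no mapping known: the error cannot be delivered
    # Deliver to the original sender and restore the embedded header.
    return err._replace(ip_dst=original[0], embedded=original)


# Assumed mapping: A:5000 -> Z:80 was source-NAT'ed to R:1001 -> Z:80.
reverse_map = {("R", 1001, "Z", 80): ("A", 5000, "Z", 80)}
err = IcmpError("Z", "R", 3, 4, ("R", 1001, "Z", 80))
fixed = reverse_translate_error(err, reverse_map)
```

A real implementation would also have to recompute the checksums after rewriting the outer and embedded headers, which this sketch leaves out.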
>>>
>>> ## API changes
>>>
>>> Initially this should not require any API changes since most
>>> of the work is localized in core code, but since part of the proposal
>>> involves adding the protocol to NAT rules, we may consider exposing
>>> this field in the API as part of this issue.
>>>
>>> ## References
>>>
>>> [1]: <http://tools.ietf.org/html/rfc3022>
>>> [2]: <http://tools.ietf.org/html/rfc5508#page-6>
>>> [3]: <http://hasenstein.com/linux-ip-nat/diplom/node6.html>
>>> [4]:
>>> <http://superuser.com/questions/135094/how-does-a-nat-server-forward-ping-icmp-echo-reply-packets-to-users#135098>
>>>
>>> Cheers!
>>> /g
>>
>>

