[MidoNet-dev] Draft proposal: IPsec in midonet

Guillermo Ontañón guillermo at midokura.jp
Wed Feb 13 17:46:30 UTC 2013


Hi devs,

After some discussions and more thinking today, I'm including an updated
proposal below. I now have a clearer idea of how the IPsec gateways hook
into the virtual topology. I've elaborated on that, simplified the DTOs,
and dropped the requirement of two gateways per connection. You can skip
to the last section; the rest hasn't changed.

Feature description
-------------------

IPsec VPN setup compatible with AWS's VPC private gateway.

The setup should include redundancy/failover.

Amazon VPC connectivity (See [0])
---------------------------------

Since we want fail-over, we need Amazon's 'dynamically-routed' flavour of
IPsec. This is what a fully redundant setup looks like:

                        ~~~~~~~~~~~~~~~~~~
                        Amazon VPC Subnets
                        ~~~~~~~~┬~~~~~~~~~
                                │
                           ┌────┴────┐
                           ╎  GW-1   ╎
                           └─┬─────┬─┘
       link-local-address-A  ╎     ╎ link-local-address-B
             BGP-instance-A  ╎     ╎ BGP-instance-B
                             ╎     ╎
                         ┌───┘     └────┐
                         ╎              ╎
                    ┌────┴────┐    ┌────┴────┐
                    ╎ IPsec-A ╎    ╎ IPsec-B ╎
                    └────┬────┘    └────┬────┘
        Public-Address-A ╎              ╎ Public-Address-B
                         ╎              ╎
                  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                             Internet
                  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                         ╎              ╎
        Public-Address-C ╎              ╎ Public-Address-D
                    ┌────┴────┐    ┌────┴────┐
                    ╎ IPsec-C ╎    ╎ IPsec-D ╎
                    └────┬────┘    └────┬────┘
                         ╎              ╎
                         └───┐     ┌────┘
                             ╎     ╎
             BGP-instance-C  ╎     ╎ BGP-instance-D
       link-local-address-C  ╎     ╎ link-local-address-D
                           ┌─┴─────┴─┐
                           ╎  GW-2   ╎
                           └────┬────┘
                                ╎
                        ~~~~~~~~┴~~~~~~~~~~
                        Other cloud subnets
                        ~~~~~~~~~~~~~~~~~~~

It works like this:

  - Two IPsec connections are established: IPsec-A - IPsec-C and
    IPsec-B - IPsec-D.
  - For each IPsec connection pair (A-C, B-D) a /30 network of link-local
    addresses is reserved. These addresses (LLA-A, B, C and D) are locally
    bound by each gateway behind the IPsec tunnels.
  - At this point, with the tunnels established, GW-1's traffic
    addressed to LLA-C will always take source address LLA-A and go
    through the A-C tunnel. Both gateways can now run two BGP instances
    each on the link-local-addresses, and packets between the subnets
    behind each gateway can start to flow through either IPsec tunnel.
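
To make the /30 addressing concrete, here is a rough sketch of how one end
of a tunnel could derive its BGP peer's link-local address from its own.
The function name and the example addresses are made up for illustration;
the real addresses come from the provider's VPN configuration.

```python
import ipaddress


def peer_in_slash30(local_ip: str) -> str:
    """Return the other usable host address in local_ip's /30 network."""
    addr = ipaddress.ip_address(local_ip)
    if not addr.is_link_local:
        raise ValueError(f"{local_ip} is not a link-local address")
    net = ipaddress.ip_network(f"{local_ip}/30", strict=False)
    hosts = list(net.hosts())  # a /30 has exactly two usable hosts
    if addr not in hosts:
        raise ValueError(f"{local_ip} is not a usable host address in {net}")
    return str(hosts[1] if addr == hosts[0] else hosts[0])


# Hypothetical addresses for the A-C tunnel (169.254.255.0/30).
LLA_A = "169.254.255.1"
LLA_C = peer_in_slash30(LLA_A)
```

With these made-up values LLA_C is the only other usable address in the
/30, which is why the peer address can be inferred from the local one.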

(implementation note below - skip to next section if not interested)

Note that in this setup the IPsec instances need to accept and encrypt
traffic from/to any network on either side, and they do not know those
networks when they establish the tunnel. Because the Linux kernel IPsec
implementation is policy based (you need Security Associations between
each pair of subnets that may communicate through the tunnel) this poses
a problem. It may be solved in two different ways (needs testing):

  - Statically add SAs for localnet-0.0.0.0/0, meaning that any time the
    IPsec instance sees a packet coming from one of the known local
    networks, the tuple of that network with 0.0.0.0/0 will match and let
    the traffic through. This may be just fine, as we will be fully in
    control of what traffic makes it to the IPsec instance.
  - Have the BGP instance notify the IPsec instance to add the SAs
    dynamically. This is obviously more complicated, especially because
    we are not talking about MM state but Linux kernel state inside of a
    VM.
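
As an illustration of the first (static) option only, this sketch models
the selector match for outbound packets: one entry per known local
network, each paired with 0.0.0.0/0. It does not touch the kernel; the
function names and addresses are invented for the example.

```python
import ipaddress

ANY = ipaddress.ip_network("0.0.0.0/0")


def build_static_selectors(local_nets):
    """One (local network, 0.0.0.0/0) pair per known local network."""
    return [(ipaddress.ip_network(n), ANY) for n in local_nets]


def selector_matches(selectors, src, dst):
    """True if an outbound packet src -> dst matches any static pair."""
    s = ipaddress.ip_address(src)
    d = ipaddress.ip_address(dst)
    return any(s in local and d in remote for local, remote in selectors)


# Hypothetical local network behind the gateway.
selectors = build_static_selectors(["10.0.1.0/24"])
```

Any destination matches as long as the source is in a known local
network, so the remote subnets never have to be known in advance.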

Digression: about the CloudStack implementation
-----------------------------------------------

See [1] for reference. CloudStack supports two types of connections:
L2TP and site-to-site. L2TP is meant for single users, not gateways, and
it tunnels L2TP packets to make the user's machine effectively part of
the chosen VPC L2 network segment. This mode does not concern us for now.
Site-to-site VPNs in CloudStack appear to be old-fashioned policy based
IPsec tunnels. The full list of remote subnets is needed to configure
the connection and there's no mention of BGP.

However, CloudStack's site-to-site tunnels should still be compatible
with Amazon, ...without failover. Amazon added support for statically-
routed VPNs in September 2012 [2]. Seeing that Amazon extended their
offering with this feature and that it's the only mode that CloudStack
supports, it's probably worth considering supporting this mode too in
Midonet.

Proposal for adding dynamically-routed IPsec to Midonet
-------------------------------------------------------

 - Configuration parameters. (REST API operations to be derived from
   these). To set up a redundant IPsec VPN you need to provide:

    - Two public IPs to talk to the remote IPsec gateways.
    - Public IPs for the gateways on the other side.
    - Two private IPs to talk to a downstream virtual router.
    - Two addresses in different /30 link-local subnets, for BGP.
    - A virtual router. BGP will modify this router's routing table
      and both IPsec gateways should be connected directly to this router.
    - Local network ranges to advertise to the other side via BGP.
    - IKE pre-shared key.
    - IPsec connection parameters: PFS mode, hash function, encryption
      function. Amazon is very strict on this, requiring exactly:
      Group-2, SHA-1, AES-128. IMHO that's a correct philosophy,
      sticking to the common minimum. If midonet is as strict as Amazon
      the user doesn't need to supply this.

 - The REST API would get the following new DTOs:

    - IpsecGatewayDto (abstract)
        - publicIp: IP address
        - peerIp: IP address
        - privateIp: IP address
        - ikePsk: pre-shared-key for IKE exchange
        - dhGroup: read-only or not present, always 2
        - hashAlg: read-only or not present, always 'sha-1'
        - cryptoAlg: read-only or not present, always aes128
        - upPort: InteriorBridgePort, it's created automatically and
          starts in unlinked state. DELETE is disallowed.
        - downPort: InteriorBridgePort, it's created automatically and
          starts in unlinked state. DELETE is disallowed.
        - type (IpsecDynamic|IpsecStatic)

    - DynamicIpsecGatewayDto
        - bgpLocalAsn
        - bgpPeerAsn
        - bgpLocalIp: link-local ip address
        - bgpPeerIp:  localIp's /30 peer
        - localSubnets: list of CIDR blocks to advertise
        - virtualRouter: UUID of a virtual router
        - type = IpsecDynamic

    - StaticIpsecGatewayDto (if/when midonet supports this)
        - localSubnets: list of CIDR blocks
        - remoteSubnets: list of CIDR blocks
        - type = IpsecStatic

 ...and one new top-level URL for the IpsecGatewayDto objects described
 above (all other objects hang from it):

       - /ipsec_gateways?tenant_id=:tenantId - list a tenant's ipsec
         gateways. ApplicationDto would get a new getIpsecGateways()
         method.

       - /ipsec_gateways/:ipsec_id - URL for an IpsecGateway: POST, DELETE, GET.
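
For discussion, the DTOs and the listing URL above could be modelled
roughly like this. It is only a sketch: the Python class and field names
are my own transliteration, upPort/downPort are left out, the base URL is
a placeholder, and the validation rules are my reading of the field
descriptions rather than anything implemented.

```python
import ipaddress
from dataclasses import dataclass
from typing import ClassVar, List
from urllib.parse import urlencode


@dataclass
class IpsecGateway:
    """Common fields of the proposed IpsecGatewayDto (ports omitted)."""
    public_ip: str
    peer_ip: str
    private_ip: str
    ike_psk: str
    # Read-only in the proposal: pinned to what Amazon requires.
    DH_GROUP: ClassVar[int] = 2
    HASH_ALG: ClassVar[str] = "sha-1"
    CRYPTO_ALG: ClassVar[str] = "aes128"


@dataclass
class DynamicIpsecGateway(IpsecGateway):
    bgp_local_asn: int
    bgp_peer_asn: int
    bgp_local_ip: str
    bgp_peer_ip: str
    local_subnets: List[str]
    virtual_router: str  # UUID of the virtual router, kept as a string here
    TYPE: ClassVar[str] = "IpsecDynamic"

    def __post_init__(self):
        local = ipaddress.ip_address(self.bgp_local_ip)
        peer = ipaddress.ip_address(self.bgp_peer_ip)
        if not (local.is_link_local and peer.is_link_local):
            raise ValueError("BGP addresses must be link-local")
        net = ipaddress.ip_network(f"{local}/30", strict=False)
        if peer == local or peer not in net:
            raise ValueError("bgpPeerIp must be bgpLocalIp's /30 peer")
        for cidr in self.local_subnets:
            ipaddress.ip_network(cidr)  # raises ValueError on a bad CIDR


def list_url(base: str, tenant_id: str) -> str:
    """Build the proposed per-tenant listing URL under a given API base."""
    return f"{base}/ipsec_gateways?{urlencode({'tenant_id': tenant_id})}"
```

Keeping dhGroup, hashAlg and cryptoAlg as class-level constants mirrors
the "read-only or not present" idea: callers cannot pass them in.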


 - The virtual topology would look like this to the user in a typical
   setup. Note that the API would allow for the creation of just one
   gateway or more than two for that matter. Also the up/down ports on
   the IpsecGateway device can be connected anywhere in the virtual
   topology:

                       ┌────────┐
                       ╎  Prov. ├───────────┐
                       ╎ Router ├──────────┐╎
                       └────┬───┘          ╎╎ 2x IpsecGateway.upPort
                            ╎       ┌──────┴────────────────┐
                            ╎       ╎ DynamicIpsecGateway** ╎┐
                            ╎       └┬─────┬────────────────┘╎ 2x
                            ╎        └─────╎┬────────────────┘
                            ╎              ╎╎ 2x IpsecGateway.downPort

                       ┌────┴───┐          ╎╎
   BGP injected ──────>╎ Tenant ├──────────┘╎
       routes          ╎ Router ├───────────┘
                       └──┬───┬─┘
                   ┌──────┘   └─────┐
             ┌─────┴────┐     ┌─────┴────┐
             ╎ Bridge-A ╎ ... ╎ Bridge-N ╎
             └──────────┘     └──────────┘


 ** each IPsec/BGP virtual device would expand to this:

                        DynamicIpsecGateway
                      ┌------------------------------------------┐
                      ╵                                          ╵
               upPort ╵        ┌────────────┐                    ╵
 (InteriorBridgePort) ╵        ╎  Upstream  ╎ ExteriorBridgePort ╵
                ───────────────┤   Bridge   ├──────────┐         ╵
                      ╵        └────────────┘          ╎         ╵
                      ╵                                ╎publicIp ╵
                      ╵                         ┌──────┴──────┐  ╵
                      ╵                         ╎  IPsec/BGP  ╎  ╵
                      ╵                         ╎ VM instance ╎  ╵
                      ╵                         └──────┬──────┘  ╵
             downPort ╵        ┌────────────┐          ╎privateIp╵
 (InteriorBridgePort) ╵        ╎ Downstream ╎          ╎         ╵
                ───────────────┤   Bridge   ├──────────┘         ╵
                      ╵        └────────────┘ ExteriorBridgePort ╵
                      ╵                                          ╵
                      └------------------------------------------┘

Thus, this design proposes that several virtual devices would be
aggregated to offer a single new virtual 'super-device'. In this case
the sub-devices that make up this super-device should not be modifiable
via API calls. A new property for virtual devices 'superDevice' (its
value being the UUID of the parent super-device) could serve to mark
which devices are read-only to the REST API.
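
A minimal sketch of how the 'superDevice' marker could gate API
modifications. The VirtualDevice class and check_modifiable() are
hypothetical names used only to illustrate the idea, not existing
midonet code.

```python
from dataclasses import dataclass
from typing import Optional
from uuid import UUID, uuid4


@dataclass
class VirtualDevice:
    id: UUID
    name: str
    # UUID of the parent super-device, or None for a standalone device.
    super_device: Optional[UUID] = None


def check_modifiable(device: VirtualDevice) -> None:
    """Reject REST modifications of devices owned by a super-device."""
    if device.super_device is not None:
        raise PermissionError(
            f"{device.name} belongs to super-device {device.super_device} "
            "and is read-only"
        )


gateway_id = uuid4()
upstream = VirtualDevice(uuid4(), "upstream-bridge", super_device=gateway_id)
standalone = VirtualDevice(uuid4(), "tenant-bridge")
```

Here check_modifiable(standalone) passes silently, while
check_modifiable(upstream) raises.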

A second problem of this design is that an IpsecGateway offers two
InteriorBridgePorts to the outside world. This is clearly a limitation
for two reasons:

    a) they are interior (the anticipated scenario for an IPsec
    gateway). This could be overcome by letting the API create the
    upPort & downPort objects by POSTing a BridgePort of the desired
    type.

    b) bridge ports cannot be connected to bridges. In the topology
    shown above, it would make sense (for simplicity in the l3 setup)
    to connect the downPorts of both IPsecGateways to a bridge and that
    bridge to the tenant router. This is not possible with the proposed
    IpsecGateway device.
    One solution might be to replace the two bridges inside the
    IpsecGateway with two devices of a new kind, whose mission would be
    to act as a dumb two-port bridge with an exterior port on one side
    and an interior port (linkable to a bridge) on the other side.
    A second solution would be to just offer two physical vif names that
    could be bound to arbitrary exterior ports through API calls, as
    opposed to offering the proposed upPort and downPort.

Links
-----

[0]
http://docs.aws.amazon.com/AmazonVPC/latest/NetworkAdminGuide/Introduction.html
[1]
http://incubator.apache.org/cloudstack/docs/en-US/Apache_CloudStack/4.0.0-incubating/html/Admin_Guide/vpn.html#site-to-site-vpn
[2]
http://aws.typepad.com/aws/2012/09/amazon-vpc-additional-vpn-features.html



On Wed, Feb 13, 2013 at 12:22 PM, Guillermo Ontañón
<guillermo at midokura.jp> wrote:

> Hi Dan,
>
> Good questions, my comments below...
>
>
> On Wed, Feb 13, 2013 at 6:43 AM, Dan Mihai Dumitriu <dan at midokura.com> wrote:
>
>> Hi Guillermo,
>>
>> I was just about to send a message with a similar proposal, and you just
>> beat me to it! :) This looks great.
>>
>> There are a couple of points I want to raise.
>>
>> - since one of our objectives is to connect a private cloud to an AWS
>> VPC, is it correct that our IPSec VPN will have to run in client mode as
>> well?  Actually, I don't know how this stuff is supposed to work.  Is one
>> side a server and the other a client?  Or is the configuration of such a
>> VPN symmetric?
>>
>
> The configuration is symmetric, there's no server and client. Once set up,
> both IPsec gateways should try to contact the other (UDP port 500).
>
>
>> - for managing the IPSec instances, do we want to leverage the VMs of the
>> cloud stack on top of which we're running?  We talked about running them in
>> containers before.  I think I might be coming back around to thinking that
>> VMs are the way to go. :) If the performance is good enough, something
>> which has to be measured.  (CloudStack runs all services in VMs.)
>>
>
> That's the biggest question we have, I was just discussing this with Abel.
> I don't have a final opinion yet; before I do, I want to run a real test with
> Amazon. Some thoughts:
>
>   * Containers will not be possible if the IPsec SPDB is not per-container
> too. I haven't tried but have been unable to find anyone who has gotten it
> to work.
>   * The downside of using a VM is maintaining the image with updates, etc.
>   * A second downside of the VM is that we'd be redoing some of the work
> in the current BGP feature.
>   * The upside of using a VM is very attractive to me: you have this black
> box with two ports that you can plug anywhere into the virtual topology. In
> this case we'd put BGP and IPsec in the VM, plugged to a tenant router on
> one side and to the provider router (or even a second port of the tenant
> router) on the other.
>   * Unless we do a clever trick (connection through the datapath?) BGP and
> IPsec have to run in the same container/vm/whatever. Because when BGP sends
> packets out of its link-local-address the destination address is on the
> same ip subnet and on other side of the tunnel, so they have to be fed into
> the networking stack that has the IPsec SPDB. I mention this because Abel
> and I have been discussing ways of making the BGP and IPsec instances
> independent, reusing our current BGP setup.
>
>
>> - CloudStack already has an IPSec VPN in the router VM.  We just spoke
>> yesterday about leveraging that to support the feature when running
>> together with MN.  In order to add a BGP option, we have to extend the CS
>> API itself.  To make this fault tolerant, we should probably prepare
>> another image.
>>
>> - On the OpenStack side, in order to consider this a complete feature, we
>> also need to add APIs to quantum.  Does anyone know if there is any ongoing
>> blueprint for a VPC feature?  In addition, we need to prepare a VM with the
>> ipsec code and some agent that we can control from MN.  For the control
>> connection between the MN agent and the agent in the VM, one way to go is
>> to do as CloudStack does, that is to create another interface that is
>> bridged to the host with a link local address.
>>
>
> Sounds like we could create a VM base image with an extensible agent and
> VM-Midonet port that 3rd parties could build on.
>
>
>>
>> Thoughts?
>>
>> Cheers,
>> Dan
>>
>>
>> On Wed, Feb 13, 2013 at 2:41 AM, Guillermo Ontañón <guillermo at midokura.jp
>> > wrote:
>>
>>> Hello devs,
>>>
>>> I'm including below my initial discussion about adding IPsec
>>> support for midonet. It's still a bit rough but enough to
>>> spark discussion. Some details (implementation related mostly)
>>> are still blurry, I'm going to discuss the BGP part with Abel
>>> tomorrow and create a test setup connecting to Amazon to get
>>> those details right.
>>>
>>> Feature description
>>> -------------------
>>>
>>> IPSec VPN setup compatible with AWS's VPC private gateway.
>>>
>>> The setup should include redundancy/failover.
>>>
>>> Amazon VPC connectivity (See [0])
>>> ---------------------------------
>>>
>>> Since we want fail-over, we need Amazon's 'dynamically-routed' flavour of
>>> IPsec. This is what a fully redundant setup looks like:
>>>
>>>                         ~~~~~~~~~~~~~~~~~~
>>>                         Amazon VPC Subnets
>>>                         ~~~~~~~~┬~~~~~~~~~
>>>                                 │
>>>                            ┌────┴────┐
>>>                            ╎  GW-1   ╎
>>>                            └─┬─────┬─┘
>>>        link-local-address-A  ╎     ╎ link-local-address-B
>>>              BGP-instance-A  ╎     ╎ BGP-instance-B
>>>                              ╎     ╎
>>>                          ┌───┘     └────┐
>>>                          ╎              ╎
>>>                     ┌────┴────┐    ┌────┴────┐
>>>                     ╎ IPsec-A ╎    ╎ IPsec-B ╎
>>>                     └────┬────┘    └────┬────┘
>>>         Public-Address-A ╎              ╎ Public-Address-B
>>>                          ╎              ╎
>>>                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>>                              Internet
>>>                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>>                          ╎              ╎
>>>         Public-Address-C ╎              ╎ Public-Address-D
>>>                     ┌────┴────┐    ┌────┴────┐
>>>                     ╎ IPsec-C ╎    ╎ IPsec-D ╎
>>>                     └────┬────┘    └────┬────┘
>>>                          ╎              ╎
>>>                          └───┐     ┌────┘
>>>                              ╎     ╎
>>>              BGP-instance-C  ╎     ╎ BGP-instance-D
>>>        link-local-address-C  ╎     ╎ link-local-address-D
>>>                            ┌─┴─────┴─┐
>>>                            ╎  GW-2   ╎
>>>                            └────┬────┘
>>>                                 ╎
>>>                         ~~~~~~~~┴~~~~~~~~~~
>>>                         Other cloud subnets
>>>                         ~~~~~~~~~~~~~~~~~~~
>>>
>>> It works like this:
>>>
>>>   - Two IPsec connections are established: IPsec-A - IPsec-C and
>>>     IPsec-B - IPsec-D.
>>>   - For each IPsec connection pair (A-C, B-D) a /30 network of link-local
>>>     addresses is reserved. These addresses (LLA-A,B,C and D) are locally
>>>     bound by each gateway behind the IPsec tunnels.
>>>   - At this point, with the tunnels established, GW-1's traffic
>>>     addressed to LLA-C will always take source address LLA-A and go
>>>     through the A-C tunnel. Both gateways can now run two BGP instances
>>>     each on the link-local-addresses, and packets between the subnets
>>>     behind each gateway can start to flow through either IPsec tunnel.
>>>
>>> (implementation note below - skip to next section if not interested)
>>>
>>> Note that on this setup IPsec instances need to accept and encrypt
>>> traffic from/to any network on either side, and they do not know that
>>> when they establish the tunnel. Because the Linux kernel ipsec
>>> implementation is policy based (you need Security Associations between
>>> each pair of subnets that may communicate through the tunnel) this poses
>>> a problem. It may be solved in two different ways (needs testing):
>>>
>>>   - Statically add SA's localnet-0.0.0.0/0. Meaning that anytime the
>>>     IPsec instance sees a packet coming from one of the known local
>>>     networks the tuple of that network with 0.0.0.0/0 will match and let
>>>     the traffic through. This may be just fine, as we will be fully in
>>>     control of what traffic makes it to the IPsec instance.
>>>   - Have the BGP instance notify the IPsec instance to add the SA's
>>>     dynamically. This is obviously more complicated, especially because
>>>     we are not talking about MM state but linux kernel state inside of a
>>>     VM.
>>>
>>> Digression: about the CloudStack implementation
>>> -----------------------------------------------
>>>
>>> See [1] for reference. CloudStack supports two types of connections:
>>> L2TP and site-to-site. L2TP is meant for single users, not gateways, and
>>> it tunnels L2TP packets to make the user's machine effectively part of the
>>> chosen VPC L2 network segment. This mode does not concern us for now.
>>> Site-to-site VPNs in CloudStack appear to be old-fashioned policy based
>>> IPsec tunnels. The full list of remote subnets is needed to configure
>>> the connection and there's no mention of BGP.
>>>
>>> However, CloudStack's site-to-site tunnels should still be compatible
>>> with Amazon, ...without failover. Amazon added support for statically-
>>> routed VPNs in September 2012 [2]. Seeing that Amazon extended their
>>> offering with this feature and that it's the only mode that CloudStack
>>> supports, it's probably worth considering supporting this mode too in
>>> Midonet.
>>>
>>> Proposal for adding dynamically-routed IPsec to Midonet
>>> -------------------------------------------------------
>>>
>>>  - Configuration parameters. (REST API operations to be derived from
>>>    these). To set up a redundant IPsec VPN you need to provide:
>>>
>>>     - Two public IPs
>>>     - Two addresses in different /30 link-local subnets, for BGP.
>>>     - A virtual router. BGP will modify this router's routing table
>>>       and both IPsec gateways will be connected directly to this router.
>>>       User-visible parameters related to this connection to the virtual
>>>       router should be kept to a minimum. It's not yet clear how the
>>>       ipsec instances and bgp will be plugged to the vrouter.
>>>     - Public IPs for the gateways on the other side.
>>>     - Local network ranges to advertise to the other side via BGP.
>>>     - IKE pre-shared key.
>>>     - IPsec connection parameters: PFS mode, hash function, encryption
>>>       function. Amazon is very strict on this, requiring exactly:
>>>       Group-2, SHA-1, AES-128. IMHO that's a correct philosophy,
>>>       sticking to the common minimum. If midonet is as strict as Amazon
>>>       the user doesn't need to supply this.
>>>
>>>  - This is a sketch of what new REST API DTOs there would be:
>>>     - IpsecPairingDto. With fields:
>>>         - publicIp
>>>         - peerIp
>>>         - dhGroup (read-only if only 2 is allowed)
>>>         - hashAlg (read-only if only sha-1 is allowed)
>>>         - cryptoAlg (read-only if only aes128 is allowed)
>>>     - BgpIpsecPairingDto. Fields:
>>>         - localAsn
>>>         - remoteAsn
>>>         - localIp (must be link-local)
>>>         - peerIp (must be in the same /30 network as localIp, it could
>>>           be inferred from it)
>>>     - DynamicVpnDto
>>>         - virtualRouter
>>>         - ipsecPairingA
>>>         - ipsecPairingB
>>>         - bgpPairingA
>>>         - bgpPairingB
>>>         - localSubnets
>>>     - StaticVpnDto (when/if midonet supports this type of connection)
>>>         - ipsecPairing
>>>         - localSubnets
>>>         - remoteSubnets
>>>
>>>  - The virtual topology would look like this to the user:
>>>
>>>                                           ╎ Public IP (interior vport)
>>>
>>>                                     ┌─────┴────┐
>>>                             ╵       ╎ IPsec VM ╎ X 2
>>>                             ╵       ╎ instance ╎
>>>                             ╵       └─────┬────┘
>>>                        ┌────┴───┐         ╎
>>>    BGP injected ──────>╎ Tenant ╎         ╎
>>>        routes          ╎ Router ├─────────┘
>>>                        └──┬───┬─┘
>>>                    ┌──────┘   └─────┐
>>>              ┌─────┴────┐     ┌─────┴────┐
>>>              ╎ Bridge-A ╎ ... ╎ Bridge-N ╎
>>>              └──────────┘     └──────────┘
>>>
>>>
>>> Links
>>> -----
>>>
>>> [0]
>>> http://docs.aws.amazon.com/AmazonVPC/latest/NetworkAdminGuide/Introduction.html
>>> [1]
>>> http://incubator.apache.org/cloudstack/docs/en-US/Apache_CloudStack/4.0.0-incubating/html/Admin_Guide/vpn.html#site-to-site-vpn
>>> [2]
>>> http://aws.typepad.com/aws/2012/09/amazon-vpc-additional-vpn-features.html
>>>
>>>
>>>
>>> Regards,
>>>
>>> -- Guillermo Ontañón
>>>
>>
>>
>
>
> --
> -- Guillermo Ontañón
>



-- 
-- Guillermo Ontañón