[MidoNet-dev] MidoNet: L2 gateway (connecting physical L2 into the virtual topology)

Dan Mihai Dumitriu dan at midokura.com
Wed Feb 13 06:36:32 UTC 2013

On Tue, Feb 12, 2013 at 11:24 PM, Pino de Candia <gdecandia at midokura.com> wrote:

> On Tuesday, February 12, 2013 at 9:53 AM, Dan Mihai Dumitriu wrote:
> Make it so Number One.
> One? I think I'm employee number 4 or 5 - not counting the founders.


> Ryu, Tomoe, when you guys have some free time (after the Grizzly commit
> deadline) could you give an opinion about whether the bonding and vlan
> adapters might be good extensions to Quantum?
> Also, here are the MidoNet API changes I'm proposing to implement this
> model. For Bonding Adapters:
>    - PortBonds will be a top level URI (obtained from
>    ApplicationDto.getPortBonds - probably at path /port_bonds).
>    - You create a new PortBond by doing a POST of a PortBondDto to
>    ApplicationDto.getPortBonds. PortBondDto just has a name and an owner.
>    Later we may add fields like 'forwardingMode' (for choosing between
>    round-robin, xor and others) and 'enableLACP' for changing from static to
>    dynamic link aggregation.
>    - PortBondDto.upPort is the URI of the PortBond's interior uplink
>    port. This port is created automatically (unlinked) when the PortBond is
>    created. Although the URI is similar to other virtual port URIs (e.g. in
>    /ports), attempting to DELETE it will result in an HTTP error.
>    - If you do a GET on PortBondDto.upPort, the result is an
>    InteriorPortDto - which should now be made an explicit and instantiate-able
>    part of the port hierarchy.
>    - PortBondDto.downPorts is the URI for the bond's down-ports. GET on
>    this URI will return a list of ExteriorPortDto objects - which should now
>    be made an explicit and instantiate-able part of the port hierarchy. POST
>    on the 'downPorts' URI will create a new ExteriorPortDto. ExteriorPorts may
>    be bound to host-interfaces.
> For VLAN Adapters:
>    - VlanDemuxes will be a top level URI (obtained from
>    ApplicationDto.getVlanDemuxes - probably at path /vlan_demuxes).
>    - You create a new VlanDemux by doing a POST of a VlanDemuxDto to
>    ApplicationDto.getVlanDemuxes. VlanDemuxDto just has a name and an owner.
>    - VlanDemuxDto.downPort (or call it muxPort?) is the URI of the single
>    Interior OR Exterior virtual port that carries mixed VLAN traffic into/from
>    the adapter. You can POST an ExteriorPortDto or an InteriorPortDto to this
>    URI. However, if you want to change the downPort (e.g. to change from
>    interior to exterior), you must first call DELETE on the current
>    VlanDemuxDto.downPort, and then POST a new one.
>    - VlanDemuxDto.upPorts (or should we call it demuxPorts?) is the URI
>    for the vlan adapter's up-ports which carry traffic without VLAN tags. GET
>    on this URI will return a list of VlanPortDto objects. POSTing a
>    VlanPortDto to this URI adds an up-port to the adapter. VlanPortDto is a
>    subclass of InteriorPortDto with a new vlan-tag field that specifies the
>    VLAN tag that is stripped/added to packets that egress/ingress this port.
>    If 'vlan-tag' is set to NULL, the POST will fail (with an appropriate HTTP
>    error) if this adapter already has a VlanPortDto with a null vlan-tag.
>    - A VlanPortDto may not be linked to the VlanDemuxDto.downPort of any
>    vlan adapter.
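The up-port rule in the proposal can be illustrated with a minimal Python sketch. This is not MidoNet code: the classes only mirror the proposed DTO names, `post_up_port` is a hypothetical stand-in for a POST on the upPorts URI, and a rejected POST is modelled as a ValueError because the proposal does not name the HTTP error. Rejecting duplicate non-null tags is an assumption taken from the design notes quoted further down ("unique within the vlan adapter").

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class VlanPortDto:
    """Up-port of a VLAN adapter; vlan_tag=None models the NULL vlan-tag."""
    vlan_tag: Optional[int] = None


@dataclass
class VlanDemuxDto:
    """Hypothetical in-memory stand-in for the proposed VlanDemux resource."""
    name: str
    owner: str
    up_ports: List[VlanPortDto] = field(default_factory=list)

    def post_up_port(self, port: VlanPortDto) -> VlanPortDto:
        """Mimic POST on the upPorts URI; a rejected POST raises ValueError."""
        # At most one up-port may carry a given tag, including the NULL tag.
        if any(p.vlan_tag == port.vlan_tag for p in self.up_ports):
            raise ValueError("vlan-tag %r already in use" % (port.vlan_tag,))
        self.up_ports.append(port)
        return port
```

Posting ports with tags 100 and NULL succeeds; a second NULL-tag port is rejected.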
> As always, feedback is appreciated. Please let me know if I missed
> anything.
> thanks,
> Pino
> On Tue, Feb 12, 2013 at 4:54 PM, Pino de Candia <gdecandia at midokura.com> wrote:
>  I got a bounce-back from midonet-dev about my last message. Forwarding to
> dev at midokura.
> --
> Pino de Candia
> Software Engineer, Midokura.com
> On Monday, February 11, 2013 at 4:04 PM, Pino de Candia wrote:
>  Hi Folks,
> I chatted with Dan after he read my notes, and we think we have a strawman
> proposal for changes to the MidoNet virtual topology model that make sense
> for the L2 gateway. Dan, please fill in any details I missed, or correct
> anything I garbled.
> We'd like to introduce two new virtual devices - VLAN Adapter and Bonding
> Adapter. I've sketched them in SyncSpace and uploaded to Google Drive (and
> attached below):
> https://docs.google.com/a/midokura.jp/file/d/0B6AGTYMz0KDReUdEbzZoMmhHQXM/edit?usp=sharing
> A bonding adapter does for MidoNet what the Linux Bonding Driver does for
> Linux (see
> http://en.wikipedia.org/wiki/Link_Aggregation_Control_Protocol#Linux_bonding_driver
> ):
> It creates a single logical interface from two or more slaves. In the
> diagram, the vports on the right are exterior bond ports, and they need to
> be bound to physical interfaces on servers (not necessarily all on the same
> server). The diagram shows that the bonding adapter's left port is an
> interior port (only one of these is allowed). It can be 'linked' (similar
> to how we link interior router ports) to any one of: an interior router
> port, an interior bridge port, a VLAN adapter down-port (the port on the
> right-hand side of the vlan-adapter diagram). Packets ingressing the
> bonding adapter's single interior port will be forwarded out of one of the
> down-ports (exterior ports), depending on the mode of the bonding driver.
> We will implement a subset of Linux Bonding Driver's modes: only XOR to
> start. Packets ingressing any of the exterior ports will be forwarded out
> the up-port (interior port) respecting the usual link aggregation ordering
> (i.e. don't re-order packets ingressing on the same link, but you can use
> any order to deliver packets ingressing on different links).
> The bonding adapter initially will not implement LACP - we reserve that
> for future work. If one of the links fails, we can detect that only if the
> CARRIER state changes, which means that traffic will be lost if the
> exterior port is not directly connected to a switch (i.e. if there's some
> intermediate repeater that may still be up despite the switch's failure).
> Outgoing traffic (left-to-right in the diagram) will be load-balanced over
> the exterior ports that are considered UP.
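The XOR forwarding described above can be sketched in Python as follows. This is an illustration only, assuming a hash in the style of the Linux bonding driver's layer-2 XOR policy (source MAC XOR destination MAC, modulo the number of usable ports). The function names and the port names in the usage note are hypothetical, and the list of UP ports is assumed to come from CARRIER-state tracking.

```python
from typing import List, Optional


def mac_to_int(mac: str) -> int:
    """Convert a colon-separated MAC address to an integer."""
    return int(mac.replace(":", ""), 16)


def pick_down_port(src_mac: str, dst_mac: str,
                   up_ports: List[str]) -> Optional[str]:
    """Choose an exterior port for a frame, or None if no port is UP.

    up_ports holds only the exterior ports currently considered UP.
    """
    if not up_ports:
        return None
    index = (mac_to_int(src_mac) ^ mac_to_int(dst_mac)) % len(up_ports)
    return up_ports[index]
```

Because the choice depends only on the MAC pair, all frames of one conversation take the same link, so they are not re-ordered. When a port drops out of `up_ports`, its conversations are rehashed over the remaining ports.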
> -----
> The VLAN Adapter (see diagram) does for MidoNet what vconfig does for
> Linux (see http://linux.die.net/man/8/vconfig). Ports on the left (any
> number allowed) are interior virtual ports each one of which can be linked
> to any one router or bridge port. Each left-port (up-port) has a vlan-tag
> in its configuration. There is only one right-port (down-port) which may be
> an exterior port (bound to a host's network interface) or an interior port
> bound to an interior bonding adapter port (the bonding adapter's
> left/uplink port). Since the vlan adapter's right port is the equivalent of
> a trunked switch port (carries multiple vlan tags) we see no point in
> linking it to interior router or bridge ports (since those are
> vlan-agnostic).
> One of the vlan adapter's left ports is allowed to have a NULL vlan tag -
> meaning that it does not strip or add vlan tags to packets that go through
> it. Each of the other left ports must have a vlan tag that is unique within
> the vlan adapter.
> Traffic ingressing one of the left/uplink ports of the vlan adapter will
> have the port's configured vlan tag added to the ethernet header and will
> then be forwarded out the down/right port. Traffic ingressing the
> right/down port will be demultiplexed according to the vlan tag:
> - if a packet has no vlan tag, it should be forwarded without modification
> out the left-port that has the NULL vlan tag, or dropped if no such port is
> configured.
> - if a packet has vlan tag X, the tag should be stripped and it should
> egress the left-port that has the X vlan tag configured, or dropped if no
> such port is configured.
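The tagging and demultiplexing rules above can be sketched in Python. This is a toy model, not MidoNet code: a frame is reduced to a (vlan_tag, payload) pair, `VlanAdapter` and its method names are hypothetical, and a dropped packet is represented by returning None.

```python
from typing import Dict, Optional, Tuple

# A frame is modelled as (vlan_tag, payload); vlan_tag None means untagged.
Frame = Tuple[Optional[int], bytes]


class VlanAdapter:
    def __init__(self) -> None:
        # Maps a vlan tag (None for the NULL-tag port) to an up-port name.
        self.up_ports: Dict[Optional[int], str] = {}

    def add_up_port(self, name: str, vlan_tag: Optional[int]) -> None:
        if vlan_tag in self.up_ports:
            raise ValueError("vlan tag already in use on this adapter")
        self.up_ports[vlan_tag] = name

    def from_up_port(self, name: str, payload: bytes) -> Frame:
        """Up-port ingress: add the port's tag, then egress the down-port."""
        for tag, port in self.up_ports.items():
            if port == name:
                return (tag, payload)
        raise KeyError(name)

    def from_down_port(self, frame: Frame) -> Optional[Tuple[str, bytes]]:
        """Down-port ingress: strip the tag and pick the matching up-port.

        Returns (up_port_name, payload), or None when the packet is dropped
        because no up-port is configured for the tag.
        """
        tag, payload = frame
        port = self.up_ports.get(tag)
        if port is None:
            return None
        return (port, payload)
```

With up-ports P1 (tag 100) and P2 (tag 200), a frame tagged 100 arriving on the down-port egresses P1 untagged, while a frame tagged 300, or an untagged frame when no NULL-tag port exists, is dropped.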
> ----
> How does an admin connect a physical switch into MidoNet's virtual
> topology in order to connect a vbridge VB1 to vlan100 of the physical
> network?
> Without link resilience/redundancy:
>    - connect the physical switch port (trunked) to MidoNet server's eth5
>    and set eth5 to UP.
>    - create a Vlan Adapter (the right/down port should be automatically
>    created, perhaps it has the same UUID as the VLAN adapter). Set the
>    right/down port of the adapter to 'exterior' mode, then bind it to eth5 (in
>    the sense of vport-interface-binding).
>    - Create a left/up vport on the adapter and set its vlan tag to 100.
>    Call this port P1.
>    - Create an interior port on VB1 and link it to P1.
>    - Traffic starts flowing between VB1 and vlan100.
> Later if the admin discovers that some other virtual bridge VB2 needs to
> be connected to vlan200 of the same physical segment:
>    - Create another left/up vport on the adapter and set its vlan tag to
>    200. Call this port P2.
>    - Create an interior port on VB2 and link it to P2.
>    - Traffic starts flowing between VB2 and vlan200.
> What about avoiding loops? Each virtual bridge will only be allowed to
> link to one vlan adapter up-port.
> What about connecting virtual routers? Works exactly like for virtual
> bridges, but virtual routers are allowed to link to any number of vlan
> adapter up-ports.
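The linking rule for loop avoidance can be sketched as a small check in Python. This is a hypothetical helper, not part of MidoNet: `UpPortLinks` and its fields are invented for illustration, and a refused link is modelled as a ValueError.

```python
from typing import Dict, Set


class UpPortLinks:
    """Tracks which device each vlan adapter up-port is linked to."""

    def __init__(self) -> None:
        self.owner_of: Dict[str, str] = {}      # up-port name -> device name
        self.bridges_linked: Set[str] = set()   # bridges that already hold a link

    def link(self, device: str, is_bridge: bool, up_port: str) -> None:
        # Each up-port links to exactly one router or bridge port.
        if up_port in self.owner_of:
            raise ValueError("up-port %s is already linked" % up_port)
        # A bridge may link to only one up-port; routers have no such limit.
        if is_bridge and device in self.bridges_linked:
            raise ValueError("bridge %s already links to an up-port" % device)
        self.owner_of[up_port] = device
        if is_bridge:
            self.bridges_linked.add(device)
```

Linking VB1 to P1 succeeds and a second link from VB1 to P2 is refused, whereas a router may link to both P2 and P3.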
> Later, the admin decides that resilience would be nice.
>    - Create a Bonding Adapter. The up/left single interior port is
>    automatically created.
>    - Unbind the VLAN Adapter's down-port from eth5 and set it to
>    'interior' mode. Link it to the bonding adapter's left/up port.
>    - Create a right/down exterior port on the bonding adapter. Bind this
>    to eth5 (in the sense of vport-interface-binding).
>    - Connect a free port on the same physical switch to an interface on
>    another MidoNet server, let's call that eth6, but remember it's on a
>    DIFFERENT server than eth5.
>    - Create another right/down exterior port on the bonding adapter. Bind
>    this new vport to eth6.
>    - Configure the two physical switch ports to be in the same port
>    aggregation group (static link aggregation).
> The admin has eliminated the MidoNet server and the cable as SPOFs (single
> points of failure). The physical switch is still a SPOF. To fix that we
> need to:
>    1. implement LACP in the Bonding Adapter, and the admin needs to use
>    MLAG on his physical switches. We don't do MLAG, but our NIC slaves already
>    exist on different physical servers even though they're perceived to be on
>    the same logical L2 device by the LACP peer. This works for virtual routers
>    and virtual bridges.
>    2. OR allow linking a virtual bridge to more than one VLAN adapter,
>    and implement STP in the virtual bridge. But this only provides resilience
>    for the connections between virtual bridges and physical L2s.
> thanks,
> Pino
> On Monday, February 11, 2013 at 12:11 PM, Pino de Candia wrote:
>  Hi Folks,
> over the weekend I did some reading on link aggregation (see wikipedia and
> http://standards.ieee.org/getieee802/download/802.1AX-2008.pdf) and this
> morning I had a chat with Abel so I wanted to put some thoughts in writing.
> Use case: connect MidoNet virtual bridges and routers to physical L2
> segments. Let's focus on just connecting to L2 segments in the cloud's data
> center as opposed to the tenant's network, because today we don't have a
> VPN solution. Let's also leave aside the discussion of connecting physical
> L2 segments to MidoNet virtual routers; this presents similar issues, with
> the exception of bridging loops/STP.
> Note that today a cloud administrator can already connect a physical
> segment to the virtual topology - but it requires a lot of thinking and
> manual configuration. Before jumping into feature specification in MidoNet,
> let's talk about what a cloud admin would do as things stand today:
>    - Because of some higher level requirements, the cloud admin decides
>    he wants to connect a MidoNet virtual bridge VB1 to a physical segment that
>    e.g. has some legacy databases - let's call this physical segment L2-DB,
>    and the databases are on vlan 100 (assume switches in L2-DB are all
>    symmetric, all carrying the same set of VLANs).
>    - Find a physical switch in L2-DB that has at least one free port and
>    is close enough to run a line into a physical server with a free port
>    that's already running MidoNet or where we can install MidoNet. Task done:
>    cabling and MidoNet agent ready (running, in a tunnel-zone, tunnel
>    interface ready). The server port is eth5 and the admin manually sets it to
>    UP.
>    - How should the switch port be configured? We decide it's not going
>    to be trunked, it's going to be dedicated to VLAN100 which is the vlan the
>    databases are in. Remember, packets arriving at the server already have
>    their vlan tag stripped.
>    - Go to MidoNet's GUI and add a vport on VB1. Bind this new vport to
>    eth5 on the server.
>    - A few seconds later the MN agent on the server learns about the
>    binding and does the setup to hook eth5 into the virtual topology. Packets
>    start flowing between VMs on VB1 and the databases. Woot!
> So far, so good. Now a few things can happen:
>    1. The admin wakes up in the middle of the night thinking "Oh, boy,
>    what about some redundancy/resilience of the connection between VB1 and
>    L2-DB?"
>    2. The admin is asked to connect some other MidoNet virtual bridge,
>    VB2, to the same vlan, vlan100 in L2-DB.
>    3. The admin is asked to connect some other MidoNet virtual bridge,
>    VB2, to a different vlan, vlan200, in L2-DB.
>    4. The admin is asked to connect VB1 to another VLAN on the same
>    physical segment.
>    5. The admin is asked to connect VB1 to a VLAN on a different physical
>    segment.
> Scenarios 4 and 5 are not realistic today because MN virtual bridges don't
> handle VLAN tags. You can send VLAN-tagged packets into the virtual bridge
> (we used to have code to drop these packets, but I don't remember if that
> made it into Caddo), but MAC-learning is not done per VLAN-tag.
> Alternatively, you can send VLAN-stripped packets from different VLANs into
> the virtual bridge, and hope that the packets don't interfere (e.g. the two
> vlans aren't using the same L3 address range). Basically, if you want a VM
> to be on multiple vlans, today you have to give the VM multiple vnics on
> different vbridges. I don't know whether this makes a case for VLAN support
> in our virtual bridge.
> Scenario 2 doesn't make sense in MidoNet today. We don't allow connecting
> two MidoNet virtual bridges; the reasoning is that you can make your
> vbridges as large as you want (as many ports as you want) so no need to
> connect vbridges. Scenario 2 would essentially connect two MN virtual
> bridges via vlan100 (and risk loops). So the admin's reply to scenario 2
> is: "any device or VM in VB2 that needs to be in VLAN100 should be given an
> interface connected to a new vport in VB1. VB1 is already in vlan100".
> Scenario 3: that's reasonable. How to do it? Should we repeat the process
> we followed for vlan100 for vlan200: choosing a server and physical switch
> to connect, do the cabling, configuring the switch port and server port?
> That's annoying. When doing the setup for vlan100, I should have put the
> physical switch port in trunk mode, then I wouldn't need any new cabling
> today, and I wouldn't need to go down to the data center. Ok, this time
> I'll do it right:
>    - go to the data-center, put the physical port in trunk mode and go
>    back to the office.
>    - Log into the server, put eth5 in trunk mode (I don't know whether
>    this happens automatically or not) and then make a virtual interface off of
>    eth5 for vlan100 named eth5.100 (eth5.100 strips/adds the vlan tag on
>    packets ingressing/egressing).
>    - Destroy the current vport binding on eth5 and replace it by binding
>    the same vport to eth5.100.
>    - Now create sub-interface eth5.200, create a new vport on VB2 and
>    bind the vport to eth5.200.
>    - A few minutes later, packets are flowing between VB1 and the
>    databases, and between VB2 and whatever's in vlan200. Woot! What's more,
>    the admin is pleased that he can easily bridge any other vlan carried by
>    L2-DB into the virtual topology.
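The sub-interface steps above can be sketched as a small Python helper that only builds the commands and does not run them. It is a sketch under assumptions: the links in this thread describe vconfig, while the helper uses the iproute2 `ip link` syntax instead, and the function name is hypothetical.

```python
from typing import List


def vlan_subinterface_commands(parent: str, vlan_id: int) -> List[List[str]]:
    """Build iproute2 commands that create and bring up <parent>.<vlan_id>."""
    if not 1 <= vlan_id <= 4094:
        raise ValueError("VLAN id must be in 1..4094")
    name = "%s.%d" % (parent, vlan_id)
    return [
        # Create the tagged sub-interface on top of the trunked parent.
        ["ip", "link", "add", "link", parent, "name", name,
         "type", "vlan", "id", str(vlan_id)],
        # Bring the new sub-interface up.
        ["ip", "link", "set", "dev", name, "up"],
    ]
```

For eth5 and vlan 100 this yields the commands for eth5.100. Each list could be passed to subprocess.run on the server with root privileges, after which the vport would be bound to eth5.100 as in the steps above.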
> For setting up vlan tagging/stripping in Linux see any of:
>    - https://wiki.archlinux.org/index.php/VLAN
>    - http://linux.die.net/man/8/vconfig
>    - http://unixfoo.blogspot.com.es/2007/12/linux-vlan-configuration.html
> What about Scenario 1? We're becoming more and more reliant on that single
> link. We have 3 SPOFs: the physical switch, the cable, and the physical
> server.
> A. First, let's eliminate the cable SPOF. Today, the admin can manually
> set up link aggregation between the physical switch and the server. We'll
> do our best: configure the switch for LACP, verify that the appropriate
> kmod is loaded on the physical server (running Linux), configure LACP on
> the server, run another cable between the physical switch and the server.
> eth5 on the server now has to be a logical interface that is sitting on top
> of eth6 and eth7 which are the aggregated server ports connected to the
> switch.
> B. What about eliminating the server SPOF by running a cable from the
> switch to another physical server? We can't put another vport on VB1
> connected to vlan100. Why? We would be creating an L2 loop. Our virtual
> bridge doesn't implement STP.
> C. What about eliminating the physical switch SPOF? Assuming a few of the
> switches in L2-DB are by the same vendor, they might support multi-chassis
> link-aggregation (MLAG). In that case, run a line from another physical
> switch to the single physical server. On the switch side, configure MLAG,
> on the server-side, configure normal LACP (I'm guessing this should work,
> the server side shouldn't care about the other side, because the switches
> will advertise themselves as a single system - note that this is really
> only a guess). Since this eliminates the cable SPOF, don't bother running
> two lines from the first switch to the server (as we did in A).
> -----------
> Epilogue: the admin is really worried about the server SPOF AND bridging
> loops. MidoNet delivers features to deal with them: link aggregation and
> STP.
> Bridging loops: we should support STP. What I don't know yet is:
> - Exactly what flavor of STP?
> - Is there a suitable flavor of per-VLAN STP that can work with our
> VLAN-agnostic bridges? Does per-vlan STP automatically imply the virtual
> bridges need to be vlan-aware?
> - What about Shortest path bridging (802.1aq)? This is very new - I assume
> the cloud will have devices that don't support this - so I would punt.
> For now, I'm going to assume we can keep our virtual bridges
> VLAN-agnostic.
> What about link aggregation? I'm going to assume that these aggregated
> links are trunked - they're going to carry multiple VLANs. Also, I'm going
> to assume that there's no Linux implementation of MLAG (to aggregate links
> across different Linux servers) - which would allow us to do most of the
> work in Linux vs. writing MidoNet code. Therefore, we're going to have to
> implement LACP inside MidoNet. That will be equivalent to a proprietary
> MLAG because our virtual bridges are distributed across different physical
> servers, and that will eliminate the MidoNet server SPOF in the admin's
> original configuration.
> What does doing LACP in MidoNet imply? Well, LACP has to run at a layer
> below VLANs and STP (this is just my intuition, needs verification). Since
> we're implementing LACP in MidoNet to work across servers, we already knew
> that we couldn't use the Linux bonding driver to handle LACP. But now I
> also suspect that we won't be able to strip the vlan tags in Linux -
> because I think that needs to happen after the LACP negotiation (above it
> in the protocol stack). The same goes for the BPDUs (Bridge Protocol Data
> Units) used in STP. This is all very vague, and we have to figure out the
> exact interaction with LACP, but my intuition is that LACP has to happen
> lower in the protocol stack, so doing it in MidoNet means we also have to
> push VLAN handling into MidoNet.
> Who should handle VLANs and LACP in the MidoNet virtual device model? We
> have a few choices:
>    - Make the virtual bridges aware of VLAN and LACP.
>    - Create a new concept - a meta-bridge. A meta-bridge (needs better
>    name) is the equivalent of a physical switch. It may contain multiple virtual
>    bridges in the same way that a physical bridge contains (or supports)
>    multiple vlans.
> Why do we need another layer of bridge software? Because multiple virtual
> bridges may be using the same aggregated links - so no single virtual
> bridge can be responsible for the LACP negotiation on behalf of the others
> (which would be a weird model). And we really do want to share/re-use those
> aggregated links between multiple vbridges/vlans.
> -----------
> Finally, and for completeness, note that we haven't
> ------------
> As always, feedback is appreciated. I tried not to go too deep into
> implementation - just enough to understand how much work certain features
> imply. Let's try to keep the focus of this thread on defining the feature,
> not the implementation.
> thanks,
> Pino
> Attachments:
>  - Adapters.png