[MidoNet-dev] MidoNet: L2 gateway (connecting physical L2 into the virtual topology)

Pino de Candia gdecandia at midokura.com
Tue Feb 12 07:54:58 UTC 2013


I got a bounce-back from midonet-dev about my last message. Forwarding to dev at midokura. 

-- 
Pino de Candia
Software Engineer, Midokura.com


On Monday, February 11, 2013 at 4:04 PM, Pino de Candia wrote:

> Hi Folks,
> 
> I chatted with Dan after he read my notes, and we think we have a strawman proposal for changes to the MidoNet virtual topology model that make sense for the L2 gateway. Dan, please fill in any details I missed, or correct anything I garbled.
> 
> We'd like to introduce two new virtual devices - VLAN Adapter and Bonding Adapter. I've sketched them in SyncSpace and uploaded to Google Drive (and attached below):
> https://docs.google.com/a/midokura.jp/file/d/0B6AGTYMz0KDReUdEbzZoMmhHQXM/edit?usp=sharing
> 
> A bonding adapter does for MidoNet what the Linux Bonding Driver does for Linux (see http://en.wikipedia.org/wiki/Link_Aggregation_Control_Protocol#Linux_bonding_driver):
> It creates a single logical interface from two or more slaves. In the diagram, the vports on the right are exterior bond ports, and they need to be bound to physical interfaces on servers (not necessarily all on the same server). The diagram shows that the bonding adapter's left port is an interior port (only one of these is allowed). It can be 'linked' (similar to how we link interior router ports) to any one of: an interior router port, an interior bridge port, a VLAN adapter down-port (the port on the right-hand side of the vlan-adapter diagram). Packets ingressing the bonding adapter's single interior port will be forwarded out of one of the down-ports (exterior ports), depending on the mode of the bonding driver. We will implement a subset of Linux Bonding Driver's modes: only XOR to start. Packets ingressing any of the exterior ports will be forwarded out the up-port (interior port) respecting the usual link aggregation ordering (i.e. don't re-order packets ingressing on the same link, but you can use any order to deliver packets ingressing on different links).
> 
> The bonding adapter initially will not implement LACP - we reserve that for future work. Without LACP, the only way to detect that one of the links has failed is a change in the interface's CARRIER state. That means traffic will be lost if the exterior port is not directly connected to the switch (i.e. if there's some intermediate repeater that may still be up despite the switch's failure). Outgoing traffic (left-to-right in the diagram) will be load-balanced over the exterior ports that are considered UP.
> 
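To make the XOR mode and the "only ports that are UP" rule concrete, here is a minimal Python sketch. The hash used (source MAC XOR destination MAC, modulo the number of live ports) is an assumption modelled on the simplest MAC-based policy of the Linux bonding driver; `ExteriorPort`, `mac_to_int` and `select_exterior_port` are hypothetical names, not MidoNet code.

```python
from dataclasses import dataclass


def mac_to_int(mac: str) -> int:
    """Convert 'aa:bb:cc:dd:ee:ff' into an integer."""
    return int(mac.replace(":", ""), 16)


@dataclass
class ExteriorPort:
    name: str                 # hypothetical label, e.g. "host1:eth5"
    carrier_up: bool = True   # mirrors the CARRIER state of the bound interface


def select_exterior_port(ports, src_mac, dst_mac):
    """Pick the egress exterior port for a frame that entered the interior port.

    Hashes (src MAC XOR dst MAC) modulo the number of ports whose carrier is
    up, so a given MAC pair keeps using the same port while the set of live
    ports is unchanged. Returns None when no port is up (the frame is dropped).
    """
    live = [p for p in ports if p.carrier_up]
    if not live:
        return None
    index = (mac_to_int(src_mac) ^ mac_to_int(dst_mac)) % len(live)
    return live[index]
```

Because the hash depends only on the MAC pair, frames of one flow are not re-ordered across links. When a port loses carrier, the flows are simply re-hashed over the remaining ports.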
> -----
> The VLAN Adapter (see diagram) does for MidoNet what vconfig does for Linux (see http://linux.die.net/man/8/vconfig). Ports on the left (any number allowed) are interior virtual ports, each of which can be linked to any one router or bridge port. Each left-port (up-port) has a vlan-tag in its configuration. There is only one right-port (down-port), which may be an exterior port (bound to a host's network interface) or an interior port linked to a bonding adapter's interior port (its left/uplink port). Since the vlan adapter's right port is the equivalent of a trunked switch port (carries multiple vlan tags), we see no point in linking it to interior router or bridge ports (since those are vlan-agnostic).
> 
> One of the vlan adapter's left ports is allowed to have a NULL vlan tag - meaning that it does not strip or add vlan tags to packets that go through it. Each of the other left ports must have a vlan tag that is unique within the vlan adapter.
> 
> Traffic ingressing one of the left/uplink ports of the vlan adapter will have the port's configured vlan tag added to the ethernet header (unless the port has the NULL tag) and will then be forwarded out the down/right port. Traffic ingressing the right/down port will be demultiplexed according to the vlan tag:
> - if a packet has no vlan tag, it should be forwarded without modification out the left-port that has the NULL vlan tag, or dropped if no such port is configured.
> - if a packet has vlan tag X, the tag should be stripped and it should egress the left-port that has the X vlan tag configured, or dropped if no such port is configured.
> 
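The tagging and demultiplexing rules above can be sketched in a few lines of Python. This is only an illustration under assumptions: `Frame` and `VlanAdapter` are hypothetical names, a frame is reduced to a payload plus an optional tag, and `None` stands in for the NULL vlan tag.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Frame:
    payload: str
    vlan: Optional[int] = None   # None means the frame carries no vlan tag


class VlanAdapter:
    def __init__(self):
        self._up_ports = {}      # vlan tag (None = NULL tag) -> up-port name

    def add_up_port(self, name, vlan):
        """Create an up-port; each tag (including NULL) may be used only once."""
        if vlan in self._up_ports:
            raise ValueError(f"vlan tag {vlan!r} is already used on this adapter")
        self._up_ports[vlan] = name

    def from_up_port(self, name, frame):
        """Frame entering an up-port: return the frame that leaves the down-port."""
        for vlan, port in self._up_ports.items():
            if port == name:
                if vlan is None:
                    return frame                  # NULL port: no tag is added
                return Frame(frame.payload, vlan)  # add the port's tag
        raise KeyError(name)

    def from_down_port(self, frame):
        """Frame entering the down-port: return (up-port, frame), or None to drop."""
        port = self._up_ports.get(frame.vlan)
        if port is None:
            return None                            # no up-port for this tag
        return port, Frame(frame.payload, None)    # tag stripped (no-op if untagged)
```

For example, with up-ports P1 (tag 100) and P0 (NULL tag), a frame tagged 100 arriving on the down-port leaves P1 untagged, an untagged frame leaves P0 unchanged, and a frame tagged 200 is dropped.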
> ----
> 
> How does an admin connect a physical switch into MidoNet's virtual topology in order to connect a vbridge VB1 to vlan100 of the physical network?
> 
> Without link resilience/redundancy:
> 1. Connect the physical switch port (trunked) to MidoNet server's eth5 and set eth5 to UP.
> 2. Create a Vlan Adapter (the right/down port should be automatically created, perhaps it has the same UUID as the VLAN adapter). Set the right/down port of the adapter to 'exterior' mode, then bind it to eth5 (in the sense of vport-interface-binding).
> 3. Create a left/up vport on the adapter and set its vlan tag to 100. Call this port P1.
> 4. Create an interior port on VB1 and link it to P1.
> 5. Traffic starts flowing between VB1 and vlan100.
> 
> Later if the admin discovers that some other virtual bridge VB2 needs to be connected to vlan200 of the same physical segment:
> 
> 1. Create another left/up vport on the adapter and set its vlan tag to 200. Call this port P2.
> 2. Create an interior port on VB2 and link it to P2.
> 3. Traffic starts flowing between VB2 and vlan200.
> 
> What about avoiding loops? Each virtual bridge will only be allowed to link to one vlan adapter up-port.
> 
> 
> What about connecting virtual routers? Works exactly like for virtual bridges, but virtual routers are allowed to link to any number of vlan adapter up-ports.
> 
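The two linking rules (a bridge may link to only one vlan adapter up-port, a router to any number) could be enforced by a check like the one below. This is a sketch under assumptions: `Topology`, `LinkError` and the string device kinds are hypothetical and do not reflect an existing MidoNet API.

```python
class LinkError(Exception):
    """Raised when a requested link would violate a topology rule."""


class Topology:
    def __init__(self):
        # (adapter name, vlan tag) -> (device kind, device name)
        self._links = {}

    def link(self, kind, device, adapter, vlan):
        """Link a bridge or router to the up-port of `adapter` tagged `vlan`."""
        if kind not in ("bridge", "router"):
            raise LinkError(f"unknown device kind: {kind}")
        if (adapter, vlan) in self._links:
            raise LinkError(f"{adapter} up-port for vlan {vlan} is already linked")
        # Loop avoidance: a bridge may use at most one vlan adapter up-port.
        if kind == "bridge" and ("bridge", device) in self._links.values():
            raise LinkError(f"bridge {device} is already linked to an up-port")
        self._links[(adapter, vlan)] = (kind, device)
```

With this check, linking VB1 to vlan 100 succeeds, a second link from VB1 (to any adapter) is rejected, and a router can be linked to several up-ports.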
> Later, the admin decides that resilience would be nice.
> 1. Create a Bonding Adapter. The up/left single interior port is automatically created.
> 2. Unbind the VLAN Adapter's down-port from eth5 and set it to 'interior' mode. Link it to the bonding adapter's left/up port.
> 3. Create a right/down exterior port on the bonding adapter. Bind this to eth5 (in the sense of vport-interface-binding).
> 4. Connect a free port on the same physical switch to an interface on another MidoNet server, let's call that eth6, but remember it's on a DIFFERENT server than eth5.
> 5. Create another right/down exterior port on the bonding adapter. Bind this new vport to eth6.
> 6. Configure the two physical switch ports to be in the same port aggregation group (static link aggregation).
> 
> The admin has eliminated the MidoNet server and the cable as SPOFs (single points of failure). The physical switch is still a SPOF. To fix that we need to:
> - implement LACP in the Bonding Adapter, and the admin needs to use MLAG on his physical switches. We don't do MLAG, but our NIC slaves already exist on different physical servers even though they're perceived to be on the same logical L2 device by the LACP peer. This works for virtual routers and virtual bridges.
> - OR allow linking a virtual bridge to more than one VLAN adapter, and implement STP in the virtual bridge. But this only provides resilience for the connections between virtual bridges and physical L2s.
> 
> thanks,
> 
> 
> Pino
> 
> 
> On Monday, February 11, 2013 at 12:11 PM, Pino de Candia wrote:
> 
> > Hi Folks, 
> > 
> > over the weekend I did some reading on link aggregation (see wikipedia and http://standards.ieee.org/getieee802/download/802.1AX-2008.pdf) and this morning I had a chat with Abel so I wanted to put some thoughts in writing.
> > 
> > Use case: connect MidoNet virtual bridges and routers to physical L2 segments. Let's focus on just connecting to L2 segments in the cloud's data center as opposed to the tenant's network, because today we don't have a VPN solution. Let's also leave aside the discussion of connecting physical L2 segments to MidoNet virtual routers, this presents similar issues, with the exception of bridging loops/STP.
> > 
> > Note that today a cloud administrator can already connect a physical segment to the virtual topology - but it requires a lot of thinking and manual configuration. Before jumping into feature specification in MidoNet, let's talk about what a cloud admin would do as things stand today:
> > 1. Because of some higher level requirements, the cloud admin decides he wants to connect a MidoNet virtual bridge VB1 to a physical segment that e.g. has some legacy databases - let's call this physical segment L2-DB, and the databases are on vlan 100 (assume switches in L2-DB are all symmetric, all carrying the same set of VLANs).
> > 2. Find a physical switch in L2-DB that has at least one free port and is close enough to run a line into a physical server with a free port that's already running MidoNet or where we can install MidoNet. Task done: cabling and MidoNet agent ready (running, in a tunnel-zone, tunnel interface ready). The server port is eth5 and the admin manually sets it to UP.
> > 3. How should the switch port be configured? We decide it's not going to be trunked, it's going to be dedicated to VLAN100 which is the vlan the databases are in. Remember, packets arriving at the server already have their vlan tag stripped.
> > 4. Go to MidoNet's GUI and add a vport on VB1. Bind this new vport to eth5 on the server.
> > 5. A few seconds later the MN agent on the server learns about the binding and does the setup to hook eth5 into the virtual topology. Packets start flowing between VMs on VB1 and the databases. Woot!
> > 
> > So far, so good. Now a few things can happen:
> > 
> > 1. The admin wakes up in the middle of the night thinking "Oh, boy, what about some redundancy/resilience of the connection between VB1 and L2-DB?"
> > 2. The admin is asked to connect some other MidoNet virtual bridge, VB2, to the same vlan, vlan100 in L2-DB.
> > 3. The admin is asked to connect some other MidoNet virtual bridge, VB2, to a different vlan, vlan200, in L2-DB.
> > 4. The admin is asked to connect VB1 to another VLAN on the same physical segment.
> > 5. The admin is asked to connect VB1 to a VLAN on a different physical segment.
> > 
> > Scenarios 4 and 5 are not realistic today because MN virtual bridges don't handle VLAN tags. You can send VLAN-tagged packets into the virtual bridge (we used to have code to drop these packets, but I don't remember if that made it into Caddo), but MAC-learning is not done per VLAN-tag. Alternatively, you can send VLAN-stripped packets from different VLANs into the virtual bridge, and hope that the packets don't interfere (e.g. the two vlans aren't using the same L3 address range). Basically, if you want a VM to be on multiple vlans, today you have to give the VM multiple vnics on different vbridges. I don't know whether this makes a case for VLAN support in our virtual bridge.
> > 
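To illustrate why VLAN-agnostic MAC-learning is a problem when traffic from two VLANs enters one bridge, here is a small Python sketch. It assumes, for illustration only, that the same MAC address shows up in both VLANs behind different ports; `MacTable` is a hypothetical name and not the MidoNet bridge code.

```python
class MacTable:
    """Toy MAC-learning table, keyed by MAC or by (vlan, MAC)."""

    def __init__(self, per_vlan):
        self.per_vlan = per_vlan
        self._table = {}

    def _key(self, vlan, mac):
        # A VLAN-agnostic bridge ignores the tag when learning.
        return (vlan, mac) if self.per_vlan else mac

    def learn(self, vlan, mac, port):
        self._table[self._key(vlan, mac)] = port

    def lookup(self, vlan, mac):
        return self._table.get(self._key(vlan, mac))


agnostic = MacTable(per_vlan=False)
agnostic.learn(100, "aa:aa:aa:aa:aa:01", port=1)
agnostic.learn(200, "aa:aa:aa:aa:aa:01", port=2)   # overwrites the vlan 100 entry

aware = MacTable(per_vlan=True)
aware.learn(100, "aa:aa:aa:aa:aa:01", port=1)
aware.learn(200, "aa:aa:aa:aa:aa:01", port=2)      # kept as a separate entry
```

In the agnostic table a lookup for vlan 100 now returns port 2, so vlan 100 traffic for that MAC is sent to the wrong port. The per-vlan table still returns port 1.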
> > 
> > Scenario 2 doesn't make sense in MidoNet today. We don't allow connecting two MidoNet virtual bridges; the reasoning is that you can make your vbridges as large as you want (as many ports as you want), so there is no need to connect vbridges. Scenario 2 would essentially connect two MN virtual bridges via vlan100 (and risk loops). So the admin's reply to scenario 2 is: "any device or VM in VB2 that needs to be in VLAN100 should be given an interface connected to a new vport in VB1. VB1 is already in vlan100".
> > 
> > Scenario 3: that's reasonable. How to do it? Should we repeat the process we followed for vlan100 for vlan200: choosing a server and physical switch to connect, do the cabling, configuring the switch port and server port? That's annoying. When doing the setup for vlan100, I should have put the physical switch port in trunk mode, then I wouldn't need any new cabling today, and I wouldn't need to go down to the data center. Ok, this time I'll do it right:
> > 1. Go to the data-center, put the physical port in trunk mode and go back to the office.
> > 2. Log into the server, put eth5 in trunk mode (I don't know whether this happens automatically or not) and then make a virtual interface off of eth5 for vlan100 named eth5.100 (eth5.100 strips/adds the vlan tag on packets ingressing/egressing).
> > 3. Destroy the current vport binding on eth5 and replace it by binding the same vport to eth5.100.
> > 4. Now create sub-interface eth5.200, create a new vport on VB2 and bind the vport to eth5.200.
> > 5. A few minutes later, packets are flowing between VB1 and the databases, and between VB2 and whatever's in vlan200. Woot! What's more, the admin is pleased that he can easily bridge any other vlan carried by L2-DB into the virtual topology.
> > 
> > For setting up vlan tagging/stripping in Linux see any of:
> > https://wiki.archlinux.org/index.php/VLAN
> > http://linux.die.net/man/8/vconfig
> > http://unixfoo.blogspot.com.es/2007/12/linux-vlan-configuration.html
> > 
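As a rough illustration of the sub-interface step, the Python helper below only builds the commands for creating eth5.100 and eth5.200; it does not run them. It uses the iproute2 `ip link ... type vlan` form instead of the vconfig tool referenced above, and assumes iproute2 and the 8021q module are available on the server. The function name is hypothetical.

```python
def vlan_subinterface_commands(parent, vlan_id):
    """Return the iproute2 commands that create and bring up parent.vlan_id."""
    if not 1 <= vlan_id <= 4094:
        raise ValueError(f"invalid vlan id: {vlan_id}")
    name = f"{parent}.{vlan_id}"
    return [
        # Create the tagged sub-interface on top of the trunked parent.
        ["ip", "link", "add", "link", parent, "name", name,
         "type", "vlan", "id", str(vlan_id)],
        # Bring it up so a vport can be bound to it.
        ["ip", "link", "set", "dev", name, "up"],
    ]


for vlan in (100, 200):
    for command in vlan_subinterface_commands("eth5", vlan):
        print(" ".join(command))
```

Each command list could be handed to `subprocess.run` by a privileged process; binding the vports to eth5.100 and eth5.200 is still done in MidoNet as described in the steps above.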
> > 
> > 
> > What about Scenario 1? We're becoming more and more reliant on that single link. We have 3 SPOFs: the physical switch, the cable, and the physical server.
> > 
> > A. First, let's eliminate the cable SPOF. Today, the admin can manually set up link aggregation between the physical switch and the server. We'll do our best: configure the switch for LACP, verify that the appropriate kmod is loaded on the physical server (running Linux), configure LACP on the server, run another cable between the physical switch and the server. eth5 on the server now has to be a logical interface that is sitting on top of eth6 and eth7 which are the aggregated server ports connected to the switch.
> > 
> > B. What about eliminating the server SPOF by running a cable from the switch to another physical server? We can't put another vport on VB1 connected to vlan100. Why? We would be creating an L2 loop. Our virtual bridge doesn't implement STP.
> > 
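The loop in B can be checked mechanically by treating bridges and vlans as nodes of a graph: a new connection creates a loop exactly when its two ends are already connected. The sketch below uses a small union-find for this; the function and node names are hypothetical, and it assumes the existing links are themselves loop-free.

```python
def creates_loop(links, new_link):
    """Return True if adding new_link to the (loop-free) links closes a cycle."""
    parent = {}

    def find(node):
        # Find the representative of node's connected component.
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]   # path halving
            node = parent[node]
        return node

    for a, b in links:
        parent[find(a)] = find(b)                 # merge the two components

    a, b = new_link
    return find(a) == find(b)


existing = [("VB1", "vlan100")]
print(creates_loop(existing, ("VB1", "vlan100")))   # second vport via another server
print(creates_loop(existing, ("VB2", "vlan200")))   # unrelated bridge and vlan
```

The first call reports a loop because VB1 and vlan100 are already connected through the first server. The second does not.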
> > C. What about eliminating the physical switch SPOF? Assuming a few of the switches in L2-DB are by the same vendor, they might support multi-chassis link-aggregation (MLAG). In that case, run a line from another physical switch to the single physical server. On the switch side, configure MLAG, on the server-side, configure normal LACP (I'm guessing this should work, the server side shouldn't care about the other side, because the switches will advertise themselves as a single system - note that this is really only a guess). Since this eliminates the cable SPOF, don't bother running two lines from the first switch to the server (as we did in A).
> > 
> > -----------
> > Epilogue: the admin is really worried about the server SPOF AND bridging loops. MidoNet delivers features to deal with them: link aggregation and STP.
> > 
> > Bridging loops: we should support STP. What I don't know yet, is:
> > - Exactly what flavor of STP?
> > - Is there a suitable flavor of per-VLAN STP that can work with our VLAN-agnostic bridges? Does per-vlan STP automatically imply the virtual bridges need to be vlan-aware?
> > 
> > - What about Shortest path bridging (802.1aq)? This is very new - I assume the cloud will have devices that don't support this - so I would punt.
> > 
> > For now, I'm going to assume we can keep our virtual bridges VLAN-agnostic. 
> > 
> > What about link aggregation? I'm going to assume that these aggregated links are trunked - they're going to carry multiple VLANs. Also, I'm going to assume that there's no Linux implementation of MLAG (to aggregate links across different Linux servers) - which would allow us to do most of the work in Linux vs. writing MidoNet code. Therefore, we're going to have to implement LACP inside MidoNet. That will be equivalent to a proprietary MLAG because our virtual bridges are distributed across different physical servers, and that will eliminate the MidoNet server SPOF in the admin's original configuration.
> > 
> > What does doing LACP in MidoNet imply? Well, LACP has to run at a layer below VLANs and STP (this is just my intuition, needs verification). Since we're implementing LACP in MidoNet to work across servers, we already knew that we couldn't use the Linux bonding driver to handle LACP. But now I also suspect that we won't be able to strip the vlan tags in Linux - because I think that needs to happen after the LACP negotiation (above it in the protocol stack). The same goes for the BPDUs (Bridge Protocol Data Units) used in STP. This is all very vague, and we have to figure out the exact interaction with LACP, but my intuition is that LACP has to happen lower in the protocol stack, so doing it in MidoNet means we also have to push VLAN handling into MidoNet.
> > 
> > Who should handle VLANs and LACP in the MidoNet virtual device model? We have a few choices:
> > - Make the virtual bridges aware of VLAN and LACP.
> > - Create a new concept - a meta-bridge. A meta-bridge (needs better name) is the equivalent of a physical switch. It may contain multiple virtual bridges in the same way that a physical bridge contains (or supports) multiple vlans.
> > 
> > Why do we need another layer of bridge software? Because multiple virtual bridges may be using the same aggregated links - so no single virtual bridge can be responsible for the LACP negotiation on behalf of the others (which would be a weird model). And we really do want to share/re-use those aggregated links between multiple vbridges/vlans.
> > 
> > 
> > -----------
> > Finally, and for completeness, note that we haven't
> > 
> > 
> > ------------
> > As always, feedback is appreciated. I tried not to go too deep into implementation - just enough to understand how much work certain features imply. Let's try to keep the focus of this thread on defining the feature, not the implementation.
> > 
> > thanks,
> > Pino
> > 
> 
> 
> Attachments: 
> - Adapters.png
> 

