[MidoNet-dev] population of the Mac-learning table - api proposal

Dan Mihai Dumitriu dan at midokura.com
Wed Feb 20 01:21:45 UTC 2013

On Tuesday, February 19, 2013 at 9:25 PM, Rossella Sblendido wrote:

 Hi Joe,

thanks for chiming in, we do need to know the integration perspective!
My comments inline...

On 2/19/13 10:34 AM, Mills, Joseph wrote:

Hi Rossella, Pino,

 I know I am late to the game (sorry!), but I just wanted to chime in from
an integration perspective. Some comments:

In the CloudStack integration, it is obvious where the changes would go.
Just POST when we prepare() the NIC, and DELETE when we release() the NIC.
Though I can't think of a scenario where a PUT would be used. If I
understand correctly, this would be a case of changing the port ID
associated with a given MAC address. Maybe this is not needed?

I'm not so familiar with CloudStack. Anyway, a good case for a PUT would be
VM migration: the MAC stays the same but the port ID changes.

Actually, when migrating the port does NOT change. Only the mapping of
vport to host changes.

 Checking errors:
If we are trying to POST something where the MAC already exists, this means
we are allocating ports to MACs incorrectly. Is there any way to verify
that we aren't double-assigning besides looping through all the mac entries?

We can verify internally whether the MAC has already been allocated and
return an error if it has.

 Idempotent DELETEs:
If we try a DELETE on a mac/port that has already been DELETED, will that
return an error, or just return success?

We can discuss this, but I think a DELETE when the entry is not there should
return success. We should return an error only when we tried to delete an
entry and didn't succeed.
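
A minimal sketch of the POST and DELETE semantics discussed above, assuming an in-memory map stands in for the bridge's table. The class name, the method names, and the choice of 201/204/409 status codes are hypothetical and are not part of the proposal.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical in-memory stand-in for a bridge's MAC-port table; not MidoNet code. */
final class MacPortTableSketch {
    // HTTP status codes the API layer might map results to (an assumption).
    static final int CREATED = 201, NO_CONTENT = 204, CONFLICT = 409;

    private final Map<String, UUID> entries = new ConcurrentHashMap<>();

    /** POST: create only if the MAC is unmapped; otherwise report a conflict. */
    int post(String mac, UUID portId) {
        // putIfAbsent returns null only when no mapping existed before the call.
        return entries.putIfAbsent(mac, portId) == null ? CREATED : CONFLICT;
    }

    /** DELETE: idempotent, so removing an absent entry still succeeds. */
    int delete(String mac) {
        entries.remove(mac);
        return NO_CONTENT;
    }

    UUID lookup(String mac) {
        return entries.get(mac);
    }
}
```

With this shape the integration code does not need to loop over all entries to detect a double assignment: a second POST for the same MAC fails, while a repeated DELETE is harmless.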

 No flooding on bridges:
 Regarding Pino's suggestion to completely eliminate flooding on the
bridge, adding extra rules to stop BCAST and MCAST may not be necessary
(though I may just not know enough to think of a case where it is needed).
In the integration code, we would create and delete mac/port entries
strictly when we create and destroy NICs. So there should not be any VM
hooked up to the bridge that does not have a permanent entry installed. Is
there a case where the bridge will know all the mac/port mappings of the
VMs attached but still send out BCAST/MCAST? Or was this in expectation of
someone hooking up a VM outside of the integration code?

 If the MacLearningTable of the bridge is updated through the api, we don't
need broadcast for ARP anymore. That's why we can filter BCAST/MCAST coming
from the VMs. This is good because it's more efficient and more secure. We
can prevent malicious packets (e.g. ARP poisoning) from entering the


On Wed, Feb 13, 2013 at 7:51 PM, Pino de Candia <gdecandia at midokura.com> wrote:

 Ciao Rossella,

 although there hasn't been a lot of discussion, let's assume these REST
API changes are good enough for now. Please write up the current proposal
in the wiki so that we don't have to search e-mail and parse an entire
thread to understand what the current thinking is.

 Also, let's kick off the backend design discussion.


Pino de Candia
Software Engineer, Midokura.com

   On Wednesday, February 6, 2013 at 1:51 PM, Pino de Candia wrote:

  I have a question going back to requirements. Is our goal to provide the
API client with a mechanism that:
A) allows them to reduce flooding on the bridge?
B) allows them to completely eliminate flooding on the bridge?

 I think we should be aiming for B - I'd like to hear other opinions. My
feeling is that saying this to customers is a selling point: "you can
configure your bridges so that they'll never flood any traffic".

 After writing up this e-mail, I think B can be achieved without any
additional work. But please check my reasoning because if something's
missing I'd still like to achieve B.

 Here's how we can enable B:
1) allow customers to pre-seed the mac-table with permanent entries - they
don't expire and can't be displaced by learned entries.
2) normal mac-learning is still enabled (if the API client doesn't
pre-seed, hosts on the bridge can still receive traffic as long as they
occasionally send a packet so that the learned mac-port entry is not
3) client uses a post-bridging chain rule on the Bridge to drop anything
whose L2 destination is BCAST.
4) client uses a post-bridging chain rule on the Bridge to drop anything
whose L2 destination is a MCAST or is an unlearned MAC (the packet would be
flooded in both cases).

 Only #1 requires an API change. Everything else is already supported.

 #4 can be accomplished, but we haven't tried it, by using a rule with:
- a condition that matches outPortIds={BridgeId} - note that the "flood"
action is encoded by setting the outPortId to the BridgeId.
- a DROP action.

 #3 and #4 use post-bridging rules so that the Proxy ARP feature gets a
chance before the packet is discarded.
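
The steps above can be sketched roughly as follows, assuming a toy bridge model in which a MAC-table miss is reported as outPortId == bridgeId (the "flood" encoding mentioned for #4). The class and method names are hypothetical, and BCAST/MCAST destinations are simply assumed never to appear in the table.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

/** Hypothetical model of the bridge decision described above; not MidoNet code. */
final class NoFloodBridgeSketch {
    private final UUID bridgeId;
    private final Map<String, UUID> macTable = new HashMap<>();

    NoFloodBridgeSketch(UUID bridgeId) {
        this.bridgeId = bridgeId;
    }

    /** Step 1: pre-seed an entry for a MAC that is known in advance. */
    void seed(String mac, UUID portId) {
        macTable.put(mac, portId);
    }

    /** Bridging: a known MAC maps to its port; anything else "floods",
     *  which is encoded here as the bridge's own id. */
    UUID bridge(String dstMac) {
        return macTable.getOrDefault(dstMac, bridgeId);
    }

    /** Post-bridging rule (steps 3 and 4): DROP when the out port is the
     *  bridge itself. Returns null for a dropped packet. */
    UUID forward(String dstMac) {
        UUID out = bridge(dstMac);
        return out.equals(bridgeId) ? null : out;
    }
}
```

In this model a single rule covers broadcast, multicast, and unlearned unicast, because all three reach the post-bridging stage with the same flood marker.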


 On Wednesday, February 6, 2013 at 1:10 PM, Rossella Sblendido wrote:

  Hi Pino,

thanks a lot for your feedback. My comments inline.


On 2/6/13 11:55 AM, Pino de Candia wrote:

 Hi Rossella,

 thanks for starting this discussion so promptly.

On Wednesday, February 6, 2013 at 1:34 AM, Rossella Sblendido wrote:

  Hi devs,

I'd like to start the discussion regarding the population of the
Mac-learning table for a bridge using the API.
You can have a look at the requirements
I'm copying them at the bottom to facilitate the discussion.

Here is my proposal:

*API changes*:

   1. We need to have a new DTO to represent the MAC-vPortId association,
   something like:

        public class DtoMacPort {
                private UUID portId;
                private MAC mac;
                // getters
                // setters
        }


   1. A GET against bridges/bridge_id/macPort will return a
   Collection<DtoMacPort> representing all the entries of the Mac-learning
   table of the bridge whose id is bridge_id.

  Is that actual run-time Mac-table contents at that moment in time? Or is
it just the list of pre-seeded DtoMacPort?

The actual run-time Mac-table content.

   1. A PUT of a DtoMacPort object against bridges/bridge_id/macPort will
   add a new entry to the Mac-learning table, if there's already an entry with
   the same Mac, this will be overwritten.

  PUT does not return a DTO to the client. Therefore there's no way of
passing the DTO's URI back to the client (in the HATEOAS style) - the
client could use a template to generate the URI, but I don't like that
being the only option.

 Instead, can we say that the first time the client wants to set the vport
for a specific MAC, they do a POST. This will return a DTO that includes
the URI of the object, which will be needed for update and deletion. If you
do a POST and there's already a mac-port entry for that mac, you get an

Pino, my idea was to use bridges/bridge_id/macPort as URI. The main DTO is
a Collection<DtoMacPort> and we do a PUT or DELETE to add or remove an
item from the collection.

 Subsequently, if the client wants to change a mac's vport, they set the
new vport UUID in the dto and PUT it to the server.

 If a client does not know if this is the first time the mac's vport has
been set (doesn't know whether to use PUT or POST), then they have two options:
- GET as in #1 (lists all the DtoMacPort)
- Do a GET on a template /bridges/bridge_id/macPort/_mac_. We're using
templates in other places for similar reasons. The templates are provided
inside DTOs (so that we can evolve them). Where should this template be

   1. A DELETE of a DtoMacPort object against bridges/bridge_id/macPort
   will remove the entry whose MAC=DtoMacPort.getMac() from the Mac-learning
   table of the bridge whose id is bridge_id

  The DELETE will be against the DtoMacPort.getURI() right? And the URI is
probably going to be /bridges/bridge_id/macPort/mac.

 However, this has the following consequence: any DTO (even an outdated one)
can be used to DELETE the mac-port entry for that mac.

We can add a check for that.
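
One possible shape for such a check, as a sketch only: delete the entry only if it still maps the MAC to the port named in the DTO. The class and method names are hypothetical; the conditional removal relies on the standard ConcurrentMap.remove(key, value).

```java
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical guard against deleting with an outdated DTO; not MidoNet code. */
final class GuardedDeleteSketch {
    private final ConcurrentMap<String, UUID> entries = new ConcurrentHashMap<>();

    void put(String mac, UUID portId) {
        entries.put(mac, portId);
    }

    /** Removes the entry only if it still maps mac to the port named in the DTO. */
    boolean deleteIfCurrent(String mac, UUID portIdFromDto) {
        // remove(key, value) is atomic: it does nothing when the value differs.
        return entries.remove(mac, portIdFromDto);
    }

    UUID lookup(String mac) {
        return entries.get(mac);
    }
}
```

How a refused delete is reported to the client is left open here.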

  Do we need a method to check if a specific MAC is already in the map?

 *ZK changes*:

We store the MAC/vPortId association in ZK under:


We use a ReplicatedMap, whose key is the MAC and whose value is the UUID of
the port. ReplicatedMap, as the name says, is a map that is stored both in
memory and in ZK. We load it when we initialize it and we keep it in sync
with ZK. In our code, any entry (Mac1, inPort1) in the Mac-learning table
gets deleted when the count of flows with sourceMac = Mac1 and ingressPort =
inPort1 reaches 0. My interpretation of permanent entries is that we don't
modify them based on the flow count; hopefully I'm not wrong.
To make the entry permanent I'd add a boolean in the ReplicatedMap,
isPermanent. If isPermanent=true then the entry cannot be removed.
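
A rough sketch of that behaviour, using a plain HashMap in place of the ReplicatedMap and ignoring ZK entirely. All names other than isPermanent are hypothetical, and re-learning a MAC on a new port is an added assumption.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

/** Hypothetical model of permanent vs. learned entries; not MidoNet's ReplicatedMap. */
final class PermanentEntrySketch {
    private static final class Entry {
        final UUID portId;
        final boolean isPermanent;
        int flowCount; // only meaningful for learned entries

        Entry(UUID portId, boolean isPermanent) {
            this.portId = portId;
            this.isPermanent = isPermanent;
        }
    }

    private final Map<String, Entry> table = new HashMap<>();

    /** Pre-seed through the API: replaces whatever was learned for this MAC. */
    void addPermanent(String mac, UUID portId) {
        table.put(mac, new Entry(portId, true));
    }

    /** A flow with this source MAC and ingress port was installed. */
    void flowAdded(String mac, UUID inPort) {
        Entry e = table.get(mac);
        if (e != null && e.isPermanent) {
            return; // learned traffic never displaces a permanent entry
        }
        if (e == null || !e.portId.equals(inPort)) {
            e = new Entry(inPort, false); // learn, or re-learn on a new port (assumption)
            table.put(mac, e);
        }
        e.flowCount++;
    }

    /** A matching flow was removed; learned entries expire at a count of 0. */
    void flowRemoved(String mac, UUID inPort) {
        Entry e = table.get(mac);
        if (e == null || e.isPermanent || !e.portId.equals(inPort)) {
            return;
        }
        if (--e.flowCount == 0) {
            table.remove(mac);
        }
    }

    UUID lookup(String mac) {
        Entry e = table.get(mac);
        return e == null ? null : e.portId;
    }
}
```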

I would hold off on the discussion of the API back-end implementation so we
can focus on the API design. The API design is of interest to a wider
audience than the back-end implementation, so I would have the latter
discussion in a separate thread (and only after the API design has been


 Caveat: I think it's reasonable that during the API design discussion
someone says "That API won't work because there's no way the back-end can
support it." But it becomes a question of feasibility, not converging on a
specific back-end design.



*Feature description*
The API allows populating the virtual bridge's mac-learning table with
permanent entries. This is used to avoid flooding the bridge for use-cases
where the vport/mac pairs are known in advance.

*Core team responsibilities*
- Propose API changes to expose the MAC-learning table for viewing and
- Design ZK changes (should be backward compatible)
- Implement API server changes and Midolman changes.

*Integration team responsibilities*
- Modify OpenStack integration to pass OS vport/mac pair information as
Mac-table entries at instance launch.

*GUI/management team*
- Define existing pages that need to change (none?)
- Define new pages - one to display the mac-table contents?
- Implement the new pages.

*QA team responsibilities*
- Normal vetting of a software version. The changes are not user-visible
because this is an optimization.

- Agree on REST API changes by Feb. 11
- Core team delivers implementation prototype by Feb. 18
- Core team completes testing by Feb. 25.
- Integration team completes changes to Grizzly/Diyari by Feb. 25.
- QA team approves by March 1.