[MidoNet-dev] Feature Proposal: Tunnel Health checks

Guillermo Ontañón guillermo at midokura.jp
Tue Feb 19 15:31:34 UTC 2013


On Tue, Feb 19, 2013 at 4:21 PM, Navarro, Galo <galo at midokura.jp> wrote:

> Hi Guillermo, thanks for the quick feedback! Some comments below
>
> >> - TunnelPorts become active on each side of the tunnel; the TunnelDoc
> >>   becomes aware of the local ports and starts taking care of them.
> >> - Regularly, for each cared-for tunnel the TunnelDoc:
> >>     - Sends a packet to the other peer
> >>     - Logs the variation in the RX value of the PortStats on the tunnel's
> >>       local port.
> >>     - If variation = 0, increment a "no-increment" counter
> >>     - If "no-increment" counter > threshold, trigger alert message for
> >>       lack of connectivity on the REVERSE direction of the tunnel (e.g.:
> >>       if the TunnelDoc at A spots no RX, the alert refers to loss of
> >>       connectivity from B to A).
> >>     - Implement whatever corrective measures upon receiving the alert
> >>       (typically, the DatapathController could recreate the tunnel)
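The per-tunnel loop in the original proposal can be sketched roughly as follows. This is a minimal Python sketch, not MidoNet code. `TunnelMonitor`, `tick()` and the `send_probe` callback are hypothetical names. The RX value is assumed to be a monotonically increasing packet counter read from the local port's PortStats. Resetting the counter once RX grows again is also an assumption, because the proposal does not say what happens to it.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class TunnelMonitor:
    """Per-tunnel state a TunnelDoc might keep (hypothetical sketch)."""
    peer: str
    threshold: int                 # cycles with no RX growth tolerated before alerting
    last_rx: Optional[int] = None  # RX counter seen on the previous cycle
    no_increment: int = 0          # consecutive cycles without RX growth

    def tick(self, rx_packets: int, send_probe: Callable[[str], None]) -> bool:
        """Run one monitoring cycle; return True while the alert condition holds.

        A True result refers to the REVERSE direction (peer -> local host),
        since it is derived from the local port's RX counter."""
        send_probe(self.peer)  # unconditionally send a packet to the peer
        if self.last_rx is not None and rx_packets == self.last_rx:
            self.no_increment += 1
        else:
            self.no_increment = 0  # assumption: any RX growth resets the counter
        self.last_rx = rx_packets
        return self.no_increment > self.threshold


# Usage: RX stalls at 10 for three cycles, then grows again.
sent = []
monitor = TunnelMonitor(peer="B", threshold=2)
alerts = [monitor.tick(rx, sent.append) for rx in [10, 10, 10, 10, 11]]
print(alerts)  # [False, False, False, True, False]
```

In this form a probe goes out on every cycle, whether or not the tunnel is carrying traffic.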
>
> > This is not a lot of extra traffic, but the number of tunnels does grow
> > quadratically with the number of MM agents. I propose a slight variation
> > on the above to avoid sending traffic on non-idle tunnels, along the lines
> > of what is done by IPsec's dead peer detection:
> >
> > http://www.ietf.org/rfc/rfc3706.txt
> >
> > Basically, from the point of view of one of the nodes, it looks like this:
> >
> >    * Monitor idleness (by looking at RX as you outline above), and do
> >      nothing and consider the tunnel healthy while idleness doesn't go
> >      above a certain threshold.
> >    * When the tunnel becomes idle, send an "are-you-there" packet to the
> >      peer (we could just use the tunnel-key for this).
> >    * When an "are-you-there" packet is received, reply to it with an Ack.
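The idle-triggered variation can be sketched for a single tunnel endpoint as shown below. It loosely follows the R-U-THERE / R-U-THERE-ACK exchange of RFC 3706. This is a hypothetical Python model, not MidoNet code. `DpdTunnel`, `idle_threshold`, `max_unacked` and the `outbox` list are all illustrative; `outbox` stands in for whatever actually puts the probe or the ack on the tunnel. Declaring the peer dead after a number of unanswered probes is an assumption borrowed from DPD and is not stated in the thread.

```python
from dataclasses import dataclass, field
from typing import List, Optional

ARE_YOU_THERE = "are-you-there"
ACK = "ack"


@dataclass
class DpdTunnel:
    """One endpoint's view of a tunnel, loosely modelled on RFC 3706 DPD."""
    idle_threshold: int     # idle cycles tolerated before probing the peer
    max_unacked: int        # unanswered probes tolerated before declaring it dead
    last_rx: Optional[int] = None
    idle_cycles: int = 0
    unacked: int = 0
    outbox: List[str] = field(default_factory=list)  # messages to send to the peer

    def on_receive(self, msg: str) -> None:
        """Handle a control message arriving from the peer."""
        if msg == ARE_YOU_THERE:
            self.outbox.append(ACK)  # always answer a probe
        elif msg == ACK:
            self.unacked = 0         # the peer (and the peer -> us path) is alive

    def tick(self, rx_packets: int) -> bool:
        """One monitoring cycle; return True if the tunnel is considered dead."""
        if self.last_rx is None or rx_packets != self.last_rx:
            # Traffic is flowing: healthy, so send nothing extra.
            self.idle_cycles = 0
            self.unacked = 0
        else:
            self.idle_cycles += 1
        self.last_rx = rx_packets
        if self.idle_cycles > self.idle_threshold:
            self.outbox.append(ARE_YOU_THERE)  # only idle tunnels get probed
            self.unacked += 1
        return self.unacked > self.max_unacked


# Usage: an idle tunnel whose peer never answers.
tunnel = DpdTunnel(idle_threshold=1, max_unacked=2)
dead = [tunnel.tick(5) for _ in range(6)]
print(dead)  # [False, False, False, False, True, True]
```

A busy tunnel never reaches the probing branch, so the extra traffic is limited to tunnels that are already idle.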
>
> This is definitely better. I messed up my copy-pastes badly, but the idea
> was basically what you explain: the "send packet to the other peer" step
> would be conditioned on several cycles without an increment, as tracked by
> the "no-increment" counter.
>


But I think that for this to work you need the 'ack' reply. Would it be
included? Otherwise a host may be receiving traffic (non-idle) but not
sending anything, and it would never send 'are-you-there' packets to the
other side because its RX keeps increasing.
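A small simulation can check the point about the ack. It models host A on a tunnel A <-> B. A sends data to B on every cycle, so B's RX keeps growing and B never probes. B has no traffic of its own for A. The function name and thresholds are hypothetical, and the model tracks only A's idle counter.

```python
def a_raises_alarm(reverse_path_up: bool, peer_acks: bool,
                   cycles: int = 10, idle_threshold: int = 2,
                   alarm_threshold: int = 5) -> bool:
    """Simulate host A's view of tunnel A <-> B (hypothetical model).

    A sends data to B every cycle, so B's RX always grows and B never
    sends an are-you-there of its own. B has no data for A. The A -> B
    direction always works. Returns True if A declares B -> A broken."""
    a_idle = 0  # consecutive cycles in which A's RX did not grow
    for _ in range(cycles):
        a_rx_grew = False
        if a_idle > idle_threshold:
            # A probes B; B receives it because A -> B works.
            if peer_acks and reverse_path_up:
                a_rx_grew = True  # B's ack makes it back to A
        a_idle = 0 if a_rx_grew else a_idle + 1
        if a_idle > alarm_threshold:
            return True
    return False


for reverse_up in (True, False):
    for acks in (True, False):
        print(f"reverse path up={reverse_up}, acks={acks}: "
              f"alarm={a_raises_alarm(reverse_up, acks)}")
```

Without acks, A raises the alarm whether or not the B -> A path works, so it cannot distinguish a quiet peer from a broken reverse direction. With acks, the alarm fires only when the reverse path is actually down.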


>
> >> The counters described above would be stored in a common data structure
> >> in Cassandra so that API clients could easily retrieve the list of
> >> failing tunnels.
> >
> > I wonder why Cassandra and not ZooKeeper?
>
> Only because, as far as I could see, metrics are stored in Cassandra,
> but if it makes more sense to have them in ZK, that's perfectly OK with
> me.
>
> Thanks!
>
> /g
>



-- 
Guillermo Ontañón