[MidoNet-dev] Feature Proposal: Tunnel Health checks
abel at midokura.com
Tue Feb 19 15:48:30 UTC 2013
What would be the maximum time to detect that a tunnel is down?
On Tue, Feb 19, 2013 at 4:31 PM, Guillermo Ontañón <guillermo at midokura.jp> wrote:
> On Tue, Feb 19, 2013 at 4:21 PM, Navarro, Galo <galo at midokura.jp> wrote:
>> Hi Guillermo, thanks for the quick feedback! Some comments below
>> >> - TunnelPorts become active on each side of the tunnel, the TunnelDoc
>> >> becomes aware of local ports and starts taking care of them.
>> >> - Regularly, for each cared-for tunnel, the TunnelDoc:
>> >>   - Sends a packet to the other peer
>> >>   - Logs the variation in the RX value of the PortStats on the tunnel's
>> >>     local port
>> >>   - If variation = 0, increments a "no-increment" counter
>> >>   - If the "no-increment" counter > threshold, triggers an alert message
>> >>     for lack of connectivity in the REVERSE direction of the tunnel
>> >>     (if the TunnelDoc at A spots no RX, the alert refers to loss of
>> >>     connectivity from B to A)
>> >> - Implement whatever corrective measures upon receiving the alert
>> >> (typically, the DatapathController could recreate the tunnel)
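The periodic check proposed in the list above could be sketched roughly as follows. This is only an illustration, not MidoNet code: `send_probe`, `read_rx` and `on_alert` are hypothetical callbacks standing in for the packet sender, the PortStats RX reader and the alert hook, and resetting the counter when RX moves again is an assumption the proposal does not spell out.

```python
from dataclasses import dataclass


@dataclass
class TunnelState:
    last_rx: int = 0       # RX value seen at the previous check
    no_increment: int = 0  # consecutive checks without RX variation


class TunnelDoc:
    """Hypothetical sketch of the proposed per-tunnel health check."""

    def __init__(self, threshold, send_probe, read_rx, on_alert):
        self.threshold = threshold
        self.send_probe = send_probe  # assumed: sends a packet to the peer
        self.read_rx = read_rx        # assumed: returns the port's RX count
        self.on_alert = on_alert      # assumed: reports loss of connectivity
        self.tunnels = {}

    def add_tunnel(self, tunnel_id):
        # Start caring for a tunnel whose local port became active.
        self.tunnels[tunnel_id] = TunnelState(last_rx=self.read_rx(tunnel_id))

    def check_all(self):
        # One "regular" pass over every cared-for tunnel.
        for tunnel_id, state in self.tunnels.items():
            self.send_probe(tunnel_id)
            rx = self.read_rx(tunnel_id)
            if rx == state.last_rx:
                state.no_increment += 1
            else:
                state.no_increment = 0  # assumption: any RX resets the count
            state.last_rx = rx
            if state.no_increment > self.threshold:
                # No RX here means the REVERSE direction (peer -> us) is down.
                self.on_alert(tunnel_id)
```

With a threshold of 2, the alert fires on the third consecutive check that sees no RX variation, and keeps firing on each later check until RX moves again.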
>> > This is not a lot of extra traffic, but the number of tunnels does grow
>> > quadratically with the number of MM agents. I propose a slight variation on
>> > the above to avoid sending traffic on non-idle tunnels, along the lines of
>> > what is done by IPsec's dead peer detection:
>> > http://www.ietf.org/rfc/rfc3706.txt
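To put the quadratic growth mentioned above into numbers, here is a small sketch. It assumes a full mesh with one tunnel per pair of agents and one probe per direction per check interval; both are assumptions made for illustration, and the function names are made up.

```python
def tunnel_count(agents):
    """Tunnels in a full mesh: one per unordered pair of agents."""
    return agents * (agents - 1) // 2


def probe_packets_per_second(agents, interval_s):
    """Unconditional probing: one probe per direction per tunnel per interval."""
    return 2 * tunnel_count(agents) / interval_s
```

Under these assumptions 10 agents give 45 tunnels and 100 agents give 4950, which is why skipping probes on non-idle tunnels becomes attractive as the deployment grows.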
>> > Basically, from the point of view of one of the nodes, it looks like this:
>> > * Monitor idleness (by looking at RX as you outline above) and
>> > consider the tunnel healthy while idleness doesn't go above a
>> > threshold.
>> > * When the tunnel becomes idle, send an "are-you-there" packet to the
>> > peer (we could just use the tunnel-key for this).
>> > * When an "are-you-there" packet is received, reply to it with an
>> This is definitely better. I messed up my copy-pastes badly, but the idea
>> was basically what you explain: the "send packet to another peer" step
>> would only happen after several cycles without an RX increment, as
>> tracked by the "no-data-increment" counter.
> But I think that for this to work you need the 'ack' reply, would it be
> included? Otherwise a host may be receiving traffic (non-idle) but not
> sending, and would never send any 'are-you-there' packets to the other side
> because its RX is increasing.
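The idle-triggered scheme with the 'ack' reply discussed above could look roughly like this. It is a minimal sketch in the spirit of RFC 3706, not an actual MidoNet implementation: the class, the message names, the two timeouts and the `send` callback are all hypothetical, and retries are left out.

```python
ARE_YOU_THERE = "are-you-there"
ACK = "ack"


class DpdPeer:
    """Hypothetical one-sided view of a tunnel using idle-triggered probes."""

    def __init__(self, idle_timeout, ack_timeout, send):
        self.idle_timeout = idle_timeout  # RX silence tolerated before probing
        self.ack_timeout = ack_timeout    # how long to wait for a reply
        self.send = send                  # assumed: sends a message to the peer
        self.last_rx_time = 0.0
        self.probe_sent_at = None
        self.healthy = True

    def on_receive(self, message, now):
        # Any inbound packet proves the peer -> us direction works.
        self.last_rx_time = now
        self.probe_sent_at = None
        self.healthy = True
        if message == ARE_YOU_THERE:
            # Always reply, even if we are not idle ourselves.
            self.send(ACK)

    def tick(self, now):
        # Called periodically; returns the current health verdict.
        if self.probe_sent_at is not None:
            if now - self.probe_sent_at >= self.ack_timeout:
                self.healthy = False
        elif now - self.last_rx_time >= self.idle_timeout:
            self.send(ARE_YOU_THERE)
            self.probe_sent_at = now
        return self.healthy
```

The reply in `on_receive` is what covers the case raised above: a host that only receives never becomes idle and never probes, but it still answers the other side's "are-you-there", so the sending side can tell the tunnel is alive.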
>> >> The counters described above would be stored in a common data structure
>> >> in Cassandra so that API clients could easily retrieve the list of
>> >> failing tunnels.
>> > I wonder why Cassandra and not Zookeeper?
>> Just because as far as I could see metrics are stored in Cassandra,
>> but if it makes more sense to have them in ZK that's perfectly ok for
> -- Guillermo Ontañón