[MidoNet-dev] Feature Proposal: Tunnel Health checks

Pino de Candia gdecandia at midokura.com
Thu Feb 28 01:47:52 UTC 2013


On Thursday, February 28, 2013 at 1:51 AM, Dan Mihai Dumitriu wrote:
> I'd love to have RTT on all the tunnels - I really think that would be awesome.  Is it not feasible for some reason? :)
>  

I think it's feasible - I just wanted to make sure this requirement didn't get dropped.  
>  
>  
> On Thu, Feb 28, 2013 at 8:02 AM, de Candia, Giuseppe <gdecandia at midokura.com> wrote:
> > Just to make sure that we didn't miss something - was Dan asking for tunnel RTT? Providing that would require constantly sending hello messages between every pair of hosts even when they're successfully forwarding packets to each other and tracking the delay of the replies.  
> > Dan, can you clarify?
> > -- Pino
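
For illustration only, the RTT tracking described above (hellos between each pair of hosts, timing the replies) could be sketched roughly like this. RttTracker, the sequence numbers and the injectable clock are all hypothetical names and not part of MidoNet; this is a sketch of the bookkeeping, not a proposed implementation.

```python
import time


class RttTracker:
    """Tracks the last round-trip time per peer (hypothetical helper)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock   # injectable so the logic can be tested
        self._sent = {}       # (peer, seq) -> timestamp when the hello left
        self._rtt = {}        # peer -> last measured RTT in seconds

    def hello_sent(self, peer, seq):
        # Remember when this hello was sent.
        self._sent[(peer, seq)] = self._clock()

    def reply_received(self, peer, seq):
        # Match the reply to its hello; ignore unknown or duplicate replies.
        sent_at = self._sent.pop((peer, seq), None)
        if sent_at is None:
            return None
        rtt = self._clock() - sent_at
        self._rtt[peer] = rtt
        return rtt

    def last_rtt(self, peer):
        return self._rtt.get(peer)
```

Every host would need one such entry per peer, which is where the O(N^2) state mentioned in this thread comes from.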
> > On Feb 21, 2013 11:23 AM, "Navarro, Galo" <galo at midokura.com> wrote:
> > > Hi Dan, thanks for the feedback.
> > >  
> > > In practice performance isn't really going to be that bad if we
> > > consider that nodes will only try to send "are-you-there" messages and
> > > report "mute peers" if the peer is alive.
> > >  
> > > The worst case O(n^2) scenario would require that all nodes are alive
> > > yet isolated from every other node, but somehow having connectivity to
> > > Zookeeper (!) so all seem alive to each other but mute and they keep
> > > pinging each other with O(n^2) messages.
> > >  
> > > In practice this is very unlikely. The "real" worst case should be
> > > bounded by O(M * N), where M is the number of *live* cut-off peers and N
> > > the total number of peers. All peers will just send messages to the M
> > > live-but-mute peers (and also report M-sized lists of mute peers to
> > > the monitoring storage).
> > >  
> > > Of course M will vary depending on the topology, but will generally be
> > > a fraction of N (e.g.: one or more subnets with outbound traffic
> > > blocked).
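
To put rough numbers on the bound above, here is a small sketch. It is a simplified model with one probe per peer per mute peer per round, and it counts only probes sent toward mute peers, as in the message above. The function name is made up for illustration.

```python
def probes_per_round(total_peers, mute_peers):
    """Upper bound on "are-you-there" probes sent toward mute peers in one round."""
    if not 0 <= mute_peers <= total_peers:
        raise ValueError("mute_peers must be between 0 and total_peers")
    # Each of the N peers probes every mute peer, except that a mute peer
    # does not probe itself.
    return total_peers * mute_peers - mute_peers
```

With N = 100 and M = 10 this gives 990 probes per round. When M = N it degenerates to N * (N - 1) = 9900, the quadratic case.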
> > >  
> > > Let me know if I'm messing up the numbers!
> > >  
> > > Cheers!
> > > /g
> > >  
> > > On 21 February 2013 08:26, Dan Mihai Dumitriu <dan at midokura.com> wrote:
> > > > I think I prefer having metrics for each tunnel RTT.  Yes, it's O(N^2)
> > > > messages and state, which is not great.  However, MN could really know and
> > > > show the state of the underlay network to the operators, on a pairwise
> > > > basis.  If there is path diversity by each host being multihomed, MN can
> > > > even make a decision about which tunnel to use, based on the tunnel
> > > > 'health'.
> > > >
> > > >
> > > > On Thu, Feb 21, 2013 at 2:25 AM, de Palol, Marc <marc at midokura.jp> wrote:
> > > >>
> > > >> comments inline
> > > >>
> > > >>> Hi Marc, thanks for the feedback.
> > > >>>
> > > >>> If I understand your suggestion correctly, the problem I see is that
> > > >>> the ephemeral-node-based solution will signal problems on the
> > > >>> hosts, but not on the wire. Two peers A, B might be alive and have
> > > >>> connectivity with ZK, with the tunnel technically alive and working
> > > >>> A->B, but failing B->A. In this situation, ephemeral nodes wouldn't be
> > > >>> deleted. Does that make sense?
> > > >>>
> > > >>
> > > >> I was not talking about having the host itself in zk, since that
> > > >> would have the problem you mention. I was thinking of having a
> > > >> connection to zk (as an ephemeral node) tied to the tunnel itself.
> > > >> But after reading your e-mail I think that the "hey, there is no incoming
> > > >> data on our tunnels" solution you are proposing is far easier to understand
> > > >> and implement. This zk solution was a bit of an overkill.
> > > >>
> > > >>>
> > > >>> On a side note, and more directed to @Adam:
> > > >>>
> > > >>> Pino and I were discussing yesterday if we really need that much
> > > >>> granularity in the diagnostic.
> > > >>>
> > > >>> Our current proposal gives value under the assumption that each host
> > > >>> depends on specific network conditions and therefore it's important to
> > > >>> report exactly what host, what tunnel, and what direction seems to be
> > > >>> dead.
> > > >>>
> > > >>> But in practise, a bunch of hosts will depend on the same network
> > > >>> conditions (e.g.: all are in the same subnet). And therefore, if the
> > > >>> subnet is unreachable from outside all tunnels will be dead, so
> > > >>> reporting the problem for each tunnel is simply redundant.
> > > >>>
> > > >>> With this in mind, it may be enough to implement a much simpler
> > > >>> proposal whereby every MM agent would simply report if it's receiving
> > > >>> inbound data on tunnel ports. When they don't, having all hosts on the
> > > >>> subnet saying "hey, there is no incoming data on our tunnels" is
> > > >>> probably good enough to know where to investigate. For example, would
> > > >>> this have been enough in the Netflix PoC?
> > > >>>
> > > >>> What do you think?
> > > >>>
> > > >>> Thanks!
> > > >>> /g
> > > >>>
> > > >>> On 19 February 2013 23:12, de Palol, Marc <marc at midokura.jp> wrote:
> > > >>> > Hi all,
> > > >>> >
> > > >>> > I agree that the results will need to be stored in Cassandra, for the
> > > >>> > reasons Galo gave: the metrics are there and the GUI already knows how to
> > > >>> > get them.
> > > >>> >
> > > >>> > About this 'are you there' problem. I wonder if we could use zookeeper's
> > > >>> > ephemeral nodes. These nodes exist in zookeeper as long as the session
> > > >>> > that created them still exists. We could create a znode for every tunnel,
> > > >>> > tied to the session. If a tunnel disappears or stops working the ephemeral
> > > >>> > node disappears (don't know how, we should see the details here). There
> > > >>> > could be some watchers set in place to notify whoever is responsible for
> > > >>> > recreating the tunnel.
> > > >>> >
> > > >>> >
> > > >>> > On Tue, Feb 19, 2013 at 5:27 PM, Navarro, Galo <galo at midokura.jp> wrote:
> > > >>> >>
> > > >>> >> Just to clarify after talking w. Guillermo:
> > > >>> >>
> > > >>> >> Even though we don't need to actively listen for an ACK (for
> > > >>> >> the reasons explained before), we do need to implement a mechanism to
> > > >>> >> receive and reply to "are-you-there" messages received on one side of
> > > >>> >> the tunnel.
> > > >>> >>
> > > >>> >> /g
> > > >>> >>
> > > >>> >> > On 19 February 2013 16:50, Navarro, Galo <galo at midokura.jp> wrote:
> > > >>> >> > On 19 February 2013 16:31, Guillermo Ontañón <guillermo at midokura.jp> wrote:
> > > >>> >> >> On Tue, Feb 19, 2013 at 4:21 PM, Navarro, Galo <galo at midokura.jp> wrote:
> > > >>> >> >>>
> > > >>> >> >>> Hi Guillermo, thanks for the quick feedback! Some comments below
> > > >>> >> >>>
> > > >>> >> >>> >> - TunnelPorts become active on each side of the tunnel, the TunnelDoc
> > > >>> >> >>> >>   becomes aware of local ports and starts taking care of them.
> > > >>> >> >>> >> - Regularly, for each cared-for tunnel the TunnelDoc:
> > > >>> >> >>> >>     - Sends a packet to the other peer
> > > >>> >> >>> >>     - Logs variation on RX value of the PortStats on the tunnel's local
> > > >>> >> >>> >>       port.
> > > >>> >> >>> >>     - If variation = 0, increment a "no-increment" counter
> > > >>> >> >>> >>     - If "no-increment" counter > threshold, trigger alert message for
> > > >>> >> >>> >>       lack of connectivity on the REVERSE direction of the tunnel (e.g.:
> > > >>> >> >>> >>       if the TunnelDoc at A spots no RX, the alert refers to loss of
> > > >>> >> >>> >>       connectivity from B to A).
> > > >>> >> >>> >>     - Implement whatever corrective measures upon receiving the alert
> > > >>> >> >>> >>       (typically, the DatapathController could recreate the tunnel)
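
A minimal sketch of the per-tunnel check in the list above, assuming the RX packet counter is polled once per cycle. RxMonitor, its parameters and the alert string are made-up names for illustration; sending the packet and the corrective measures are left out.

```python
class RxMonitor:
    """Watches one tunnel's RX counter on the local side (hypothetical helper)."""

    def __init__(self, local, peer, threshold=3):
        self.local = local
        self.peer = peer
        self.threshold = threshold
        self._last_rx = None      # RX value seen on the previous cycle
        self._no_increment = 0    # consecutive cycles without RX growth

    def check(self, rx_packets):
        """Feed the current RX counter; return an alert string or None."""
        if self._last_rx is not None and rx_packets == self._last_rx:
            self._no_increment += 1
        else:
            self._no_increment = 0
        self._last_rx = rx_packets
        if self._no_increment > self.threshold:
            # No RX at the local side means the REVERSE direction is suspect.
            return f"no connectivity {self.peer}->{self.local}"
        return None
```

The alert names the peer-to-local direction because the local side can only vouch for what it receives.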
> > > >>> >> >>>
> > > >>> >> >>> > This is not a lot of extra traffic, but the number of tunnels does grow
> > > >>> >> >>> > quadratically with the number of MM agents. I propose a slight variation on
> > > >>> >> >>> > the above to avoid sending traffic on non-idle tunnels, along the lines of
> > > >>> >> >>> > what is done by IPsec's dead peer detection:
> > > >>> >> >>> >
> > > >>> >> >>> > http://www.ietf.org/rfc/rfc3706.txt
> > > >>> >> >>> >
> > > >>> >> >>> > Basically, from the POV of one of the nodes, it looks like this:
> > > >>> >> >>> >
> > > >>> >> >>> >    * Monitor idleness (by looking at RX as you outline above) and do nothing
> > > >>> >> >>> >      and consider the tunnel healthy while idleness doesn't go above a certain
> > > >>> >> >>> >      threshold.
> > > >>> >> >>> >    * When the tunnel becomes idle, send an "are-you-there" packet to the
> > > >>> >> >>> >      peer (we could just use the tunnel-key for this).
> > > >>> >> >>> >    * When an "are-you-there" packet is received, reply to it with an Ack.
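
The three steps above could be sketched as follows. This is only an illustration of the idle-triggered probing idea: IdleProber, the string messages and the explicit `now` argument are assumptions, and no real packet I/O or tunnel-key handling is shown.

```python
class IdleProber:
    """Probes the peer only when the tunnel's RX counter has been idle (hypothetical)."""

    def __init__(self, idle_threshold=10.0):
        self.idle_threshold = idle_threshold  # seconds of RX silence tolerated
        self._last_rx = None                  # last RX counter value seen
        self._last_change = None              # time of the last RX change

    def tick(self, rx_packets, now):
        """Return "are-you-there" when a probe should be sent, else None."""
        if rx_packets != self._last_rx:
            # Traffic is arriving: the tunnel is considered healthy, do nothing.
            self._last_rx = rx_packets
            self._last_change = now
            return None
        if now - self._last_change >= self.idle_threshold:
            return "are-you-there"
        return None

    @staticmethod
    def on_message(msg):
        """Reply to a probe with an ack; other traffic needs no reply."""
        return "ack" if msg == "are-you-there" else None
```

If the ack arrives, it is counted in the prober's RX like any other packet, so the next tick would see the counter move and stop probing.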
> > > >>> >> >>>
> > > >>> >> >>> This is definitely better. I messed up copypastes badly but the idea
> > > >>> >> >>> was basically what you explain: the "send packet to another peer"
> > > >>> >> >>> step would be conditioned on several cycles without increment on the
> > > >>> >> >>> "no-data-increment" counter.
> > > >>> >> >>
> > > >>> >> >>
> > > >>> >> >>
> > > >>> >> >> But I think that for this to work you need the 'ack' reply, would it be
> > > >>> >> >> included? Otherwise a host may be receiving traffic (non-idle) but not
> > > >>> >> >> sending, and would never send any 'are-you-there' packets to the other
> > > >>> >> >> side because its RX is increasing.
> > > >>> >> >
> > > >>> >> > But note that A is only monitoring *incoming* connectivity (B->A).
> > > >>> >> > This is because once the packet leaves A, its agent can tell that
> > > >>> >> > something is broken in the line, but not in what direction (is the PING
> > > >>> >> > lost because A->B is cut, or the ACK lost because B->A is cut?). We need
> > > >>> >> > to report the health of each direction.
> > > >>> >> >
> > > >>> >> > So, A doesn't care about A->B. It only asserts that data is arriving
> > > >>> >> > from B. With this in mind, once A's agent sends the "are-you-there"
> > > >>> >> > message it doesn't really need to pay attention to the ACK.
> > > >>> >> >
> > > >>> >> > From the other side, B will do the same in reverse. If the
> > > >>> >> > "are-you-there" never arrives because A->B is broken, B will notice
> > > >>> >> > the static rx count and start a health check of the A->B direction.
> > > >>> >> >
> > > >>> >> > Does that make sense?
> > > >>> >> > /g
> > > >>> >
> > > >>> >
> > > >>
> > > >>
> > > >
>  
