[MidoNet-dev] Feature Proposal: Tunnel Health checks

Dan Mihai Dumitriu dan at midokura.com
Thu Feb 28 00:51:34 UTC 2013


I'd love to have RTT on all the tunnels - I really think that would be
awesome.  Is it not feasible for some reason? :)
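
For a sense of scale, here is a rough sketch of the per-round message counts discussed further down the thread: the O(N^2) cost of probing every tunnel versus the O(M * N) bound when only live-but-mute peers are probed. The function names and the sample numbers are hypothetical, chosen only for illustration.

```python
def full_mesh_probes(n_hosts: int) -> int:
    """Directed probes per round if every host probes every other host."""
    return n_hosts * (n_hosts - 1)


def idle_only_probes(n_hosts: int, n_mute: int) -> int:
    """Upper bound per round if hosts only probe the live-but-mute peers."""
    return n_hosts * n_mute


if __name__ == "__main__":
    print(full_mesh_probes(100))     # 9900
    print(idle_only_probes(100, 5))  # 500
```

With 100 hosts, per-tunnel RTT needs on the order of ten thousand probes per round, so the cost depends mostly on how often a round runs.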


On Thu, Feb 28, 2013 at 8:02 AM, de Candia, Giuseppe <gdecandia at midokura.com> wrote:

> Just to make sure that we didn't miss something - was Dan asking for
> tunnel RTT? Providing that would require constantly sending hello messages
> between every pair of hosts even when they're successfully forwarding
> packets to each other and tracking the delay of the replies.
>
> Dan, can you clarify?
>
> -- Pino
> On Feb 21, 2013 11:23 AM, "Navarro, Galo" <galo at midokura.com> wrote:
>
>> Hi Dan, thanks for the feedback.
>>
>> In practice performance isn't really going to be that bad if we
>> consider that nodes will only try to send "are-you-there" messages and
>> report "mute peers" if the peer is alive.
>>
>> The worst case O(n^2) scenario would require that all nodes are alive
>> yet isolated from every other node, but somehow having connectivity to
>> Zookeeper (!) so all seem alive to each other but mute and they keep
>> pinging each other with O(n^2) messages.
>>
>> In practice this is very unlikely. The "real" worst case should be
>> bound by O(M * N), where M is the number of *live* cut-off peers, N
>> the total number of peers. All peers will just send messages to the M
>> live-but-mute peers (and also report M-sized lists of mute
>> peers to the monitoring storage).
>>
>> Of course M will vary depending on the topology, but will generally be
>> a fraction of N (e.g.: one or more subnets with outbound traffic
>> blocked).
>>
>> Let me know if I'm messing up the numbers!
>>
>> Cheers!
>> /g
>>
>> On 21 February 2013 08:26, Dan Mihai Dumitriu <dan at midokura.com> wrote:
>> > I think I prefer having metrics for each tunnel RTT.  Yes, it's O(N^2)
>> > messages and state, which is not great.  However, MN could really know
>> > and show the state of the underlay network to the operators, on a
>> > pairwise
>> > basis.  If there is path diversity by each host being multihomed, MN can
>> > even make a decision about which tunnel to use, based on the tunnel
>> > 'health'.
>> >
>> >
>> > On Thu, Feb 21, 2013 at 2:25 AM, de Palol, Marc <marc at midokura.jp> wrote:
>> >>
>> >> comments inline
>> >>
>> >>> Hi Marc, thanks for the feedback.
>> >>>
>> >>> If I understand your suggestion correctly, the problem I see is that
>> >>> the ephemeral node based solution will signal problems on the
>> >>> hosts, but not on the wire. Two peers A, B might be alive, have
>> >>> connectivity with ZK, the tunnels be technically alive and working
>> >>> A->B, but failing B->A. In this situation, ephemeral nodes wouldn't be
>> >>> deleted. Does that make sense?
>> >>>
>> >>
>> >> I was not talking about having the host itself in zk; that way we
>> >> would have the problem you mention. I was thinking of having a
>> >> connection to zk (as an ephemeral node) tied to the tunnel itself.
>> >> But after reading your e-mail I think that the "hey, there is no
>> >> incoming data on our tunnels" solution you are proposing is far easier
>> >> to understand + implement. This zk solution was a bit of an overkill.
>> >>
>> >>>
>> >>> On a side note, and more directed to @Adam:
>> >>>
>> >>> Pino and I were discussing yesterday if we really need that much
>> >>> granularity in the diagnostic.
>> >>>
>> >>> Our current proposal gives value under the assumption that each host
>> >>> depends on specific network conditions and therefore it's important to
>> >>> report exactly what host, what tunnel, and what direction seems to be
>> >>> dead.
>> >>>
>> >>> But in practise, a bunch of hosts will depend on the same network
>> >>> conditions (e.g.: all are in the same subnet). And therefore, if the
>> >>> subnet is unreachable from outside all tunnels will be dead, so
>> >>> reporting the problem for each tunnel is simply redundant.
>> >>>
>> >>> With this in mind, it may be enough to implement a much simpler
>> >>> proposal whereby every MM agent would simply report if it's receiving
>> >>> inbound data on tunnel ports. When they don't, having all hosts on the
>> >>> subnet saying "hey, there is no incoming data on our tunnels" is
>> >>> probably good enough to know where to investigate. For example, would
>> >>> this have been enough in the Netflix PoC?
>> >>>
>> >>> What do you think?
>> >>>
>> >>> Thanks!
>> >>> /g
>> >>>
>> >>> On 19 February 2013 23:12, de Palol, Marc <marc at midokura.jp> wrote:
>> >>> > Hi all,
>> >>> >
>> >>> > I agree that the results will need to be stored in Cassandra, for
>> >>> > the reasons Galo gave: the metrics are there and the GUI already
>> >>> > knows how to get them.
>> >>> >
>> >>> > About this 'are you there' problem. I wonder if we could use
>> >>> > zookeeper's ephemeral nodes. These nodes exist in zookeeper as long
>> >>> > as the session that created them still exists. We could create a
>> >>> > znode for every tunnel, tied to the session. If a tunnel disappears
>> >>> > or stops working the ephemeral node disappears (don't know how, we
>> >>> > should see the details here). There could be some watchers set in
>> >>> > place to notify whoever is responsible for recreating the tunnel.
>> >>> >
>> >>> >
>> >>> > On Tue, Feb 19, 2013 at 5:27 PM, Navarro, Galo <galo at midokura.jp> wrote:
>> >>> >>
>> >>> >> Just to clarify after talking w. Guillermo:
>> >>> >>
>> >>> >> Even though we don't need to actively listen for an ACK (for
>> >>> >> the reasons explained before), we do need to implement a mechanism
>> >>> >> to receive and reply to "are-you-there" messages received on one
>> >>> >> side of the tunnel.
>> >>> >>
>> >>> >> /g
>> >>> >>
>> >>> >> On 19 February 2013 16:50, Navarro, Galo <galo at midokura.jp> wrote:
>> >>> >> > On 19 February 2013 16:31, Guillermo Ontañón <guillermo at midokura.jp> wrote:
>> >>> >> >> On Tue, Feb 19, 2013 at 4:21 PM, Navarro, Galo <galo at midokura.jp> wrote:
>> >>> >> >>>
>> >>> >> >>> Hi Guillermo, thanks for the quick feedback! Some comments below
>> >>> >> >>>
>> >>> >> >>> >> - TunnelPorts become active on each side of the tunnel, the
>> >>> >> >>> >>   TunnelDoc becomes aware of local ports and starts taking
>> >>> >> >>> >>   care of them.
>> >>> >> >>> >> - Regularly, for each cared-for tunnel the TunnelDoc:
>> >>> >> >>> >>     - Sends a packet to the other peer
>> >>> >> >>> >>     - Logs variation on RX value of the PortStats on the
>> >>> >> >>> >>       tunnel's local port.
>> >>> >> >>> >>     - If variation = 0, increment a "no-increment" counter
>> >>> >> >>> >>     - If "no-increment" counter > threshold, trigger alert
>> >>> >> >>> >>       message for lack of connectivity on the REVERSE
>> >>> >> >>> >>       direction of the tunnel (e.g.: if the TunnelDoc at A
>> >>> >> >>> >>       spots no RX, the alert refers to loss of connectivity
>> >>> >> >>> >>       from B to A).
>> >>> >> >>> >>     - Implement whatever corrective measures upon receiving
>> >>> >> >>> >>       the alert (typically, the DatapathController could
>> >>> >> >>> >>       recreate the tunnel)
>> >>> >> >>>
>> >>> >> >>> > This is not a lot of extra traffic, but the number of tunnels
>> >>> >> >>> > does grow quadratically with the number of MM agents. I
>> >>> >> >>> > propose a slight variation on the above to avoid sending
>> >>> >> >>> > traffic on non-idle tunnels, along the lines of what is done
>> >>> >> >>> > by IPsec's dead peer detection:
>> >>> >> >>> >
>> >>> >> >>> > http://www.ietf.org/rfc/rfc3706.txt
>> >>> >> >>> >
>> >>> >> >>> > Basically, from the point of view of one of the nodes, it
>> >>> >> >>> > looks like this:
>> >>> >> >>> >
>> >>> >> >>> >    * Monitor idleness (by looking at RX as you outline above)
>> >>> >> >>> >      and do nothing and consider the tunnel healthy while
>> >>> >> >>> >      idleness doesn't go above a certain threshold.
>> >>> >> >>> >    * When the tunnel becomes idle, send an "are-you-there"
>> >>> >> >>> >      packet to the peer (we could just use the tunnel-key for
>> >>> >> >>> >      this).
>> >>> >> >>> >    * When an "are-you-there" packet is received, reply to it
>> >>> >> >>> >      with an Ack.
>> >>> >> >>>
>> >>> >> >>> This is definitely better. I messed up copypastes badly but the
>> >>> >> >>> idea was basically what you explain, the "send packet to another
>> >>> >> >>> peer" would be conditioned to several cycles without increment
>> >>> >> >>> on the "no-data-increment" counter.
>> >>> >> >>
>> >>> >> >>
>> >>> >> >>
>> >>> >> >> But I think that for this to work you need the 'ack' reply,
>> >>> >> >> would it be included? Otherwise a host may be receiving traffic
>> >>> >> >> (non-idle) but not sending, and would never send any
>> >>> >> >> 'are-you-there' packets to the other side because its RX is
>> >>> >> >> increasing.
>> >>> >> >
>> >>> >> > But note that A is only monitoring *incoming* connectivity
>> >>> >> > (B->A). This is because once the packet leaves A its agent can
>> >>> >> > tell that something is broken in the line, but not in what
>> >>> >> > direction (is PING lost bc. A->B is cut, or ACK lost because B->A
>> >>> >> > is cut?). We need to report health of each direction.
>> >>> >> >
>> >>> >> > So, A doesn't care about A->B. It only asserts that data is
>> >>> >> > arriving from B. With this in mind, once A's agent sends the
>> >>> >> > "are-you-there" message it doesn't really need to pay attention
>> >>> >> > to the ACK.
>> >>> >> >
>> >>> >> > From the other side, B will do the same in reverse. If the
>> >>> >> > "are-you-there" never arrives because A->B is broken, B will
>> >>> >> > notice the static rx count and start a health check of the A->B
>> >>> >> > direction.
>> >>> >> >
>> >>> >> > Does that make sense?
>> >>> >> > /g
>> >>> >
>> >>> >
>> >>
>> >>
>> >
>>
>
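
The idle-triggered check described at the bottom of the thread (watch the RX counter, probe only when the tunnel goes idle, alert on the reverse direction when RX stays flat) could be sketched roughly as below. This is a minimal illustration, not MidoNet code: the class name `TunnelMonitor`, the two thresholds, and the returned action strings are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TunnelMonitor:
    """Tracks inbound (peer -> local) health of one tunnel from its RX counter."""
    idle_threshold: int = 3   # hypothetical: flat cycles tolerated before probing
    alert_threshold: int = 6  # hypothetical: flat cycles tolerated before alerting
    last_rx: int = 0
    idle_cycles: int = 0

    def tick(self, rx_packets: int) -> str:
        """Process one polling cycle and return 'healthy', 'probe' or 'alert'."""
        if rx_packets != self.last_rx:
            # The counter moved (a decrease is treated as a counter reset),
            # so data is arriving from the peer.
            self.last_rx = rx_packets
            self.idle_cycles = 0
            return "healthy"
        self.idle_cycles += 1
        if self.idle_cycles > self.alert_threshold:
            # Even the peer's replies to our probes are not arriving:
            # report loss of connectivity in the peer -> local direction.
            return "alert"
        if self.idle_cycles > self.idle_threshold:
            # Tunnel is idle: send an "are-you-there" so a live peer replies.
            return "probe"
        return "healthy"


if __name__ == "__main__":
    monitor = TunnelMonitor(idle_threshold=2, alert_threshold=4)
    for rx in (10, 10, 10, 10, 10, 10, 11):
        print(rx, monitor.tick(rx))
```

The monitor never inspects the Ack itself: a reply from the peer simply shows up as RX growth on the next cycle, which matches the point that each side only asserts its own incoming direction.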