
Even fast LACP needs three seconds and that's on the same collision domain.

How does BGP actually detect that a link is down? The keepalive default is 30s, but that can be changed. If you set it to, say, one second, is that wise? Once a link is down, that fact will propagate at the speed of BGP and other routing protocols. Recovery will need a similar propagation.

Depending on where the link is, a second can be a "lifetime" these days or not. It really depends on the environment what an appropriate heartbeat interval might be.

Also, given that BGP is TCP based, it might have to interact with other lower level link detection protocols.




BFD or Ethernet-OAM is the standard here.

It can get a bit hardware dependent, but getting <50ms failovers from software-based BFD in BIRD or FRR is fairly easy, and I've tested down to <1ms before with hardware-based BFD echo. ~50ms is the point at which a user making a traditional VoIP call won't notice the path switch.

You can get NICs for computers (like most Nvidia/Mellanox or higher-end Broadcom/Intel NICs) that do hardware BFD, and it's obviously included in higher-end networking kit.

You then link the BGP routes to the health of the BFD session for which that path is the next hop, and you get super quick withdrawals.
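
To make the "link routes to BFD health" idea concrete, here is a toy Python sketch. It is not how BIRD or FRR implement it; `Route`, `usable_routes` and the `bfd_up` map are hypothetical names invented for illustration, and the addresses are documentation ranges. The assumption is that a route is dropped as soon as the BFD session toward its next hop is reported down, and that next hops without any BFD session are left alone.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    prefix: str    # destination prefix, e.g. "198.51.100.0/24"
    next_hop: str  # address of the neighbor the route points at


def usable_routes(routes, bfd_up):
    """Return the routes whose next hop is not known to be down.

    bfd_up maps a next-hop address to True (session up) or False (session down).
    Next hops with no entry have no BFD session and are kept (an assumption).
    """
    return [r for r in routes if bfd_up.get(r.next_hop, True)]


if __name__ == "__main__":
    table = [
        Route("198.51.100.0/24", "192.0.2.1"),
        Route("203.0.113.0/24", "192.0.2.2"),
    ]
    # BFD reports the session to 192.0.2.1 as down; its route is withdrawn.
    for route in usable_routes(table, {"192.0.2.1": False, "192.0.2.2": True}):
        print(route.prefix, "via", route.next_hop)
```

The point of the design is that the routing decision reacts to the BFD state change directly instead of waiting for the BGP hold timer to expire.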


I.e. BIRD detects interface failure, but this only affects your side of the decision making. For bidirectional failure detection you run BFD with BGP. BFD default timers are 3 times 30 ms, iirc.
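
For the timer arithmetic: in BFD asynchronous mode the detection time is roughly the detect multiplier times the negotiated packet interval. A minimal Python sketch, taking the "3 times 30 ms" figure above at face value (it is quoted from memory, so check your implementation's actual defaults); the function names are made up for illustration.

```python
def detection_time_ms(interval_ms, detect_multiplier=3):
    """Time without received BFD packets before the session is declared down."""
    if interval_ms <= 0 or detect_multiplier < 1:
        raise ValueError("interval must be positive and multiplier at least 1")
    return interval_ms * detect_multiplier


def max_interval_for_budget_ms(budget_ms, detect_multiplier=3):
    """Largest packet interval that still detects a failure within budget_ms."""
    if budget_ms <= 0 or detect_multiplier < 1:
        raise ValueError("budget must be positive and multiplier at least 1")
    return budget_ms / detect_multiplier


if __name__ == "__main__":
    # 3 x 30 ms, the figure quoted in this thread
    print(detection_time_ms(30), "ms to detect")
    # interval needed to stay inside the ~50 ms VoIP budget mentioned above
    print(max_interval_for_budget_ms(50), "ms max interval")
```

So with a multiplier of 3, a 30 ms interval detects a failure in about 90 ms, and staying under ~50 ms needs an interval below roughly 17 ms. This covers detection only; route withdrawal and reconvergence come on top.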


