From the article:<p>> <i>With the current “default free zone” containing around 1,000,000 routes</i><p>Back in ~1998 I was tasked with building a route collector/looking glass machine for an internet exchange point (sadly defunct). I remember the day we switched the collector on and acquired "all the routes". There were ~98,000 of them; you could've knocked me over with a feather. It was like looking into the Total Perspective Vortex. Having been out of that game for many years now, I'd no idea we were up to 1M routes...wow. At one of the RIPE conferences I attended back then, there was much concern about the rapidly increasing size of the global routing table and whether vendors could build hardware powerful enough to keep up.<p>For anyone interested, the route collector was built on FreeBSD (3.0 I think) and Zebra[0].<p>And finally, what a cracking blog, especially stuff like this:<p><a href="https://blog.benjojo.co.uk/post/eve-online-bgp-internet" rel="nofollow">https://blog.benjojo.co.uk/post/eve-online-bgp-internet</a><p>[0]: <a href="https://en.wikipedia.org/wiki/GNU_Zebra" rel="nofollow">https://en.wikipedia.org/wiki/GNU_Zebra</a>
This reminds me of when YouTube was down for a lot of the world after Pakistan banned YouTube and one of the country's telecom companies forgot to switch off their BGP route (if that is the correct terminology).[0] Half as Interesting made a nice YouTube video about it.[1]<p>[0] <a href="https://www.cnet.com/news/how-pakistan-knocked-youtube-offline-and-how-to-make-sure-it-never-happens-again/" rel="nofollow">https://www.cnet.com/news/how-pakistan-knocked-youtube-offli...</a><p>[1] <a href="https://www.youtube.com/watch?v=K9gnRs33NOk" rel="nofollow">https://www.youtube.com/watch?v=K9gnRs33NOk</a>
Thinking out loud: When I read the BGP spec, I got the feeling that it was optimized for reduced churn. The Internet routing table was growing and it was uncertain whether router CPU power would keep up, so the architects of the Internet wanted to avoid extra BGP exchanges.<p>However, now it seems like the Internet is facing new challenges and a different trade-off might make sense. Why not add a "valid until" attribute to each route? The originating router would have to re-announce each route every 24 hours. If the refresh failed to propagate at any point, the route would automatically be withdrawn once it expired. Of course, re-announcing 1M routes every day might be a lot, but at this point it feels worth considering.
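To make the "valid until" idea above concrete, here is a rough sketch of how a receiving router might treat it. This is purely illustrative: no such attribute exists in BGP today, and the names `Route`, `RouteTable`, `valid_until` and `REFRESH_INTERVAL` are all made up for the example. Lookups are exact-match on the prefix string, with no longest-prefix matching or path selection.

```python
from dataclasses import dataclass

# The 24-hour refresh period suggested above, in seconds (an assumption).
REFRESH_INTERVAL = 24 * 60 * 60


@dataclass
class Route:
    prefix: str
    next_hop: str
    valid_until: float  # hypothetical attribute, not part of BGP today


class RouteTable:
    """Toy table keyed by prefix string; every route carries an expiry."""

    def __init__(self):
        self.routes = {}

    def announce(self, prefix, next_hop, now, ttl=REFRESH_INTERVAL):
        # A re-announcement overwrites the entry and pushes the expiry out.
        self.routes[prefix] = Route(prefix, next_hop, now + ttl)

    def withdraw(self, prefix):
        # Explicit withdrawal still works as usual.
        self.routes.pop(prefix, None)

    def expire(self, now):
        # Drop every route whose lifetime has lapsed, i.e. one whose
        # refresh (or withdrawal) never reached us. Returns the prefixes.
        stale = [p for p, r in self.routes.items() if r.valid_until <= now]
        for p in stale:
            del self.routes[p]
        return stale


table = RouteTable()
table.announce("192.0.2.0/24", "198.51.100.1", now=0)
table.announce("203.0.113.0/24", "198.51.100.2", now=0)

# Only the first prefix is refreshed before the 24 hours are up.
table.announce("192.0.2.0/24", "198.51.100.1", now=80_000)

# Shortly after the 24-hour mark the unrefreshed route ages out by itself.
expired = table.expire(now=90_000)
```

The point is that a lost withdrawal can no longer leave a route stuck forever: the worst case is that it lingers until `valid_until`. The cost is the periodic re-announcement traffic mentioned above.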
I wonder if a robust consensus algorithm might be a better investment than a timeout. I would imagine there are other bugs in BGP implementations so having a routing table that's going to trend towards eventual consistency regardless of the starting point might be a more robust solution than just focusing on this one corner case. Might be a more intrusive change though & hard to get middleware to roll out such a change?
Nice article on the basic functionality of the Internet backbone. I really like the animation and the pictures that explain it. In short, BGP has a bug that may have caused a huge outage in August 2020. The proposed fix is to improve the BGP protocol with a new feature. That's not easy, because it's the backbone of the Internet. Let's see where this will go.
So I keep running into situations where I think this is the problem that's occurring (a stuck route). While I'd certainly love to be able to diagnose this, would it even matter? There's no recourse that I can take as an end user, is there?
... hm, how come withdraw (and announce) messages are not ACKed in-band? or maybe they are, but due to explicit demonic of certain routers (and/or ASes) they still don't take effect?