[This article was originally posted by Andrew Yager on LinkedIn]
The last week hasn’t been great for the global internet community from a routing security standpoint. From the news that Google’s traffic was accidentally routed through Russia (and Nigeria) to Telstra accidentally black-holing customer traffic yesterday morning, you could be forgiven for wondering how the internet functions at all.
In a conversation on Wednesday, one of my colleagues, who is not familiar with how global BGP exchanges work, commented that he just didn’t understand how Tier 1 service providers could allow traffic to run along unplanned or unexpected paths.
The answer is manifold. When BGP (the standard that service providers use to exchange information about who has which IP addresses, and the paths that traffic takes around the internet to reach them) was designed, it was envisaged to support a small number of service providers. While it could scale, the actual scale has gone well beyond the original design and planning. The recent introduction of measures such as RPKI security on routes and signed updates by the address registries has attempted to improve the situation, but the adoption of RPKI is still very low.
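To make the RPKI idea concrete, here is a minimal sketch of route origin validation in the style of RFC 6811. The `Roa` class, the `validate_origin` function and the sample data are my own illustrative names, not any router's API; the prefix and AS numbers come from the ranges reserved for documentation. A route is "valid" if a covering authorisation matches its origin AS and prefix length, "invalid" if it is covered but nothing matches, and "not-found" if nothing covers it at all.

```python
from dataclasses import dataclass
from ipaddress import ip_network


@dataclass(frozen=True)
class Roa:
    """One route origin authorisation (hypothetical, simplified)."""
    prefix: str      # address block the holder has authorised
    max_length: int  # longest prefix length that may be announced
    asn: int         # AS allowed to originate the block


def validate_origin(prefix, origin_asn, roas):
    """Return 'valid', 'invalid' or 'not-found' for one announcement."""
    announced = ip_network(prefix)
    covered = False
    for roa in roas:
        roa_net = ip_network(roa.prefix)
        # A ROA covers the route only if the announced prefix sits inside it.
        if announced.version != roa_net.version:
            continue
        if not announced.subnet_of(roa_net):
            continue
        covered = True
        if roa.asn == origin_asn and announced.prefixlen <= roa.max_length:
            return "valid"
    # Covered but never matched means someone else holds the address space.
    return "invalid" if covered else "not-found"


# Illustrative data only: documentation prefix and documentation ASN.
ROAS = [Roa("192.0.2.0/24", 24, 64496)]

print(validate_origin("192.0.2.0/24", 64496, ROAS))
print(validate_origin("192.0.2.0/25", 64496, ROAS))
print(validate_origin("198.51.100.0/24", 64496, ROAS))
```

With adoption as low as it is, most routes would land in the "not-found" bucket, which is exactly why origin validation on its own doesn't yet solve the problem.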
Ultimately, though, the issue boils down to the fact that we are lazy and too trusting, and at the same time we have no real way to be less trusting. When two networks connect, it is generally considered the responsibility of the larger network to “filter” any IP address information coming from its smaller counterpart. This prevents accidental route leaks and other incorrect routing information from spreading. But the smaller network generally “trusts” the larger network to provide it with a correct and accurate view of the internet. The smaller ISP has no specific way to verify whether that view of the internet is or isn’t right, because beyond what the larger network tells it there is no “source of truth” that it can rely on in real time.
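As a rough illustration of the filtering the larger network is expected to do, the sketch below accepts a customer announcement only if it falls inside address space that customer is known to hold. The function name `filter_customer_routes` and the allow-list are assumptions for the example; real networks typically build such lists from registry data and apply them in router policy rather than in Python.

```python
from ipaddress import ip_network


def filter_customer_routes(announced, allowed):
    """Split a customer's announcements into (accepted, rejected) lists."""
    allowed_nets = [ip_network(p) for p in allowed]
    accepted, rejected = [], []
    for prefix in announced:
        net = ip_network(prefix)
        # Accept only prefixes that sit inside the customer's known space.
        ok = any(
            net.version == a.version and net.subnet_of(a)
            for a in allowed_nets
        )
        (accepted if ok else rejected).append(prefix)
    return accepted, rejected


# Illustrative data: the customer holds 203.0.113.0/24 and leaks another route.
accepted, rejected = filter_customer_routes(
    ["203.0.113.0/24", "198.51.100.0/24"],
    ["203.0.113.0/24"],
)
print(accepted, rejected)
```

Note that nothing equivalent exists in the other direction: the smaller network has no allow-list it could apply to the full table it receives.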
When two equally sized networks connect, a range of strategies is used. But usually, they trust each other equally. Certainly in the case of the Google traffic incident, it appears that the two parties involved trusted each other equally.
So what can we do?
Well, to start, we need to stop trusting as much. We need networks to do a better job of preventing unexpected advertisements, and to agree on standards and methods for doing this between providers. We need better software managing our routing tables: software that looks for and understands route and path updates and, when they are suspicious, actively engages engineers to investigate.
Imagine a world where, when a more specific prefix announcement occurs, the AS path is verified and geographically mapped in real time. If you could see that the traffic to Google’s prefixes was now moving through a different geographic zone, through potentially untrusted countries, you could very quickly make an informed decision not to accept that route. You could even build a set of pre-determined rules to help with this.
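A toy version of such a rule set might look like the sketch below. Everything in it is hypothetical: the `KNOWN_ROUTES` table, the `AS_COUNTRY` mapping, the trusted-country list and the `review_announcement` function are invented for illustration, and mapping an AS to a single country is a big simplification, since many networks span several. It assumes a non-empty AS path with the origin AS last.

```python
from ipaddress import ip_network

# Hypothetical baseline: prefixes we already know, with their usual AS path.
KNOWN_ROUTES = {"203.0.113.0/24": [64500, 64496]}
# Hypothetical AS-to-country mapping ("XX" stands in for an untrusted region).
AS_COUNTRY = {64496: "US", 64500: "AU", 64510: "XX"}
TRUSTED_COUNTRIES = {"US", "AU"}


def review_announcement(prefix, as_path):
    """Return reasons to hold a route for an engineer; empty means accept."""
    reasons = []
    net = ip_network(prefix)
    for known, known_path in KNOWN_ROUTES.items():
        known_net = ip_network(known)
        if net.version != known_net.version or net == known_net:
            continue
        # A longer prefix inside a known one would pull traffic away from it.
        if net.subnet_of(known_net):
            reasons.append(f"more specific than {known}")
            if as_path[-1] != known_path[-1]:
                reasons.append(
                    f"origin AS{as_path[-1]} differs from AS{known_path[-1]}"
                )
    # Map every hop on the path to a region and flag the untrusted ones.
    for asn in as_path:
        country = AS_COUNTRY.get(asn, "unknown")
        if country not in TRUSTED_COUNTRIES:
            reasons.append(f"AS{asn} maps to untrusted region {country}")
    return reasons


for reason in review_announcement("203.0.113.0/25", [64500, 64510, 64496]):
    print(reason)
```

The point of returning reasons rather than a yes/no answer is that the software flags the route and an engineer makes the final call.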
Of course, this would require routing software and hardware manufacturers to greatly improve the intelligence their products have about routes and services. Given the performance and power challenges that already face most of our hardware routing platforms, I don’t see this as very likely.