With an average attack-to-mitigation time of less than 15 seconds, we’re really proud of our DDOS protection solution.
At the recent AusNOG conference, I gave a five-minute lightning talk outlining some of the tools and techniques we use to protect our network and our customers’ networks.
Our solution is made possible by our global network. Our distributed architecture allows us to quickly identify and mitigate attack traffic close to its source. Each time we add a new point of presence or edge router, we increase the number of points at which we can identify attack traffic, reduce our time to mitigation, and strengthen our overall solution.
Over the last 10 years we’ve worked with many different DDOS identification and protection solutions, including outsourcing protection to third parties and relying on expensive software products. The problem with each of these approaches was that they were inflexible, didn’t scale well, and didn’t meet the needs of our evolving customer and carrier networks.
We’ve opted for a solution that does not use “inline” appliances from large third-party vendors, but instead relies on “flow” data combined with well-established third-party tools (specifically FastNetMon and Elastiflow) to build our detection and response engine. We aggregate and visualise this data using standard tools such as Grafana, ClickHouse, MongoDB and OpenSearch to track what traffic flows through our network, and we have developed rules that help us identify and catch attack traffic.
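As a rough illustration of what flow-based detection involves, the Python sketch below turns sampled flow records into per-destination packet and bit rates. It is not how FastNetMon or Elastiflow work internally; the `FlowRecord` fields, the `per_destination_rates` function and the assumption of a fixed 1-in-N packet sampling rate are all hypothetical simplifications.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class FlowRecord:
    # Hypothetical minimal flow record; real NetFlow/IPFIX/sFlow exports carry many more fields.
    src_ip: str
    dst_ip: str
    protocol: int  # IP protocol number, e.g. 6 = TCP, 17 = UDP
    src_port: int
    dst_port: int
    packets: int
    octets: int


def per_destination_rates(records, sampling_rate, interval_s):
    """Estimate packets/s and bits/s per destination IP over one interval.

    Assumes 1-in-`sampling_rate` packet sampling, so observed counts are
    scaled back up before dividing by the interval length in seconds.
    """
    if sampling_rate <= 0 or interval_s <= 0:
        raise ValueError("sampling_rate and interval_s must be positive")
    totals = defaultdict(lambda: [0, 0])  # dst_ip -> [packets, octets]
    for r in records:
        totals[r.dst_ip][0] += r.packets
        totals[r.dst_ip][1] += r.octets
    return {
        dst: {
            "pps": pkts * sampling_rate / interval_s,
            "bps": octets * 8 * sampling_rate / interval_s,
        }
        for dst, (pkts, octets) in totals.items()
    }
```

For example, 30 sampled packets to one address over a 10-second window at 1-in-1000 sampling works out to an estimated 3,000 packets per second.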
This graph shows an example attack launched against one specific IP address on our network. You can see the attack traffic increasing on the right-hand side of the graph. When it reaches a predefined threshold, our mitigation kicks in.
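A minimal sketch of the threshold step, assuming per-destination packet rates are already computed from flow data for each interval. The `ThresholdDetector` class, its `release` method and the threshold value are hypothetical; the detector reports a destination only the first time its rate reaches the threshold, so mitigation is triggered once per attack rather than on every interval.

```python
class ThresholdDetector:
    """Report destinations whose packet rate reaches a fixed threshold.

    Hypothetical sketch: each destination is reported once when it first
    crosses the threshold and stays "under mitigation" until released.
    """

    def __init__(self, pps_threshold: float) -> None:
        self.pps_threshold = pps_threshold
        self.under_mitigation: set[str] = set()

    def observe(self, rates: dict[str, float]) -> list[str]:
        """Return destinations that newly reached the threshold this interval."""
        triggered = []
        for dst, pps in rates.items():
            if pps >= self.pps_threshold and dst not in self.under_mitigation:
                self.under_mitigation.add(dst)
                triggered.append(dst)
        return triggered

    def release(self, dst: str) -> None:
        """Clear a destination, e.g. after an operator removes the block."""
        self.under_mitigation.discard(dst)
```

Keeping state per destination avoids re-announcing the same mitigation every interval while an attack continues.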
Our DDOS solution’s algorithms build a set of firewall rules that are deployed at our network edge. The process finds the most specific set of conditions that matches the attack traffic, then crafts a rule that causes this traffic to be rate limited or dropped.
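One way to picture the search for the “most specific set of conditions” is a greedy narrowing over sampled attack traffic: starting from the victim’s /32, add a header field to the match only while the narrower rule still covers most of the sampled packets. This Python sketch is a hypothetical heuristic for illustration, not necessarily the algorithm our system uses; the field names, the `craft_rule` function and the 90% coverage figure are assumptions.

```python
from collections import Counter

# Hypothetical header fields recorded for each sampled attack flow.
CANDIDATE_FIELDS = ("protocol", "src_port", "dst_port", "packet_length")


def craft_rule(victim_ip, samples, min_coverage=0.9):
    """Greedily build the most specific match that still covers most attack packets.

    `samples` is a list of dicts holding the CANDIDATE_FIELDS plus a "packets"
    count. A field is added only if, after narrowing on its most common value,
    the rule still covers at least `min_coverage` of all sampled packets.
    """
    total = sum(s["packets"] for s in samples)
    if total == 0:
        raise ValueError("no attack samples to build a rule from")
    match = {"destination": f"{victim_ip}/32"}  # IPv4 host prefix assumed
    covered = samples
    for field in CANDIDATE_FIELDS:
        counts = Counter()
        for s in covered:
            counts[s[field]] += s["packets"]
        value, weight = counts.most_common(1)[0]
        if weight / total >= min_coverage:
            match[field] = value
            # Later fields are evaluated only against traffic this rule still matches.
            covered = [s for s in covered if s[field] == value]
    return match
```

Fields whose values vary across the attack (for example a spread of destination ports) are left out, so the rule keeps matching the whole attack while still avoiding a blanket block on the victim IP.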
We use BGP FlowSpec to encode and distribute these rules through our network, with filters in place to help prevent “bad things” happening as a result of erroneously applied rules.
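The filters mentioned here can be thought of as sanity checks applied before a rule is announced. The sketch below is hypothetical: the rule dictionary format, the `PROTECTED_PREFIXES` list and the specific checks are assumptions. It does reflect one detail of the FlowSpec specification (RFC 8955): the traffic-rate action is expressed in bytes per second, and a rate of 0 means matching traffic is discarded.

```python
import ipaddress

# Hypothetical guard rail: prefixes (e.g. our own infrastructure) that must never be filtered.
PROTECTED_PREFIXES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("2001:db8:ffff::/48"),
]


def validate_flowspec_rule(rule):
    """Return a list of problems; an empty list means the rule may be announced.

    `rule` is a hypothetical dict such as
    {"destination": "203.0.113.10/32", "protocol": 17, "rate_bytes_per_s": 0}.
    """
    try:
        dst = ipaddress.ip_network(rule["destination"], strict=True)
    except (KeyError, ValueError):
        return ["missing or invalid destination prefix"]

    problems = []
    # Only ever filter traffic to a single host, never a whole customer prefix.
    if dst.prefixlen != dst.max_prefixlen:
        problems.append(f"destination {dst} is broader than a single host")
    # Never filter traffic towards protected infrastructure.
    if any(p.version == dst.version and dst.overlaps(p) for p in PROTECTED_PREFIXES):
        problems.append(f"destination {dst} overlaps protected infrastructure")
    # Require at least a protocol match so a rule never drops every packet to the host.
    if "protocol" not in rule:
        problems.append("no protocol match; rule would filter all traffic to the host")
    # FlowSpec traffic-rate is in bytes per second; 0 means discard.
    rate = rule.get("rate_bytes_per_s")
    if not isinstance(rate, (int, float)) or rate < 0:
        problems.append("rate must be a non-negative number of bytes per second")
    return problems
```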
Of course, a lot of work has gone into defining, establishing and tuning the rules that our mitigations are built upon. We continually analyse attack traffic and evolve these rules accordingly. We’re currently doing research with the GateKeeper project into implementing additional protections against more sophisticated attacks.
The last key piece of our puzzle is our monitoring.
When an attack occurs, we generate a series of alerts in a number of places. The most useful of these are our Slack notifications (Slack is our internal messaging tool). These messages show us what is happening in real time and give us the opportunity to take additional administrative actions (e.g. removing the block or implementing further measures). The source code for this integration is available on GitHub.
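For anyone building something similar, a minimal alert sender might post to a Slack incoming webhook, which accepts an HTTP POST with a JSON body containing a `text` field. This Python sketch uses only the standard library; it is not the integration published on GitHub, and `WEBHOOK_URL`, `format_alert` and the message wording are hypothetical placeholders.

```python
import json
import urllib.request

# Hypothetical placeholder; Slack issues a real incoming-webhook URL per channel.
WEBHOOK_URL = "https://hooks.slack.com/services/XXX/YYY/ZZZ"


def format_alert(target, pps, rule):
    """Render a short, human-readable attack alert."""
    conditions = ", ".join(f"{k}={v}" for k, v in rule.items())
    return (f"DDOS attack detected against {target} ({pps:,.0f} pps). "
            f"Mitigation applied: {conditions}")


def build_request(text, url=WEBHOOK_URL):
    """Build (but do not send) a POST request for a Slack incoming webhook."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )


def send_alert(text):
    """Send the alert and return the HTTP status code."""
    with urllib.request.urlopen(build_request(text), timeout=5) as resp:
        return resp.status
```

Separating message formatting and request construction from the network call makes the alert content easy to test without posting to a real channel.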
We also make extensive use of the amazing Elastiflow, which provides real-time intelligence and reporting on network traffic, along with a range of security descriptors and other information used by our threat intelligence, security operations, and IT teams. This data can also be exposed to customers, allowing them to analyse their own traffic, network and security risks, and make informed decisions about their networks.
Together, these technologies allow us to create a truly flexible, market-leading DDOS protection solution with industry-leading protection times.