Imagine if you were trying to fly from New York to San Francisco, but your plane was routed through an airport in Asia. And a bunch of other planes were sent that way too, so your flight was backed up and your journey took much longer than expected. That’s basically what happened to some of our users today for about an hour, starting at 7:48 am Pacific time.
An error in one of our systems caused us to direct some of our web traffic through Asia, which created a traffic jam. As a result, about 14% of our users experienced slow services or even interruptions. We’ve been working hard to make our services ultrafast and “always on,” so it’s especially embarrassing when a glitch like this one happens. We’re very sorry that it happened, and you can be sure that we’ll be working even harder to make sure that a similar problem won’t happen again. All planes are back on schedule now.
Your web sites go down. Circuits fail. Network engineers goof router configs. And mostly you don’t worry about making the nightly news. But if you happen to be Google and your content constitutes up to 5% of all Internet traffic, people notice. Network engineers around the world email traceroutes to each other. IRC channels fill with speculation (“definitely was a DDoS attack”, “no, a worm”, “it was ISP xxx’s fault!”). And end users Twitter (a lot).
So what does it look like when 5% of the Internet disappears on an otherwise uneventful Thursday morning? The graph below shows average traffic from Google’s network (ASN 15169) across 10 tier 1/2 ISPs in North America. The outage began at roughly 10:15 am and lasted through 12:15 pm EDT.