The System is Down…The System Is Down…

Facebook, Instagram and WhatsApp are all down right now. Botched upgrade? Misconfigured router? Expired signing certificate? Who knows? I’m just going to assume it’s a problem with their latest SuppressAllPostsQuestioningTheHolyVaccineMarrative.yml file. But it’s a reminder of how deeply interconnected all online systems are these days, and how many different things can go wrong at different layers.

Expect a sudden burst of productivity from American companies.

And just in case you didn’t get the reference:

Edited to add: Additional detail:

Facebook—and apparently all the major services Facebook owns—are down today. We first noticed the problem at about 11:30 am Eastern time, when some Facebook links stopped working. Investigating a bit further showed major DNS failures at Facebook…

DNS—short for Domain Name System—is the service which translates human-readable hostnames (like arstechnica.com) to raw, numeric IP addresses (like 18.221.249.245). Without working DNS, your computer doesn’t know how to get to the servers that host the website you’re looking for.
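To make the hostname-to-address step concrete, here is a minimal Python sketch of a lookup that reports failure instead of crashing. The function name `resolve` and its injectable `lookup` parameter are illustrative assumptions, not anything from the quoted article; by default it uses the standard library's `socket.getaddrinfo`.

```python
import socket


def resolve(hostname, lookup=socket.getaddrinfo):
    """Return the sorted, de-duplicated IP addresses for hostname.

    Returns an empty list when the name cannot be resolved, which is
    what a client would have seen during a DNS failure.
    `lookup` is injectable so the function can be exercised offline.
    """
    try:
        results = lookup(hostname, None)
    except socket.gaierror:
        # The resolver could not translate the name to any address.
        return []
    # Each result is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the numeric IP address as a string.
    return sorted({entry[4][0] for entry in results})
```

Calling `resolve("arstechnica.com")` on a machine with working DNS should return one or more addresses; an empty list means the name did not resolve.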

The problem goes deeper than Facebook’s obvious DNS failures, though. Facebook-owned Instagram was also down, and its DNS services—which are hosted on Amazon rather than being internal to Facebook’s own network—were functional. Instagram and WhatsApp were reachable but showed HTTP 503 (no server is available for the request) failures instead, an indication that while DNS worked and the services’ load balancers were reachable, the application servers that should be feeding the load balancers were not.
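The layered reasoning in that paragraph (did the name resolve, did anything answer, what status came back) can be sketched as a small decision function. This is only an illustration of the logic; the name `classify_outage` and its return strings are made up for this example.

```python
from typing import Optional


def classify_outage(dns_resolved: bool, http_status: Optional[int]) -> str:
    """Guess which layer is failing from two simple observations.

    dns_resolved: whether the hostname translated to an IP address.
    http_status:  the HTTP status code received, or None if no
                  HTTP response came back at all.
    """
    if not dns_resolved:
        return "dns: hostname did not resolve"
    if http_status is None:
        return "network: name resolved but nothing answered"
    if http_status == 503:
        # The front end (e.g. a load balancer) answered, but it had
        # no application server to hand the request to.
        return "backend: front end reachable, no application server available"
    if 500 <= http_status < 600:
        return "server: other 5xx error"
    return f"responding: HTTP {http_status}"
```

Under these assumptions, Facebook itself would fall in the first branch, while Instagram and WhatsApp (working DNS, HTTP 503) would fall in the third.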

A bit later, Cloudflare VP Dane Knecht reported that all BGP routes for Facebook had been pulled. (BGP—short for Border Gateway Protocol—is the system by which one network figures out the best route to a different network.)

With no BGP routes into Facebook’s network, Facebook’s own DNS servers would be unreachable—as would the missing application servers for Facebook-owned Instagram, WhatsApp, and Oculus VR.
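A toy model shows why withdrawn routes make everything behind them unreachable. This is a deliberately simplified sketch: real BGP chooses among routes using many attributes, whereas this table only does a longest-prefix match. The class name, the peer labels, and the prefixes (taken from the reserved documentation range 192.0.2.0/24, not Facebook's real address space) are all assumptions for illustration.

```python
import ipaddress


class RouteTable:
    """A toy routing table: announced prefixes mapped to a next hop."""

    def __init__(self):
        self._routes = {}  # ip_network -> next-hop label

    def announce(self, prefix, next_hop):
        """Record that `prefix` is reachable via `next_hop`."""
        self._routes[ipaddress.ip_network(prefix)] = next_hop

    def withdraw(self, prefix):
        """Remove a previously announced prefix, if present."""
        self._routes.pop(ipaddress.ip_network(prefix), None)

    def lookup(self, address):
        """Return the next hop for `address`, or None if no route covers it."""
        addr = ipaddress.ip_address(address)
        matches = [net for net in self._routes if addr in net]
        if not matches:
            return None  # no route: the destination is unreachable
        # Prefer the most specific (longest) matching prefix.
        best = max(matches, key=lambda net: net.prefixlen)
        return self._routes[best]
```

Once every prefix covering a DNS server's address has been withdrawn, `lookup` returns `None` for it, which is the situation the paragraph above describes: the servers may be running, but nobody outside knows how to reach them.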

Speculation is that Facebook engineers have locked themselves out of their own network, meaning someone with physical access to the servers will have to fix things…

Edited to add 2: Krebs offers more details:

Facebook and its sister properties Instagram and WhatsApp are suffering from ongoing, global outages. We don’t yet know why this happened, but the how is clear: Earlier this morning, something inside Facebook caused the company to revoke key digital records that tell computers and other Internet-enabled devices how to find these destinations online.

Doug Madory is director of internet analysis at Kentik, a San Francisco-based network monitoring company. Madory said at approximately 11:39 a.m. ET today (15:39 UTC), someone at Facebook caused an update to be made to the company’s Border Gateway Protocol (BGP) records. BGP is a mechanism by which Internet service providers of the world share information about which providers are responsible for routing Internet traffic to which specific groups of Internet addresses.

In simpler terms, sometime this morning Facebook took away the map telling the world’s computers how to find its various online properties. As a result, when one types Facebook.com into a web browser, the browser has no idea where to find Facebook.com, and so returns an error page.

In addition to stranding billions of users, the Facebook outage also has stranded its employees from communicating with one another using their internal Facebook tools. That’s because Facebook’s email and tools are all managed in house and via the same domains that are now stranded.

“Not only are Facebook’s services and apps down for the public, its internal tools and communications platforms, including Workplace, are out as well,” New York Times tech reporter Ryan Mac tweeted. “No one can do any work. Several people I’ve talked to said this is the equivalent of a ‘snow day’ at the company.”

Developing…

Edited to add 3: Seeing reports that Gmail is down for some people. It’s not down for me. I just tested and it’s working fine.

Edited to add 4: Facebook appears to be back up, but is way wonky…

3 Responses to “The System is Down…The System Is Down…”

  1. Howard says:

    One of your article quotes is repeated. Love the strong bad video.

    [Fixed. – LP]

  2. Howard says:

    Doug Madory! I remember that name! Used to write for Renesys. Their blog articles were fantastic.

  3. Just A Bot says:

    I’m buyin’ you a pizza.
