You’ve probably noticed that two days ago Facebook, Instagram, and WhatsApp went down. If you mostly use Facebook to procrastinate, it may have been just mildly annoying. But for many others, this was a serious issue. In this post, I want to explain how it happened, why it may have happened, and why it matters.
Let’s start with how the entire Facebook infrastructure went down. The internet is made of a bunch of connected computers (it’s a “net”). When you type “facebook.com” into your browser, your computer needs to find the computer owned by Facebook. To do so, it uses a sort of map. Once it has the right path to that computer, the two start communicating: Facebook sends you content, and you send it back likes and whatnot. The same concept applies to anything that uses the internet, from apps like Instagram to Wi-Fi dishwashers.
Two days ago, Facebook basically took away the map telling the world’s computers how to find its various online properties. When billions of people opened the Facebook app, their phones and laptops didn’t know where to find the right computer to talk to. If you want to dig deeper into the technicalities, I recommend this tweet.
You’ve probably noticed that WhatsApp and Instagram were down too. Well, that’s because FB owns them and runs them on the same computers (you know…“scalability”).
Technically, you could just put that “computer map” back up in minutes. A few things made this case a bit tougher, though.
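If it helps to see the “map” idea in code, here is a toy model in Python. It’s a sketch under heavy simplification, not how the internet actually works: in reality the map has several layers (DNS turns names like “facebook.com” into addresses, and BGP routes tell networks how to reach those addresses), and this collapses them into one dictionary. The class and function names are made up for illustration, and the addresses come from a range reserved for documentation, not Facebook’s real ones.

```python
# Toy "internet map": hostnames -> addresses. Purely illustrative.
# Real lookups involve DNS (names -> addresses) and BGP (routes to addresses).
# 192.0.2.x is a documentation-only range, not Facebook's real addresses.


class ToyInternetMap:
    def __init__(self):
        self.entries = {}

    def announce(self, hostname, address):
        """Publish a hostname on the map so others can find it."""
        self.entries[hostname] = address

    def withdraw(self, hostname):
        """Remove a hostname from the map (e.g. after a bad config push)."""
        self.entries.pop(hostname, None)

    def lookup(self, hostname):
        """Return the address for a hostname, or None if nobody knows the way."""
        return self.entries.get(hostname)


def open_app(internet_map, hostname):
    """What a phone 'sees' when it tries to reach a hostname."""
    address = internet_map.lookup(hostname)
    if address is None:
        return f"Can't reach {hostname}: no path on the map"
    return f"Connected to {hostname} at {address}"


world = ToyInternetMap()
for name, addr in [("facebook.com", "192.0.2.10"),
                   ("instagram.com", "192.0.2.11"),
                   ("whatsapp.com", "192.0.2.12")]:
    world.announce(name, addr)

print(open_app(world, "facebook.com"))  # Connected to facebook.com at 192.0.2.10

# The outage, in toy form: all of Facebook's entries vanish at once,
# because the same infrastructure publishes all of them.
for name in ["facebook.com", "instagram.com", "whatsapp.com"]:
    world.withdraw(name)

print(open_app(world, "facebook.com"))  # Can't reach facebook.com: no path on the map
```

Nothing about the servers themselves changes in this toy; they are just no longer findable, which is roughly what people saw from the outside.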
Do you know what also runs on the same computers? Facebook itself. I mean, the company stuff. Internal messaging, emails, company badges: they all need the same computers to work.
This meant that when an FB employee wanted to enter an FB building, an FB smart lock read their FB badge and tried to contact FB computers to check whether the badge was authorized. The smart lock couldn’t find FB computers, so the doors stayed locked.
FB employees were basically locked out of their own buildings. They couldn’t even email each other, or…well…use Facebook or WhatsApp.
In short, it was a big mess. It was even harder for the millions (or billions) of people who rely on Facebook products to communicate or run their businesses. In many parts of the world, Facebook is the internet.
It’s worth trying to understand why this happened.
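The lock behavior above is what engineers call “fail-closed”: if the lock can’t verify a badge, it refuses to open. Here is a minimal Python sketch of that policy. It assumes the lock must ask a central server before opening, and every name in it (`AuthServerUnreachable`, `check_badge_with_server`, `door_should_open`) is hypothetical. I don’t know how FB’s locks are actually built.

```python
class AuthServerUnreachable(Exception):
    """Raised when the lock can't find the company's badge server."""


def check_badge_with_server(badge_id, server_reachable, authorized_badges):
    # Hypothetical stand-in for a network request to the badge server.
    if not server_reachable:
        raise AuthServerUnreachable("no route to the badge server")
    return badge_id in authorized_badges


def door_should_open(badge_id, server_reachable, authorized_badges):
    """Fail-closed policy: if the badge can't be verified, keep the door locked."""
    try:
        return check_badge_with_server(badge_id, server_reachable, authorized_badges)
    except AuthServerUnreachable:
        return False


staff = {"badge-123"}
print(door_should_open("badge-123", server_reachable=True, authorized_badges=staff))   # True
print(door_should_open("badge-123", server_reachable=False, authorized_badges=staff))  # False
```

Fail-closed is a sensible security default (a broken server shouldn’t let strangers in), but the flip side is exactly what happened here: when the server disappears, legitimate employees get locked out too.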
The simplest explanation is human error. Basically, some engineer at Facebook fucked up. Now, Facebook is a company worth ~3 times the GDP of Denmark, whose motto for the last 7 years has been “move fast with stable infrastructure”. The company keeps half of the world’s population connected (literally), and it disappears from the internet because “someone fucked up”?
It wouldn’t be the first time a large company made a terrible mistake and took part of the internet down with it. Fastly is a company that delivers content for many large websites (a “content delivery network”). On June 8, it had a similar outage that brought down parts of Amazon, PayPal, Reddit, etc. In that case, though, the problem was caused by a software bug: a glitch in their complex infrastructure. In Facebook’s case, it looks like someone just uploaded the wrong configuration.
How likely is it that a company like Facebook makes a mistake like this? A critical component of the infrastructure must have redundancies to prevent exactly this kind of failure. But it happened, so multiple checks and systems (and people) must have failed at once.
Possible. But very unlikely.
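To put rough numbers on that “very unlikely”, here is a small Python calculation. The 1-in-100 failure rate per safeguard is a made-up figure, and the math assumes each check fails independently of the others.

```python
from fractions import Fraction


def chance_all_fail(per_check_failure, num_checks):
    """Probability that every safeguard fails, assuming independent failures."""
    return per_check_failure ** num_checks


# Hypothetical: each safeguard lets 1 bad change in 100 slip through.
p = Fraction(1, 100)
for n in (1, 2, 3):
    odds = chance_all_fail(p, n)
    print(f"{n} independent check(s): 1 in {odds.denominator:,}")
```

With three layers, the odds drop to 1 in 1,000,000. The catch is the independence assumption: if the safeguards share a weakness, they can fail together, and the real odds are worse than this arithmetic suggests.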
Let’s assume the unlikely didn’t happen. If the wrong configuration wasn’t uploaded by mistake, the only other option is that it was uploaded on purpose. Why, though?
I’m not a conspiracy theorist, but hey, this whole mess did happen just hours after a former employee appeared on “60 Minutes” to talk about all the crap happening in the company. She had released thousands of internal documents weeks before, leading to a series of articles that talk shit about pretty much everything happening in the company. She is testifying before Congress as I write this.
So, again, I’m not a conspiracy theorist. But if I were, I could have lots of fun saying stuff like:
- Facebook took everything down to clean up some internal documents before being audited
- Facebook wanted to show the world how much damage its absence would cause, as a warning against regulation
- Someone maliciously tried to take the company down to show regulators that FB needs to be regulated
I could go on and on, but these are all just weird ideas I’m puking onto my keyboard. The point I want to make is that this is probably far more complex than “someone fucked up and I couldn’t see cat memes”.
We’ll probably never know exactly what happened, but I’d argue that we don’t need to. What we need to focus on is that this is further proof of how deeply society is intertwined with Facebook products, and that (at the moment) we’re all vulnerable to whatever they decide to do.
Especially in the parts of the world where Facebook is the internet, we should think of it more as a utility than a product. For some utilities, governments step in to manage them and make sure they’re available to everyone. Do you think a government could do the same with Facebook?
Unfortunately, I don’t think so. There are just a handful of organizations with the resources, infrastructure, and people to keep such a complex system running. None of them are governments.
So, unfortunately, for now we all need to accept that we depend on a company with questionable ethics. One thing I don’t think is questionable, though: Facebook is too embedded in society to keep being run the way it’s run today. Something (or someone) needs to change.