Whether we’re talking to friends, doing assignments, watching television or keeping up with the news, we’re online almost 24/7 nowadays, and it’s easy to forget how much we rely on the internet.
But we get a sharp shock when a website we depend on goes down, not least when we realise how many of our lifelines are connected to that one server.
The problem is made worse by service providers owning an ever-larger share of the internet, so that if one company suffers a problem, it can have a major knock-on effect on the internet as a whole.
Over the past year, there have been a number of website outages that have left us scrambling. Here are seven of the worst…
October 2021: Facebook Outage
Being off Facebook for a few hours, especially if you connect with your friends through Messenger, can be a bit of a pain. But given that Facebook owns WhatsApp and Instagram, an outage at the social network can be catastrophic for communication.
That’s exactly what happened on October 4, when a “faulty configuration change” cut off the roughly 3.5 billion people who use Facebook’s apps and products for nearly six hours.
Anybody who used Facebook to log in to third-party apps was also locked out of those platforms. Twitter reportedly struggled to cope with the traffic as millions flocked to it in place of their usual ways of communicating.
This was even worse for Facebook because its staff relied on the same network to access its systems remotely. Facebook Workplace, the internal communications platform its employees use, was also down, and the access cards staff use to enter its offices were reportedly dependent on those internal systems working.
Facebook uses the Border Gateway Protocol (BGP) to tell the rest of the internet where its servers are, but the faulty configuration change meant it stopped advertising the routes to its data centres. To routers elsewhere, it looked as if those data centres no longer existed.
In a statement, Facebook said: “We also have no evidence that user data was compromised as a result of this downtime.”
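To make the BGP explanation above more concrete, here is a toy Python model of a single router’s view of announced routes. It is a drastic simplification, assuming one router, no path selection and no DNS layer. The class name `ToyRoutingTable` is hypothetical, and the prefix `203.0.113.0/24` is a reserved documentation range, not one of Facebook’s. Once a prefix is withdrawn, lookups for addresses inside it find no route, which is roughly what routers experienced when Facebook’s announcements disappeared.

```python
import ipaddress
from typing import Dict, Optional


class ToyRoutingTable:
    """Hypothetical, heavily simplified model of one router's BGP-learned routes.

    Real BGP has sessions, path attributes and best-path selection; this sketch
    only records which prefixes are currently announced and by which network.
    """

    def __init__(self) -> None:
        self.routes: Dict[ipaddress.IPv4Network, str] = {}

    def announce(self, prefix: str, origin: str) -> None:
        # An announcement tells the router "send traffic for this prefix to origin".
        self.routes[ipaddress.IPv4Network(prefix)] = origin

    def withdraw(self, prefix: str) -> None:
        # A withdrawal removes the route, so the router forgets the prefix exists.
        self.routes.pop(ipaddress.IPv4Network(prefix), None)

    def lookup(self, address: str) -> Optional[str]:
        # Longest-prefix match: the most specific announced prefix wins.
        addr = ipaddress.IPv4Address(address)
        matches = [net for net in self.routes if addr in net]
        if not matches:
            return None  # no route, so traffic for this address has nowhere to go
        best = max(matches, key=lambda net: net.prefixlen)
        return self.routes[best]


if __name__ == "__main__":
    table = ToyRoutingTable()
    table.announce("203.0.113.0/24", "example-social-network")
    print(table.lookup("203.0.113.53"))  # example-social-network
    table.withdraw("203.0.113.0/24")
    print(table.lookup("203.0.113.53"))  # None
```

The longest-prefix rule is why a more specific announcement normally takes precedence over a broader one; the sketch ignores everything else a real router weighs when choosing a route.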
June 2021: Fastly Outage
We sometimes joke that a celebrity can “break the internet”, but it turns out all you actually need is a failure at a content delivery network.
On June 8, cloud service provider Fastly suffered an outage that took down some of the world’s biggest news websites, including CNN and the New York Times, as well as Amazon, Target, Twitch, Spotify, Pinterest, Hulu, Reddit and the U.K. government website.
Fastly speeds up websites by serving their content from its own network of servers, which also shields them from traffic overloads. It blamed the outage on a bad software update, saying it had identified a service configuration that triggered disruptions across its servers.
The outage lasted approximately an hour.
November 2020: Amazon Web Services
On November 25, 2020, Amazon Web Services, one of the world’s most widely used cloud computing services, suffered an outage that wreaked havoc online.
News sites like The Washington Post, software companies like Adobe and workplace tools like Trello were knocked out by the outage, which lasted several hours.
People also struggled to use Amazon-owned products such as Ring and Alexa, along with iRobot’s connected devices, which also depend on AWS.
AWS later explained that the outage was caused by its attempts to add new servers to its network.
AWS’s huge server network in Northern Virginia began to fail after the company started making “a relatively small addition of capacity” to the system just before 6 a.m. Eastern time. The new capacity, together with an “operating system configuration”, set off a series of errors that brought the network of servers down.
December 2020: Google Services
On December 14, 2020, many realised how much they depend on Google services when the company suffered a worldwide outage that lasted about 45 minutes.
While Google’s search engine was unaffected, the outage took down Gmail, Google Calendar and YouTube, so people at work and people just chilling out were equally affected. It also locked out anyone who used a Google sign-in to log in to third-party apps.
Perhaps even more bizarrely, some people were left sitting in the dark and the cold because their Google Home apps and Nest services stopped working.
The issue was caused by a failure in the company’s authentication tools, which manage how users log in to services run by both Google and third-party developers.
A Google spokesperson blamed “an internal storage quota issue.”
Essentially, Google’s internal tools failed to allocate enough storage space to the services that handle authentication. The system should have made more storage available automatically, but it didn’t, and the system crashed.
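The failure mode described above, a storage quota that should grow automatically but doesn’t, can be sketched in a few lines of Python. This is not Google’s code: the `StorageQuota` class, its `auto_expand` flag and the fixed growth step are hypothetical, chosen only to show how a write that exceeds its allocation fails once automatic expansion stops working.

```python
class QuotaExceededError(Exception):
    """Raised when a write would exceed the allocated storage quota."""


class StorageQuota:
    """Hypothetical storage allocation that can grow itself in fixed steps."""

    def __init__(self, limit_bytes: int, expand_step: int, auto_expand: bool = True) -> None:
        if expand_step <= 0:
            raise ValueError("expand_step must be positive")
        self.limit = limit_bytes
        self.used = 0
        self.expand_step = expand_step
        self.auto_expand = auto_expand

    def write(self, n_bytes: int) -> None:
        needed = self.used + n_bytes
        # Healthy path: grow the quota step by step until the write fits.
        while self.auto_expand and needed > self.limit:
            self.limit += self.expand_step
        # Broken path: with auto-expansion off, an oversized write is rejected.
        if needed > self.limit:
            raise QuotaExceededError(f"need {needed} bytes, quota is {self.limit}")
        self.used = needed


if __name__ == "__main__":
    healthy = StorageQuota(limit_bytes=100, expand_step=50)
    healthy.write(120)
    print(healthy.limit)  # 150

    broken = StorageQuota(limit_bytes=100, expand_step=50, auto_expand=False)
    try:
        broken.write(120)
    except QuotaExceededError as err:
        print("write failed:", err)  # write failed: need 120 bytes, quota is 100
```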
August 2020: CenturyLink/Level 3
Okay, so it was not quite within the last year, but it would be remiss not to include it.
On August 30, 2020, a control plane failure knocked major global internet service provider (ISP) CenturyLink/Level 3 out of action for five hours.
As a backbone ISP, CenturyLink/Level 3 carries traffic for other networks and peers with providers and enterprises including Cloudflare, Google, the PlayStation Network, Xbox Live, Discord, Hulu and OpenTable. When it failed, the result was a huge outage felt by millions of internet users across the United States and Europe.
Cloudflare said its outage was caused by a “third-party transit provider incident,” and CenturyLink later explained that an IP outage involving firewall and Border Gateway Protocol (BGP) routing had affected content delivery networks (CDNs).
In an in-depth analysis sent to customers, CenturyLink traced the problem to an improperly configured Flowspec rule (a BGP mechanism for distributing traffic-filtering rules between routers), created as part of an unsuccessful effort to block unwanted traffic on behalf of a customer.
The outage reportedly led to a 3.5 percent drop in global internet traffic, making it one of the biggest internet outages ever recorded.
January 2021: Slack Outage
In another year, a work messaging system going down might not have been such a big deal. But in 2021, with millions of people working from home, it was a disaster.
On January 4, messaging service Slack suffered a major outage in the U.S., the U.K., Japan, India and Germany on the first day back to work and school for millions.
The outage began at about 10 a.m. ET, and service was still sporadic two and a half hours later. It gradually improved from there, and most issues were resolved by about 3 p.m.
Slack blamed the problem on the AWS Transit Gateway, which didn’t scale fast enough to handle the spike in demand for Slack’s services as millions returned to work and school after the holidays.
A statement shared with customers read: “Around 6:00 a.m. PST we began to experience packet loss between servers caused by a routing problem between network boundaries on the network of our cloud provider.”
July 2021: Bug at Akamai
On July 22, sites like Amazon, UPS, Airbnb, the PlayStation Network, Steam and FedEx went down, all because of an outage in Akamai’s Edge DNS (domain name system) service.
People trying to reach these sites, and others including American Express, Delta Air Lines and Home Depot, were met with a DNS error message, because their browsers could not look up the sites’ addresses.
Akamai said the outage, which lasted about an hour, was caused by a bug triggered by a software update.
A statement read: “Upon rolling back the software configuration update, the services resumed normal operations. Akamai can confirm this was not a cyberattack against Akamai’s platform.”
The company said it would be reviewing its update processes in light of the issue.