[FREE ACCESS] Intelligence Explainer: Why Half the Internet Went Down Yesterday
Ujasusi Blog Originals | Intelligence Explainers
What Happened? The Basics
On 20 October 2025, a major AWS outage caused widespread internet disruptions, taking down dozens of popular apps, games, and websites including Snapchat, Roblox, Fortnite, Duolingo, and Ring. Users began reporting problems at approximately 8 AM UK time (midnight PT), with Downdetector showing massive, simultaneous spikes in outage reports for services relying on Amazon's cloud infrastructure.
Downdetector received 6.5 million outage reports globally, with more than 1 million from the United States in the first two hours. This represented one of the largest internet disruptions in recent history.
Scale and Impact: How Many Services Were Affected?
The scope of disruption was extensive. Major platforms confirmed to be experiencing issues included:
Social Media & Communication: Snapchat, Reddit, and Signal
Gaming & Entertainment: Roblox, Fortnite, Epic Games Store, Clash Royale, Clash of Clans, Rocket League, Dead by Daylight, VRChat, Tom Clancy's Rainbow Six Siege, and PlayStation Network
Productivity & Education: Canva, Duolingo, and Canvas by Instructure
Amazon's Own Services: Amazon.com, Amazon Alexa, Ring, and Amazon Prime Video
Finance & Streaming: Venmo, Robinhood, Chime, Coinbase, and Crunchyroll
AI firm Perplexity and US airlines Delta and United also reported issues. In the United Kingdom, banking customers of Lloyds, Bank of Scotland, and Halifax reported login difficulties.
The Technical Root Cause: What Broke?
Primary Failure Point: DynamoDB and DNS Issues
Amazon confirmed it was investigating "increased error rates" and delays affecting multiple services, specifically citing Amazon DynamoDB and Amazon Elastic Compute Cloud (EC2). These core services provide the database and computing power that thousands of companies rent to run their own applications, which explains the cascading, widespread impact.
More specifically, DNS failures prevented applications from finding the correct address for the DynamoDB API endpoint. DynamoDB is a cloud database that many applications rely on to store user information and other critical data.
How DNS Works (Simple Explanation)
DNS operates like a phonebook for the internet. When you type a website address into your browser, DNS converts that human-readable name into the correct numerical IP address, allowing your device to locate the correct server. If DNS fails, applications cannot find the services they need, even if those services are technically functioning.
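To make the phonebook analogy concrete, here is a toy model in Python. It is not a real DNS resolver: the hostname follows AWS's naming for the regional DynamoDB endpoint, while the IP address (taken from the 192.0.2.0/24 range reserved for documentation) and the `resolve`/`call_service` helpers are hypothetical. It illustrates the point above: once the name has no entry, the client fails even though the server behind it is still running.

```python
# Toy model of DNS as a "phonebook". NOT a real resolver.
# The IP address is a made-up documentation address.

class ResolutionError(Exception):
    """Raised when the phonebook has no entry for a name."""

# Name -> address records (the "phonebook").
phonebook = {"dynamodb.us-east-1.amazonaws.com": "192.0.2.10"}

# Servers that are actually running, keyed by address.
running_servers = {"192.0.2.10": "DynamoDB API (healthy)"}

def resolve(name: str) -> str:
    """Look up a name, the way a client asks DNS for an IP address."""
    try:
        return phonebook[name]
    except KeyError:
        raise ResolutionError(f"no address record for {name}") from None

def call_service(name: str) -> str:
    ip = resolve(name)          # step 1: find the address
    return running_servers[ip]  # step 2: talk to the server at that address

print(call_service("dynamodb.us-east-1.amazonaws.com"))  # DynamoDB API (healthy)

# Simulate the failure: the record disappears, but the server keeps running.
del phonebook["dynamodb.us-east-1.amazonaws.com"]
try:
    call_service("dynamodb.us-east-1.amazonaws.com")
except ResolutionError as err:
    print("client failed:", err)
print("server still up:", "192.0.2.10" in running_servers)  # server still up: True
```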
Secondary Failure: EC2 and Health Monitoring Systems
AWS identified that the root cause originated from an underlying subsystem that monitors the health of its network load balancers, which are used to distribute traffic across several servers. The issue originated from within the "EC2 internal network," where EC2 refers to Amazon's Elastic Compute Cloud service, which provides on-demand cloud capacity within AWS for businesses to run virtual servers to develop, launch and host applications.
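Why a monitoring fault can break working servers can be sketched generically: a load balancer only sends traffic to targets that its health checks report as healthy, so a monitor that misreports can take healthy capacity out of service. The Python below is an illustrative model under that assumption, not AWS's internal design; the server names and the `faulty_check` bug are hypothetical.

```python
import random
from dataclasses import dataclass

@dataclass
class Server:
    name: str
    actually_healthy: bool = True  # ground truth, which the monitor may get wrong

def healthy_targets(servers, health_check):
    """Return only the servers the health monitor says can take traffic."""
    return [s for s in servers if health_check(s)]

def route(servers, health_check, rng=random):
    """Pick a target the way a simple load balancer might."""
    targets = healthy_targets(servers, health_check)
    if not targets:
        raise RuntimeError("no healthy targets: request fails")
    return rng.choice(targets).name

fleet = [Server("server-a"), Server("server-b"), Server("server-c")]

def correct_check(server):
    return server.actually_healthy

def faulty_check(server):
    return False  # hypothetical bug: every server is reported unhealthy

print(route(fleet, correct_check))  # one of server-a, server-b, server-c
try:
    route(fleet, faulty_check)
except RuntimeError as err:
    print(err)  # no healthy targets: request fails
```

Every server in `fleet` is still healthy throughout; only the monitor's verdict changes, and that alone is enough to make requests fail.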
Where Did It Happen? The Geographic Concentration Problem
The issue appeared to stem from problems at Amazon's massive data centre facilities in Northern Virginia, a critical hub for the global internet.
More precisely, US-EAST-1, AWS's Northern Virginia region, contributed to a major internet meltdown for the second time in four years. Steps to resolve EC2-related issues produced early signs of recovery at a few data centres, and the company took similar measures at the remaining locations.
Why Did So Many Services Break at Once? The Dependency Chain Explained
The outage affected such a vast array of seemingly unrelated services because they all rely on AWS as their underlying infrastructure provider. This represents what is known as a single point of failure.
The simultaneous failure of these unrelated services pointed directly to a foundational infrastructure problem, with the AWS status page confirming it as the root cause.
Think of it like this: if all the roads in a country depended on a single bridge, and that bridge collapsed, the entire country's transportation system would fail simultaneously. Similarly, when AWS's core services (DynamoDB and EC2) became unavailable, every application relying on them failed simultaneously.
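The dependency chain can be modelled as a graph: when a foundational service fails, every service that depends on it directly or transitively fails too. The sketch below uses an invented dependency map (the app names are hypothetical and do not describe the real architecture of any service named above) and a breadth-first search over the reversed edges.

```python
from collections import deque

# Hypothetical dependency map: each service depends on the services listed.
depends_on = {
    "chat-app": ["auth-service"],
    "game-app": ["matchmaking"],
    "auth-service": ["DynamoDB"],
    "matchmaking": ["EC2", "DynamoDB"],
    "smart-doorbell": ["EC2"],
    "DynamoDB": ["DNS"],
    "EC2": [],
    "DNS": [],
}

def affected_by(failed: str) -> set:
    """Return every service that directly or transitively depends on `failed`."""
    # Reverse the edges: for each service, record who depends on it.
    dependents = {name: [] for name in depends_on}
    for service, deps in depends_on.items():
        for dep in deps:
            dependents[dep].append(service)
    # Breadth-first search outward from the failed component.
    seen = set()
    queue = deque([failed])
    while queue:
        current = queue.popleft()
        for parent in dependents.get(current, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen

print(sorted(affected_by("DNS")))
# ['DynamoDB', 'auth-service', 'chat-app', 'game-app', 'matchmaking']
```

In this invented map, `smart-doorbell` survives a DNS failure only because it does not depend on DynamoDB; failing `EC2` instead would take it down along with `matchmaking` and `game-app`.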
Timeline: How Long Was the Outage?
AWS issued a statement at 1:26 AM PT confirming "significant error rates for requests." Just after 2 AM PT, the company said it believed it had identified the root cause, adding "We are working on multiple parallel paths to accelerate recovery."
Amazon confirmed that the root cause, a DNS-related issue in its US-EAST-1 region, was mostly resolved by 6:35 AM ET, with most AWS operations having "returned to normal," though some customers might continue to experience throttled requests or minor delays.
After more than nine hours of disruption, many applications were gradually coming back online in the US afternoon, but AWS acknowledged that elevated errors were still affecting several services.
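The timestamps in this section mix UK time, PT and ET. A small Python check with fixed UTC offsets for 20 October 2025 (BST is UTC+1, EDT is UTC-4 and PDT is UTC-7; daylight saving time applied in all three places on that date) shows how they line up, assuming, as stated earlier, that the first reports came in at midnight PT.

```python
from datetime import datetime, timedelta, timezone

# Fixed UTC offsets valid on 20 October 2025.
PT = timezone(timedelta(hours=-7), "PDT")
ET = timezone(timedelta(hours=-4), "EDT")
UK = timezone(timedelta(hours=1), "BST")

start = datetime(2025, 10, 20, 0, 0, tzinfo=PT)           # first reports: midnight PT
dns_mitigated = datetime(2025, 10, 20, 6, 35, tzinfo=ET)  # "mostly resolved" update

print(start.astimezone(ET).strftime("%H:%M %Z"))  # 03:00 EDT
print(start.astimezone(UK).strftime("%H:%M %Z"))  # 08:00 BST
print(dns_mitigated - start)                      # 3:35:00

# Nine hours after the first reports, in US Eastern time.
print((start + timedelta(hours=9)).astimezone(ET).strftime("%H:%M %Z"))  # 12:00 EDT
```

On these numbers, the DNS issue was mostly resolved about three and a half hours after the first reports, while nine hours in it was already around noon ET, consistent with applications recovering in the US afternoon.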
Services That Remained Problematic After Initial Recovery
Recovery was not uniform across all platforms. Whilst some apps like Reddit and Roblox had largely stabilised according to Downdetector, others including Snapchat and Duolingo were showing a resurgence in issues seen earlier in the day.
Lambda, one of AWS's computing services, was experiencing errors due to issues with an internal subsystem, with AWS stating "We are taking steps to recover this internal Lambda system."
Additionally, customers were complaining of lingering difficulties using services such as digital wallet Venmo and video calling site Zoom.
Why Was This Different from Previous Outages?
It was the largest internet disruption since last year's CrowdStrike malfunction hobbled technology systems in hospitals, banks and airports, highlighting the vulnerability of the world's interconnected technologies.
The CrowdStrike incident in 2024 demonstrated similar systemic fragility, where a faulty software update affected millions of computers globally. The AWS outage demonstrates that even without malicious action, infrastructure failures can have comparable scale to cyberattacks.
Key Lessons: What Does This Tell Us?
1. Cloud Infrastructure Concentration
AWS powers over 30 percent of the global cloud market and has more than 4 million customers. This concentration means that a single outage at one provider can cause global-scale disruption.
2. Lack of Redundancy
Software developers need to build better fault tolerance into their code. AWS provides tools developers can use to protect themselves if something goes wrong in one part of its sprawling network of data centres, and developers can also keep backups with other cloud providers. Companies that cut costs and corners to get an application running, and then never went back to protect it against an outage, are the ones that really ought to be scrutinised afterwards.
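One common way to build in this kind of fault tolerance is to retry failed calls with exponential backoff and then fail over to an independent backup. The sketch below is generic Python, not an AWS SDK API: `call_with_retries`, `call_with_failover`, `primary_region` and `backup_region` are hypothetical names, and it assumes failures surface as `ConnectionError`.

```python
import random
import time
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")

def call_with_retries(
    fn: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying on ConnectionError with exponential backoff and jitter."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts - 1):
        try:
            return fn()
        except ConnectionError:
            # Base delay doubles each attempt (0.1 s, 0.2 s, ...) and is scaled
            # by a random factor in [0.5, 1.0] so clients don't retry in lockstep.
            sleep(base_delay * (2 ** attempt) * random.uniform(0.5, 1.0))
    return fn()  # final attempt: any error propagates to the caller

def call_with_failover(backends: Sequence[Callable[[], T]], **retry_kwargs) -> T:
    """Try each backend in order (e.g. another region or provider) until one works."""
    last_error: Optional[Exception] = None
    for backend in backends:
        try:
            return call_with_retries(backend, **retry_kwargs)
        except ConnectionError as err:
            last_error = err  # this backend is down; move on to the next one
    raise ConnectionError("all backends failed") from last_error

# Hypothetical backends standing in for a primary and a backup deployment.
def primary_region() -> str:
    raise ConnectionError("primary region unreachable")  # simulated outage

def backup_region() -> str:
    return "served from backup"

# sleep is stubbed out so the example runs instantly.
print(call_with_failover([primary_region, backup_region], sleep=lambda s: None))
# served from backup
```

Retries alone only smooth over brief blips; during a multi-hour regional outage like this one, it is the failover to a separate region or provider that keeps the application available.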
3. The Need for Decentralisation
The internet was originally designed to be decentralised and resilient, yet today so much of the online ecosystem is concentrated in a small number of cloud regions, such that when one of those regions experiences a fault, the impact is immediate and widespread.
Current Status (As of 21 October 2025)
Amazon said the issue affecting AWS is now "fully mitigated."
Canva, one of the many services powered by AWS, was hit hard during the outage but has since returned online for most users. The company reported that its teams had restored functionality to the majority of accounts by noon and monitored for any lingering issues. Some users may still notice occasional slow loading times when opening designs or uploading media, but the platform's core features are up and running again.