On June 8, early risers on the east coast of the United States woke up to a broken internet. A wide range of websites including Amazon, Reddit, and the New York Times were unavailable. A Drudge Report headline, for those who could access it, blared, “GLOBAL WEB OUTAGE — MEDIA, GOVT WEBSITES HIT.” The brief but sweeping website outage was attributable to a single bug affecting services run by Fastly, a cloud service provider. Alongside competitors like Cloudflare and Akamai, Fastly offers customers a sort of internet fast lane—reducing website loading times by storing a copy of in-demand content on a highly optimized network of servers distributed throughout the world.
A handful of these “edge computing providers” support many of the world’s most visited websites, and they have, for the most part, made using the internet snappier and less frustrating. With the help of edge computing, a user, whether in Seoul or Manhattan, can watch New York Times videos seamlessly. But these advantages come with distinct risks. When the accessibility of major sites depends on the performance of a single provider, that provider becomes a single point of failure for a broad swath of the internet. A difficult trade-off emerges, whereby websites utilizing edge computing tend to work better the vast majority of the time—but when they fail, they fail en masse. A comparatively stronger basket now holds all the eggs.
In a recent study, we and our co-authors, Samantha Bates, Shane Greenstein, Jordi Weinstock, and Yunhan Xu, examine structural risks associated with the emergence of a handful of dominant cloud providers. The study offers one window into trends in “internet entropy,” a measure of how distributed or concentrated the hosting of major online destinations has become. The internet was designed from its very beginnings to be radically decentralized and, therefore, robust to the failure of individual components. If a router goes down, packets can typically find an alternate route to their destination. If Bank of America’s online checking deposit service crashes, it doesn’t take Capital One’s down with it. But as more and more websites, each acting reasonably, entrust their hosting and networking to a handful of cloud service providers, that design paradigm is eroding.
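One generic way to put a number on this kind of dispersion is Shannon entropy computed over providers' shares of hosted sites. The sketch below is an illustration only: the function name and the sample shares are hypothetical, and it is not necessarily the measure the study itself uses.

```python
import math
from typing import Sequence


def shannon_entropy_bits(shares: Sequence[float]) -> float:
    """Shannon entropy, in bits, of a set of shares that sum to 1."""
    if not math.isclose(sum(shares), 1.0):
        raise ValueError("shares must sum to 1")
    # Zero shares contribute nothing and are skipped to avoid log2(0).
    return sum(-p * math.log2(p) for p in shares if p > 0)


# Hypothetical shares for illustration; these are not data from the study.
EVENLY_SPREAD = [0.25, 0.25, 0.25, 0.25]  # four equally sized providers
CONCENTRATED = [0.85, 0.05, 0.05, 0.05]   # one dominant provider

if __name__ == "__main__":
    print("evenly spread:", shannon_entropy_bits(EVENLY_SPREAD))
    print("concentrated: ", shannon_entropy_bits(CONCENTRATED))
```

Under these assumptions, four equal providers give 2 bits of entropy, and any shift toward a single dominant provider pushes the value down toward zero.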
To track the declining entropy of the internet, the study scrutinizes how an index of the world’s 1,000 highest-traffic websites implemented one critical networking technology, the Domain Name System (DNS) through which users find them, between November 2011 and July 2018. The DNS enables web browsers to translate between human-friendly domain names (like www.google.com) and the machine-friendly IP addresses (like 126.96.36.199) that enable them to locate and access websites’ servers. Each website designates authoritative name servers that map its widely advertised domain name, which is how people look for it, to the numeric IP addresses needed to reach it. Websites can delegate this task, if they choose, to a cloud service provider. Like Fastly’s content delivery functions, DNS is a mission-critical service: a website without functioning DNS servers is unreachable for most users.
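To make the name-to-address translation concrete, here is a minimal sketch that asks the operating system's resolver for a host's IPv4 addresses using Python's standard `socket.getaddrinfo`. The helper name `lookup_ipv4` and the example hostname are our own choices for illustration. An empty result stands in for the "unreachable" case described above.

```python
import socket
from typing import List


def lookup_ipv4(hostname: str) -> List[str]:
    """Return the IPv4 addresses the system resolver reports for hostname.

    Returns an empty list if the name cannot be resolved, which is what a
    user effectively experiences when a site's DNS is not functioning.
    """
    try:
        results = socket.getaddrinfo(
            hostname, None, family=socket.AF_INET, type=socket.SOCK_STREAM
        )
    except socket.gaierror:
        return []
    # Each result is (family, type, proto, canonname, sockaddr);
    # for IPv4, sockaddr is an (address, port) pair.
    return sorted({info[4][0] for info in results})


if __name__ == "__main__":
    # Requires network access; the addresses returned vary by time and place.
    print(lookup_ipv4("www.example.com"))
```

The lookup here goes through whatever recursive resolver the machine is configured to use; it does not show which provider hosts the site's authoritative servers.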
Our data, illustrated in Figure 1, reflects a massive, ongoing shift away from self-hosting of DNS, and toward reliance on external cloud service providers. While only about 32.9 percent of websites relied exclusively on a cloud service provider to manage their DNS in 2011, nearly 66 percent were doing so by 2018.
Figure 1. Percentage of domains relying on self-hosted DNS, externally hosted DNS, or both over time.
This shift toward external DNS hosting solutions coincides with the emergence of a few dominant cloud service providers that handle a growing fraction of the market for DNS services. The lines in Figure 2 chart what economists call “concentration ratios” (the combined market share captured by the one, four and eight biggest providers, respectively) across the timespan we studied. In November 2011, the eight biggest external DNS providers collectively hosted DNS for just over 24 percent of the 1,000 highest-traffic websites. By July 2018, this proportion had risen to about 61.6 percent.
Figure 2. Percentage of overall market share (CR) held by the top one, four and eight providers over time.
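The arithmetic behind a concentration ratio is simple enough to sketch. The example below assumes a table of how many indexed sites each provider serves; the provider names and counts are invented for illustration and are not the study's data. It also assumes the denominator is the full index of sites, as in the percentages quoted above.

```python
from typing import Dict


def concentration_ratio(
    sites_per_provider: Dict[str, int], top_n: int, total_sites: int
) -> float:
    """Percentage of all indexed sites served by the top_n largest providers."""
    if total_sites <= 0:
        raise ValueError("total_sites must be positive")
    largest = sorted(sites_per_provider.values(), reverse=True)[:top_n]
    return 100.0 * sum(largest) / total_sites


# Hypothetical counts out of a 1,000-site index; not figures from the study.
SAMPLE_COUNTS = {
    "provider-a": 200, "provider-b": 150, "provider-c": 100,
    "provider-d": 50, "provider-e": 40, "provider-f": 30,
    "provider-g": 20, "provider-h": 10, "provider-i": 5,
}

if __name__ == "__main__":
    for n in (1, 4, 8):
        print(f"CR{n} = {concentration_ratio(SAMPLE_COUNTS, n, 1000):.1f}%")
```

With these made-up counts, the top provider alone holds 20 percent of the index, the top four hold 50 percent, and the top eight hold 60 percent. A site that uses several providers would be counted once per provider in this simplified version.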
Taken together, these two trends—the move toward external DNS hosting and the concentration of the external DNS hosting market—amount to a huge reduction in internet entropy. A once-distributed system is now being channeled in increasing measure through the infrastructure of a small cadre of cloud service providers. This matters, for much the same reason that websites’ reliance on a few dominant edge computing providers does: When a major external DNS provider goes down, so do the many sites that rely on it. And the threat of mass DNS outages is not theoretical—a distributed-denial-of-service (DDoS) attack on DNS provider Dyn in October 2016 caused catastrophic outages for websites across the United States. (“INTERNET BLACKOUT,” screamed Drudge.)
Unfortunately, websites have not used the tools at their disposal to mitigate the risk of future mass outages. The DNS protocol explicitly contemplates the use of backup DNS servers that kick in automatically when primary servers go down, maintaining a site’s reachability. Despite a slight uptick in the practice following the Dyn attack (reflected in Figure 3), the use of secondary DNS—which could enable websites to “diversify” their DNS across multiple providers, or self-host backup servers—remains marginal. It’s not entirely clear why, but this tendency could have to do with cloud hosting provider lock-in, resource limitations, a lack of general awareness, or some combination of those and other factors.
Figure 3. Percentage of domains using one, two or three or more DNS providers over time.
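A toy model shows why a second provider matters. The sketch below is not a real resolver: the domains, name servers, and addresses are all hypothetical (the addresses come from ranges reserved for documentation), and real resolvers choose among a domain's name servers in more sophisticated ways than trying them in order. It only illustrates the outcome when one provider goes offline.

```python
from typing import Dict, List, Optional, Set

# Hypothetical records held by two providers' name servers.
RECORDS: Dict[str, Dict[str, str]] = {
    "ns.provider-a.example": {
        "shop.example": "192.0.2.10",
        "news.example": "198.51.100.7",
    },
    "ns.provider-b.example": {
        "news.example": "198.51.100.7",
    },
}

# Which name servers each domain has designated.
DELEGATIONS: Dict[str, List[str]] = {
    "shop.example": ["ns.provider-a.example"],                           # single provider
    "news.example": ["ns.provider-a.example", "ns.provider-b.example"],  # diversified
}


def resolve(domain: str, offline: Set[str]) -> Optional[str]:
    """Try each designated name server in turn, skipping any that are offline."""
    for name_server in DELEGATIONS.get(domain, []):
        if name_server in offline:
            continue
        address = RECORDS.get(name_server, {}).get(domain)
        if address is not None:
            return address
    return None  # no reachable name server could answer


if __name__ == "__main__":
    outage = {"ns.provider-a.example"}
    for domain in DELEGATIONS:
        print(domain, "->", resolve(domain, offline=outage))
```

In this model, an outage at the first provider leaves the single-provider site unresolvable, while the diversified site is still answered by its second provider.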
In the case of DNS, the path forward in the face of declining internet entropy seems relatively clear. While the use of secondary DNS might not restore entropy per se, it has the potential to substantially increase resilience to error and attack on an internet-wide basis. External DNS providers should encourage the use of secondary DNS, making it as easy as possible to configure a backup provider. In most instances, provisioning a secondary DNS provider should be relatively inexpensive.
The fact that secondary DNS is a no-brainer for many websites makes it all the more striking that adoption has more or less flatlined since the months following the Dyn attack. That so many websites have failed to adopt such an easy and effective mitigation strategy speaks to an all-too-common phenomenon in the world of cybersecurity (and, more broadly, engineering): unpatched, poorly designed, or otherwise neglected systems force those tasked with maintaining and defending them into a reactive rather than proactive posture. Even well-resourced organizations struggle to learn from the mistakes of others, leaving them to retread the same dangerous paths. Indeed, our analysis found that more than half of the websites that added a secondary DNS provider in the wake of the Dyn attack were those that had borne its brunt: sites that relied exclusively on Dyn at the time of the attack. Similarly undiversified websites that had dodged the bullet by choosing a different provider diversified at dramatically lower rates.
And with industry-wide DNS resilience seemingly still out of reach, it’s no comfort that contending with declining internet entropy may prove even harder across internet technologies beyond DNS—including the far more complex and expensive edge computing services that Fastly provides. There’s likely no putting the cat back into the bag and “re-decentralizing” the web—though some are nobly trying. Even so, asking dominant cloud service providers to “do better” likely won’t be enough. Bugs, cyberattacks, and human error are problems for even the most sophisticated and fastidious of operations. Companies and other organizations that rely on consistent website uptime should look for ways to hedge against the failure of the cloud service providers they rely on, including by building redundancy into their systems and developing contingency plans that account for provider failure.
Ultimately, markets alone may not be up to the task of formulating and driving the adoption of new best practices for this era of declining internet entropy. That’s particularly the case for cybersecurity, where those who build services must keep pace with those interested in breaking them. Comprehensive cybersecurity legislation, long a chimera in the U.S. and elsewhere, remains a distant prospect for now, but the government might get the ball rolling by revisiting its own procurement policies. In a promising show of attention, Congress passed a law in December 2020 to establish baseline security standards for “Internet of Things” devices procured for federal use.
There’s a long and uncertain road ahead on which to acclimate to the internet’s new, more centralized architecture.