The Forgotten Preconditions for a Well-Functioning Internet

09/27/2021
by Geoff Goodell, et al.

For decades, proponents of the Internet have promised that it would one day provide a seamless way for everyone in the world to communicate with each other, without introducing new boundaries, gatekeepers, or power structures. What happened? This article explores the system-level characteristics of the Internet that helped it to succeed as well as it has, including trade-offs intrinsic to its design as well as the system-level implications of certain patterns of use that have emerged over the years that undermine those characteristics or limit their effectiveness. We compile some key observations about such patterns, toward the development of a general theory of why they emerged despite our best efforts, and we conclude with some suggestions on how we might mitigate the worst outcomes and avoid similar experiences in the future.


1 Introduction

A preponderance of articles and books have emerged in recent years to decry pernicious abuse of the infrastructure of the Internet. Unsurprisingly, much of the argument is about surveillance capitalism. We know that companies such as Facebook undermine human autonomy by creating indexable records of individual persons and generating tremendous revenue along the way [1]. We also know that companies such as Microsoft provide Internet services such as Microsoft Teams to draw ordinary individuals and businesses into walled gardens and observe their habits and activities [2]. At the same time, we know that some other businesses have come to depend upon the data harvesting ecosystem, and we worry that addressing the harms of surveillance capitalism might necessarily entail collateral damage [3]. These arguments about business motivations and economic imperatives are powerful and moving, and we might hope to mitigate their negative externalities with the right set of changes to law and regulation, applied carefully. But beneath these troubling practices, the Internet has a deeper flaw that is seldom discussed, which has several manifestations, including access, naming, trust, and reputation. Although this flaw is structural, and even technical, it offers insight into a foundational problem that is about humanity as much as it is about technology.

2 Access: The Internet after a half-century

We might say that the Internet is a global institution. However, this is misleading. The Internet is not really institutional, and we might say that the Internet is successful precisely because it is not institutional. There are no particular rules beyond the protocol specifications and no global institutions with the power to enforce the few rules that do exist, which indeed are often broken. Unlike many global systems, including decentralised systems such as most cryptocurrency networks, the Internet does not require global consensus to function. The Internet has never been globally consistent, and achieving consistency would be theoretically impossible. There is no authority that mandates changes to the protocol, and as of 2021, the world is still in the process of migrating the data plane of the Internet from a version originally developed in 1981 (IPv4) to a version originally developed in 1995 (IPv6). As a “best effort” communication medium, the Internet has no mechanisms to guarantee quality of service, or indeed any quality of service at all. Nonetheless, we shall argue that all of these characteristics are features, not bugs, in the design of the Internet, and are essential contributing factors to the success of its design.

Every device connected to the Internet speaks Internet Protocol, either the 1981 version or the 1995 version, and has an address (an IP address). The fundamental unit of aggregation in the control plane of the Internet is the autonomous system, or AS. Autonomous systems communicate with each other via a protocol called Border Gateway Protocol, or BGP, the first version of which was developed in 1989. The purpose of BGP is to exchange reachability information about autonomous systems [4]. There are over four hundred thousand autonomous systems, and every IP address that is globally addressable can be recognised as part of a block (or prefix) of addresses, which in turn is advertised via BGP by an autonomous system. The operators of autonomous systems determine, as a matter of local policy, which advertisements to accept, which not to accept, and which to prioritise over others. Reconciliation is pairwise: if the operators of an AS receive an advertisement from a neighbouring AS that does not look right, or if their customers have a problem reaching certain addresses advertised by a neighbouring AS, then they can raise the matter with their peers.
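
To make this concrete, the following toy Python sketch illustrates how an AS operator might accept and rank advertisements as a matter of purely local policy. All ASNs and prefixes are drawn from documentation ranges, the policy itself is hypothetical, and real BGP implementations are far more involved.

    # Toy sketch of BGP-style local policy (hypothetical, for illustration).
    # An AS receives prefix advertisements from its neighbours and decides,
    # purely as a matter of local policy, which to accept and which to prefer.
    from dataclasses import dataclass

    @dataclass
    class Advertisement:
        prefix: str      # e.g. "192.0.2.0/24" (documentation range)
        as_path: list    # ASNs the advertisement has traversed, nearest first
        neighbour: int   # ASN of the neighbour that sent it

    class LocalPolicy:
        def __init__(self, my_asn, distrusted=frozenset()):
            self.my_asn = my_asn
            self.distrusted = distrusted   # a local judgement, not a global rule
            self.routes = {}               # prefix -> best advertisement so far

        def accept(self, ad):
            if self.my_asn in ad.as_path:          # loop detected: reject
                return False
            if self.distrusted & set(ad.as_path):  # local distrust: reject
                return False
            return True

        def consider(self, ad):
            if not self.accept(ad):
                return
            best = self.routes.get(ad.prefix)
            # Prefer shorter AS paths; any other local preference would do.
            if best is None or len(ad.as_path) < len(best.as_path):
                self.routes[ad.prefix] = ad

    policy = LocalPolicy(my_asn=64500, distrusted=frozenset({64505}))
    policy.consider(Advertisement("192.0.2.0/24", [64501, 64510], 64501))
    policy.consider(Advertisement("192.0.2.0/24", [64502, 64505, 64510], 64502))
    print(policy.routes["192.0.2.0/24"].as_path)   # [64501, 64510]

Note that nothing in this picture requires two operators to agree on policy; each side simply applies its own preferences and raises disputes with its peers directly.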

There are five regional Internet registries that manage the allocation of specific IP address prefixes and autonomous system numbers. However, these registries are not really authorities. Their role is one of convenience: to prevent collisions among the numbers that different parties are using, not to adjudicate the allocation of some scarce resource. Specific numbers do not confer any particular advantage, and address prefixes, while limited, are generally not the factor that limits the ability of an autonomous system to send and receive traffic. Furthermore, registries have no ability to enforce the usage of the numbers in their registries; ultimately, autonomous systems will route traffic as they please, allowing for the creation of arbitrary patterns of connection.

So far, so good. However, there is no mechanism to ensure that all IP addresses are equal, and in practice some are more important than others. IP addresses are controlled by gatekeepers: a carrier can assign an arbitrary address to a device connected to a local network and use network address translation to route global traffic to and from that device, without enforcing any particular rules about the addresses on the local network. As a result, IP addresses have mostly become the concern of Internet carriers and service providers rather than end-users. The ability to differentiate customers on the basis of whether they are reachable for new connections allows carriers to offer their services at different price points, with businesses that maintain servers subsidising ordinary individuals with personal computers and mobile devices. Thus, the first manifestation of the structural flaw of the Internet is exposed: On the basis of reachability alone, the Internet has already been divided into first-class and second-class citizens. Unsurprisingly, the primary motivating factor for this phenomenon is revenue for carriers, not the scarcity of unique addresses. This state of affairs might be seen as an unintended consequence of carriers’ primary incentive to provide service; indeed, many ordinary users with mobile and broadband connections are assigned globally unique addresses that are purposefully blocked by their carriers from receiving new connections.
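
A rough sketch of why translation makes a device unreachable for new connections follows; the addresses are hypothetical (documentation and private ranges), and real NAT implementations track protocols, ports, and timeouts.

    # Toy sketch of network address translation (illustrative only; real NAT
    # tracks protocols, ports, and timeouts). The carrier assigns private
    # addresses internally and rewrites outbound traffic to one shared public
    # address; inbound traffic that matches no existing mapping has nowhere
    # to go, so the device cannot be reached for new connections.
    class NAT:
        def __init__(self, public_ip):
            self.public_ip = public_ip
            self.next_port = 40000
            self.mappings = {}   # public port -> (private ip, private port)

        def outbound(self, private_ip, private_port):
            public_port = self.next_port
            self.next_port += 1
            self.mappings[public_port] = (private_ip, private_port)
            return self.public_ip, public_port

        def inbound(self, public_port):
            # Succeeds only for traffic matching an established mapping.
            return self.mappings.get(public_port)

    nat = NAT("198.51.100.7")                      # documentation address
    _, port = nat.outbound("192.168.0.10", 55123)  # device opens a connection
    print(nat.inbound(port))     # reply traffic finds its way back
    print(nat.inbound(12345))    # unsolicited new connection: None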

For those customers with the good fortune to have Internet devices that are reachable, the fact that they generally want a way for others to know how to reach them introduces another risk of unintended consequences, which we shall explore next.

3 Naming: The trouble with ignoring context

Although the management of AS numbers and IP address prefixes is relatively peaceful, the management of human-meaningful names for Internet hosts is much more contentious. Human-meaningful names are problematic when there is an expectation that everyone respects the same naming convention. Who gets to decide what names we use, and what makes a decision legitimate? A commonly recognised principle called Zooko’s triangle holds that the names we use cannot be universally recognised and human-meaningful without being managed by a common authority [5]. Similar arguments have been made throughout human history, with memorable parables ranging from the Tower of Babel to Humpty Dumpty. Notwithstanding the validity of these longstanding arguments, the Domain Name System, or DNS [6], represents yet another effort to achieve exactly this kind of global agreement and becomes a second manifestation of the structural flaw of the Internet. Perhaps the authors of the original specification in 1983 had not anticipated the number of Internet devices that would eventually require names and how important those names would become.

DNS is hierarchically managed, with a set of globally agreed root servers that delegate authority to a set of top-level domains, which in turn delegate authority to registrars and others who facilitate registration of names by individuals, institutions, businesses, and other organisations, usually for a fee. The names themselves refer to IP addresses and other metadata that are used to access Internet services. A name is said to be fully qualified if it references a specific path of delegation back to the root; an example of a fully qualified name is ‘www.ntt.com.’, which indicates delegation from the root authority to the registry for ‘com.’ and from there to the registered authority for ‘ntt.com.’. But what organisation gets to call itself ntt.com? Short, pithy, and prestigious names are scarce, and in practice they demand a hefty premium in the global marketplace for names, if they are traded at all.
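
The delegation chain can be pictured with a toy resolver, as in the sketch below. The zone data and the address are hypothetical, and real resolution involves caching, many record types, and a distributed set of name servers.

    # Toy model of DNS delegation (zone data and address are hypothetical).
    # Authority flows from the root to the registry for 'com.' and on to
    # the registrant of 'ntt.com.', each level delegating the next label.
    ZONES = {
        ".":        {"com.": "delegate"},
        "com.":     {"ntt.com.": "delegate"},
        "ntt.com.": {"www.ntt.com.": "192.0.2.80"},
    }

    def resolve(fqdn):
        labels = fqdn.rstrip(".").split(".")
        zone = "."
        # Walk the hierarchy one label at a time, right to left.
        for i in range(len(labels) - 1, -1, -1):
            name = ".".join(labels[i:]) + "."
            record = ZONES[zone].get(name)
            if record != "delegate":
                return record   # an authoritative answer (or None)
            zone = name         # follow the delegation downwards
        return None

    print(resolve("www.ntt.com."))   # 192.0.2.80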

Inexorably, the globally recognised allocation of names to registrants introduces a power dynamic and raises the question of fairness. After all, why are some parties privileged to the exclusion of others? There are various initiatives, such as Namecoin [7], that seek to reject the authority of DNS entirely and replace it with a transparent marketplace, but such attempts ultimately introduce more problems than they solve. Can any system that is first-come, first-served ever be fair? Should the best name always go to the highest bidder? It would seem that both the paternalistic hand of a trusted authority and the invisible hand of the marketplace fall short of furnishing a solution that works for everyone, precisely because any solution that demands global consensus from a process must also assume global and durable acceptance of the outcome of that process.

4 Trust: But without verification

Since Internet routing is fundamentally decentralised, how can one be sure that the other party to a conversation, as identified by its IP address or domain name, is authentic? This is a question of security, and we know that we can use cryptography not only to protect our conversations from eavesdropping but also to verify the authenticity of a conversation endpoint using digital certificates [8]. This is straightforward if we can directly and securely exchange public keys with everyone that we might want to engage in a conversation via the Internet. In practice, however, people use the Internet to engage with many different kinds of actors and seldom exchange certificates directly. When was the last time your bank advised you to verify the fingerprint of its public key? Thus, we have identified a third structural flaw of the Internet: Its design has fundamentally ignored well-established human mechanisms and institutions for creating trust, and a popular shortcut has undermined those mechanisms and institutions.
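
For the curious, here is a minimal sketch, using only the Python standard library, of what verifying a certificate fingerprint out of band might involve; ‘example.com’ is a placeholder host.

    # Sketch: fetch a server's certificate and compute a fingerprint that
    # could, in principle, be verified out of band (e.g. read to you by
    # your bank). Standard library only; 'example.com' is a placeholder.
    import hashlib
    import ssl

    host = "example.com"
    pem = ssl.get_server_certificate((host, 443))
    der = ssl.PEM_cert_to_DER_cert(pem)
    fingerprint = hashlib.sha256(der).hexdigest()
    print(f"SHA-256 fingerprint of {host}: {fingerprint}")

In practice almost no one performs such a check by hand, which is precisely why the shortcut described next took hold.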

Most modern operating systems and web browsers come pre-installed with a set of so-called trust anchors that serve as trusted third parties to verify the public keys associated with particular IP addresses or domain names. In principle, users could add or remove trust anchors from this list according to their personal preferences, but almost nobody actually does. Moreover, since Internet services must present their certificates to web browsers and other client software, the question facing the operators of those services is: Which trust anchors are the web browsers and other applications running on end-user devices likely to accept? The operators then seek signatures from the trust anchors that are commonly shipped with end-user software and present those signatures in the form of certificates. Since obtaining, storing, and presenting certificates from certificate authorities carry operational costs (and sometimes economic costs, although the emergence of free alternatives [9] has changed this), the operators are strongly motivated to be parsimonious. Thus, we have an implicit agreement between site operators and software distributors about the correct set of trust anchors to use, and as a result, those trust anchors become powerful gatekeepers.
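
A minimal sketch of just how many trust anchors a stock installation accepts by default (standard library only; the output depends on the local certificate store):

    # Sketch: enumerate the trust anchors this system accepts by default.
    import ssl

    context = ssl.create_default_context()   # loads the system trust store
    anchors = context.get_ca_certs()
    print(f"{len(anchors)} trust anchors accepted by default")
    for cert in anchors[:5]:   # show a few, not all
        subject = dict(pair[0] for pair in cert["subject"])
        print(" ", subject.get("organizationName", subject.get("commonName")))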

What happens when a trust anchor fails? Breaches of privileged information such as private keys take place from time to time and are not merely a theoretical concern; consider the well-publicised compromise of Comodo in March 2011 [10]. Because certificate authorities are trusted by default by widespread client software, the stakes of a breach are high. Stolen private signing keys enabled the Stuxnet worm, which was discovered in 2010 and had been used in attacks against an Iranian nuclear facility [11], and a subsequent, successful attack on DigiNotar, another certificate authority, allowed the interception of Internet services provided by Google [12]. If vesting so much power in a small number of globally trusted institutions and businesses seems like a dangerous idea, that’s because it is.

5 Reputation: Good intentions and slippery slopes

DNS is a formal system with formal rules and institutionally trusted authorities, and the secure verification of Internet services also carries the weight of trusted institutions. However, just as not all power is institutional, not all structural shortcomings of the Internet can be characterised as concerns about the weaknesses and illegitimacy of trusted authorities. A mafia family might be expected to serve the interests of a local community by providing protection, and whether such protection is legitimate might be a matter of perspective, perhaps even ethically debatable. Unsurprisingly, protection of this sort is endemic to the Internet as well, particularly given its common role as a venue for anti-social and threatening behaviour. The mechanisms that enable such behaviour to emerge and flourish represent a fourth structural flaw of the Internet.

Let’s start with a benign example, which relates to e-mail spam. No one really likes to receive unsolicited messages from dodgy actors; that this happens at all is a system-level consequence of the common practice of using the same e-mail address across different contexts, and a topic for a separate article. A technical approach to mitigating spam is to require senders to prove that they are the rightful owners of the domain names they use to represent themselves; this approach forms the essence of the Sender Policy Framework, or SPF [13], as well as DomainKeys Identified Mail, or DKIM [14], which was introduced in 2007. Notice that this approach further entrenches the authority of DNS system operators. One might imagine that it would be impossible to compel all of the mail servers on the Internet to stop sending mail without valid SPF or DKIM signatures, and indeed both the specification for SPF and the specification for DKIM advise mail servers not to reject messages ‘solely’ on the basis of an inability to authenticate the sender [13, 14]. However, one mail server operator, Google, implemented a policy that stretched the limits of this recommendation by routinely classifying as spam any mail sent to users of its popular Gmail service that did not include a valid SPF or DKIM header [15]. As a result, many mail servers were forced to implement SPF or DKIM because their users wanted to be able to send mail to Gmail users. And so, Google successfully managed to twist the arms of mail server operators around the world.
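
For illustration, here is a minimal sketch of how an operator might inspect a domain’s published SPF policy. It assumes the third-party dnspython package, which is not part of the SPF specification, and ‘example.com’ is a placeholder domain.

    # Sketch: inspect a domain's published SPF policy via its DNS TXT
    # records. Assumes the third-party dnspython package
    # (pip install dnspython); 'example.com' is a placeholder domain.
    import dns.resolver

    def spf_record(domain):
        try:
            answers = dns.resolver.resolve(domain, "TXT")
        except (dns.resolver.NXDOMAIN, dns.resolver.NoAnswer):
            return None
        for rdata in answers:
            txt = b"".join(rdata.strings).decode()
            if txt.startswith("v=spf1"):
                return txt   # e.g. "v=spf1 include:_spf.example.com ~all"
        return None          # no SPF policy published

    print(spf_record("example.com"))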

A somewhat less benign example involves the creation of blacklists of IP addresses on the basis of their reputation, ostensibly for the purpose of mitigating attacks on servers. The idea of using routing information, such as an IP address, as a convenient way to judge the reputation of a sender is not new [16]. Consider the case of SORBS, an Australian organisation that aims to reduce the preponderance of spam through the publication of a list of IP addresses associated with spam or abuse. Although SORBS does not directly engage in filtering traffic, many e-mail server operators configure their servers to consult its list as part of their routine operation and flag messages as spam, or reject them outright, if they are received from a mail server whose IP address appears on the SORBS list. SORBS generally requires the operators of servers with blacklisted IP addresses to explicitly request expungement, subject to the discretion of SORBS staff [17], and for years such operators were required to give money to charity as well [18]. A related example involves the practice of using similar blacklists to restrict access to websites. For example, the popular web infrastructure provider Cloudflare offers its customers an option to block IP addresses associated with public VPNs and anonymising proxy networks such as Tor [19], and Akamai features a security module called “Kona Site Defender” that includes a service called “Client Reputation”, which categorises IP addresses as malicious or not on the basis of their interactions with Akamai services around the world [20] and which Akamai’s customers often use to block access to their websites. A direct system-level consequence of this pattern of blacklisting is that Internet users who share IP addresses with each other, and especially Internet users who rely upon anonymity networks for their privacy, are denied access to much of the Internet. A second-order system-level consequence is that Internet users are forced to relinquish their privacy and to consider their reputation, the public face of whatever they do with the Internet, and their relationship with their Internet carrier, as prerequisites to using the Internet.
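
The mechanics of consulting such a list are simple, which is part of its appeal. A minimal sketch of a DNS-based blacklist (DNSBL) lookup follows; it uses only the standard library, and the zone name and address are shown for illustration.

    # Sketch of a DNSBL lookup: reverse the IPv4 octets, prepend them to
    # the list's zone, and perform an ordinary DNS query; any answer means
    # the address is listed.
    import socket

    def is_listed(ipv4, zone="dnsbl.sorbs.net"):
        query = ".".join(reversed(ipv4.split("."))) + "." + zone
        try:
            socket.gethostbyname(query)
            return True       # an A record came back: listed
        except socket.gaierror:
            return False      # no record: not listed (or lookup failed)

    # A mail server might flag or reject a message when the sending
    # server's address is listed.
    print(is_listed("192.0.2.1"))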

In all such cases, extortionate behaviour is enabled by three factors: the separation of the function of maintaining the blacklist from the choice to use it; the perceived benefit of the blacklist on the part of its users; and the lack of accountability, on the part of the server operators that use the blacklist, to the users of their own services.

6 Conclusion: Protecting our infrastructure from ourselves

All of our distributed systems, however good they are, are susceptible to abuse of the power that has been vested in the systems themselves. In particular, systems based on the Internet reflect, at minimum, the unfairness and power dynamics that are intrinsic to the Internet itself.

At the same time, it is worth considering the original design of the Internet as a reminder that it is possible to build a system that avoids vesting too much power in its operators. This concept is generally described as the end-to-end principle, and it achieved widespread appreciation well before the protocols described earlier in this article were developed [21]. We find that the Internet is not majoritarian per se, but weaknesses in its design have allowed certain actors to develop and exercise power over others. But how do we undo the dangerous power structures that have become part of the de facto architecture of the Internet? Moreover, how do we ensure that the systems we design in the future do not become centralised in the first place?

We note that global consensus and, more generally, the interest in establishing an authoritative global view of something, is exactly where we see the greatest risk of centralisation in practice. Part of the interest in an authoritative global view derives from the convenience of not having to worry about whether we share context with our neighbours, although it is perhaps when there is a perception of risk that the tendency to embrace an authoritative global view is most pronounced. Under such circumstances, we collectively retreat to the safety of big, dominant actors that can exercise their control and, in so doing, help us avoid the responsibility of decision-making and the potential blame that might follow [22]. Our observation should be a warning sign for those who seek to build systems that rely upon global consensus. Do systems that require global consensus grind to a halt when anyone disagrees, or do they implicitly ignore the wishes of the minority? Put another way, who or what is the arbiter, and what is to prevent that arbiter from being compromised by those who disagree with us?

With respect to the Internet, there is no easy solution, although we can imagine its characteristics. Identity must be non-hierarchical and separate from routing, so that reputations are not earned and lost based upon how users are connected to the network. Internet routers must not be able to infer who is speaking to whom, or what kind of information is being communicated, from the traffic they carry. (Perhaps this implies the need for a second, oblivious layer of routing between the existing network layer and the end-to-end protocols that depend upon it [23].) The names used at the technical level must be self-certifying, to prevent the accumulation and exercise of illegitimate authority by third-party actors, and should be opaque, to reduce the likelihood that people would confuse them with meaningful names or concepts. (Perhaps a system should encourage users to do what humans have always done, and assign their own names for the parties with whom they interact via the Internet [24].) Users of the Internet should assume different identities for their different relationships, to mitigate the risk of both spam and blacklisting at the same time. We should eliminate the assumption that metadata exposed to the system will not be misused, as well as the assumption of paternalistic goodwill on the part of powerful actors and service providers. Above all, it would behoove us to consider the fundamental trade-off intrinsic to all global systems and pursue a principle of parsimony: How can we provide the system with enough power to do what we need it to do, without providing it with enough power to profit at our expense?
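
As a toy illustration of the self-certifying, opaque names suggested above, one might derive an identifier from a public key, so that control of the name can be proven cryptographically without any registry. The sketch below assumes the third-party ‘cryptography’ package, and every name in it is hypothetical.

    # Toy illustration of a self-certifying, opaque name: the identifier is
    # a hash of the holder's public key, so control of the name is provable
    # cryptographically, with no registry involved. Assumes the third-party
    # 'cryptography' package (pip install cryptography).
    import hashlib
    from cryptography.hazmat.primitives import serialization
    from cryptography.hazmat.primitives.asymmetric import ed25519

    private_key = ed25519.Ed25519PrivateKey.generate()
    public_bytes = private_key.public_key().public_bytes(
        encoding=serialization.Encoding.Raw,
        format=serialization.PublicFormat.Raw,
    )
    name = hashlib.sha256(public_bytes).hexdigest()   # opaque, meaningless
    print("self-certifying name:", name)

    # To claim the name, sign a challenge; a verifier recomputes the hash
    # from the presented public key and checks the signature.
    challenge = b"who holds this name?"
    signature = private_key.sign(challenge)
    private_key.public_key().verify(signature, challenge)  # raises if invalid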

Acknowledgements

We acknowledge the continued support of Professor Tomaso Aste, the Centre for Blockchain Technologies at University College London, and the Systemic Risk Centre at the London School of Economics.

References