July 30, 2020

A Unified Theory of Decentralization

All networks begin as only one thing: one neuron, one cell, one chip, one computer, or one user. One entity alone is not a network, but it is the starting point for understanding the unified theory of decentralization. One entity is fully sovereign: it has no connections to anything else that might influence or control it. One entity in isolation is empowered to act however it wants in pursuit of whatever results it seeks.

When one entity connects to another, however, the behavior of one affects the other. Some form of agreement must be struck between them that dictates what is allowed and what isn’t. In computer networks these agreements go by many names: access control lists, community standards, etc. These operating agreements, which users follow or submit to, have a profound effect on the value, utility, and autonomy of the overall system. To better understand decentralization, you must first think of operating agreements as coming either from the top down or from the bottom up. Distributed systems originally implied a bottom-up system but, with mobile app networks, that’s no longer true.

The term “network” usually conjures up the mental picture of many individual entities connected by some communication mechanism and working together to accomplish a given task. The term “distributed system” describes networks where the primary functions of the network are performed by the nodes in the network and not a set of central servers; email is a distributed system; Spotify is not.

[Figure: Venn diagram. Decentralized systems are a subset of distributed systems.]

There is a distinct difference between a distributed system and a decentralized system. All decentralized systems are distributed systems, but not all distributed systems are decentralized.

It is common to hear people say that “decentralized” describes what a distributed system is not instead of what it is. However, when using the word “decentralized” they typically mean something more than just the organization of the network. To them it implies a partitioning of the services, governance, and overall power structure to prevent any one entity, or user, from controlling others in the system. It then follows that a fully decentralized system — among many other things — atomizes the power structure to the smallest possible unit and distributes it out to the edges where it is under direct user control. A single user or node is sovereign in this kind of power structure. Two users connected are still sovereign if neither user can dictate rules upon the other; this is decentralized. The two users lose their “user sovereignty” if they must submit to “community guidelines” that prevent them from saying certain words or sharing certain ideas; this is not decentralized.

For the last few years, the term “self-sovereign” has been used to describe a system that is fully decentralized. Since sovereignty is a consequence of the underlying system structure and not of the user themselves, I prefer the term “user sovereignty” as it more accurately describes a system’s design and the principles it strives to uphold and how those shape the bottom-up operating agreement for the network.

This shift in thinking suggests that the term “decentralized,” in the realm of distributed systems, is defined as the following:

Decentralization is the direction in which user sovereignty increases.

It applies not only to the governance but also to the structure and function of the nodes and network. Moving towards decentralization increases user sovereignty and is the harder thing to accomplish since it goes against all authoritarian impulses. Censorship is only possible when a system sacrifices user sovereignty to build a permissioned publishing platform like Twitter and Facebook. Enforcing “community guidelines” that dictate what content can be published is only possible by greatly reducing user sovereignty. Fully decentralized systems do nothing to control what the users can publish but instead give users the ability to filter what and who they are exposed to; that is decentralized; that increases user sovereignty.
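The difference between controlling what is published and filtering what is read can be shown with a small sketch. Everything in it (the `Post` record, the `ReaderFilter` class, and its field names) is hypothetical. The only point is where the decision is made: on the reader’s own node, using rules the reader chose.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Post:
    author: str
    text: str


@dataclass
class ReaderFilter:
    """Rules owned and edited by one reader. Nothing here stops anyone from publishing."""
    muted_authors: set = field(default_factory=set)
    muted_words: set = field(default_factory=set)   # expected in lowercase

    def allows(self, post):
        if post.author in self.muted_authors:
            return False
        words = {w.strip(".,!?").lower() for w in post.text.split()}
        return not (words & self.muted_words)

    def view(self, feed):
        """Return only the posts this reader has chosen to see."""
        return [post for post in feed if self.allows(post)]


feed = [
    Post("alice", "Hello world"),
    Post("bob", "Buy cheap pills"),
    Post("carol", "Release notes"),
]
mine = ReaderFilter(muted_authors={"carol"}, muted_words={"pills"})
visible = mine.view(feed)
```

Another reader with an empty `ReaderFilter` sees the whole feed, because no one’s rules apply to anyone else.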

All distributed systems experience significant “centralization pressure” because centralization is profitable. Even a cursory glance at Facebook, Twitter, and GitHub makes it seem like the more centralized a system is, the greater the potential for profit. And so it is! There is a mountain of money to be made by turning users into captives and farming them like domesticated animals: gathering their data to sell and carving up their attention to sell to the highest bidder. The cost to the user is their sovereignty, with side effects that often reach into the real world. If you say the wrong thing on Twitter, you can be banned and you may also lose your job and/or bank account. In China, the social credit system is fully centralized with zero user sovereignty. Its goal is to keep the Chinese people in virtual jail cells that are self-enforced out of fear of real-world social consequences.

Another source of centralization pressure is convenience. This is the theory behind walled gardens such as America Online (AOL) in the early days of the internet and Facebook and GitHub today. By controlling all aspects of the user experience and keeping users within the walled garden, the operator streamlines and optimizes the experience to lower barriers to entry and reach the widest possible audience. This seems like a good thing until you realize the costs to the users in terms of sovereignty and freedom.

The good news is that centralization isn’t the only source of convenience. As you will see in the discussion further down, decentralized solutions for the problems of distributed systems are more robust and designed to operate in the worst of conditions. This means that many pieces of a fully decentralized and user-sovereign system are automatic from the users’ perspective, which greatly increases convenience. In his article “Patches carved into developer sigchains,” Konstantin Ryabitsev describes a fictitious developer collaboration tool that uses a trivial discovery and introduction solution as well as an automatic coherence mechanism that synchronizes files across developers’ machines. This tool could easily be implemented using centralized services such as GitHub, but what Konstantin envisions is a serverless and decentralized system that maintains the same level of convenience.

Decentralization will likely create magically convenient tools due to their automatic and anti-fragile nature.

This gives rise to a theory: as tools become more decentralized and user-sovereign, their convenience increases, until software tools are so automatic, and work so reliably anywhere and under any conditions, that they seem almost magical. Is this Taleb’s antifragile theory at work in decentralized systems? Maybe. Regardless, I’m almost certain that a tool that streamlines discovery and automates connecting, reconnecting, and coherence to stay in sync will be a very useful and convenient tool indeed.

User sovereignty matters. It matters as much as our right to speak freely and to gather and peaceably protest our government. In a decentralized system, users are free to join and leave at will and take their data with them in a portable format. They have absolute control over what data is shared with others and with the system as a whole, as well as the ability to completely delete their data at any time. This includes metadata such as with whom and when they connected in the system. To give in to centralization pressure is lazy and immoral. Decentralization requires conviction and virtue. To centralize shows disregard for the users a system serves. All systems architects should start with the goal of maximum user sovereignty, then make smart and conscientious compromises to decentralization only when absolutely necessary, and be fully transparent about the cost to users’ sovereignty.

The Nine Problems of Distributed Systems

Up until now, “decentralized” has been an adjective applied to many distributed systems that aren’t fully decentralized (e.g., Git, Secure Scuttlebutt, and Bitcoin). To better understand the difference between distributed and decentralized, we must break down distributed systems into the functional pieces that all distributed systems must possess to function. Each functional piece solves a particular problem. In all, there are nine fundamental problems; each has at least one novel solution, and most have many.

To be fully decentralized is to maximize user sovereignty in a system’s solutions for the nine problems. Every solution falls somewhere along the spectrum from fully centralized to fully decentralized, and for a distributed system to be called “decentralized,” it must solve all nine using decentralized solutions. Choosing a centralized solution for just one of the nine problems causes a loss of user sovereignty and moves the distributed system away from being fully decentralized. Similarly, if a solution to any of the nine problems is left out of the system design, this also reduces the system’s decentralization; Bitcoin in particular suffers from this kind of reduction in user sovereignty. It only takes one centralized solution (or, in Bitcoin’s case, a non-solution) to open up an opportunity for a corporation or government to “capture” the community of users for financial and/or strategic reasons. In worst-case scenarios, “corporate capture” can present an existential threat to the independence of the system and the sovereignty of the users by tying them to an all-encompassing centralized platform that serves as a gatekeeper for user access.

The nine problems of distributed systems are:

Discovery
Introduction
Coherence
Public Services
Trust
Privacy
Coordination
Membership
Persistent State

What follows is a brief discussion of each one. This document does not cover the different solutions to the problems, centralized or decentralized. The purpose is to present an overview of each problem so that we can build a new way of looking at distributed systems and settle on a unified theory of decentralization.
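The rule stated earlier, that a system is fully decentralized only when every one of the nine problems has a decentralized solution, can be written down as a small checklist. This is a sketch for illustration only: the `Solution` categories and the function names are assumptions, and the example design is made up rather than a rating of any real system.

```python
from enum import Enum


class Solution(Enum):
    MISSING = "missing"              # the design leaves the problem unsolved
    CENTRALIZED = "centralized"
    DECENTRALIZED = "decentralized"


NINE_PROBLEMS = (
    "discovery", "introduction", "coherence", "public services", "trust",
    "privacy", "coordination", "membership", "persistent state",
)


def weak_points(design):
    """Return the problems whose solution is centralized or missing."""
    return [
        problem for problem in NINE_PROBLEMS
        if design.get(problem, Solution.MISSING) is not Solution.DECENTRALIZED
    ]


def is_fully_decentralized(design):
    """True only when all nine problems have a decentralized solution."""
    return not weak_points(design)


# A made-up design, not an assessment of any real system.
example = {problem: Solution.DECENTRALIZED for problem in NINE_PROBLEMS}
example["discovery"] = Solution.CENTRALIZED
del example["introduction"]          # no solution at all

print(weak_points(example))
print(is_fully_decentralized(example))
```

A missing entry is treated the same as a centralized one, which mirrors the argument that a non-solution also costs user sovereignty.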

Discovery

All distributed systems start with just one node and one user. Until there are two connected users, we can’t call it a distributed system. When new users wish to join a distributed system by connecting to other users, they have to solve the discovery problem: finding the IP address, domain name, or user name of another user to connect to. There are many solutions. The most common and easiest is a centralized server where users get the information needed to initiate connections to other users. This is the model that Twitter, Facebook, and nearly all social platforms use. Oddly enough, this is also what Git users rely on via GitHub and what Secure Scuttlebutt users rely on via public “pub” servers. Stranger still is the fact that Bitcoin uses hard-coded IP addresses of Bitcoin seed nodes that act like centralized servers for discovery purposes. Building a fully decentralized discovery solution is an ongoing research topic. There are a few solutions, but they are difficult to use and some have privacy issues. For instance, traditional DNS queries are sent unencrypted and can be observed on the network.
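Seed-based discovery can be sketched as a crawl that starts from a hard-coded list and asks each reachable node for the peers it knows. This is a simplified illustration, not how any particular client works: the `discover` function, the `ask_for_peers` callback, and the in-memory `network` table standing in for real connections are all assumptions.

```python
from collections import deque


def discover(seeds, ask_for_peers, limit=100):
    """Breadth-first crawl: start with the seeds, then follow the peers they report."""
    known = []                 # nodes that actually answered, in discovery order
    seen = set()
    queue = deque(seeds)
    while queue and len(known) < limit:
        address = queue.popleft()
        if address in seen:
            continue
        seen.add(address)
        try:
            peers = ask_for_peers(address)
        except ConnectionError:
            continue           # unreachable node, try the next one
        known.append(address)
        queue.extend(peer for peer in peers if peer not in seen)
    return known


# Simulated network: each reachable node maps to the peers it would report.
network = {
    "seed-1": ["node-a", "node-b"],
    "node-a": ["node-b", "node-c"],
    "node-b": ["seed-1"],
    "node-c": [],
}


def ask_for_peers(address):
    if address not in network:
        raise ConnectionError(address)
    return network[address]


found = discover(["seed-0", "seed-1"], ask_for_peers)
```

If none of the seeds answers, the crawl finds nothing, which is why a fixed seed list behaves like a centralized dependency.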

Introduction

Once users have connected on the network level, they need to exchange (cryptographic) credentials with each other to establish their identities. This is the introduction problem. Is the introduction pseudonymous or public? If the credentials exchanged are tied to actual people or organizations, how are those credentials verified? If the credentials are pseudonymous, how will users be identified in subsequent connections? The solution for introduction in a distributed system has many critical consequences that affect the solutions for other problems like trust, privacy, coordination, and membership. The introduction problem may be the hardest of all of the problems simply based on the observation that most distributed systems don’t provide a solution and rely on external, out-of-band services for introduction.
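One common answer to the question of recognizing a pseudonymous user on later connections is to remember the key seen at first contact, often called trust on first use. The sketch below assumes that approach; the `KnownPeers` class and its return values are hypothetical, and it leaves out the step where the peer proves possession of the matching private key.

```python
import hashlib


def fingerprint(public_key):
    """Short, comparable identifier for a public key given as bytes."""
    return hashlib.sha256(public_key).hexdigest()


class KnownPeers:
    """Remembers which key each pseudonym presented the first time it was introduced."""

    def __init__(self):
        self._pins = {}        # pseudonym -> fingerprint

    def introduce(self, pseudonym, public_key):
        seen = fingerprint(public_key)
        pinned = self._pins.get(pseudonym)
        if pinned is None:
            self._pins[pseudonym] = seen   # first contact: pin the key
            return "new"
        return "recognized" if pinned == seen else "mismatch"


peers = KnownPeers()
first = peers.introduce("nightowl", b"key-material-A")
again = peers.introduce("nightowl", b"key-material-A")
other = peers.introduce("nightowl", b"key-material-B")
```

A "mismatch" result means someone else is using the pseudonym or the user changed keys, and the sketch cannot tell which. That gap is part of why introduction affects trust and membership.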

Coherence

After two users have discovered each other and are introduced, the connection between them is closed eventually, and one or both users will go offline. Very few end users stay online all of the time. Of those few who do, only a tiny fraction keep a static address or other stable means for connecting to them again. The coherence problem focuses on how users reconnect with each other after they go offline and wish to rejoin the network again.

The world today is made up of frequently disconnected and mobile users who move around the Internet topologically. Their IP addresses change often, as does their firewalled status. It is common for a user to be behind a firewall at home and at work. But while commuting, the user might be using a non-firewalled IPv6 connection via a mobile device. Solutions for the coherence problem must accommodate this reality and keep users connected despite the constant churn and chaos of their network status.
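One simple way to cope with changing addresses is to key everything on a stable peer identifier and keep a short history of the addresses where that peer was last seen. The sketch below assumes such an identifier exists (for example, from the introduction step). The `AddressBook` class and `reconnect` helper are hypothetical, and the addresses come from documentation ranges.

```python
class AddressBook:
    """Maps a stable peer id to the addresses where that peer was recently seen."""

    def __init__(self, max_per_peer=4):
        self._seen = {}            # peer_id -> {address: last_seen_time}
        self._max = max_per_peer

    def observe(self, peer_id, address, when):
        addresses = self._seen.setdefault(peer_id, {})
        addresses[address] = when
        while len(addresses) > self._max:          # forget the stalest entry
            del addresses[min(addresses, key=addresses.get)]

    def candidates(self, peer_id):
        """Addresses to try, most recently seen first."""
        addresses = self._seen.get(peer_id, {})
        return sorted(addresses, key=addresses.get, reverse=True)


def reconnect(book, peer_id, try_connect):
    """Try each known address in turn; return the one that worked, or None."""
    for address in book.candidates(peer_id):
        if try_connect(address):
            return address
    return None


book = AddressBook()
book.observe("bob", "203.0.113.7:4000", when=1)      # home
book.observe("bob", "[2001:db8::5]:4000", when=3)    # mobile, most recent
book.observe("bob", "198.51.100.9:4000", when=2)     # work

reachable = {"198.51.100.9:4000"}
chosen = reconnect(book, "bob", lambda address: address in reachable)
```

This only helps when the peer returns to an address it has used before. Learning a brand-new address still needs discovery or help from other nodes.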

Public Services

The reason users join a distributed system is to take advantage of the public services provided to them and to have access to the other users. A public service is presented the same way to every user. Whether it is creating or consuming content, all distributed systems exist to provide public services to users. Facebook exists to communicate photos and messages between friends and family. Providing these in a decentralized way is a very difficult problem to solve while protecting user sovereignty. Existing solutions have various trade-offs with efficiency and privacy. One such solution is query flooding, which was common in early p2p file sharing systems. It doesn’t scale well but does a good job of preserving a user’s privacy. Later designs routed queries and began trading user privacy for efficiency.
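Query flooding can be sketched in a few lines: a node forwards the query to its neighbors, they forward it to theirs, and a hop limit stops it from travelling forever. The graph, the `holders` table, and the `flood_query` function are illustrative assumptions, and the message count is a rough measure that includes duplicate deliveries.

```python
def flood_query(graph, holders, start, wanted, ttl):
    """Flood a query outward from `start`; return (nodes holding `wanted`, messages sent)."""
    seen = {start}
    frontier = [start]
    hits = set()
    messages = 0
    if wanted in holders.get(start, ()):
        hits.add(start)
    while frontier and ttl > 0:
        next_frontier = []
        for node in frontier:
            for neighbor in graph.get(node, ()):
                messages += 1                  # every forward costs a message, even duplicates
                if neighbor in seen:
                    continue                   # already handled this query, drop it
                seen.add(neighbor)
                if wanted in holders.get(neighbor, ()):
                    hits.add(neighbor)
                next_frontier.append(neighbor)
        frontier = next_frontier
        ttl -= 1                               # one hop used up
    return hits, messages


graph = {
    "a": ["b", "c"],
    "b": ["a", "d"],
    "c": ["a", "d"],
    "d": ["b", "c", "e"],
    "e": ["d"],
}
holders = {"c": {"song.ogg"}, "e": {"song.ogg"}}

near = flood_query(graph, holders, "a", "song.ogg", ttl=1)
far = flood_query(graph, holders, "a", "song.ogg", ttl=3)
```

Raising `ttl` reaches more of the network but the number of messages grows quickly, which is the scaling problem mentioned above.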

Trust

Trust in systems relies upon the solution for introduction and, possibly, on the public services presentation if authentication is done using a distributed identity solution. In short, the trust problem comes down to being certain of whom you are talking to (i.e., authentication) and that the data you are receiving is both private and unmodified (i.e., confidentiality and integrity). Combining those two creates trust. Trust in an interaction between users is a function of how much you trust the other user and the risk inherent in the transaction. This closely models how trust is handled in human-scale social networks and is intuitive for even novice users.

To solve the trust problem in a decentralized way, the current state of the art is to use a hybrid top-down/web-of-trust model where relationships are pairwise and distributed but utilize verifiable credentials from trusted institutions to give trust to the authentication portion. A new and exciting development is the use of public proofs-of-work, such as a user’s contributions to an open source software project, as the basis for trust. This opens up the possibility for anonymous users to use their consistent and high-quality contributions to well-known and trusted projects as the trust anchor for their authentication. This is the same as saying, “You don’t need to know who I am; however, I can prove to you that I am the same user that has been the maintainer of a notable part of the Linux kernel project for years.” This is authentication by reputation. It maps nicely to the familiar social trust we all rely upon in everyday life.

Another key aspect of any trust solution will be to achieve “trans-contextual value transfer” using verifiable containers. My friend Timothy Ruff recently wrote a series of articles on how verifiable credentials are really verifiable containers, are like shipping containers, and give us a universal way to bridge digital trust domains for the first time ever. This solution leverages the public key infrastructure capabilities of blockchain anchoring of key material and novel cryptographic constructs to create a universal, verifiable, private and trustworthy chain-of-custody for all data transmitted in and across systems. It allows not only the recipient but any intermediary holders of data (i.e. “Travel Rule”) to independently verify that the data came from the issuer, the data has not been changed, and the issuer has not revoked the data the container contains. This is important for creating digital versions of human-scale trust systems like licensing, certifications, affidavits, and notarization.

Privacy

Privacy is probably the easiest problem to solve in a decentralized system. This problem has received the most attention in the last 20 years, leading to the Tor Project, I2P, and mix nets, along with end-to-end encryption and zero-round-trip, perfect-forward-secrecy protocols like Noise.

By layering solutions at each level of the OSI stack and using cryptography and zero-knowledge proofs pervasively, it is easy to prevent correlation and tracking methods used to de-anonymize users from their traffic. Any decentralized system that takes privacy seriously must prevent IP packet tracking through the use of onion routing and/or mix nets. Ultimately it must rely on pairwise identifiers to prevent other users from colluding to track and de-anonymize the users they are talking to. Then the whole system must be designed to never store or transmit any personal information and, instead, use zero-knowledge proofs and verifiable claims to implement authentication and authorization based on what a user is, not who they are. Any system that traffics knowledge about a user can ultimately disclose that information and compromise the user’s privacy, threatening their sovereignty.
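The idea behind pairwise identifiers is that a user shows a different identifier to every peer, so peers who compare notes cannot tell they are talking to the same person. The sketch below derives each identifier from a secret seed with HMAC-SHA256. That derivation is an assumption chosen for brevity; a real system might instead generate a separate key pair per relationship.

```python
import hashlib
import hmac
import secrets


def pairwise_id(master_secret, peer):
    """Derive the identifier this user presents to one specific peer."""
    digest = hmac.new(master_secret, peer.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:32]


# Hypothetical user with a locally stored random secret that never leaves the device.
alice_secret = secrets.token_bytes(32)

id_for_bob = pairwise_id(alice_secret, "bob")
id_for_carol = pairwise_id(alice_secret, "carol")
```

Bob and Carol each see a stable identifier for Alice across sessions, but without `alice_secret` they have no way to link the two.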

Coordination

The coordination problem has three parts: communication, collaboration, and corroboration. Communication asks whether the users of the system conduct all system functions using the main communication channels of the system. In a lot of distributed systems, the answer is “no” because of the authentication piece of the trust problem. Systems such as Bitcoin require that some communication happens outside of the main transaction and block sharing network.

When two users of Bitcoin wish to transfer bitcoins, the recipient must communicate the destination bitcoin address to the sender. This out-of-band (OOB) communication presents many technical problems for users. The challenges are serious enough that the authors of Bitcoin invented a special binary-to-text encoding system called Base58Check to minimize the opportunity for errors when sending and transcribing bitcoin addresses. There is also the challenge of man-in-the-middle attacks, in which an attacker misdirects bitcoins by substituting their own address for the one sent by the real recipient.
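To show how an encoding can reduce transcription errors, here is a compact sketch of Base58Check as it is commonly described: a version byte plus payload, a four-byte checksum taken from a double SHA-256, and an alphabet that leaves out easily confused characters. The function names are my own and the code is for illustration, not for handling real funds.

```python
import hashlib

# 58 characters: digits and letters without 0, O, I and l, which are easy to confuse.
ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def _checksum(payload):
    """First four bytes of SHA-256 applied twice."""
    return hashlib.sha256(hashlib.sha256(payload).digest()).digest()[:4]


def b58check_encode(version, data):
    payload = bytes([version]) + data
    raw = payload + _checksum(payload)
    number = int.from_bytes(raw, "big")
    digits = ""
    while number > 0:
        number, remainder = divmod(number, 58)
        digits = ALPHABET[remainder] + digits
    # Leading zero bytes would vanish in the integer, so each becomes a "1".
    zeros = len(raw) - len(raw.lstrip(b"\x00"))
    return "1" * zeros + digits


def b58check_decode(text):
    number = 0
    for char in text:
        number = number * 58 + ALPHABET.index(char)   # ValueError on a bad character
    zeros = len(text) - len(text.lstrip("1"))
    raw = b"\x00" * zeros + number.to_bytes((number.bit_length() + 7) // 8, "big")
    if len(raw) < 5 or _checksum(raw[:-4]) != raw[-4:]:
        raise ValueError("checksum mismatch: address was mistyped or corrupted")
    return raw[0], raw[1:-4]


# Dummy 20-byte payload, not a real key hash.
address = b58check_encode(0x00, bytes(range(20)))
version, data = b58check_decode(address)
```

The checksum catches most typing mistakes, but it does nothing against the substitution attack described above, because an attacker’s own address also carries a valid checksum.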

Requiring OOB communication to use a system opens it to significant centralization pressure by ceding control over that part of the communication regime to an outside solution provider. Outside solution providers usually invest in centralized solutions for automating and streamlining the OOB portions of system communication. This is why Coinbase exists. This is also partially why GitHub exists. Relying on centralized third parties to enable the full capabilities of a distributed system, such as Bitcoin (Coinbase) and Git (GitHub), hurts the overall decentralization of the system. It ultimately affects the system’s independence and limits its resistance to attack because these central systems are really in control of user access.

Collaboration is whether the nodes can work together to provide a public service such as search. Whether the service is a search function or packet routing, it is difficult to design a fully decentralized solution for collaboration without affecting privacy. This is an area of active research.

The last part of coordination is corroboration. Corroboration is whether the nodes share data with each other that supports decentralized solutions for other problems. Reputation systems fall under this part of coordination. How a reputation system is designed directly affects trust, privacy, membership, and potentially even coherence and discovery. There has been some research in decentralized corroboration. However, most systems designers find the problem too difficult and instead build centralized solutions.

Membership

Like coordination, the membership problem has many facets to it. If a system is designed for user sovereignty then participation is entirely at the discretion of the user. Fully decentralized systems have no way of preventing an arbitrary user from participating in the system. Therefore, they allow users to create cliques that have isolated, private communication and interactions. No non-member can participate in the clique or even observe that the clique exists.

Membership isn’t just about group formation and protection. It also deals with preventing the correlation of the nodes in a group. The fact that a group of users is associated and communicates is often many times more valuable to an attacker than the information communicated. Fully decentralized systems allow for the formation of these groups without disclosing to any observers who is connecting to whom. Alice must be able to join with Bob and Charlie in such a way that Mallory can neither observe the group formation nor enumerate the members of the group.

Persistent State

The true value of any distributed system is to maintain some persistent state for the system as a whole. For instance, distributed file systems such as Tahoe-LAFS store a set of files spread out over a number of nodes such that the failure of a subset of nodes does not affect the availability of the data. Bitcoin distributes a copy of the Bitcoin blockchain to all of the nodes in the network and therefore has a fully decentralized persistent state solution. To be fully decentralized, all nodes in a network must be able to reproduce the full persistent state set. When applied to distributed systems other than blockchain, it is possible to create “metastability” where no one node is online all of the time but enough nodes are part of the system that the probability of at least one node being online at any given time approaches certainty. Then if that one node contains all of the persistent state of the system it will appear to the users of the system as if there is a centralized server somewhere that is always online.
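The “metastability” argument can be made concrete with a little probability. Assuming each node is online independently with the same probability p (a simplification, since real outages are often correlated), the chance that at least one of n nodes is online is 1 - (1 - p)^n. The function names below are hypothetical.

```python
import math


def availability(p_online, nodes):
    """Probability that at least one of `nodes` independent nodes is online."""
    return 1.0 - (1.0 - p_online) ** nodes


def nodes_needed(p_online, target):
    """Smallest number of nodes whose combined availability reaches `target`."""
    if not (0.0 < p_online < 1.0 and 0.0 < target < 1.0):
        raise ValueError("p_online and target must be strictly between 0 and 1")
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - p_online))


# Nodes that are each online only half the time.
for count in (1, 2, 7):
    print(count, availability(0.5, count))

print(nodes_needed(0.5, 0.99))
```

Even unreliable nodes add up quickly under this assumption. If many nodes share a time zone or a power grid, their outages are correlated and the real figure will be lower.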

Conclusion

Those are the nine problems. So how do they fit into the principles of user sovereignty? For any distributed system the programmers/architects have to decide how they will solve each one. There are a number of solutions for these problems with some being more centralized and power-asymmetric and others that are decentralized and user-sovereign. For a system to be fully user-sovereign it has to have a decentralized solution for every problem.

There is one other thing that is interesting about this model: it predicts economic/business opportunities. When a popular distributed system does not have a solution for one of the problems, it creates an opening for a corporation to capture the users and make money. Companies may follow the principles of user sovereignty and offer paid edge services that won’t threaten the overall system or violate the users’ trust. However, companies may also build centralized walled gardens and threaten the autonomy of the users and the entire ecosystem that depends on the system.

If you are concerned about the latter, design for the former.
