The Self-Sovereign Identity community has been saying ‘any day now’ for 10 years. Why has it stalled? Because many in the industry are greedy cargo cultists. They aren’t seeking answers to the why and how of decentralization and user sovereignty but are merely arranging cryptography on the sands of e-commerce, hoping for a legitimate use case to fly over and drop money upon them.
Absolute privacy on the internet is impossible today. Why? Because nobody with the power to create it wants it. The primary revenue model of the internet is based on surveillance, which aligns most investors against privacy. The worry that privacy will be used to shield criminals and fraudsters from regulators and law enforcement aligns many politicians and business people against it. Add the fact that current technology simply cannot support absolute privacy, and it is easy to understand why most engineers are also aligned against it. However, absolute privacy on the internet is not only possible, but the cryptographic techniques for enforcing it create many new business models with greater market opportunity, reduce global fraud, automate enforcement of regulations and laws, and secure distributed systems better than anything we have ever seen.
The goal of this article is to walk you through the recent innovations in both cryptography and decentralized systems design to show you a new way of thinking about information flow, security, and privacy in all digital systems. I call it “zero architecture” because systems shift from operating directly on data — personal or otherwise — to using zero-trust authentication and authorization to operate on zero-knowledge proofs derived from verifiably authentic data to achieve operational decision making with zero personally identifiable information (PII) being collected and stored. It affects everything from email and messaging to healthcare to fintech and the entire global e-commerce infrastructure. Companies focused on selling zero architecture solutions have a total addressable market that includes every computer system that creates and processes data.
This is the latest in a multi-year series on the nature of decentralization and the future of privacy and trust. The best prepared readers of this article are people already familiar with my previous writing outlining the principles of user sovereignty, a unified theory of decentralization, and the authentic data economy. For completeness, there is also a tangentially related article criticizing the current and problematic efforts at the W3C to build distributed identity systems that they believe will bring privacy, but won’t. Privacy is intimately linked to user sovereignty which is, in turn, a property of decentralization and I believe their efforts were doomed from the beginning because — despite popular belief to the contrary — the web was never decentralized.
Before we can talk about the future, we must first understand the past and present. We need a mental model that describes the nature of existing digital systems and their privacy characteristics before we can use that model to understand how everything is changing. So, hang on, here we go.
In broad terms, every digital system that exists today falls into one of two categories and the difference between the two is how regulation affects their design. I used to call these systems “class 1” and “class 2” systems but those are terrible names. I now call them “transparent” systems and “protected” systems respectively. The short explanation is that transparent systems are governed by regulation that requires the disclosure of user data, in the interest of transparency, and protected systems are governed by regulation that requires the protection of user data, in the interest of privacy. Examples of transparent systems are payment processing networks, stock trading platforms, and international remittance services. Examples of protected systems are social networks, email service providers, and everything else that isn’t a transparent system. If I had to guess, I would estimate that the world is roughly 20% transparent systems and 80% protected systems in terms of the amount of user data they contain.
The way I like to remember “transparent” systems is by picturing an underwater observation room at an aquarium with a massive glass wall between me and the water. Inside the aquarium I see groups of people in nice business attire swimming around like schools of fish, darting this way and that and gobbling up bits of pizza floating in the water. I picture the worst boss I ever had as that giant shark with gnarly teeth that cruises by and glares at me. I’m looking through the transparent window and I can see everything about everybody inside. Regulation requires transparency.
The way I like to remember “protected” systems is by picturing a prison transport bus with metal grating welded crudely over the windows and absurdly large and comical padlocks over the doors. I can see that the bus is full of people but I can’t quite identify any one person. The bus rattles along the road next to me, backfires, gears grind and black smoke pours out as the old engine struggles to carry all of the weight of the people inside. Regulation requires protection.
Obviously, transparent systems provide very little, if any, privacy for their users. The primary problem is that databases in these systems become high value targets. Security breaches at large retailers have led to millions of identities being stolen. Similar breaches at cryptocurrency exchanges and decentralized finance platforms have resulted in the theft of millions of dollars worth of cryptocurrency. Protecting user data in transparent systems is exceedingly difficult because as transactions flow through them, the user’s data flows along as well. Regulation requires that the personal data be checked and verified at each step of the transaction execution. This is called the “travel rule”. Along with all of the rampant fraud and identity theft, a new and ever increasing side effect of transparent systems is that they enable coordinated action across many companies in the financial system to deny individuals access to financial services. If you say the wrong thing on social media these days, you may find that you have lost your bank account, PayPal account, Venmo account, Stripe account, your credit cards no longer work and you cannot receive or process payments for your business.
Protected systems aren’t much better. They too suffer from being high value targets, with all the associated security problems, but they also carry the added regulatory burden of complying with privacy regulations such as the California Consumer Privacy Act (CCPA) and the General Data Protection Regulation (GDPR). Despite the intent of the regulation, the privacy of users is not really maintained because the systems themselves still gather and store user data. The regulation really only limits how that data is used to benefit the companies that run the systems. It inverts the power dynamic between users and the system, forcing users to surrender their privacy in exchange for access. Consent agreements and privacy policies oftentimes allow operators of protected systems to sell user data and/or form profit-driven partnerships with outside companies they share data with. Even though the primary interest of the regulation is user privacy, in reality users have no privacy, which is why I use the name “protected” instead.
The mental model I want you to take away from this is one of fish-like humans swimming around in schools in transparent systems and faceless dark figures sitting solemnly inside dilapidated prison buses of protected systems. The people in these mental images represent their private data; data about where they live, what they like, who they like, their political affiliation, sexual orientation, medical conditions, buying habits, and on and on and on. It doesn’t matter whether the system is transparent or protected, both kinds gather and store — and in most cases, share — our most private information. Therefore none of us have any real privacy on the internet as it is.
The primary fault in both kinds of systems is the gathering and storing of user data in the first place. If our very first act as users is to turn over our personal information then how do we ever expect to maintain any privacy or freedom of action as private citizens? We shouldn’t and we aren’t and no amount of privacy regulation will ever fix it. The only option we have is to use cryptography to enforce our privacy — with or without regulation — and then build new infrastructure and systems that work with cryptographic abstractions of our data instead of working with our data directly.
An abstraction, in this sense, is mediated access to the primary subject, or an object that stands in place of the primary subject; a proxy.
There are a number of recent innovations in the application of cryptography that open up a whole new approach to systems design I call — as mentioned in the introduction — “zero architecture”. Remember that zero architecture is about using zero-trust authentication and authorization along with zero-knowledge proofs to build systems that require zero personal information to operate. It is these abstractions that make zero-trust authentication and zero-knowledge proof presentation possible. When combined in the correct way, they prevent any disclosure of private data while ensuring that the operator of a service cannot ever correlate, or track, the clients of the service. In short, it is possible to end surveillance capitalism altogether.
The first part of zero architecture is focused entirely on moving away from authentication and authorization systems that uniquely identify the client of a service, while improving security and utility for systems designers. Traditionally, the most common authentication method is the humble username and password, combined with what is called an access control list for authorization. Authentication is how a client identifies itself to a system so that the system can authorize it to use the system in the ways recorded in the system’s access control list.
Obviously this presents many problems for any systems designer wishing to follow the principles of user sovereignty and preserve the absolute privacy of the clients. The new cryptographic abstractions that make zero-trust authentication and authorization possible are based on what are called capability tokens. A good physical analog to a capability token is the key to your home. Anybody can possess the key and the key always works regardless of who uses it. That at least takes care of the privacy aspect of accessing your home but what happens if somebody steals your keys? Well, the solution for that is what crypto-systems designers call “revocation”. In the house key example, revocation can be done by changing all of the locks on your home so that the stolen key can no longer be used to access your home.
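The house-key analogy can be sketched directly in code. The sketch below is a minimal bearer-token scheme of my own devising for illustration (not Oberon): the service stores only a hash of each issued token plus a revocation set, so possession of the token is the only credential, and “changing the locks” is just revoking a hash.

```python
import hashlib
import secrets

class CapabilityService:
    """Toy bearer-capability service: anyone holding a valid,
    unrevoked token is authorized; no identity is recorded."""

    def __init__(self):
        self.valid = set()    # hashes of issued tokens
        self.revoked = set()  # hashes of revoked ("re-keyed") tokens

    def issue(self) -> str:
        token = secrets.token_hex(32)  # the "house key"
        self.valid.add(hashlib.sha256(token.encode()).hexdigest())
        return token                   # handed to the client

    def authorize(self, token: str) -> bool:
        h = hashlib.sha256(token.encode()).hexdigest()
        return h in self.valid and h not in self.revoked

    def revoke(self, token: str) -> None:
        self.revoked.add(hashlib.sha256(token.encode()).hexdigest())

svc = CapabilityService()
key = svc.issue()
print(svc.authorize(key))   # True: a valid key opens the door
svc.revoke(key)             # "change the locks"
print(svc.authorize(key))   # False: the stolen key no longer works
```

Note the limitation this sketch shares with ordinary API tokens: the service sees the token at issuance and on every use, so uses are linkable. That is the gap Oberon closes.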
A recently proposed work item at the Applied Cryptography Working Group of the Decentralized Identity Foundation is open sourcing a new authentication and authorization approach called Oberon. Oberon makes it possible for a service provider to issue capability tokens to clients in such a way that the service provider never sees the value of the capability token. This prevents the service provider from ever impersonating the client. Oberon also relies upon zero-knowledge proof presentation of the capability token so that the token is never transmitted and never revealed. Instead of the client sending the token to the service — as is done with API tokens today — the client sends a zero-knowledge proof proving to the service that the client has a valid capability token issued by the service provider; this is called proof-of-knowledge.
The end result is that a client gets a capability token from a service provider securely, then they never reveal that token to anyone and they can access a service in such a way that the service provider only knows an authorized client is accessing their service but not which specific client it is. The service provider’s servers only require the public keys from the issuer to verify the validity of the Oberon proofs-of-knowledge and so compromise of a verifier does not allow the attacker to issue valid capability tokens. Furthermore, the issuance of capability tokens can be done in a decentralized way with multiple machines required to sign a token for it to be valid. This hardens the token issuance against compromise of a single computer leading to token mis-issuance.
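The proof-of-knowledge idea can be illustrated with a classic Schnorr protocol. This is not Oberon itself (Oberon uses pairing-based signatures and blind issuance, and the sketch below is linkable because the same public value X appears in every presentation), but it shows the core move: the client proves it knows the secret behind a public value without ever transmitting the secret. Group parameters are toy-sized for illustration, not production use.

```python
import hashlib
import secrets

# Toy group: multiplicative group mod a Mersenne prime (NOT secure sizes).
p = (1 << 127) - 1
g = 5
q = p - 1  # exponents are reduced modulo the group order

def keygen():
    x = secrets.randbelow(q)       # the client's secret value
    return x, pow(g, x, p)         # (secret, public X known to the verifier)

def challenge(t: int, X: int) -> int:
    # Fiat-Shamir: derive the challenge from a hash of the transcript
    return int.from_bytes(hashlib.sha256(f"{t}:{X}".encode()).digest(), "big") % q

def prove(x: int, X: int):
    r = secrets.randbelow(q)
    t = pow(g, r, p)               # commitment
    c = challenge(t, X)
    s = (r + c * x) % q            # response; reveals nothing about x by itself
    return t, s

def verify(X: int, t: int, s: int) -> bool:
    c = challenge(t, X)
    # g^s == t * X^c  holds exactly when s = r + c*x for the x behind X
    return pow(g, s, p) == (t * pow(X, c, p)) % p

x, X = keygen()
assert verify(X, *prove(x, X))     # the secret x is never sent over the wire
```

The verifier only ever sees (t, s), which are freshly randomized per proof; what Oberon adds on top of this basic shape is unlinkability across presentations.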
I have covered zero-knowledge proofs in other articles so I won’t go into too much detail about them here, other than to say that they are the primary kind of data that zero architecture systems operate on. To make this possible, zero-knowledge proofs have to be created from authentic data. Authentic data — which I describe in detail in The Authentic Data Economy — is data that comes with a cryptographic proof of where the data came from, to whom it was given, and that the data has not been modified. The proofs associated with authentic data form the backbone of zero architecture systems.
To better illustrate this, I will give you an example. Today, when somebody goes to a broker to apply for a loan, part of the process involves the broker getting permission to get the credit history and score of the person applying for the loan. The broker goes directly to the credit ratings agency to get the report because that is the only way they can trust the veracity of the data. Now the broker has the loan applicant’s entire credit history when they only need a fraction of the data to process the loan application.
A zero architecture approach to this is for the loan applicant to first get their credit history from a credit ratings agency as authentic data. This means that the credit ratings agency has digitally signed the data and has also published a non-revocation registry that they maintain to show the data is still accurate to the best of the credit rating agency’s knowledge. The digital signing keys and non-revocation registry from the credit ratings agency are tracked in a digital provenance log that they publish on their servers and anchor in an external proof-of-existence system such as a public blockchain (e.g. Bitcoin). The identity of the credit rating agency is assured through a process called know-your-business (KYB), where a regulated organization verifies the identity of the credit rating agency and then issues authentic data containing their verified details (e.g. name, address, etc.) along with the “root keys” the organization uses to digitally sign data. The KYB proof is also stored in the provenance log that tracks the agency’s digital signing keys, linking all digital signatures over credit reports to their verified KYB identity. All of this means that anybody can independently verify the identity of the signer on a credit report as well as have cryptographic proof that the signer still considers the data valid because it is still included in their published non-revocation registry.
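A provenance log can be sketched as a simple hash chain, where each entry commits to the one before it. This is my own minimal illustration, not a specification of any particular provenance-log format: the entries might record key rotations, KYB proof references, or non-revocation registry updates, and tampering with any entry breaks every later link.

```python
import hashlib
import json

def entry_digest(entry: dict) -> str:
    # canonical serialization so every party computes the same digest
    return hashlib.sha256(json.dumps(entry, sort_keys=True).encode()).hexdigest()

def append_event(log: list, event: dict) -> None:
    prev = entry_digest(log[-1]) if log else "0" * 64
    log.append({"prev": prev, "event": event})

def verify_log(log: list) -> bool:
    return all(log[i]["prev"] == entry_digest(log[i - 1])
               for i in range(1, len(log)))

log: list = []
append_event(log, {"type": "key_rotation", "new_key": "pk_2"})
append_event(log, {"type": "kyb_proof", "ref": "kyb-attestation-01"})
append_event(log, {"type": "registry_update", "root": "accumulator-root-7"})
print(verify_log(log))                     # True: the chain is intact
log[0]["event"]["new_key"] = "pk_evil"     # try to rewrite history...
print(verify_log(log))                     # False: every later link breaks
```

Anchoring the digest of the latest entry externally (as described above with Bitcoin) is what pins the whole history in time.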
For the careful readers among you, the little detail about anchoring in a proof-of-existence system is critical to the trust of authentic data. But wait, isn’t Bitcoin relatively slow? How can it handle so many pieces of authentic data being anchored when its transaction rate is so low? There’s another cryptographic abstraction at work called a cryptographic accumulator that allows any number of pieces of data to be added to the accumulator in such a way that it is possible to prove with cryptography that each piece is part of the final accumulator value. When using what are called pairing-based accumulators, the final accumulator value is just 32 bytes in size and can easily fit into a Bitcoin “null data transaction”. Provenance logs contain the proofs that they are in the accumulator and which Bitcoin transaction the accumulator is in. This establishes exactly when the provenance log existed and what state it was in when it was anchored. If the provenance log manages digital signing keys and KYB proofs, then we have cryptographic proof that those keys and identity existed at the time of the Bitcoin transaction. This is all that is needed for a fully decentralized method for verifying authentic data.
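The anchor-and-prove pattern can be illustrated with a Merkle tree, a simpler accumulator than the pairing-based ones described above: its proofs grow logarithmically rather than staying constant-size, but the shape is the same. Any number of items compress to a single 32-byte root, and each item carries a short proof of membership in that root.

```python
import hashlib

def H(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def build_levels(leaves):
    """All tree levels bottom-up; an odd node is paired with itself."""
    level = [H(l) for l in leaves]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2:
            level = level + [level[-1]]
        level = [H(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels

def root(leaves) -> bytes:
    return build_levels(leaves)[-1][0]   # 32 bytes: fits a null data output

def membership_proof(leaves, index):
    proof = []
    for level in build_levels(leaves)[:-1]:
        if len(level) % 2:
            level = level + [level[-1]]
        proof.append((level[index ^ 1], index % 2))  # (sibling, am-I-right-child)
        index //= 2
    return proof

def verify(root_hash: bytes, leaf: bytes, proof) -> bool:
    h = H(leaf)
    for sibling, is_right in proof:
        h = H(sibling + h) if is_right else H(h + sibling)
    return h == root_hash

logs = [b"provenance-log-%d" % i for i in range(1000)]
r = root(logs)                            # anchor this single value on-chain
pf = membership_proof(logs, 42)
print(verify(r, logs[42], pf))            # True: log 42 existed at anchor time
```

A thousand provenance logs, one on-chain value; each log stores only its own short proof path, exactly the division of labor the article describes.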
Back to the zero architecture loan application process. The loan applicant has their credit history as authentic data. Now, when they apply for the loan, instead of the broker seeing their credit history, the applicant gives them zero-knowledge proofs for all of the information the broker needs to decide on funding the loan. For instance, a broker wants to know the range in which the applicant’s credit score lands, such as between 500 and 600 or over 800. They don’t need to know the precise score, just the range to the 100’s. So the applicant shares a zero-knowledge “range proof” that their score is in a specific range. But wait, how does the broker verify that the range proof can be trusted? The applicant also supplies the broker with zero-knowledge proofs that the underlying data carries a signature created by the credit rating agency, without disclosing the actual digital signature or data. This is another form of the proof-of-knowledge discussed earlier in the section on zero-trust authentication and authorization.
In the end, the loan applicant won’t ever share any of their personal authentic data but instead only shares the zero-knowledge proofs necessary for the broker to decide the terms of the loan offer. Another key detail: the interaction between the applicant and the broker is itself a series of digitally signed messages recorded in a provenance log, of which both the applicant and the broker receive a copy. This creates a cryptographically secure record — authentic data again — of the entire transaction, including the proof presentations, the policy code the broker executed, and the zero-knowledge proofs the code operated on when coming to the final offer. The cryptographic record is stored for future auditing and potential litigation or regulatory action.
The previously discussed abstractions aren’t the only ones required to maintain absolute privacy online, but they are the ones that have had major breakthroughs recently, completing the set of abstractions necessary to make zero architecture possible. One of the primary weaknesses of zero-knowledge proofs is that they only preserve privacy if the service provider cannot link, or correlate, multiple interactions with the same client over time; used alone they can give a false sense of security. To put it another way, zero-knowledge proofs alone suffer from what I call the “20 questions problem”. The game 20 questions is where somebody thinks of something, anything, on earth and the other person asks only yes-or-no questions, attempting to narrow down the answer to the correct thing. Typically it takes 20 or fewer questions to get the correct answer for anything on earth.
Zero-knowledge proofs without anti-linking/anti-correlation work the same way. If a service provider asks a client whether they are 18 or older in one interaction, then asks them something else in a second interaction, and something else in a third, and so on, you can see how eventually the service provider will have enough data — even if it is range data or set data — to narrow down the identity of the client to a single person. All because they can link the zero-knowledge proofs from multiple interactions together as applying to the same client.
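The 20-questions problem is easy to demonstrate. The simulation below uses a made-up population and made-up predicates standing in for the answers one client truthfully proved: each “proof” individually reveals almost nothing, but if the service provider can link them to the same client, intersecting them collapses the anonymity set.

```python
import random

random.seed(7)

# Hypothetical population of clients with a few private attributes.
people = [{"age": random.randint(18, 90),
           "zip": random.randint(0, 9999),
           "income": random.randint(1, 10)}
          for _ in range(100_000)]
target = people[0]            # the client being tracked
decade = target["age"] // 10  # e.g. a range proof would reveal this bucket

# Each linkable interaction leaks one predicate the target satisfies.
proofs = [
    ("over 18",        lambda q: q["age"] >= 18),
    ("age decade",     lambda q: q["age"] // 10 == decade),
    ("zip region",     lambda q: q["zip"] // 1000 == target["zip"] // 1000),
    ("income bracket", lambda q: q["income"] == target["income"]),
    ("exact zip",      lambda q: q["zip"] == target["zip"]),
]

candidates = people
for name, pred in proofs:
    candidates = [q for q in candidates if pred(q)]
    print(f"after '{name}' proof: {len(candidates)} possible clients")
```

Five honest, individually harmless proofs take the anonymity set from one hundred thousand people down to a handful, which is the whole point of making presentations unlinkable.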
The Oberon protocol described in the zero-trust authentication and authorization section above prevents a service provider from linking multiple interactions with the same client, and therefore they cannot link zero-knowledge proofs together to break the client’s privacy. This is why the combination of all of these abstractions is necessary to preserve absolute privacy. We need authentic data, we need zero-knowledge proofs, we need the Oberon capability tokens, and we need them working together to achieve absolute privacy while preserving e-commerce and complying with regulations.
For the more technical readers thinking about how IP addresses and routing information break a client’s privacy, you are correct; however, that problem is already solved, and yes, everybody should always use the Tor network.
Creating transparent systems that allow for law enforcement and regulatory action over cryptographic abstractions instead of raw data is challenging. It requires a few new pieces to add to our mental model. Instead of a user submitting their information to a transparent system, they instead enlist the services of a trusted and regulated 3rd party — such as a bank — to verify their data and issue it back as authentic data. The user then creates zero-knowledge proofs that prove the underlying authentic data was confirmed by a regulated and known organization, that the issued authentic data has not been revoked, and that the issued authentic data contains the minimum amount of data required to meet the transparency regulations for the system. Along with the zero-knowledge proofs, they also provide a piece of verifiably encrypted data to the transparent system. Verifiable encryption is a method for encrypting data that also proves that a particular recipient can decrypt it and that the plaintext matches an expected format, such as a key or identifier. In this case we want to prove to the transparent system that the data can be decrypted by the trusted 3rd party that verified and issued the user their authentic data.
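A small version of verifiable encryption can be built from ElGamal plus a Chaum-Pedersen equality proof. In this sketch (toy group parameters and a simplified construction of my own for illustration, not any production scheme), the user encrypts a committed identity value M to the escrow agent's public key and proves, without revealing the encryption randomness, that the ciphertext really decrypts to M, so the transparent system can check the escrow is honest before accepting it.

```python
import hashlib
import secrets

# Toy group (NOT secure sizes): multiplicative group mod a Mersenne prime.
p = (1 << 127) - 1
g = 5
q = p - 1

def hch(*vals) -> int:
    # Fiat-Shamir challenge over the transcript
    data = ":".join(map(str, vals)).encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % q

# Escrow agent (e.g. the KYC vendor) has a long-term keypair.
y = secrets.randbelow(q)
Y = pow(g, y, p)

# The user's identity record is committed to as a public group element M.
m = secrets.randbelow(q)
M = pow(g, m, p)

# --- user: encrypt M to the escrow key (ElGamal) ---
k = secrets.randbelow(q)
c1 = pow(g, k, p)
c2 = (M * pow(Y, k, p)) % p

# Chaum-Pedersen proof that log_g(c1) == log_Y(c2 / M) == k,
# i.e. the ciphertext really is an encryption of M under Y.
w = secrets.randbelow(q)
t1, t2 = pow(g, w, p), pow(Y, w, p)
c = hch(c1, c2, t1, t2, M, Y)
s = (w + c * k) % q

# --- merchant / transparent system: verify without decrypting ---
c2_over_M = (c2 * pow(M, -1, p)) % p
ok = (pow(g, s, p) == (t1 * pow(c1, c, p)) % p and
      pow(Y, s, p) == (t2 * pow(c2_over_M, c, p)) % p)
print(ok)  # True: the ciphertext provably decrypts to M

# --- escrow agent, under a court order: decrypt and unmask ---
recovered = (c2 * pow(pow(c1, y, p), -1, p)) % p
print(recovered == M)  # True
```

Production verifiable encryption schemes additionally prove the plaintext is well-formed (a key, an identifier), but the escrow shape is the same: the merchant verifies, only the regulated third party can open.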
I know this is a lot to take in all at once but let me show you how it works with an example: a simple e-commerce purchase. A customer goes to merchant.com and wants to buy a t-shirt without revealing who they are; for now we’ll forget about payment and shipping, we’re just focusing on dealing with know-your-customer (KYC). Typically, as part of a purchase, the customer gives the merchant their name and address to associate with the purchase transaction. The merchant and/or the payments processor will gather and store this data because e-commerce is a transparent system. To achieve absolute privacy, the customer can instead receive what is called a KYC credential from a KYC service provider. The KYC credential is a piece of authentic data containing verified attributes, such as the customer’s name, address, etc. Instead of providing personal data to the merchant, the customer sends zero-knowledge proofs about their data.
The merchant never sees the customer’s data, so why should they trust the zero-knowledge proofs? Because of cryptography, that’s why. For a merchant to trust the zero-knowledge proofs, they must be certain that the zero-knowledge proofs are based on authentic data. That can be proven using proof-of-knowledge of a digital signature over the authentic data made by the KYC vendor as proof of their verification. Since the KYC vendor is known to the merchant, the merchant can verify the proofs and trust the conclusion. Along with the proofs, the customer gives the merchant some data that is verifiably encrypted for the KYC vendor. If there is ever a law enforcement or regulatory action where a judge issues a warrant for the unmasking of the customer, the merchant can provide the courts with the proofs and verifiably encrypted data they received from the customer. The courts can then go to the KYC vendor, have them decrypt the data, and then reveal the personal information of the customer. This is how we maintain absolute privacy while preserving — and even automating — law enforcement and regulation. For my American readers, this is true 4th Amendment privacy, and if it is deployed widely, it will end surveillance capitalism.
For those of you that are curious, to make the shipping and payment parts of e-commerce work, the customer just needs a bank account and an account with a private shipper such as FedEx. With the cryptographic abstraction approach, the customer can give the merchant what is called a “cryptographic capability” allowing them to withdraw the specified amount of funds and to also get a “blinded” shipping label from FedEx. The user never discloses anything other than their bank and shipper. They do not have to reveal their account numbers nor their address. The shipping label is “blinded” because it contains what appears to the merchant to be random data but to FedEx it tells them where to send the package.
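The blinded-label idea reduces to an indirection that only the shipper can resolve. This sketch is hypothetical (a real implementation would involve authenticated, cryptographically bound labels), but it shows why the merchant learns nothing: the label is just a random handle, and the handle-to-address mapping lives only at the shipper.

```python
import secrets

class Shipper:
    """Toy blinded-label service: the merchant sees only an opaque handle."""

    def __init__(self):
        self._labels = {}  # handle -> address, known only to the shipper

    def issue_label(self, address: str) -> str:
        handle = secrets.token_urlsafe(16)  # random-looking to the merchant
        self._labels[handle] = address
        return handle

    def route(self, handle: str) -> str:
        return self._labels.pop(handle)     # single-use: resolved once, then gone

shipper = Shipper()
label = shipper.issue_label("123 Main St")  # the customer obtains this privately
# the merchant attaches `label` to the parcel; it reveals nothing by itself
print(shipper.route(label))  # only the shipper resolves it to the address
```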
What we’re talking about is new payment “rails” and new shipping infrastructure that is no longer transparent but traceable. The use of cryptographic abstractions that link authentic data and zero-knowledge proofs to regulated, and known organizations such as the KYC vendor, bank, and shipper preserves the customer’s absolute privacy but creates a cryptographic paper trail that allows for the origin of all of the data to be traced and the customer’s information to be revealed if needed to enforce the law.
Transparency is no longer sufficient reason to build systems that gather and process personal information directly. The infrastructure for making the authentic data economy work also enables the transition from transparent to traceable through the use of cryptographic abstractions where we used to use the underlying data.
Protected systems are now my favorite because, if they are only profitable by gathering and monetizing private data, we now have the tools to force them out of business. We don’t need regulation. We only need adoption, and there is significant motivation for adopting this new approach beyond the privacy argument. The traceable nature of the cryptographic abstractions drastically reduces fraud, automates compliance with privacy and other consumer protection regulation, and reduces the high-value-target security risk. Those three things alone represent such a significant savings in overhead and risk that any company that adopts this approach will have an immediate advantage over their competitors. However, if a system’s only way of turning a profit is to gather and monetize personal data, then it will go the way of the dinosaur and start making oil soon.
Without any transparency requirements, there is simply no reason for any online services to collect any information from a customer for any reason. But what about marketing, customer research, and loyalty programs? The power dynamic is inverted and the customers are now entirely in charge. They can appear as a first time customer, every time. They can voluntarily give the merchant a “correlation value” that they will use each time they visit so the merchant knows they are a returning customer. There is even a technique where the merchant can issue a customer “token” and when the customer returns they prove they have a valid token without revealing the token. This is telling the merchant that they are a return customer without revealing which customer they are.
Two good friends of mine, Joyce and Doc Searls, are working on building infrastructure for doing what they call “intent casting”. If I understand it correctly, when used in a retail marketing sense, it allows customers to declare what they are looking to buy and then participate in a privacy-preserving market making function that matches up offers from merchants with the customers’ intent to buy. That perfectly fits with this new model of using 3rd party service providers such as banks, shippers, KYC service providers, and now market makers to preserve our privacy when interacting with digital systems. I think they have invented a whole new industry of market making service providers.
We live in an exciting time. Our understanding of trust and privacy, combined with recent advances in cryptography, now makes it possible to construct fully decentralized and user sovereign systems without sacrificing the efficiencies and convenience of automated online systems. In fact, zero architecture is an upgrade over the current ad hoc and messy systems. Not only do we move towards achieving absolute privacy and preserving our liberty as private citizens, but the systems we build actively prevent fraud, automate regulatory compliance, and provide efficient means for law enforcement to act. In a world where the internet is looking more and more like a force for destruction in society, there is hope that zero architecture can turn things around. There is little standing in the way, as the zero architecture building blocks are being open sourced right now at the Decentralized Identity Foundation, and the plan is to create open standards for the protocols and file formats at the IETF so that everybody can use them in their software and be compatible with everything else.