Another Side of Web3「Privacy」
0. Quick Takes
In the era of Web 3.0, users own the data on the chain, and the data is open, transparent and traceable. We seem to have found the utopia of freedom and equality. On the other side, how to protect privacy will be another holy grail.
Here is the mind map of this article:
We have no privacy in Web2
Satoshi devotes all of Section 10 of the Bitcoin white paper to describing the Bitcoin network’s privacy model. In the traditional banking model, access to information is limited to the parties involved and the trusted third party, which achieves a level of privacy. On a blockchain network, however, all transactions must be announced publicly, so Bitcoin maintains privacy by keeping public keys anonymous. There is usually no way to associate a randomly generated public key with a person (although we now have tools such as whale analysis to deduce this information).
In Satoshi’s example, the privacy design evolved from the banking model to the Bitcoin network. We can extrapolate from this example to the evolution of privacy design from Web2 to Web3. We assume that the underlying network of Web3 will be a blockchain network like Bitcoin. The privacy we are discussing is then based on the premise that transactions are public, data is open, and the network is decentralized.
We actually realized the importance of privacy on the Internet a long time ago. When you were introduced to the Internet, your elementary school information teachers and parents would tell you never to reveal your real name on the creepy Internet since you didn’t know who was behind the screen.
But we were also late to realize the importance of privacy on the Internet. Our clipboards are frequently read by third-party applications, our preferences and browsing activities are sent to Google Analytics from countless websites (if you hit F12 right now to open your browser’s developer tools and look at the Sources tab, you’ll probably see it), and our data is sold for a price. All of these steal our data and privacy under the table. In the last few years, we’ve realized that we’ve been robbed by some Web2 Internet companies for a long time, so we started using Telegram, DuckDuckGo, Mirror and other applications… Most importantly, with the crypto boom, privacy is finally being taken seriously by users and developers. In the open and user-driven Web3 era, privacy protection will be a standard.
1. What is Web3 Privacy?
Web3 Privacy = Confidentiality + Anonymity = Data Privacy + Identity Privacy + Computation Privacy
In the Web3 era, assuming that all our interactions and network traces are interactions with on-chain DApps, all our data will take the form of transactions and the information they contain. Take the transferFrom function in ERC-20 (with parameters _from, _to, _value) as an example: the transaction consists of the transfer sender, the transfer recipient, and the transfer amount. For such transactions, we can define anonymity, confidentiality, and privacy in the Web3 era.
- Anonymity: the real physical identity of the sender and the recipient of the transaction needs to be non-public, and the amount of the transfer can be public (only parameter _value can be known).
- Confidentiality: the amount of the transfer and other data need to be non-public, and the sender and receiver can be public (only _from and _to can be known).
- Privacy: anonymity + confidentiality, which means that everything about the transaction, including the sender, the recipient, and the transfer amount, needs to be non-public.
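The three definitions above can be made concrete with a small sketch. It only models which transferFrom fields an outside observer can see at each level; the `Transfer` and `PublicView` classes, the `public_view` function, and the level names are hypothetical and exist purely for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Transfer:
    sender: str     # corresponds to _from
    recipient: str  # corresponds to _to
    value: int      # corresponds to _value


@dataclass(frozen=True)
class PublicView:
    # None means "hidden from third parties"
    sender: Optional[str]
    recipient: Optional[str]
    value: Optional[int]


def public_view(tx: Transfer, level: str) -> PublicView:
    """Return what an observer can see under a given (hypothetical) level."""
    if level == "transparent":      # a plain ERC-20 transfer today
        return PublicView(tx.sender, tx.recipient, tx.value)
    if level == "anonymity":        # parties hidden, amount visible
        return PublicView(None, None, tx.value)
    if level == "confidentiality":  # amount hidden, parties visible
        return PublicView(tx.sender, tx.recipient, None)
    if level == "privacy":          # anonymity + confidentiality
        return PublicView(None, None, None)
    raise ValueError(f"unknown privacy level: {level}")
```

For example, `public_view(Transfer("0xA", "0xB", 100), "anonymity")` exposes only the amount, matching the first definition.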
On top of this, a further step towards privacy is giving Web3 users the right to choose whether or not to disclose data before a transaction is sent, so that users can actively choose to make a transaction private. Letting users choose privacy after a transaction has been submitted may be difficult to achieve given the tamper-evident nature of the blockchain.
2. Data Privacy (Confidentiality)
Data privacy, or confidentiality, consists of two main components: control and ownership of the data and confidentiality of the data content itself.
a) Control and Ownership of Data
User data should not be a product. The user needs to have absolute control and ownership of the data, and it is a manifestation of privacy to ensure the ownership of the data and to prevent unauthorized manipulation of the data by the platform.
In the Web2 era, the user is the product. Thinking back, don’t you think almost all search engines, e-commerce platforms, and video sites have countless ads? In the Web2 era, companies treated users as their property, and the users were the source of their advertising revenue. From cookie tracking to Google Analytics, every action you perform, every second you spend on a page, is tracked. And you’re probably tracked just because you checked a box on a privacy agreement that no one ever reads. In the future, as the Web becomes more pervasive, more users and data will flood the Internet, making it even more lucrative for companies that specialize in data theft. In the Web2 era, the default is that users have no privacy and no control or ownership of their data.
Data should not be a product, but information. In the Web3 era, the user has control over the data. Every interaction and transaction made, and the data generated, belongs to the user. Only the user can decide what to do with that data. It is also a sign of privacy that the user’s own will, not the application, decides what happens to the data.
Web2 platform applications are like farmers who give users a plot of land to live on and monitor it 24 hours a day while the users generate data. They make money from the data users generate. Web3 applications are more like personal butlers, helping you manage your data without snooping on it or misusing it behind your back. With the help of a public database like the blockchain, Web3 apps give control of the data back to the user, and all the app does is help the user filter and visualize the data better.
According to Vincent’s article on Web3 reshaping the value of data, users can derive value from all their data. In the Web2 era, data had value, but it was the property of the company, and the value was not owned by or distributed to the user. In the Web3 era, the data on the chain is a gold mine, and it is owned by the users. The more Web3 applications there are, the more data there is, and the bigger the gold mine.
Web3 users are free to take their data gold mine with them and surf the entire web in the Web3 era. Imagine banking data interoperability, social media data interoperability, video site data interoperability… You don’t have to imagine it, it’s already happening in Web3. Each of your footprints is left on the chain, with your address. When you switch to another DApp, you don’t have to start from scratch, because the previous data will always be yours and with you.
In Web2, when you use social media, you’re talking to someone else on Facebook’s centralized servers; in Web3, when you use social media, you’re talking to yourself, the data is stored in your own account, and all the app does is to go up the chain and grab the data you have.
Data ownership is not particularly relevant to data privacy, so I won’t go too far here, but I do recommend you read the article mentioned above.
- Arweave is a permanent storage blockchain. In the worst case scenario, Arweave can store data for 200 years (for an average human, that’s forever). Although there is no rival for permanent storage, the real value of Arweave lies in the permanent control and ownership of the data.
- The data uploaded to Arweave by Web3 users and developers can never be taken offline, and will always belong to the users and the entire decentralized network. Users have absolute control over the data because it cannot be taken down. Immortalizing data is only the first step; the real point is that the risk of data being taken down is shared equally by all participants across the entire Arweave storage network.
- NFTs uploaded to Arweave will never be deleted or lost, which is the real value of NFT ownership to users. An ever-present NFT and NFT ownership should be a necessary feature of an NFT.
- On top of the decentralized Arweave, a content platform made censorship-resistant through open source contract code (as data filtering and processing tools) and public on-chain data (as raw data) is a Library of Alexandria that can never be destroyed. This is the last mile of user data ownership in the Web3 era, that is, the perpetuation of data and ownership.
- Recently everFinance has made a Mirror search engine on Arweave. Any data is taken from Arweave, a decentralized network. Users will not have to worry about whether their favorite media platform will be taken down one day due to various pressures. If you want to BUIDL more platforms with permanent data from Arweave, you can try this open source library.
- Crypto Wallets
- Crypto wallets are a very important point of data control. As the gateway to the Web3 era, crypto wallets are as important as the Google search engine of Web2 (although crypto wallets do not collect and sell your data in the same way that Google does).
- All crypto wallets (MetaMask, BitKeep, etc.) embody Web3 users’ control over their data. Every transaction and on-chain action is signed or agreed to by the user. Users know exactly what data will be made public and what interactions will be recorded on the chain, without fear of being tracked by applications. This is one of the most widespread and most often overlooked aspects of privacy in Web3: Web3 users have long enjoyed the privacy experience of being in control of their data.
Data Control and Ownership ⇒ Better UX
With the Web3 trend, almost all user data is available at will, and it’s up to the project to decide what to do with it. Web2 developers compete for data first, fencing off a piece of land for users to enter and turning them into cows to be milked for data, rather than thinking about how to attract users with a better app. According to DuckDuckGo’s analysis of popular free Android apps, 96% of free apps on Android include third-party trackers, 87% of which send data to Google and 68% to Facebook. Because on-chain data is the same for everyone, Web3 developers have to compete on the UX of their products, which is the only way to increase the number of users.
In the Web3 era, the user is no longer the product, and the user’s data is no longer controlled and utilized by a single entity. The user’s data belongs to him or her, and the user has autonomous control and permanent ownership of all interactions. Web3 data also belongs to the entire transparent decentralized network. This is definitely a good thing for user privacy.
b) Confidentiality of Data Content
The confidentiality of data content is achieved through private transaction applications. By using techniques such as zero-knowledge proofs and token mixers, the input and output of the transaction and the amount of money can be kept private.
Data content confidentiality refers to the encryption or non-disclosure of the specific content of a transaction, or the user’s transaction history. We can consider that the data content confidentiality is reflected by hiding the input and output addresses of the transaction or obscuring the exact amount of the transaction.
Ethereum’s account system is inherently not “private”. When you go to claim an ENS airdrop, you expose your address through the contract interaction, and people can look at all your transactions. In real life, this is as if you went downstairs to buy a coffee and others could then look up your room number and everything you had bought at the hotel. It’s also similar to the way that the whereabouts of infected people in the Covid times are completely exposed (in some rare cases in certain countries). Such exposure is a good thing for the security of the entire health system and blockchain network, but it damages the privacy of individuals.
For example, the image below shows the address of a hacker who crawls the web for leaked private keys and then sweeps the funds as soon as the victim receives an airdrop. We can clearly see the process of his crime. Although it is righteous for us to examine his crime process, it also exposes his privacy…
A very simple and direct way to implement private transactions is to encrypt all accounts and transactions and then decrypt them. However, such a method is very expensive and time-consuming because it also involves the verification of transactions by the network.
Note that private transactions here only hide on-chain data that would otherwise be transparent. The privacy of funds moving onto and off the chain will be discussed in a later section.
- Aztec (zk.money)
- Aztec’s zk.money is a privacy zkRollup Layer2 on Ethereum.
- zk.money implements private transactions by dropping the Ethereum account system and switching to a UTXO system. Its ledger is made up of notes. A transaction is no longer a change in the state of two related accounts, but a change in the ownership of a note. Encrypting a note is much simpler than encrypting a transaction between accounts.
- Transactions on zk.money are invisible to third-party users. For the Aztec network as a whole, the validity of transactions is guaranteed by zero-knowledge proofs, which also prevent problems such as double-spending. The owner of a UTXO generates a proof that such a note exists in the system and that he or she owns it. The user does not have to show the actual amount of the transaction to prove that it is legitimate.
- The note ownership of zk.money is stored in two Merkle trees, one containing all notes ever created, and the other containing all notes ever destroyed. When a note is destroyed, instead of deleting the note from the first tree, it is simply added to the second tree.
- The workflow of zk.money is: the user deposits funds from the main network to Layer2 and generates proofs → the user sends funds on Layer2 (with privacy protection) → the user withdraws funds to the main network
- tornado.cash is a privacy on-chain token mixer on Ethereum, sort of like DASH’s anonymous transactions. It’s aptly named: put money into a tornado, then take it out, and nobody knows who sent it.
- tornado.cash also uses zero-knowledge proofs to hide the recipient account of a transaction. It breaks the link between the sender and the receiver by using a smart contract as a black box in the middle of the transaction. The sender provides a confidential hash value at the time of deposit, and the receiver (which can be the sender itself) only needs to provide a zkSNARK proof at the time of withdrawal to accept the deposit directly.
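The two-tree note bookkeeping described for zk.money above can be sketched as a toy model. Several things here are assumptions made for illustration: plain Python lists stand in for the Merkle trees, SHA-256 stands in for whatever hash a real rollup uses, and the names `NoteLedger`, `create`, and `spend` are hypothetical. Most importantly, `spend` sees the owner's secret directly, whereas a real system replaces that check with a zero-knowledge proof so that nobody learns which note was spent.

```python
import hashlib
from typing import List


def h(*parts: bytes) -> bytes:
    """Hash helper (SHA-256 is an illustrative choice)."""
    return hashlib.sha256(b"".join(parts)).digest()


class NoteLedger:
    def __init__(self) -> None:
        self.created: List[bytes] = []    # every note commitment ever created
        self.destroyed: List[bytes] = []  # a nullifier for every note ever spent

    @staticmethod
    def commitment(owner_secret: bytes, value: int, salt: bytes) -> bytes:
        # The commitment hides owner and amount behind a hash.
        return h(owner_secret, value.to_bytes(32, "big"), salt)

    def create(self, owner_secret: bytes, value: int, salt: bytes) -> bytes:
        c = self.commitment(owner_secret, value, salt)
        self.created.append(c)
        return c

    def spend(self, owner_secret: bytes, value: int, salt: bytes) -> bytes:
        c = self.commitment(owner_secret, value, salt)
        if c not in self.created:
            raise ValueError("note does not exist")
        nullifier = h(b"nullifier", c, owner_secret)
        if nullifier in self.destroyed:
            raise ValueError("note already spent")  # blocks double-spending
        # The note stays in `created`; only the second list grows.
        self.destroyed.append(nullifier)
        return nullifier
```

Because a spent note is never removed from the first list, an observer who sees a new nullifier cannot tell from the data structures alone which commitment it belongs to. The same commit-on-deposit, reveal-a-nullifier-on-withdrawal pattern is what the mixer description above relies on.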
In addition, there are also public chains for private transactions such as Monero, Zcash, Dash, etc., which basically achieve secrecy through techniques such as ring signatures, zero-knowledge proofs, and coin mixing.
We already have these stable privacy solutions for the basic token function of transfers. Web3 will be built around tokens with different values and utilities. Token transfers are only a small part of Web3 usage, but they are among the most privacy-revealing operations. In the Web3 era, our transactions will be private.
3. Computation Privacy (Anonymity and Confidentiality in Computation)
Computation privacy is a deeper extension of transactional privacy in data privacy. Computation privacy for smart contract execution is often achieved through cryptography, AI techniques, and trusted execution environments, but it is very difficult to achieve a perfect balance between performance and privacy.
Computation privacy goes one step beyond the private transactions of data privacy, and extends to Turing-complete smart contracts. The privacy protection of smart contracts focuses on the execution process of smart contracts, shielding the data and intermediate state involved in the execution from third parties and from the nodes executing the smart contracts. Computation privacy is divided into three main directions: cryptography (e.g., MPC), AI techniques (e.g., federated learning), and trusted execution environments (e.g., SGX).
Secure multi-party computation (MPC) is usually accomplished with the help of various underlying cryptographic building blocks, mainly Oblivious Transfer (OT), Garbled Circuit (GC), Secret Sharing (SS), and Homomorphic Encryption (HE).
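Of these building blocks, secret sharing is the easiest to show in a few lines. The following is a minimal sketch of additive secret sharing used to compute a sum without any party revealing its input. The modulus, the function names, and the "every party is honest" setting are assumptions for illustration, not a production protocol.

```python
import secrets
from typing import List

PRIME = 2**61 - 1  # modulus for the shares (an illustrative choice)


def share(value: int, n_parties: int) -> List[int]:
    """Split `value` into n random-looking shares that sum to it mod PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares


def reconstruct(shares: List[int]) -> int:
    return sum(shares) % PRIME


def secure_sum(private_inputs: List[int]) -> int:
    """Each party shares its input; party j only ever sees column j."""
    n = len(private_inputs)
    all_shares = [share(v, n) for v in private_inputs]
    # Party j adds up the j-th share of every input locally.
    partial_sums = [sum(row[j] for row in all_shares) % PRIME for j in range(n)]
    # Only the partial sums are published and combined.
    return reconstruct(partial_sums)
```

Any single share is a uniformly random number, so a party learns nothing about another party's input from the shares it receives; only the final total is revealed.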
Federated learning is a machine learning approach in which multiple participants or computational nodes train a model together efficiently, while guaranteeing information security during data exchange, protecting the privacy of end data and personal data, and ensuring legal compliance. In short, it is the process of improving a machine learning model together with other parties without exposing private data. Federated learning is more about AI.
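A toy version of this idea is federated averaging: each participant improves the shared model on its own data and sends back only the updated parameter, never the data itself. The sketch below assumes a one-parameter linear model y = w * x, one gradient step per round, and hypothetical function names; real systems add secure aggregation and other safeguards on top.

```python
from typing import List, Tuple

Dataset = List[Tuple[float, float]]  # (x, y) pairs that never leave the client


def local_update(w: float, data: Dataset, lr: float) -> float:
    """One gradient-descent step on the client's private data."""
    grad = sum(2 * x * (w * x - y) for x, y in data) / len(data)
    return w - lr * grad


def federated_round(w: float, clients: List[Dataset], lr: float) -> float:
    """The server averages client updates, weighted by local sample count."""
    total = sum(len(data) for data in clients)
    return sum(len(data) * local_update(w, data, lr) for data in clients) / total


def train(clients: List[Dataset], rounds: int = 30, lr: float = 0.05) -> float:
    w = 0.0
    for _ in range(rounds):
        w = federated_round(w, clients, lr)
    return w
```

With clients whose private data all follow y = 2x, the shared parameter moves towards 2 even though the server only ever sees model parameters.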
Trusted execution environment is mainly related to the underlying hardware. It is usually a trusted, isolated, confidential space within the CPU that is independent of the operating system. Since data processing takes place in the trusted space, the privacy of the data depends on the implementation of the trusted hardware. The main challenge is how to balance performance and privacy.
- Oasis Network
- Oasis Network is mainly benchmarked against Polkadot, layering consensus and computation, and using ParaTime parallel chains to handle computation. Oasis Network uses a trusted execution environment solution (SGX-based Confidential ParaTime) to achieve private computation. There is a good combination of performance and privacy in a layered and trusted execution environment.
- The ecosystem of Oasis Network is more advantageous in privacy computing. Also, Oasis Network is compatible with EVM, so the ecosystem is more scalable.
- The use cases of Oasis Network are mainly in data tokenization (pledging data, gaining revenue and some permission control) and as a high-performance EVM L2.
- The disadvantages of Oasis Network are: low composability, a complicated layered design, no communication between different ParaTimes, stateless contracts, inflexible applications, and still-vague application scenarios.
- PlatON Network
- PlatON Network is primarily a privacy + AI public chain project. Its main features are multi-party secure computing and AI. It also separates consensus from computation, with on-chain verification and off-chain computation (feels like SCP). Off-chain computing not only brings higher performance, but also allows for a variety of complex operations (especially in AI and machine learning).
- The main difference between Oasis Network and PlatON Network is that Oasis Network uses discrepancy detection + non-full node consensus, while PlatON Network uses full node consensus; PlatON guarantees the trustworthiness of off-chain computation by homomorphic encryption.
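Homomorphic encryption, mentioned above, means that a node can compute on ciphertexts without ever decrypting them. As a hedged illustration, here is textbook Paillier encryption, which is additively homomorphic. This is not PlatON's actual scheme, the primes are far too small to be secure, and the function names are made up for this sketch.

```python
import math
import secrets

# Toy key material: real deployments use primes of 1024 bits or more.
P, Q = 1009, 1013
N = P * Q
N2 = N * N
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # lcm(P - 1, Q - 1)
MU = pow(LAM, -1, N)  # modular inverse (requires Python 3.8+)


def encrypt(m: int) -> int:
    """Encrypt 0 <= m < N using the generator g = N + 1."""
    while True:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            break
    # (N + 1)^m mod N^2 equals 1 + m * N, so no big exponentiation is needed.
    return ((1 + m * N) * pow(r, N, N2)) % N2


def decrypt(c: int) -> int:
    return ((pow(c, LAM, N2) - 1) // N) * MU % N


def add_encrypted(c1: int, c2: int) -> int:
    """Multiplying ciphertexts adds the underlying plaintexts (mod N)."""
    return (c1 * c2) % N2
```

An untrusted node could call `add_encrypted` on two ciphertexts and return the result; only the key holder can decrypt it and learn the sum.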
Other projects include Secret Network, Phala Network, etc. ICP is also working on adding a trusted execution environment to achieve computation privacy.
Both transaction privacy and computation privacy are about protecting sensitive data from third parties’ prying eyes. Transaction privacy can be achieved directly through on-chain DApps combined with zero-knowledge proofs or token mixers, which is more pluggable and flexible than computation privacy. Computation privacy is more complex and must be designed into the entire blockchain network, and it also requires cross-chain solutions and an ecosystem to be developed.
4. Identity Privacy (Anonymity)
Identity privacy is anonymity, which consists of two main components: the separation of physical identity and digital identity, and the independence of digital identity.
a) Separation of Physical Identity and Digital Identity
The separation of physical and digital identities means separating the user’s real identity from the web identity. In Web1 we didn’t need to reveal our phone number and name to browse, but in Web2 we have to submit our information for KYC. This is a serious privacy invasion, but one that is still difficult to address at this stage.
The separation of physical identity from digital identity refers to the separation of a person’s real identity from their online identity, which is essentially the anonymization of a person’s identity during the process of entering the Internet.
It sounds easy to do, as long as the Bitcoin whale doesn’t reveal the name of the organization, and as long as the user doesn’t use their real name as their username or reveal their real geographic location. But in fact, this is one of the most difficult pain points for privacy. We can’t cleanly separate and protect our real identities: as long as we’re online, the telecom operator has access to our identities; as long as we buy things, the e-commerce platform has access to our identities; as long as we use ENS, we’re likely to be found out…
In addition, the various third-party logins on Web2 sites actually reveal a lot about our online whereabouts and real identity. For the sake of ease of use, we log in directly through Google or Facebook, which in fact contributes to the monopoly and centralization of these large companies, and also violates our own anonymity.
Probably the most relevant example I can think of for the separation of physical and digital identities is the dark web. We have to be thankful to the dark web: without networks like it adopting Bitcoin payments, the industry would probably not have grown as much as it has. What the dark web tries to do is to separate the real world from the digital world, but users who buy illegal items on the dark web still need them delivered by courier. So the dark web doesn’t really achieve that separation either.
The anonymity of physical identities and the separation from digital identities is really what Web1 is about, but on Web2 we are exposed to the spotlight. Not being able to use certain apps without filling in your name and phone number is an invasion of user experience and privacy.
b) Independent Digital Identity
On top of the separation between physical and digital identities, we can go deeper in terms of privacy by making our digital identities independent. If our digital identity is not independent, it is still “connected” to our real identity, and our privacy is still at great risk of leakage and exposure.
An authoritative body defines the Metaverse as a virtual world based on the Internet that is interconnected with the real world and exists parallel to it, a virtual space that maps the real world and is independent of it. Personally, I think this definition is very much in line with my ideal vision of the metaverse concept. This “virtual space that maps the real world and is independent of it” is a perfect combination of digitalization and privacy. We can enjoy the real world built through tens of thousands of years of history, but we can also be “reborn” in the metaverse, and become residents of the metaverse through an independent and private identity.
Digital Identity Socialization
The independence of digital identity is complemented by the data ownership that blockchain-based Web3 provides. On the basis of digital identity independence, we can develop independent digital social lives. In Web2, we have social or real-time communication products such as Twitter, Facebook, and Zoom, but the social relationships in these products are based on real identities and cannot be independent. Social networking built on independent digital identities would be a very important application of privacy and the most important part of the metaverse.
Digital Identity Formation and Revival
Web3 users can take all their data with them, making it easier to develop a digital identity. The interoperability of data naturally builds bridges between projects and connects lone islands of data. Through your degen score, CryptoPunks avatar, and governance experience on the chain, we can credibly and transparently cultivate an identity. This makes community formation more transparent and efficient, and makes it easier to socialize digital identities.
When you want to jump from a real identity to a digital identity, you can simply let your KOL or celebrity career be reborn as a digital identity with a new look. The prime example is virtual idols. Often we know exactly who the real person behind the digital identity is, but that person can still use the new digital identity to foster new communities and generate more memes. Of course, this is the opposite of privacy.
And when you no longer want a digital identity, you can just create a new account and start from scratch (provided, of course, that your real identity was never exposed through the digital identity), while in real life, trying to start over is far more dangerous. All socializing done with a digital identity stays in the digital world. What happens in Vegas stays in Vegas. Your real identity is not affected at all.
- Realy is a Metaverse project with street culture and urban landscape as the main theme, and proposes the concept of City DAO. Realy will transfer 3D virtual clothes, virtual concerts, and offline fashion brands to the chain, and will also hold virtual concerts and support users to do City governance in the Metaverse.
- The Metaverse of the future will definitely be a “virtual reality”, and on top of that, it needs to be able to exist independently without being overly dependent on the real world. What Realy brings is a complete representation of street culture on the chain, which is very attractive for young people. I personally like to experiment with different experiences in digital/metaverse identities, and my social accounts and games are all set to female gender. This opens up a myriad of possibilities for digital identity and social interaction.
- In Realy, you can freely embrace trendy culture regardless of gender and appearance, or digitally participate in virtual concerts regardless of geographic location, time and risk of epidemics. These are important ways for Web3 users to live and socialize. All of this can be done without the need for physical identity. All on-chain data, including trendy clothing purchases and concert attendance records, will be accompanied by your digital identity, and your image as an 88rising fan and fashion trendsetter will appear in other metaverse worlds.
In essence, socializing and privacy are somewhat contradictory. If you want to socialize, you have to be visible, and then you have to give up some of your privacy in order to exchange information with others.
But blockchain-based Web3 gives users control and ownership of their data, independence of their digital identity, and complete privacy of their physical identity. With a digital identity on the chain, we can interact with each other in the same way as in real life. The existence of a new digital identity also gives people a second life and a second way of living.
5. Paradox of Blockchain and Privacy
- Impossible trinity of performance, privacy and usability
Cross-chain application + privacy chains or decentralized applications that specialize in privacy can improve blockchain privacy without excessive loss of performance and availability.
- In an article by BluemountainLabs, it is mentioned that privacy protection needs to be integrated into the global underlying logic. Vitalik also says: “Only a global anonymous set is truly reliable and secure.” This means that perhaps only a global privacy protection on a blockchain network would be most effective.
- In practice, both Bitcoin and Ethereum sacrifice some privacy protections in order to preserve decentralization and keep computational costs down. In Bitcoin, anyone who tracks whale addresses can see their sales and transfers, which costs the whales some of their privacy. In Ethereum’s Design Rationale, the term “privacy” is only mentioned in the section on the account system and UTXO. Instead of UTXO, which offers more privacy and scalability, Ethereum adopted the account system for its higher performance and user-friendliness.
- In the impossible trinity of performance, usability, and privacy, the first thing that ordinary users and developers want to solve is performance, followed by usability, and finally privacy. In the age of Web3, perhaps the last darkness before the dawn will be the privacy issue.
- In this paradox, what we want is global privacy for the blockchain without excessive performance degradation.
- One of the most complete but also most bloated solutions is to make a dedicated privacy chain (e.g. Monero), and then adapt it with various cross-chain tools and various wallets. But this is a very bad user experience. It’s like if you want to post a picture of your tattoo on social media, and you don’t want your elders to see it, then you need to block them one by one, or even go to a new social media and only add your peers as friends. This is a pain in the ass in both Web2 and Web3 scenarios.
- One way to improve the user experience and performance, at the cost of giving up part of the global anonymity set, is to implement privacy operations through pluggable decentralized applications or Layer2 (such as zk.money). This allows users to enjoy privacy protection (and even additional performance benefits) without leaving the original blockchain network. Of the two, pluggable decentralized applications are a bit better than Layer2, because in my imagination, the distant and beautiful Web3 will definitely be multi-chain and interconnected. A pluggable multi-chain decentralized privacy application can be more flexible and more “decentralized” (not serving a single chain). In this regard, I am very optimistic that a more flexible design paradigm, such as SCP on Arweave, will enable privacy-related applications.
- Paradox of disclosure/immutability and privacy
It is not possible to withdraw or hide transaction information after the submission. However, users can decide whether or not to disclose data about a transaction by using a privacy option before the transaction is submitted (like StarkWare’s Volition).
- First of all, blockchain data is public. The input and output of transactions and their contents are usually visible in the blockchain browser. This makes data protection extremely difficult.
- Secondly, blockchain data is immutable. Once data is uploaded and written to a block, it cannot be tampered with. Users who submit a transaction that exposes their privacy cannot withdraw or hide it, and may have to simply discard the address to clear their name. This actually conflicts with the EU’s data protection regulation, the GDPR (all users have the right to be “forgotten”).
- The openness and immutability of data can actually be considered together. These are two fundamental qualities of blockchain that, like performance, we can never compromise on, and privacy is definitely an additional feature that comes after these two vital qualities. So openness and immutability can’t be eliminated (we can still solve anonymity and confidentiality, though).
- But what we can figure out is that these two qualities were created for the security and openness of the network. We actually have control and ownership of our data. Even if our transaction data can be withdrawn, it is likely to have been crawled by thousands of web crawlers before it is withdrawn, in which case the withdrawal may not make much sense. The Internet has memory, and an open and untamperable Internet is even more so.
- While it is not possible to retract or hide data after the fact, users can decide beforehand whether to make it public. This optional privacy is user-friendly and allows privacy to be used for real purposes. A similar solution is the Volition data availability model from StarkWare (whose StarkEx engine powers Immutable X), which allows users to choose whether to store their data on or off the chain (but be aware that if too much data lives off the chain, we drift back to the ways of Web2). So if you don’t want to make the data public, and want to keep some privacy on the public network, you can just choose to store the data off-chain.
- In fact, most of the time, your on-chain data is your own wealth and value, but for the small amount of data that needs to be private, I think it’s important to protect it with optional privacy options.
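The "choose before you submit" idea can be sketched as follows. This is a toy illustration of a per-transaction data availability switch, not StarkWare's Volition API: the `VolitionLedger` class, its methods, and the use of a bare SHA-256 digest as the on-chain commitment are all assumptions made for this example.

```python
import hashlib
from typing import Dict, List, Optional, Tuple


class VolitionLedger:
    def __init__(self) -> None:
        # Public, append-only record: (digest, payload or None if kept private).
        self.on_chain: List[Tuple[str, Optional[bytes]]] = []
        # Data the user chose to keep off the chain, indexed by digest.
        self.off_chain: Dict[str, bytes] = {}

    def submit(self, payload: bytes, public: bool) -> str:
        """The user picks the mode BEFORE submitting; it cannot be undone."""
        digest = hashlib.sha256(payload).hexdigest()
        if public:
            self.on_chain.append((digest, payload))
        else:
            # Only the commitment is public; the content stays off-chain.
            self.on_chain.append((digest, None))
            self.off_chain[digest] = payload
        return digest

    def is_committed(self, payload: bytes) -> bool:
        """Anyone holding the payload can check it against the chain."""
        digest = hashlib.sha256(payload).hexdigest()
        return any(d == digest for d, _ in self.on_chain)
```

One caveat of this simplified commitment: a bare hash of low-entropy data can be guessed by brute force, so a real design would at least add a random salt to the payload before hashing.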
No one knows what will happen with Web3, and most Web3 users have no idea what they want, just as it took Web2 users a long time to realize that their privacy was being violated so recklessly. The recent flood of funding for privacy projects has actually brought the need for privacy to the forefront and made more people aware that a better Internet in the future will require privacy.
In the Web3 era, we need data autonomy, data privacy, data computation privacy, and real-world identity privacy. At the same time, the best privacy applications ideally need to be user experience-centric, pluggable, and lightweight.
Web3 = Get + Post + Own. In the Web3 era, everyone will own privacy.