Scalability is still a point of concern for Bitcoin — but clever minds continue to produce innovative proposals to resolve bottlenecks.
One of the specific scalability concerns is the growth of the unspent transaction output (UTXO) data set held by Bitcoin nodes. This is the list of all (fractions of) bitcoin in existence and details on how it can be spent. Unfortunately, this list tends to grow over time, particularly as new users enter the system — they’ll have their coins reflected in UTXOs as well.
In an attempt to address this growing problem, Tadge Dryja of MIT Media Lab’s Digital Currency Initiative (and co-author of the Lightning Network white paper) has proposed a hash-based accumulator he calls the “Utreexo” solution.
A user’s bitcoin balance is the sum of all of their UTXOs. When you make a transaction, your wallet application will “rummage” through the collection of your UTXOs to find enough funds to cover the transaction. The transaction (when confirmed on the blockchain) then essentially “destroys” the UTXOs used and “creates” new UTXOs, which are now controlled by the recipient of the transaction and can be spent in their next transaction.
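To make the “rummaging” concrete, here is a minimal sketch of how a wallet might pick UTXOs for a payment. The `Utxo` class, the `select_coins` function and the largest-first strategy are illustrative assumptions, not how any particular wallet works; real wallets also account for fees and use more careful selection rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Utxo:
    txid: str    # transaction that created this output (hypothetical IDs below)
    index: int   # position of the output within that transaction
    value: int   # amount in satoshis


def select_coins(utxos, amount):
    """Pick UTXOs, largest first, until they cover `amount` (fees ignored)."""
    selected, total = [], 0
    for utxo in sorted(utxos, key=lambda u: u.value, reverse=True):
        if total >= amount:
            break
        selected.append(utxo)
        total += utxo.value
    if total < amount:
        raise ValueError("insufficient funds")
    # Whatever is left over comes back to the sender as a new "change" UTXO.
    return selected, total - amount


wallet = [Utxo("aa", 0, 50_000), Utxo("bb", 1, 20_000), Utxo("cc", 0, 5_000)]
spent, change = select_coins(wallet, 60_000)
```

In this example the wallet spends the 50,000 and 20,000 satoshi UTXOs to pay 60,000, and the remaining 10,000 would come back to the sender as a new UTXO.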
Bitcoin nodes also store the entire UTXO data set as a way to know which outputs from previous payments have not yet been spent, determining which UTXOs are still available. Whenever they receive a new transaction from the network, they check that transaction against the list of UTXOs to make sure that the coins that are being spent in the transaction really exist, and that they haven’t already been spent. If the transaction is valid, they update their UTXO data set.
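The node-side check described above can be sketched roughly as follows. This is a simplified model under stated assumptions: the `UtxoSet` class and its `apply` method are hypothetical names, and signature and script validation are left out entirely.

```python
class UtxoSet:
    """Toy model of a node's UTXO set: (txid, index) -> value in satoshis."""

    def __init__(self):
        self.unspent = {}

    def apply(self, txid, inputs, outputs):
        """Validate a transaction and, if valid, update the set.

        inputs:  list of (txid, index) outpoints being spent
        outputs: list of output values in satoshis
        """
        if len(set(inputs)) != len(inputs):
            return False  # same outpoint listed twice
        if any(outpoint not in self.unspent for outpoint in inputs):
            return False  # coin never existed or was already spent
        if sum(self.unspent[o] for o in inputs) < sum(outputs):
            return False  # tries to create more than it consumes
        for outpoint in inputs:
            del self.unspent[outpoint]            # "destroy" spent UTXOs
        for index, value in enumerate(outputs):
            self.unspent[(txid, index)] = value   # "create" new UTXOs
        return True


node = UtxoSet()
node.unspent[("coinbase", 0)] = 50
first = node.apply("tx1", [("coinbase", 0)], [30, 20])
replay = node.apply("tx2", [("coinbase", 0)], [50])  # double-spend attempt
```

The first transaction is accepted and replaces one UTXO with two; the second is rejected because the coin it references is no longer in the set.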
Bitcoin nodes, therefore, have two big data sets to store: the blockchain and the UTXO list. The blockchain is over 230 gigabytes at the time of writing, and it’s growing at a predictable pace. Users that prefer not to store this data can “prune” (delete) old blockchain data: They’ll still know which coins are spendable thanks to the UTXO list. However, there are already nearly 60 million UTXOs at present, with no true incentives in place to slow the list’s growth. And this data cannot be deleted because nodes need it to tell which new transactions are valid.
As the number of Bitcoin users is increasing and the UTXO data set is growing, it is becoming a significant factor in the growing requirements to run a full node. In the long term, this problem is perhaps even bigger than the growth of the blockchain.
Bitcoin developers have been aware of the UTXO problem for years and have proposed several solutions. Recent research has proposed utilizing cryptographic accumulators, a “cousin” of the Merkle Tree. Where a Merkle Tree hashes the data together into a single hash, an accumulator compresses a set of data into a single commitment of fixed size: a “root.” In combination with some other data, called a “proof,” an accumulator lets users check that a specific piece of data is “stored” within the root.
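The root-plus-proof idea can be illustrated with an ordinary Merkle tree. The sketch below is not Utreexo’s actual data format; the helper names (`h`, `parent`, `verify`) and the four placeholder leaves are assumptions made for illustration, and plain SHA-256 is used without any domain separation.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def parent(left: bytes, right: bytes) -> bytes:
    return h(left + right)


def verify(leaf: bytes, proof, root: bytes) -> bool:
    """proof: list of (sibling_hash, sibling_is_left), from the leaf upward."""
    node = h(leaf)
    for sibling, sibling_is_left in proof:
        node = parent(sibling, node) if sibling_is_left else parent(node, sibling)
    return node == root


# Build a four-leaf tree by hand.
a, b, c, d = (h(x) for x in (b"utxo-a", b"utxo-b", b"utxo-c", b"utxo-d"))
ab, cd = parent(a, b), parent(c, d)
root = parent(ab, cd)

# To prove "utxo-c" is in the tree, supply its sibling d, then the subtree ab.
proof_c = [(d, False), (ab, True)]
ok = verify(b"utxo-c", proof_c, root)
bad = verify(b"utxo-x", proof_c, root)
```

The verifier only needs the 32-byte root; everything else it needs arrives with the proof, which grows with the height of the tree rather than with the number of leaves.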
The idea behind Dryja’s Utreexo project is to create another type of pruning, but for the UTXO set. Utreexo uses an accumulator to create a root of the UTXO set. By storing only this root instead of the full UTXO set, a node can keep the RAM and disk storage it needs for the UTXO set at less than a kilobyte of data. Nodes that use this Utreexo accumulator are called “compact nodes.”
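To see why so little storage could suffice, consider a sketch in which the accumulator is organized as a collection of perfect Merkle trees and the node keeps only their roots. That layout is an assumption for illustration, as are the `Forest` class and its fields; deletion of spent UTXOs, which a real accumulator must also support, is omitted.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


class Forest:
    """Keeps only the roots of perfect binary trees, keyed by tree height."""

    def __init__(self):
        self.roots = {}       # height -> 32-byte root hash
        self.num_leaves = 0

    def add(self, leaf: bytes) -> None:
        node, height = h(leaf), 0
        # Two trees of equal height merge into one tree that is one level
        # taller, much like carrying a bit in binary addition.
        while height in self.roots:
            node = h(self.roots.pop(height) + node)
            height += 1
        self.roots[height] = node
        self.num_leaves += 1


forest = Forest()
for i in range(1000):
    forest.add(f"utxo-{i}".encode())

stored_bytes = 32 * len(forest.roots)
```

After 1,000 additions the forest holds six roots, one per set bit in the binary representation of 1,000, so the stored state is 192 bytes. The number of roots grows only logarithmically with the number of leaves.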
When a new transaction is created by a compact node and transmitted over the network, it sends the inclusion proof along with the transaction. From then on, each compact node forwards the transaction and the proof to other compact nodes. Since every compact node has the same accumulator state and root, the proofs are the same for each node. All the compact node does is forward the exact same message to its peers.
Once the transaction is included in a block, compact nodes discard all the proof data. This means that compact nodes can keep transactions and inclusion proofs in their mempool, but never actually need to write them to the hard drive.
But what happens when a transaction isn’t created by a compact node?
Although compact nodes can easily communicate these transactions to one another, not everyone will keep a compact node. To allow compact nodes to communicate with full nodes, Utreexo also introduces the idea of a “bridge node.” The bridge node stores the blockchain and the entire UTXO set.
This bridge node is an important “bootstrap” to the existing network. Compact nodes can’t receive transactions from standard full nodes directly because the standard node won’t send inclusion proofs. When a full node sends a transaction to a compact node, it first passes through the bridge node. The bridge node looks up the UTXO, builds a proof from its accumulator, and forwards both the transaction and proof to the compact node. The compact node checks the forwarded inclusion proof against its own accumulator root and accepts the transaction if the proof matches.
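The division of labor between the two node types can be sketched as follows. This is a toy model, not the Utreexo protocol: `BridgeNode`, `CompactNode`, `prove` and `accepts` are hypothetical names, the tree is a single perfect Merkle tree over a power-of-two number of placeholder UTXOs, and no networking is shown.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_levels(leaves):
    """All tree levels, bottom up. Assumes len(leaves) is a power of two."""
    levels = [[h(leaf) for leaf in leaves]]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([h(prev[i] + prev[i + 1]) for i in range(0, len(prev), 2)])
    return levels


class BridgeNode:
    """Stores every UTXO and the full tree, so it can build proofs."""

    def __init__(self, utxos):
        self.utxos = list(utxos)
        self.levels = build_levels(self.utxos)

    def prove(self, utxo):
        position = self.utxos.index(utxo)
        pos, proof = position, []
        for level in self.levels[:-1]:   # every level except the root
            proof.append(level[pos ^ 1])  # sibling at this level
            pos >>= 1
        return position, proof


class CompactNode:
    """Stores only the root and relies on proofs supplied by others."""

    def __init__(self, root):
        self.root = root

    def accepts(self, utxo, position, proof):
        node = h(utxo)
        for sibling in proof:
            # An odd position means this node is a right child.
            node = h(sibling + node) if position & 1 else h(node + sibling)
            position >>= 1
        return node == self.root


bridge = BridgeNode([f"utxo-{i}".encode() for i in range(8)])
compact = CompactNode(bridge.levels[-1][0])

position, proof = bridge.prove(b"utxo-5")
accepted = compact.accepts(b"utxo-5", position, proof)
rejected = compact.accepts(b"utxo-9", position, proof)
```

The bridge node does the expensive lookup once; the compact node needs only three sibling hashes and its stored root to confirm that the UTXO being spent exists.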
An important note is that the bridge node is only needed in one direction. Compact nodes are able to send transactions directly to full nodes. Although compact nodes store less data, the messages between compact nodes contain all of the data that full node messages do, in addition to the inclusion proofs. If a compact node wants to send a transaction to a full node, all it has to do is omit the inclusion proof and the standard full node will accept the transaction.
Utreexo does come with a trade-off.
There are concerns over the potential centralization of nodes if Utreexo were to become widely used. Bridge nodes would actually require more storage than traditional full nodes, since they would have to store not only the regular UTXO set, but also the inclusion proofs that they forward to compact nodes. The question remains whether or not there would be enough “good Samaritans,” with no direct financial incentives, to run these bridge nodes.
It should also be noted that Utreexo is still a work in progress. There are currently no concrete plans to have it included in Bitcoin Core, but as it is not a consensus change, any person or entity could proceed with it.