Merkle Tree
Incremental Merkle tree implementation for commitment storage
The Merkle tree is the data structure at the heart of ZKMix's privacy model. It stores all deposit commitments as leaves and enables any depositor to prove membership in the set without revealing which leaf is theirs. ZKMix uses an incremental Merkle tree with Poseidon hashes, optimized for on-chain storage efficiency on Solana.
Why a Merkle Tree
A zero-knowledge mixer requires a mechanism to prove set membership: "my commitment is one of the N commitments that have been deposited." There are several ways to prove set membership in zero knowledge, but Merkle trees offer the best trade-off between proof complexity and scalability:
Merkle tree approach: The prover demonstrates knowledge of an authentication path (20 sibling hashes for a depth-20 tree) from their leaf to the publicly known root. The circuit verifies 20 hash computations, regardless of the total number of leaves. This means the proof size and proving time are O(log N) in the number of deposits, while supporting over one million deposits.
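To make the circuit's job concrete, the following is a minimal native-Rust sketch of the walk it performs. This is illustration only: in ZKMix the check runs inside the SNARK circuit, and the `ark_bn254::Fr` field type and the 2-to-1 `poseidon` closure are stand-in assumptions.

```rust
use ark_bn254::Fr;

/// Fold the authentication path from the leaf up to the root: 20 Poseidon
/// hashes for a depth-20 tree, independent of the number of leaves.
fn verify_merkle_path(
    leaf: Fr,
    siblings: &[Fr],    // one sibling hash per level, leaf to root
    path_bits: &[bool], // true = the path node is the right child here
    root: Fr,
    poseidon: impl Fn(Fr, Fr) -> Fr, // assumed 2-to-1 Poseidon
) -> bool {
    let mut node = leaf;
    for (&sibling, &is_right) in siblings.iter().zip(path_bits) {
        // Order the pair according to the node's position at this level.
        node = if is_right {
            poseidon(sibling, node)
        } else {
            poseidon(node, sibling)
        };
    }
    node == root
}
```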
Alternatives considered:
- Accumulator-based proofs (RSA accumulators): Membership witnesses are constant-size, but the modular exponentiation operations are extremely expensive inside a SNARK circuit, leading to millions of constraints.
- Bloom filters: Probabilistic and allow false positives, which is unacceptable for a financial protocol where false membership proofs would enable theft.
- Flat list with linear scan: Proving membership requires iterating over all commitments inside the circuit, making the constraint count O(N). This is infeasible for any non-trivial number of deposits.
The Merkle tree achieves O(log N) proof complexity with O(N) storage, which is the optimal trade-off for this application.
Incremental Merkle Tree Design
An incremental Merkle tree (also called an append-only Merkle tree) is a Merkle tree where leaves are inserted sequentially from left to right, and no leaf is ever modified or removed after insertion. This property is critical for ZKMix because it allows the on-chain program to maintain only the minimal state needed to compute the next insertion, rather than storing the entire tree.
Structure
The tree has a fixed depth of 20 and a fixed number of leaves: 2^20 = 1,048,576. All leaves are initialized to a fixed "zero value" (the hash of an empty leaf). The zero value propagates up the tree: at each level, the default hash is Poseidon(defaultHash[level-1], defaultHash[level-1]). These default hashes are precomputed and hardcoded.
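As an illustration, here is a minimal sketch of that precomputation, assuming an `ark_bn254::Fr` field type and a 2-to-1 `poseidon` helper; the real program ships the resulting 21 constants as hardcoded values.

```rust
use ark_bn254::Fr;

/// Precompute the default ("zero") hash for every level, bottom-up.
/// zeros[0] is the empty-leaf value; zeros[20] is the root of an empty tree.
fn default_hashes(zero_value: Fr, poseidon: impl Fn(Fr, Fr) -> Fr) -> [Fr; 21] {
    let mut zeros = [zero_value; 21];
    for level in 1..=20 {
        // An empty subtree at `level` hashes two empty subtrees below it.
        zeros[level] = poseidon(zeros[level - 1], zeros[level - 1]);
    }
    zeros
}
```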
```
Level 20 (Root):        H( H(...), H(...) )
                         /            \
Level 19:            H(...)          H(...)   <- default hash at level 19
                     /    \          /    \
   ...             ...    ...      ...    ...
Level 1:        H(L0,L1) H(L2,L3)  ...    ...
                 /   \    /   \
Level 0:        L0   L1  L2   L3   ...        (1,048,576 leaves)
```

On-Chain State
The on-chain program stores the following state for the incremental Merkle tree:
```rust
pub struct MerkleTree {
    /// Current number of inserted leaves (next insertion index)
    pub next_index: u32,
    /// The "filled subtrees" array: filled_subtrees[i] holds the hash of
    /// the most recent left-child node at level i, i.e. the left subtree
    /// that the next right sibling at that level will be paired with.
    /// Length = tree depth (20)
    pub filled_subtrees: [Field; 20],
    /// Current root of the tree
    pub current_root: Field,
}
```

The filled_subtrees array is the key optimization. When inserting a new leaf, the algorithm only needs to recompute hashes along the path from the new leaf to the root. At each level, it either uses the newly computed hash or the stored filled_subtrees[level] value as the left or right child, depending on whether the current index's bit at that level is 0 or 1.
Insertion Algorithm
```
function insert(leaf):
    currentIndex = nextIndex
    currentHash = leaf
    for level in 0..depth:
        if currentIndex % 2 == 0:
            // Path node is a left child at this level:
            // store it as the filled subtree, and use the
            // default (zero) hash as the right sibling
            filledSubtrees[level] = currentHash
            currentHash = Poseidon(currentHash, zeros[level])
        else:
            // Path node is a right child at this level:
            // the left sibling is the stored filled subtree
            currentHash = Poseidon(filledSubtrees[level], currentHash)
        currentIndex = currentIndex / 2
    currentRoot = currentHash
    nextIndex = nextIndex + 1
```

This algorithm computes exactly 20 Poseidon hashes per insertion (one per level), regardless of the tree size. It modifies at most 20 elements of the filled_subtrees array and updates the root. The total on-chain computation per deposit is therefore constant.
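For concreteness, here is a Rust sketch mirroring the pseudocode. It is not the production code: `Fr` from `ark_bn254` stands in for the program's `Field` type, and `zeros` and `poseidon` are assumed inputs (the precomputed default hashes and the 2-to-1 hash).

```rust
use ark_bn254::Fr;

const DEPTH: usize = 20;

/// Mirrors the on-chain MerkleTree state described above.
pub struct IncrementalTree {
    pub next_index: u32,
    pub filled_subtrees: [Fr; DEPTH],
    pub current_root: Fr,
}

impl IncrementalTree {
    /// Insert a leaf, returning the index it was stored at.
    /// Performs exactly DEPTH (20) Poseidon hashes.
    pub fn insert(
        &mut self,
        leaf: Fr,
        zeros: &[Fr; DEPTH],
        poseidon: impl Fn(Fr, Fr) -> Fr,
    ) -> Result<u32, &'static str> {
        if self.next_index >= 1u32 << DEPTH {
            return Err("tree is full");
        }
        let inserted_at = self.next_index;
        let mut idx = inserted_at;
        let mut hash = leaf;
        for level in 0..DEPTH {
            if idx % 2 == 0 {
                // Left child: park it for the future right sibling and
                // pair it with the default hash for now.
                self.filled_subtrees[level] = hash;
                hash = poseidon(hash, zeros[level]);
            } else {
                // Right child: the matching left subtree was parked earlier.
                hash = poseidon(self.filled_subtrees[level], hash);
            }
            idx /= 2;
        }
        self.current_root = hash;
        self.next_index += 1;
        Ok(inserted_at)
    }
}
```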
Storage Cost
Each field element in the BN128 scalar field is 32 bytes. The on-chain Merkle tree state consists of:
| Field | Size |
|---|---|
| next_index | 4 bytes |
| filled_subtrees (20 elements) | 640 bytes |
| current_root | 32 bytes |
| Total | 676 bytes |
This is remarkably compact. The entire Merkle tree state for a million-leaf tree fits in a single Solana account with less than 1 KB of data. The rent-exempt minimum for this account is approximately 0.003 SOL.
Tree Depth: 20 Levels
The tree depth of 20 was chosen to balance capacity, circuit complexity, and proving time:
| Depth | Max Leaves | Circuit Constraints | Proving Time (snarkjs) | Authentication Path Size |
|---|---|---|---|---|
| 16 | 65,536 | ~12,000 | ~6s | 512 bytes |
| 20 | 1,048,576 | ~16,000 | ~10s | 640 bytes |
| 24 | 16,777,216 | ~20,000 | ~15s | 768 bytes |
| 32 | 4,294,967,296 | ~26,000 | ~22s | 1,024 bytes |
Depth 20 supports over one million deposits per pool, which provides a large anonymity set. The constraint count increase from depth 16 to 20 is modest (~4,000 additional constraints, or 4 additional Poseidon hashes), and the proving time increase is acceptable. Going beyond depth 20 provides diminishing returns: the anonymity set is already large enough that the bottleneck shifts to actual usage volume rather than tree capacity.
If a pool reaches its maximum capacity of 1,048,576 deposits, a new pool must be deployed. The protocol can manage multiple pools through a registry program.
Poseidon Hash Function
Why Poseidon
The choice of hash function is critical in ZK-SNARK circuits because hash computations dominate the constraint count. ZKMix uses the Poseidon hash function, which is specifically designed for efficient arithmetic circuit implementation.
Poseidon vs. SHA-256: SHA-256 operates on bits and uses boolean operations (XOR, AND, rotations). Expressing these in an arithmetic circuit over a prime field requires decomposing field elements into individual bits and simulating boolean operations with arithmetic constraints. A single SHA-256 invocation costs approximately 25,000 R1CS constraints, so a depth-20 Merkle path plus the leaf hash (21 invocations) would require ~525,000 constraints, making proof generation impractically slow (minutes instead of seconds).
Poseidon vs. Pedersen hash: Pedersen hashes are based on elliptic curve scalar multiplication and are relatively efficient in circuits (~1,500 constraints per hash). However, Poseidon is even more efficient (~700 constraints per hash in the 2-to-1 configuration) and operates natively over the field without requiring elliptic curve point arithmetic. Pedersen hashes also have a more complex implementation and are less widely adopted in modern ZK protocols.
Poseidon vs. MiMC: MiMC is another arithmetically-friendly hash function, but it has a higher number of rounds (typically 110+ for 128-bit security over BN128) and uses the simpler x^3 S-box, which has weaker resistance to certain algebraic attacks. Poseidon's use of x^5 with fewer rounds (8 full + 57 partial for width 3) achieves the same security level with fewer total constraints. Additionally, MiMC's security margins have been questioned by some cryptanalytic results, while Poseidon's security analysis is more mature.
Poseidon Specification
ZKMix uses Poseidon with the following parameters, consistent with the widely adopted configuration from the Poseidon paper and implementations in circomlib:
| Parameter | Value |
|---|---|
| Field | BN128 scalar field (p ~ 2^254) |
| S-box | x^5 |
| Width (t) | 3 (for 2-to-1 hashing) |
| Full rounds (R_F) | 8 (4 at start, 4 at end) |
| Partial rounds (R_P) | 57 |
| Security level | 128 bits |
| Capacity | 1 field element |
| Rate | 2 field elements (for 2-to-1 hash) |
The Poseidon permutation applies a sequence of rounds to a state vector of t field elements:
- Add round constants: Each state element is added to a pre-determined round constant.
- S-box layer: In full rounds, the S-box (x^5) is applied to every state element. In partial rounds, it is applied to only the first element.
- MDS matrix multiplication: The state vector is multiplied by a Maximum Distance Separable (MDS) matrix, providing diffusion across all state elements.
The round constants and MDS matrix are generated deterministically from a seed string using the Grain LFSR, ensuring there are no "nothing up my sleeve" concerns.
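A structural Rust sketch of this round sequence for width t = 3 follows. It is illustrative only: the round constants and MDS matrix are taken as parameters, whereas real implementations hardcode the Grain-LFSR-derived values, and `Fr` is `ark_bn254::Fr`.

```rust
use ark_bn254::Fr;
use ark_ff::Zero;

const T: usize = 3;    // state width
const R_F: usize = 8;  // full rounds (4 at the start, 4 at the end)
const R_P: usize = 57; // partial rounds

/// Structural sketch of the Poseidon permutation. `round_constants` must
/// provide one row of T constants per round (R_F + R_P rows in total).
fn poseidon_permutation(
    state: &mut [Fr; T],
    round_constants: &[[Fr; T]],
    mds: &[[Fr; T]; T],
) {
    for r in 0..(R_F + R_P) {
        // 1. Add round constants to every state element.
        for i in 0..T {
            state[i] += round_constants[r][i];
        }
        // 2. S-box layer: every element in the first and last 4 (full)
        //    rounds, only element 0 in the 57 middle (partial) rounds.
        let full_round = r < R_F / 2 || r >= R_F / 2 + R_P;
        if full_round {
            for i in 0..T {
                state[i] = sbox(state[i]);
            }
        } else {
            state[0] = sbox(state[0]);
        }
        // 3. MDS matrix multiplication for diffusion across the state.
        let prev = *state;
        for i in 0..T {
            let mut acc = Fr::zero();
            for j in 0..T {
                acc += mds[i][j] * prev[j];
            }
            state[i] = acc;
        }
    }
}

/// x^5 via two squarings and a multiply, the decomposition behind the
/// constraint count discussed in "Constraint Cost" below.
fn sbox(x: Fr) -> Fr {
    let x2 = x * x;
    let x4 = x2 * x2;
    x4 * x
}
```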
Constraint Cost
A single Poseidon 2-to-1 hash invocation in Circom decomposes into:
- 8 full rounds x 3 S-box applications = 24 fifth-power computations
- 57 partial rounds x 1 S-box application = 57 fifth-power computations
- Each x^5 S-box naively costs 3 multiplication constraints (x^2 = x * x, x^4 = x^2 * x^2, x^5 = x^4 * x), but the final multiplication can be folded into the round structure, yielding ~2 constraints per S-box on average
- MDS matrix multiplication: free in R1CS (linear operations are absorbed into the constraint wiring)
- Round constant addition: free in R1CS (constant additions are absorbed)
Total: approximately 700 R1CS constraints per 2-to-1 Poseidon hash invocation. This is roughly 35x cheaper than SHA-256 in a circuit, which is why Poseidon is the standard choice for ZK-SNARK Merkle trees.
Commitment Insertion
When a user deposits funds into ZKMix, the following process inserts their commitment into the Merkle tree:
Commitment Construction (Client-Side)
1. Generate random `secret` (248 bits) and `nullifier` (248 bits) from a CSPRNG.
2. Compute `commitment = Poseidon(nullifier, secret)`.
3. Encode the `secret`, `nullifier`, pool address, and denomination into a deposit note string.
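A client-side sketch of steps 1 and 2, assuming `ark_bn254::Fr` and a 2-to-1 `poseidon` helper (the note encoding of step 3 is omitted):

```rust
use ark_bn254::Fr;
use ark_ff::PrimeField;
use rand::rngs::OsRng;
use rand::RngCore;

/// Generate a fresh (nullifier, secret, commitment) triple. 31 random bytes
/// give 248 bits, always below the ~2^254 BN128 scalar modulus, so no
/// rejection sampling is needed.
fn make_commitment(poseidon: impl Fn(Fr, Fr) -> Fr) -> (Fr, Fr, Fr) {
    let random_field = || {
        let mut bytes = [0u8; 31]; // 248 bits from the OS CSPRNG
        OsRng.fill_bytes(&mut bytes);
        Fr::from_le_bytes_mod_order(&bytes)
    };
    let nullifier = random_field();
    let secret = random_field();
    let commitment = poseidon(nullifier, secret);
    (nullifier, secret, commitment)
}
```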
On-Chain Insertion
1. The deposit instruction receives the `commitment` as an argument.
2. The program verifies `next_index < 2^20` (the tree is not full).
3. The program executes the incremental insertion algorithm (described above), computing 20 Poseidon hashes to update `filled_subtrees` and derive the new root.
4. The program updates `current_root` and increments `next_index`.
5. The program appends the new root to the root history buffer.
6. The program emits a `Deposit` event containing:
   - `commitment`: the inserted leaf value
   - `leafIndex`: the index where the commitment was inserted (equal to the old `next_index`)
   - `timestamp`: the block timestamp
The `leafIndex` emitted in the event is essential for the client to reconstruct the Merkle authentication path during withdrawal. The client must know which leaf position corresponds to their commitment to compute the correct `pathElements` and `pathIndices`.
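For illustration, an Anchor-style sketch of such an event; the exact field layout in ZKMix may differ, and `[u8; 32]` assumes the commitment is serialized as its 32-byte field-element encoding.

```rust
use anchor_lang::prelude::*;

/// Sketch of the Deposit event with the three fields listed above.
#[event]
pub struct Deposit {
    pub commitment: [u8; 32],
    pub leaf_index: u32,
    pub timestamp: i64,
}

// Inside the deposit handler, after the insertion, the program would emit:
// emit!(Deposit {
//     commitment,
//     leaf_index: inserted_at,
//     timestamp: Clock::get()?.unix_timestamp,
// });
```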
On-Chain Poseidon Implementation
Computing Poseidon hashes on-chain within Solana's compute unit budget is a significant engineering challenge. Each Poseidon 2-to-1 hash invocation costs approximately 5,000-8,000 compute units on Solana (depending on the implementation). With 20 hashes per insertion, the total is approximately 100,000-160,000 CU for the Merkle tree update alone, plus overhead for account deserialization, rent checks, and token transfers.
ZKMix implements Poseidon using optimized field arithmetic in the BPF instruction set, leveraging 64-bit limb representations for the 254-bit field elements and minimizing memory allocations. The deposit instruction requests 300,000 compute units to accommodate the full insertion plus token transfer logic.
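On the client side, the larger budget must be requested explicitly. A sketch using `solana-sdk`'s compute-budget instruction, where `deposit_ix` is a placeholder for the mixer's deposit instruction:

```rust
use solana_sdk::{compute_budget::ComputeBudgetInstruction, instruction::Instruction};

/// Prepend a compute-budget instruction so the deposit transaction
/// receives the 300,000 CU limit described above.
fn with_compute_budget(deposit_ix: Instruction) -> Vec<Instruction> {
    vec![
        ComputeBudgetInstruction::set_compute_unit_limit(300_000),
        deposit_ix,
    ]
}
```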
Root History
The Async Withdrawal Problem
In a naive implementation, the on-chain program would store only the current Merkle root and require withdrawals to reference this exact root. However, this creates a race condition: if a new deposit occurs between when a user generates their proof and when the withdrawal transaction is confirmed, the root changes and the proof becomes invalid.
On Solana, with block times of approximately 400ms and potentially high deposit throughput, this race condition would cause frequent withdrawal failures, severely degrading the user experience.
Solution: Root History Buffer
ZKMix stores a circular buffer of the last N Merkle roots (N = 30 by default). When processing a withdrawal, the on-chain program checks whether the proof's root matches any of the last 30 stored roots, rather than requiring an exact match with the current root.
```rust
pub struct RootHistory {
    /// Circular buffer of recent roots
    pub roots: [Field; 30],
    /// Index of the most recently inserted root
    pub current_index: u8,
}

impl RootHistory {
    pub fn add_root(&mut self, root: Field) {
        self.current_index = (self.current_index + 1) % 30;
        self.roots[self.current_index as usize] = root;
    }

    pub fn is_known_root(&self, root: Field) -> bool {
        self.roots.iter().any(|r| *r == root)
    }
}
```

Storage cost for the root history: 30 x 32 bytes = 960 bytes, plus 1 byte for the index = 961 bytes.
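On the withdrawal path, the root check then reduces to a single buffer membership test. A minimal sketch, with error handling simplified to a string to match the "unknown root" failure described below:

```rust
/// Accept any root present in the history buffer; otherwise surface the
/// "unknown root" error.
fn check_root(history: &RootHistory, proof_root: Field) -> Result<(), &'static str> {
    if history.is_known_root(proof_root) {
        Ok(())
    } else {
        Err("unknown root")
    }
}
```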
Root History Depth
The value N = 30 means a withdrawal proof remains valid as long as fewer than 30 deposits have occurred since the proof was generated. At typical deposit rates, this provides a window of several minutes to hours. If deposit activity is extremely high (e.g., 30 deposits within a few seconds), users may need to regenerate their proofs, but this is an unlikely scenario in practice.
If a proof's root is not found in the history buffer, the withdrawal transaction fails with an "unknown root" error. The client can detect this by checking the current root history before submission and regenerating the proof if necessary.
The root history depth is a configurable parameter. Setting it too low causes frequent proof invalidation; setting it too high increases the storage cost and lengthens the window during which withdrawals can reference stale roots. The latter does not weaken security: a membership proof against any historical root still proves that the commitment was deposited at some point.
On-Chain Storage Optimization
ZKMix employs several strategies to minimize on-chain storage costs:
PDA Layout
The Merkle tree state and root history are stored in a single PDA derived from the pool's denomination and asset type:
seeds = ["mixer", denomination.to_le_bytes(), asset_mint.as_ref()]This PDA contains:
| Section | Size | Description |
|---|---|---|
| Discriminator | 8 bytes | Anchor account discriminator |
| Authority | 32 bytes | Pool authority public key |
| Asset mint | 32 bytes | SPL token mint (or system program for SOL) |
| Denomination | 8 bytes | Deposit amount in base units |
| Next index | 4 bytes | Current leaf count |
| Filled subtrees | 640 bytes | 20 x 32-byte field elements |
| Current root | 32 bytes | Latest Merkle root |
| Root history | 961 bytes | 30 x 32-byte roots + 1-byte index |
| Total | ~1,717 bytes | Rent-exempt cost: ~0.012 SOL |
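For reference, a client-side sketch of deriving this PDA with `solana-sdk`; the function and parameter names are illustrative:

```rust
use solana_sdk::pubkey::Pubkey;

/// Derive the pool PDA from the seeds above. `program_id` is the mixer
/// program; `denomination` is the deposit amount in base units.
fn pool_address(program_id: &Pubkey, denomination: u64, asset_mint: &Pubkey) -> (Pubkey, u8) {
    Pubkey::find_program_address(
        &[b"mixer", &denomination.to_le_bytes(), asset_mint.as_ref()],
        program_id,
    )
}
```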
No Full Tree Storage
The complete Merkle tree with 1,048,576 leaves would require 2,097,151 nodes x 32 bytes = approximately 64 MB of storage. Storing this on-chain would cost over 400 SOL in rent-exempt deposits, which is entirely impractical.
Instead, ZKMix stores only the 676-byte incremental state on-chain and relies on clients to reconstruct the full tree from event logs. This is possible because every deposit emits an event with the commitment and leaf index. Clients fetch these events (via getSignaturesForAddress and getTransaction, or through an indexer) and rebuild the tree locally.
Event-Based Tree Reconstruction
The client reconstructs the tree as follows:
1. Fetch all `Deposit` events from the mixer program for the target pool.
2. Initialize an empty depth-20 Merkle tree with zero-value leaves.
3. Insert each commitment at its recorded leaf index.
4. Verify that the locally computed root matches the on-chain root (or one of the historical roots) as a consistency check.
5. Locate the user's commitment in the tree by scanning leaves.
6. Compute the authentication path from the user's leaf to the root (sketched below).
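A client-side Rust sketch of steps 2 through 6 follows. It rebuilds the tree level by level rather than replaying the incremental algorithm, and assumes `ark_bn254::Fr` plus the `poseidon` and `zeros` helpers as before:

```rust
use ark_bn254::Fr;

const DEPTH: usize = 20;

/// Rebuild the tree from event-derived leaves (ordered by leaf index) and
/// return the authentication path for `index`: one sibling hash per level
/// plus a bit saying whether the path node was the right child.
fn auth_path(
    leaves: &[Fr],
    index: usize,
    zeros: &[Fr; DEPTH],
    poseidon: impl Fn(Fr, Fr) -> Fr,
) -> (Vec<Fr>, Vec<u8>) {
    let mut level: Vec<Fr> = leaves.to_vec();
    let mut idx = index;
    let mut path_elements = Vec::with_capacity(DEPTH);
    let mut path_indices = Vec::with_capacity(DEPTH);
    for d in 0..DEPTH {
        // The sibling is the adjacent node, or the default hash when the
        // position lies past the filled frontier of the tree.
        let sibling = level.get(idx ^ 1).copied().unwrap_or(zeros[d]);
        path_elements.push(sibling);
        path_indices.push((idx & 1) as u8);
        // Hash adjacent pairs to form the next level, padding an odd tail
        // with the default hash for this level.
        let mut next = Vec::with_capacity((level.len() + 1) / 2);
        for pair in level.chunks(2) {
            let left = pair[0];
            let right = pair.get(1).copied().unwrap_or(zeros[d]);
            next.push(poseidon(left, right));
        }
        level = next;
        idx >>= 1;
    }
    (path_elements, path_indices)
}
```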
This reconstruction is computationally inexpensive (approximately 1 second for 100,000 deposits on a modern device) because each insertion requires only 20 hash computations, and the client can use the same incremental algorithm as the on-chain program.
For performance optimization, the client can cache the reconstructed tree locally and incrementally update it by fetching only new deposit events since the last synchronization. The SDK provides this incremental sync capability.