#051: 🚀 Scaling To A Million TPS

Tezos' Path to Massive Scalability: A Deep Dive into Its Data Availability Layer

Welcome to Just The Metrics

Read time: 5 minutes

Hey there, Just The Metrics fam! 😄

Our book “A 3 Step Assessment Framework of Layer 1 Blockchains” went live on Book.io and we are amazed by all your feedback and support! 💙

Thank you so much! If you are curious, check it out here 👇

Okay, now let’s dive into today's topics. This is what we have for you today:

  • Tezos' Path to A Million TPS

  • 💎 Gem of the Week

TL;DR

  • Tezos is a self-amending cryptographic ledger that has undergone 14 upgrades to date.

  • Tezos stands out as the sole blockchain that has implemented scaling solutions capable of handling thousands of TPS without sacrificing decentralization & security.

  • The proposed Data Availability Layer (DAL) combined with its enshrined rollups is a sophisticated system that is designed to be fully decentralized and capable of handling up to a million TPS.

  • An initial version of the DAL is currently being tested, with a full Mainnet release anticipated in early 2024.

Tezos' Path to A Million TPS

In the 14 years since Bitcoin was first introduced, blockchain technology has seen substantial evolution. Yet, the mass adoption of this technology remains an unrealized goal.

The chief obstacle in achieving this is the complex task of enhancing scalability, without compromising the system's decentralization and security.

However, Tezos stands against this tide. As a self-amending blockchain, it has shipped 14 protocol upgrades to date, with the recent ones following a rollup-centric roadmap.

In this edition of our newsletter, we will explore Tezos's scalability roadmap in detail. We will take a comprehensive look at the upcoming major upgrade to the Tezos network, designed to enable significant scalability.

Ok, let’s dive in👇

A Throwback to the Recent Tezos Upgrades

Tezos is a self-amending cryptographic ledger that has undergone 14 upgrades to date, each bringing significant improvements and new capabilities to the protocol.

So let’s take a look at two of the most significant updates that happened this year.

Mumbai Upgrade

One of the most notable upgrades was the Mumbai upgrade, which was successfully activated at block number 3,268,609 on March 29, 2023.

The Mumbai upgrade introduced several key enhancements:

  • Block times were halved from 30 seconds to 15 seconds, enabled via pipelining, which significantly increased the speed of transactions.

  • The introduction of Smart Rollups (Smart Contract Optimistic Rollups, or SCORUs) on the mainnet, providing an initial scaling capability through WASM (WebAssembly) kernels.

  • The launch of a zk-rollup solution, referred to as 'validity rollups' on Tezos, called ePoxy, on the testnet.

Nairobi Upgrade

Following the Mumbai upgrade, the Tezos community activated its 14th protocol upgrade, Nairobi, on June 24, 2023, at block number 3,760,129.

The Nairobi upgrade introduced several key enhancements:

  • An 8x increase in TPS for various operations such as transactions, smart contract calls, and SCORUs maintenance operations.

  • A functionality boost to Smart Rollups by incorporating new host functions and internal Layer 2 messages.

  • Enhanced speed of pre-attestation propagation, enabling faster consensus within the network.

So let’s take a closer look at SCORUs, the L2 scaling solution that enables Tezos to scale without compromising decentralization or security.

SCORUs: A Scalable and Trustless Solution for the Next Generation dApps

Rollups are processing units that interact with the Tezos blockchain. SCORUs are Layer 2 solutions that sit on top of Layer 1, allowing anyone to originate and operate one or more rollups for increased scalability.

The "optimistic" part means that published claims about the rollup's state are initially trusted, but they can be refuted, and their publisher economically punished, if they are proven invalid.

Tezos Rollups: Enshrined in the Protocol

Tezos rollups are enshrined, meaning they receive special treatment from the L1 Tezos blockchain.

This distinction provides better security guarantees than Ethereum rollups, which are built as ordinary smart contracts and receive no special guarantees from the L1.

The enshrinement of Tezos rollups leads to improvements in gas fees, performance, and interoperability compared to Ethereum smart contract (SC) rollups.

  • Security: Designing secure rollup bridges is easier with native code than in a smart contract execution environment.

  • Performance: Native code allows for better computing per operation and lower fees.

  • Governance: Tezos rollups are governed by the Tezos governance process, which has a proven track record.

In contrast, Ethereum core rollup technology is limited by the Solidity language. Tezos rollups can support multiple languages and can be upgraded to integrate new features as needed.

Tezos Rollups: Scaling and Trust

Tezos addresses the trust issue in optimistic rollups by allowing anyone to register a rollup into the Layer 1 by posting a 10,000 XTZ bond. The rollup creator must also attach a kernel, which acts as an execution environment for their rollup.

The kernel serves as a rule book, dictating the permissible actions within the rollup, such as how blocks must be stacked and ensuring that transactions follow the rules.

Optimistic rollups in Tezos trust that submitted messages represent valid state transitions according to the kernel. However, the possibility of invalid commitments remains. To address this, Tezos uses a refutation game mechanism.

Refutation Game: The Key to Trust

The refutation game comes into play when a rollup node operator submits an invalid commitment. If this occurs, anyone can challenge the commitment by submitting proof.

This process is crucial because Tezos L1 is a high-security, relatively low-throughput environment, and escalating every execution to L1 could overwhelm the system. The refutation game offers a selective approach to maintaining trust and security in the rollup ecosystem.
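To make the "selective" idea concrete, here is a minimal sketch of how a dispute can be narrowed down to a single execution step, so that L1 only has to re-check that one step. It assumes a simple bisection over a list of intermediate states; the function name, the toy `step` transition, and the trace layout are all made up for illustration and are not the actual Tezos protocol logic.

```python
from typing import List

MODULUS = 1_000_003  # arbitrary modulus for the toy state machine


def step(state: int) -> int:
    """Hypothetical state transition standing in for one kernel execution step."""
    return (state * 31 + 7) % MODULUS


def find_disputed_step(honest: List[int], claimed: List[int]) -> int:
    """Return index i where both traces agree at i but disagree at i + 1."""
    assert len(honest) == len(claimed)
    assert honest[0] == claimed[0], "both parties must agree on the starting state"
    assert honest[-1] != claimed[-1], "nothing to refute if the final states match"
    lo, hi = 0, len(honest) - 1  # invariant: agree at lo, disagree at hi
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if honest[mid] == claimed[mid]:
            lo = mid
        else:
            hi = mid
    return lo


# Build an honest trace of 1,000 steps and a claimed trace that cheats at step 600.
honest_trace = [1]
for _ in range(1000):
    honest_trace.append(step(honest_trace[-1]))

claimed_trace = honest_trace[:600] + [honest_trace[600] + 1]
while len(claimed_trace) < len(honest_trace):
    claimed_trace.append(step(claimed_trace[-1]))

i = find_disputed_step(honest_trace, claimed_trace)
# Only this single step needs to be re-executed to settle the dispute.
cheater_caught = step(claimed_trace[i]) != claimed_trace[i + 1]
print(i, cheater_caught)
```

The point of the sketch: out of 1,000 steps, the referee re-executes exactly one, which is why disputes stay cheap for a low-throughput L1.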

Scalability Constraints of the Existing System

With SCORUs and the current data availability of Tezos L1, Tezos can scale to thousands of TPS, a significant feat in the blockchain space.

An approximate calculation suggests that with the current protocol parameters (block size, block time, etc.), Tezos can achieve a maximum throughput of around 3400 TPS.

However, to truly compete with Web 2.0 and traditional payment systems, blockchains need to achieve a level of scalability that allows them to process millions of transactions while maintaining security and decentralization. Tezos is on the path to achieving this level of scalability.

So, let's take a look at the next big upgrade that will enable Tezos to become a platform capable of processing more than a million transactions per second.

But before that, let’s understand the data availability problem.

The Data Availability Problem

To compete with Web 2.0 and traditional payment systems, blockchains must offer comparable or superior throughput. However, managing millions of transactions per second while maintaining security and decentralization of the L1 blockchain necessitates a unique approach.

Posting all transaction data to an L1 is impractical, so the data must be stored off-chain, separate from L1 blocks.

This strategy, however, presents a challenge known as the data availability problem. It involves ensuring that off-chain data is reliably accessible to all network participants.

Put differently: in a blockchain network, all nodes must be able to access and verify all transaction data to uphold the system's integrity and security. Scaling solutions like rollups move some transaction data from the main blockchain (L1) to a second layer (L2) to enhance throughput and alleviate congestion.

The question then arises: if the data isn't stored on the main blockchain, how can we ensure its availability for verification by all nodes?

Inaccessibility could prevent nodes from verifying transactions, potentially leading to fraud or other issues.

The Current DA Solution Is Not Fully Decentralized

It's possible to achieve a million TPS of throughput on Tezos right now.

However, to achieve that, Tezos currently relies on Data-Availability Committees, also known as DACs.

A DAC is essentially a consortium of data providers that maintain off-chain data for rollups and make it accessible through the reveal data channel. This channel serves as a bridge that allows Smart Rollups to access data that is external to the Tezos blockchain.

The DAC model offers a high level of security, provided that there are a few honest participants within the DAC.

However, it's important to note that the presence of a few dishonest participants can potentially restrict the throughput of the rollup. This arrangement carries different trust assumptions compared to a fully decentralized data availability solution.

However, Tezos has charted a clear path towards a fully decentralized data availability solution. A preliminary version is already operational on the testnet, with a planned rollout in early 2024. This solution offers the promise of significant scalability while maintaining the system's decentralization.

Now, let's explore the fully decentralized data availability solution coming to Tezos.

The Tezos Data-Availability Layer (DAL)

The proposed Data Availability Layer (DAL) is the most critical component of Tezos's scalability roadmap. It is designed to ensure the accessibility and integrity of data.

A Fully Decentralized Solution

The DAL will be a fully decentralized solution that stores data and provides guarantees about its availability.

It relies on Layer 1 consensus, i.e., bakers (aka Tezos validators).

An Independent P2P Network

The DAL will operate as an independent peer-to-peer (P2P) network running parallel to Tezos' Layer 1.

Data can be submitted and retrieved on this network, and bakers continuously monitor the DAL to attest on Layer 1 whether a given piece of data is available on the DAL.

Unlike the P2P protocol used by Layer 1, where each node receives all data, the DAL's P2P protocol is designed such that nodes receive only part of the data.

Data Availability Sampling (DAS)

Beyond the attestations provided by bakers, DAL nodes also implement a method known as data availability sampling (DAS). This technique, while only necessitating the download of a minimal portion of the total data, assures with a high degree of certainty that the data is accessible.

Here's how it works:

Erasure Coding

The first step in DAS is to erasure code the data. Erasure coding is a data protection method where data is broken into fragments, expanded, and encoded with redundant data pieces.

These pieces are then stored across different locations. This process ensures that even if some data is lost or unavailable, the original data can still be reconstructed from the remaining pieces.

Under the hood, the data is represented as a polynomial, a mathematical object that can be reconstructed from any sufficiently large subset of its evaluation points. Evaluating this polynomial at extra points is what extends the data, and it also allows for more efficient sampling.
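Here is a toy sketch of that idea: treat the data as points on a polynomial, evaluate it at extra positions to get redundant shards, then rebuild the original from any half of the shards. It is only an illustration under our own assumptions (a tiny prime field of size 257, Lagrange interpolation, and the hypothetical helper names `encode` and `decode`); the real DAL uses far larger fields and optimized encoders.

```python
P = 257  # small prime field, toy example only


def lagrange_eval(points, x):
    """Evaluate at x the unique polynomial of degree < len(points) through points (mod P)."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        # pow(den, -1, P) is the modular inverse (Python 3.8+)
        total = (total + yi * num * pow(den, -1, P)) % P
    return total


def encode(data, redundancy=2):
    """Extend k symbols to redundancy * k shards; the first k shards equal the data."""
    points = list(enumerate(data))
    return [lagrange_eval(points, x) for x in range(redundancy * len(data))]


def decode(shards, k):
    """Rebuild the k original symbols from any k shards given as {index: value}."""
    points = list(shards.items())[:k]
    return [lagrange_eval(points, x) for x in range(k)]


data = [84, 101, 122, 111, 115]  # the bytes of "Tezos"
coded = encode(data)             # 10 shards, 2x redundancy

# Pretend half of the shards were lost; any 5 survivors are enough.
survivors = {idx: coded[idx] for idx in (1, 3, 6, 8, 9)}
recovered = decode(survivors, k=len(data))
print(bytes(recovered).decode())
```

With 2x redundancy, losing any 5 of the 10 shards is harmless, which is exactly the property the sampling step relies on.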

Evaluation at Random Indices

The erasure-coded data is then evaluated at a number of random indices.

To successfully trick DAS nodes into thinking the data was made available when it wasn’t, an attacker would have to hide more than 50% of the block.

If 50% of the erasure-coded data is available, the entire block can be reconstructed. The probability of less than 50% being available after many successful random samples is very small, making this a robust method for verifying data availability.
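A quick back-of-the-envelope sketch shows why "very small" is not an exaggeration. It assumes independent, uniformly random samples and the 50% threshold mentioned above; the function names are ours, not part of any Tezos tooling.

```python
import math


def max_fool_probability(samples: int, available_fraction: float = 0.5) -> float:
    """Upper bound on the chance that all samples succeed while the data is unrecoverable.

    Assumes each sample independently hits an available piece with probability
    at most `available_fraction`.
    """
    return available_fraction ** samples


def samples_needed(target: float, available_fraction: float = 0.5) -> int:
    """Smallest number of samples pushing the bound to `target` or below."""
    return math.ceil(math.log(target) / math.log(available_fraction))


for k in (10, 20, 30):
    print(k, max_fool_probability(k))

print(samples_needed(1e-9))
```

Each extra sample halves the attacker's odds, so a few dozen tiny downloads already give very strong assurance.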

This process is combined with KZG commitments (also known as Kate commitments) to prove that the original data was erasure coded properly.

Roles and Responsibilities within the DAL

The DAL ecosystem comprises several roles:

Adding Data: Anyone can submit new data to the DAL, though pre-approval by Layer 1 is required to deter spamming.

Storing Data: Anyone can contribute to storing data. The more people contributing, the higher the resiliency and efficiency of the DAL.

Verifying Availability: Bakers continuously publish attestations on Layer 1, declaring the availability of the data. Other DAL nodes perform data availability sampling.

Retrieving Data: Anyone can retrieve any data from the DAL.

Infrastructure and Compatibility

Given the different P2P protocol, a separate node is implemented for connecting to the DAL network.

Both rollup operators and bakers will need to run DAL nodes. Hardware recommendations for participating in the DAL will be provided later.

Moreover, SCORUs were designed from the start to be compatible with a DAL.

How Does DAL Achieve a Million TPS?

Achieving a million TPS on Tezos through DAL involves a complex interplay of various parameters and techniques.

Here's a detailed breakdown:

Slots and Slot Size: Think of slots as lanes on a highway and slot size as the size of the vehicles. The more lanes (slots) and the bigger the vehicles (slot size), the more traffic (transactions) you can handle.

To reach 1 million TPS, you need a data bandwidth of roughly 10 MB per second.

This can be achieved with 256 slots and a slot size of 512 KiB if the time between blocks is 15 seconds (the current block time of Tezos).
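The arithmetic behind these numbers is easy to check. The sketch below uses only the parameters quoted above (256 slots, 512 KiB per slot, 15-second blocks); the per-transaction byte budget at the end is our own derived figure, not an official Tezos number.

```python
SLOTS = 256
SLOT_SIZE_BYTES = 512 * 1024   # 512 KiB
BLOCK_TIME_S = 15
TARGET_TPS = 1_000_000

bytes_per_block = SLOTS * SLOT_SIZE_BYTES
bandwidth_bytes_per_s = bytes_per_block / BLOCK_TIME_S
bytes_per_tx = bandwidth_bytes_per_s / TARGET_TPS  # implied budget per transaction

print(f"data per block : {bytes_per_block / 2**20:.0f} MiB")
print(f"bandwidth      : {bandwidth_bytes_per_s / 1e6:.2f} MB/s")
print(f"budget per tx  : {bytes_per_tx:.2f} bytes")
```

So each block carries 128 MiB of slot data, which works out to a little under 9 MB per second, in line with the "roughly 10 MB" ballpark.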

Erasure Coding and Data Conversion: This is like breaking down a truck's cargo (data) into smaller packages (shards), making it easier to handle. Each slot's data is broken down into 2048 shards.

Attesters and Attestations: Attesters are like checkpoints on the highway. They are assigned to a subset of 2048 shards in proportion to the stake they hold. They signal when they are able to download all their shards from the network, for a given slot.
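To picture what "in proportion to the stake" means, here is a simplified sketch that splits the 2048 shards among three made-up bakers. It uses a deterministic largest-remainder split, which is an assumption on our side for readability; the actual protocol derives assignments from its own stake-weighted selection, and the baker names and stakes below are invented.

```python
from typing import Dict


def assign_shards(stakes: Dict[str, int], total_shards: int = 2048) -> Dict[str, int]:
    """Split `total_shards` among attesters proportionally to stake (largest remainder)."""
    total_stake = sum(stakes.values())
    # Whole shards each attester is entitled to.
    counts = {b: s * total_shards // total_stake for b, s in stakes.items()}
    leftover = total_shards - sum(counts.values())
    # Hand the remaining shards to the largest fractional remainders.
    by_remainder = sorted(
        stakes, key=lambda b: (stakes[b] * total_shards) % total_stake, reverse=True
    )
    for baker in by_remainder[:leftover]:
        counts[baker] += 1
    return counts


stakes = {"baker_a": 50, "baker_b": 30, "baker_c": 20}  # hypothetical stakes
assignment = assign_shards(stakes)
print(assignment)
```

A baker with half the stake ends up responsible for half of every slot's shards, while small bakers only need to download a small slice.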

Bandwidth Requirements: This is the capacity of the highway to handle traffic.

The system allows each slot producer to distribute shards of total size S × R MiB to the attesters and other slot consumers.

Attesters on a given set of shards receive the corresponding data for all slots. Slot consumers must be able to receive enough shards to reconstruct the original data reliably.

S × R MiB refers to the total size of the data that needs to be distributed by each slot producer to the attesters and other slot consumers.

Here, S represents the slot size, which is the amount of data in each slot, and R represents the replication factor, which is a parameter of the erasure code used to protect the data. The replication factor determines how much redundancy is added: the encoded data is R times larger than the original, which ensures its availability and protects against data loss.

So, S × R MiB is the product of the slot size and the replication factor, measured in Mebibytes (MiB). This product gives the total amount of data, including all the replicated pieces, that needs to be distributed for each slot.
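As a worked example, the sketch below plugs in the 512 KiB slot size and the 2048 shards per slot mentioned earlier. The replication factor R = 16 is purely an assumed value for illustration, since the text above does not specify one.

```python
SLOT_SIZE_MIB = 0.5        # S: 512 KiB expressed in MiB
REPLICATION_FACTOR = 16    # R: assumed value, for illustration only
SHARDS_PER_SLOT = 2048

total_mib_per_slot = SLOT_SIZE_MIB * REPLICATION_FACTOR             # S x R
shard_size_bytes = total_mib_per_slot * 2**20 / SHARDS_PER_SLOT

print(f"data to distribute per slot: {total_mib_per_slot} MiB")
print(f"size of one shard          : {shard_size_bytes:.0f} bytes")
```

Under these assumptions a slot producer pushes out 8 MiB per slot, in shards of 4 KiB each, and a consumer needs only a fraction of them (about 1 in R) to rebuild the slot.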

DAL/P2P Design: This is the design of the highway system itself. It allows for the propagation of shards either eagerly or lazily along software-defined, dynamic routes.

By carefully managing these parameters and techniques, the proposed DAL of Tezos will be able to support up to a million TPS.

Current Status and Roadmap

  • An initial version of the DAL is presently accessible on the Mondaynet testnet.

  • The existing DAL version lacks data-availability sampling implementation, and involvement is not mandatory for testnet bakers.

  • Anticipated changes before the Mainnet release include the implementation of data-availability sampling and mandatory participation for bakers.

  • The Mainnet release of the DAL is projected for early 2024.

That's it for this week. See you next Sunday!

💎 Gem of the Week 🧵


If you want to learn more about Cardano, crypto metrics, and fundamentals, give us a follow.

DISCLAIMER: None of this is financial advice. This newsletter is strictly educational and is not investment advice or a solicitation to buy or sell assets or make financial decisions. Please be careful and do your own research.
