"Ethereum's PeerDAS" | Composable and distributed systems group
Mon, 2026-01-05
Sharing our experimental call summaries.
AI-generated digests of Yak Collective study groups.
Setting and Topic
Reading: Ethereum’s PeerDAS | https://eprint.iacr.org/2024/1362
The CADS Study Group focused on Ethereum’s PeerDAS (Peer Data Availability Sampling) and its role in scaling rollups and L2s. The discussion ranged from intuitive polynomial-based explanations, to architectural tradeoffs in Ethereum’s roadmap, to limits of what PeerDAS does and does not solve (especially over long time horizons), and a brief organizational/process perspective from Ethereum Foundation (EF) R&D.
Participants generally did not attempt a line-by-line reading of the core paper; instead they aimed for high-level understanding, intuition-building, and connections to broader distributed systems and memory-scaling issues.
Framing the Problem: Scalability, Verification, and Data Availability
The group began by stepping back to why mechanisms like PeerDAS exist.
Replication bottleneck on L1s
Base-layer blockchains replicate everything to every full node. This makes verification simple and robust, but caps throughput at whatever a typical full node can handle without becoming “a supercomputer.” If you want decentralization (home stakers, non-professional nodes), you must keep per-node resource requirements modest, which constrains throughput and leads to fee spikes (referencing the “high fee days” about a decade ago).
Verification vs. data availability as distinct problems
One participant emphasized that splitting verification across nodes (as in rollups) is relatively well-understood:
Zero-knowledge rollups: publish zk proofs that all transactions in a batch are valid.
Optimistic rollups: assume correctness unless someone posts a fraud proof.
The harder problem is data availability (DA):
It’s not enough to know a rollup state could be verified; the network must know the underlying transaction data exists and can be retrieved, especially if a sequencer misbehaves or you need to fork and recreate state.
The specific threat model: a single bad sequencer produces valid blocks (or valid proofs/headers), but never actually publishes the full data. Because not every node downloads everything, this attack could slip through, compromising the rollup’s safety.
Ethereum’s “moderation in all things” design philosophy
One participant noted how Ethereum, compared to Bitcoin and Solana, seems to consistently pick “moderate” design points:
Bitcoin: push most scaling off-chain (e.g., Lightning).
Solana: a “really beefy chain” with high on-chain throughput and correspondingly heavy hardware requirements.
Ethereum: trying to be a “world computer” while remaining decentralized. This tension leads to solutions like rollups plus DA sampling: pushing data off the main execution path while retaining strong security properties for operators with modest hardware.
Polynomial Encoding and Erasure Coding: Shared Intuitions
Several participants anchored their understanding of PeerDAS in polynomial interpolation and erasure coding, while acknowledging they were skipping the heavy cryptographic details.
Basic polynomial intuition
If you have n data points, you can fit them with a unique polynomial of degree at most n-1 (see the small interpolation sketch after this list).
If data is encoded into such a polynomial (or more precisely into polynomials over a finite field), you can reconstruct missing data points as long as you have enough samples.
One participant: they think of this as translating data into polynomial space, where the original data and its “extension points” all lie on that polynomial.
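As a small illustration of this intuition, here is a minimal Lagrange-interpolation sketch over a small prime field. The prime 257 and the sample data are arbitrary choices for the example, not parameters from PeerDAS:

```python
# Lagrange interpolation over a prime field: n points determine a unique
# polynomial of degree <= n-1, which can then be evaluated at any other x.
P = 257  # small prime modulus, purely illustrative

def interpolate_eval(points, x):
    """Evaluate the unique degree <= n-1 polynomial through `points` at `x` (mod P)."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, -1, P)) % P
    return total

data = [10, 42, 7, 99]          # four original "chunks" of data
points = list(enumerate(data))  # interpret chunk i as the polynomial's value at x = i
assert all(interpolate_eval(points, i) == v for i, v in points)
```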
Erasure coding angle
Data is encoded and then extended to additional points, creating redundancy.
If some subset of the encoded points is lost, you can still reconstruct the entire polynomial and hence the original data, as long as the loss is below a threshold.
This means there’s no tiny “critical” byte or chunk that a malicious actor can hide that is both:
Small enough to escape detection, and
Large enough to prevent reconstruction.
If they hide too much, reconstruction fails and it’s detectably invalid; if they hide too little, the erasure coding “heals” the missing parts.
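Continuing the toy sketch above (it reuses the illustrative `interpolate_eval`, `points`, and `data` from the previous block), the erasure-coding step looks roughly like this: extend the 4 original chunks to 8 evaluation points, lose half of them, and recover everything from what survived. This is a Reed-Solomon-style toy, not the actual PeerDAS encoding:

```python
# Extend 4 original chunks to 8 evaluation points; any 4 of the 8 suffice.
extended = [(x, interpolate_eval(points, x)) for x in range(2 * len(data))]

# Suppose half of the extended points are withheld or lost...
surviving = extended[3:7]       # any 4 of the 8 would work

# ...the original chunks are still recoverable from the survivors.
recovered = [interpolate_eval(surviving, x) for x in range(len(data))]
assert recovered == data
```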
Probability and sampling
The intuition: each node samples a small fraction of the encoded data. If everyone’s random samples consistently succeed, then with high probability the full data was indeed published.
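A back-of-envelope version of that intuition, under the assumption of a 2x erasure-coded extension: to make reconstruction impossible, an adversary must withhold more than half of the extended data, so each uniformly random sample a node takes fails with probability at least 1/2. The sample counts below are illustrative, not protocol parameters:

```python
# Probability that a single node's k random samples ALL succeed even though
# more than half of the extended data was withheld (reconstruction impossible).
# Each independent sample lands on available data with probability at most 1/2.
for k in (1, 4, 8, 16, 32, 64):
    print(f"{k:2d} samples -> fooled with probability <= {0.5 ** k:.2e}")
```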
Several participants used AI models (Claude, ChatGPT) to get explanations of concepts like cosets and KZG commitments underlying the scheme. They noted these helped develop intuition, but they still consider themselves far from expert.
Gaps acknowledged
Participants explicitly did not dive into:
KZG (Kate–Zaverucha–Goldberg) polynomial commitments beyond “seems complicated.”
The exact math of “cosets” in the context of evaluation domains and FFT-style structures.
There was an explicit boundary: they were comfortable with polynomial/erasure-coding intuition but not with the full cryptographic construction.
From Calldata to Blobs to PeerDAS
The group reconstructed the historical pathway of Ethereum’s L2 scaling:
Phase 1: L2 data in calldata
Early rollups and L2s stored their transaction data in regular calldata on L1.
This increased gas costs significantly because calldata is expensive and fully replicated.
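For a sense of scale, a back-of-envelope on why calldata was expensive for this use. It assumes the post-EIP-2028 price of 16 gas per non-zero calldata byte and a 128 KiB payload (roughly one blob's worth of data); real batches are compressed and contain zero bytes, so actual costs varied:

```python
# Rough cost of posting ~128 KiB of rollup data as L1 calldata.
PAYLOAD_BYTES = 128 * 1024       # roughly one blob's worth of data
GAS_PER_NONZERO_BYTE = 16        # post-EIP-2028 calldata pricing
gas = PAYLOAD_BYTES * GAS_PER_NONZERO_BYTE
print(f"{gas:,} gas")            # 2,097,152 gas, a sizable slice of a block's gas limit
```

Blobs, by contrast, are priced in their own fee market rather than competing for execution gas, which is a large part of why Phase 2 reduced L2 data costs.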
Phase 2: EIP-4844 blobs
Ethereum introduced blobs: large, cheaper data segments attached to blocks, intended only for data availability, not for permanent storage or direct EVM access.
Initially, nodes still had to download all blobs, so the per-node bandwidth/storage requirement remained a bottleneck.
Phase 3: PeerDAS as a blob-scaling mechanism
PeerDAS was understood as a mechanism to:
Keep blobs as the DA substrate for rollups/L2s.
Allow each node to handle only a fraction of each block’s blob data (figures like “1/8 of the data” were mentioned; a rough back-of-envelope sketch follows this list).
Use polynomial/erasure coding plus sampling to ensure that, even though each node only sees part of the blob data, the network as a whole can be confident the whole data was published.
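A rough sketch of what a “1/8” custody figure would mean per node, under illustrative assumptions (128 KiB per blob, a target of 6 blobs per block, a 2x erasure-coded extension, 12-second slots; the actual blob target and custody fraction are protocol parameters that have been changing):

```python
BLOB_BYTES = 128 * 1024       # 4096 field elements * 32 bytes
BLOBS_PER_BLOCK = 6           # illustrative target, not necessarily the current mainnet value
EXTENSION = 2                 # erasure coding roughly doubles the data
CUSTODY_FRACTION = 1 / 8      # the "1/8" figure mentioned in the discussion
SLOT_SECONDS = 12

extended = BLOB_BYTES * BLOBS_PER_BLOCK * EXTENSION
per_node = extended * CUSTODY_FRACTION
print(f"per-node custody per block: {per_node / 1024:.0f} KiB "
      f"(~{per_node / SLOT_SECONDS / 1024:.0f} KiB/s sustained)")
```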
Blob usage and lifecycle
One participant noted that blob data is ephemeral, on the order of ~18 days of retention, and is meant to secure ledger and rollup correctness, not to serve as permanent archival storage of application payloads.
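The ~18-day figure is consistent with the consensus-layer retention window of 4096 epochs (MIN_EPOCHS_FOR_BLOB_SIDECARS_REQUESTS in the Deneb specs), assuming the mainnet constants of 32 slots per epoch and 12-second slots:

```python
RETENTION_EPOCHS = 4096    # MIN_EPOCHS_FOR_BLOB_SIDECARS_REQUESTS (Deneb consensus specs)
SLOTS_PER_EPOCH = 32
SECONDS_PER_SLOT = 12

seconds = RETENTION_EPOCHS * SLOTS_PER_EPOCH * SECONDS_PER_SLOT
print(f"{seconds / 86400:.1f} days")   # ~18.2 days
```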
Example analogy: a receipt from a store:
While you have the detailed receipt, you can verify line items and catch basic billing errors.
Long-term, the system may only retain that “a purchase of $X occurred” without retaining all itemized details.
L2s (e.g., Base) and especially stablecoin-related activity reportedly consume a large portion of blob space in practice. The group flagged this but did not go deep into specific metrics.
What PeerDAS Solves vs. What It Explicitly Does Not
A major part of the discussion was clarifying the guarantees of PeerDAS and their limits.
What PeerDAS Targets
Short-term DA guarantees at the time of block creation
PeerDAS ensures that at block creation time the data is widely enough published (and redundantly encoded) that:
No adversary can “sneak in” a block whose data was never made available.
Nodes can probabilistically verify data availability by sampling small portions.
Mitigating the “withholding by a single sequencer” attack
In the rollup/sequencer context, PeerDAS aims to ensure:
A single malicious sequencer cannot create a valid-looking block that actually hides the underlying data from the network.
As long as the data is initially published and propagated, fork/reorg recovery is possible based on the available data.
What PeerDAS Cannot Guarantee
Long-term persistence
PeerDAS does not guarantee that data will remain accessible indefinitely:
Nodes can delete blob data after the DA period (e.g., ~18 days).
There is no on-chain guarantee that anyone will keep the full historical data around.
The assumption: raw storage is cheap (e.g., spinning disks, data hoarders, or archival nodes run by EF or others). As long as at least one such entity archives the data, history can be recovered.
Post-hoc data availability
No mechanism in PeerDAS can prevent future deletion:
If every archiver deletes a particular dataset in, say, 10 years, the network cannot retroactively reconstruct it.
This is outside the scope of PeerDAS, which is about initial publication and short-term availability.
Resilience and “growing over time”
One participant mentioned reading that the “resilience” of this approach grows as network state grows, perhaps due to the probability of successful DA sampling improving with more nodes and more data.
However:
They did not derive this from first principles.
They flagged their understanding as incomplete and speculative regarding exact transition points or formal guarantees.
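One possible reading of the claim, offered here only as a sketch and not derived from the paper: if each of N independent sampling nodes is fooled with probability at most (1/2)^k (as in the earlier back-of-envelope), then the chance that nobody in the network notices the withholding shrinks geometrically in both k and N:

```python
import math

# Log-probability that EVERY sampling node is fooled when more than half of the
# extended data is withheld, treating each node's k samples as independent.
# This is only a sketch of why more samplers help; the real analysis is subtler.
def log10_p_nobody_notices(num_nodes, samples_per_node):
    return num_nodes * samples_per_node * math.log10(0.5)

for n in (10, 1_000, 100_000):
    print(f"{n:>7} nodes x 8 samples each -> ~10^{log10_p_nobody_notices(n, 8):.0f}")
```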
Sharding, “Danksharding,” and What Is Actually Being Sharded
Terminology created some confusion, especially around “danksharding” and how it interacts with blobs and PeerDAS.
Sharding notion
In older sharding designs, the idea was to split the chain into multiple “shards,” each with its own subset of data and potentially its own validators.
In current Ethereum roadmap language (danksharding / proto-danksharding), the design shifts to:
A single beacon chain with “shard-like” data streams (blobs).
A single proposer per slot for both the block and its blob data, but with data storage/availability responsibilities dispersed across nodes.
What is actually sharded?
One participant observed: it’s not obvious that individual blobs are actually “sharded” in the sense of being broken apart and independently stored.
Their understanding (tentative) was:
Validators keep only a subset (e.g., 1/8) of the overall blob data per block.
But within that subset, they might keep entire blobs, not pieces of each blob.
This area remained fuzzy, and the group did not reach a crisp, shared mental model about how exactly blobs are partitioned across peers under PeerDAS.
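To make that fuzziness concrete, here is a purely illustrative contrast between the two partitioning schemes the group was unsure about; neither snippet claims to match the actual PeerDAS specification:

```python
# Toy block: 8 blobs, each split into 16 chunks ("columns") after extension.
blobs = [[f"blob{b}:chunk{c}" for c in range(16)] for b in range(8)]

# Scheme 1: a node keeps entire blobs, 1/8 of them (here: blob 3 only).
whole_blob_custody = blobs[3]

# Scheme 2: a node keeps 1/8 of the columns, but from EVERY blob
# (here: columns 6 and 7 of each blob).
column_custody = [blob[c] for blob in blobs for c in (6, 7)]

print(len(whole_blob_custody), len(column_custody))   # 16 chunks either way
```

Either way a node stores the same 1/8 of the data; the difference is whether a single blob can go missing without any one node’s slice revealing it. The actual spec appears to organize sampling and custody around columns of the extended data, but the group did not verify this on the call.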
KZG commitments and polynomial commitments
KZG commitments were repeatedly flagged as a concept people tried (and failed) to fully understand via ChatGPT:
Recognized as central to verifying polynomial evaluations succinctly.
But the technical details (discrete logs, trusted setup, pairing-based cryptography) were outside the comfort zone for this discussion.
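For the interface shape only, here is an insecure stand-in that is emphatically not KZG and has none of its succinctness or binding properties; it only shows the commit / open-at-a-point / verify pattern that KZG realizes with constant-size commitments and pairing-based proofs:

```python
import hashlib

# Insecure toy "polynomial commitment": commit hashes the coefficients, the
# "proof" is just the whole polynomial, and verify recomputes everything.
# KZG replaces this with a constant-size commitment and constant-size proofs.
def commit(coeffs):
    return hashlib.sha256(repr(coeffs).encode()).hexdigest()

def open_at(coeffs, x):
    y = sum(c * x**i for i, c in enumerate(coeffs))
    return y, coeffs                       # "proof" = the full polynomial (not succinct!)

def verify(commitment, x, y, proof):
    return (commit(proof) == commitment
            and sum(c * x**i for i, c in enumerate(proof)) == y)

coeffs = [5, 0, 3]                         # 5 + 3x^2
c = commit(coeffs)
y, proof = open_at(coeffs, 7)
assert verify(c, 7, y, proof)              # 5 + 3*49 = 152
```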
Organizational and Process View: Ethereum R&D as a Technical Org
One participant brought a “fly on the wall” perspective from attending an Ethereum R&D meeting in Istanbul where early PeerDAS proposals were debated:
History glimpse
They recalled sitting in on a research meeting around November 2023, not long after an initial proposal (by Danny, referencing a link another member had shared) was posted.
The atmosphere: a “regular detailed technical meeting” with argument, uncertainty, and tradeoff debate, not a fully settled, top-down roadmap.
Assessment of EF as an engineering org
The participant characterized EF R&D as:
One of the healthiest technical organizations they’ve seen.
Populated by competent, sincere people grappling seriously with tradeoffs and details.
They contrasted this positively with technical decision-making processes they’ve seen in private corporations.
Significance of PeerDAS shipping
Watching the journey from “fragile and uncertain” early discussion to production deployment (with Vitalik publicly praising it) was taken as evidence that:
The broader Ethereum ecosystem can move hard cryptographic ideas from research to mainnet.
There is a functioning pipeline from whiteboard to live protocol.
Memory as a Cross-Domain Bottleneck: Blockchain and AI
Toward the end, the group stepped back to compare blockchain DA constraints with memory constraints in AI systems.
Hierarchy and cost of memory
In both contexts:
The closest memory to the core computation is the most expensive and capacity-limited.
In AI: on-chip SRAM, HBM close to accelerators.
In blockchains: on-chain storage, short-term DA-limited blobs.
Cheaper, larger, slower storage exists “further away”:
Disk or archival cold storage for blockchain history.
Disaggregated storage for older AI training data.
Analogy to LLM training data
One analogy:
Blob-level DA ≈ high-bandwidth, close-to-compute memory used for correctness and short-term reasoning.
Off-chain archival data ≈ historical training corpora like those used for GPT-3:
Potentially important for audits, reproducibility, or re-training.
Not constantly resident in fast memory, but still part of the system’s “long-term history.”
Theme: composable/distributed systems constrained by memory
The group saw a recurring pattern in composable and distributed systems:
The ultimate constraints are often about how much data you can keep close to the core and how reliably you can ensure it was once available, rather than raw compute flops.
PeerDAS is one instantiation of this theme in blockchain land, paralleling HBM/DRAM bottlenecks in large-scale AI clusters.
Wrap-Up
Key takeaways
Data availability is a distinct, hard problem separate from transaction verification; PeerDAS targets this by making it probabilistically infeasible to sneak in blocks whose data was never published.
Polynomial encoding and erasure coding provide the core intuition: data is embedded in a polynomial with enough redundancy that small missing parts can be reconstructed, while large omissions become detectably invalid.
Ethereum’s shift from calldata to blobs, and then to PeerDAS-backed blobs, reflects a broader “moderation in all things” philosophy: scaling via rollups and DA sampling while preserving decentralization and home-node viability.
PeerDAS secures short-term availability at block time; it does not guarantee long-term archival persistence, which depends on separate archival efforts.
Concepts like “danksharding,” KZG commitments, and exact blob-partitioning behaviors remain partially opaque to the group and are recognized as areas needing deeper study.
Similar memory and bandwidth constraints appear in both blockchain and AI systems: expensive, close-to-compute memory vs. cheaper, distant archival storage.
Open questions explicitly surfaced
How exactly are blobs partitioned and stored across validators under PeerDAS (e.g., does each validator store entire blobs, or chunks across many blobs)?
What are the precise guarantees and limits of PeerDAS’s probabilistic security (e.g., how does network size or state growth quantitatively affect resilience)?
How do KZG commitments and coset constructions work in detail in the PeerDAS design, beyond high-level metaphors?
In real-world L2s (e.g., Base), how exactly are blobs used in practice, and how do blob price dynamics evolve as PeerDAS and further scaling measures roll out?
Yak Collective Discord call thread:
https://discord.com/channels/692111190851059762/1457581129979793606


