From aac520880fdf88af516f124b8b88156db18f71f0 Mon Sep 17 00:00:00 2001 From: Sebastian Nagel Date: Mon, 27 Oct 2025 09:25:09 +0100 Subject: [PATCH 1/8] Sketch outline for overview chapter --- docs/leios-design/README.md | 51 ++++++++++++++++++++++++++++--------- 1 file changed, 39 insertions(+), 12 deletions(-) diff --git a/docs/leios-design/README.md b/docs/leios-design/README.md index 15f9a36c1..eab24fa9e 100644 --- a/docs/leios-design/README.md +++ b/docs/leios-design/README.md @@ -33,18 +33,45 @@ Ouroboros Leios is introducing _(high-)throughput_ as a third key property and i As it was the case for the Praos variant of Ouroboros (TODO: cite shelley network-design), the specification embodied in the published and peer-reviewed paper for Ouroboros Leios (TODO: cite leios paper) was not intended to be directly implementable. This was confirmed during initial R&D and feasibility studies, which identified several unsolved problems with the fully concurrent block production design proposed in the paper. The latest design presented in CIP-164, also known as "Linear Leios", focuses on the core idea of better utilizing resources in between the necessary "calm" periods of the Praos protocol and presents an immediately implementable design. > [!WARNING] -> TODO: (re-)introduce the main protocol flow of Leios? - -> [!WARNING] -> TODO: Notes on what could go here -> - Node is a concurrent, reactive (real-time) system -> - contrast real-time to not be ms-level hard real-time (control systems etc.), but as "timely" action is crucial to the success (see also network-design.pdf) -> - today concurrency is minimal and basically transaction submission and block diffusion happening at the same time + side-topics like peer sharing -> - only two responsibilities, but peer cardinalities make this highly concurrent (with quite some shared resources; related: https://ouroboros-network.cardano.intersectmbo.org/pdfs/network-design/network-design.pdf#subsubsection.5.3.2) -> - adding production and diffusion of Leios block data and votes will emphasize this further -> - use this to specify behavior as largely independent work-flows? -> - it is crucial that the node stays reactive and resource management across the concurrent responsibilities is crucial -> - bounding resource usage and/or prioritizing certain tasks over others will be crucial for the system to act +> TODO: Rewrite/expand using outline: +> +> - Cardano as a cryptocurrency system -> consensus protocol is essential to realize a permissionless, globally distributed ledger +> - two key properties that blockchains provide to realize this: persistence and liveness +> - Leios adds another one: (high-)throughput +> +> - As it was for Praos, the Leios research paper was not intended to be directly implementable +> - initial R&D and feasibility studies identified several unsolved problems with fully concurrent block production when considering the concrete Cardano ledger +> - Hence, design proposed in CIP-164 focuses on the core idea of better utilizing resources between the (name-giving) "calm periods" of the Praos protocol -> immediately implementable already providing orders of magnitude higher throughput +> +> - Summary of Leios protocol flow? +> +> - The cardano node +> - Node is a concurrent, reactive (real-time) system, operating in an adversarial environment +> +> - real-time system +> - contrast real-time to not be ms-level hard real-time (control systems etc.), but "timely" action is crucial to the success (see also network-design.pdf) +> - praos data diffusion targets, % reached within 5s (Delta): threshold 95%+, target 98%+, stretch 99%+ (https://ouroboros-network.cardano.intersectmbo.org/pdfs/network-design/network-design.pdf#subsection.8.1) +> - reward calculation / epoch boundary is an example / current situation where the deadlines are currently often missed +> - not inherently a hard deadline, but we must realize that the network is most vulnerable in such times -> when honest nodes are less efficient in building the longest chain +> +> - concurrency +> - current responsibilities of a node: +> - basically transaction submission and block diffusion happening at the same time + side-topics like peer sharing +> - despite only two responsibilities, peer cardinalities (~10 upstream, ~100 downstream) make this highly concurrent (with quite some shared resources; related: https://ouroboros-network.cardano.intersectmbo.org/pdfs/network-design/network-design.pdf#subsubsection.5.3.2) +> +> - new responsibilities +> - adding production and diffusion of Leios block data and votes will emphasize this further +> - use this to specify new node behavior as largely independent work-flows? +> - resource management across the concurrent responsibilities is crucial (https://github.com/input-output-hk/ouroboros-leios/blob/main/docs/ImpactAnalysis.md#resource-management) +> +> - optimize for the worst case (https://cardano-scaling.github.io/cardano-blueprint/principles/index.html#optimise-only-for-the-worst-case) +> - following Hyrum's law, despite being an overlay protocol, we must acknowledge that Leios throughput will be relied upon +> - Leios protocol security (https://github.com/cardano-scaling/CIPs/blob/leios/CIP-0164/README.md#protocol-security) is based on worst case diffusion of EBs and their transaction closures being "small enough" +> - small enough: difference between optimistic diffusion (which leads to a certified EB) and worst case diffusion must be bounded (by L_diff and Praos' Delta) +> - as before with Praos, even with new information exchange requirements, the node must be resilient against denial of service and asymmetric resource attacks +> +> - it is crucial that the node stays reactive and (continues) to operate in a reactive and "in-time" +> - bounding resource usage and/or priorization across responsibilities is essential > - this stresses importance of non-functional requirements (per component), performance engineering and conducting benchmarks along the way # Risks and mitigations From 11b27afaae95cb1fc76cd534de0ee68312a8d391 Mon Sep 17 00:00:00 2001 From: Sebastian Nagel Date: Mon, 27 Oct 2025 11:01:02 +0100 Subject: [PATCH 2/8] Write introductory/overview sections --- docs/leios-design/README.md | 108 +++++++++++++++++++----------------- 1 file changed, 58 insertions(+), 50 deletions(-) diff --git a/docs/leios-design/README.md b/docs/leios-design/README.md index eab24fa9e..28d181908 100644 --- a/docs/leios-design/README.md +++ b/docs/leios-design/README.md @@ -10,7 +10,7 @@ author: This technical design document bridges the gap between the protocol-level specification ([CIP-164](https://github.com/cardano-foundation/CIPs/pull/1078)) and its concrete implementation in [`cardano-node`](https://github.com/IntersectMBO/cardano-node). While CIP-164 defines *what* the Leios protocol is and *why* it benefits Cardano, this document addresses *how* to implement it reliably and serve as a practical guide for implementation teams. -This document builds on the [impact analysis](../ImpactAnalysis.md) and [early threat modelling](../threat-model.md) conducted. The document outlines the necessary architecture changes, highlights key risks and mitigation strategies, and proposes an implementation roadmap. As the implementation plan itself contains exploratory tasks, this document can be considered a living document and reflects our current understanding of the protocol, as well as design decisions taken during implementation. +This document builds on the [impact analysis](../ImpactAnalysis.md) and [early threat modelling](../threat-model.md) conducted. The document outlines the necessary architecture changes, highlights key risks and mitigation strategies, and proposes an implementation roadmap. As the implementation plan itself contains exploratory tasks, this document can be considered a living document and reflects our current understanding of the protocol, as well as design decisions taken during implementation. Besides collecting node-specific details in this document, we intend to contribute implementation-independent specifications to the [cardano-blueprint](https://cardano-scaling.github.io/cardano-blueprint/) initiative and also update the CIP-164 specification through pull requests as needed. @@ -26,53 +26,61 @@ This document is a living artifact and will be updated as implementation progres # Overview -Cardano as a cryptocurrency system fundamentally relies on an implementation of Ouroboros, the consensus protocol (TODO cite praos and genesis papers), to realize a permissionless, globally distributed ledger that guarantees _persistence_ and _liveness_. These two properties are central to the value proposition of Cardano, as they enable secure and censorship-resistant transfer of value, as well as the execution of smart contracts in a trustless manner. +Cardano as a cryptocurrency system fundamentally relies on an implementation of Ouroboros, the consensus protocol (TODO cite praos and genesis papers), to realize a permissionless, globally distributed ledger. The consensus protocol provides two essential properties that underpin Cardano's value proposition: **persistence** ensures immutability of confirmed transactions, while **liveness** guarantees that new valid transactions will be included. These properties enable secure and censorship-resistant transfer of value, as well as the execution of smart contracts in a trustless manner. -Ouroboros Leios is introducing _(high-)throughput_ as a third key property and is extending Ouroboros Praos, the currently deployed variant. By enabling the network to process a significantly higher number of transactions per second, Leios enables Cardano to scale economically and meet the demands of a growing user base and application ecosystem. +Ouroboros Leios introduces **high-throughput** as a third fundamental property, extending the currently deployed Ouroboros Praos variant. By enabling the network to process a significantly higher number of transactions per second, Leios addresses the economic scalability requirements necessary to support a growing user base and application ecosystem. This enhancement transforms Cardano from a system optimized for security and decentralization into one that maintains these properties while achieving higher transaction processing capacity demanded by modern blockchain applications. -As it was the case for the Praos variant of Ouroboros (TODO: cite shelley network-design), the specification embodied in the published and peer-reviewed paper for Ouroboros Leios (TODO: cite leios paper) was not intended to be directly implementable. This was confirmed during initial R&D and feasibility studies, which identified several unsolved problems with the fully concurrent block production design proposed in the paper. The latest design presented in CIP-164, also known as "Linear Leios", focuses on the core idea of better utilizing resources in between the necessary "calm" periods of the Praos protocol and presents an immediately implementable design. +## From research to implementation + +As was the case for the [Praos variant of Ouroboros](https://ouroboros-network.cardano.intersectmbo.org/pdfs/network-design/network-design.pdf#subsection.5.1), the specification embodied in the published and peer-reviewed [research paper for Ouroboros Leios](https://eprint.iacr.org/2025/1115.pdf) was not intended to be directly implementable. Initial research and development studies confirmed this expectation, identifying several unsolved problems with the fully concurrent block production design when considering the concrete Cardano ledger and what consequences this would have (TODO: cite suitable R&D reports, [Tech Report #2](https://github.com/input-output-hk/ouroboros-leios/blob/main/docs/technical-report-2.md#conflicts-ledger-and-incentives)). + +The design presented in [CIP-164](https://github.com/cardano-scaling/CIPs/blob/leios/CIP-0164/README.md), also known as "Linear Leios", addresses these implementation challenges by focusing on the core insight of better utilizing network and computational resources during the necessary and eponymous "calm periods" of the Praos protocol. This approach provides an immediately implementable design that can deliver orders of magnitude higher throughput while preserving the security guarantees that make Cardano valuable. + +The Linear Leios protocol operates by allowing a second, bigger type of block to be produced in the same block production opportunity. Block producers can produce and announce an endorser block (EB), which endorses additional transactions that would not fit within the Praos block. EBs are distributed through the network and subjected to validation by a committee of stake pools, who vote on their transaction data closure's availability and validity. Only EBs that achieve a high threshold of stake-weighted votes become certified and can be included in the ledger through exclusive anchoring of a certificate in the subsequent block - now called a ranking block (RB). This mechanism allows for significantly higher transaction throughput while maintaining the security properties of the underlying Praos consensus. See the CIP for more details on the protocol specification and rationale itself. + +## Cardano node as a real-time system + +The implementation of Leios must be understood in the context of the Cardano node as a concurrent, reactive system operating under real-time constraints in an adversarial environment. While "real-time" in this context does not refer to the millisecond-level hard deadlines found in industrial control systems, timely action nontheless remains crucial to protocol success and network security. + +The currently deployed Praos implementation establishes clear [data diffusion targets](https://ouroboros-network.cardano.intersectmbo.org/pdfs/network-design/network-design.pdf#subsection.5.1): blocks must reach 95% of nodes within the 5-second $\Delta$ parameter, with target performance at 98% and stretch goals at 99%. While these are comfortably achieved most of the time, blocks are regularly adopted within 1 second across the network, there are some situations even in the current system where the target is not reached. For example, due to reward calculations happening at the epoch boundary. + +Despite being hard deadlines, these targets reflect the reality that network vulnerability increases when not being met. The protocol's safety and liveness guarantees depend on honest nodes being able to propagate blocks rapidly enough to prevent adversarial forks from gaining traction. Failure to meet these timing constraints can lead to increased rates of short forks, reduced chain quality, and - if persistent - ultimately compromise the integrity of the ledger. + +## Concurrency and resource management + +The current primary responsibilities of a Cardano node are roughly: + +- Block diffusion: receiving chains from upstream, validate and select the best chain, and transmit chains downstream. +- Transaction submission: receiving, validating, and transmitting transactions to be included in blocks. +- Block production: creating new blocks and extending current chain when selected as slot leader. + +Despite this apparent simplicity, this already results in a highly concurrent system once cardinalities of upstream and downstream network peers are considered. A node typically maintains in the order of 10 upstream and 100 downstream connections, each of which may be simultaneously requesting or serving data. All of these operations share critical resources including memory, CPU, and network bandwidth, requiring careful resource management to ensure that timing requirements are met even under load. + +Leios significantly expands this concurrency model by introducing new responsibilities: + +- Endorser block and closure diffusion: receiving, validating, and transmitting EBs and their transaction closures. +- Voting and vote diffusion: receiving, validating, and transmitting own and foreign votes on EBs. + +With these (at least) two additional functionalities, each across many peers, the node set of concurrent tasks only increases. The implementation must ensure that the increased data flows and processing demands do not interfere with each other, or priorization mechanisms ensure to meet the stringent timing constraints necessary for protocol security. + +## Designing for the worst-case + +Related to the principle of [optimizing for the worst case](https://cardano-scaling.github.io/cardano-blueprint/principles/index.html#optimise-only-for-the-worst-case), the security argument for Leios protocol depends critically on worst-case diffusion characteristics. Endorser blocks and their transaction closures must be "small enough" that the difference between optimistic diffusion (leading to successful certification) and worst-case diffusion remains bounded by the protocol parameter $L_\text{diff}$ and Praos's maximum delay $\Delta$. + +If the optimistic, average-case performance is improved with suitable algorithms, data structures and optimizations, but the worst-case scenario is not, more conservative parameter choices would be required to maintain security guarantees. This would negate the anticipated benefits of the optimizations in the first place. Therefore, the implementation must prioritize ensuring that even in adverse network conditions or under attack, the diffusion of EBs and their closures remains within acceptable bounds. > [!WARNING] -> TODO: Rewrite/expand using outline: -> -> - Cardano as a cryptocurrency system -> consensus protocol is essential to realize a permissionless, globally distributed ledger -> - two key properties that blockchains provide to realize this: persistence and liveness -> - Leios adds another one: (high-)throughput -> -> - As it was for Praos, the Leios research paper was not intended to be directly implementable -> - initial R&D and feasibility studies identified several unsolved problems with fully concurrent block production when considering the concrete Cardano ledger -> - Hence, design proposed in CIP-164 focuses on the core idea of better utilizing resources between the (name-giving) "calm periods" of the Praos protocol -> immediately implementable already providing orders of magnitude higher throughput -> -> - Summary of Leios protocol flow? -> -> - The cardano node -> - Node is a concurrent, reactive (real-time) system, operating in an adversarial environment -> -> - real-time system -> - contrast real-time to not be ms-level hard real-time (control systems etc.), but "timely" action is crucial to the success (see also network-design.pdf) -> - praos data diffusion targets, % reached within 5s (Delta): threshold 95%+, target 98%+, stretch 99%+ (https://ouroboros-network.cardano.intersectmbo.org/pdfs/network-design/network-design.pdf#subsection.8.1) -> - reward calculation / epoch boundary is an example / current situation where the deadlines are currently often missed -> - not inherently a hard deadline, but we must realize that the network is most vulnerable in such times -> when honest nodes are less efficient in building the longest chain -> -> - concurrency -> - current responsibilities of a node: -> - basically transaction submission and block diffusion happening at the same time + side-topics like peer sharing -> - despite only two responsibilities, peer cardinalities (~10 upstream, ~100 downstream) make this highly concurrent (with quite some shared resources; related: https://ouroboros-network.cardano.intersectmbo.org/pdfs/network-design/network-design.pdf#subsubsection.5.3.2) -> -> - new responsibilities -> - adding production and diffusion of Leios block data and votes will emphasize this further -> - use this to specify new node behavior as largely independent work-flows? -> - resource management across the concurrent responsibilities is crucial (https://github.com/input-output-hk/ouroboros-leios/blob/main/docs/ImpactAnalysis.md#resource-management) -> -> - optimize for the worst case (https://cardano-scaling.github.io/cardano-blueprint/principles/index.html#optimise-only-for-the-worst-case) -> - following Hyrum's law, despite being an overlay protocol, we must acknowledge that Leios throughput will be relied upon -> - Leios protocol security (https://github.com/cardano-scaling/CIPs/blob/leios/CIP-0164/README.md#protocol-security) is based on worst case diffusion of EBs and their transaction closures being "small enough" -> - small enough: difference between optimistic diffusion (which leads to a certified EB) and worst case diffusion must be bounded (by L_diff and Praos' Delta) -> - as before with Praos, even with new information exchange requirements, the node must be resilient against denial of service and asymmetric resource attacks -> -> - it is crucial that the node stays reactive and (continues) to operate in a reactive and "in-time" -> - bounding resource usage and/or priorization across responsibilities is essential -> - this stresses importance of non-functional requirements (per component), performance engineering and conducting benchmarks along the way +> TODO: the situation is not as dire though, we have some design freedom because strictly less work needs to be done on the worst-case path (e.g. rely on certified validity and cheaply build ledger states instead of validating transactions) + +Besides, as with Praos, the enhanced information exchange requirements of Leios must not compromise the system's resilience against denial of service attacks and asymmetric resource consumption attempts. The implementation must maintain defensive properties while supporting the increased data flows and processing requirements that enable higher throughput. + +## Implementation imperatives + +In summary, the technical design described in subsequent chapters must ensure that nodes continue to operate reactively and meet timing requirements despite increased responsibilities and data volumes. This requires careful bounding of resource usage and sophisticated prioritization mechanisms across concurrent responsibilities. + +The complexity of this challenge emphasizes the critical importance of non-functional requirements specification for each component, rigorous performance engineering practices, and continuous benchmark validation throughout the development process. Only through systematic attention to these implementation details can the protocol deliver the security and performance properties that make Leios a valuable enhancement to Cardano's capabilities. + +The following chapters detail the specific risks that inform architectural decisions, the concrete technical design that addresses these challenges, and the implementation plan that will deliver a production-ready system. # Risks and mitigations @@ -104,7 +112,7 @@ As it was the case for the Praos variant of Ouroboros (TODO: cite shelley networ > - can be done from stake- and network-based attackers > - trivially impacts high-throughput because no certifications happening > - however, more advanced, potential avenue to attack blockchain safety (impact praos security argument) when carefully partitioning the network -> - mitigation: L_diff following the [security argument](https://github.com/cardano-scaling/CIPs/blob/leios/CIP-0164/README.md#protocol-security) +> - mitigation: L_diff following the [security argument](https://github.com/cardano-scaling/CIPs/blob/leios/CIP-0164/README.md#protocol-security) > - motivates validation of optimistic and worst-case diffusion paths ## Assumptions to validate early @@ -735,7 +743,7 @@ These principles are also reflected in the choice of validation techniques appli ## Correctness in two dimensions -Formal specification and verification play a central role in ensuring correctness throughout the implementation process, which happens along two dimensions: +Formal specification and verification play a central role in ensuring correctness throughout the implementation process, which happens along two dimensions: - Maturity: Implementations maturing from proof of concept, prototype to production-ready release candidates - Diversity: Multiple emerging implementations of Cardano nodes using different programming langugages and targeting slightly different use cases @@ -749,7 +757,7 @@ Multiple implementations provide additional assurance through diversity. The pri - validate that the specification is sufficiently precise and complete, - exercise different corner cases that might be missed by a single implementation, and - reduce the risk that a subtle bug in one implementation compromises the entire network. - + The formal specification must be maintained as a living artifact throughout implementation. As design decisions are made to address practical concerns, these decisions must be reflected back into the specification to ensure it remains accurate. This bidirectional relationship between specification and other steps on the implementation plan is essential. The specification guides implementation, while implementation experience reveals necessary refinements to the specification. Documentation of these refinements and the rationale behind them provides crucial context for future maintainers and for external review. Consequently, the specification itself and other implementation-independent artifacts will be contributed to the [`cardano-blueprint`](https://cardano-scaling.github.io/cardano-blueprint) initiative. ## Simulation and protocol validation @@ -758,7 +766,7 @@ Simulations provide a very controlled environment for exploring protocol behavio A discrete event simulation implemented in Rust, models Leios message exchanges between nodes, abstracting lower-level details for speed—running orders of magnitude faster than real time to enable statistical analysis over thousands of runs with complete observability and arbitrary adversarial behavior injection. This validates security arguments by systematically exploring protocol behavior under varying loads, expected data diffusion in small to medium sized network topologies, or adversarial scenarios like data withholding, and exploration of protocol parameters before testnet deployment. -Another Haskell-based simulation using IOSim and the actual network framework used in the `cardano-node`. This reduces model-implementation divergence while enabling studies of the dynamic behavior and resource management in detail. While IOSim is used in the existing network and consensus layers through property-based testing, and extends naturally to Leios components, the simulator built from this was not able to scale to large networks. +Another Haskell-based simulation using IOSim and the actual network framework used in the `cardano-node`. This reduces model-implementation divergence while enabling studies of the dynamic behavior and resource management in detail. While IOSim is used in the existing network and consensus layers through property-based testing, and extends naturally to Leios components, the simulator built from this was not able to scale to large networks. Both approaches necessarily abstract real system details and thus provide evidence of correct behavior under idealized conditions and suggest workable parameters, but cannot definitively predict real-world performance. Maintaining simulation synchronization with evolving implementation requires discipline, but enables rapid exploration of alternatives, early feature validation, and serves as executable documentation for new developers. @@ -766,7 +774,7 @@ Both approaches necessarily abstract real system details and thus provide eviden Prototypes on real infrastructure validate performance characteristics that simulation typically cannot guarantee. The line between simulation and prototyping is blurry, but both concepts share the trait of allowing rapid exploration of the most uncertain aspects of the design before committing to a full implementation. Referring back to the key threats and assumptions to validate early, the primary focus of prototyping is on network diffusion performance under high throughput conditions and adversarial scenarios. -**Network diffusion prototype:** An early implementation of the actual Leios network protocols and potential freshest-first delivery mechanisms, that allows running experiments with various network topologies. Ledger validation of Leios concepts is stubbed out and transmitted data is generated synthetically to focus purely on network performance. Deployed to controlled environments like local devnets and private testnets like the the Performance and Testing cluster, this prototype systematically explores how performance scales with network size and block size, tests different topologies, and crucially answers whether the real network stack achieves the diffusion deadlines required by protocol security arguments. Key measurements include endorser block arrival time distributions, freshest-first multiplexing effectiveness, topology impact on diffusion, and behavior under adversarial scenarios including eclipse attempts and targeted withholding. These measurements will answer questions like, "how much" freshest-first delivery we need, whether the proposed network protocols are practical to implement and what protocol parameter are feasible. +**Network diffusion prototype:** An early implementation of the actual Leios network protocols and potential freshest-first delivery mechanisms, that allows running experiments with various network topologies. Ledger validation of Leios concepts is stubbed out and transmitted data is generated synthetically to focus purely on network performance. Deployed to controlled environments like local devnets and private testnets like the the Performance and Testing cluster, this prototype systematically explores how performance scales with network size and block size, tests different topologies, and crucially answers whether the real network stack achieves the diffusion deadlines required by protocol security arguments. Key measurements include endorser block arrival time distributions, freshest-first multiplexing effectiveness, topology impact on diffusion, and behavior under adversarial scenarios including eclipse attempts and targeted withholding. These measurements will answer questions like, "how much" freshest-first delivery we need, whether the proposed network protocols are practical to implement and what protocol parameter are feasible. Adversarial testing represents a crucial aspect of prototype validation. In a controlled environment, some nodes can still be configured to exhibit adversarial behaviors such as sending invalid blocks, withholding information, or attempting to exhaust resources of honest nodes. Observing how honest nodes respond provides evidence that the mitigations described in the design are effective. Despite using real network communication, such systems can still be determinstically simulation-tested using tools like [Antithesis](https://antithesis.com/), which is currently picked up also by node-level tests in the Cardano community via [moog](https://github.com/cardano-foundation/moog). If we can put this technique to use for adversarial testing of Leios prototypes and release candidates, this can greatly enhance our ability to validate the protocol under challenging conditions by exploring a much wider range of adversarial scenarios than would be feasible through manually created rigit test scenarios. @@ -780,7 +788,7 @@ Focused prototypes provide empirical data that complements the theoretical analy ## Public testnets and integration -A public testnet serves distinct purposes over simulations and controlled environments: it requires integration of all components into a complete implementation, enables for tests under realistic conditions with diverse node operators and hardware, and allows the community to experience enhanced throughput directly. While some shortcuts can still be made, the testnet-ready implementation must offer complete Leios functionality - endorser block production and diffusion, vote aggregation, certificate formation, ledger integration, enhanced mempool - plus sufficient robustness for continuous operation and operational tooling for deployment and monitoring. +A public testnet serves distinct purposes over simulations and controlled environments: it requires integration of all components into a complete implementation, enables for tests under realistic conditions with diverse node operators and hardware, and allows the community to experience enhanced throughput directly. While some shortcuts can still be made, the testnet-ready implementation must offer complete Leios functionality - endorser block production and diffusion, vote aggregation, certificate formation, ledger integration, enhanced mempool - plus sufficient robustness for continuous operation and operational tooling for deployment and monitoring. The testnet enables multiple validation categories. Functional testing verifies correct protocol operation: nodes produce endorser blocks when elected, votes aggregate into certificates, certified blocks incorporate into the ledger, and ledger state remains consistent. Performance testing measures achieved throughput against business requirements - sustained transaction rate, mempool-to-ledger latency, and behavior under bursty synthetic workloads. Adversarial testing is limited on a public testnet, but some attempts with deliberately misbehaving nodes can be made on withholding blocks, sending invalid data, attempting network partitioning, or resource exhaustion. From 2ff263a46483efb9f57c3bd39b59b033d2f0c8dc Mon Sep 17 00:00:00 2001 From: Sebastian Nagel Date: Mon, 27 Oct 2025 11:55:23 +0100 Subject: [PATCH 3/8] Bump version and changelog --- docs/leios-design/README.md | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/docs/leios-design/README.md b/docs/leios-design/README.md index 28d181908..43f37bbef 100644 --- a/docs/leios-design/README.md +++ b/docs/leios-design/README.md @@ -1,7 +1,7 @@ --- title: Leios technical design and implementation plan status: Draft -version: 0.3 +version: 0.4 author: - Sebastian Nagel --- @@ -14,12 +14,13 @@ This document builds on the [impact analysis](../ImpactAnalysis.md) and [early t Besides collecting node-specific details in this document, we intend to contribute implementation-independent specifications to the [cardano-blueprint](https://cardano-scaling.github.io/cardano-blueprint/) initiative and also update the CIP-164 specification through pull requests as needed. -#### Document history +**Document history** This document is a living artifact and will be updated as implementation progresses, new risks are identified, and validation results become available. | Version | Date | Author | Changes | |---------|------------|-----------------|-----------------------------------| +| 0.4 | 2025-10-27 | Sebastian Nagel | Add overview chapter | | 0.3 | 2025-10-25 | Sebastian Nagel | Add dependencies and interactions | | 0.2 | 2025-10-24 | Sebastian Nagel | Add implementation plan | | 0.1 | 2025-10-15 | Sebastian Nagel | Initial draft | From fb414942fc16b769dd19b43e3acb429d03b0a367 Mon Sep 17 00:00:00 2001 From: Sebastian Nagel Date: Wed, 29 Oct 2025 10:21:58 +0100 Subject: [PATCH 4/8] Made worst-case statement more conservative --- docs/leios-design/README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/leios-design/README.md b/docs/leios-design/README.md index 43f37bbef..50fbd8cda 100644 --- a/docs/leios-design/README.md +++ b/docs/leios-design/README.md @@ -66,7 +66,7 @@ With these (at least) two additional functionalities, each across many peers, th ## Designing for the worst-case -Related to the principle of [optimizing for the worst case](https://cardano-scaling.github.io/cardano-blueprint/principles/index.html#optimise-only-for-the-worst-case), the security argument for Leios protocol depends critically on worst-case diffusion characteristics. Endorser blocks and their transaction closures must be "small enough" that the difference between optimistic diffusion (leading to successful certification) and worst-case diffusion remains bounded by the protocol parameter $L_\text{diff}$ and Praos's maximum delay $\Delta$. +Related to the principle of [optimizing for the worst case](https://cardano-scaling.github.io/cardano-blueprint/principles/index.html#optimise-only-for-the-worst-case), the security argument for Leios protocol depends critically on worst-case diffusion characteristics. Endorser blocks and their transaction closures must be "small enough" that the difference between optimistic diffusion (leading to successful certification) and worst-case diffusion remains bounded by the protocol parameter $L_\text{diff}$ according to the [protocol's security argument](https://github.com/cardano-scaling/CIPs/blob/leios/CIP-0164/README.md#protocol-security). If the optimistic, average-case performance is improved with suitable algorithms, data structures and optimizations, but the worst-case scenario is not, more conservative parameter choices would be required to maintain security guarantees. This would negate the anticipated benefits of the optimizations in the first place. Therefore, the implementation must prioritize ensuring that even in adverse network conditions or under attack, the diffusion of EBs and their closures remains within acceptable bounds. From b855efdcb0b926e972b636ffbd65b33c504c62aa Mon Sep 17 00:00:00 2001 From: Sebastian Nagel Date: Wed, 29 Oct 2025 10:24:08 +0100 Subject: [PATCH 5/8] Refine wording on better network utilization --- docs/leios-design/README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/leios-design/README.md b/docs/leios-design/README.md index 50fbd8cda..79a53e3f3 100644 --- a/docs/leios-design/README.md +++ b/docs/leios-design/README.md @@ -35,7 +35,7 @@ Ouroboros Leios introduces **high-throughput** as a third fundamental property, As was the case for the [Praos variant of Ouroboros](https://ouroboros-network.cardano.intersectmbo.org/pdfs/network-design/network-design.pdf#subsection.5.1), the specification embodied in the published and peer-reviewed [research paper for Ouroboros Leios](https://eprint.iacr.org/2025/1115.pdf) was not intended to be directly implementable. Initial research and development studies confirmed this expectation, identifying several unsolved problems with the fully concurrent block production design when considering the concrete Cardano ledger and what consequences this would have (TODO: cite suitable R&D reports, [Tech Report #2](https://github.com/input-output-hk/ouroboros-leios/blob/main/docs/technical-report-2.md#conflicts-ledger-and-incentives)). -The design presented in [CIP-164](https://github.com/cardano-scaling/CIPs/blob/leios/CIP-0164/README.md), also known as "Linear Leios", addresses these implementation challenges by focusing on the core insight of better utilizing network and computational resources during the necessary and eponymous "calm periods" of the Praos protocol. This approach provides an immediately implementable design that can deliver orders of magnitude higher throughput while preserving the security guarantees that make Cardano valuable. +The design presented in [CIP-164](https://github.com/cardano-scaling/CIPs/blob/leios/CIP-0164/README.md), also known as "Linear Leios", addresses these implementation challenges by focusing on the core insight of utilizing the unused network bandwidth and computational resources during the necessary and eponymous "calm periods" of the Praos protocol. This approach provides an immediately implementable design that can deliver orders of magnitude higher throughput while preserving the security guarantees that make Cardano valuable. The Linear Leios protocol operates by allowing a second, bigger type of block to be produced in the same block production opportunity. Block producers can produce and announce an endorser block (EB), which endorses additional transactions that would not fit within the Praos block. EBs are distributed through the network and subjected to validation by a committee of stake pools, who vote on their transaction data closure's availability and validity. Only EBs that achieve a high threshold of stake-weighted votes become certified and can be included in the ledger through exclusive anchoring of a certificate in the subsequent block - now called a ranking block (RB). This mechanism allows for significantly higher transaction throughput while maintaining the security properties of the underlying Praos consensus. See the CIP for more details on the protocol specification and rationale itself. From 340c3d3293fc47c2d23d1f89086d877a7ea3068b Mon Sep 17 00:00:00 2001 From: Sebastian Nagel Date: Wed, 29 Oct 2025 10:27:20 +0100 Subject: [PATCH 6/8] Apply suggestions about network degree and refined wording Co-authored-by: coot --- docs/leios-design/README.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/docs/leios-design/README.md b/docs/leios-design/README.md index 79a53e3f3..bd5d245b8 100644 --- a/docs/leios-design/README.md +++ b/docs/leios-design/README.md @@ -41,7 +41,7 @@ The Linear Leios protocol operates by allowing a second, bigger type of block to ## Cardano node as a real-time system -The implementation of Leios must be understood in the context of the Cardano node as a concurrent, reactive system operating under real-time constraints in an adversarial environment. While "real-time" in this context does not refer to the millisecond-level hard deadlines found in industrial control systems, timely action nontheless remains crucial to protocol success and network security. +The implementation of Leios must be understood in the context of the Cardano node as a concurrent, reactive system operating under real-time constraints in an adversarial environment. While "real-time" in this context does not refer to the millisecond-level hard deadlines found in industrial control systems, timely action at the scale of seconds nontheless remains crucial to protocol success and network security. The currently deployed Praos implementation establishes clear [data diffusion targets](https://ouroboros-network.cardano.intersectmbo.org/pdfs/network-design/network-design.pdf#subsection.5.1): blocks must reach 95% of nodes within the 5-second $\Delta$ parameter, with target performance at 98% and stretch goals at 99%. While these are comfortably achieved most of the time, blocks are regularly adopted within 1 second across the network, there are some situations even in the current system where the target is not reached. For example, due to reward calculations happening at the epoch boundary. @@ -55,7 +55,7 @@ The current primary responsibilities of a Cardano node are roughly: - Transaction submission: receiving, validating, and transmitting transactions to be included in blocks. - Block production: creating new blocks and extending current chain when selected as slot leader. -Despite this apparent simplicity, this already results in a highly concurrent system once cardinalities of upstream and downstream network peers are considered. A node typically maintains in the order of 10 upstream and 100 downstream connections, each of which may be simultaneously requesting or serving data. All of these operations share critical resources including memory, CPU, and network bandwidth, requiring careful resource management to ensure that timing requirements are met even under load. +Despite this apparent simplicity, this already results in a highly concurrent system once cardinalities of upstream and downstream network peers are considered. A `cardano-node` with the default configuration maintains 20 upstream hot peers, 10 upstream warm peers and can have up to a few hundred downstream connections, each of which may be simultaneously requesting or serving data. All of these operations share critical resources, including memory, CPU, and network bandwidth, requiring careful resource management to ensure timing requirements are met even under load. Leios significantly expands this concurrency model by introducing new responsibilities: @@ -775,7 +775,7 @@ Both approaches necessarily abstract real system details and thus provide eviden Prototypes on real infrastructure validate performance characteristics that simulation typically cannot guarantee. The line between simulation and prototyping is blurry, but both concepts share the trait of allowing rapid exploration of the most uncertain aspects of the design before committing to a full implementation. Referring back to the key threats and assumptions to validate early, the primary focus of prototyping is on network diffusion performance under high throughput conditions and adversarial scenarios. -**Network diffusion prototype:** An early implementation of the actual Leios network protocols and potential freshest-first delivery mechanisms, that allows running experiments with various network topologies. Ledger validation of Leios concepts is stubbed out and transmitted data is generated synthetically to focus purely on network performance. Deployed to controlled environments like local devnets and private testnets like the the Performance and Testing cluster, this prototype systematically explores how performance scales with network size and block size, tests different topologies, and crucially answers whether the real network stack achieves the diffusion deadlines required by protocol security arguments. Key measurements include endorser block arrival time distributions, freshest-first multiplexing effectiveness, topology impact on diffusion, and behavior under adversarial scenarios including eclipse attempts and targeted withholding. These measurements will answer questions like, "how much" freshest-first delivery we need, whether the proposed network protocols are practical to implement and what protocol parameter are feasible. +**Network diffusion prototype:** An early implementation of the actual Leios network protocols and freshest-first delivery mechanisms, that allows running experiments with various network topologies. Ledger validation of Leios concepts is stubbed out and transmitted data is generated synthetically to focus purely on network performance. Deployed to controlled environments like local devnets and private testnets like the the Performance and Testing cluster, this prototype systematically explores how performance scales with network size and block size, tests different topologies, and crucially answers whether the real network stack achieves the diffusion deadlines required by protocol security arguments. Key measurements include endorser block arrival time distributions, freshest-first multiplexing effectiveness, topology impact on diffusion, and behavior under adversarial scenarios including eclipse attempts and targeted withholding. These measurements will answer questions like, "how much" freshest-first delivery we need, whether the proposed network protocols are practical to implement and what protocol parameter are feasible. Adversarial testing represents a crucial aspect of prototype validation. In a controlled environment, some nodes can still be configured to exhibit adversarial behaviors such as sending invalid blocks, withholding information, or attempting to exhaust resources of honest nodes. Observing how honest nodes respond provides evidence that the mitigations described in the design are effective. Despite using real network communication, such systems can still be determinstically simulation-tested using tools like [Antithesis](https://antithesis.com/), which is currently picked up also by node-level tests in the Cardano community via [moog](https://github.com/cardano-foundation/moog). If we can put this technique to use for adversarial testing of Leios prototypes and release candidates, this can greatly enhance our ability to validate the protocol under challenging conditions by exploring a much wider range of adversarial scenarios than would be feasible through manually created rigit test scenarios. From 44206863d636ffa39878f8203882ac2e101270a3 Mon Sep 17 00:00:00 2001 From: Sebastian Nagel Date: Wed, 29 Oct 2025 10:31:34 +0100 Subject: [PATCH 7/8] Add concrete cardinalities on thread concurrency --- docs/leios-design/README.md | 12 +++++++++++- 1 file changed, 11 insertions(+), 1 deletion(-) diff --git a/docs/leios-design/README.md b/docs/leios-design/README.md index bd5d245b8..2c4eab280 100644 --- a/docs/leios-design/README.md +++ b/docs/leios-design/README.md @@ -57,12 +57,22 @@ The current primary responsibilities of a Cardano node are roughly: Despite this apparent simplicity, this already results in a highly concurrent system once cardinalities of upstream and downstream network peers are considered. A `cardano-node` with the default configuration maintains 20 upstream hot peers, 10 upstream warm peers and can have up to a few hundred downstream connections, each of which may be simultaneously requesting or serving data. All of these operations share critical resources, including memory, CPU, and network bandwidth, requiring careful resource management to ensure timing requirements are met even under load. +Concretely, in the current system there are (including protocols for supporting features like peer sharing): + +- 2 pipelined + 3 non-pipelined instances per upstream peers => 7 threads per upstream peer; +- 1 pipelined + 4 non-pipelined instances per downstream peer => 6 threads per downstream peer + Leios significantly expands this concurrency model by introducing new responsibilities: - Endorser block and closure diffusion: receiving, validating, and transmitting EBs and their transaction closures. - Voting and vote diffusion: receiving, validating, and transmitting own and foreign votes on EBs. -With these (at least) two additional functionalities, each across many peers, the node set of concurrent tasks only increases. The implementation must ensure that the increased data flows and processing demands do not interfere with each other, or priorization mechanisms ensure to meet the stringent timing constraints necessary for protocol security. +Given the [proposed Leios mini-protocols](https://github.com/cardano-scaling/CIPs/blob/leios/CIP-0164/README.md#leios-mini-protocols), this would result in: + +- 4 pipelined + 3 non-pipelined per upstream peers => 11 threads per upstream peer; +- 1 pipelined + 6 non-pipelined per downstream peer => 8 threads per downstream peer + +With these two additional functionalities, each across many peers, the node set of concurrent tasks strictly increases. The implementation must ensure that the increased data flows and processing demands do not interfere with each other, or priorization mechanisms ensure to meet the stringent timing constraints necessary for protocol security. ## Designing for the worst-case From 7164be8d180c3a719cd6fb841ff4e8e780d871b7 Mon Sep 17 00:00:00 2001 From: Sebastian Nagel Date: Wed, 29 Oct 2025 13:38:37 +0100 Subject: [PATCH 8/8] Refine wording on "From research to implementation" --- docs/leios-design/README.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/leios-design/README.md b/docs/leios-design/README.md index 2c4eab280..9c7fd7e47 100644 --- a/docs/leios-design/README.md +++ b/docs/leios-design/README.md @@ -33,9 +33,9 @@ Ouroboros Leios introduces **high-throughput** as a third fundamental property, ## From research to implementation -As was the case for the [Praos variant of Ouroboros](https://ouroboros-network.cardano.intersectmbo.org/pdfs/network-design/network-design.pdf#subsection.5.1), the specification embodied in the published and peer-reviewed [research paper for Ouroboros Leios](https://eprint.iacr.org/2025/1115.pdf) was not intended to be directly implementable. Initial research and development studies confirmed this expectation, identifying several unsolved problems with the fully concurrent block production design when considering the concrete Cardano ledger and what consequences this would have (TODO: cite suitable R&D reports, [Tech Report #2](https://github.com/input-output-hk/ouroboros-leios/blob/main/docs/technical-report-2.md#conflicts-ledger-and-incentives)). +As was the case for the [Praos variant of Ouroboros](https://ouroboros-network.cardano.intersectmbo.org/pdfs/network-design/network-design.pdf#subsection.5.1), the specification embodied in the published and peer-reviewed [research paper for Ouroboros Leios](https://eprint.iacr.org/2025/1115.pdf) was not intended to be directly implementable. Initial research and development studies confirmed this expectation, identifying several unsolved problems with the fully concurrent block production design when considering the concrete Cardano ledger and what consequences this would have (TODO: cite suitable R&D reports, [Tech Report #2](https://github.com/input-output-hk/ouroboros-leios/blob/main/docs/technical-report-2.md#conflicts-ledger-and-incentives), [Impact analysis survey](https://github.com/input-output-hk/ouroboros-leios/blob/main/docs/ImpactAnalysis.md#full-survey)); further research is needed before those parts can be implemented. -The design presented in [CIP-164](https://github.com/cardano-scaling/CIPs/blob/leios/CIP-0164/README.md), also known as "Linear Leios", addresses these implementation challenges by focusing on the core insight of utilizing the unused network bandwidth and computational resources during the necessary and eponymous "calm periods" of the Praos protocol. This approach provides an immediately implementable design that can deliver orders of magnitude higher throughput while preserving the security guarantees that make Cardano valuable. +The design presented in [CIP-164](https://github.com/cardano-scaling/CIPs/blob/leios/CIP-0164/README.md), also known as "Linear Leios", focuses on the core insight of utilizing the unused network bandwidth and computational resources during the necessary and eponymous "calm periods" of the Praos protocol. This approach provides an immediately implementable design that can deliver orders of magnitude higher throughput while preserving the security guarantees that make Cardano valuable. The Linear Leios protocol operates by allowing a second, bigger type of block to be produced in the same block production opportunity. Block producers can produce and announce an endorser block (EB), which endorses additional transactions that would not fit within the Praos block. EBs are distributed through the network and subjected to validation by a committee of stake pools, who vote on their transaction data closure's availability and validity. Only EBs that achieve a high threshold of stake-weighted votes become certified and can be included in the ledger through exclusive anchoring of a certificate in the subsequent block - now called a ranking block (RB). This mechanism allows for significantly higher transaction throughput while maintaining the security properties of the underlying Praos consensus. See the CIP for more details on the protocol specification and rationale itself.