CIP-???? | Non-segregated Block Body Serialization #1084
| Original file line number | Diff line number | Diff line change | ||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| @@ -0,0 +1,87 @@ | ||||||||||||||||||||||
| --- | ||||||||||||||||||||||
| CIP: XXXX | ||||||||||||||||||||||
| Title: Non-segregated Block Body Serialization | ||||||||||||||||||||||
| Category: Ledger | ||||||||||||||||||||||
| Authors: | ||||||||||||||||||||||
| - Teodora Danciu <[email protected]> | ||||||||||||||||||||||
| - Alexey Kuleshevich <[email protected]> | ||||||||||||||||||||||
| Implementors: N/A | ||||||||||||||||||||||
| Status: Proposed | ||||||||||||||||||||||
| Discussions: | ||||||||||||||||||||||
| - https://github.com/IntersectMBO/cardano-ledger/issues/5046 | ||||||||||||||||||||||
| - https://github.com/cardano-foundation/CIPs/pull/1084 | ||||||||||||||||||||||
| Created: 2025-07-30 | ||||||||||||||||||||||
| License: CC-BY-4.0 | ||||||||||||||||||||||
| --- | ||||||||||||||||||||||
|
|
||||||||||||||||||||||
| ## Abstract | ||||||||||||||||||||||
|
|
||||||||||||||||||||||
| We propose changing the CBOR encoding of a block body from a segregated layout to a plain sequence of transactions. | ||||||||||||||||||||||
| Current layout: all transaction bodies are concatenated and encoded first, followed by their witness sets, then followed by auxiliary-data hashes, and finally followed by validity flags. | ||||||||||||||||||||||
| Proposed layout: each transaction is serialized in full before the next transaction is written to the stream. | ||||||||||||||||||||||
|
|
||||||||||||||||||||||
| ## Motivation: why is this CIP necessary? | ||||||||||||||||||||||
|
|
||||||||||||||||||||||
| Segregated serialization of [CIP-0118? | Nested Transactions](https://github.com/cardano-foundation/CIPs/pull/862) would be challenging both to specify and implement. | ||||||||||||||||||||||
| Separating and concatenating components across nested and non-nested transactions introduces complexity that is error-prone and potentially inefficient, as it may require tracking offsets and performing additional buffering and copying at runtime. | ||||||||||||||||||||||
|
|
||||||||||||||||||||||
|
||||||||||||||||||||||
| Currently, segregated serialization also complicates the logic used by Consensus to estimate transaction sizes when selecting transactions to fit within a block. Switching to a format where full transactions are encoded sequentially would simplify this process. | ||||||||||||||||||||||
|
|
||||||||||||||||||||||
| In hindsight, it is unclear whether segregated serialization provides any real benefit over the more natural and predictable approach of serializing a list of transactions. | ||||||||||||||||||||||
| The intent was to enable static validation of a transaction without decoding its witnesses. However, this benefit conflicts with the need for strict field evaluation, which is essential to prevent space leaks caused by retaining the original block bytes for an indeterminate period. | ||||||||||||||||||||||
|
Contributor
I was always under the impression that this segmentation was instead so that you could have light clients that, for example, ignored the auxiliary data and just downloaded the transaction bodies and witnesses.
Contributor
This segregation was done only because Bitcoin did it that way. This is what I've learned from others who were around during that time. From what I know, the current structure is not utilized by anyone out there and it is definitely not utilized by the node. It is absolutely possible to download transactions without the parts that you don't care about, regardless of how a transaction is serialized. Naturally this would prevent any sort of validation, but presumably that would not be a problem for light clients that trust the data they are downloading. FTR, dropping auxiliary data prevents transaction validation, since its hash is in the body.
Contributor
You could still validate the transaction body; because only hash is in the body. You don't need to know what the preimage of the hash is to validate that the ledger state transition represented by / caused by the transaction body is valid. What you can't validate is the auxiliary data, which is fine, because you didn't download it anyway, and its contents can't change the validity of the state transition to the ledger. Maybe splitting hairs semantically about what "validating" encompasses, but the point is that there is a world where you can download just the tx bodies, validate them with respect to a ledger, without having to download the potentially much larger auxiliary data. Whether that is useful / valuable to preserve is another question entirely :) just clarifying what I meant.
Contributor
Same applies to all of the witnesses. You don't need scripts or redeemers or signatures if you just want to reconstruct the ledger state.
You cannot download just bodies and validate them with respect to the ledger! You either need a full transaction or you don't do any validation. You can't just be picking and choosing what you want to validate. You either trust the full block/transaction and skip validation altogether or you don't trust any of it and verify it yourself using the ledger logic.
Even if you don't need to do validation this is not exactly true, since you also need
Moreover, with the introduction of nested transactions, you will no longer be able to download "just the transaction bodies", since those transaction bodies will contain subtransactions with their own witnesses and their own auxiliary data, unless we also split them out into a separate part of the block.
My point is that it is always possible to download transactions with any information stripped out, regardless of how they were serialized. As a matter of fact, db-sync does just that and more. All one needs to do is discard the bytes that are not relevant for the end user (setting auxiliary data, witnesses, etc. to empty before transmission of transactions to the light client).
@Quantumplation In any case, your arguments are about hypothetical use cases that someone somewhere might or might not implement at some point. While my argument in favor of this CIP is that it will facilitate a safer and less complex implementation of Nested Transactions. So, I appreciate your concern, but unless we find someone that actually relies on the current structure, such speculative arguments do not carry much weight IMHO.
Contributor
I'm not making arguments either way, for what it's worth. From my perspective, the stance that it was done this way "because bitcoin did it that way" is very incongruous with the thought and care that went into many other decisions of the early ledger. Usually things that seem arbitrary have at least some well thought out reasoning, if outdated or prematurely optimizing. I had some recollection that Duncan Coutts had told me what that reason was at one point. I may certainly be mistaken on that, but I'm just sharing in case it's helpful. Beyond making sure that that is actually understood, I don't have any particular opinion on this proposal (in fact, I'm slightly in favor!) Since I'm still not 100% sure you understand the point I'm making, given some of your response, I'll clarify again, but this is not to make an argument that we should or should not change it now. I'm actually quite in favor of simplifying things since the benefit as I understood it never did materialize much.
This is not true. Scripts, witnesses, and redeemers are fundamentally different with regard to validity and trust assumptions compared to auxiliary data like transaction metadata. If I don't ever need to look at the auxiliary metadata, and never depend on it for business logic, then I can get a fully trustless node by validating block headers and the transaction body + witness set. Regardless of what the preimage was, someone can't present me with fallacious data, because I'm not downloading the preimage, and consensus doesn't consider the auxiliary data for selecting a chain. On the other hand, if I avoid downloading the witness set, then I have a fundamentally different trust model; I don't know the redeemers to apply to the scripts to even run them; I don't know whether the transaction is even authorized to spend the UTxOs. There are ends of a spectrum, of course: I can have a trust-full node by ignoring the witness set, and at the end of the day a full node needs to download the auxiliary data to make it available. I'm not arguing that that spectrum doesn't exist. But I still maintain that there is a fundamental difference between these things; the body lets you trust-fully reproduce the ledger state; the block headers + tx bodies + witnesses let you trust-lessly validate and reproduce the ledger state; the block headers + tx bodies + witnesses + auxiliary data let you fully replicate and serve all historical state, including non-ledger state like transaction metadata. Again, I'm just sharing this as another point of data for what I thought someone had told me the motivation was, in case that shook forth additional context / memories / motivation for people like yourself; I may certainly be misremembering, and I quite agree that it's likely not very relevant in today's landscape. Beyond making sure that I'm not being misunderstood, I don't have a dog in this race on the proposal itself.
Contributor
Duncan was the person who told me that it was done that way "because bitcoin did it that way". I get what you are saying, but without auxiliary data it is not only the hash that is not verifiable: you also cannot validate the size of the transaction, thus you cannot validate the fees or the correctness of the collateral amount. So, a lot of validation in the ledger rules goes out the window if you don't have the full thing.
Contributor
Glad to hear though that you are in favor of this simplification 😉
Contributor
Ah, I was missing tx size, that's indeed a big factor of it 😅 |
||||||||||||||||||||||
|
|
||||||||||||||||||||||
| Moreover, if the current segregated layout were used for the purpose described above, it would conflict with incremental decoding of blocks as they are received over the wire, for which there is a need. | ||||||||||||||||||||||
| Our decoders consume bytes on demand until the entire BlockBody structure is exhausted. As a result, the node decodes the full block - including all witness data - in a single pass. The segregated format therefore provides no practical benefit for partial or streaming validation. | ||||||||||||||||||||||
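
The interaction between layout and incremental decoding can be illustrated with a toy model. The following Python sketch is not the node's decoder: it assumes a simplified stream in which every transaction contributes exactly one body, one witness set and one auxiliary-data item (in the real segregated format, auxiliary data is a map that only holds entries for some indices), and all names in it are hypothetical. It counts how many items a consumer has to read before the first transaction is available in full.

```python
from typing import Dict, Iterable, Iterator, Set, Tuple

# Toy model: one stream item is (part name, transaction index).
Item = Tuple[str, int]
PARTS = ("body", "witness_set", "auxiliary_data")


def segregated_stream(n_txs: int) -> Iterator[Item]:
    """All bodies first, then all witness sets, then all auxiliary data."""
    for part in PARTS:
        for index in range(n_txs):
            yield (part, index)


def sequential_stream(n_txs: int) -> Iterator[Item]:
    """Each transaction is written in full before the next one starts."""
    for index in range(n_txs):
        for part in PARTS:
            yield (part, index)


def items_until_first_complete_tx(stream: Iterable[Item]) -> int:
    """Count the items consumed until some transaction has all of its parts."""
    seen: Dict[int, Set[str]] = {}
    for count, (part, index) in enumerate(stream, start=1):
        seen.setdefault(index, set()).add(part)
        if len(seen[index]) == len(PARTS):
            return count
    raise ValueError("stream ended before any transaction was complete")
```

In this model, a block with `n` transactions yields its first complete transaction after `2 * n + 1` items in the segregated layout, but after 3 items in the sequential layout, independently of `n`.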
|
|
||||||||||||||||||||||
| Furthermore, deserialization accounts for only a small fraction of the total time spent applying a block to the ledger state, even when transactions are not validated. | ||||||||||||||||||||||
|
|
||||||||||||||||||||||
| A significant benefit of the proposed change is a reduction in complexity, both for the cardano-node core components and for the various tools that handle chain data in some capacity. | ||||||||||||||||||||||
|
|
||||||||||||||||||||||
| ## Specification | ||||||||||||||||||||||
|
|
||||||||||||||||||||||
| Currently, a block is serialized like this: | ||||||||||||||||||||||
|
|
||||||||||||||||||||||
| ```cddl | ||||||||||||||||||||||
| block = | ||||||||||||||||||||||
| [ header | ||||||||||||||||||||||
| , transaction_bodies : [* transaction_body] | ||||||||||||||||||||||
| , transaction_witness_sets : [* transaction_witness_set] | ||||||||||||||||||||||
| , auxiliary_data_set : {* transaction_index => auxiliary_data} | ||||||||||||||||||||||
| , invalid_transactions : [* transaction_index] | ||||||||||||||||||||||
| ] | ||||||||||||||||||||||
| ``` | ||||||||||||||||||||||
|
|
||||||||||||||||||||||
| The proposal is to change it to: | ||||||||||||||||||||||
| ```cddl | ||||||||||||||||||||||
| block = | ||||||||||||||||||||||
| [ header | ||||||||||||||||||||||
| , transactions : [* transaction] | ||||||||||||||||||||||
| , invalid_transactions : [* transaction_index] | ||||||||||||||||||||||
| ] | ||||||||||||||||||||||
|
|
||||||||||||||||||||||
| transaction = [transaction_body, transaction_witness_set, bool, auxiliary_data / nil] | ||||||||||||||||||||||
| ``` | ||||||||||||||||||||||
Comment on lines +58 to +60
Contributor
We should make a concrete distinction what the block body is. This will also be useful for Peras and Leios, since we will be adding their respective certificates into the block body:
Suggested change
|
|
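
As a rough illustration of how the two layouts relate, the Python sketch below regroups an already-decoded block from the current five-element shape into the proposed three-element shape. It works on plain Python values rather than CBOR bytes, the function name `desegregate` is hypothetical, and it assumes that the `bool` in each `transaction` is simply the negation of membership in `invalid_transactions`; the proposal text does not state this relationship, so treat it as an assumption.

```python
from typing import Any, List


def desegregate(block: List[Any]) -> List[Any]:
    """Regroup [header, bodies, witness_sets, aux_map, invalid] by transaction."""
    header, bodies, witness_sets, aux_by_index, invalid = block
    if len(bodies) != len(witness_sets):
        raise ValueError("bodies and witness sets must have the same length")

    invalid_set = set(invalid)
    transactions = [
        [
            body,
            witness_set,
            index not in invalid_set,  # ASSUMPTION: bool mirrors the invalid list
            aux_by_index.get(index),   # None stands in for CBOR nil
        ]
        for index, (body, witness_set) in enumerate(zip(bodies, witness_sets))
    ]
    # The list of invalid transaction indices is carried over unchanged.
    return [header, transactions, list(invalid)]
```

For example, `desegregate(["hdr", ["b0", "b1"], ["w0", "w1"], {1: "aux1"}, [1]])` groups `b1`, `w1` and `aux1` into the second transaction and leaves the index list `[1]` in place.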
||||||||||||||||||||||
| Note that we propose keeping invalid transaction indices separately, because: | ||||||||||||||||||||||
| * the `isValid` flag - which determines validity - is controlled by the block-producing node, not by the transaction creator | ||||||||||||||||||||||
| * it's more efficient: we serialize indices only for invalid transactions, which are a small minority. | ||||||||||||||||||||||
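
To make the size argument concrete, here is a minimal Python sketch of the CBOR bytes for a list of indices, following the unsigned-integer and array head rules of RFC 8949. It assumes a definite-length array, which may differ from what the ledger's encoders actually emit, and the helper names are hypothetical.

```python
from typing import Sequence


def encode_head(major: int, value: int) -> bytes:
    """Encode a CBOR head: 3-bit major type plus an unsigned argument."""
    if value < 0:
        raise ValueError("argument must be non-negative")
    if value < 24:
        return bytes([(major << 5) | value])
    # Additional information 24..27 selects a 1, 2, 4 or 8 byte argument.
    for additional, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if value < 1 << (8 * size):
            return bytes([(major << 5) | additional]) + value.to_bytes(size, "big")
    raise ValueError("argument does not fit in 64 bits")


def encode_index_list(indices: Sequence[int]) -> bytes:
    """Definite-length CBOR array (major type 4) of unsigned ints (major type 0)."""
    head = encode_head(4, len(indices))
    return head + b"".join(encode_head(0, index) for index in indices)
```

Under these assumptions an empty index list encodes to the single byte `0x80`, and each invalid transaction adds one byte for indices below 24 and two bytes for indices up to 255.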
|
Contributor
In fact, "a small minority" understates it; in asymptotically 100% of blocks, this list would be empty and could take up a single byte.
Contributor
I wonder how this plays with the proposed changes in the Leios CIP
Contributor
That is why value form FYI. There is also a plan to remove
Leios does not care about how transactions are serialized in a block. |
||||||||||||||||||||||
|
|
||||||||||||||||||||||
| ## Rationale: how does this CIP achieve its goals? | ||||||||||||||||||||||
|
|
||||||||||||||||||||||
| Serializing transactions in sequence directly supports more complex constructs - such as nested transactions - by eliminating the need to coordinate disjointed segments across different levels of structure. | ||||||||||||||||||||||
|
|
||||||||||||||||||||||
| ## Path to Active | ||||||||||||||||||||||
|
|
||||||||||||||||||||||
| ### Acceptance Criteria | ||||||||||||||||||||||
|
|
||||||||||||||||||||||
| - [ ] Block serializers and deserializers in [cardano-ledger](https://github.com/IntersectMBO/cardano-ledger) are implemented such that they follow the CDDL specification described above, and the change is reflected in the CDDL specs | ||||||||||||||||||||||
| - [ ] The feature is integrated into [cardano-node](https://github.com/IntersectMBO/cardano-node) and released as part of the Dijkstra era hard fork | ||||||||||||||||||||||
|
|
||||||||||||||||||||||
| ### Implementation Plan | ||||||||||||||||||||||
|
|
||||||||||||||||||||||
| The implementation of this CIP should not proceed without an assessment of the potential impact on all the components that deserialise blocks. | ||||||||||||||||||||||
| Leios and Peras R&D teams should also be aware of these changes. | ||||||||||||||||||||||
|
|
||||||||||||||||||||||
| ## Copyright | ||||||||||||||||||||||
|
|
||||||||||||||||||||||
| This CIP is licensed under [CC-BY-4.0](https://creativecommons.org/licenses/by/4.0/legalcode). | ||||||||||||||||||||||