87 changes: 87 additions & 0 deletions CIP-XXXX/README.md
---
CIP: XXXX
Title: Non-segregated Block Body Serialization
Category: Ledger
Authors:
- Teodora Danciu <[email protected]>
- Alexey Kuleshevich <[email protected]>
Implementors: N/A
Status: Proposed
Discussions:
- https://github.com/IntersectMBO/cardano-ledger/issues/5046
- https://github.com/cardano-foundation/CIPs/pull/1084
Created: 2025-07-30
License: CC-BY-4.0
---

## Abstract

We propose changing the CBOR encoding of a block body from a segregated layout to a plain sequence of transactions.
Current layout: all transaction bodies are concatenated and encoded first, followed by all witness sets, then a map of auxiliary data keyed by transaction index, and finally the indices of invalid transactions.
Proposed layout: each transaction is serialized in full before the next transaction is written to the stream.
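As a toy illustration of the difference between the two layouts (not the real Cardano CDDL: each "transaction" is reduced to an opaque body and witness-set byte string, with auxiliary data and validity flags omitted), the following Python sketch encodes the same two transactions both ways using hand-rolled RFC 8949 head bytes:

```python
def arr(items):
    """CBOR definite-length array head (major type 4), < 24 items."""
    assert len(items) < 24
    return bytes([0x80 | len(items)]) + b"".join(items)

def bstr(b):
    """CBOR byte string (major type 2), < 24 bytes."""
    assert len(b) < 24
    return bytes([0x40 | len(b)]) + b

txs = [(b"body1", b"wits1"), (b"body2", b"wits2")]

# Current (segregated) layout: all bodies first, then all witness sets.
segregated = arr([arr([bstr(b) for b, _ in txs]),
                  arr([bstr(w) for _, w in txs])])

# Proposed layout: each transaction serialized in full, one after another.
sequential = arr([arr([bstr(b), bstr(w)]) for b, w in txs])
```

Both encodings carry the same information; only the grouping differs.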

## Motivation: why is this CIP necessary?

Segregated serialization of [CIP-0118? | Nested Transactions](https://github.com/cardano-foundation/CIPs/pull/862) would be challenging both to specify and implement.
Separating and concatenating components across nested and non-nested transactions introduces complexity that is error-prone and potentially inefficient, as it may require tracking offsets and performing additional buffering and copying at runtime.


This CIP will also slightly simplify the implementation of #964, as it requires each transaction's bytes to be hashed. However, this is always possible regardless of the format, so this CIP is not a strict dependency for #964.

Currently, segregated serialization also complicates the logic used by Consensus to estimate transaction sizes when selecting transactions to fit within a block. Switching to a format where full transactions are encoded sequentially would simplify this process.

Considering, in hindsight, the original motivation for adopting segregated serialization rather than the more natural and predictable approach of serializing a list of transactions, it is unclear whether it provides any real benefit.
The intent was to enable static validation of a transaction without decoding its witnesses. However, this benefit conflicts with the need for strict field evaluation, which is essential to prevent space leaks caused by retaining the original block bytes for an indeterminate period.

I was always under the impression that this segregation was instead so that you could have light clients that, for example, ignored the auxiliary data and just downloaded the transaction bodies and witnesses.


This segregation was done only because Bitcoin did it that way. This is what I've learned from others who were around during that time. From what I know, the current structure is not utilized by anyone out there, and it is definitely not utilized by the node.

It is absolutely possible to download transactions without the parts that you don't care about, regardless of how a transaction is serialized. Naturally, this would prevent any sort of validation, but presumably that would not be a problem for light clients that trust the data they are downloading.

FTR, dropping auxiliary data prevents transaction validation, since its hash is in the body.


You could still validate the transaction body, because only the hash is in the body. You don't need to know what the preimage of the hash is to validate that the ledger state transition represented by / caused by the transaction body is valid.

What you can't validate is the auxiliary data, which is fine, because you didn't download it anyway, and its contents can't change the validity of the state transition to the ledger.

Maybe we're splitting hairs semantically about what "validating" encompasses, but the point is that there is a world where you can download just the tx bodies and validate them with respect to a ledger, without having to download the potentially much larger auxiliary data.

Whether that is useful / valuable to preserve is another question entirely :) just clarifying what I meant.


> You could still validate the transaction body, because only the hash is in the body. You don't need to know what the preimage of the hash is to validate that the ledger state transition represented by / caused by the transaction body is valid.

The same applies to all of the witnesses. You don't need scripts or redeemers or signatures if you just want to reconstruct the ledger state.

> Maybe we're splitting hairs semantically about what "validating" encompasses, but the point is that there is a world where you can download just the tx bodies and validate them with respect to a ledger, without having to download the potentially much larger auxiliary data.

You cannot download just the bodies and validate them with respect to the ledger! You either need a full transaction or you don't do any validation. You can't just pick and choose what you want to validate. You either trust the full block/transaction and skip validation altogether, or you don't trust any of it and verify it yourself using the ledger logic.
Anyone who has a different opinion on that matter can do whatever the hell they want, but we do not need to care about or support their use case.

> there is a world where you can download just the tx bodies

Even if you don't need to do validation, this is not exactly true, since you also need the `isValid` flag; otherwise you don't know which inputs have been spent or which outputs have been created.

Moreover, with the introduction of nested transactions, you will no longer be able to download "just the transaction bodies", since those transaction bodies will contain subtransactions with their own witnesses and their own auxiliary data, unless we also split those out into a separate part of the block.

My point is that it is always possible to download transactions with any information stripped out, regardless of how they were serialized. As a matter of fact, db-sync does just that and more. All one needs to do is discard the bytes that are not relevant for the end user (setting auxiliary data, witnesses, etc. to empty before transmitting transactions to the light client).

@Quantumplation In any case, your arguments are about hypothetical use cases that someone somewhere might or might not implement at some point, while my argument in favor of this CIP is that it will facilitate a safer and less complex implementation of Nested Transactions.
Imagine what we (and others who need the full transaction from the block) would have to do if we were to keep segregated witnesses in the presence of sub-transactions! Today's logic for combining and splitting is already a hairy mess, and I do not want to make it worse. If we were to add another level to this monstrosity, there would be a much higher potential for bugs to be introduced, be it in the ledger or in any other tool or node that has to deal with constructing full transactions.

So, I appreciate your concern, but unless we find someone that actually relies on the current structure, such speculative arguments do not carry much weight IMHO.


I'm not making arguments either way, for what it's worth.

From my perspective, the stance that it was done this way "because Bitcoin did it that way" is very incongruous with the thought and care that went into many other decisions of the early ledger. Usually things that seem arbitrary have at least some well-thought-out reasoning, even if outdated or prematurely optimizing.

I had some recollection that Duncan Coutts had told me what that reason was at one point. I may certainly be mistaken on that, but I'm just sharing in case it's helpful.

Beyond making sure that that is actually understood, I don't have any particular opinion on this proposal (in fact, I'm slightly in favor!)

Since I'm still not 100% sure you understand the point I'm making, given some of your response, I'll clarify again, but this is not to make an argument that we should or should not change it now. I'm actually quite in favor of simplifying things since the benefit as I understood it never did materialize much.

> The same applies to all of the witnesses. You don't need scripts or redeemers or signatures if you just want to reconstruct the ledger state.

> You cannot download just the bodies and validate them with respect to the ledger! You either need a full transaction or you don't do any validation. You can't just pick and choose what you want to validate.

This is not true. Scripts, witnesses, and redeemers are fundamentally different with regard to validity and trust assumptions compared to auxiliary data like transaction metadata.

If I never need to look at the auxiliary metadata, and never depend on it for business logic, then I can get a fully trustless node by validating block headers and the transaction body + witness set. Regardless of what the preimage was, someone can't present me with fallacious data, because I'm not downloading the preimage, and consensus doesn't consider the auxiliary data when selecting a chain.

On the other hand, if I avoid downloading the witness set, then I have a fundamentally different trust model; I don't know the redeemers to apply to the scripts to even run them; I don't know whether the transaction is even authorized to spend the UTxOs.

There are ends of a spectrum, of course: I can have a trust-full node by ignoring the witness set, and at the end of the day a full node needs to download the auxiliary data to make it available. I'm not arguing that that spectrum doesn't exist.

But I still maintain that there is a fundamental difference between these things: the body lets you trust-fully reproduce the ledger state; the block headers + tx bodies + witnesses let you trust-lessly validate and reproduce the ledger state; and the block headers + tx bodies + witnesses + auxiliary data let you fully replicate and serve all historical state, including non-ledger state like transaction metadata.

Again, I'm just sharing this as another data point for what I thought someone had told me the motivation was, in case it shakes loose additional context / memories / motivation for people like yourself; I may certainly be misremembering, and I quite agree that it's likely not very relevant in today's landscape.

Beyond making sure that I'm not being misunderstood, I don't have a dog in this race on the proposal itself.


@lehins lehins Sep 26, 2025


> I had some recollection that Duncan Coutts

Duncan was the person who told me that it was done that way "because bitcoin did it that way".

I get what you are saying, but without the auxiliary data it is not only the hash that is unverifiable: you also cannot validate the size of the transaction, and thus you cannot validate the fees or the correctness of the collateral amount. So a lot of validation goes out the window in the ledger rules if you don't have the full thing.
I understand your thought process from the perspective of a DApp developer: you only care about general transaction validity and the ability to verify scripts. That is, however, only one perspective, and your view on the level of validity of any transaction with missing auxiliary data is very much flawed. That is the point I am trying to bring across.


Glad to hear though that you are in favor of this simplification 😉


Ah, I was missing tx size, that's indeed a big factor of it 😅


Moreover, the current segregated layout, even if it were used for its intended purpose mentioned above, would conflict with incremental decoding of blocks received over the wire, for which there is a real need.
Our decoders consume bytes on demand until the entire BlockBody structure is exhausted. As a result, the node decodes the full block, including all witness data, in a single pass. The segregated format therefore provides no practical benefit for partial or streaming validation.
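To make the streaming benefit concrete, here is a hypothetical sketch using simplified framing (a count byte, then each transaction as a one-byte length prefix plus payload, not real CBOR): with the sequential layout, each transaction is complete the moment its bytes arrive, so it can be handed to the ledger and released before the next one is read.

```python
import io

def stream_txs(stream):
    """Yield complete transactions from a sequentially laid-out block.

    Toy framing, not real CBOR. The point: nothing from one transaction
    needs to be retained while decoding the next, which is impossible in
    the segregated layout where a transaction's parts are far apart.
    """
    (count,) = stream.read(1)          # number of transactions
    for _ in range(count):
        (length,) = stream.read(1)     # one-byte length prefix
        yield stream.read(length)      # one full transaction

block = bytes([2]) + bytes([3]) + b"tx1" + bytes([3]) + b"tx2"
txs = list(stream_txs(io.BytesIO(block)))
```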

Furthermore, the time spent on deserialization is a small fraction of the total time spent applying a block to the ledger state, even when the block is applied without validating transactions.

A significant benefit of the proposed change is a reduction in complexity, both for the cardano-node core components and for the various tools that handle chain data in some capacity.

## Specification

Currently, a block is serialized like this:

```cddl
block =
  [ header
  , transaction_bodies : [* transaction_body]
  , transaction_witness_sets : [* transaction_witness_set]
  , auxiliary_data_set : {* transaction_index => auxiliary_data}
  , invalid_transactions : [* transaction_index]
  ]
```

The proposal is to change it to:
```cddl
block =
  [ header
  , transactions : [* transaction]
  , invalid_transactions : [* transaction_index]
  ]

transaction = [transaction_body, transaction_witness_set, bool, auxiliary_data / nil]
```

Comment on lines +58 to +60

We should make a concrete distinction of what the block body is. This will also be useful for Peras and Leios, since we will be adding their respective certificates into the block body:

Suggested change:

```diff
-  , transactions : [* transaction]
-  , invalid_transactions : [* transaction_index]
-  ]
+  , block_body
+  ]
+
+block_body =
+  [ transactions : [* transaction]
+  , invalid_transactions : [* transaction_index]
+  ]
```
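For concreteness, here is a toy Python encoding of the proposed `transaction` 4-tuple, treating the body and witness set as opaque byte strings (real ones are structured CBOR maps) and writing RFC 8949 head bytes directly:

```python
TRUE, FALSE, NULL = b"\xf5", b"\xf4", b"\xf6"   # CBOR simple values

def bstr(b):
    """CBOR byte string head (major type 2), < 24 bytes."""
    assert len(b) < 24
    return bytes([0x40 | len(b)]) + b

def transaction(body, wits, is_valid, aux=None):
    """Encode [transaction_body, transaction_witness_set, bool, auxiliary_data / nil]."""
    parts = [bstr(body), bstr(wits),
             TRUE if is_valid else FALSE,
             NULL if aux is None else bstr(aux)]
    return bytes([0x80 | len(parts)]) + b"".join(parts)  # 4-element array

tx = transaction(b"body", b"wits", True)   # auxiliary_data absent -> nil
```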

Note that we propose keeping the invalid transaction indices separate, because:
* the `isValid` flag, which determines validity, is controlled by the block-producing node, not by the transaction creator
* it is more efficient: we serialize indices only for invalid transactions, which are a small minority.
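The efficiency point can be checked directly against the CBOR head-byte rules of RFC 8949: an empty definite-length array is the single byte `0x80`, and each small invalid-transaction index costs one byte.

```python
def uint(n):
    """CBOR unsigned integer (major type 0), valid for n < 24."""
    assert n < 24
    return bytes([n])

def array(items):
    """CBOR definite-length array (major type 4), < 24 items."""
    assert len(items) < 24
    return bytes([0x80 | len(items)]) + b"".join(items)

empty_invalid = array([])                 # the common case: every tx valid
two_invalid = array([uint(3), uint(7)])   # rare case: two invalid indices
```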

In fact, "a small minority" understates it; in asymptotically 100% of blocks, this list would be empty and could take up a single byte, `0x80`.


I wonder how this plays with the proposed changes in the Leios CIP


> in asymptotically 100% of blocks, this list would be empty and could take up a single byte, `0x80`.

That is why the value of `isValid` itself is not serialized; otherwise that list would always be as long as the number of transactions.

FYI, there is also a plan to remove `isValid` from transactions, so those indices for `isValid = False` have to stay as a separate list.

> I wonder how this plays with the proposed changes in the Leios CIP

Leios does not care about how transactions are serialized in a block.
As a matter of fact, we've discussed with @nfrisby how this change could potentially be useful for speeding up the mempool, which would be useful for Leios, although this idea needs to be properly experimented with.


## Rationale: how does this CIP achieve its goals?

Serializing transactions in sequence directly supports more complex constructs - such as nested transactions - by eliminating the need to coordinate disjointed segments across different levels of structure.

## Path to Active

### Acceptance Criteria

- [ ] Block serializers and deserializers in [cardano-ledger](https://github.com/IntersectMBO/cardano-ledger) are implemented following the CDDL specification described above, and the change is reflected in the CDDL specs
- [ ] The feature is integrated into [cardano-node](https://github.com/IntersectMBO/cardano-node) and released as part of the Dijkstra era hard fork

### Implementation Plan

The implementation of this CIP should not proceed without an assessment of the potential impact on all the components that deserialise blocks.
Leios and Peras R&D teams should also be aware of these changes.

## Copyright

This CIP is licensed under [CC-BY-4.0](https://creativecommons.org/licenses/by/4.0/legalcode).