CIP-???? | Non-segregated Block Body Serialization #1084

teodanciu · 2025-09-04T15:41:42Z

Simplify serialization of transactions in a block

rphair

@teodanciu @lehins - looks good in format & readability; we'll introduce this in Triage at the next CIP meeting: https://hackmd.io/@cip-editors/119

CIP-XXXX/README.md

rphair · 2025-09-05T19:15:08Z

@teodanciu please let's avoid force pushing again, as requested here: https://github.com/cardano-foundation/CIPs/blob/master/CIP-0001/README.md#1a-authors-open-a-pull-request

This is a shared resource for commentary & editing, not a mirror of the author's code. We need to keep track of review points and correlate these with changes to the branch, and force pushing effectively wipes out that history. 🙏

teodanciu · 2025-09-05T19:55:20Z

That makes sense, I apologize. Thank you for the information!

CIP-XXXX/README.md

rphair · 2025-09-16T17:44:06Z

@teodanciu @lehins today's CIP meeting has not yet confirmed this as a CIP candidate mainly (as I understood from meeting discussion) because we can't verify that other node development teams have endorsed this. We invited people at the meeting to tag relevant reviewers from those teams & so hopefully that will proceed next.

In addition to the Nested Transactions proposal already mentioned, we also noted that potential changes to the block body mean this proposal should be considered alongside these other CIPs (so tagging authors & some reviewers of these proposals):

Quantumplation · 2025-09-16T18:15:57Z

CIP-XXXX/README.md

+Currently, segregated serialization also complicates the logic used by Consensus to estimate transaction sizes when selecting transactions to fit within a block. Switching to a format where full transactions are encoded sequentially would simplify this process.
+
+Considering - in hindsight - the original motivation for adopting segregated serialization rather than the more natural and predictable way of serializing a list of transactions - it's unclear whether it provides any real benefit.
+The intent was to enable static validation of a transaction without decoding its witnesses. However, this benefit conflicts with the need for strict field evaluation, which is essential to prevent space leaks caused by retaining the original block bytes for an indeterminate period.


I was always under the impression that this segmentation was instead so that you could have light clients that, for example, ignored the auxiliary data and just downloaded the transaction bodies and witnesses.

This segregation was done only because Bitcoin did it that way. This is what I've learned from others who were around at that time. From what I know, the current structure is not utilized by anyone out there and it is definitely not utilized by the node.

It is absolutely possible to download transactions without the parts that you don't care about, regardless of how the transaction is serialized. Naturally this would prevent any sort of validation, but presumably that would not be a problem for light clients that trust the data they are downloading.

FTR, dropping auxiliary data prevents transaction validation, since its hash is in the body.

You could still validate the transaction body, because only the hash is in the body. You don't need to know what the preimage of the hash is to validate that the ledger state transition represented by / caused by the transaction body is valid.

What you can't validate is the auxiliary data, which is fine, because you didn't download it anyway, and its contents can't change the validity of the state transition to the ledger.

Maybe splitting hairs semantically about what "validating" encompasses, but the point is that there is a world where you can download just the tx bodies, validate them with respect to a ledger, without having to download the potentially much larger auxiliary data.

Whether that is useful / valuable to preserve is another question entirely :) just clarifying what I meant.

> You could still validate the transaction body, because only the hash is in the body. You don't need to know what the preimage of the hash is to validate that the ledger state transition represented by / caused by the transaction body is valid.

Same applies to all of the witnesses. You don't need scripts or redeemers or signatures if you just want to reconstruct the ledger state.

> Maybe splitting hairs semantically about what "validating" encompasses, but the point is that there is a world where you can download just the tx bodies, validate them with respect to a ledger, without having to download the potentially much larger auxiliary data.

You cannot download just bodies and validate them with respect to the ledger! You either need a full transaction or you don't do any validation. You can't just be picking and choosing what you want to validate. You either trust the full block/transaction and skip validation altogether or you don't trust any of it and verify it yourself using the ledger logic.
Anyone who has a different opinion on that matter can do whatever the hell they want, but we do not need to care or support their use case.

> there is a world where you can download just the tx bodies

Even if you don't need to do validation, this is not exactly true, since you also need the `isValid` flag; otherwise you don't know which inputs have been spent or which outputs have been created.

Moreover, with the introduction of nested transactions, you will no longer be able to download "just the transaction bodies", since those transaction bodies will contain subtransactions with their own witnesses and their own auxiliary data, unless we also split them out into a separate part of the block.

My point is that it is always possible to download transactions with any information stripped out, regardless of how they were serialized. As a matter of fact, db-sync does just that and more. All one needs to do is discard the bytes that are not relevant for the end user (setting auxiliary data, witnesses, etc. to empty before transmitting transactions to the light client).
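To make the "strip before transmission" idea concrete, here is a minimal sketch. The `Tx` record and `strip_for_light_client` helper are hypothetical names invented for illustration; they are not db-sync's or the ledger's actual types, and the byte strings are placeholders rather than real CBOR.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Tx:
    """Hypothetical, simplified stand-in for a full ledger transaction."""
    body: bytes
    witnesses: bytes
    is_valid: bool
    aux_data: Optional[bytes]


def strip_for_light_client(tx: Tx, keep_witnesses: bool = False) -> Tx:
    """Return a copy with the parts a trusting light client does not need blanked out."""
    return replace(
        tx,
        witnesses=tx.witnesses if keep_witnesses else b"",
        aux_data=None,
    )


# Placeholder bytes; a real node would hold decoded or CBOR-encoded parts here.
full = Tx(body=b"\xa0", witnesses=b"\xa1\x00\x80", is_valid=True, aux_data=b"metadata")
light = strip_for_light_client(full)
```

The point of the sketch is only that stripping operates on a decoded transaction, so it works the same whether the block stored the parts segregated or sequentially.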

@Quantumplation In any case, your arguments are about hypothetical use cases that someone somewhere might or might not implement at some point, while my argument in favor of this CIP is that it will facilitate a safer and less complex implementation of Nested Transactions.
Imagine what we (and others who need the full transaction from the block) would have to do if we were to keep segregated witnesses in the presence of sub-transactions! Today's logic for combining and splitting is already a hairy mess and I do not want to make it worse. If we add another level to this monstrosity, there will be a much higher potential for bugs to be introduced, be it in the ledger or in any other tool or node that has to deal with constructing full transactions.

So, I appreciate your concern, but unless we find someone that actually relies on the current structure, such speculative arguments do not carry much weight IMHO.
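For readers unfamiliar with the "combining" step being discussed, the following is a rough sketch of what reassembling full transactions from a segregated block body involves. It assumes the segregated layout has four parts (bodies, witness sets, an index-keyed map of auxiliary data, and a list of invalid indices); the names `FullTx` and `combine_segregated` are hypothetical and the real ledger code is considerably more involved.

```python
from typing import Dict, List, NamedTuple, Optional


class FullTx(NamedTuple):
    body: bytes
    witnesses: bytes
    is_valid: bool
    aux_data: Optional[bytes]


def combine_segregated(
    bodies: List[bytes],
    witness_sets: List[bytes],
    aux_data_by_index: Dict[int, bytes],
    invalid_indices: List[int],
) -> List[FullTx]:
    """Zip the segregated parts back into full transactions, checking consistency."""
    n = len(bodies)
    if len(witness_sets) != n:
        raise ValueError("bodies and witness sets differ in length")
    for i in list(aux_data_by_index) + list(invalid_indices):
        if not 0 <= i < n:
            raise ValueError(f"transaction index {i} out of range")
    invalid = set(invalid_indices)
    return [
        FullTx(body, wits, i not in invalid, aux_data_by_index.get(i))
        for i, (body, wits) in enumerate(zip(bodies, witness_sets))
    ]
```

Every consistency check here is something a non-segregated format gets for free, and each would need a nested counterpart if sub-transactions were segregated as well.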

I'm not making arguments either way, for what it's worth.

From my perspective, the stance that it was done this way "because bitcoin did it that way" is very incongruous with the thought and care that went into many other decisions of the early ledger. Usually things that seem arbitrary have at least some well thought out reasoning, if outdated or prematurely optimizing.

I had some recollection that Duncan Coutts had told me what that reason was at one point. I may certainly be mistaken on that, but I'm just sharing in case it's helpful.

Beyond making sure that that is actually understood, I don't have any particular opinion on this proposal (in fact, I'm slightly in favor!)

Since I'm still not 100% sure you understand the point I'm making, given some of your response, I'll clarify again, but this is not to make an argument that we should or should not change it now. I'm actually quite in favor of simplifying things since the benefit as I understood it never did materialize much.

> Same applies to all of the witnesses. You don't need scripts or redeemers or signatures if you just want to reconstruct the ledger state.

> You cannot download just bodies and validate them with respect to the ledger! You either need a full transaction or you don't do any validation. You can't just be picking and choosing what you want to validate.

This is not true. Scripts, witnesses, and redeemers are fundamentally different with regard to validity and trust assumptions compared to auxiliary data like transaction metadata.

If I don't ever need to look at the auxiliary metadata, and never depend on it for business logic, then I can get a fully trustless node by validating block headers and the transaction body + witness set. Regardless of what the preimage was, someone can't present me with fallacious data, because I'm not downloading the preimage, and consensus doesn't consider the auxiliary data for selecting a chain.

On the other hand, if I avoid downloading the witness set, then I have a fundamentally different trust model; I don't know the redeemers to apply to the scripts to even run them; I don't know whether the transaction is even authorized to spend the UTxOs.

There are ends of a spectrum, of course: I can have a trust-full node by ignoring the witness set, of course, and at the end of the day a full node needs to download the auxiliary data to make it available, I'm not arguing that that spectrum doesn't exist.

But I still maintain that there is a fundamental difference between these things; the body lets you trust-fully reproduce the ledger state; the block headers + tx bodies + witnesses let you trust-lessly validate and reproduce the ledger state; the block headers + tx bodies + witness + auxiliary data lets you fully replicate and serve all historical state, including non-ledger state like transaction metadata.

Again, I'm just sharing this as another data point for what I thought someone had told me the motivation was, in case it shakes loose additional context / memories / motivation for people like yourself; I may certainly be misremembering, and I quite agree that it's likely not very relevant in today's landscape.

Beyond making sure that I'm not being misunderstood, I don't have a dog in this race on the proposal itself.

> I had some recollection that Duncan Coutts

Duncan was the person who told me that it was done that way "because bitcoin did it that way".

I get what you are saying, but without auxiliary data it is not only the hash that is unverifiable: you also cannot validate the size of the transaction, and thus you cannot validate the fees or the correctness of the collateral amount. So, a lot of validation in the ledger rules goes out the window if you don't have the full thing.
I understand your thought process from the perspective of a DApp developer: you only care about general transaction validity and the ability to verify scripts. That is, however, only one perspective. And your view on the level of validity of any transaction with missing auxiliary data is very much flawed. That is the point I am trying to get across.

Glad to hear though that you are in favor of this simplification 😉

Ah, I was missing tx size, that's indeed a big factor of it 😅
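The tx-size point can be illustrated with a small sketch. It assumes the minimum fee is a linear function of the serialized transaction size and ignores script execution costs and any other fee components; the coefficient values and byte counts below are illustrative placeholders, not authoritative protocol parameters.

```python
def min_fee(tx_size_bytes: int, fee_per_byte: int, fee_constant: int) -> int:
    """Linear minimum fee: a per-byte coefficient times the size, plus a constant."""
    return fee_per_byte * tx_size_bytes + fee_constant


# Illustrative values only (assumptions, not read from any live network).
FEE_PER_BYTE = 44
FEE_CONSTANT = 155_381

BODY_AND_WITNESS_BYTES = 300   # hypothetical size of body + witness set
AUX_DATA_BYTES = 1_200         # hypothetical size of the auxiliary data

fee_full = min_fee(BODY_AND_WITNESS_BYTES + AUX_DATA_BYTES, FEE_PER_BYTE, FEE_CONSTANT)
fee_stripped = min_fee(BODY_AND_WITNESS_BYTES, FEE_PER_BYTE, FEE_CONSTANT)
```

A client that never downloaded the auxiliary data would compute `fee_stripped` and could not tell whether the fee declared in the body actually covers `fee_full`.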

Quantumplation · 2025-09-16T18:17:19Z

CIP-XXXX/README.md

+
+Note that we propose keeping invalid transaction indices separately, because:
+  * `isValid` flag - which determines validity - is controlled by the block producing node, not by the transaction creator
+  * it's more efficient: we serialize indices only for invalid transactions, which are a small minority.


In fact, "a small minority" understates it; in asymptotically 100% of blocks, this list would be empty and could take up a single byte, `0x80` (the CBOR encoding of an empty array).

I wonder how this plays with the proposed changes in the Leios CIP

> in asymptotically 100% of blocks, this list would be empty and could take up a single byte `0x80`.

That is why the value of `isValid` itself is not serialized; otherwise that list would always be as long as the number of transactions.

FYI: there is also a plan to remove `isValid` from transactions, so the indices for `isValid=False` have to stay as a separate list.
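As a rough illustration of why the index list is cheap, the sketch below hand-encodes a list of invalid transaction indices as a definite-length CBOR array of unsigned integers and reconstructs per-transaction validity from it. The function names are hypothetical, and the assumption that the list is definite-length is taken from the `0x80` remark above rather than from the ledger specification.

```python
from typing import List


def cbor_head(major_type: int, value: int) -> bytes:
    """Encode a CBOR head (major type plus unsigned argument) in its shortest form."""
    if value < 0:
        raise ValueError("argument must be non-negative")
    prefix = major_type << 5
    if value < 24:
        return bytes([prefix | value])
    for additional_info, width in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if value < 1 << (8 * width):
            return bytes([prefix | additional_info]) + value.to_bytes(width, "big")
    raise ValueError("argument does not fit in 64 bits")


def encode_invalid_indices(indices: List[int]) -> bytes:
    """Definite-length array (major type 4) of unsigned integers (major type 0)."""
    return cbor_head(4, len(indices)) + b"".join(cbor_head(0, i) for i in indices)


def is_valid_flags(tx_count: int, invalid_indices: List[int]) -> List[bool]:
    """Recover the per-transaction isValid flag from the separate index list."""
    invalid = set(invalid_indices)
    return [i not in invalid for i in range(tx_count)]
```

With no invalid transactions the encoded list is the single byte `0x80`, whereas serializing a flag per transaction would cost at least one byte for every transaction in the block.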

> I wonder how this plays with the proposed changes in the Leios CIP

Leios does not care about how transactions are serialized in a block.
As a matter of fact, we've discussed with @nfrisby how this change could potentially be useful for speeding up the mempool, which would be useful for Leios. This idea still needs to be properly experimented with, though.

aslesarenko · 2025-09-16T19:41:47Z

CIP-XXXX/README.md

+
+Segregated serialization of [Nested transactions](https://github.com/cardano-foundation/CIPs/pull/862) would be challenging both to specify and implement.
+Separating and concatenating components across nested and non-nested transactions introduces complexity that is error-prone and potentially inefficient, as it may require tracking offsets and performing additional buffering and copying at runtime.
+


This CIP will also slightly simplify the implementation of #964, as #964 requires each transaction's bytes to be hashed. However, this is always possible regardless of the format, so this CIP is not a strict dependency for #964.
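A sketch of why per-transaction hashing gets simpler: when full transactions are encoded sequentially, each one is a contiguous byte range of the block body and can be hashed in place. The `(offset, length)` spans are assumed to come from a decoder and are hypothetical here, and blake2b-256 is used only as a placeholder for whatever hash #964 specifies.

```python
import hashlib
from typing import List, Tuple


def hash_tx_spans(block_body: bytes, tx_spans: List[Tuple[int, int]]) -> List[bytes]:
    """Hash each transaction's contiguous byte range without copying it out first."""
    view = memoryview(block_body)
    digests = []
    for offset, length in tx_spans:
        if offset < 0 or length < 0 or offset + length > len(block_body):
            raise ValueError("span lies outside the block body")
        digests.append(hashlib.blake2b(view[offset:offset + length], digest_size=32).digest())
    return digests


# Placeholder bytes standing in for two sequentially encoded transactions.
body = b"tx-one" + b"tx-two!"
digests = hash_tx_spans(body, [(0, 6), (6, 7)])
```

In the segregated format the same result requires first gathering a transaction's body, witnesses, and auxiliary data from three different regions of the block.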

Co-authored-by: Ryan <[email protected]>

lehins

One important point we failed to mention in this CIP is that this rearrangement will also change the block body hash computation, making it less complicated.
Currently the block body hash is computed with this weird indirection:

blockBodyHash = hash(hash(txBodies) + hash(txWitnesses) + hash(txAuxData) + hash(txSeqIsValids))

We should mention in the CIP that block body hash computation will change to a single invocation of the hash function on the serialized block body. This simplification will be especially useful for Peras and Leios, since we'll be adding their respective certificates to the block body when those features get implemented:

blockBodyHash = hash(block_body)

where a block body has a clear definition:

block_body =
  [ transactions : [* transaction]
  , invalid_transactions : [* transaction_index]
  ]
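The two hash computations above can be sketched side by side. This assumes each argument is the already-serialized bytes of the corresponding part, that `+` in the formula means byte concatenation, and that the hash is blake2b-256; the function names are hypothetical and the byte strings in any usage are placeholders.

```python
import hashlib


def h(data: bytes) -> bytes:
    """Assumed hash function: blake2b with a 32-byte digest."""
    return hashlib.blake2b(data, digest_size=32).digest()


def segregated_body_hash(
    tx_bodies: bytes, tx_witnesses: bytes, tx_aux_data: bytes, tx_seq_is_valids: bytes
) -> bytes:
    """Current scheme: hash each segregated part, concatenate the digests, hash again."""
    return h(h(tx_bodies) + h(tx_witnesses) + h(tx_aux_data) + h(tx_seq_is_valids))


def non_segregated_body_hash(block_body: bytes) -> bytes:
    """Proposed scheme: a single hash over the serialized block body."""
    return h(block_body)
```

Under the proposed scheme, adding a new component to `block_body` (for example a certificate) changes the serialized bytes but not the shape of the hash computation.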

lehins · 2025-10-30T15:19:33Z

CIP-XXXX/README.md

+  , transactions : [* transaction]
+  , invalid_transactions : [* transaction_index]
+  ]


We should define concretely what the block body is. This will also be useful for Peras and Leios, since we will be adding their respective certificates to the block body:

Suggested change

-  , transactions : [* transaction]
-  , invalid_transactions : [* transaction_index]
-  ]
+  , block_body
+  ]
+
+block_body =
+  [ transactions : [* transaction]
+  , invalid_transactions : [* transaction_index]
+  ]

rphair reviewed Sep 4, 2025

rphair added the labels "Category: Ledger" (proposals belonging to the 'Ledger' category) and "State: Triage" (applied to new PR after editor cleanup on GitHub, pending CIP meeting introduction) on Sep 4, 2025

Add initial version of CIP for Non-segregated Block Body Serialization

efdc1a0

teodanciu force-pushed the td/block-serialization branch from da79ecf to efdc1a0 Compare September 5, 2025 19:09

Ryun1 reviewed Sep 16, 2025

rphair added the label "State: Unconfirmed" (triaged at meeting but not yet confirmed or assigned a CIP number) and removed the label "State: Triage" on Sep 16, 2025

Quantumplation reviewed Sep 16, 2025

aslesarenko reviewed Sep 16, 2025

Update link to Nested Transaction CIP as suggested

bb36afe

Co-authored-by: Ryan <[email protected]>

teodanciu mentioned this pull request Sep 19, 2025

Change to serialization of transaction in a Block Body IntersectMBO/cardano-ledger#5046

Closed

lehins mentioned this pull request Oct 7, 2025

Nested Transactions - CIP-118 IntersectMBO/cardano-ledger#5123

Open

23 tasks

lehins mentioned this pull request Oct 30, 2025

Implement serialization for Dijkstra era Block Body IntersectMBO/cardano-ledger#5380

Open

lehins reviewed Oct 30, 2025

6 participants
