CIP-???? | Non-segregated Block Body Serialization #1084
rphair
left a comment
@teodanciu @lehins - looks good in format & readability; we'll introduce this in Triage at the next CIP meeting: https://hackmd.io/@cip-editors/119
da79ecf to
efdc1a0
@teodanciu please let's avoid force pushing again, as requested here: https://github.com/cardano-foundation/CIPs/blob/master/CIP-0001/README.md#1a-authors-open-a-pull-request This is a shared resource for commentary & editing, not a mirror of the author's code. We need to keep track of review points and correlate these with changes to the branch, and force pushing effectively wipes out that history. 🙏

That makes sense, I apologize. Thank you for the information!
@teodanciu @lehins today's CIP meeting has not yet confirmed this as a CIP candidate, mainly (as I understood from the meeting discussion) because we can't verify that other node development teams have endorsed this. We invited people at the meeting to tag relevant reviewers from those teams, so hopefully that will proceed next. Besides the Nested Transactions already mentioned in the proposal, we also noted that potential changes to the block body mean this proposal should be considered together with these other CIPs (and so tagging authors & some reviewers of these proposals):
Currently, segregated serialization also complicates the logic used by Consensus to estimate transaction sizes when selecting transactions to fit within a block. Switching to a format where full transactions are encoded sequentially would simplify this process.

Considering - in hindsight - the original motivation for adopting segregated serialization rather than the more natural and predictable way of serializing a list of transactions - it's unclear whether it provides any real benefit.
The intent was to enable static validation of a transaction without decoding its witnesses. However, this benefit conflicts with the need for strict field evaluation, which is essential to prevent space leaks caused by retaining the original block bytes for an indeterminate period.
I was always under the impression that this segmentation was instead so that you could have light clients that, for example, ignored the auxiliary data and just downloaded the transaction bodies and witnesses.
This segregation was done only because Bitcoin did it that way. This is what I've learned from others who were around during that time. From what I know, the current structure is not utilized by anyone out there and it is definitely not utilized by the node.
It is absolutely possible to download transactions without the parts that you don't care about, regardless of how a transaction is serialized. Naturally this would prevent any sort of validation, but presumably that would not be a problem for light clients that trust the data they are downloading.
FTR, dropping auxiliary data prevents transaction validation, since its hash is in the body.
You could still validate the transaction body; because only hash is in the body. You don't need to know what the preimage of the hash is to validate that the ledger state transition represented by / caused by the transaction body is valid.
What you can't validate is the auxiliary data, which is fine, because you didn't download it anyway, and its contents can't change the validity of the state transition to the ledger.
Maybe splitting hairs semantically about what "validating" encompasses, but the point is that there is a world where you can download just the tx bodies, validate them with respect to a ledger, without having to download the potentially much larger auxiliary data.
Whether that is useful / valuable to preserve is another question entirely :) just clarifying what I meant.
> You could still validate the transaction body; because only hash is in the body. You don't need to know what the preimage of the hash is to validate that the ledger state transition represented by / caused by the transaction body is valid.

Same applies to all of the witnesses. You don't need scripts or redeemers or signatures if you just want to reconstruct the ledger state.

> Maybe splitting hairs semantically about what "validating" encompasses, but the point is that there is a world where you can download just the tx bodies, validate them with respect to a ledger, without having to download the potentially much larger auxiliary data.

You cannot download just bodies and validate them with respect to the ledger! You either need a full transaction or you don't do any validation. You can't just be picking and choosing what you want to validate. You either trust the full block/transaction and skip validation altogether or you don't trust any of it and verify it yourself using the ledger logic.
Anyone who has a different opinion on that matter can do whatever the hell they want, but we do not need to care or support their use case.
> there is a world where you can download just the tx bodies

Even if you don't need to do validation this is not exactly true, since you also need the isValid flag; otherwise you don't know which inputs have been spent or which outputs have been created.
Moreover, with the introduction of nested transactions, you will no longer be able to download "just the transaction bodies", since those transaction bodies will contain subtransactions with their own witnesses and their own auxiliary data, unless we also split them out into a separate part of the block.
My point is that it is always possible to download transactions with any information stripped out of them, regardless of how they were serialized. As a matter of fact, db-sync does just that and more. All one needs to do is discard the bytes that are not relevant for the end user (setting auxiliary data, witnesses, etc. to empty before transmitting transactions to the light client).
@Quantumplation In any case, your arguments are about hypothetical use cases that someone somewhere might or might not implement at some point. While my argument in favor of this CIP is that it will facilitate a safer and less complex implementation of Nested Transactions.
Imagine what we (and others who will need the full transaction from the block) would need to do if we are to keep segregated witnesses in the presence of sub-transactions! Today's logic for combining and splitting is already a hairy mess and I do not want to make it worse. If we add another level to this monstrosity, there will be a much higher potential for bugs to be introduced, be it in the ledger or in any other tool or node that has to deal with constructing full transactions.
So, I appreciate your concern, but unless we find someone that actually relies on the current structure, such speculative arguments do not carry much weight IMHO.
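To illustrate the stripping argument above, here is a minimal sketch of discarding parts of a transaction before sending it to a client that trusts its source. The `Tx` record and `strip_for_light_client` are hypothetical names invented for this example; they are not taken from the ledger, db-sync, or any real API, and real transactions are CBOR structures rather than a dataclass.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Tx:
    """Hypothetical stand-in for a fully decoded transaction."""
    body: bytes
    witnesses: bytes
    is_valid: bool
    auxiliary_data: Optional[bytes]


def strip_for_light_client(tx: Tx, keep_witnesses: bool = False) -> Tx:
    """Return a copy of tx with the parts the client does not want set to empty.

    The result can no longer be validated by the ledger rules; it is only
    useful to a client that trusts whoever served it.
    """
    return replace(
        tx,
        witnesses=tx.witnesses if keep_witnesses else b"",
        auxiliary_data=None,
    )


full = Tx(body=b"\xa0", witnesses=b"\xa1\x00\x80", is_valid=True, auxiliary_data=b"\xa0")
stripped = strip_for_light_client(full)
```

The point of the sketch is that the stripping happens on decoded transactions, so it does not depend on whether the block stores transactions segregated or sequentially.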
I'm not making arguments either way, for what it's worth.
From my perspective, the stance that it was done this way "because bitcoin did it that way" is very incongruous with the thought and care that went into many other decisions of the early ledger. Usually things that seem arbitrary have at least some well thought out reasoning, if outdated or prematurely optimizing.
I had some recollection that Duncan Coutts had told me what that reason was at one point. I may certainly be mistaken on that, but I'm just sharing in case it's helpful.
Beyond making sure that that is actually understood, I don't have any particular opinion on this proposal (in fact, I'm slightly in favor!)
Since I'm still not 100% sure you understand the point I'm making, given some of your response, I'll clarify again, but this is not to make an argument that we should or should not change it now. I'm actually quite in favor of simplifying things since the benefit as I understood it never did materialize much.
> Same applies to all of the witnesses. You don't need scripts or redeemers or signatures if you just want to reconstruct the ledger state.

> You cannot download just bodies and validate them with respect to the ledger! You either need a full transaction or you don't do any validation. You can't just be picking and choosing what you want to validate.

This is not true. Scripts, witnesses, and redeemers are fundamentally different with regard to validity and trust assumptions compared to auxiliary data like transaction metadata.
If I don't ever need to look at the auxiliary metadata, and never depend on it for business logic, then I can get a fully trustless node by validating block headers and the transaction body + witness set. Regardless of what the preimage was, someone can't present me with fallacious data, because I'm not downloading the preimage, and consensus doesn't consider the auxiliary data for selecting a chain.
On the other hand, if I avoid downloading the witness set, then I have a fundamentally different trust model; I don't know the redeemers to apply to the scripts to even run them; I don't know whether the transaction is even authorized to spend the UTxOs.
There are ends of a spectrum, of course: I can have a trust-full node by ignoring the witness set, of course, and at the end of the day a full node needs to download the auxiliary data to make it available, I'm not arguing that that spectrum doesn't exist.
But I still maintain that there is a fundamental difference between these things; the body lets you trust-fully reproduce the ledger state; the block headers + tx bodies + witnesses let you trust-lessly validate and reproduce the ledger state; the block headers + tx bodies + witness + auxiliary data lets you fully replicate and serve all historical state, including non-ledger state like transaction metadata.
Again, I'm just sharing this as another point of data for what I thought someone had told me the motivation was, in case that shook forth additional context / memories / motivation for people like yourself; I may certainly be misremembering, and I quite agree that it's likely not very relevant in today's landscape.
Beyond making sure that I'm not being misunderstood, I don't have a dog in this race on the proposal itself.
> I had some recollection that Duncan Coutts

Duncan was the person who told me that it was done that way "because bitcoin did it that way".
I get what you are saying, but without auxiliary data it is not only the hash that is not verifiable: you also cannot validate the size of the transaction, and thus you cannot validate the fees or the correctness of the collateral amount. So a lot of validation in the ledger rules goes out the window if you don't have the full thing.
I understand your thought process from the perspective of a DApp developer: you only care about general transaction validity and the ability to verify scripts. That is, however, only one perspective. And your view on the level of validity of any transaction with missing auxiliary data is very much flawed. That is the point I am trying to get across.
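A small sketch of why a missing piece of the transaction breaks fee validation: the minimum fee is linear in the serialized transaction size, so a verifier that does not have the auxiliary data bytes computes the wrong size. The coefficients below are illustrative assumptions (they are protocol parameters and can change), and the sketch ignores script execution and reference script costs.

```python
# Assumed, illustrative protocol parameters; real values come from the
# current protocol parameters and are not fixed constants.
MIN_FEE_A = 44        # lovelace per byte of serialized transaction
MIN_FEE_B = 155_381   # constant lovelace component


def min_fee(tx_size_bytes: int) -> int:
    """Linear minimum fee for a transaction of the given serialized size."""
    return MIN_FEE_A * tx_size_bytes + MIN_FEE_B


def fee_is_sufficient(paid_fee: int, tx_size_bytes: int) -> bool:
    """Check the fee declared in the body against the size-based minimum."""
    return paid_fee >= min_fee(tx_size_bytes)


# Hypothetical sizes: 400 bytes of body + witnesses, 800 bytes of auxiliary data.
size_without_aux = 400
size_full = size_without_aux + 800

# A fee that looks sufficient when the auxiliary data is missing...
fee = min_fee(size_without_aux)
looks_ok_without_aux = fee_is_sufficient(fee, size_without_aux)
# ...is actually too low once the full transaction size is known.
ok_with_full_tx = fee_is_sufficient(fee, size_full)
```

With these assumed numbers the check passes on the truncated size and fails on the full size, which is the gap described in the comment above.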
Glad to hear though that you are in favor of this simplification 😉
Ah, I was missing tx size, that's indeed a big factor of it 😅
Note that we propose keeping invalid transaction indices separately, because:
* the `isValid` flag, which determines validity, is controlled by the block-producing node, not by the transaction creator
* it's more efficient: we serialize indices only for invalid transactions, which are a small minority.
In fact, "a small minority" understates it; in asymptotically 100% of blocks, this list would be empty and could take up a single byte, `80` (the CBOR encoding of an empty array).
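The single-byte claim can be checked with a tiny hand-rolled encoder. This is a sketch that assumes the index list is written as a definite-length CBOR array of unsigned integers; `cbor_head` and `encode_index_list` are names made up for this example, and the node's actual encoder may make different choices (for instance indefinite-length arrays).

```python
def cbor_head(major: int, value: int) -> bytes:
    """Encode a CBOR initial byte plus argument for the given major type."""
    if value < 24:
        return bytes([(major << 5) | value])
    if value < 2**8:
        return bytes([(major << 5) | 24, value])
    if value < 2**16:
        return bytes([(major << 5) | 25]) + value.to_bytes(2, "big")
    if value < 2**32:
        return bytes([(major << 5) | 26]) + value.to_bytes(4, "big")
    return bytes([(major << 5) | 27]) + value.to_bytes(8, "big")


def encode_index_list(indices: list[int]) -> bytes:
    """Definite-length array (major type 4) of unsigned ints (major type 0)."""
    return cbor_head(4, len(indices)) + b"".join(cbor_head(0, i) for i in indices)


empty = encode_index_list([])        # the common case: no invalid transactions
one_invalid = encode_index_list([3])  # a block whose transaction 3 is invalid
```

An empty list costs exactly one byte, and each invalid transaction adds only the few bytes needed for its index.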
I wonder how this plays with the proposed changes in the Leios CIP
> in asymptotically 100% of blocks, this list would be empty and could take up a single byte 80.

That is why the value of isValid itself is not serialized; otherwise that list would always be as long as the number of transactions.
FYI, there is also a plan to remove isValid from transactions, so those indices for isValid=False have to stay as a separate list.

> I wonder how this plays with the proposed changes in the Leios CIP

Leios does not care about how transactions are serialized in a block.
As a matter of fact, we've discussed with @nfrisby how this change could potentially be useful for speeding up the mempool, which would be useful for Leios. This idea still needs to be properly experimented with, though.
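As a sketch of how a consumer could recover per-transaction validity from the separate index list, the helper below expands the indices back into one flag per transaction. `is_valid_flags` is a hypothetical name for this example, and the range check is an assumption about reasonable behaviour, not a statement of the ledger rules.

```python
def is_valid_flags(tx_count: int, invalid_indices: list[int]) -> list[bool]:
    """Expand a list of invalid transaction indices into one flag per transaction."""
    invalid = set(invalid_indices)
    if any(i < 0 or i >= tx_count for i in invalid):
        raise ValueError("invalid transaction index out of range")
    # Every transaction is valid unless its index was listed.
    return [i not in invalid for i in range(tx_count)]


typical_block = is_valid_flags(4, [])    # the usual case: empty index list
rare_block = is_valid_flags(4, [2])      # transaction 2 failed phase-2 validation
```

Only the rare block pays for any index bytes, while the flags for all transactions remain recoverable.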
Segregated serialization of [Nested transactions](https://github.com/cardano-foundation/CIPs/pull/862) would be challenging both to specify and implement.
Separating and concatenating components across nested and non-nested transactions introduces complexity that is error-prone and potentially inefficient, as it may require tracking offsets and performing additional buffering and copying at runtime.
Co-authored-by: Ryan <[email protected]>
lehins
left a comment
One important point we failed to mention in this CIP is that this rearrangement will also change the block body hash computation, while making it less complicated.
Currently block body hash is computed with this weird indirection:
blockBodyHash = hash(hash(txBodies) + hash(txWitnesses) + hash(txAuxData) + hash(txSeqIsValids))
We should mention in the CIP that block body hash computation will change to a single invocation of the hash function on the serialized block body. This simplification will be especially useful for Peras and Leios, since we'll be adding their respective certificates to the block body when those features get implemented:
blockBodyHash = hash(block_body)
where a block body has a clear definition:
block_body =
[ transactions : [* transaction]
, invalid_transactions : [* transaction_index]
]
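The two hash computations above can be sketched side by side. This assumes a 32-byte Blake2b digest and reads `+` in the first formula as byte concatenation; both are assumptions made for illustration, and the function names are made up for this example rather than taken from the ledger code.

```python
import hashlib


def blake2b_256(data: bytes) -> bytes:
    """32-byte Blake2b digest (assumed hash function for this sketch)."""
    return hashlib.blake2b(data, digest_size=32).digest()


def segregated_body_hash(
    tx_bodies: bytes, tx_witnesses: bytes, tx_aux_data: bytes, tx_is_valids: bytes
) -> bytes:
    """Current scheme: hash each serialized segment, then hash the four digests."""
    parts = (tx_bodies, tx_witnesses, tx_aux_data, tx_is_valids)
    return blake2b_256(b"".join(blake2b_256(p) for p in parts))


def proposed_body_hash(block_body: bytes) -> bytes:
    """Proposed scheme: one hash over the serialized block body."""
    return blake2b_256(block_body)


# Placeholder bytes standing in for serialized CBOR segments.
old_hash = segregated_body_hash(b"\x80", b"\x80", b"\xa0", b"\x80")
new_hash = proposed_body_hash(b"\x82\x80\x80")
```

The proposed variant needs no knowledge of the block body's internal layout, so adding further fields to the body later would not change how the hash is computed.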
, transactions : [* transaction]
, invalid_transactions : [* transaction_index]
]
We should make a concrete distinction what the block body is. This will also be useful for Peras and Leios, since we will be adding their respective certificates into the block body:
- , transactions : [* transaction]
- , invalid_transactions : [* transaction_index]
- ]
+ , block_body
+ ]
+ block_body =
+ [ transactions : [* transaction]
+ , invalid_transactions : [* transaction_index]
+ ]
Simplify serialization of transactions in a block
(latest version of proposal)