CPS-0024? | Canonical CBOR Serialization #1109
Conversation
Thank you for drafting a CPS about this. You're right, a lack of canonical encoding has definitely been frustrating for many of us Cardano developers, especially those of us working on alternative node implementations (I can speak primarily to my experience working on Amaru). I have only done a quick skim of the content so I won't comment on the particulars of the CPS right now, but hopefully I will have more time later to sit down and reread it carefully. But, to give some context: Sundae Labs has started work on a conformance testing suite, and Tweag has received treasury funding to work on exactly that actually, and they have a draft PR open already: #1083 (cc: @nc6 and @qnikst)
rphair left a comment
@HinsonSIDAN now that this is marked "Ready for Review" we'll add it as Triage which puts it on the agenda for our next CIP meeting (https://hackmd.io/@cip-editors/123).
In the meantime I think it'll be useful to get some co-review with this current CIP candidate (cc @qnikst @lehins @nc6 @Ryun1 @Crypto2099).
> - Multi-signature workflows where each signer's wallet may re-serialize the transaction
> - Cross-tool transaction building where fee calculations depend on exact byte size
Suggested change:
> - Hardware wallets, which require the keys in every map to be sorted from lowest value to highest.

this is one I've encountered, which was frustrating to figure out
@Ryun1 I did not put hardware wallets in since there is an active standard on it - https://cips.cardano.org/cip/CIP-0021. Perhaps you are referring to this one
The biggest issue with ordering is that Haskell (or any other programming language) ordering is not guaranteed to always match the CBOR ordering. In other words, I suspect enforcing CBOR ordering would make the Ledger implementation more complicated and error prone.
For example, what is the ordering for a key that is a tuple? Which is bigger, -1 or 10? According to CBOR ordering it would be -1, while in Haskell it would be 10.
Not saying it is impossible, but it is something that needs to be taken into consideration.
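To make the mismatch concrete, here is a small sketch in Python using the cbor2 library (an illustrative choice, not anything this thread prescribes): deterministic CBOR orders map keys by the bytes of their encodings, not by numeric value.

```python
# Minimal sketch using cbor2 (pip install cbor2); illustrative only.
import cbor2

print(cbor2.dumps(10).hex())  # '0a' -- a single byte 0x0a
print(cbor2.dumps(-1).hex())  # '20' -- a single byte 0x20

# Byte-wise, 0x0a < 0x20, so 10 sorts *before* -1 under CBOR's
# encoding-based key order, even though -1 < 10 numerically (and
# under Haskell's Ord instance for Integer).
print(sorted([10, -1], key=cbor2.dumps))  # [10, -1]
```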
@HinsonSIDAN we're leaving this Unconfirmed for now due to lack of current agreement & statements about why such serialisation would be considered "dangerous" (as per #1109 (comment)) even though the node itself is doing this all the time (cc @fallen-icarus @Crypto2099).
@colll78 we collectively recalled that you've posted some things about this issue on X... can you give us the gist of how you stand on the issue & any other points that the community has made about it?
@rphair I am drafting this CPS mostly due to the common perspective I gathered at the 2025 Vietnam Builder Fest and the 2025 UPLC conference in Edinburgh. I believe it is a pain for the vast majority of builders. I am happy to keep this CPS unconfirmed if that's not too painful an issue for people interested in adding more comments. One more comment / observation: it seems like this issue has bugged so many Cardano developers from junior to mid level that it scares away many potential talents from our community. However, by the time we get more senior in Cardano development, we are usually experienced enough to figure out workarounds for issues caused by this problem. We at SIDAN Lab & DeltaDeFi built some stupidly complicated dev tools for it, which we shared at the 2025 Edinburgh UPLC conference.
Eventually, no one got interested enough to solve it for others. Sorry that I cannot squeeze in time to attend this week's CIP meeting; I will try to share more at the next editors' meeting. Appreciate the moderation!
@HinsonSIDAN since you've already volunteered to attend the next meeting (https://hackmd.io/@cip-editors/124) as per #1103 (comment), I've also added this one to the agenda: keep in mind that this agenda will be packed due to requests to recap the Leios status, but at least we should have time for you to review the relationships above & reassure the technically oriented editors & observers about the "safety" of CBOR conversion.
Hello, sorry for jumping late to the party. From the perspective of the team that works on canonical ledger state serialisation, supporting a canonical CBOR standard is a very good initiative that we will strongly support and are ready to participate in. This CPS and CIP-0165 do overlap, or to be more concrete, CIP-0165 can use the results of this initiative whatever they are. In CIP-0165 we want to use a canonical representation; however, as CIP-0165 does not put any restrictions on the ledger and on-wire data, we just used what the current CBOR RFC defines, with small restrictions. In case the community aligns on a concrete implementation of CBOR serialisation, we will join the efforts, use that in the reference implementation, and fix the rules in the CIP.

From my personal perspective, in addition to defining canonical and deterministic serialisation rules, it would be nice to agree in the document on usage patterns and tags, like tag 258 for sets. This way it would be possible to rely more on CDDL files and validation tools, which at the moment can't enforce properties like "values in the set should be ordered". The discussion about safety and concrete examples should be held, especially because there is state of the art in the CBOR RFCs about how to define deterministic and canonical CBOR.
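For concreteness, here is what the set tag mentioned above looks like on the wire, sketched in Python with the cbor2 library (an illustrative choice): CBOR tag 258 marks an array as a set, and a deterministic profile can additionally require its elements to be ordered.

```python
# Minimal sketch: an ordered array wrapped in CBOR tag 258 (set).
import cbor2

tagged_set = cbor2.CBORTag(258, sorted([3, 1, 2]))
print(cbor2.dumps(tagged_set).hex())  # 'd9010283010203'
# d9 0102 -> tag 258; 83 01 02 03 -> array [1, 2, 3]
```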
My opinion is:
Any argument that enforcing canonical CBOR is insecure is moot, because every dapp, wallet, and application on Cardano must enforce it anyway for hardware wallet compatibility, and simply deciding to ignore hardware wallet compatibility is not an option for a product. As for why CBOR is bad: the size of data serialized with CBOR is double the size of data serialized with protobuf, while the deserialization time differs by only 1 ms. That is not even close to a reasonable tradeoff. The size of serialized data is perhaps the single largest factor in blockchain scalability. This data means that if everything was encoded with protobuf, the maximum throughput would be 2x whatever it is now. Even in a future when Leios exists this doesn't cease to matter; in fact, as throughput goes up this matters even more, because higher throughput means waste per message becomes much more important (because we have more messages).
Please don't make it custom; make it the existing canonical CBOR standard that we've all been forced to use already 😅
@HinsonSIDAN @fallen-icarus @colll78 @qnikst @Quantumplation editors would be ready to consider the candidacy of this CIP again, especially according to the lesser-of-two-evils explanation in #1109 (comment). If there's anything else we should be considering, please come to the CIP meeting tomorrow, if possible, and we'll get everything on record in any case... and @colll78 if there's any other suggestion of protobuf around CIPs going forward, I'll connect it back here.
I'm not sure that the two evils are as simple as stated. Yes, CBOR is not the most efficient protocol in terms of size and encoding/decoding speed, at least because it's schema-less and has to keep type tags. On the other hand, it does allow introspectability and allows defining global types and rules for them (read: tags). However, a schema alone is not enough to bring canonicity with either CBOR or protobuf, as we should be careful about how we store the data inside the fields and whether it is canonical. For example, reading the protobuf documentation I find the following property for the map type: the wire-format ordering of map entries is undefined, so logically identical maps may serialize differently.

In addition, there are other sources of protobuf criticism (e.g. https://reasonablypolymorphic.com/blog/protos-are-wrong/); whether one agrees with all the points or not, choosing protobuf may not be an easy choice that just works (tm) in our scenarios. What we need here is a serialisation protocol that is consistent, canonical, able to express the required data structures, and efficient enough. And CBOR is not a bad choice in this setting :)

As for defining our own CBOR, I'm not sure about the phrasing here, as canonical/deterministic CBOR definitions already exist in the RFCs. So likely we should carefully check whether they meet our requirements, agree on contradicting points (e.g. one RFC proposes length-ordered map keys and another value-based ordering), possibly set additional restrictions (e.g. whether we allow indefinite structures or not), and verify that all our choices are safe. As a result it may happen that the resulting rules can be fully based on an existing RFC (so it's not a new CBOR). In addition, we can define extra rules, like always using the set tag for sets, in a way that does not change the encoding and stays compatible with libraries implementing the existing RFCs.
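The "contradicting points" between the RFCs are easy to see concretely. A sketch in Python (cbor2 used only to produce key encodings; an illustrative choice): RFC 7049 §3.9 canonical form sorts map keys by encoding length first, then bytes, while RFC 8949 §4.2.1 core deterministic encoding sorts by bytes alone, and the two disagree.

```python
# Minimal sketch of the two map-key orderings the RFCs define.
import cbor2

keys = [100, -1]
enc = {k: cbor2.dumps(k) for k in keys}  # 100 -> 0x1864 (2 bytes), -1 -> 0x20 (1 byte)

rfc7049 = sorted(keys, key=lambda k: (len(enc[k]), enc[k]))  # length first, then bytes
rfc8949 = sorted(keys, key=lambda k: enc[k])                 # bytes only
print(rfc7049)  # [-1, 100]
print(rfc8949)  # [100, -1]
```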
An effort towards this has already been made via CIPs: CIP-0114 | CBOR Tags Registry
It will be harder for sure; my point wasn't that this is a good option now, but that it's something we should have done from the start (in which case we would already have made progress in creating a canonical representation by now). As I said, at this point CBOR is far too ingrained in our ecosystem and all downstream tooling for us to even really consider this approach. Why would this approach be the best option if we could start from scratch? The proof can be seen by observing other high performance blockchains. Solana uses a custom canonical binary wire format built from its Rust structs + shortvec varints; Aptos uses Binary Canonical Serialization (BCS), which, as implied by its name, is a custom canonical binary serialization format. Sui also uses BCS. Near uses Binary Object Representation Serializer for Hashing (Borsh), a canonical, non-self-describing binary format. Cosmos and all Cosmos SDK chains use protobuf. Algorand uses canonical MsgPack. There is a consistent trend amongst all of these high performance blockchains: they all standardised on deterministic, non-self-describing binary serialization formats.

The benefit of having the serialization format be self-describing is relatively useless given the abundance of modern techniques for circumventing the need for introspectability entirely (these techniques were mostly pioneered by other blockchains explicitly to solve this problem). For example, even though Aptos' BCS requires the data type of a serialized value to be enforced by the application, this requirement is easily fulfilled by using unique hash seeds for each data type (a technique they inherited from the Diem blockchain; see the sketch after this comment). Each of the high performance chains listed above employs its own techniques to circumvent any need for self-describing properties in its serialization format. So my conclusion is that either every other high performance chain has made the wrong choice, or we have.

Personally, I believe that when every other blockchain on the market since 2017 is using non-self-describing serialization formats, and we are the only ones using a self-describing format, then it's more likely than not that we are the ones who made the non-optimal choice.
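The "unique hash seeds" technique mentioned above can be sketched as plain domain separation; the names and seed prefix below are hypothetical, for illustration only, and are not Aptos' actual constants.

```python
# Hypothetical sketch of per-type hash seeds (domain separation).
import hashlib

def type_seed(type_name: str) -> bytes:
    # Seed derived from the type's name; the "EXAMPLE::" prefix is made up.
    return hashlib.sha3_256(b"EXAMPLE::" + type_name.encode()).digest()

def signing_bytes(type_name: str, payload: bytes) -> bytes:
    # Byte-identical payloads of different types never produce the same
    # signing input, so the wire format itself needs no type tags.
    return type_seed(type_name) + payload

assert signing_bytes("RawTransaction", b"\x01") != signing_bytes("Multisig", b"\x01")
```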
rphair left a comment
Thanks @colll78 @qnikst for helping us understand the CBOR debate, even if there are some unhappy consequences today that we can't do anything about. My understanding & the general impression from yesterday's CIP meeting is that this statement of what to do about contemporary treatment of CBOR is still well-presented and usable in this CPS draft itself.
After further review at the meeting we've decided to progress this as a CPS candidate — @HinsonSIDAN please rename this directory to CPS-0024. 🎉
Some requests for expert review were emphasised at the meeting to try to keep this proposal concrete and usable (i.e. something that we can feel good about merging, not just a "wish list"):
- @qnikst sounded happy to review this as a complete document & the editors would look forward to ensuring all statements & recommendations are in order... and especially that the CPS is written so it can lead to CIPs that satisfy it. @colll78 whatever you can provide as review would be equally appreciated for the same reasons.
- @lehins though this is not about any structure in the Ledger, we did think it would be nice to have your opinion about anything the CPS should contain that would make resulting CIPs more interoperable with the Ledger, and with each other, where CBOR is concerned... if there's someone else you'd prefer we tag about this particular issue, please feel free to pass it on.
lehins left a comment
I am very much against canonical CBOR being enforced at the Ledger level, and in general. Here is my reasoning:
- It drastically increases complexity and opens up possibilities for new bugs to be introduced during such a transition.
- The non-canonical CBOR deserialization that we have today will have to be supported indefinitely anyway due to the on-chain history.
- It is impossible to fully switch to canonical CBOR for on-chain data because of Plutus data. There are likely transaction outputs locked on chain today requiring data that is expected to have a non-canonical form, i.e. switching to canonical CBOR would lock those funds forever.
- It does not solve any issue, since the proper solution is correct tooling support of the existing standard.
- It promotes re-serialization of transactions that ought to be immutable for the purpose of signing. Re-serialization for the purpose of signing is just a bad idea in general.

This is my professional opinion, which means that if the Cardano community as a whole is willing to sacrifice safety to make Cardano more user friendly, then the Ledger team will have no choice but to implement it regardless of my opinion.
Disclaimer: I am not a big fan of the CBOR standard in general, and if this discussion had happened at the design stage of the initial version of Cardano, my opinion might have been different, or at least I could have been persuaded to change it much more easily. Today, however, I very much believe this ship has sailed, and it would be much safer to support full CBOR instead of enforcing a drastic change like this.
> **Transaction Hash Instability**: When a transaction is passed between tools or wallets for signing, each may re-serialize it differently. Since transaction hashes are computed over CBOR bytes, logically identical transactions produce different hashes. This breaks:
> - Multi-signature workflows where each signer's wallet may re-serialize the transaction
I see this as a problem in this CPS, since re-serialization should never happen for the purpose of signing, and if it does, it is because the tooling is doing it incorrectly. I would consider it a bug in the software regardless of whether canonical or non-canonical serialization is used.
Whenever a payload is given to a program for the purpose of signing, that payload should not be messed with. The same applies to transactions: if all you need is to sign the transaction, whether for multi-sig or any other purpose, the transaction hash and serialization should not be recomputed; the original bytes must be retained, and that is what needs to be signed.
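To make the quoted instability concrete, a sketch in Python with cbor2 (an illustrative library choice) and blake2b-256, the hash Cardano uses for transaction IDs: the same logical map encodes to different bytes depending on key order, and only a canonical profile collapses them.

```python
# Minimal sketch: one logical map, two valid encodings, two hashes.
import cbor2, hashlib

tx_a = cbor2.dumps({0: b"body", 1: b"wit"})  # keys in one insertion order
tx_b = cbor2.dumps({1: b"wit", 0: b"body"})  # same map, other order
print(tx_a.hex() == tx_b.hex())              # False -- different bytes

h = lambda bs: hashlib.blake2b(bs, digest_size=32).hexdigest()
print(h(tx_a) == h(tx_b))                    # False -- different "tx ids"

# A canonical profile removes the ambiguity:
print(cbor2.dumps({0: b"body", 1: b"wit"}, canonical=True) ==
      cbor2.dumps({1: b"wit", 0: b"body"}, canonical=True))  # True
```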
> - Cross-tool transaction building where fee calculations depend on exact byte size
I don't quite follow this one. If there is any change to a transaction body, then the bytes might change, which could affect the fee. That is quite normal and expected.
If you mean that some tooling might want to change some part of the transaction body in a way that should not itself change the size if canonical CBOR were used (e.g. change a required signer, i.e. swap one hash for another), then I can see it as an argument, but I don't understand why such tooling couldn't just recompute the minimum fee?
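For reference, recomputing the minimum fee is a one-liner under the linear fee rule; the parameter values below are assumed to be the mainnet minFeeA/minFeeB at the time of writing.

```python
# Sketch of the linear min-fee rule: fee = a * size + b (in lovelace).
def min_fee(tx_size_bytes: int, a: int = 44, b: int = 155_381) -> int:
    return a * tx_size_bytes + b

print(min_fee(300))  # 168581
print(min_fee(305))  # 168801 -- 5 extra bytes cost 220 lovelace
```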
> **Script Inconsistencies**: Smart contracts suffer from unpredictable script hashes and reference script mismatches across tools. The same compiled script may produce different hashes depending on the library used to apply parameters or CBOR-serialize the script.
I don't believe compiled scripts actually use CBOR serialization. @zliu41 will have a definitive answer, but from what I know, Plutus uses the flat library for serialization.
This is correct. They're wrapped in a script type defined in the CDDL within a transaction, but the compiled scripts themselves are just flat encoded bytes.
Not sure how to describe it more precisely; the issue we faced is that different uplc libraries in different languages behave differently, i.e. the Aiken uplc crate in Rust and the HLabs uplc npm package are implemented differently, such that in Mesh users get different script cbor when using different cores. It is a massive devexp issue.
> users get different script cbor if using different cores.

@HinsonSIDAN plutus core is not using CBOR for serialization. So, although this issue looks related, it has nothing to do with CBOR! It has to do with the fact that there is no standard for plutus core serialization at all.
In other words, if you'd like to use canonical or non-canonical CBOR for Plutus core serialization, you could create a separate CIP for it, but I suspect that there will be some pushback there as well, since the custom serialization that is currently in use is likely more efficient than CBOR.
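One common way this confusion shows up in practice, sketched below with placeholder bytes standing in for a real flat-encoded program: the flat bytes get wrapped in a CBOR byte string, and often wrapped again in envelope formats, so tools that disagree on the number of wrappings print different "script cbor" for the same program.

```python
# Sketch of single vs "double" CBOR wrapping of flat-encoded script bytes.
import cbor2

flat_program = bytes.fromhex("0100")  # placeholder, not a real UPLC program
single = cbor2.dumps(flat_program)    # flat bytes as a CBOR byte string
double = cbor2.dumps(single)          # that byte string wrapped once more
print(single.hex())  # '420100'
print(double.hex())  # '43420100' -- different hex for the "same" script
```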
> **Script Hash Consistency**: A developer publishes a reference script on-chain, then references it from their off-chain code. Currently, locally computed script hashes may not match the on-chain version due to encoding differences. Canonical serialization guarantees hash consistency across compilation and deployment pipelines.
> **Library Maintainers**: Serialization library authors currently must support multiple encoding strategies for compatibility. With a standard, they can focus on a single canonical implementation, reducing maintenance burden and improving deserialization reliability.
Again, serialization libraries will have to support both non-canonical CBOR for historical data and canonical CBOR for new data. So, IMHO, serialization library authors will be impacted negatively by this CIP.
> ### Optional Goals
> 4. **Ledger-level enforcement**: If community consensus supports it, implement validation rules in the ledger to guarantee compliance (requires hardfork and backward compatibility strategy).
Deserialization in Ledger is not part of the Ledger rules. It is a totally separate stage compared to transaction validation.
Suggested change:
> 4. **Ledger-level enforcement**: If community consensus supports it, canonical deserialization must be correctly implemented in the ledger to guarantee compliance (requires hardfork, backward compatibility, and forward migration strategies).
> This CPS is successfully resolved when:
> - A canonical CBOR serialization CIP reaches "Active" status with clear specifications
> - At least 80% of major libraries and wallets demonstrate compliance
Is there a way to quantify major libraries today?
Also, there is no mention of the Ledger actually having it implemented:

Suggested change:
> - At least 80% of major libraries and wallets demonstrate compliance
> - cardano-node has a hard fork ready for a new Ledger era that changes its deserializers to canonical CBOR.
Yes, indeed this CPS is not intended to affect the ledger.
Well, there is a section about it:

> Should the standard be enforced at the ledger level?

As I already pointed out, I would not want this CPS to affect the Ledger either. I am a bit skeptical about standards that aren't enforced by the chain itself, but there are standards like these that have proved themselves to work. So, at the very least, I would suggest adding all the drawbacks that I've mentioned to the section that suggests that this standard "should be enforced at the ledger level".
> When multiple valid CBOR encodings exist, how should we decide which becomes canonical?
> - **Efficiency**: Minimize transaction size (e.g., smallest integer encoding, definite over indefinite length)
Definite- and indefinite-length encodings are each more efficient depending on the number of elements. When there are fewer than 24 elements in an array, definite-length encoding is more efficient, while a large element count benefits from indefinite-length encoding.
So, canonical form will make some encodings less efficient.
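A sketch of the framing arithmetic behind this, assuming RFC 8949 header sizes: a definite-length array spends 1 byte of header for counts up to 23 and grows with the count, while an indefinite-length array always spends 2 bytes (start byte plus break byte).

```python
# Header overhead of definite- vs indefinite-length CBOR arrays.
def definite_header_size(count: int) -> int:
    if count <= 23:          return 1  # count fits in the initial byte
    if count <= 0xFF:        return 2  # initial byte + 1-byte count
    if count <= 0xFFFF:      return 3  # initial byte + 2-byte count
    if count <= 0xFFFFFFFF:  return 5  # initial byte + 4-byte count
    return 9                           # initial byte + 8-byte count

INDEFINITE_OVERHEAD = 2  # 0x9f start marker + 0xff break marker

for n in (10, 100, 100_000):
    print(n, definite_header_size(n), INDEFINITE_OVERHEAD)
# 10 -> definite wins (1 vs 2); 100 -> tie (2 vs 2); 100_000 -> indefinite wins (5 vs 2)
```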
> ## Open Questions
> ### What are the guiding principles for choosing the canonical form?
If a custom canonical CBOR standard is to be designed, then there is a much higher chance that it will not be implemented correctly by all of the tools. Grabbing an existing standard and accepting any potential drawbacks it may have (e.g. non-optimal size) would be a safer bet IMHO.
rphair left a comment
@lehins I appreciate your detailed review and your providing a realistically complete look at the problem. I trust @qnikst / Tweag will also respond about their own considerations of all these issues.
Even if Ledger considerations prevent further progress, I don't regret the editors' assignment of a number and considering this a CPS "candidate" — since the standardisation problem will be discussed in the dev community no matter what, and this review thread should avoid fragmenting that discussion & might produce some acceptable resolutions here.
In any case @HinsonSIDAN the document would need a lot more detail & depth to reflect the considerations above. I don't think it would be proper to rely on CIPs to address these concerns: otherwise we would get a dispersion of approaches, instead of the unified response we would need to produce particular CIPs from (perhaps by category: Ledger, Tools, Wallets).
We should also be prepared for the possibility that this CPS may remain unmerged if the Ledger and community developer points of view can't be reconciled.
@lehins I appreciate your review a lot! I agree with most of your concerns if we see the canonical form as built into the ledger. Originally I created this CPS with multiple categories since it could be built / supported in different parts of the community - #1109 (comment). If we think of the standard as applied only at the level of tools, we can still solve the devexp issue without the need to change any ledger implementation. And the list, quantified as, for example, the most active transaction building libraries and their dependencies, is not too difficult to obtain for any CIP candidate either, such as:
- Mesh
- whisky
- Aiken uplc
- csl / pallas
- cardano-sdk-js
- HLabs uplc
- Lucid evo
- pycardano
- apollo
- CSL
- CML
- etc ...

So eventually we can have a map of libraries that are actively following such a standard, at least by default, so it becomes safe for 99% of DApp builders to never care about CBOR again. If the standard is built at the tools level only, I think some of the concerns above become irrelevant. Let me know how it feels to see this CPS from this perspective, appreciate that!
@rphair With respect to the canonical ledger state, it is a feature that is totally new, which makes such a decision much easier. Because of that, and because @qnikst is working on a canonical ledger state, I believe using canonical CBOR there would be a great idea.
I 100% agree with you. I've seen this idea of canonical CBOR being thrown around in all sorts of inappropriate places like Twitter and such. This is a much better place to have a technical discussion about this.
This CPS emerged from community feedback at events like Cardano Builder Fest 2025, where developers identified CBOR serialization fragmentation as a critical pain point hindering ecosystem maturity.
This CPS addresses the growing interoperability challenges caused by non-deterministic CBOR serialization across Cardano's tooling ecosystem. The same logical transaction or script can be encoded in multiple valid ways, leading to different hashes and breaking multi-signature workflows, cross-tool transaction building, and script reference consistency.
Ideally we can establish a canonical CBOR serialization standard that would be adopted across major libraries and wallets, ensuring predictable behavior and reducing the development friction that currently exists when working across different tools.
(rendered latest document)