-
Notifications
You must be signed in to change notification settings - Fork 206
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
log and log trie multicodec types #223
Conversation
Also changing eth-block to eth-header, as it is actually only the header that is hash referenced by "block"hash and the header in turn contains all the roots to the tx, rct, state (and thereby storage) tries. Also changed RLP to MarshalBinary in the description for tx and rct, because after EIP-2718 tx envelopers were introduced the consensus encoding of a tx or rct is only the pure RLP encoding for the legacy types.
The rename of WRT to the rest, I'm not clear on the relationship between the receipt and the log entries in ETH—by placing a link to the log entries in the receipt you're suggesting that this can be derived from some binary that I give you that represents the "receipt", but I'm not sure that's the case, is it? The log entries are separate to the receipt, so you don't have a two-way relationship such that you could round-trip a "receipt" through IPLD and back again, either way, and this log trie root CID is intact? In Maybe this is another one of those places where a separate utility type that exists outside of the chain would be a more appropriate choice rather than reading something into the chain that's not there? |
Depends on who's using this table? This won't cause a problem in go, but yeah, I'd prefer to leave it as-is unless there's a good reason to change it. |
@i-norden best leave the renaming out, but changing the note at the end to refer to the "header" is a good idea to be clear. |
Will do!
That's correct, our code needs the logs in order to derive the CID (we create a the log trie out of them, and derive the CID from the root hash of the trie). We are only placing the log trie CID link in the IPLD node representation of the receipt, we are not changing the binary encoding of a receipt IPLD- that remains the Etheruem consensus encoding of a Receipt. So when we decode from binary we use the logs to generate the CID and when we encode back to binary we don't encode the CID, directly, but we encode those logs which can be used to generate it again. The CID itself is not persistent but the data used to generate it in a deterministic way is. This probably doesn't qualify as a proper round trip, but with the consensus encoding of a receipt we have everything we need to go both ways. Considering how etheruem CID links are normally "encoded" it doesn't seem like too much of a stretch to me. We never actually persist an entire CID into the IPLD block binary- we instead have encoded a keccak256 hash that we use to derive, per the codec, a CID link when we materialize the IPLD node from its binary. Our use of map representation for these IPLD nodes when the binary is actually a struct/tuple encoding is another sort of materialization transformation like this. Although, this transformation is more complex and entirely specific to this type.
Yeah I think that makes sense, although we would really like to be able to use IPLD selectors on individual logs. And, as I see it, we aren't so much reading something into the chain that's not there but rather applying a transformation on what already exists. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately I think the use of this is pushing some boundaries of IPLD's data model <> encoding concepts to the limits, or maybe beyond limits, although it could very well be that this simply highlights limitations of what we're trying to circumscribe IPLD and maybe we need to be more flexible when it comes to existing formats that play fast and lose with content addressing for "linking"—or maybe more accurately, don't use hashing for "linking" in the sense that we want to be linking, since these blockchain formats really just want to use hashing for merkle inclusion but we want to use it for linking and navigating.
Either way though, I don't think these details preclude inclusion in the multicodec table of something that is a thing in the wild and can be represented in IPLD at some level.
Thanks @rvagg! That makes sense, and I feel a bit apprehensive about making this change to the Receipt IPLD codec implementation myself because generating the LogTrie and calculating its root every time we materialize a Receipt ipld.Node from the Receipt binary will have some significant overhead if there are a lot of logs in the receipt. I can do some benchmarking to examine this further. Definitely agree it makes sense to merge these Log and LogTrie multicodecs, we can represent these objects as IPLD whether or not (or how) we decide to link to them from a Receipt. Perhaps we could introduce a second Receipt multicodec type and corresponding codec implementation where we perform this LogTrie materialization and linking and leave the original Receipt codec as is? Is there any precedent for that, having two different multicodec types for the same binary data? |
The main precedent is #207 has ongoing discussion about doubling-up but there's a lot of resistance to using the codec code in a CID as anything more than a "this is how to decode the binary data" signal. It continues to be tempting to want to use it to signal more than that! I think it argues for an extension of the CID format to include additional signalling options, codec should probably be as limited as possible. There is another almost-precedent which perhaps has some relationship here— I don't know if that helps at all, but the general rule is something like: codec code in a CID should only be used to signal the decode (and encode) algorithm used to turn the block's bytes into data model form. But there's some flexibility around the edges to deal with the messy real world. |
In our IPLD representation of the Ethereum Merkle data structure, we would like to compose the logs contained in a receipt into a "log trie" that is linked to from the receipt. We want to Merkleize them in a manner analogous to how transaction and receipt lists canonically are: composing them into a Modified Merkle Patricia Trie (MMPT), whereby the RLP encoded log is the value stored in a leaf node and the path to that leaf node is the RLP encoding of the log's index in the receipt.
We are not changing the binary encoding of a receipt IPLD- that remains the consensus encoding of a receipt- we only want to change the in-memory materialization of that binary data so that a receipt IPLD node links to the root node of a log trie.
Implementations of the new log and log trie codecs and the updated receipt codec can be found here.
The motivation for this is so that
The primary issue here, I think, is that it does add additional overhead when unmarshalling receipt binary into an ipld.Node as we need to materialize the trie to calculate its root hash and generate the link. If need be, we can create a custom codec that does this and leave the normal receipt codec as is.
Some unrelated tweaks I threw in, that might be superfluous:
eth-block
toeth-header
, as it is only the header that is referenced by "block"hash and stored as the IPLD block. The header contains all the links to the uncles, tx, rct, state (and thereby storage) tries but the block body is not a part of the "eth-block" IPLD block.