-
Notifications
You must be signed in to change notification settings - Fork 41
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WIP: Jcb/serde minimal #147
base: master
Are you sure you want to change the base?
WIP: Jcb/serde minimal #147
Conversation
I don't understand that example. The IPLD Data Model doesn't impose an ordering. It depend on the codec you use. You'd also need to use the "DAG" version of those codecs, as the are targeted at generating the same hash, if the input data is the same (as good as possible). The hash is always built from the encoded data, not of the data model representation. So if you go Rust type -> Ipld -> DAG-CBOR -> DAG-JSON -> DAG-CBOR, the DAG-CBOR encoded data should be the same hash. The same if you flip DAG-CBOR and DAG-JSON. If not, then it's a bug.
Yes, using tuples is one way to preserve ordering across encodings if you need that. There is a new library called serde_ipld_dag_cbor, that should work better with Serde. I think you can get the same effect with |
I'm not sure if there's actually a way to go directly from The issue is that Ideally we would like this diagram to commute, but what my snippet shows is that depending on where a Rust type enters this diagram via serde, this might not be the case. Because JSON is order-preserving, and CBOR is strict-ordering, it is possible to cause the ordering of objects to change. I agree that this can't happen if you only convert to/from Ipld (which I think what you mean in your example). My preference would be to have the default serde Serializer use tuples by default. If we have to use the Furthermore, the field labels are not actually relevant to content-addressing, per se. And I think it's strange that e.g. changing the name of a field in a Rust type, would cause its hash to change. There ought to be a mechanism for providing this kind of schema or type information out-of-band, like through the CID codec field. |
Currently not. But hopefully we will also have a Serde based DAG-JSON coded, which then enables direct DAG-JSON <-> DAG-CBOR conversion.
DAG-JSON also has a specific key order, it just happens to be the same as the Rust's
Rust structs align well with IPLD Schema structs. There the default representation is a map, but you can chose a different representation, which is exactly what you are looking for. In your use case you don't care about the field names of the struct. I would try to solve this on the Serde level.
It depends on your application. If you want self-describing data without the need of an external schema, then you need the field names, else you wouldn't know what the data refers to. |
From the spec:
Okay, I think this results in the same ordering as DAG-CBOR (which is based on https://datatracker.ietf.org/doc/html/rfc7049#section-3.9), because of the restriction of DAG-CBOR maps to String keys (which iirc are restricted in CBOR to always be UTF-8). Doing DAG-JSON <-> DAG-CBOR then would mostly solve the ordering problem I raised, although I still think there's still a risk of end-users lowering their data to ordinary JSON. At a high level, it seems reasonable to me that we would want to support multiple representation strategies in IPLD.
It makes sense to me that the libipld serde serialization should support multiple representations. And doing that at the serde level makes sense, since then serializing via the I'm fine with having something like https://github.com/kardeiz/serde_tuple, where you use a different derive macro to indicate the representation you want:
But I think maybe it's worth including those derive macros in |
Sadly not. Quote from RFC-7049:
So you need to compare the lengths first. This was fixed in RFC-8949, but sadly we are stuck with the old RFC for DAG-CBOR.
Yes, similar to the macros libipld currently has for the non Serde based version. |
One more thought on this one. The same argument applies if you see a struct as a key-value map. Then it would be strange that re-ordering the fields (without changing their names) would change the hash. |
This is true, but I think this should be very explicitly a choice on the part of the user, since there are many cases where you might want to have named fields for developer ergonomics, but not to change the serde serialization. I've been looking at It might be possible to address this using more complicated derive macros, but I think my approach in this PR, of just defining an entirely new |
I haven't had a chance yet to look into the A problem I see with a feature flag is, that you then cannot selectively select one or the other, but all structs of your project will then be one way or the other. Though I'm generally open to just try things out and then break APIs whenever needed. |
Currently we serialize structs and enums as maps with the field names included in the data:
becomes
{"age": 52, "hobbies": ["geography", "programming"], "is_cool": true, "link": bafyreibvjvcv745gig4mvqs4hctx4zfkono4rjejm2ta6gtyzkqxfjeily, "name": "Hello World!"}
Which sorts the keys via lexicographic order of the strings, due to an interior use of
BTreeMap
. Order preserving codecs likedag-json
will preserve this order, whereas codecs likedag-cbor
will impose their own strict order. In this snippet https://gist.github.com/johnchandlerburnham/efcce4e52f820d10d5fcc96ef7b8cce6 we can see an example where by converting a Rust type to Ipld, then to cbor, then to json, we can obtain a different ordering of the fields, and thus a different hash.Instead, we can encode structs as tuples by default (similar to https://github.com/kardeiz/serde_tuple) and enums variants according to the order of their source position.
This has the further benefit of allowing the unit type
()
and the unit struct to be embedded as Ipld.Additionally, this PR alters the embedding of Rust map-like collections to use
Ipld::List
rather thanIpld::Map
. This allows maps with non-string keys to be embedded.By removing all occurrences of
Ipld::Map
from the default serde serialization, we can then solve #142 by using maps in the Serde data model, as a dynamic hint to indicate aCid
, similar to thedag-json
{"/": "bafy..."}
technique.