@setrofim

This pull request implements the caching of unknown extension values when unmarshalling CoRIMs. Cached values are then incorporated when re-marshalling. This ensures that a CoRIM is preserved in its entirety across re-marshalling, even when it contains extension values that do not correspond to fields in registered extension structs inside the unmarshal target.

This is necessary to support future migration of Veraison services to use corim-store, which will require lossless handling of CoRIMs without awareness of scheme-specific extensions.

Additionally, this ensures compliance with CBOR Common Deterministic Encoding (CDE) when marshalling to CBOR.

@thomas-fossati left a comment

looks good to me

Allow caching of unknown fields during PopulateStructFrom(CBOR|JSON).
The cache is then used during SerializeStructTo(CBOR|JSON). This ensures
that any map entries in the source data that do not correspond to target
struct fields are preserved across a decode/encode cycle.

This feature is enabled by adding to a struct a field tagged with
`field-cache:""`. This field must be of type map[string]any. When
PopulateStructFrom* functions encounter input entries that do not
correspond to a field in the target struct, they will be added to the
field-cache map instead. Analogously, when SerializeStructTo* functions
see a field-cache map, they will add its entries to the output.
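The populate/serialize flow described above can be illustrated with a minimal, stdlib-only Go sketch. It is not the library's implementation: `populateStructFromJSON` and `serializeStructToJSON` are hypothetical stand-ins for PopulateStructFromJSON and SerializeStructToJSON, the `Entity` type is invented for illustration, only the JSON side is shown, and all non-cache fields are assumed to be exported.

```go
package main

import (
	"encoding/json"
	"fmt"
	"reflect"
	"strings"
)

// jsonName returns the JSON key for a struct field, or "" if the field is
// skipped via `json:"-"`.
func jsonName(f reflect.StructField) string {
	name := strings.Split(f.Tag.Get("json"), ",")[0]
	switch name {
	case "-":
		return ""
	case "":
		return f.Name
	}
	return name
}

// isCache reports whether the field carries the `field-cache:""` tag.
func isCache(f reflect.StructField) bool {
	_, ok := f.Tag.Lookup("field-cache")
	return ok
}

// populateStructFromJSON (hypothetical) decodes a JSON object into the struct
// pointed to by dst. Entries with no matching field are stored in the
// field-cache map instead of being dropped.
func populateStructFromJSON(data []byte, dst any) error {
	var raw map[string]json.RawMessage
	if err := json.Unmarshal(data, &raw); err != nil {
		return err
	}
	v := reflect.ValueOf(dst).Elem()
	cacheIdx := -1
	for i := 0; i < v.NumField(); i++ {
		f := v.Type().Field(i)
		if isCache(f) {
			cacheIdx = i
			continue
		}
		name := jsonName(f)
		if msg, ok := raw[name]; ok && name != "" {
			if err := json.Unmarshal(msg, v.Field(i).Addr().Interface()); err != nil {
				return err
			}
			delete(raw, name) // consumed by a known field
		}
	}
	if cacheIdx < 0 || len(raw) == 0 {
		return nil // no cache field, or nothing left over
	}
	cache := make(map[string]any, len(raw))
	for k, msg := range raw {
		var val any
		if err := json.Unmarshal(msg, &val); err != nil {
			return err
		}
		cache[k] = val
	}
	v.Field(cacheIdx).Set(reflect.ValueOf(cache))
	return nil
}

// serializeStructToJSON (hypothetical) encodes known fields and merges in
// the entries of the field-cache map.
func serializeStructToJSON(src any) ([]byte, error) {
	v := reflect.Indirect(reflect.ValueOf(src))
	out := map[string]any{}
	for i := 0; i < v.NumField(); i++ {
		f := v.Type().Field(i)
		if isCache(f) {
			for k, val := range v.Field(i).Interface().(map[string]any) {
				out[k] = val
			}
		} else if name := jsonName(f); name != "" {
			out[name] = v.Field(i).Interface()
		}
	}
	return json.Marshal(out)
}

type Entity struct {
	Name  string         `json:"0"`
	Roles []int          `json:"1"`
	Cache map[string]any `field-cache:"" cbor:"-" json:"-"`
}

func main() {
	in := []byte(`{"0":"ACME","1":[1,2],"-1":"vendor-specific"}`)
	var e Entity
	if err := populateStructFromJSON(in, &e); err != nil {
		panic(err)
	}
	fmt.Println(e.Cache) // map[-1:vendor-specific]
	out, err := serializeStructToJSON(e)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out)) // {"-1":"vendor-specific","0":"ACME","1":[1,2]}
}
```

Because the cache is a plain map[string]any, the sketch relies on encoding/json sorting map keys to keep re-serialized output stable.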

This scheme has some (hopefully obvious) limitations:

- field-cache field's tag must also contain `cbor:"-" json:"-"` to make
  sure that the field itself will be ignored by serializers.
- if a struct is unmarshalled from JSON, any unknown field names
  must be "stringified" integers; otherwise, it is impossible to obtain
  the corresponding CBOR code point mapping for the name.
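The second limitation can be made concrete: an unknown JSON name can only be carried back to CBOR if it parses as an integer code point. A minimal sketch, where `codePointFromJSONName` is a hypothetical helper rather than part of the library:

```go
package main

import (
	"fmt"
	"strconv"
)

// codePointFromJSONName (hypothetical) recovers the CBOR integer map key for
// an unknown JSON field name. Only "stringified" integers can be mapped back;
// any other name has no CBOR code point and is rejected.
func codePointFromJSONName(name string) (int64, error) {
	cp, err := strconv.ParseInt(name, 10, 64)
	if err != nil {
		return 0, fmt.Errorf("cannot map JSON field %q to a CBOR code point: %w", name, err)
	}
	return cp, nil
}

func main() {
	for _, name := range []string{"-1", "42", "vendor-name"} {
		cp, err := codePointFromJSONName(name)
		fmt.Println(name, cp, err)
	}
}
```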

Signed-off-by: Sergei Trofimov <[email protected]>

Up to this point, struct encoding was deterministic only in the sense
that fields were encoded in the order in which they appeared in the struct.

This fix ensures that fields are encoded in the bytewise lexicographic
order of their encoded keys, as required by the CDE spec:

    https://www.ietf.org/archive/id/draft-ietf-cbor-cde-13.html#name-the-lexicographic-map-sorti

(Note: we typically define fields in the order of their code points, so
in practice, we have been _mostly_ compliant with CDE; but this was not
guaranteed, and deviations were possible, especially when extensions are
involved.)
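As a worked example of why code-point declaration order is only mostly compliant: CDE sorts map keys by the bytewise lexicographic order of their shortest-form CBOR encodings. Non-negative integers use major type 0 (initial byte 0x00 to 0x1b) and negative integers use major type 1 (0x20 to 0x3b), so every negative key sorts after every non-negative one, and -1 (0x20) sorts before -2 (0x21). The Go sketch below encodes integer keys by hand; `encodeInt` and `cdeSortKeys` are illustrative names, not library API.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"sort"
)

// encodeInt returns the preferred (shortest) CBOR encoding of an integer,
// which is the form CDE requires for map keys.
func encodeInt(n int64) []byte {
	major := byte(0x00) // major type 0: unsigned integer
	u := uint64(n)
	if n < 0 {
		major = 0x20 // major type 1: negative integer, encodes -1-n
		u = uint64(-1 - n)
	}
	switch {
	case u < 24:
		return []byte{major | byte(u)}
	case u <= 0xff:
		return []byte{major | 24, byte(u)}
	case u <= 0xffff:
		b := []byte{major | 25, 0, 0}
		binary.BigEndian.PutUint16(b[1:], uint16(u))
		return b
	case u <= 0xffffffff:
		b := []byte{major | 26, 0, 0, 0, 0}
		binary.BigEndian.PutUint32(b[1:], uint32(u))
		return b
	default:
		b := make([]byte, 9)
		b[0] = major | 27
		binary.BigEndian.PutUint64(b[1:], u)
		return b
	}
}

// cdeSortKeys returns the keys ordered by the bytewise comparison of their
// encodings, leaving the input slice untouched.
func cdeSortKeys(keys []int64) []int64 {
	sorted := append([]int64(nil), keys...)
	sort.Slice(sorted, func(i, j int) bool {
		return bytes.Compare(encodeInt(sorted[i]), encodeInt(sorted[j])) < 0
	})
	return sorted
}

func main() {
	// Keys listed in ascending numeric (code point) order.
	keys := []int64{-2, -1, 0, 1, 24, 100}
	fmt.Println(cdeSortKeys(keys)) // [0 1 24 100 -1 -2]
}
```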

Signed-off-by: Sergei Trofimov <[email protected]>
@setrofim force-pushed the setrofim/ext branch 3 times, most recently from d33a654 to 68e1c10, December 12, 2025 13:24

Extensions objects will now cache any extensions they don't recognize
(i.e., those that don't correspond to a field in their registered
IMapValue) when deserializing values.

This means CoRIMs with extensions remain stable when deserialized and
then re-serialized using structs without registered extensions.

Cached extensions are only used during marshalling and are not
returned when calling the Get* methods. However, they can be accessed
directly via the Extensions.Cached field.
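A minimal sketch of the caching behavior described above, under simplified assumptions: the `Extensions` type here is hypothetical, models the registered IMapValue as a set of known JSON names rather than a struct, and uses encoding/json rather than CBOR. `Get` stands in for the library's Get* methods.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Extensions is a hypothetical, simplified stand-in for the library type.
type Extensions struct {
	registered map[string]bool // names known to the registered value
	values     map[string]any  // decoded values for registered names
	Cached     map[string]any  // unrecognized entries, kept for re-marshalling
}

func NewExtensions(names ...string) *Extensions {
	reg := make(map[string]bool, len(names))
	for _, n := range names {
		reg[n] = true
	}
	return &Extensions{registered: reg}
}

func (e *Extensions) UnmarshalJSON(data []byte) error {
	var raw map[string]any
	if err := json.Unmarshal(data, &raw); err != nil {
		return err
	}
	e.values, e.Cached = map[string]any{}, map[string]any{}
	for k, v := range raw {
		if e.registered[k] {
			e.values[k] = v
		} else {
			e.Cached[k] = v // unknown extension: cache it instead of dropping it
		}
	}
	return nil
}

// Get only sees registered extensions; cached values are never returned.
func (e *Extensions) Get(name string) (any, bool) {
	v, ok := e.values[name]
	return v, ok
}

// MarshalJSON merges registered and cached values so nothing is lost.
func (e *Extensions) MarshalJSON() ([]byte, error) {
	out := make(map[string]any, len(e.values)+len(e.Cached))
	for k, v := range e.Cached {
		out[k] = v
	}
	for k, v := range e.values {
		out[k] = v
	}
	return json.Marshal(out)
}

func main() {
	ext := NewExtensions("-1")
	if err := json.Unmarshal([]byte(`{"-1":"known","-2":"unknown"}`), ext); err != nil {
		panic(err)
	}
	_, ok := ext.Get("-2")
	fmt.Println(ok)         // false
	fmt.Println(ext.Cached) // map[-2:unknown]
	out, err := json.Marshal(ext)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out)) // {"-1":"known","-2":"unknown"}
}
```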

When an IMapValue struct is registered, cached values are scanned, and
the new struct is populated with matching cached values, which are then
removed from the cache.

When a new struct is registered while an IMapValue is already present,
the non-zero-value fields of the old IMapValue are stored in the cache.
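Registration can be sketched in the same spirit. `RegisterStruct`, `OldExt` and `NewExt` are hypothetical; the sketch uses reflection over `json` tags and round-trips each cached value through encoding/json to convert it (for instance, a float64 produced by JSON decoding) into the field's concrete type. Zero-valued fields of the old struct are not cached, mirroring the "non-zero-value fields" rule above.

```go
package main

import (
	"encoding/json"
	"fmt"
	"reflect"
	"strings"
)

// fieldName returns the JSON key of a struct field, or "" if it is skipped.
func fieldName(f reflect.StructField) string {
	name := strings.Split(f.Tag.Get("json"), ",")[0]
	switch name {
	case "-":
		return ""
	case "":
		return f.Name
	}
	return name
}

// RegisterStruct (hypothetical) installs newVal (a pointer to struct) as the
// active extension value. oldVal must be nil or a non-nil pointer to struct;
// cache must be a non-nil map.
func RegisterStruct(cache map[string]any, oldVal, newVal any) error {
	// 1. Move non-zero fields of the previously registered value to the cache.
	if oldVal != nil {
		ov := reflect.ValueOf(oldVal).Elem()
		for i := 0; i < ov.NumField(); i++ {
			name := fieldName(ov.Type().Field(i))
			if name != "" && !ov.Field(i).IsZero() {
				cache[name] = ov.Field(i).Interface()
			}
		}
	}
	// 2. Populate the new value from matching cache entries, then drop them.
	nv := reflect.ValueOf(newVal).Elem()
	for i := 0; i < nv.NumField(); i++ {
		name := fieldName(nv.Type().Field(i))
		cached, ok := cache[name]
		if name == "" || !ok {
			continue
		}
		buf, err := json.Marshal(cached)
		if err != nil {
			return err
		}
		if err := json.Unmarshal(buf, nv.Field(i).Addr().Interface()); err != nil {
			return err
		}
		delete(cache, name)
	}
	return nil
}

type OldExt struct {
	Vendor string `json:"-1"`
}

type NewExt struct {
	Model  string `json:"-2"`
	Serial int    `json:"-3"`
}

func main() {
	// Cache as left behind by decoding: JSON numbers arrive as float64.
	cache := map[string]any{"-2": "X1", "-3": float64(42), "-9": "other"}
	var ext NewExt
	if err := RegisterStruct(cache, &OldExt{Vendor: "ACME"}, &ext); err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", ext) // {Model:X1 Serial:42}
	fmt.Println(cache)       // map[-1:ACME -9:other]
}
```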

Signed-off-by: Sergei Trofimov <[email protected]>

@yogeshbdeshpande left a comment

LGTM!

@setrofim merged commit 58b9f18 into main Dec 12, 2025
13 checks passed
@setrofim deleted the setrofim/ext branch December 12, 2025 14:05

4 participants