Proposal draft: trivially mergeable Jelly files #102

@Ostrzyciel

Currently, if you simply concatenate two Jelly files, you will most likely get corrupted data. Jelly files start their lookup IDs and references from 0, which in most cases means lastId + 1. At the start of a stream, 0 therefore means 0 + 1 = 1. However, if the reader has just processed another stream that set the lookup IDs to something (which is ALWAYS the case), the 0 will resolve to a completely different number. This is the only thing that makes it impossible to concatenate Jelly files; you have to use transcoding to get around it.
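To make the failure mode concrete, here is a minimal sketch in Python. It is a deliberately simplified model, not the real Jelly wire format: the `decode` function, the `("entry", id, value)` / `("ref", id, None)` tuples, and the example IRIs are all hypothetical, and references are kept explicit for readability. The only rule taken from the description above is that an entry ID of 0 means lastId + 1.

```python
def decode(stream):
    """Resolve references against a lookup table (simplified, hypothetical model)."""
    table = {}      # lookup ID -> value
    last_id = 0     # ID assigned to the most recent entry
    resolved = []
    for kind, ident, value in stream:
        if kind == "entry":
            if ident == 0:          # 0 means "previous ID + 1"
                ident = last_id + 1
            table[ident] = value
            last_id = ident
        else:                       # "ref": look up a previously defined entry
            resolved.append(table[ident])
    return resolved


file_a = [
    ("entry", 0, "http://a/1"),   # becomes ID 1
    ("entry", 0, "http://a/2"),   # becomes ID 2
    ("ref", 1, None),
    ("ref", 2, None),
]
file_b = [
    ("entry", 0, "http://b/1"),   # ID 1 on its own, but ID 3 after file_a
    ("ref", 1, None),
]

separate = decode(file_a) + decode(file_b)
concatenated = decode(file_a + file_b)

print(separate)       # file_b's reference resolves to http://b/1
print(concatenated)   # file_b's reference now resolves to http://a/1
```

Decoded separately, the last reference resolves to `http://b/1`. Decoded as one concatenated stream, the same reference silently resolves to `http://a/1`, because the 0 in `file_b` was interpreted relative to the state left behind by `file_a`.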

While transcoding works fine, it's not always what the use case needs. It would be nice if we could introduce periodic checkpoints (e.g., in Kafka or in large seekable files). However, this is not possible in the current setup, unless we modify the serializer to always start with ID 1 instead of 0.

I see two potential solutions here.

(1) Force a reset of lookup IDs after an options row. This would work, but would be a backwards-incompatible change. Currently, we allow the options row to reoccur any number of times in the stream, and it has absolutely no effect on the interpretation of lookups.

(2) Tell serializers to always start with an explicit ID of 1. This would be backwards-compatible, provided we keep the part of the spec that says how to interpret 0 at the start of the stream. For newer streams, however, we can completely forbid the use of 0 at the start.
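The two options can be compared in a small Python sketch. This is again a simplified, hypothetical model rather than the actual Jelly format: the `decode` function, its `reset_on_options` flag, the `OPTIONS` marker, and the tuple layout are assumptions made for illustration only.

```python
OPTIONS = ("options", 0, None)   # stand-in for an options row


def decode(stream, reset_on_options=False):
    """Simplified lookup decoder; reset_on_options=True models option (1)."""
    table, last_id, resolved = {}, 0, []
    for kind, ident, value in stream:
        if kind == "options":
            if reset_on_options:
                last_id = 0          # option (1): options row restarts ID assignment
        elif kind == "entry":
            if ident == 0:           # 0 means "previous ID + 1"
                ident = last_id + 1
            table[ident] = value
            last_id = ident
        else:                        # "ref"
            resolved.append(table[ident])
    return resolved


file_a = [OPTIONS, ("entry", 0, "http://a/1"), ("entry", 0, "http://a/2"),
          ("ref", 1, None), ("ref", 2, None)]

# Second file as written today: the first entry uses the implicit ID 0.
file_b_implicit = [OPTIONS, ("entry", 0, "http://b/1"), ("ref", 1, None)]

# Option (2): the serializer writes an explicit ID 1 for the first entry.
file_b_explicit = [OPTIONS, ("entry", 1, "http://b/1"), ("ref", 1, None)]

expected = ["http://a/1", "http://a/2", "http://b/1"]

current = decode(file_a + file_b_implicit)                          # corrupted
option_1 = decode(file_a + file_b_implicit, reset_on_options=True)  # new decoder rule
option_2 = decode(file_a + file_b_explicit)                         # unchanged decoder

print(current == expected, option_1 == expected, option_2 == expected)
```

In this model, option (1) changes how the decoder treats existing streams that repeat the options row, which is where the backwards incompatibility comes from. Option (2) leaves the decoder untouched and only changes what the serializer writes.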

I think the second approach makes sense. It will result in a slightly worse compression ratio (literally a few bytes for the entire stream), which I think we can live with.

Consider introducing this together with RDF 1.2: #59

Labels: new protocol feature (Discussion about a new feature in the Jelly protocol)