Currently, if you simply concatenate two Jelly files, you will most likely get corrupted data. Jelly files start their lookup IDs and references from `0`, which in most cases means `lastId + 1`. At the start of a stream, `0` means `0 + 1`. However, if we have just read a stream that set the lookup IDs to something (which is ALWAYS the case), `0` will resolve to a completely different number. This is the only thing that makes it impossible to concatenate Jelly files; you have to use transcoding to get around it.
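To make the failure mode concrete, here is a minimal sketch of the `0` = `lastId + 1` rule applied to one lookup table. It is a toy model, not the actual Jelly decoder: `resolve_entry_ids` and the sample ID lists are hypothetical names made up for illustration.

```python
def resolve_entry_ids(raw_ids, last_id=0):
    """Resolve raw lookup entry IDs, treating 0 as "previous ID + 1".

    `last_id` is the decoder state carried over from whatever was read
    before; a fresh stream starts with last_id = 0.
    """
    resolved = []
    for raw in raw_ids:
        last_id = last_id + 1 if raw == 0 else raw
        resolved.append(last_id)
    return resolved


# Two hypothetical files, each written as if it were the start of a stream.
file_a = [0, 0, 0]
file_b = [0, 0]

# Read on its own, file B's entries land in slots 1 and 2, as its writer intended.
standalone = resolve_entry_ids(file_b)

# Read right after file A, the same bytes land in slots 4 and 5, so any
# reference in file B to slot 1 or 2 now points at file A's entries.
state_after_a = resolve_entry_ids(file_a)[-1]
after_a = resolve_entry_ids(file_b, last_id=state_after_a)

print(standalone, after_a)
```

The two results differ only because of the inherited decoder state, which is exactly what goes wrong on concatenation.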
While transcoding works fine, it is not always what the use case needs. It would be nice if we could introduce periodic checkpoints (e.g., in Kafka or in large seekable files). However, this is not possible in the current setup, unless we modify the serializer to always start with ID `1` instead of `0`.
I see two potential solutions here.
(1) Force reset lookup IDs after an options row. This would work, but would be a backwards-incompatible change. Currently, we allow the options to reoccur as many times as we want in the stream, and they have absolutely no effect on the interpretation of lookups.
(2) Tell serializers to always start with an explicit ID: `1`. This would be backwards-compatible, as long as we keep the part of the spec that says how to interpret `0` at the start of the stream. For newer streams, however, we can completely forbid the use of `0` at the start.
I think the second approach makes sense... It will result in a slightly worse compression ratio (literally a few bytes for the entire stream), which I think we can live with.
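A rough sketch of what option (2) could look like on the serializer side, assuming a simplified single lookup table. `encode_entry_ids` and its `explicit_start` flag are hypothetical and only illustrate the idea; they are not part of any existing Jelly implementation.

```python
def encode_entry_ids(ids, explicit_start=True):
    """Compress entry IDs by emitting 0 whenever an ID equals previous + 1.

    With explicit_start=True (option 2), the first ID is always written
    out in full, so its meaning does not depend on earlier decoder state.
    With explicit_start=False, the current behavior is modeled.
    """
    encoded = []
    last_id = 0
    for index, real_id in enumerate(ids):
        is_first = index == 0
        if real_id == last_id + 1 and not (is_first and explicit_start):
            encoded.append(0)
        else:
            encoded.append(real_id)
        last_id = real_id
    return encoded


# Current behavior: the stream opens with 0, which is only correct
# if nothing was read before it.
legacy = encode_entry_ids([1, 2, 3], explicit_start=False)

# Option 2: the stream opens with an explicit 1; later entries still use 0.
concat_safe = encode_entry_ids([1, 2, 3])

print(legacy, concat_safe)
```

Only the first entry changes, which matches the expectation that the cost is a few bytes per stream. A decoder that keeps the old rule for a leading `0` would still read streams written the current way.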
Consider introducing this together with RDF 1.2: #59