Description
Note: This is yet another feature that may not be a great idea.
Investigate whether and how to implement pattern-based compression.
In such a compression scheme, we'd have a dictionary of patterns, where each pattern is a sequence of RDF statements. In these statements, some terms would be fixed and others would be omitted (replaced with a variable). For example:
DefinePattern(
  id = 5,
  triples = (
    Triple(Iri1, Iri2, VAR),
    Triple(Iri3, VAR, VAR)
  )
)
Then, we could materialize the pattern in the stream like this:
UsePattern(
  id = 5,
  terms = (Literal1, Iri4, Literal2)
)
This would be equivalent to:
Triple(Iri1, Iri2, Literal1)
Triple(Iri3, Iri4, Literal2)
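A minimal Python sketch of how a decoder could expand `UsePattern`, assuming the variables are filled in reading order (subject, predicate, object of each triple in turn), which is what the example above implies. The `Var`/`VAR` sentinel, the `Pattern` and `Triple` classes, and `expand` are hypothetical names for illustration only, and terms are plain strings rather than real RDF terms.

```python
from dataclasses import dataclass
from typing import Union


class Var:
    """Hypothetical placeholder for an omitted term (VAR) in a pattern."""

    def __repr__(self) -> str:
        return "VAR"


VAR = Var()

# Terms are plain strings in this sketch, not real RDF term types.
Slot = Union[str, Var]


@dataclass(frozen=True)
class Triple:
    s: Slot
    p: Slot
    o: Slot


@dataclass(frozen=True)
class Pattern:
    id: int
    triples: tuple  # tuple of Triple, some slots set to VAR


def var_count(pattern: Pattern) -> int:
    """Count the variable slots across all triples of the pattern."""
    return sum(1 for t in pattern.triples for x in (t.s, t.p, t.o) if x is VAR)


def expand(pattern: Pattern, terms: tuple) -> list:
    """Fill the pattern's variables with `terms`, in order: s, p, o of each triple in turn."""
    n = var_count(pattern)
    if len(terms) != n:
        raise ValueError(f"pattern {pattern.id} has {n} variables, got {len(terms)} terms")
    it = iter(terms)
    return [
        Triple(*(next(it) if x is VAR else x for x in (t.s, t.p, t.o)))
        for t in pattern.triples
    ]


# DefinePattern(id = 5, ...) from the example above
p5 = Pattern(5, (Triple("Iri1", "Iri2", VAR), Triple("Iri3", VAR, VAR)))
# UsePattern(id = 5, terms = (Literal1, Iri4, Literal2))
for triple in expand(p5, ("Literal1", "Iri4", "Literal2")):
    print(triple)
# Triple(s='Iri1', p='Iri2', o='Literal1')
# Triple(s='Iri3', p='Iri4', o='Literal2')
```

Checking the term count up-front means a malformed `UsePattern` row is rejected instead of silently producing incomplete triples.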
This would be especially useful in scenarios where the same statement patterns repeat often, such as IoT messages.
We could employ both a static (pre-shared) and a dynamic dictionary of these patterns, using an approach like in #36. Pre-shared pattern dictionaries combined with pre-shared string dictionaries would be very powerful and could greatly improve compression efficiency in IoT scenarios.
The pre-shared patterns could possibly also include stream options, so that we can reduce the amount of transmitted data to the absolute bare minimum.
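One way to combine the static and dynamic dictionaries, sketched in Python under the assumption that both share a single id space split at a fixed boundary, so the id in a `UsePattern` row alone tells the decoder which table to consult. The `PatternDictionary` class, `dynamic_base` and its value are hypothetical, and this is not necessarily how #36 approaches it.

```python
from dataclasses import dataclass, field
from typing import Any

# Patterns are opaque here (e.g. tuples of triples with VAR slots);
# only the id bookkeeping matters for this sketch.
Pattern = Any


@dataclass
class PatternDictionary:
    static: dict  # pre-shared patterns: id -> Pattern, agreed on before the stream starts
    dynamic_base: int = 1 << 16  # hypothetical: ids >= this are defined in-stream
    dynamic: dict = field(default_factory=dict)

    def __post_init__(self) -> None:
        if any(i >= self.dynamic_base for i in self.static):
            raise ValueError("pre-shared pattern ids must be below dynamic_base")

    def define(self, pattern_id: int, pattern: Pattern) -> None:
        """Handle a DefinePattern row from the stream."""
        if pattern_id < self.dynamic_base:
            raise ValueError(f"id {pattern_id} is reserved for pre-shared patterns")
        # Assumption: redefining an id replaces the old pattern.
        self.dynamic[pattern_id] = pattern

    def lookup(self, pattern_id: int) -> Pattern:
        """Resolve the id of a UsePattern row."""
        table = self.static if pattern_id < self.dynamic_base else self.dynamic
        if pattern_id not in table:
            raise KeyError(f"unknown pattern id {pattern_id}")
        return table[pattern_id]


# Usage: pattern 5 is pre-shared, then the stream defines one more of its own.
patterns = PatternDictionary(static={5: "pre-shared pattern 5"})
patterns.define(1 << 16, "in-stream pattern")
print(patterns.lookup(5))        # pre-shared pattern 5
print(patterns.lookup(1 << 16))  # in-stream pattern
```

Fixing the split up-front keeps `UsePattern` rows down to an id plus terms; the trade-off is that encoder and decoder must agree on `dynamic_base` beforehand.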
Downsides and alternatives
Employing such aggressive compression in all streams would be prohibitively expensive, as it would slow down serialization. Realistically, this would only be useful where we know the structure of the triples up-front, or can afford to analyze the data in depth, as in some IoT applications.
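To make the encoder-side cost more concrete, here is a rough Python sketch of checking whether a group of triples fits a known pattern and, if so, extracting the terms for `UsePattern`. It assumes the encoder already knows where a group (e.g. one IoT message) starts and ends, and that triples arrive in the same order as in the pattern; `None` stands in for VAR, and `match`/`find_pattern` are hypothetical names. Even this naive version scans every pattern for every group, and dropping either assumption (unknown group boundaries, arbitrary triple order) would make the search considerably more expensive.

```python
from typing import Optional

VAR = None  # hypothetical placeholder for a variable slot in a pattern

# A pattern is a tuple of (s, p, o) triples whose slots are terms or VAR;
# a concrete triple is a tuple of three term strings.


def match(pattern: tuple, triples: list) -> Optional[tuple]:
    """Return the UsePattern terms if `triples` fit `pattern` exactly and in order, else None."""
    if len(triples) != len(pattern):
        return None
    terms = []
    for pattern_triple, triple in zip(pattern, triples):
        for slot, term in zip(pattern_triple, triple):
            if slot is VAR:
                terms.append(term)  # variable slot: record the term
            elif slot != term:
                return None  # fixed term differs: not this pattern
    return tuple(terms)


def find_pattern(patterns: dict, triples: list) -> Optional[tuple]:
    """Try each known pattern in turn; return (id, terms) for the first match, or None."""
    for pattern_id, pattern in patterns.items():
        terms = match(pattern, triples)
        if terms is not None:
            return pattern_id, terms
    return None


patterns = {5: (("Iri1", "Iri2", VAR), ("Iri3", VAR, VAR))}
message = [("Iri1", "Iri2", "Literal1"), ("Iri3", "Iri4", "Literal2")]
print(find_pattern(patterns, message))  # (5, ('Literal1', 'Iri4', 'Literal2'))
```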
I see two main alternatives:
- For general applications, diff-based compression (e.g., based on Proposal: Jelly extension for RDF Patch #11) would be much simpler to implement and more general.
- For optimized IoT scenarios, we could seriously look into making a spin-off version of Jelly that is specifically optimized for pre-shared dictionaries, pre-shared patterns, and super-efficient literal encoding. Something like S-HDT or RDF EXI (though I am not aware of any public implementation of either...). The question is: do we need this? Should Jelly try to serve this use case, or is there a better way? This starts to overlap somewhat with the stated goals of CBOR-LD, and that's a very, very different beast.