Investigation: statement pattern compression

***Note:** This is yet another feature that may not be a great idea.*

Investigate whether and how to implement pattern-based compression.

In such a compression scheme, we'd have a dictionary of patterns, where each pattern is a sequence of RDF statements. These statements would have some terms defined, and some omitted (replaced with a variable). For example:

```
DefinePattern(
  id = 5,
  triples = (
    Triple(Iri1, Iri2, VAR)
    Triple(Iri3, VAR, VAR)
  )
)
```

Then, we could materialize the pattern in the stream like this:

```
UsePattern(
  id = 5,
  terms = (Literal1, Iri4, Literal2)
)
```

Which would be equivalent to:

```
Triple(Iri1, Iri2, Literal1)
Triple(Iri3, Iri4, Literal2)
```
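To make the expansion step concrete, here is a minimal Python sketch of how a consumer could fill in the variables. It is only an illustration: the `Pattern` class, the `expand` function, and the use of `None` to stand for `VAR` are hypothetical names chosen for this example, and terms are plain strings rather than real RDF terms. It assumes variables are filled left to right, triple by triple, as in the example above.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical representation: None marks a variable slot (VAR),
# any other value is a fixed term.
TriplePattern = tuple[Optional[str], Optional[str], Optional[str]]


@dataclass(frozen=True)
class Pattern:
    id: int
    triples: tuple[TriplePattern, ...]

    def slot_count(self) -> int:
        """Number of variable slots across all triples in the pattern."""
        return sum(term is None for triple in self.triples for term in triple)


def expand(pattern: Pattern, terms: tuple[str, ...]) -> list[tuple[str, ...]]:
    """Fill the pattern's variable slots with `terms`, left to right."""
    if len(terms) != pattern.slot_count():
        raise ValueError(
            f"pattern {pattern.id} has {pattern.slot_count()} slots, "
            f"got {len(terms)} terms"
        )
    remaining = iter(terms)
    return [
        tuple(next(remaining) if term is None else term for term in triple)
        for triple in pattern.triples
    ]


# The DefinePattern / UsePattern pair from the example above.
pattern_5 = Pattern(5, (("Iri1", "Iri2", None), ("Iri3", None, None)))
print(expand(pattern_5, ("Literal1", "Iri4", "Literal2")))
# [('Iri1', 'Iri2', 'Literal1'), ('Iri3', 'Iri4', 'Literal2')]
```

Decoding is cheap in this sketch (one pass over the pattern). The cost would sit on the producer side, which has to find a matching pattern in the first place.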

This would be especially useful in scenarios where there are regular statement patterns that repeat often, like in IoT messages.

We could employ both a static (pre-shared) and a dynamic dictionary of these patterns, using an approach like in #36. Pre-shared pattern dictionaries together with pre-shared string dictionaries would be very powerful and would improve compression efficiency greatly in IoT scenarios.
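A rough sketch of what a combined static and dynamic pattern dictionary might look like on the consumer side. This does not restate what #36 proposes; the `PatternDictionary` class, the rule that pre-shared IDs cannot be redefined, and the rule that a dynamic redefinition overwrites the old entry are all assumptions made for illustration only.

```python
class PatternDictionary:
    """Two-tier lookup: pre-shared (static) patterns plus patterns defined in-stream."""

    def __init__(self, static: dict[int, object]):
        # Agreed on out of band and never transmitted in the stream.
        self._static = dict(static)
        # Filled by DefinePattern messages as the stream is read.
        self._dynamic: dict[int, object] = {}

    def define(self, pattern_id: int, pattern: object) -> None:
        """Handle a DefinePattern message arriving in the stream."""
        if pattern_id in self._static:
            # Assumption: pre-shared IDs are reserved.
            raise ValueError(f"id {pattern_id} is reserved for a pre-shared pattern")
        # Assumption: redefining a dynamic ID overwrites the previous entry.
        self._dynamic[pattern_id] = pattern

    def lookup(self, pattern_id: int) -> object:
        """Resolve the ID carried by a UsePattern message."""
        if pattern_id in self._static:
            return self._static[pattern_id]
        if pattern_id in self._dynamic:
            return self._dynamic[pattern_id]
        raise KeyError(f"pattern {pattern_id} was never defined")


# Usage: ID 1 is pre-shared, ID 5 is defined by the stream itself.
patterns = PatternDictionary(static={1: "pre-shared sensor reading pattern"})
patterns.define(5, "pattern sent via DefinePattern")
print(patterns.lookup(1))
print(patterns.lookup(5))
```

With a fully pre-shared dictionary, the stream would carry only `UsePattern` messages, which is where the savings for IoT would come from.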

The pre-shared patterns could possibly also contain stream options, so that we can cut the amount of data transmitted to the bare minimum.

### Downsides and alternatives

Employing such aggressive compression in all streams would be prohibitively expensive, slowing down serialization. Realistically, the only scenarios where this would be useful are those where we know the structure of the triples up front, or can spend the time to analyze the data in depth, like in some IoT applications.

I see two main alternatives:

- For general applications, diff-based compression (e.g., based on #11) would be way simpler to implement and more generalized.
- For optimized IoT scenarios, we could seriously look into making a spin-off version of Jelly that is specifically optimized for pre-shared dictionaries, pre-shared patterns, and super-efficient literal encoding. Something like S-HDT or RDF EXI (though I am not aware of any public implementation for either...). Question is – do we need this? Should Jelly try to serve this use case, or is there a better way? This seems to start to overlap a bit with the stated goals of CBOR-LD, and that's a very, very different beast.
