Questions and remarks about supported types, missing types and named variables #213

Great idea! I've used tuples for a long time, and always wished there was a standard for representing them and querying them. Hopefully this could be the one :)

*I apologize for the flood; these are notes I took while implementing a parser for fql in C#/.NET.*

I have a few remarks as a long-time user of tuples, mostly in the context of writing complex layers in .NET/C#. I am not 100% familiar with Go conventions.

- **Numbers**: Why the distinction between `int` and `uint`? Is this a Go thing, or is there an actual reason?
  - For `-123` it is clear that it is not a `uint`, but what about `123`? From my point of view it is both an `int` and a `uint`.
  - What about 32-bit vs 64-bit?
  - By comparison, `float` makes no distinction between 32-bit and 64-bit IEEE numbers?
  - How do you handle `NaN` or infinities?
- **UUIDs**: I find it difficult to parse a UUID that is not delimited by `{...}` or some other character.
  - I frequently see UUIDs wrapped as either `{xxxx-xxx...}` or `"xxxx-xxxx-..."` (or maybe that is a Windows thing?).
  - Using `{...}` for UUIDs would make them unambiguous for the parser.
- **Bytes**: when parsing `0x1234`, you first see the `0`, which could be the start of a number, a UUID, or a byte sequence.
  - I don't have a nice solution for this, but maybe use `[ 1234 ]` or `'\x12\x34'` (single quotes, as in the Python 2 tuple encoder) for bytes?
  - Are hex digits always uppercase? Lowercase? Either? The syntax definition is a bit ambiguous.
  - For me, `0xFF` and `0xFE` are ambiguous, because they are also the single-byte prefixes of the System subspace and the Directory Layer, whereas here I think they would be encoded as `01 FF 00` and `01 FE 00` instead?
- **Strings**: How do you handle escaping of Unicode characters? What about Unicode text like `"こんにちは世界"`, or high/low surrogates?
  - Could we define `\uXXXX` to encode any code point?
- **Tuples**: does `(...)` mean "any tuple, empty or not"? For example, in `(1, <int>, (...), <int>)`, does the middle element mean "any tuple"?
- **Ends With**: is `("hello", 123, ..., <int>)` supported? This would help with variable-sized tuples in some layers, where you still need to parse the last one or two elements of a tuple.

There are a few types that I use frequently in tuples, and that are missing:
- **Empty**: most indexes have empty values. The old Python binding used `''` (two single quotes) to represent them. What would we use here? `nil` seems odd because, for me, it is different from the concept of "empty". Maybe add `<empty>` or `<none>` for values?
- **Versionstamps**: maybe add a new `stamp`, `versionstamp` or `vs` variable type? For example: `(1, <stamp>, ...)`
  - There are two kinds of versionstamps, 80-bit and 96-bit. They use the prefixes `0x32` and `0x33` in the tuple encoding, followed by 10 or 12 bytes respectively.
  - I usually don't distinguish between the two sizes; they are all "stamps" to me.
- **Dates**: this is unfortunately not specified, but I usually represent times as days (float) since the Unix epoch.
- **Durations**: same; I store them as seconds (float), but this may not be a standard representation.
- **Custom types**: The original spec had a way to define "custom types" for the application, which would have a custom header byte, followed by 0 or more bytes (custom to the app).
  - In practice, I only used them to define the `Directory` (0xFE) and `System` (0xFF) prefixes, which are useful in tuples that have to query the System subspace, or inside the Directory Layer (each nested partition adds another 0xFE to the bytes).
  - It would help resolve the ambiguity in `(0xFE, "hello")`, which for me reads as "the key 'hello' in the top-level Directory Layer" and is encoded as `FE 02 h e l l o 00`, whereas I guess here it would be encoded as `01 FE 00 02 h e l l o 00`, which is not the same.
- **System subspace**: how can I represent `\xFF/metadataVersion` or other system keys? They usually don't use the tuple encoding for their keys.
- **Regex**/**Patterns**/**Constraints**: Would it make sense to be able to impose constraints on types? For example, a regex on a string, a range on a number, or a maximum/minimum/exact size for strings/bytes?
  - This could be useful for counters, which are usually exactly (or at most) 4 or 8 bytes.
  - The `uint` vs `int` distinction could be emulated with a "must be non-negative" or "must be exactly 64 bits" constraint.
  - Same for 80-bit vs 96-bit versionstamps.

Regarding directories:
- What if I use partitions/sub-partitions? This is a way to "lock" an application into a specific prefix (i.e., if `/foo` is a partition with prefix `15 2A` (== `(42, )`), all keys will have this prefix, even those of sub-directories of this partition).
  - If the app is locked to sub-partition `/foo/bar`, then ALL keys would start with `/foo/bar/...`, so in practice we represent them without the prefix, as something like `.../my/dir` (similar to a web app that could be hosted under any path).
  - Since `...` means "zero or more" in the tuples, maybe use `./my/dir` or `~/my/dir` to represent "from the root defined for this application" ?
- Directory names are strings, and I also use strings 99%+ of the time, but _technically_ the names can be any sequence of bytes, in order to represent numbers, UUIDs, etc. I'm not sure if this is frequently used in the field, but if it is, maybe we would also need variables in directory names, like `/foo/<string>/bar` or `/foo/<uuid>/bar`?

On top of querying, I see this as very useful for encoding the "schema" of a layer, so that a UI could automatically decode data stored in an arbitrary subspace (using the optional Layer Id of directories).

For example, I used the following format to define the schema of a custom layer, like a typical "table with index + change log" mini layer:
- `(..., <metadata_key>) = <metadata_value>`
- `(..., 0, <doc_id>) = <json_bytes>`
- `(..., 1, <index_id>, <value>, <doc_id>) = ''`
- `(..., 2, <index_id>, <value>) = <counter>`
- `(..., 3, <version_stamp>, <doc_id>) = <change_event>`

Legend:
- `...` means the prefix of the Directory where this layer is stored.
- `<name>` was a placeholder for a type of data, but the type was not specified

I think this could be adapted to use fql as the syntax, but it would require adding support for *named* variables:
- either `<foo:int>` or `<int:foo>` would define `foo` as a variable of type `int`
- we would need a way to express "any type", either `<foo:any>`/`<any:foo>` or `<foo:>`/`<:foo>`
- for counters (atomic add/increment), we need to specify a size, usually 32 or 64 bits: `<uint32>`/`<uint64>` ? `<uint:32>`/`<uint:64>` ? `<uint,32>`/`<uint,64>` ?

The above could become:
- `~/(<string:metadata_key>) = <any:metadata_value>`
- `~/(0, <uint:doc_id>) = <bytes:json>`
- `~/(1, <uint:index_id>, <int|string|bytes:value>, <uint:doc_id>) = <empty>`
- `~/(2, <uint:index_id>, <int|string|bytes:value>) = <uint64>`
- `~/(3, <stamp:timestamp>, <uint:doc_id>) = <bytes:delta>`
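To check that the `<type:name>` form used above is easy to tokenize, here is a minimal Go sketch that parses one placeholder into its allowed types and optional name. It follows the proposal in this issue (type first, `|` for unions, `any` or an empty type list for "any type"); it is not fql's actual grammar, and `Variable`/`parseVariable` are hypothetical names.

```go
package main

import (
	"fmt"
	"strings"
)

// Variable is a parsed placeholder such as <int|string|bytes:value>.
type Variable struct {
	Types []string // allowed types; empty means "any type"
	Name  string   // empty for anonymous variables such as <uint64>
}

// parseVariable parses the proposed <type[|type...][:name]> form.
func parseVariable(s string) (Variable, error) {
	if !strings.HasPrefix(s, "<") || !strings.HasSuffix(s, ">") {
		return Variable{}, fmt.Errorf("not a variable: %q", s)
	}
	body := s[1 : len(s)-1]
	var v Variable
	typesPart := body
	if i := strings.IndexByte(body, ':'); i >= 0 {
		typesPart, v.Name = body[:i], body[i+1:]
	}
	// "<:foo>" and "<any:foo>" both mean "any type".
	if typesPart != "" && typesPart != "any" {
		v.Types = strings.Split(typesPart, "|")
	}
	return v, nil
}

func main() {
	for _, s := range []string{"<int|string|bytes:value>", "<uint64>", "<any:metadata_value>"} {
		v, err := parseVariable(s)
		if err != nil {
			fmt.Println(err)
			continue
		}
		fmt.Printf("%s types=%v name=%q\n", s, v.Types, v.Name)
	}
}
```

Putting the type first keeps `<int>` and `<int:foo>` consistent: the part before the colon always means the same thing, and the name is a pure addition.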
