explored basic sqlite jsonb support #719

Draft
wants to merge 3 commits into
base: main

seridescent

i took a shot at supporting sqlite's JSONB format in the simplest manner possible: replace the databendlabs JSONB crate with a different crate that appeared to implement sqlite's JSONB format.

i also wrote the small amount of scaffolding needed to implement and start testing sqlite's jsonb function.

turns out, this only kind of works. the tests i wrote revealed a deviation from sqlite: sqlite uses the "TEXT" type for the test input's key, but the crate being used uses the "TEXTRAW" type.
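
for reference, here is a minimal sketch of where that deviation shows up in the bytes. it assumes the element type codes from sqlite's JSONB format documentation (TEXT = 7, TEXTRAW = 10, stored in the low 4 bits of the first header byte). `ElementType` and `element_type` are hypothetical names used only for illustration; they are not part of limbo or of the crate.

```rust
/// Element type codes from sqlite's JSONB format documentation.
/// The code lives in the low 4 bits of an element's first header byte.
/// (Hypothetical helper type, not taken from limbo or any crate.)
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum ElementType {
    Null,
    True,
    False,
    Int,
    Int5,
    Float,
    Float5,
    Text,
    TextJ,
    Text5,
    TextRaw,
    Array,
    Object,
    Reserved,
}

/// Classify an element from its first header byte.
fn element_type(header: u8) -> ElementType {
    match header & 0x0F {
        0 => ElementType::Null,
        1 => ElementType::True,
        2 => ElementType::False,
        3 => ElementType::Int,
        4 => ElementType::Int5,
        5 => ElementType::Float,
        6 => ElementType::Float5,
        7 => ElementType::Text,
        8 => ElementType::TextJ,
        9 => ElementType::Text5,
        10 => ElementType::TextRaw,
        11 => ElementType::Array,
        12 => ElementType::Object,
        _ => ElementType::Reserved, // 13..=15 are reserved by the format
    }
}
```

with a one-character key like `a`, a TEXT header byte would be `0x17` and a TEXTRAW one `0x1A`, so a byte-for-byte comparison against sqlite's output fails even when the key itself contains nothing that needs escaping.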

since @madejejej is also working on JSONB in #710 (unfortunately only discovered after writing this) and resolving this deviation would not be trivial for me right now, i don't think any of this will get merged.

however, in the interest of documenting approaches and providing some very basic tests, i wanted to put this up and have it closed. hopefully it will be somewhat useful without being much of a nuisance.

without it, `cargo test` fails with the following error when the python3 framework isn't available
```
dyld[58788]: Library not loaded: @rpath/Python3.framework/Versions/3.9/Python3
  Referenced from: <5CA24EEF-B406-3A8A-B845-2C4A3737ED55> /Users/nicholast/projects/limbo/target/debug/deps/_limbo-89f6cdf4f3551f5a
  Reason: tried: '/Library/Frameworks/Python3.framework/Versions/3.9/Python3' (no such file), '/System/Library/Frameworks/Python3.framework/Versions/3.9/Python3' (no such file, not in dyld cache)
error: test failed, to rerun pass `-p py-limbo --lib`
```

admittedly, this might not be the minimal solution
new tests reveal a deviation from sqlite behavior. sqlite uses the "TEXT" type for the test input's key, but the crate being used uses the "TEXTRAW" type.
@madejejej
Contributor

Hey @seridescent, I have only just started working on the JSONB support. It might be a good idea to explore the pros and cons of writing Limbo-specific code versus using a crate.

I haven't even thought about searching for a crate, since this seems so niche 😅

I think an important consideration is the ability to implement JSONB-specific algorithms.

SQLite always uses the JSONB representation internally (for both JSON and JSONB), and the JSONB-specific algorithms seem more efficient because they do not allocate the whole JSON structure as a graph of objects. Instead, JSONB is kept as an array of bytes. For example, `jsonbArrayCount` only iterates through the byte array:

```c
/*
** Given that a JSONB_ARRAY object starts at offset i, return
** the number of entries in that array.
*/
static u32 jsonbArrayCount(JsonParse *pParse, u32 iRoot){
  u32 n, sz, i, iEnd;
  u32 k = 0;
  n = jsonbPayloadSize(pParse, iRoot, &sz);
  iEnd = iRoot+n+sz;
  for(i=iRoot+n; n>0 && i<iEnd; i+=sz+n, k++){
    n = jsonbPayloadSize(pParse, i, &sz);
  }
  return k;
}
```
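
To make the byte-walking idea concrete on the Rust side, here is a rough sketch of the same loop over a `&[u8]`. It is not SQLite's code and not a proposed Limbo API: `payload_size` and `array_count` are hypothetical names, and the header layout (high 4 bits of the first byte as a size code, codes 12 to 15 meaning 1, 2, 4, or 8 following big-endian size bytes) is assumed from SQLite's JSONB format documentation.

```rust
/// Decode the element header at offset `i`.
/// Returns (header length, payload length), or None if the blob is truncated.
/// Hypothetical helper, loosely mirroring what jsonbPayloadSize computes.
fn payload_size(blob: &[u8], i: usize) -> Option<(usize, usize)> {
    let first = *blob.get(i)?;
    let code = first >> 4;
    let (n, sz) = if code <= 11 {
        // Size codes 0..=11 are the payload size itself; the header is 1 byte.
        (1usize, u64::from(code))
    } else {
        // Size codes 12..=15: the size follows as 1, 2, 4, or 8 big-endian bytes.
        let extra = match code {
            12 => 1,
            13 => 2,
            14 => 4,
            _ => 8,
        };
        let bytes = blob.get(i + 1..i + 1 + extra)?;
        let sz = bytes.iter().fold(0u64, |acc, &b| (acc << 8) | u64::from(b));
        (1 + extra, sz)
    };
    // The payload has to fit inside the blob.
    if sz > (blob.len() - i - n) as u64 {
        return None;
    }
    Some((n, sz as usize))
}

/// Count the entries of the array element starting at `root` without
/// building any intermediate tree. Returns None for malformed input.
fn array_count(blob: &[u8], root: usize) -> Option<usize> {
    // Type code 11 (low 4 bits) marks an ARRAY element.
    if (*blob.get(root)? & 0x0F) != 11 {
        return None;
    }
    let (n, sz) = payload_size(blob, root)?;
    let end = root + n + sz;
    let mut i = root + n;
    let mut k = 0;
    while i < end {
        // Skip over each child: its header plus its payload.
        let (n, sz) = payload_size(blob, i)?;
        i += n + sz;
        k += 1;
    }
    // Children must end exactly where the array payload ends.
    if i == end {
        Some(k)
    } else {
        None
    }
}
```

For example, assuming `[1,2,3]` is encoded as the bytes `6B 13 31 13 32 13 33`, `array_count(&blob, 0)` walks three child headers and never allocates. Nested arrays count as a single entry because the loop skips each child's whole payload.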

If we used a crate, it would need to expose the internal representation or implement all of SQLite's JSON functions.

@penberg @jussisaurio any thoughts on this?

@seridescent
Author

seridescent commented Jan 17, 2025

correct me if i'm wrong, but your mention of how sqlite works with JSONB as bytes sounds like a performance vs. usability concern.

my understanding is that this crate happens to provide a significant chunk of the functionality for going from bytes to a data structure that's useful for implementation, and vice versa.

one could avoid that bytes -> internal structure cost and work purely with bytes like sqlite does (with a variety of helper subroutines, it looks like). however, if the serialization and deserialization cost between JSONB bytes and a "Rust struct" doesn't end up that high, i think there's value in working with the existing Val struct instead of likely less ergonomic code that works directly on bytes.

i don't feel qualified to speak to that performance cost without doing any actual measurements, but the listed benchmarks suggest to me that the cost doesn't grow that much with N and might not be that big of a deal. seems worth exploring, but i suspect i would need help to do that measurement in a timely fashion.

other interesting places to look for inspiration might include the source of the databendlabs/jsonb crate?

again, correct me if i'm mistaken, but it sounds like another concern is with the use of a crate that isn't part of limbo.

wrt that and this draft, i'm not attached to the crate or anything; even if this was the approach taken, i think limbo would want to have its own sqlite JSONB serde implementation to work with. the crate's code is MIT licensed so i believe it could be pulled into the limbo codebase as a (possibly) useful starting point. there is a deviation present after all :')

that's my 2c, there could totally be something im misunderstanding though so id love to be corrected. like i commented in your PR, interested in helping out in other ways too.

anyway, i wouldn't take this draft too seriously, it was just the result of me exploring a tiny bit out of curiosity. like i said, i figured documenting an approach that had some issues could be useful.
