-
Notifications
You must be signed in to change notification settings - Fork 267
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Json extract #524
base: main
Are you sure you want to change the base?
Json extract #524
Conversation
I hoped this feature wouldn't drop me into a rabbit hole 😅 Unfortunately, there are some quirks in the JSON syntax, ex: -- "\x61" is an escaped ASCII character for 'a', but this doesn't match in neither sqlite nor postgres:
SELECT json_extract('{"\x61": 1}', 'a');
-- you need to use the exact sequence of characters:
SELECT json_extract('{"\x61": 1}', '\x61')
1
-- the other way around also wouldn't work:
SELECT json_extract('{"a": 1}', '\x61') |
core/json/json_path.rs
Outdated
|
||
#[derive(Parser)] | ||
#[grammar_inline = r#" | ||
array_locator = @{ "[" ~ array_index ~ "]" } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had to copy a large part of grammar from core/json/de.rs
. Perhaps I could extract the grammar from there into a constant and add additional rules for parsing the path? However, this would leave us with a few unused rules.
The other way to DRY it out would be to extract only the common rules, but I feel like it's brittle and very unhandy if we ever have to patch the JSON grammar.
i merged #504 so base is now main |
84f74cc
to
b401178
Compare
cc089af
to
332b831
Compare
332b831
to
2e730ea
Compare
@@ -9,137 +9,7 @@ use std::f64; | |||
use crate::json::error::{self, Error, Result}; | |||
|
|||
#[derive(Parser)] | |||
#[grammar_inline = r#" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
moved to a separate file
6a5d800
to
eb5bbe6
Compare
@petersooley I saw your PR with I like the idea of a smaller, hand-rolled parser but I also think the JSON grammar is pretty complicated, even in terms of what you can do with the JSON path in SQLite (see some tests attached). Let me know your thoughts. |
0b639f4
to
d970d00
Compare
d970d00
to
692301e
Compare
SQLite seems to be changing some quirky behavior from version to version:
|
// see https://spec.json5.org/#syntactic-grammar and | ||
// https://spec.json5.org/#lexical-grammar | ||
|
||
COMMENT = _{ "/*" ~ (!"*/" ~ ANY)* ~ "*/" | "//" ~ (!line_terminator ~ ANY)* } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was this grammar hand-written based on the above spec or sourced from somewhere else?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nvm, this file was moved instead of created here, so you might not know
struct Parser; | ||
|
||
#[derive(Clone, Debug, PartialEq)] | ||
pub enum PathElement { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you also comment these using triple-slash comments? Each enum variant / struct field can also have its own triple-slash comment. IDEs pick those up which is helpful
pub elements: Vec<PathElement>, | ||
} | ||
|
||
// Parses path into a Vec of Strings, where each string is a key or an array locator. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Triple slash, please
), | ||
} | ||
} | ||
_ => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can this branch be hit? doesn't the parser produce discreet unions where we can handle every possible union variant?
match pair.as_rule() { | ||
Rule::EOI => (), | ||
Rule::root => result.push(PathElement::Root()), | ||
Rule::json_path_key => result.push(PathElement::Key(pair.as_str().to_string())), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can these be made into pascal case? Rule::JsonPathKey
Rule::negative_index_indicator => { | ||
let negative_offset = array_locator.next().unwrap(); | ||
// TODO: sqlite is able to parse arbitrarily big numbers, but they | ||
// always get overflown and cast to i32. Handle this. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
any reason not to handle it in this PR?
SELECT json_extract(1, '$') | ||
} {{1}} | ||
|
||
# TODO: fix me |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we plan to keep these commented out when merging this, please make the comments descriptive enough that anyone can pick them up and work on them
_ => { | ||
let val = &state.registers[*start_reg]; | ||
let reg_values = | ||
&state.registers[*start_reg + 1..*start_reg + arg_count]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think this range should be ..=
instead of ..
consider:
// start_reg = 0
// arg_count = 2
let reg_values = &state.registers[1..2]; // <-- only picks one arg since range end is not inclusive
In fact there seem to be no tests for json_extract()
with multiple valid paths - should probably add some?
|
||
for path in paths { | ||
match path { | ||
OwnedValue::Text(p) => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like this would be simpler if json_extract_single
took an &OwnedValue
instead of str
so we wouldn't have to pattern match here at all.
let mut result = "".to_string();
if paths.len() > 1 {
result.push('[');
}
for path in paths.iter() {
let extracted = json_extract_single(&json, path);
if paths.len() == 1 && extracted == Val::Null {
return OwnedValue::Null;
}
result.push_str(&crate::json::to_string(&extracted).unwrap());
}
etc
} | ||
|
||
if paths.len() > 1 { | ||
result.pop(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add a comment saying that this removes the final comma
@madejejej The path parsing is much simpler than the JSON parsing, for sure. It's also very limited in sqlite (i.e. no glob/wildcard patterns). It's mostly array indexes and object property paths with a few extra cases. What I like about the hand-rolled solution is that it doesn't separate out the path parsing from accessing the value in the JSON. That allows returning early as soon as the JSON value has no match for the path. No matter which way we go, there's always a loop required to drill into the JSON and extract a value at the end of the given path. Both solutions are doing that loop anyway, it's just that the hand-rolled solution is doing it during path parsing. |
Yeah this is a decent point, we'd save some deserialization overhead by traversing the JSON object on demand while parsing the path. Maybe not the most important thing in the world and can be optimized later, but it's also nice to implement things right the first time around. I wouldn't block this PR from going forward with the eagerly-parsed version so I'll let @madejejej decide |
This is currently a WIP and rebased on top of #504