-
Notifications
You must be signed in to change notification settings - Fork 52
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve the ability to parse/unparse files #127
Conversation
Signed-off-by: Andrew Lamb <[email protected]>
255d512
to
f33545b
Compare
Signed-off-by: Andrew Lamb <[email protected]>
f33545b
to
9924a96
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cc @xudong963
sqllogictest/src/parser.rs
Outdated
// the statement parser includes a newline between the items but the display | ||
// output does not, so add it here | ||
// Not sure about this one | ||
format!("{}\n", record) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found this to be necessary to recreate the original .slt
file.
I think it would more naturally go in the impl Display
itself, however I didn't want to mess up any of your existing tests. What do you think @xxchan ?
Here is what the output looks like without this clause:
Click me
running 2 tests
test parser::tests::test_include_glob ... ok
test parser::tests::test_basic ... FAILED
failures:
---- parser::tests::test_basic stdout ----
*********
diff:
*********
statement ok
create table t(v1 int not null, v2 int not null, v3 int not null)
-
statement ok
insert into t values(1,4,2), (2,3,3), (3,4,4), (4,3,5)
-
query I
select * from example_basic
----
Alice
Bob
Eve
-
statement ok
drop table t
-
# The error message is: Hey you got FakeDBError!
statement error
give me an error
-
statement error FakeDB
give me an error
-
statement error (?i)fakedb
give me an error
-
statement error Hey you
give me an error
-
# statement is expected to fail with error: "Hey we", but got error: "Hey you got FakeDBError!"
# statement error Hey we
# give me an error
# query error is actually equivalent to statement error
query error Hey you
give me an error
-
hash-threshold 3
query I
select * from example_basic
----
Alice
Bob
Eve
-
hash-threshold 1
query I
select * from example_basic
----
3 values hashing to b5b44edac84d34d6af3be2a88bfae352
-
hash-threshold 0
query I
select * from example_basic
----
Alice
Bob
Eve
-
*********
output:
*********
statement ok
create table t(v1 int not null, v2 int not null, v3 int not null)
statement ok
insert into t values(1,4,2), (2,3,3), (3,4,4), (4,3,5)
query I
select * from example_basic
----
Alice
Bob
Eve
statement ok
drop table t
# The error message is: Hey you got FakeDBError!
statement error
give me an error
statement error FakeDB
give me an error
statement error (?i)fakedb
give me an error
statement error Hey you
give me an error
# statement is expected to fail with error: "Hey we", but got error: "Hey you got FakeDBError!"
# statement error Hey we
# give me an error
# query error is actually equivalent to statement error
query error Hey you
give me an error
hash-threshold 3
query I
select * from example_basic
----
Alice
Bob
Eve
hash-threshold 1
query I
select * from example_basic
----
3 values hashing to b5b44edac84d34d6af3be2a88bfae352
hash-threshold 0
query I
select * from example_basic
----
Alice
Bob
Eve
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Update: I understand the problem wrongly here.
I am aware of the trailing newline problem. AFAIK usually Display
doesn't contain trailing newline so that users can control it. e.g., println
won't contain an additional newline. So I decided to format Display
without trailing newline, and in sqllogictest-bin
, I append newline after a record, and insert newline between records.
sqllogictest-rs/sqllogictest-bin/src/lib.rs
Lines 628 to 632 in d753e4c
if format { | |
record.unparse(outfile)?; | |
writeln!(outfile)?; | |
return Ok(()); | |
} |
sqllogictest-rs/sqllogictest-bin/src/lib.rs
Lines 587 to 594 in d753e4c
if !*first_record { | |
writeln!(outfile)?; | |
} | |
update_record(outfile, &mut runner, record, format) | |
.await | |
.context(format!("failed to run `{}`", style(filename).bold()))?; | |
*first_record = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW why it doesn't work to add newline to every record here in record_to_string
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh it is because newline is consumed when parsing Statement
and Query
🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now I think it may be reasonable to add trailing newline in Display
for Query
and Statement
.. Since empty line is their terminator. Other records doesn't require empty lines in between.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh it is because newline is consumed when parsing Statement and Query 🤔
Yes, I think this is the reason.
Logically a Statement
and Query
must always be followed by a blank line (that newline isn't the equivalent of println
). Much like a multi line comment contains a "\n" within it even though there impl Display
Or put another way, the following would not be parsed correctly.
query II
select a, b from foo
---
1 2
query II
select a, b from bar
---
3 4
Query
must always end with a newline (other than the last one in the file):
query II
select a, b from foo
---
1 2
select a, b from bar
---
3 4
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW why it doesn't work to add newline to every record here in record_to_string?
You if you do that, then there are extra newlines between every record (as I am already using writeln
)
sqllogictest/src/parser.rs
Outdated
// the statement parser includes a newline between the items but the display | ||
// output does not, so add it here | ||
// Not sure about this one | ||
format!("{}\n", record) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Update: I understand the problem wrongly here.
I am aware of the trailing newline problem. AFAIK usually Display
doesn't contain trailing newline so that users can control it. e.g., println
won't contain an additional newline. So I decided to format Display
without trailing newline, and in sqllogictest-bin
, I append newline after a record, and insert newline between records.
sqllogictest-rs/sqllogictest-bin/src/lib.rs
Lines 628 to 632 in d753e4c
if format { | |
record.unparse(outfile)?; | |
writeln!(outfile)?; | |
return Ok(()); | |
} |
sqllogictest-rs/sqllogictest-bin/src/lib.rs
Lines 587 to 594 in d753e4c
if !*first_record { | |
writeln!(outfile)?; | |
} | |
update_record(outfile, &mut runner, record, format) | |
.await | |
.context(format!("failed to run `{}`", style(filename).bold()))?; | |
*first_record = false; |
/// Parses the specified file into Records, and ensures the | ||
/// results of unparsing them are the same |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess current implementation is still not enough to ensure unparse(parse())
is identity
,
because spaces can be anywhere in the records, not only between records.
e.g.,
# aa
query II
...
Currently it will be unparsed to
# aa
query II
But do we really need identical reconstruction (It can be hard)? If not, maybe we can just test parse(unparse)
is the same record?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But preserving newlines makes some sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But do we really need identical reconstruction (It can be hard)? If not, maybe we can just test parse(unparse) is the same record?
I don't really need exact reconstruction -- what I really want is to be able to "update" a file in place with expected output. If part of that update also ends up "normalizing" some of the commands (like collapsing query III
query III
) that would be fine with me
However I think removing all blank newlines makes the files harder to read so I would prefer to keep them in
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If not, maybe we can just test parse(unparse) is the same record?
Testing parse/unparse is a great idea -- I will add that
sqllogictest/src/parser.rs
Outdated
// the statement parser includes a newline between the items but the display | ||
// output does not, so add it here | ||
// Not sure about this one | ||
format!("{}\n", record) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW why it doesn't work to add newline to every record here in record_to_string
?
sqllogictest/src/parser.rs
Outdated
@@ -132,6 +132,7 @@ pub enum Record { | |||
}, | |||
Condition(Condition), | |||
Comment(Vec<String>), | |||
Whitespace(String), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
According to the comment below, what about renaming it to Newline(usize)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To summarize, I think we can:
- Add trailing newline to
Display
forStatement
andQuery
- Rename
Whitespace
toNewline
- Test
parse(unparse(parse(..)))
instead ofunparse(parse(..))
Others LGTM. Thanks a lot!
I tried to add a test that However, it turns out that +
+ // The orignal and parsed records should be logically requivalent
+ let output_file = tempfile::NamedTempFile::new()
+ .expect("Error creating tempfile");
+ output_file.write_all(output_contents.as_bytes())
+ .expect("Unable to write file");
+
+ let output_path = output_file.into_temp_path();
+ let reparsed_records = parse(&output_path.to_string_lossy());
+ assert_eq!(records, reparsed_records);
+ |
Signed-off-by: Andrew Lamb <[email protected]>
Signed-off-by: Andrew Lamb <[email protected]>
2656e97
to
44ee05a
Compare
Thank you @xxchan !
Done
Done
I think there is value to testing I would like to add the test for |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
Here is the proposed mechanism for testing |
First of all, thank you again for this library -- it is really awesome.
Rationale
I am trying to implement "test script completion mode" in DataFusion. In order to do so, I would like to be able to update parsed
Records
with new output and then write them back to a file. Ideally the only changes in the file would be the new results and the original file will have all the comments, whitespace, etc.DataFusion ticket: apache/datafusion#4570, apache/datafusion#4570 (comment)
Changes
impl Display
for Record (refactorunparse
)Record::Whitespace
so the whitespace in the original files can be reconstructed (found during testing)Questions
Record::Statement
andRecordQuery
which I will note inline