Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve the ability to parse/unparse files #127

Merged
merged 5 commits into from
Dec 10, 2022
Merged
Show file tree
Hide file tree
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
143 changes: 114 additions & 29 deletions sqllogictest/src/parser.rs
Original file line number Diff line number Diff line change
Expand Up @@ -132,6 +132,7 @@ pub enum Record {
},
Condition(Condition),
Comment(Vec<String>),
Whitespace(String),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to the comment below, what about renaming it to Newline(usize)?

/// Internally injected record which should not occur in the test file.
Injected(Injected),
}
Expand All @@ -142,9 +143,15 @@ impl Record {
/// # Panics
/// If the record is an internally injected record which should not occur in the test file.
pub fn unparse(&self, w: &mut impl std::io::Write) -> std::io::Result<()> {
write!(w, "{}", self)
}
}

impl std::fmt::Display for Record {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
match self {
Record::Include { loc: _, filename } => {
write!(w, "include {}", filename)
write!(f, "include {}", filename)
}
Record::Statement {
loc: _,
Expand All @@ -153,21 +160,21 @@ impl Record {
sql,
expected_count,
} => {
write!(w, "statement ")?;
write!(f, "statement ")?;
match (expected_count, expected_error) {
(None, None) => write!(w, "ok")?,
(None, None) => write!(f, "ok")?,
(None, Some(err)) => {
if err.as_str().is_empty() {
write!(w, "error")?;
write!(f, "error")?;
} else {
write!(w, "error {}", err)?;
write!(f, "error {}", err)?;
}
}
(Some(cnt), None) => write!(w, "count {}", cnt)?,
(Some(cnt), None) => write!(f, "count {}", cnt)?,
(Some(_), Some(_)) => unreachable!(),
}
writeln!(w)?;
write!(w, "{}", sql)
writeln!(f)?;
write!(f, "{}", sql)
}
Record::Query {
loc: _,
Expand All @@ -179,63 +186,66 @@ impl Record {
sql,
expected_results,
} => {
write!(w, "query")?;
write!(f, "query")?;
if let Some(err) = expected_error {
writeln!(w, " error {}", err)?;
return write!(w, "{}", sql);
writeln!(f, " error {}", err)?;
return write!(f, "{}", sql);
}

write!(
w,
f,
" {}",
type_string.iter().map(|c| format!("{c}")).join("")
)?;
if let Some(sort_mode) = sort_mode {
write!(w, " {}", sort_mode.as_str())?;
write!(f, " {}", sort_mode.as_str())?;
}
if let Some(label) = label {
write!(w, " {}", label)?;
write!(f, " {}", label)?;
}
writeln!(w)?;
writeln!(w, "{}", sql)?;
writeln!(f)?;
writeln!(f, "{}", sql)?;

write!(w, "----")?;
write!(f, "----")?;
for result in expected_results {
write!(w, "\n{}", result)?;
write!(f, "\n{}", result)?;
}
Ok(())
}
Record::Sleep { loc: _, duration } => {
write!(w, "sleep {}", humantime::format_duration(*duration))
write!(f, "sleep {}", humantime::format_duration(*duration))
}
Record::Subtest { loc: _, name } => {
write!(w, "subtest {}", name)
write!(f, "subtest {}", name)
}
Record::Halt { loc: _ } => {
write!(w, "halt")
write!(f, "halt")
}
Record::Control(c) => match c {
Control::SortMode(m) => write!(w, "control sortmode {}", m.as_str()),
Control::SortMode(m) => write!(f, "control sortmode {}", m.as_str()),
},
Record::Condition(cond) => match cond {
Condition::OnlyIf { engine_name } => {
write!(w, "onlyif {}", engine_name)
write!(f, "onlyif {}", engine_name)
}
Condition::SkipIf { engine_name } => {
write!(w, "skipif {}", engine_name)
write!(f, "skipif {}", engine_name)
}
},
Record::HashThreshold { loc: _, threshold } => {
write!(w, "hash-threshold {}", threshold)
write!(f, "hash-threshold {}", threshold)
}
Record::Comment(comment) => {
let mut iter = comment.iter();
write!(w, "#{}", iter.next().unwrap().trim_end())?;
write!(f, "#{}", iter.next().unwrap().trim_end())?;
for line in iter {
write!(w, "\n#{}", line.trim_end())?;
write!(f, "\n#{}", line.trim_end())?;
}
Ok(())
}
Record::Whitespace(w) => {
write!(f, "{}", w)
}
Record::Injected(p) => panic!("unexpected injected record: {:?}", p),
}
}
Expand Down Expand Up @@ -391,6 +401,7 @@ fn parse_inner(loc: &Location, script: &str) -> Result<Vec<Record>, ParseError>
}

if line.is_empty() {
records.push(Record::Whitespace(line.to_string()));
continue;
}

Expand Down Expand Up @@ -603,11 +614,85 @@ fn parse_file_inner(loc: Location) -> Result<Vec<Record>, ParseError> {

#[cfg(test)]
mod tests {
use crate::parse_file;
use difference::{Changeset, Difference};

use super::*;

#[test]
fn test_include_glob() {
let records = parse_file("../examples/include/include_1.slt").unwrap();
assert_eq!(14, records.len());
assert_eq!(16, records.len());
}

#[test]
fn test_basic() {
parse_roundtrip("../examples/basic/basic.slt")
}

/// Parses the specified file into Records, and ensures the
/// results of unparsing them are the same
Comment on lines +635 to +636
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess current implementation is still not enough to ensure unparse(parse()) is identity,
because spaces can be anywhere in the records, not only between records.

e.g.,

    # aa
  query    II
...

Currently it will be unparsed to

# aa
query II

But do we really need identical reconstruction (It can be hard)? If not, maybe we can just test parse(unparse) is the same record?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But preserving newlines makes some sense.

Copy link
Contributor Author

@alamb alamb Dec 10, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But do we really need identical reconstruction (It can be hard)? If not, maybe we can just test parse(unparse) is the same record?

I don't really need exact reconstruction -- what I really want is to be able to "update" a file in place with expected output. If part of that update also ends up "normalizing" some of the commands (like collapsing query III query III) that would be fine with me

However I think removing all blank newlines makes the files harder to read so I would prefer to keep them in

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If not, maybe we can just test parse(unparse) is the same record?

Testing parse/unparse is a great idea -- I will add that

///
/// Prints a hopefully useful message on failure
fn parse_roundtrip(filename: impl AsRef<Path>) {
let filename = filename.as_ref();
let input_contents = std::fs::read_to_string(filename).expect("reading file");

let records = parse_file(filename).expect("parsing to complete");

let unparsed = records
.into_iter()
.map(record_to_string)
.collect::<Vec<_>>();

let output_contents = unparsed.join("\n");

let changeset = Changeset::new(&input_contents, &output_contents, "\n");

assert!(
no_diffs(&changeset),
"Mismatch for {:?}\n\
*********\n\
diff:\n\
*********\n\
{}\n\n\
*********\n\
output:\n\
*********\n\
{}\n\n",
filename,
UsefulDiffDisplay(&changeset),
output_contents,
);
}

fn record_to_string(record: Record) -> String {
if matches!(&record, Record::Statement { .. } | Record::Query { .. }) {
// the statement parser includes a newline between the items but the display
// output does not, so add it here
// Not sure about this one
format!("{}\n", record)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found this to be necessary to recreate the original .slt file.

I think it would more naturally go in the impl Display itself, however I didn't want to mess up any of your existing tests. What do you think @xxchan ?

Here is what the output looks like without this clause:

Click me
running 2 tests
test parser::tests::test_include_glob ... ok
test parser::tests::test_basic ... FAILED

failures:

---- parser::tests::test_basic stdout ----

*********
diff:
*********
statement ok
create table t(v1 int not null, v2 int not null, v3 int not null)
-   
statement ok
insert into t values(1,4,2), (2,3,3), (3,4,4), (4,3,5)

-   
query I
select * from example_basic
----
Alice
Bob
Eve
-   
statement ok
drop table t
-   
# The error message is: Hey you got FakeDBError!

statement error
give me an error
-   
statement error FakeDB
give me an error
-   
statement error (?i)fakedb
give me an error
-   
statement error Hey you
give me an error
-   
# statement is expected to fail with error: "Hey we", but got error: "Hey you got FakeDBError!"
# statement error Hey we
# give me an error

# query error is actually equivalent to statement error
query error Hey you
give me an error
-   
hash-threshold 3

query I
select * from example_basic
----
Alice
Bob
Eve
-   
hash-threshold 1

query I
select * from example_basic
----
3 values hashing to b5b44edac84d34d6af3be2a88bfae352
-   
hash-threshold 0

query I
select * from example_basic
----
Alice
Bob
Eve
-   


*********
output:
*********
statement ok
create table t(v1 int not null, v2 int not null, v3 int not null)
statement ok
insert into t values(1,4,2), (2,3,3), (3,4,4), (4,3,5)

query I
select * from example_basic
----
Alice
Bob
Eve
statement ok
drop table t
# The error message is: Hey you got FakeDBError!

statement error
give me an error
statement error FakeDB
give me an error
statement error (?i)fakedb
give me an error
statement error Hey you
give me an error
# statement is expected to fail with error: "Hey we", but got error: "Hey you got FakeDBError!"
# statement error Hey we
# give me an error

# query error is actually equivalent to statement error
query error Hey you
give me an error
hash-threshold 3

query I
select * from example_basic
----
Alice
Bob
Eve
hash-threshold 1

query I
select * from example_basic
----
3 values hashing to b5b44edac84d34d6af3be2a88bfae352
hash-threshold 0

query I
select * from example_basic
----
Alice
Bob
Eve

Copy link
Member

@xxchan xxchan Dec 10, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update: I understand the problem wrongly here.


I am aware of the trailing newline problem. AFAIK usually Display doesn't contain trailing newline so that users can control it. e.g., println won't contain an additional newline. So I decided to format Display without trailing newline, and in sqllogictest-bin, I append newline after a record, and insert newline between records.

if format {
record.unparse(outfile)?;
writeln!(outfile)?;
return Ok(());
}

if !*first_record {
writeln!(outfile)?;
}
update_record(outfile, &mut runner, record, format)
.await
.context(format!("failed to run `{}`", style(filename).bold()))?;
*first_record = false;

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW why it doesn't work to add newline to every record here in record_to_string?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh it is because newline is consumed when parsing Statement and Query 🤔

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now I think it may be reasonable to add trailing newline in Display for Query and Statement.. Since empty line is their terminator. Other records doesn't require empty lines in between.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh it is because newline is consumed when parsing Statement and Query 🤔

Yes, I think this is the reason.

Logically a Statement and Query must always be followed by a blank line (that newline isn't the equivalent of println). Much like a multi line comment contains a "\n" within it even though there impl Display

Or put another way, the following would not be parsed correctly.

query II
select a, b from foo
---
1 2
query II
select a, b from bar
---
3 4

Query must always end with a newline (other than the last one in the file):

query II
select a, b from foo
---
1 2

select a, b from bar
---
3 4

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW why it doesn't work to add newline to every record here in record_to_string?

You if you do that, then there are extra newlines between every record (as I am already using writeln)

} else {
record.to_string()
}
}

/// returns true if there are no differences in the changeset
fn no_diffs(changeset: &Changeset) -> bool {
changeset
.diffs
.iter()
.all(|diff| matches!(diff, Difference::Same(_)))
}

struct UsefulDiffDisplay<'a>(&'a Changeset);

impl<'a> std::fmt::Display for UsefulDiffDisplay<'a> {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
self.0.diffs.iter().try_for_each(|diff| match diff {
Difference::Same(x) => writeln!(f, "{x}"),
Difference::Add(x) => writeln!(f, "+ {x}"),
Difference::Rem(x) => writeln!(f, "- {x}"),
})
}
}
}
1 change: 1 addition & 0 deletions sqllogictest/src/runner.rs
Original file line number Diff line number Diff line change
Expand Up @@ -546,6 +546,7 @@ impl<D: AsyncDB> Runner<D> {
}
Record::Include { .. }
| Record::Comment(_)
| Record::Whitespace(_)
| Record::Subtest { .. }
| Record::Halt { .. }
| Record::Injected(_)
Expand Down