Introduce location tracking in the tokenizer and parser #710

ankrgyl · 2022-11-15T17:16:35Z

This revives PRs #288 and #514, rebased against the latest and done with fewer variable renames (to limit merge conflicts).

Closes #524

ankrgyl · 2022-11-15T17:19:44Z

@alamb @AugustoFKL this PR addresses #524.

This change is reasonably contained, but one thing that could make it even smaller would be keeping the existing interface (i.e. next_token() returns a Token, not a TokenWithLocation) and maybe adding something like last_location() to access the location only when you need it (e.g. for error messages).
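A minimal sketch of the alternative interface floated above, under stated assumptions: the `Parser`, `Token`, and `Location` types here are hypothetical pared-down stand-ins, not the crate's real definitions, and `last_location()` is only the name proposed in the comment. It shows `next_token()` keeping its bare-`Token` return type while the location stays available on demand:

```rust
// Hypothetical stand-ins for illustration only; not the crate's real types.
#[derive(Debug, Clone, PartialEq)]
pub enum Token {
    Word(String),
    Comma,
    Eof,
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Location {
    pub line: u64,
    pub column: u64,
}

pub struct Parser {
    tokens: Vec<(Token, Location)>,
    /// Index of the first unprocessed token.
    index: usize,
}

impl Parser {
    pub fn new(tokens: Vec<(Token, Location)>) -> Self {
        Parser { tokens, index: 0 }
    }

    /// Unchanged signature: callers still receive a bare `Token`.
    pub fn next_token(&mut self) -> Token {
        let next = self.tokens.get(self.index).map(|(token, _)| token.clone());
        match next {
            Some(token) => {
                self.index += 1;
                token
            }
            None => Token::Eof,
        }
    }

    /// Location of the most recently consumed token, or `None` if nothing
    /// has been consumed yet. Intended for error messages.
    pub fn last_location(&self) -> Option<Location> {
        self.index
            .checked_sub(1)
            .and_then(|i| self.tokens.get(i))
            .map(|(_, location)| *location)
    }
}
```

The trade-off is that the location is only reachable through the parser's state, so code that holds on to a token after further calls to `next_token()` can no longer recover where it came from.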

alamb · 2022-11-16T20:14:31Z

Thanks @ankrgyl -- I'll try to look at this PR more carefully later this week

coveralls · 2022-11-16T20:18:45Z

Pull Request Test Coverage Report for Build 3623431070

203 of 269 (75.46%) changed or added relevant lines in 3 files are covered.
1 unchanged line in 1 file lost coverage.
Overall coverage decreased (-0.08%) to 86.309%

Changes Missing Coverage    Covered Lines    Changed/Added Lines    %
src/tokenizer.rs            69               77                     89.61%
src/parser.rs               133              191                    69.63%

Files with Coverage Reduction    New Missed Lines    %
src/parser.rs                    1                   83.17%

Totals
Change from base Build 3623318513:    -0.08%
Covered Lines:                        12532
Relevant Lines:                       14520

💛 - Coveralls

AugustoFKL · 2022-11-16T23:00:26Z

@ankrgyl @alamb sorry for the delay. This has been a hectic week for me. I'm on vacation next week, and will be able to review this properly

alamb

This PR looks great @ankrgyl -- thank you for proposing it; I apologize for the delay in finding time to review it

I think we can avoid many of the changes with an impl PartialEq as I mentioned

Otherwise, all this PR needs is some tests and I think it would be ready to merge.

Thanks again!

alamb · 2022-11-30T17:21:45Z

src/tokenizer.rs

+    pub column: u64,
+}
+
+/// A [Token] with [Location] attached to it


We can probably avoid many of the changes in this PR by implementing PartialEq between Token and TokenWithLocation:

impl PartialEq<Token> for TokenWithLocation

Which would allow things like

if self.peek_nth_token(1) == Token::RArrow {

to keep working
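A minimal sketch of this suggestion, using hypothetical pared-down versions of `Token`, `Location`, and `TokenWithLocation` (the real types in src/tokenizer.rs have many more variants; the `line` field, the symmetric impl, and the `is_rarrow` helper are assumptions added for illustration). It shows how a cross-type `PartialEq` impl lets a located token be compared directly against a bare `Token`:

```rust
// Hypothetical, pared-down stand-ins for the types discussed in this PR.
#[derive(Debug, Clone, PartialEq)]
pub enum Token {
    RArrow,
    Comma,
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Location {
    pub line: u64,
    pub column: u64,
}

#[derive(Debug, Clone, PartialEq)]
pub struct TokenWithLocation {
    pub token: Token,
    pub location: Location,
}

// Compare only the token and ignore the location, so call sites written
// against a bare `Token` still compile.
impl PartialEq<Token> for TokenWithLocation {
    fn eq(&self, other: &Token) -> bool {
        &self.token == other
    }
}

// Symmetric impl (an extra, not part of the suggestion) so that
// `Token::RArrow == tok` works as well.
impl PartialEq<TokenWithLocation> for Token {
    fn eq(&self, other: &TokenWithLocation) -> bool {
        self == &other.token
    }
}

/// Illustrative helper: is the given located token `->`?
pub fn is_rarrow(tok: &TokenWithLocation) -> bool {
    *tok == Token::RArrow
}
```

Because the comparison ignores `location`, two tokens at different positions both compare equal to the same bare `Token`, which is what existing parser checks expect.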

alamb · 2022-11-30T17:22:32Z

src/tokenizer.rs

@@ -321,58 +356,88 @@ impl fmt::Display for TokenizerError {
 #[cfg(feature = "std")]
 impl std::error::Error for TokenizerError {}

+struct State<'a> {


ankrgyl · 2022-12-01T06:41:07Z

@alamb updated with the PartialEq suggestion (great idea!). For tests -- do you have a suggestion on what would be best? I can tweak a few of the tests to tokenize w/ location and hardcode them in, or do something else?

alamb · 2022-12-01T14:34:21Z

> For tests -- do you have a suggestion on what would be best? I can tweak a few of the tests to tokenize w/ location and hardcode them in, or do something else?

I think the goal should be that if we accidentally break the location parsing code some tests will fail. I think ensuring the locations are set on parse errors would be particularly helpful.

So perhaps you could add a few tests that assert locations of parsed tokens, and then a test that has a parsing error that retrieves the location of the error (if this last test needs more code changes, we could do it as a follow on PR)
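A rough sketch of the two kinds of test described above, written against a hypothetical toy tokenizer rather than the crate's real one. Every type and function here is an assumption for illustration, as is the choice to report an unterminated string at the position where the literal opens:

```rust
// Hypothetical, pared-down tokenizer; not the crate's real implementation.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Location {
    pub line: u64,
    pub column: u64,
}

#[derive(Debug, Clone, PartialEq)]
pub enum Token {
    Word(String),
    StringLiteral(String),
}

#[derive(Debug, Clone, PartialEq)]
pub struct TokenWithLocation {
    pub token: Token,
    pub location: Location,
}

#[derive(Debug, PartialEq)]
pub struct TokenizerError {
    pub message: String,
    pub location: Location,
}

/// Split `sql` into words and single-quoted strings, recording where each
/// token starts (1-based line and column).
pub fn tokenize(sql: &str) -> Result<Vec<TokenWithLocation>, TokenizerError> {
    let mut tokens = Vec::new();
    let mut chars = sql.chars().peekable();
    let (mut line, mut column) = (1u64, 1u64);

    while let Some(&ch) = chars.peek() {
        let start = Location { line, column };
        if ch == '\n' {
            chars.next();
            line += 1;
            column = 1;
        } else if ch.is_whitespace() {
            chars.next();
            column += 1;
        } else if ch == '\'' {
            chars.next(); // consume the opening quote
            column += 1;
            let mut value = String::new();
            loop {
                match chars.next() {
                    Some('\'') => {
                        column += 1;
                        break;
                    }
                    Some('\n') => {
                        value.push('\n');
                        line += 1;
                        column = 1;
                    }
                    Some(c) => {
                        value.push(c);
                        column += 1;
                    }
                    // Assumption of this sketch: report where the literal opened.
                    None => {
                        return Err(TokenizerError {
                            message: "Unterminated string literal".to_string(),
                            location: start,
                        })
                    }
                }
            }
            tokens.push(TokenWithLocation { token: Token::StringLiteral(value), location: start });
        } else {
            let mut word = String::new();
            while let Some(&c) = chars.peek() {
                if c.is_whitespace() || c == '\'' {
                    break;
                }
                word.push(c);
                chars.next();
                column += 1;
            }
            tokens.push(TokenWithLocation { token: Token::Word(word), location: start });
        }
    }
    Ok(tokens)
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn tokens_carry_their_start_location() {
        let tokens = tokenize("SELECT a\nFROM t").unwrap();
        assert_eq!(tokens[2].token, Token::Word("FROM".to_string()));
        assert_eq!(tokens[2].location, Location { line: 2, column: 1 });
    }

    #[test]
    fn unterminated_string_reports_a_location() {
        let err = tokenize("SELECT 'foo").unwrap_err();
        assert_eq!(err.location, Location { line: 1, column: 8 });
    }
}
```

Hardcoding the expected line and column in both tests means an accidental off-by-one in the position bookkeeping fails a test instead of silently shifting error messages.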

Thank you so much for pushing this along @ankrgyl -- this is going to be a great addition (and is a long asked for feature)

ankrgyl · 2022-12-01T22:13:47Z

Added a few tests. We'll need to thread parser locations through as a follow up PR, but I added a test that fails in the parser due to a tokenizer error (and asserts the location) + a test that we can fill in with locations once the parser carries along that info.

alamb

I think this looks great -- thank you @ankrgyl

Would you like a chance to review @AugustoFKL ?

alamb · 2022-12-02T13:47:39Z

src/parser.rs

+        assert_eq!(
+            ast,
+            Err(ParserError::TokenizerError(
+                "Unterminated string literal at Line: 1, Column 5".to_string()


alamb · 2022-12-02T13:51:33Z

src/parser.rs

    /// The index of the first unprocessed token in `self.tokens`
    index: usize,
    dialect: &'a dyn Dialect,
 }

 impl<'a> Parser<'a> {
    /// Parse the specified tokens
+    /// To avoid breaking backwards compatibility, this function accepts


alamb · 2022-12-02T13:53:56Z

This appears to have a logical conflict with master.

I am planning on cutting a release soon -- given the nature of this change, maybe I will make a release right before this change and then merge this one. That way people who use the tip of this repository can effectively beta test the changes before we make a release for everyone 🤔

ankrgyl · 2022-12-02T15:19:58Z

That seems wise to me. We may even want to flesh out the (potentially breaking) parser changes so that users who upgrade only need to adjust to breakages once. Happy to contribute those and collaborate with you on that.

ankrgyl · 2022-12-02T16:00:05Z

Additionally, I just rebased and resolved the merge conflicts.

alamb · 2022-12-02T21:10:27Z

> That seems wise to me. We may even want to flesh out the (potentially breaking) parser changes so that users who upgrade only need to adjust to breakages once. Happy to contribute those and collaborate with you on that.

That would be awesome. I plan to make a release soon (TM) -- either later today or over the weekend sometime. Then let's merge this one in and iterate.

alamb · 2022-12-05T19:44:25Z

@ankrgyl I took the liberty of fixing the clippy CI error and merging up from main to this branch.

alamb · 2022-12-05T19:47:43Z

> That seems wise to me. We may even want to flesh out the (potentially breaking) parser changes so that users who upgrade only need to adjust to breakages once. Happy to contribute those and collaborate with you on that.

This is merged and should be included in the 0.29.0 release (ETA in approximately one month's time). I would love to start collaborating to fix up the parser. The first step might be to list / sketch out the changes that are needed.

Thanks again -- this is super exciting!

ankrgyl mentioned this pull request Nov 15, 2022

Add support of locations in AST #524

Closed

alamb reviewed Nov 30, 2022

alamb mentioned this pull request Nov 30, 2022

feat: add method to get current parsing index #728

Merged

ankrgyl force-pushed the with-locations branch from 5736020 to 414f261 on December 1, 2022 06:31

alamb approved these changes Dec 2, 2022

ankrgyl added 5 commits December 2, 2022 07:53

Add locations

8181307

Add PartialEq

e506be8

Add PartialEq

de2b69b

Add some tests

fc33e0f

Fix rebase conflicts

f367508

ankrgyl force-pushed the with-locations branch from af6c400 to f367508 on December 2, 2022 15:54

AugustoFKL approved these changes Dec 2, 2022

alamb added 2 commits December 5, 2022 14:41

Merge remote-tracking branch 'upstream/main' into with-locations

22b99c1

Fix clippy

4782636

alamb merged commit 813f4a2 into apache:main Dec 5, 2022

ankrgyl mentioned this pull request Dec 12, 2022

Plumb location info through the parser #757

Closed

alamb mentioned this pull request Dec 14, 2022

Avoid stack overflows via configurable with_recursion_limit #764

Merged
