Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added possibility to read non escapable sequences #34

Open
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

npenin
Copy link

@npenin npenin commented Mar 11, 2021

Hi Seb,

I was missing the possibility to read string like in SQL where you escape using a double single quote (or as in C# when using the @string notation)

Please let me know if you need more that just the implementation.

PS: sorry for all the other changes, I have automated formatting when saving.

@sebastienros
Copy link
Owner

Coucou!

At least a little unit tests?


lastQuote = nextQuote + 1;
}
while(Cursor.Buffer.Length > lastQuote && Cursor.Buffer[lastQuote] == endSequenceChar);
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Infinite loop, or if not then it should be a single if. I don't see any state changing in the loop.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the purpose is to catch multiple cases, so it has to be a while. It would be an infinite loop if I did not have a break on line 511.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a ; at the end of the while line. No?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's a do/while, not a while/do ;)

[InlineData("'Lorem \n ipsum'", "'Lorem \n ipsum'")]
[InlineData("'Lorem '' ipsum'", "'Lorem '' ipsum'")]
[InlineData(@"""Lorem """" ipsum""", "\"Lorem \"\" ipsum\"")]
public void ShouldReadNonEscapableString(string text, string expected)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing some tests that shows when it's invalid.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also when 2,3, more quotes are used. Or when more groups of n quotes are used.
Also when start and end char are not the same (though I mention I don't see the point in another comment)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when what is invalid ? I don't get it ?

Copy link
Owner

@sebastienros sebastienros Mar 14, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All the tests are positive. You need to add some to show when it's not valid and Fails() is called. To be sure that it doesn't "match" when it shouldn't.

/// This method doesn't escape the string, but only validates its content is syntactically correct.
/// The resulting Span contains the original quotes.
/// </remarks>
public bool ReadNonEscapableSequence(char startSequenceChar, char endSequenceChar, out TokenResult result)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand the need of endSequenceChar. If the goal of the PR is to read strings that are not escaped, then the method should be called ReadNonEscapableString and fetch the first char (delimiter) by itself, like the other ReadString methods are doing in this class.

This one looks more generic, but at the same time I am not convinced about the added complexity of being able to set a different start/end delimiter.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the purpose is also to read sql escapes like [ ] in tsql or `` in mysql, ...

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool. Any link to examples of things that are not word and still can be escaped by doubling it?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure about the[] example. It's more like a Between parser that exists if there is no escaping to check. Unless we can use [ between[] ? If not then I continue to think no arguments are necessary.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I only found a stackoverflow link https://stackoverflow.com/questions/4200351/what-characters-are-valid-in-an-sql-server-database-name, but I can tell you for sure that you can have almost anything as a character as long as you escape it with []. Here is a doc for mysql: https://dev.mysql.com/doc/refman/8.0/en/identifiers.html

var start = Cursor.Position;

// If the next escape if not before the next quote, we can return the string as-is
Cursor.Advance(nextQuote + 2 - startOffset);
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a nice optimization to call Advance only once, but then you are missing on EOF checks. So a string like 'foo will fail. Which shows some unit tests are missing.

@sebastienros
Copy link
Owner

If you are actually working on a SQL parser, we need to migrate this one to Parlot. https://github.com/OrchardCMS/OrchardCore/blob/dev/src/OrchardCore.Modules/OrchardCore.Queries/Sql/SqlGrammar.cs

@npenin
Copy link
Author

npenin commented Mar 13, 2021

I'm actually working on a usql-like parser, so a mix a sql and c#

npenin added 2 commits March 13, 2021 22:39
added some more examples for tsql like escapes
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants