Retain escape sequences in lexer, fixes #5#21
Conversation
\\ is preserved in Lexer, leaving it to later stages to decide what masking is to be used.
julianladisch
left a comment
There was a problem hiding this comment.
I support this.
When releasing cql-java the change should be documented in the Changes file with an example.
MikeTaylor
left a comment
There was a problem hiding this comment.
I feel very dense looking at this code, but it doesn't really feel right to me.
| switch (rnd.nextInt(10)) { | ||
| case 0: return "cat"; | ||
| case 1: return "\"cat\""; | ||
| case 1: return "\\\"cat\\\""; |
There was a problem hiding this comment.
We discussed this briefly on Slack, but I still don't understand why the change. Yes this needs to round-trip correctly — but all strings in any CQL query (hence all strings that we generate in the query generator) need to round-trip correctly. So why do we care what this one is?
There was a problem hiding this comment.
This PR changes the term so that escape sequences are retained (preserved). If a term includes " it will always be preceded by backslash.
There was a problem hiding this comment.
Are you saying that only some terms are round-tripped correctly?
There was a problem hiding this comment.
yep. If a term contained a bare ", that would not be round-tripped correctly. But that would never be the result of parsing.
| str.indexOf('(') != -1 || | ||
| str.indexOf(')') != -1) { | ||
| str = '"' + str.replace("\"", "\\\"") + '"'; | ||
| str = '"' + str + '"'; |
There was a problem hiding this comment.
I'm probably missing something, but this looks wrong to me. It looks like it will render a"b as "a"b", when surely it should be "a\"b" as in the previous version?
There was a problem hiding this comment.
I repeat: " will always be preceded by backslash.
There was a problem hiding this comment.
You're asserting a precondition that where str contains a ", it is always immediately prefixed with a \ — right? If so, then this is OK, I guess, but feels fragile.
There was a problem hiding this comment.
The bare quote case is now considered.
| if (qi == ql - 1) { | ||
| break; //unterminated | ||
| } | ||
| buf.append(qs.charAt(qi)); |
There was a problem hiding this comment.
I don't understand the intent of this section at all. It looks like it can't lex "a\"b" at all, but will return the string a. Am I wrong?
There was a problem hiding this comment.
bin/CQLParser
"a\"b"
<searchClause>
<index>cql.serverChoice</index>
<relation>
<value>=</value>
</relation>
<term>a\"b</term>
</searchClause>
There was a problem hiding this comment.
OK, good to see. My code-reading foo is depressed.
MikeTaylor
left a comment
There was a problem hiding this comment.
All right, looks good. Thanks for your patence!
| str.indexOf('(') != -1 || | ||
| str.indexOf(')') != -1) { | ||
| str = '"' + str.replace("\"", "\\\"") + '"'; | ||
| str = '"' + str + '"'; |
There was a problem hiding this comment.
You're asserting a precondition that where str contains a ", it is always immediately prefixed with a \ — right? If so, then this is OK, I guess, but feels fragile.
| str.indexOf('(') != -1 || | ||
| str.indexOf(')') != -1) { | ||
| str = '"' + str.replace("\"", "\\\"") + '"'; | ||
| str = '"' + str + '"'; |
Backslash (
\) is preserved in Lexer, leaving it to later stages to decide what masking is to be used.As is done with yaz and cql-go.
https://github.com/indexdata/cql-go/blob/84f3837d60305e690103051c0b52fc3ff4c32500/cql/lexer.go#L94
https://github.com/indexdata/cql-go/blob/84f3837d60305e690103051c0b52fc3ff4c32500/cql/cql.go#L281
Example of CQL query:
which was converted to:
It is now converted to:
The same term without quotes are already preserved, eg:
becomes
So existing behavior was inconsistent WRT how " was treated.