-
Notifications
You must be signed in to change notification settings - Fork 154
Revert stream pattern method in V2 and implement SIMPLE_PATTERN in Calcite #3553
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Revert stream pattern method in V2 and implement SIMPLE_PATTERN in Calcite #3553
Conversation
Signed-off-by: Songkan Tang <[email protected]>
Signed-off-by: Songkan Tang <[email protected]>
.toString() | ||
.toLowerCase(Locale.ROOT); | ||
if (patternMethod.equalsIgnoreCase(PatternMethod.SIMPLE_PATTERN.getName())) { | ||
return new Parse(ParseMethod.PATTERNS, sourceField, pattern, arguments); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shall we rename the Parse
node and change its doc? It's weird to use a node which says /** AST node represent Parse with regex operation. */
in its doc for PatternCommand
.
Looks like our definition of UnresolvedPlan is something between AST and LogicalPlan. Most are more like AST(e.g. Parse, Rename, Eval), while others are more like LogicalPlan(e.g. Window, Aggregation, Project, Filter).
Under current convention, seems we cannot figure out a proper name for it or determine which group it belong to.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer not doing this refactoring in this PR to introduce minimum change. I left comment to call out why do we keep that Parse plan for legacy patterns method. And call out there is opportunity of refactoring it into just Project operator.
RexNode sourceField = rexVisitor.analyze(node.getSourceField(), context); | ||
ParseMethod parseMethod = node.getParseMethod(); | ||
java.util.Map<String, Literal> arguments = node.getArguments(); | ||
String pattern = (String) node.getPattern().getValue(); | ||
// TODO: Support Grok parse method | ||
Pair<SqlOperator, String> operatorPatternPair = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could this be registered as functions in PPLFuncImpTable
?Seems we will have more method in future and it should be a true function in Calcite.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. Registered related calcite REGEXP
functions into PPLFuncImpTable
Signed-off-by: Songkan Tang <[email protected]>
REGEXP_EXTRACT(FunctionName.of("regexp_extract")), | ||
REGEXP_REPLACE_2(FunctionName.of("regexp_replace_2")), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are these functions all existing UDF in V2?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed, change them to internal functions with additional boolean to describe them.
RelNode root = getRelNode(ppl); | ||
|
||
String expectedLogical = | ||
"LogicalProject(ENAME=[$1], patterns_field=[REGEXP_REPLACE($1, '[A-H]')])\n" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do think the '[A-H]'
here should change to '[A-H]':VARCHAR
if #3558 merged first Just call out for reminding.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. In visitParse, it's changed to makeLiteral(pattern, VARCHAR).
RexNode sourceField = rexVisitor.analyze(node.getSourceField(), context); | ||
ParseMethod parseMethod = node.getParseMethod(); | ||
java.util.Map<String, Literal> arguments = node.getArguments(); | ||
String pattern = (String) node.getPattern().getValue(); | ||
String patternValue = (String) node.getPattern().getValue(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
patterns
is a command, I didn't see the implementation code visitPatterns
here? How PatternsIT works?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It will be changed in part II. Part I just reverts previous changes to V2. V2's patterns representation is Parse
.
Signed-off-by: Songkan Tang <[email protected]>
Signed-off-by: Songkan Tang <[email protected]>
Signed-off-by: Songkan Tang <[email protected]>
Signed-off-by: Songkan Tang <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please add tests for explain on patterns in ExplainIT
as well? May need 2 tests for cases w/wo stats command after patterns.
As discussed offline, it only supports pushdown for patterns with stats command after it. It would help us verify the feature of pushing down patterns on Calcite in the future.
Signed-off-by: Songkan Tang <[email protected]>
Sure, I added suggested two cases. But I feel that IT is a bit strange. It's better to assert on actual plan change. The pushed down script is encoded string that is not readable. |
Signed-off-by: Songkan Tang <[email protected]>
Please update |
Signed-off-by: Songkan Tang <[email protected]>
Signed-off-by: Songkan Tang <[email protected]>
Done. And fixed failed IT. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add testPatterns
in PPLQueryDataAnonymizerTest
Signed-off-by: Songkan Tang <[email protected]>
Added |
Description
Revert stream pattern method in V2 and implement SIMPLE_PATTERN in Calcite
Related Issues
Resolves #3552
Check List
--signoff
.By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.