Skip to content

Revert stream pattern method in V2 and implement SIMPLE_PATTERN in Calcite #3553

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

songkant-aws
Copy link
Contributor

@songkant-aws songkant-aws commented Apr 16, 2025

Description

Revert stream pattern method in V2 and implement SIMPLE_PATTERN in Calcite

Related Issues

Resolves #3552

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • New functionality has javadoc added.
  • New functionality has a user manual doc added.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

.toString()
.toLowerCase(Locale.ROOT);
if (patternMethod.equalsIgnoreCase(PatternMethod.SIMPLE_PATTERN.getName())) {
return new Parse(ParseMethod.PATTERNS, sourceField, pattern, arguments);
Copy link
Collaborator

@qianheng-aws qianheng-aws Apr 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall we rename the Parse node and change its doc? It's weird to use a node which says /** AST node represent Parse with regex operation. */ in its doc for PatternCommand.

Looks like our definition of UnresolvedPlan is something between AST and LogicalPlan. Most are more like AST(e.g. Parse, Rename, Eval), while others are more like LogicalPlan(e.g. Window, Aggregation, Project, Filter).

Under current convention, seems we cannot figure out a proper name for it or determine which group it belong to.

Copy link
Contributor Author

@songkant-aws songkant-aws Apr 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer not doing this refactoring in this PR to introduce minimum change. I left comment to call out why do we keep that Parse plan for legacy patterns method. And call out there is opportunity of refactoring it into just Project operator.

RexNode sourceField = rexVisitor.analyze(node.getSourceField(), context);
ParseMethod parseMethod = node.getParseMethod();
java.util.Map<String, Literal> arguments = node.getArguments();
String pattern = (String) node.getPattern().getValue();
// TODO: Support Grok parse method
Pair<SqlOperator, String> operatorPatternPair =
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this be registered as functions in PPLFuncImpTable?Seems we will have more method in future and it should be a true function in Calcite.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Registered related calcite REGEXP functions into PPLFuncImpTable

@LantaoJin LantaoJin added the calcite calcite migration releated label Apr 16, 2025
Signed-off-by: Songkan Tang <[email protected]>
Comment on lines 207 to 208
REGEXP_EXTRACT(FunctionName.of("regexp_extract")),
REGEXP_REPLACE_2(FunctionName.of("regexp_replace_2")),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these functions all existing UDF in V2?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed, change them to internal functions with additional boolean to describe them.

RelNode root = getRelNode(ppl);

String expectedLogical =
"LogicalProject(ENAME=[$1], patterns_field=[REGEXP_REPLACE($1, '[A-H]')])\n"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do think the '[A-H]' here should change to '[A-H]':VARCHAR if #3558 merged first Just call out for reminding.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. In visitParse, it's changed to makeLiteral(pattern, VARCHAR).

@LantaoJin LantaoJin added calcite calcite migration releated and removed calcite calcite migration releated labels Apr 18, 2025
RexNode sourceField = rexVisitor.analyze(node.getSourceField(), context);
ParseMethod parseMethod = node.getParseMethod();
java.util.Map<String, Literal> arguments = node.getArguments();
String pattern = (String) node.getPattern().getValue();
String patternValue = (String) node.getPattern().getValue();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

patterns is a command, I didn't see the implementation code visitPatterns here? How PatternsIT works?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will be changed in part II. Part I just reverts previous changes to V2. V2's patterns representation is Parse.

Signed-off-by: Songkan Tang <[email protected]>
Signed-off-by: Songkan Tang <[email protected]>
Signed-off-by: Songkan Tang <[email protected]>
Copy link
Collaborator

@qianheng-aws qianheng-aws left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please add tests for explain on patterns in ExplainIT as well? May need 2 tests for cases w/wo stats command after patterns.

As discussed offline, it only supports pushdown for patterns with stats command after it. It would help us verify the feature of pushing down patterns on Calcite in the future.

@songkant-aws
Copy link
Contributor Author

Could you please add tests for explain on patterns in ExplainIT as well? May need 2 tests for cases w/wo stats command after patterns.

As discussed offline, it only supports pushdown for patterns with stats command after it. It would help us verify the feature of pushing down patterns on Calcite in the future.

Sure, I added suggested two cases. But I feel that IT is a bit strange. It's better to assert on actual plan change. The pushed down script is encoded string that is not readable.

@LantaoJin
Copy link
Member

Please update visitWindow to visitPatterns in Analyzer.

@songkant-aws
Copy link
Contributor Author

Please update visitWindow to visitPatterns in Analyzer.

Done. And fixed failed IT.

Copy link
Member

@LantaoJin LantaoJin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add testPatterns in PPLQueryDataAnonymizerTest

@songkant-aws
Copy link
Contributor Author

songkant-aws commented Apr 29, 2025

Could you add testPatterns in PPLQueryDataAnonymizerTest

Added testPatterns for SIMPLE_PATTERN method. The BRAIN method will be added in another PR because it needs some signature change of unresolved plan.

@LantaoJin LantaoJin merged commit 788da98 into opensearch-project:main Apr 30, 2025
22 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
calcite calcite migration releated
Projects
None yet
3 participants