[SPARK-51611][SQL] New iteration of single-pass Analyzer functionality #50406

Closed
vladimirg-db wants to merge 1 commit into apache:master from vladimirg-db:vladimir-golubev_data/single-pass-analyzer/new-iteration-of-features

@vladimirg-db
Contributor

@vladimirg-db vladimirg-db commented Mar 26, 2025

What changes were proposed in this pull request?

New single-pass Analyzer features:

  • GROUP BY
  • ORDER BY
  • JOIN
  • EXCEPT
  • INTERSECT
  • Correlated subqueries
  • Other small features and bugfixes

Why are the changes needed?

To replace the fixed-point Analyzer with the single-pass one.

Does this PR introduce any user-facing change?

No, single-pass Analyzer is not yet enabled by default.

How was this patch tested?

Logical plans were compared in tests with ANALYZER_DUAL_RUN_LEGACY_AND_SINGLE_PASS_RESOLVER.
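
The dual-run idea can be illustrated with a small Python sketch. This is a toy model, not Spark code: `dual_run`, `legacy_analyze`, and `single_pass_analyze` are hypothetical names, the plans are plain nested tuples, and returning the legacy plan on success is an assumption made only for this example.

```python
from typing import Callable, Tuple

Plan = Tuple  # toy logical plan represented as nested tuples (assumption)


def dual_run(query: str,
             legacy: Callable[[str], Plan],
             single_pass: Callable[[str], Plan]) -> Plan:
    """Analyze the same query with both analyzers and fail on any difference."""
    legacy_plan = legacy(query)
    single_pass_plan = single_pass(query)
    if legacy_plan != single_pass_plan:
        raise AssertionError(
            f"Plans differ for {query!r}: {legacy_plan} vs {single_pass_plan}"
        )
    # Assumption for this sketch: the legacy plan is the one handed onward.
    return legacy_plan


# Hypothetical stand-ins for the two analyzers; both hard-code one plan.
def legacy_analyze(query: str) -> Plan:
    return ("Project", ("a",), ("Relation", "t"))


def single_pass_analyze(query: str) -> Plan:
    return ("Project", ("a",), ("Relation", "t"))


def broken_analyze(query: str) -> Plan:
    return ("Project", ("b",), ("Relation", "t"))


plan = dual_run("SELECT a FROM t", legacy_analyze, single_pass_analyze)
```

Swapping `single_pass_analyze` for `broken_analyze` makes `dual_run` raise, which is the kind of mismatch such a comparison is meant to surface.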

Was this patch authored or co-authored using generative AI tooling?

Yes.

@github-actions github-actions Bot added the SQL label Mar 26, 2025
@vladimirg-db vladimirg-db force-pushed the vladimir-golubev_data/single-pass-analyzer/new-iteration-of-features branch 5 times, most recently from bbb202d to cb6a664 on March 26, 2025 17:30
@vladimirg-db vladimirg-db changed the title [WIP][SPARK-51611][SQL] New iteration of single-pass Analyzer functionality [SPARK-51611][SQL] New iteration of single-pass Analyzer functionality Mar 26, 2025
@vladimirg-db vladimirg-db force-pushed the vladimir-golubev_data/single-pass-analyzer/new-iteration-of-features branch from cb6a664 to 3a039b0 on March 26, 2025 19:18
@vladimirg-db
Contributor Author

Thanks @mihailoale-db and @mihailotim-db for this work!

@vladimirg-db vladimirg-db force-pushed the vladimir-golubev_data/single-pass-analyzer/new-iteration-of-features branch 3 times, most recently from 74f9e11 to 84ff648 on April 3, 2025 15:04
Contributor

@cloud-fan cloud-fan left a comment

LGTM if CI is green

@vladimirg-db vladimirg-db force-pushed the vladimir-golubev_data/single-pass-analyzer/new-iteration-of-features branch 2 times, most recently from 50945ae to d0a91cb on April 7, 2025 16:29
@vladimirg-db
Contributor Author

Thanks @mihailom-db for EXCEPT/INTERSECT!

@vladimirg-db vladimirg-db force-pushed the vladimir-golubev_data/single-pass-analyzer/new-iteration-of-features branch from d0a91cb to e86d0ee on April 7, 2025 16:30
@cloud-fan
Contributor

The test failures seem to be valid

@vladimirg-db
Contributor Author

Yeah, sorry... let me fix that real quick...

@vladimirg-db vladimirg-db force-pushed the vladimir-golubev_data/single-pass-analyzer/new-iteration-of-features branch from e86d0ee to ab27723 on April 8, 2025 09:40
@vladimirg-db
Contributor Author

@cloud-fan tests passed!

@cloud-fan
Contributor

thanks, merging to master!

@cloud-fan cloud-fan closed this in 9efddcf Apr 8, 2025
@iwanttobepowerful

@vladimirg-db Hi, is the new Analyzer's principle similar to Calcite's validator?

@vladimirg-db
Contributor Author

vladimirg-db commented Apr 15, 2025

Hello @iwanttobepowerful. I have never worked with Calcite.

The new Analyzer processes the logical plan in one* bottom-up traversal. For example, expression IDs are propagated bottom-up (ExpressionIdAssigner), and subqueries can find outer references in already-resolved subtrees. This means that we sometimes have to build additional data structures as we descend to the leaf nodes, before the bottom-up analysis starts; e.g. for CTE resolution we build a stack of lookup scopes.

* We have one pass for the main algebra (95% of the Analysis), but there is also one pre-pass to resolve the metadata (blocking calls), and a couple of rules running after the Analysis before the Optimizer to stay compatible with the old Analyzer in PlanRewriter (e.g. CleanupAliases).
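
To make the traversal described above more concrete, here is a minimal toy sketch in Python. It is not Spark's implementation: `Node`, `ToyResolver`, and the catalog are hypothetical names invented for illustration. The sketch assumes that a CTE lookup scope is pushed while descending, and that expression IDs and output columns are filled in on the way back up, so every node is visited exactly once.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Dict, List, Optional


@dataclass
class Node:
    """Toy logical-plan node (hypothetical, not Spark's LogicalPlan)."""
    op: str                                   # "with", "relation" or "project"
    name: Optional[str] = None                # table name or CTE name
    columns: List[str] = field(default_factory=list)
    children: List["Node"] = field(default_factory=list)
    output: Dict[str, int] = field(default_factory=dict)  # column -> expression ID


class ToyResolver:
    def __init__(self, catalog: Dict[str, List[str]]):
        self.catalog = catalog                # table name -> column names
        self.ids = count(1)                   # source of fresh expression IDs
        self.cte_scopes: List[Dict[str, Node]] = [{}]  # stack of lookup scopes

    def _lookup_cte(self, name: str) -> Optional[Node]:
        # Search from the innermost scope outwards.
        for scope in reversed(self.cte_scopes):
            if name in scope:
                return scope[name]
        return None

    def resolve(self, node: Node) -> Node:
        if node.op == "with":
            definition, body = node.children
            self.resolve(definition)
            # Descending: make the CTE visible to the body only.
            self.cte_scopes.append({node.name: definition})
            try:
                self.resolve(body)
            finally:
                self.cte_scopes.pop()
            node.output = dict(body.output)
        elif node.op == "relation":
            cte = self._lookup_cte(node.name)
            names = list(cte.output) if cte is not None else self.catalog[node.name]
            # Leaf: every reference gets fresh expression IDs.
            node.output = {c: next(self.ids) for c in names}
        elif node.op == "project":
            (child,) = node.children
            self.resolve(child)               # the child is resolved first
            missing = [c for c in node.columns if c not in child.output]
            if missing:
                raise ValueError(f"unresolved columns: {missing}")
            # Ascending: reuse the IDs already assigned below.
            node.output = {c: child.output[c] for c in node.columns}
        else:
            raise ValueError(f"unknown operator: {node.op}")
        return node


# Roughly: WITH t AS (SELECT a FROM src) SELECT a FROM t
definition = Node("project", columns=["a"], children=[Node("relation", name="src")])
body = Node("project", columns=["a"], children=[Node("relation", name="t")])
resolver = ToyResolver({"src": ["a", "b"]})
resolved = resolver.resolve(Node("with", name="t", children=[definition, body]))
```

No node is revisited after its `output` is set, which is the contrast with a fixed-point approach that reapplies rules until the plan stops changing.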

@iwanttobepowerful

thanks!!! very nice
