
@AryanVBW

This pull request adds an e-graph data structure and its associated components to the KQIR optimizer. The changes implement the e-graph itself, equivalence classes, nodes, and rewrite rules, and add a new equality saturation pass for query optimization.
Fixes #2561
Key changes include:

E-Graph Implementation:

  • Added EGraph, EClass, and ENode classes to represent the e-graph, equivalence classes, and nodes respectively. This includes methods for adding nodes, merging classes, and extracting the best query plan based on a cost model. (src/search/passes/egraph.h)
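
As a rough illustration of the data structure described above, here is a minimal sketch of an e-graph core: a union-find over e-class ids plus a memo table that deduplicates structurally identical nodes. The names `MiniEGraph`, `Node`, and `ClassId` are hypothetical and are not the types in src/search/passes/egraph.h. The sketch also omits the congruence-closure "rebuild" step that a full e-graph needs after merges.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <tuple>
#include <utility>
#include <vector>

// Hypothetical sketch; NOT the types from src/search/passes/egraph.h.
using ClassId = std::size_t;

struct Node {
  std::string op;                 // operator name, e.g. "Filter"
  std::vector<ClassId> children;  // ids of child e-classes
  bool operator<(const Node &o) const {
    return std::tie(op, children) < std::tie(o.op, o.children);
  }
};

class MiniEGraph {
 public:
  // Return the canonical representative of an e-class (with path halving).
  ClassId Find(ClassId id) {
    assert(id < parent_.size());
    while (parent_[id] != id) {
      parent_[id] = parent_[parent_[id]];
      id = parent_[id];
    }
    return id;
  }

  // Add a node; reuse the existing e-class if an identical node is already known.
  ClassId Add(Node n) {
    for (auto &c : n.children) c = Find(c);  // canonicalize children first
    auto it = memo_.find(n);
    if (it != memo_.end()) return Find(it->second);
    ClassId id = parent_.size();
    parent_.push_back(id);
    memo_.emplace(std::move(n), id);
    return id;
  }

  // Declare two e-classes equivalent; returns the surviving representative.
  // Note: no congruence-closure rebuild is performed in this sketch.
  ClassId Merge(ClassId a, ClassId b) {
    a = Find(a);
    b = Find(b);
    if (a != b) parent_[b] = a;
    return a;
  }

 private:
  std::vector<ClassId> parent_;   // union-find parent pointers
  std::map<Node, ClassId> memo_;  // hashcons: node -> e-class id
};
```

Adding the same `Filter` node twice returns the same e-class id, and after `Merge(a, b)` a `Filter` over `b` resolves to the class already created for the `Filter` over `a`.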

Rewrite Rules:

  • Introduced several rewrite rules (FilterPushDownRewrite, MergeFilterRewrite, SortPushDownRewrite, FilterMergeRewrite, CommonSubexpressionRewrite) to transform the e-graph and optimize query plans. (src/search/passes/egraph_saturation.h)
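
To make the idea of a rewrite rule concrete, the following sketch collapses nested filters into one, which is one plausible reading of a merge-filter rule. It works on a plain plan tree rather than an e-graph, and the `Plan` struct and `MergeFilters` function are hypothetical names, not the PR's `MergeFilterRewrite`. In an e-graph the merged form would be added alongside the original and the two classes merged, rather than replacing the original destructively as done here.

```cpp
#include <memory>
#include <string>
#include <utility>

// Hypothetical plan tree, for illustration only.
struct Plan {
  std::string op;               // "Filter" or "Scan"
  std::string predicate;        // only meaningful when op == "Filter"
  std::unique_ptr<Plan> input;  // single child, or nullptr for a leaf
};

// Rewrite Filter(p1, Filter(p2, x)) into Filter((p1) AND (p2), x), bottom-up.
std::unique_ptr<Plan> MergeFilters(std::unique_ptr<Plan> p) {
  if (!p) return p;
  p->input = MergeFilters(std::move(p->input));
  if (p->op == "Filter" && p->input && p->input->op == "Filter") {
    p->predicate = "(" + p->predicate + ") AND (" + p->input->predicate + ")";
    // Skip over the inner filter and attach its input directly.
    p->input = std::move(p->input->input);
  }
  return p;
}
```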

Equality Saturation Pass:

  • Added the EGraphSaturation class, which applies the rewrite rules to the e-graph until saturation is achieved and extracts the best query plan using a cost model. (src/search/passes/egraph_saturation.h)
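
The "apply rules until saturation" step can be sketched as a fixed-point loop. This is a generic outline under assumptions, not the `EGraphSaturation` class from this PR: `Rule` and `Saturate` are hypothetical names, a rule is modeled as a callable that returns true when it changed the graph, and an iteration cap guards against rule sets that never saturate.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical: a rule inspects the graph and returns true if it changed it.
template <typename Graph>
using Rule = std::function<bool(Graph &)>;

// Apply every rule repeatedly until a full pass changes nothing (saturation)
// or max_iters passes have run. Returns the number of passes performed.
template <typename Graph>
std::size_t Saturate(Graph &g, const std::vector<Rule<Graph>> &rules,
                     std::size_t max_iters) {
  std::size_t iters = 0;
  bool changed = true;
  while (changed && iters < max_iters) {
    changed = false;
    for (const auto &rule : rules) {
      if (rule(g)) changed = true;
    }
    ++iters;
  }
  return iters;
}
```

The final pass is the one that observes no change, so a rule that fires three times takes four passes to saturate. Real implementations usually also bound the graph size or the elapsed time, since some rule sets grow the e-graph without limit.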

These changes collectively enhance the query optimization capabilities of the KQIR optimizer by leveraging e-graph-based equality saturation techniques.
@git-hulk @aleksraiden
@PragmaTwice

…KQIR optimizer.

In order to improve the KQIR optimizer's capacity to produce more effective query plans, this commit presents an e-graph equality saturation framework. The following are included in the implementation:

  • A new e-graph representation for KQIR nodes
  • Rewrite rules for query plan transformation
  • Equality saturation algorithms for query optimization
  • A new optimization pass that integrates with the current PassManager and makes use of the e-graph framework

The e-graph framework enables more powerful term rewriting: equality saturation can find equivalent query plans and choose the most effective one based on a cost model.

By choosing the best execution strategy and examining a wider range of equivalent query plans, this improvement will increase query performance.
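
Choosing a plan "based on a cost model" can be sketched as a bottom-up extraction: each e-class keeps its cheapest node, where a node's cost is its own cost plus the best cost of each child class. The `PlanNode`, `Classes`, `Best`, and `ExtractBest` names and the numeric costs are illustrative assumptions, not the cost model in this PR. The loop assumes non-negative node costs so that it terminates.

```cpp
#include <cstddef>
#include <limits>
#include <string>
#include <vector>

// Hypothetical representation: classes[i] lists the alternative nodes of e-class i.
struct PlanNode {
  std::string op;
  double self_cost;                   // assumed non-negative
  std::vector<std::size_t> children;  // indices of child e-classes
};
using Classes = std::vector<std::vector<PlanNode>>;

struct Best {
  double cost;  // cheapest known total cost for the class
  int node;     // index of the chosen node within the class, -1 if none yet
};

// Relax costs until no class can be improved (a Bellman-Ford style fixed point).
std::vector<Best> ExtractBest(const Classes &classes) {
  const double inf = std::numeric_limits<double>::infinity();
  std::vector<Best> best(classes.size(), Best{inf, -1});
  bool changed = true;
  while (changed) {
    changed = false;
    for (std::size_t c = 0; c < classes.size(); ++c) {
      for (std::size_t n = 0; n < classes[c].size(); ++n) {
        double total = classes[c][n].self_cost;
        for (std::size_t child : classes[c][n].children) total += best[child].cost;
        if (total < best[c].cost) {
          best[c] = Best{total, static_cast<int>(n)};
          changed = true;
        }
      }
    }
  }
  return best;
}
```

For example, if one class contains both a `Filter` over a full scan and a cheaper `IndexScan`, the extractor selects the `IndexScan`, and any parent such as a `Sort` inherits the lower cost.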
@aleksraiden
Contributor

@AryanVBW Thanks for your contribution!

@AryanVBW
Author

@AryanVBW Thanks for your contribution!

Thank you, sir. It’s truly my pleasure to work with such humble people. I always love to contribute. Please let me know if there are any improvements I can make or any changes needed.

@aleksraiden
Contributor

@AryanVBW As far as I can see, the clang lint check in CI reports some warnings. Could you please run ./x.py format to fix them?

@AryanVBW
Author

Ok

@PragmaTwice
Member

Hmmm it seems there's just some skeleton rather than a complete implementation.

It cannot work as is, so I think it's hard to get it merged. We also need some test cases for it.

@SharonIV0x86
Contributor

Hey @AryanVBW, this PR looks like a good starting point, but it’s missing some of the key parts needed to be a functional KQIR optimizer.

There’s no actual equality saturation algorithm, the rewrite rules don’t do anything yet, the cost model is just a placeholder, and it’s not integrated with Kvrocks' query engine (this can be done later ig). Plus, without tests or benchmarks, we can’t validate if it actually improves anything. I’d suggest making this a draft PR and continuing work on it, or creating a separate branch where we can properly develop it before merging.

To the best of my knowledge, this is going to be much more complex than standard SQL parsing. I’d recommend checking out some existing articles like "KQIR: a query engine for Apache Kvrocks" to get a better understanding of how query optimization works in Kvrocks. A good next step would be to experiment with combining multiple operations like SCAN, ZRANGE, and GET. The optimizer can then use e-graphs to explore different ways to merge or reorder these operations, in the end finding the most efficient execution path.

A really good resource: https://egraphs-good.github.io/
Very interesting introduction to egraphs: https://www.cole-k.com/2023/07/24/e-graphs-primer/

The search-tests directory contains build artifacts and should not be tracked in version control. This commit removes it to keep the repository clean.
@AryanVBW
Author

Hmmm it seems there's just some skeleton rather than a complete implementation.

It cannot work as is, so I think it's hard to get it merged. We also need some test cases for it.

Yes, sir. I initially started working on it but soon realized that it wasn’t a complete implementation, so I am continuing to work on it to complete it properly. Thank you so much, sir, for your review.

@AryanVBW
Author

Hey @AryanVBW, this PR looks like a good starting point, but it’s missing some of the key parts needed to be a functional KQIR optimizer.

There’s no actual equality saturation algorithm, the rewrite rules don’t do anything yet, the cost model is just a placeholder, and it’s not integrated with Kvrocks' query engine (this can be done later ig). Plus, without tests or benchmarks, we can’t validate if it actually improves anything. I’d suggest making this a draft PR and continuing work on it, or creating a separate branch where we can properly develop it before merging.

To the best of my knowledge, this is going to be much more complex than standard SQL parsing. I’d recommend checking out some existing articles like "KQIR: a query engine for Apache Kvrocks" to get a better understanding of how query optimization works in Kvrocks. A good next step would be to experiment with combining multiple operations like SCAN, ZRANGE, and GET. The optimizer can then use e-graphs to explore different ways to merge or reorder these operations, in the end finding the most efficient execution path.

A really good resource: https://egraphs-good.github.io/
Very interesting introduction to egraphs: https://www.cole-k.com/2023/07/24/e-graphs-primer/

Thank you, sir. I really appreciate your detailed feedback and guidance. I truly enjoy working on this, and I understand that there’s still a lot to refine. I’ll start by creating a separate branch to continue developing a proper KQIR optimizer, ensuring that key components like equality saturation, rewrite rules, and cost models are implemented correctly.

I'll also spend some time reading the recommended materials to learn more about query optimization in Kvrocks. I'm looking forward to gradually improving this. Once again, I appreciate your help!

4 participants