feat(enhancement): Include the e-graph equality saturation framework in the KQIR optimizer. #2832

AryanVBW · 2025-03-16T11:04:34Z

This pull request introduces significant additions to the e-graph data structure and its associated components for the KQIR optimizer. The changes include the implementation of the e-graph itself, equivalence classes, nodes, and rewrite rules, as well as a new equality saturation pass for query optimization.
fix:#2561
Key changes include:

E-Graph Implementation:

Added EGraph, EClass, and ENode classes to represent the e-graph, equivalence classes, and nodes respectively. This includes methods for adding nodes, merging classes, and extracting the best query plan based on a cost model. (src/search/passes/egraph.h)
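For readers unfamiliar with the data structure, the core of an e-graph can be sketched as a union-find over e-class ids plus a "hashcons" map that deduplicates e-nodes. The sketch below is an illustration only, not the code in src/search/passes/egraph.h: the method names (Find, Add, Merge, Rebuild) and the string-based operator are assumptions, and e-classes are kept implicit as union-find ids rather than as a separate EClass type.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <tuple>
#include <utility>
#include <vector>

// Hypothetical names throughout; e-classes are represented only by their ids.
using EClassId = std::size_t;

// An e-node is an operator plus the e-classes of its children.
struct ENode {
  std::string op;
  std::vector<EClassId> children;
  bool operator<(const ENode &other) const {
    return std::tie(op, children) < std::tie(other.op, other.children);
  }
};

class EGraph {
 public:
  // Canonical representative of an e-class (union-find with path halving).
  EClassId Find(EClassId id) {
    assert(id < parent_.size());
    while (parent_[id] != id) {
      parent_[id] = parent_[parent_[id]];
      id = parent_[id];
    }
    return id;
  }

  // Add an e-node, reusing the existing e-class if an identical node exists.
  EClassId Add(ENode node) {
    for (auto &child : node.children) child = Find(child);
    auto it = hashcons_.find(node);
    if (it != hashcons_.end()) return Find(it->second);
    EClassId id = parent_.size();
    parent_.push_back(id);
    hashcons_.emplace(std::move(node), id);
    return id;
  }

  // Declare two e-classes equivalent; returns the surviving representative.
  EClassId Merge(EClassId a, EClassId b) {
    a = Find(a);
    b = Find(b);
    if (a != b) parent_[b] = a;
    return a;
  }

  // Restore congruence after merges: nodes that become identical once their
  // children are canonicalized must end up in the same e-class.
  void Rebuild() {
    bool changed = true;
    while (changed) {
      changed = false;
      std::map<ENode, EClassId> fresh;
      for (const auto &[node, id] : hashcons_) {
        ENode canon = node;
        for (auto &child : canon.children) child = Find(child);
        auto [it, inserted] = fresh.emplace(std::move(canon), Find(id));
        if (!inserted && Find(it->second) != Find(id)) {
          Merge(it->second, id);
          changed = true;
        }
      }
      hashcons_ = std::move(fresh);
    }
  }

 private:
  std::vector<EClassId> parent_;
  std::map<ENode, EClassId> hashcons_;
};
```

In this sketch, merging the classes of two leaves `a` and `b` and then calling `Rebuild()` also places `f(a)` and `f(b)` in one class, which is the property rewrite rules rely on. Cost-based extraction is omitted here.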

Rewrite Rules:

Introduced several rewrite rules (FilterPushDownRewrite, MergeFilterRewrite, SortPushDownRewrite, FilterMergeRewrite, CommonSubexpressionRewrite) to transform the e-graph and optimize query plans. (src/search/passes/egraph_saturation.h)
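To make the idea of a rewrite rule concrete, here is a minimal sketch of a filter-merging rule over a simple tree-shaped plan. The PlanNode type, the Make helper, and the MergeAdjacentFilters function are hypothetical and do not reflect the KQIR node classes or the rule interface in this PR; predicates are plain strings purely for illustration.

```cpp
#include <cassert>
#include <memory>
#include <optional>
#include <string>
#include <utility>

// Hypothetical plan node: an operator kind, a predicate, and at most one input.
struct PlanNode {
  std::string kind;       // e.g. "filter", "sort", "scan"
  std::string predicate;  // only meaningful when kind == "filter"
  std::shared_ptr<const PlanNode> input;
};
using PlanPtr = std::shared_ptr<const PlanNode>;

PlanPtr Make(std::string kind, std::string predicate, PlanPtr input) {
  assert(!kind.empty());
  return std::make_shared<PlanNode>(
      PlanNode{std::move(kind), std::move(predicate), std::move(input)});
}

// A rule either yields an equivalent plan for the given root or reports
// that it does not match:
//   Filter(p, Filter(q, x))  ==>  Filter((p) AND (q), x)
std::optional<PlanPtr> MergeAdjacentFilters(const PlanPtr &root) {
  if (!root || root->kind != "filter") return std::nullopt;
  const PlanPtr &child = root->input;
  if (!child || child->kind != "filter") return std::nullopt;
  return Make("filter",
              "(" + root->predicate + ") AND (" + child->predicate + ")",
              child->input);
}
```

In an e-graph setting, the result of such a rule would not replace the original plan. It would be added to the e-graph and merged into the same equivalence class as the plan it was derived from, so both forms remain available to the cost model.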

Equality Saturation Pass:

Added the EGraphSaturation class, which applies the rewrite rules to the e-graph until saturation is achieved and extracts the best query plan using a cost model. (src/search/passes/egraph_saturation.h)
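The control flow of such a pass (apply every rule until nothing new appears or an iteration budget runs out, then pick the cheapest result) can be sketched as follows. This is a deliberately simplified stand-in, not the EGraphSaturation class: it keeps an explicit set of equivalent plans instead of a compact e-graph, models a plan as a linear pipeline of operator names, and uses a made-up cost model (a scan yields 1000 rows, a filter keeps one row in ten, a sort costs ten units per row). All names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <set>
#include <string>
#include <utility>
#include <vector>

// A plan is a linear pipeline listed from the root operator down to the leaf.
using Plan = std::vector<std::string>;
// A rule returns every plan reachable from the input by one application.
using Rule = std::function<std::vector<Plan>(const Plan &)>;

// Hypothetical rule: a filter directly above a sort may run below it instead.
std::vector<Plan> PushFilterBelowSort(const Plan &plan) {
  std::vector<Plan> out;
  for (std::size_t i = 0; i + 1 < plan.size(); ++i) {
    if (plan[i] == "filter" && plan[i + 1] == "sort") {
      Plan next = plan;
      std::swap(next[i], next[i + 1]);
      out.push_back(std::move(next));
    }
  }
  return out;
}

// Apply all rules to all newly found plans until a fixpoint or the budget.
std::set<Plan> Saturate(const Plan &start, const std::vector<Rule> &rules,
                        std::size_t max_iterations) {
  std::set<Plan> seen;
  seen.insert(start);
  std::vector<Plan> frontier(1, start);
  for (std::size_t i = 0; i < max_iterations && !frontier.empty(); ++i) {
    std::vector<Plan> next;
    for (const Plan &plan : frontier)
      for (const Rule &rule : rules)
        for (Plan &candidate : rule(plan))
          if (seen.insert(candidate).second) next.push_back(std::move(candidate));
    frontier = std::move(next);
  }
  return seen;
}

// Toy cost model, evaluated from the leaf upwards (assumed numbers).
long Cost(const Plan &plan) {
  long rows = 0, cost = 0;
  for (auto it = plan.rbegin(); it != plan.rend(); ++it) {
    if (*it == "scan") { rows = 1000; cost += rows; }
    else if (*it == "filter") { cost += rows; rows /= 10; }
    else if (*it == "sort") { cost += rows * 10; }
  }
  return cost;
}

// Extraction: choose the cheapest plan among all equivalent ones.
Plan ExtractBest(const std::set<Plan> &plans) {
  assert(!plans.empty());
  Plan best = *plans.begin();
  for (const Plan &plan : plans)
    if (Cost(plan) < Cost(best)) best = plan;
  return best;
}
```

Starting from the pipeline filter, sort, scan, the loop discovers the variant that filters before sorting, and extraction prefers it because the sort then sees fewer rows. A real e-graph shares common sub-plans between variants instead of storing each plan separately, which is what keeps saturation tractable on larger queries.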

These changes collectively enhance the query optimization capabilities of the KQIR optimizer by leveraging e-graph-based equality saturation techniques.
@git-hulk @aleksraiden
@PragmaTwice

…KQIR optimizer.

In order to improve the KQIR optimizer's capacity to produce more effective query plans, this commit presents an e-graph equality saturation framework. The following are included in the implementation:

- Rewrite rules for query plan transformation
- Equality saturation algorithms for query optimization
- A new e-graph representation for KQIR nodes
- A new optimization pass that integrates with the current PassManager and makes use of the e-graph framework

Through equality saturation, which can find equivalent query plans and choose the most effective one based on a cost model, the e-graph framework makes it possible for more potent term rewriting capabilities. By choosing the best execution strategy and examining a wider range of equivalent query plans, this improvement will increase query performance.

aleksraiden · 2025-03-16T11:07:39Z

@AryanVBW Thanks for your contribution!

AryanVBW · 2025-03-16T11:21:34Z

> @AryanVBW Thanks for your contribution!

Thank you, sir. It’s truly my pleasure to work with such humble people. I always love to contribute. Please let me know if there are any improvements I can make or any changes needed.

aleksraiden · 2025-03-16T17:43:34Z

@AryanVBW As far as I can see, the clang lint check in CI reports some warnings. Could you please run ./x.py format to fix them?

AryanVBW · 2025-03-16T18:06:50Z

Ok

PragmaTwice

Hmmm it seems there's just some skeleton rather than a complete implementation.

It cannot work so I think it's hard to get it merged. Also we need some test cases for it.

SharonIV0x86 · 2025-03-17T14:59:04Z

Hey @AryanVBW, this PR looks like a good starting point, but it’s missing some of the key parts needed to be a functional KQIR optimizer.

There’s no actual equality saturation algorithm, the rewrite rules don’t do anything yet, the cost model is just a placeholder, and it’s not integrated with Kvrocks' query engine (this can be done later ig). Plus, without tests or benchmarks, we can’t validate if it actually improves anything. I’d suggest making this a draft PR and continuing work on it, or creating a separate branch where we can properly develop it before merging.

To the best of my knowledge, this is going to be much more complex than standard SQL parsing. I’d recommend checking out some existing articles like "KQIR: a query engine for Apache Kvrocks" to get a better understanding of how query optimization works in Kvrocks. A good next step would be to experiment with combining multiple operations like SCAN, ZRANGE, and GET. The optimizer can then use e-graphs to explore different ways to merge or reorder these operations, in the end finding the most efficient execution path.

A really good resource: https://egraphs-good.github.io/
Very interesting introduction to egraphs: https://www.cole-k.com/2023/07/24/e-graphs-primer/

AryanVBW · 2025-03-17T15:49:16Z

> Hmmm it seems there's just some skeleton rather than a complete implementation.
>
> It cannot work so I think it's hard to get it merged. Also we need some test cases for it.

Yes, sir. I initially started working on it but soon realized that it wasn’t a complete implementation, so I am continuing to work on it to complete it properly. Thank you so much, sir, for your review.

AryanVBW · 2025-03-17T15:53:31Z

> Hey @AryanVBW, this PR looks like a good starting point, but it’s missing some of the key parts needed to be a functional KQIR optimizer.
>
> There’s no actual equality saturation algorithm, the rewrite rules don’t do anything yet, the cost model is just a placeholder, and it’s not integrated with Kvrocks' query engine (this can be done later ig). Plus, without tests or benchmarks, we can’t validate if it actually improves anything. I’d suggest making this a draft PR and continuing work on it, or creating a separate branch where we can properly develop it before merging.
>
> To the best of my knowledge, this is going to be much more complex than standard SQL parsing. I’d recommend checking out some existing articles like "KQIR: a query engine for Apache Kvrocks" to get a better understanding of how query optimization works in Kvrocks. A good next step would be to experiment with combining multiple operations like SCAN, ZRANGE, and GET. The optimizer can then use e-graphs to explore different ways to merge or reorder these operations, in the end finding the most efficient execution path.
>
> A really good resource: https://egraphs-good.github.io/
> Very interesting introduction to egraphs: https://www.cole-k.com/2023/07/24/e-graphs-primer/

Thank you, sir. I really appreciate your detailed feedback and guidance. I truly enjoy working on this, and I understand that there’s still a lot to refine. I’ll start by creating a separate branch to continue developing a proper KQIR optimizer, ensuring that key components like equality saturation, rewrite rules, and cost models are implemented correctly.

I'll also spend some time reading the recommended materials to learn more about query optimization in Kvrocks. I'm looking forward to gradually improving this. Once again, I appreciate your help!

PragmaTwice reviewed Mar 17, 2025

AryanVBW added 3 commits March 17, 2025 21:02

fix: Implement E-Graph equality saturation framework for query optimi…

050738b

…zation

chore: Remove search-tests build directory from version control

d5bfc5e

The search-tests directory contains build artifacts and should not be tracked in version control. This commit removes it to keep the repository clean.

chore: Remove search-tests build directory from version control

4569ac1

The search-tests directory contains build artifacts and should not be tracked in version control. This commit removes it to keep the repository clean.
