feat: core/translate: uncorrelated FROM clause subqueries #566

Open: wants to merge 1 commit into main

jussisaurio
Collaborator

@jussisaurio jussisaurio commented Dec 28, 2024

A warning up front: this PR is quite big out of necessity. Subqueries are, as the name implies, queries within queries, so roughly everything that works with a regular query should also work with a subquery.


  • Adds support for:
    • uncorrelated subqueries in the FROM clause (i.e. they appear as a "table" and do not refer to outer tables). An example is at the end of the PR description.
    • column and subquery aliasing (select sub.renamed from (select name as renamed from products) sub)
    • inner and outer filtering of subqueries (select sub.name from (select name from products where name = 'joe') sub, and, select sub.name from (select name from products) sub where sub.name = 'joe')
    • joining between regular tables and subqueries
    • joining between multiple subqueries
    • in general, working with subqueries should behave roughly like working with regular tables
  • Main idea: a subquery is just a wrapper around a SelectPlan that never emits ResultRows. Instead it Yields control back to the parent query, and the parent query can copy the subquery's result values into a ResultRow. A new variant, SourceOperator::Subquery, wraps the subquery's SelectPlan.
  • Plans can now refer not only to btree tables (select p.name from products p) but also to subqueries (select sub.foo from (select name as foo from products) sub). Hence this PR also adds support for column aliases, which didn't exist before.
    • An Expr::Column that refers to a regular table will result in an Insn::Column (i.e. a read from disk/memory) whereas an Expr::Column that refers to a subquery will result in an Insn::Copy (from register to register) instead
  • Subquery handling is entirely unoptimized: there is no predicate pushdown from the outer query into subqueries, and no elimination of redundant subqueries (e.g. in the trivial example SELECT * FROM (SELECT * FROM users) sub, the subquery can be removed entirely)
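
The Expr::Column rule above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the `Source` enum, its fields, the `Insn` field names and `translate_column` are hypothetical stand-ins, and the only thing taken from the description is that a table column becomes an Insn::Column while a subquery column becomes an Insn::Copy.

```rust
// Hypothetical, simplified stand-ins for the planner and bytecode types.
// The real types in the codebase carry far more information.
#[derive(Debug, PartialEq)]
enum Insn {
    // Read column `column` of the row under `cursor_id` into register `dest`.
    Column { cursor_id: usize, column: usize, dest: usize },
    // Copy register `src` into register `dest`.
    Copy { src: usize, dest: usize },
}

enum Source {
    // A regular btree table, read through a cursor.
    BTreeTable { cursor_id: usize },
    // A subquery. Assumption: its most recent row sits in consecutive
    // registers starting at `result_start_reg`.
    Subquery { result_start_reg: usize },
}

// Pick the instruction for a column reference based on where the column lives.
fn translate_column(source: &Source, column: usize, dest: usize) -> Insn {
    match source {
        Source::BTreeTable { cursor_id } => Insn::Column {
            cursor_id: *cursor_id,
            column,
            dest,
        },
        Source::Subquery { result_start_reg } => Insn::Copy {
            src: result_start_reg + column,
            dest,
        },
    }
}
```

With a table on cursor 1 and a subquery whose results start at register 2, column 1 of each translates to a Column into the destination register and a Copy from register 3 respectively, which matches the shape of the Column and Copy instructions in the EXPLAIN output at the end of this description.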

This PR does not add support (yet) for:

  • subqueries in result columns: SELECT t.foo, (SELECT .......) as column_from_subquery FROM t
  • subqueries in WHERE clauses e.g. SELECT * FROM t1 WHERE t1.foo IN (SELECT ...)
  • subquery-related optimizations, of which there are plenty available. No analysis is done regarding e.g. whether predicates on the outer query level could be pushed into the subquery, or whether the subquery could be entirely eliminated. Both of the above can probably be done fairly easily for a bunch of trivial cases.
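
To illustrate what one of the trivial cases could look like, here is a hedged sketch of eliminating a pass-through subquery such as SELECT * FROM (SELECT * FROM users) sub. None of this exists in the PR: `Source`, `SelectPlan`, their fields and `eliminate_trivial_subquery` are hypothetical, and a real pass would also need to rule out aggregates, DISTINCT, ORDER BY and similar features before removing anything.

```rust
// Hypothetical, heavily simplified plan types for illustration only.
#[derive(Debug, Clone, PartialEq)]
enum Source {
    Table { name: String, alias: String },
    Subquery { plan: Box<SelectPlan>, alias: String },
}

#[derive(Debug, Clone, PartialEq)]
struct SelectPlan {
    source: Source,
    // True when the result columns are exactly `*`.
    select_star: bool,
    // Predicates, kept as opaque strings in this sketch.
    where_clause: Vec<String>,
    has_limit: bool,
}

// Replace `(SELECT * FROM t) alias` with `t alias` when the subquery adds
// nothing: no filtering, no limit, no column renaming.
fn eliminate_trivial_subquery(source: Source) -> Source {
    match source {
        Source::Subquery { plan, alias } => {
            let is_passthrough =
                plan.select_star && plan.where_clause.is_empty() && !plan.has_limit;
            // Only a subquery that reads directly from a table is handled here.
            let table_name = match &plan.source {
                Source::Table { name, .. } if is_passthrough => Some(name.clone()),
                _ => None,
            };
            match table_name {
                // Keep the subquery's alias so outer references still resolve.
                Some(name) => Source::Table { name, alias },
                None => Source::Subquery { plan, alias },
            }
        }
        table => table,
    }
}
```

The sketch keeps the subquery's alias on the resulting table so that outer references such as sub.name would still resolve. Any subquery with a WHERE clause, a LIMIT, or explicit result columns is returned unchanged.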

Example bytecode with comments added:

limbo> EXPLAIN SELECT p.name, sub.funny_name FROM products p JOIN (
  select id, concat(name, '-lol') as funny_name from products
) sub USING (id) LIMIT 3;

addr  opcode             p1    p2    p3    p4             p5  comment
----  -----------------  ----  ----  ----  -------------  --  -------
0     Init               0     31    0                    0   Start at 31

// Coroutine implementation starts at insn 2, jump immediately to 14
1     InitCoroutine      1     14    2                    0   

2     OpenReadAsync      0     3     0                    0   table=products, root=3
3     OpenReadAwait      0     0     0                    0   
4     RewindAsync        0     0     0                    0   
5     RewindAwait        0     13    0                    0   Rewind table products
6       RowId            0     2     0                    0   r[2]=products.rowid
7       Column           0     1     4                    0   r[4]=products.name
8       String8          0     5     0     -lol           0   r[5]='-lol'
9       Function         0     4     3     concat         0   r[3]=func(r[4..5])

// jump back to main loop of query (insn 20)
10      Yield            1     0     0                    0   

11    NextAsync          0     0     0                    0   
12    NextAwait          0     6     0                    0   
13    EndCoroutine       1     0     0                    0   
14    OpenReadAsync      1     3     0                    0   table=p, root=3
15    OpenReadAwait      0     0     0                    0   
16    RewindAsync        1     0     0                    0   
17    RewindAwait        1     30    0                    0   Rewind table p

// Since this subquery is the inner loop of the join, reinitialize it on every iteration of the outer loop
18      InitCoroutine    1     0     2                    0   

// Jump back to the subquery implementation to assign another row into registers
19      Yield            1     28    0                    0   

20      RowId            1     8     0                    0   r[8]=p.rowid

// Copy sub.id
21      Copy             2     9     0                    0   r[9]=r[2]

// p.id == sub.id?
22      Ne               8     9     27                   0   if r[8]!=r[9] goto 27
23      Column           1     1     6                    0   r[6]=p.name

// copy sub.funny_name
24      Copy             3     7     0                    0   r[7]=r[3]

25      ResultRow        6     2     0                    0   output=r[6..7]
26      DecrJumpZero     10    30    0                    0   if (--r[10]==0) goto 30
27      Goto             0     19    0                    0   
28    NextAsync          1     0     0                    0   
29    NextAwait          1     18    0                    0   
30    Halt               0     0     0                    0   
31    Transaction        0     0     0                    0   
32    Integer            3     10    0                    0   r[10]=3
33    Goto               0     1     0                    0 
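
For readers who find the control flow easier to follow in a high-level language, the listing above can be modeled roughly like this. It is a sketch of the nested-loop shape only, not of the VM: `SubqueryCoroutine`, `resume` and `join_with_subquery` are hypothetical names, the table is modeled as a slice of (rowid, name) pairs, and concat is approximated with `format!`.

```rust
// Models the subquery body (insns 2..13): each `resume` produces one
// (id, funny_name) row, or None once the table is exhausted (EndCoroutine).
struct SubqueryCoroutine<'a> {
    products: &'a [(i64, String)],
    pos: usize,
}

impl<'a> SubqueryCoroutine<'a> {
    // Mirrors InitCoroutine: (re)start the subquery at its first row.
    fn init(products: &'a [(i64, String)]) -> Self {
        Self { products, pos: 0 }
    }

    // Mirrors Yield from the parent into the subquery and back again.
    fn resume(&mut self) -> Option<(i64, String)> {
        let (id, name) = self.products.get(self.pos)?;
        self.pos += 1;
        Some((*id, format!("{}-lol", name)))
    }
}

// Outer loop over `p`, inner loop over the subquery, join on id, stop at `limit`.
fn join_with_subquery(products: &[(i64, String)], limit: usize) -> Vec<(String, String)> {
    let mut out = Vec::new();
    if limit == 0 {
        // LIMIT 0 produces no rows.
        return out;
    }
    for (p_id, p_name) in products {
        // The subquery is the inner loop, so it is reinitialized per outer row.
        let mut sub = SubqueryCoroutine::init(products);
        while let Some((sub_id, funny_name)) = sub.resume() {
            if *p_id != sub_id {
                continue; // corresponds to the Ne jump past ResultRow
            }
            out.push((p_name.clone(), funny_name));
            if out.len() == limit {
                return out; // corresponds to DecrJumpZero reaching zero
            }
        }
    }
    out
}
```

Note that the model rescans the whole subquery for every outer row, which reflects the unoptimized state described earlier.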

@jussisaurio jussisaurio force-pushed the subquery branch 4 times, most recently from 6ba814b to 2e3fb8b Compare December 29, 2024 07:12
@jussisaurio jussisaurio changed the title wip uncorrelated subqueries feat: core/translate: uncorrelated FROM clause subqueries Dec 29, 2024
@jussisaurio jussisaurio force-pushed the subquery branch 3 times, most recently from a04850c to c21f13d Compare December 29, 2024 10:19
@jussisaurio jussisaurio marked this pull request as ready for review December 29, 2024 15:06
@jussisaurio jussisaurio force-pushed the subquery branch 2 times, most recently from 3ff39d3 to 49bbae9 Compare December 31, 2024 10:43
@jussisaurio
Collaborator Author

squashed the entire PR into one commit to make rebase conflict resolution easier
