Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SELECT: If the schema is strict and a projected field doesn't exist, we should return an error #465

Open
asdine opened this issue Jul 2, 2022 · 5 comments
Labels
bug Something isn't working
Milestone

Comments

@asdine
Copy link
Collaborator

asdine commented Jul 2, 2022

genji> create table foo(a int);
genji> insert into foo (a) values (1);
genji> select a, b from foo;
{
  "a": 1,
  "b": null
}

An error should be returned instead

The detection must be done during the preparation phase, before executing the query.

@asdine asdine added the bug Something isn't working label Jul 2, 2022
@asdine asdine added this to the v0.16.0 milestone Jul 2, 2022
@tzzed
Copy link
Contributor

tzzed commented Jul 11, 2022

This implies the WHERE clause too.

SELECT a from foo WHERE b > 10; 

should also returns an error.

@asdine
Copy link
Collaborator Author

asdine commented Jul 12, 2022

This implies the WHERE clause too.

Good point, anywhere an expression is used in the SELECT statement (WHERE, GROUP BY, etc.)

@Darkclainer
Copy link
Contributor

I was tackling issue #464 and found that it is more than related to this issue.

Consider we have changed behavior how document.Path (or expr.Path that is the same type) evaluated: now we treat non existing paths as errors. Studying broken tests I noticed next issues:

  1. There is at least one test that affected by the discussed issue. It uses next statement:

    SELECT COUNT(a) FROM test WHERE a < ? GROUP BY b ORDER BY a DESC LIMIT 5

    Subtle thing that there is no a in context of ORDER BY, but only count(a). PR select: return error if field doesn't exist on tables with strict schema #477 can detect error (not certain what error in particular, because query like SELECT COUNT(a) AS cnt FROM test WHERE a < 3 GROUP BY b ORDER BY cnt DESC LIMIT 5 also emit error)

  2. GROUP BY with complex expression become broken. I didn't examine why.

  3. Legit case where we refer to a non-existent field in the non-strict schema doesn't work (unsurprisingly).

So, the way how KVPairs evaluated is deeply tighten with how statements are evaluated too.

Probably we should think about these issue not like "forbid some behavior in this case", but more like "forbid using undefined identificators everywhere except where non-strict schema used".

@asdine
Copy link
Collaborator Author

asdine commented Oct 12, 2022

I think it might be easier and less expensive to do a static check only once during the preparation phase, right after the query has been parsed, rather than during the evaluation phase of KVPairs and other affected expressions.

Lifecycle of a query

First, here is a simplistic overview of the lifecycle of a query:

query string (ex: SELECT a FROM foo WHERE b > 1)
-> parsing (generates a statement object)
-> preparation phase (generates a stream object)
-> stream optimization
-> stream execution

During the preparation phase, we have a full view of the query right after it has been parsed, and we also have access to the table information (what is the schema, is it strict, etc.).
The other advantage is that any check done here is done only once, whereas the during the execution phase we might do checks for every document read from the stream.

Proposal

To do this we need to modify this SelectStmt and SelectCoreStmt structures (they are both in the same file): https://github.com/genjidb/genji/blob/487d4b4eec9331c86b4ead7d30aa4eb2e8b1e7f1/internal/query/statement/select.go#L146
https://github.com/genjidb/genji/blob/487d4b4eec9331c86b4ead7d30aa4eb2e8b1e7f1/internal/query/statement/select.go#L15

(to better understand the relationship between these two structures, see this schema in the documentation)

We then simply need to read the context received in the Prepare method and access the table information.
If the schema is strict, we use the expr.Walk function to walk over all the expressions of the statement, one by one.

Then we do the same to all other statements (UpdateStmt, DeleteStmt, etc.)

@Darkclainer
Copy link
Contributor

Darkclainer commented Aug 6, 2023

Hello, @asdine! Thanks for great explanation you provided earlier, that was a lot of help to me. We discussed this feature a while ago, but finally I have enough time to give it enough time for some completion.

I made some early draft only for select core case #504 and realized there are a lot of design questions that can affect the scope of PR (one or many) and because I'm by any means no maintainer I want to address them here and then continue to work on this feature, so if you are fine with this, there my thought on this feature.

  1. What should we do about named expressions? I have the feeling that they do not work at all and probably should be done first, but this issue can provide some backbone.

    UPDATE: I had felling that SQL have this feature, but seems I was very wrong, and WHERE not supposed to support aliased expressions. Interestingly this seems relatively easy to implement in Genji.

    CREATE TABLE t (a INT);
    INSERT INTO t (a) VALUES (1);
    
    # currently returns nothing, so this is probably correct, but current
    # validation can generate error to indicate that this is not allowed.
    SELECT a + 1 as b FROM t WHERE b > 1;
    # this works as expected
    SELECT a FROM t WHERE a + 1 > 1;
    

    Of course this generates a lot of interesting cases, for example with
    anonymous types:

    CREATE TABLE t ( a ( b INT ));
    INSERT INTO t (a) VALUES ({"b": 1});
    # this works
    SELECT a.b FROM t WHERE a.b > 0;
    # this don't
    SELECT a.b AS e FROM t WHERE e > 0;
    
  2. Should we do type validation? It certainly possible, but I feel like this is way out of scope and can be safely done later. I mean cases like this:

    CREATE TABEL t (a INT);
    
    # can generate error
    SELECT a == "string" from t;
    
    # also can generate error
    SELECT a FROM t WHERE a == "string";
    
  3. Probably validation error can be recognized in autocompletion? I didn't examine what feature provide autocompletion, but it can break it?

@asdine asdine modified the milestones: v0.16.0, v0.17.0 Nov 28, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants