
Postgres: force generic plan for better nullability inference. #3541

Open
wants to merge 6 commits into base: main
Conversation

joeydewaal
Contributor

@joeydewaal joeydewaal commented Oct 3, 2024

Currently, when using the query! and query_as! macros, sqlx runs `EXPLAIN (VERBOSE, FORMAT JSON) EXECUTE {statement_id}` to get the query plan of a prepared statement/query. It then looks through the plan and tries to make the type inference more accurate.

If a prepared statement has query parameters, they need to be filled in with valid arguments, and sqlx uses NULL for every one because NULL is always a valid argument.

This causes the query planner to return a plan that is optimized for the case where all parameters are NULL, which is not always correct.

This PR forces Postgres to use a generic execution plan when describing a prepared statement with parameters. That way the planner won't make optimizations based on the given (NULL) parameter values.

Fixes #3539, #1265, #2796 (comment), #3408, #2127
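As an illustration (a hypothetical sketch, not the exact statements sqlx emits — the table and query are made up), the difference looks like this:

```sql
-- Hypothetical example of the problem and the fix.
PREPARE q (int) AS
    SELECT name FROM users WHERE id = $1;

-- Describing with NULL lets the planner specialize for "id = NULL",
-- which can fold parts of the plan away and distort nullability analysis:
EXPLAIN (VERBOSE, FORMAT JSON) EXECUTE q(NULL);

-- Forcing a generic plan makes the planner ignore the concrete parameter
-- values (plan_cache_mode is available since PostgreSQL 12):
SET plan_cache_mode = force_generic_plan;
EXPLAIN (VERBOSE, FORMAT JSON) EXECUTE q(NULL);
```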

@joeydewaal joeydewaal changed the title Postgres: improve query and query_as type inference. Postgres: force generic plan for better nullability inference. Oct 5, 2024
@abonander
Collaborator

I think providing a constant value may end up with the same problem in different queries.

I'm wondering if at a certain point, it would be easier to parse the SQL, just to get an idea of its structure. We'd still want to use the database for type analysis though.

@joeydewaal
Contributor Author

I think providing a constant value may end up with the same problem in different queries.

I'm wondering if at a certain point, it would be easier to parse the SQL, just to get an idea of its structure. We'd still want to use the database for type analysis though.

I just found out about `plan_cache_mode = force_generic_plan`, which gets the same result without providing a placeholder value for each type.

@abonander
Collaborator

abonander commented Oct 5, 2024

That sounds like it has potential, yeah. Reducing the planning overhead might help with compile times a bit too.

Make sure to do SET LOCAL so it doesn't persist for the lifetime of the connection (you'll need to do it in a transaction). describe() can be called at runtime too.
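A minimal sketch of the transaction-scoped setting described above (assuming the describe logic can wrap its statements in a transaction; `stmt` is a placeholder for the prepared statement being described):

```sql
BEGIN;
-- SET LOCAL only lasts until the end of the enclosing transaction,
-- so the setting does not persist for the lifetime of the connection.
SET LOCAL plan_cache_mode = force_generic_plan;
EXPLAIN (VERBOSE, FORMAT JSON) EXECUTE stmt(NULL);
ROLLBACK;
```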

@joeydewaal joeydewaal marked this pull request as ready for review October 5, 2024 20:26
@abonander
Collaborator

Technically breaking since it can change the output of analysis.

Also kind of why I'm starting to think maybe we should parse the SQL. The output of EXPLAIN isn't meant to be stable at all.

@joeydewaal
Contributor Author

I think that's going to make the macros quite a bit slower, though.

@abonander
Collaborator

It'd be on-par with parsing the EXPLAIN output, and one less round-trip to the server (which is going to be where most of the overhead is anyway).

@joeydewaal
Contributor Author

It'd be on-par with parsing the EXPLAIN output, and one less round-trip to the server (which is going to be where most of the overhead is anyway).

I had some time on my hands over the last few weeks, so I thought I'd experiment a bit with parsing the SQL myself.
Currently sqlx needs to:

  1. Prepare the query.
  2. Fetch the attnotnull option.
  3. Get the query plan from the database (for Postgres).

Which is 3 queries per macro invocation.
Steps 2 and 3 combined take roughly 750μs-3ms from what I've tested (Postgres on localhost).
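For context, step 2 boils down to a catalog lookup along these lines (an illustrative query, not necessarily the exact one sqlx runs):

```sql
-- Look up attnotnull for a column a prepared statement returns,
-- given the table OID and column number from the RowDescription.
SELECT attnotnull
FROM pg_attribute
WHERE attrelid = $1   -- table OID
  AND attnum   = $2;  -- column number within the table
```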

With my implementation:

  1. Prepare the query.
  2. Fetch the database tables and cols (1 query that can be cached.)
  3. Parse and go through the query.

So that's 1 query per macro invocation, plus 1 query that runs once for all the macros (per database).
My implementation takes between 10μs-250μs in debug mode (depending on how big the query is).

So it's quite a bit faster, and it also gives a lot more control over nullability, which makes the macros more correct. My implementation is, however, incomplete and far from ready: it doesn't support schemas, and the integration with sqlx's query parameters is difficult, but it's enough to experiment a bit.

So it's definitely feasible, but a lot of work to get right.
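A toy sketch of the cached-metadata approach from step 2 above (all names are hypothetical, and the real implementation walks a parsed query, which is omitted here — this only shows the per-database cache being consulted):

```rust
use std::collections::HashMap;

// Hypothetical sketch: cache column nullability per (table, column),
// fetched once per database, then consult it while walking a parsed query.
#[derive(Default)]
struct ColumnCache {
    // Mirrors pg_attribute.attnotnull for each known column.
    not_null: HashMap<(String, String), bool>,
}

impl ColumnCache {
    fn insert(&mut self, table: &str, column: &str, not_null: bool) {
        self.not_null.insert((table.into(), column.into()), not_null);
    }

    // A column is nullable if attnotnull is false, or if it comes from
    // the nullable side of an outer join (`outer_joined`), which is the
    // case EXPLAIN-based analysis gets wrong in issues like #3539.
    fn is_nullable(&self, table: &str, column: &str, outer_joined: bool) -> bool {
        let not_null = self
            .not_null
            .get(&(table.into(), column.into()))
            .copied()
            .unwrap_or(false); // unknown columns are assumed nullable
        outer_joined || !not_null
    }
}

fn main() {
    let mut cache = ColumnCache::default();
    cache.insert("users", "id", true);

    // A NOT NULL column is non-nullable in a plain select...
    assert!(!cache.is_nullable("users", "id", false));
    // ...but nullable when it comes from the outer side of a LEFT JOIN.
    assert!(cache.is_nullable("users", "id", true));
}
```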

Link to repo
Implementation in sqlx

Successfully merging this pull request may close these issues.

Left join with a WHERE condition falsely marks columns as non-nullable