-
Notifications
You must be signed in to change notification settings - Fork 1.4k
OVER, PARTITION BY and WINDOW #2618
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This looks amazing! We are currently focused on getting Elixir v1.7 out but we plan to review this in a week or two. Thanks for this great work!
For MySQL, we should emit the correct SQL and then it will fail if they are using an old version. |
I want to go over this a little more in depth, but this all looks pretty great! Very excited about this feature. (Also hi everyone! I still read all the issues, I'm just constantly busy) |
lib/ecto/query/builder.ex
Outdated
@@ -266,6 +266,12 @@ defmodule Ecto.Query.Builder do | |||
{expr, params_acc} | |||
end | |||
|
|||
def escape({:row_number, _, [arg]}, type, params_acc, vars, env) do |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If row_number is a regular "function", you don't need this here. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I.e. you can just remove this function and the default transformations will apply.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I just remove this escape
function, we will get "is not a valid query expression."
row_number
is a kind of aggregation function, but it can be invoked only over a window, so we can't just mix it with regular aggregators. And there are a bunch of functions like this (rank
, dense_rank
, first_value
, etc) which also have to be implemented. Should I add a call_type
instead of escape
for them?
btw, do you have any plans to extend the list of aggregate functions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should I add a call_type instead of escape for them?
You are right. call_type
is the way to go here.
btw, do you have any plans to extend the list of aggregate functions?
@Anber we could add more functions but it would be nice to know how they are handled across databases. Which ones are supported by all databases (PG, MySQL and MSSQL)? Which ones are PG specific?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Which ones are supported by all databases
Maybe it can be made extendable? DB driver (or even user's code) can provide supported functions to the builder.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The thing is that we need the type information to be able to use in the query syntax correctly. So I think for now we should support what is common. Folks can use fragments for everything else.
Hi @Anber! I did some review of the codebase and of the feature and it looks great! The I have only two comments for now. You added this example:
This feature (interpolation) is actually really hard to implement. That's because if However, we do have another feature that is very similar: the
My suggestion is the following: remove |
test/ecto/adapters/postgres_test.exs
Outdated
|
||
test "count over unknown window" do | ||
query = Schema | ||
|> window([r], w as partition_by r.x) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The use as
as operator here is really smart but a bit foreign here since it is not an actual operator. My suggestion is the following:
windows(r, w: partition_by(r.x))
i.e. rename the macro to windows
and allow it to be a keyword list of windows names and their definitions. Similar to how order_by
itself works. Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried to find a clear syntax for WINDOW w AS …
, but I missed this obvious variant. Thank you!
@josevalim, fields = [:y, :z]
Schema |> select([r], count(r.x) |> over(partition_by(r.x, order_by: ^fields))) |
[" WINDOW " | | ||
intersperse_map(windows, ", ", fn | ||
{name, definition} -> | ||
[Atom.to_string(name), " AS ", partition_by(definition, sources, query)] end)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to use quoted_name
or similar instead of Atom.to_string(name)
to avoid SQL injection attacks.
end | ||
|
||
defp partition_by(%PartitionByExpr{fields: fields, order_bys: order_bys}, sources, %Query{} = query) do | ||
partition_query = %Query{ query | order_bys: order_bys } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No spaces after {
and before }
.
end | ||
|
||
defp expr({:over, _, [agg, name]}, sources, %Ecto.Query{ windows: windows } = query) when is_atom(name) do | ||
if Keyword.has_key?(windows, name) do |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This validation should happen in Ecto.Query.Planner. The adapter should always assume that query is correct (unless it doesn't support a particular construct).
defp expr({:over, _, [agg, name]}, sources, %Ecto.Query{ windows: windows } = query) when is_atom(name) do | ||
if Keyword.has_key?(windows, name) do | ||
aggregate = expr(agg, sources, query) | ||
[aggregate, " OVER ", Atom.to_string(name)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to quote the name
here too.
lib/ecto/query/builder.ex
Outdated
{{:{}, [], [:over, [], [aggregate, nil]]}, params_acc} | ||
end | ||
|
||
def escape({:over, _, [aggregate, {window_name, _, nil}]}, type, params_acc, vars, env) when is_atom(window_name) do |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am wondering... now that we changed to windows
, should we make the window name an atom here instead of a binding? So it would be over(count(p.foo), :window_name)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see that you support the atom clause below, so maybe it is a matter of getting rid of this one.
lib/ecto/query/builder.ex
Outdated
@@ -463,6 +489,8 @@ defmodule Ecto.Query.Builder do | |||
defp merge_fragments([h1], []), | |||
do: [{:raw, h1}] | |||
|
|||
@over_agg_1 ~w(row_number rank dense_rank percent_rank cume_dist)a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some of those return integers, don't they? for those, I think the correct return would be {:any, :integer}
. We will also need to document those as aggregates in Ecto.Query.API
.
def build(binding, expr, env) do | ||
{fields, opts} = escape(expr) | ||
{fields, {_, :acc}} = Builder.escape(fields, :any, {%{}, :acc}, binding, env) | ||
order_bys = order_by(binding, opts, env) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this could be Enum.map(opts, &order_by(&1, binding, env)
and we would be able to remove the loop from order_by
?
lib/ecto/query/builder/windows.ex
Outdated
Builder.apply_query(query, __MODULE__, [windows], env) | ||
end | ||
|
||
@spec apply(Ecto.Queryable.t, Tuple.t) :: Ecto.Query.t |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@spec
in the wrong place. :)
lib/ecto/query/builder/windows.ex
Outdated
@spec apply(Ecto.Queryable.t, Keyword.t) :: Ecto.Query.t | ||
def apply(%Ecto.Query{} = query, definitions) do | ||
definitions | ||
|> Enum.reduce(query, &apply_window/2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer something like:
def apply(%{windows: windows} = query, definitions) do
%{query | windows: windows ++ Enum.map(definitions, &validate_window!(&1, windows))}
end
Otherwise we are accessing the query and prepending the list of windows multiple times.
lib/ecto/query/inspect.ex
Outdated
@@ -210,6 +238,18 @@ defimpl Inspect, for: Ecto.Query do | |||
{:type, [], [value, tag]} |> expr(names, part) | |||
end | |||
|
|||
defp expr_to_string({:over, _, [agg, nil]}, _string, names, part) do | |||
"#{expr(agg, names, part)} |> over" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't use pipes when inspecting.
lib/ecto/query/inspect.ex
Outdated
|
||
defp partition_by(%{ order_bys: [] } = definition, names) do | ||
fields = expr(definition, names) | ||
"partition_by #{fields}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use parens around partition_by
here as we always use parens on inspect.
test/ecto/adapters/postgres_test.exs
Outdated
|> select([r], r.x) | ||
|> windows([r], w: partition_by r.x) | ||
|
||
# FIXME: move to the builder specs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this should be moved the planner tests. :)
Hi @Anber! I have done a more careful review now. They are mostly minor details but there is one big issue: you have not changed In order to make this work, there are a couple things you need to have in mind:
Although I wrote them as 1 -> 2 -> 3, probably the best order of tackling them is 3 -> 2 -> 1, as you need 3 to build 2 and so on. I am really excited about this! If you have any questions, please let me know! |
I have updated the list above. :) |
It turned out to be a bit more complicated than I thought :) |
lib/ecto/query/builder.ex
Outdated
dense_rank: [:integer], | ||
percent_rank: [:float], | ||
cume_dist: [:float], | ||
ntile: [:integer, :integer], # TODO: what should we do with functions which require exactly integer type? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @josevalim! I have some questions here.
The first question is about integer
and bigint
. In some cases, they are interchangeable but if function requires integer
we can't just cast a value to bigint
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Anber we don't make a distinction from this side of things, so just :integer
should be enough.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't, but queries like Schema |> select([r], nth_value(r.x, "42") |> over)
throw a error function nth_value(integer, bigint) does not exist
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right. You need to cast or pass a proper value. There is not much we can do here because Postgres is thinking that it is expecting a bigint.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that we can't do anything here right now, but maybe there is a possibility to add more strict type system in the future?
lib/ecto/query/builder.ex
Outdated
@@ -282,6 +309,17 @@ defmodule Ecto.Query.Builder do | |||
{{:{}, [], [:coalesce, [], [left, right]]}, params_acc} | |||
end | |||
|
|||
def escape({:over, _, [aggregate | args]}, type, {params, acc}, vars, env) do | |||
{aggregate, {params, acc}} = escape(aggregate, type, {params, Map.put(acc, :over, true) }, vars, env) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The second question is about acc
argument of escape/5
. I want to know whether window function is called inside over
or not. So, I use acc
here for storing a flag which shows that we are in over
scope.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Before we get to do that: why do you need to know if the window function is called inside over
or not? :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because window function without over
can't be called? :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't need to check this. Let the SQL engine fail later on. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not? We can prevent runtime error just by adding a few lines of code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this is the goal, then it is better to not mixture those functions with escape_call
at all and explicitly check for them in the over
clause. We cannot use the accumulator because the accumulator can be of any type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It can't be checked in over
clause, because there is no over
in case of error :)
I don't see any usages of acc
. How and when it can be used? Maybe I can just override acc
for over
branch of AST and restore it to the original value before exit from escape(:over, …)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It can't be checked in over clause, because there is no over in case of error :)
What I meant is that we will only traverse those functions if we are inside over. So over should not call escape
on the first argument, it should rather call escape_window_function
or something similar.
I don't see any usages of acc. How and when it can be used?
@Anber the accumulator is used by select
to collect take/2 calls. So please don't change the semantics of the accumulator. It is used in some places to collect information about the AST being traversed. Even if it is not an issue now, it may be an issue in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, it's not a context. I've got it. Also, your solution with escape_window_function
sounds better. Thank you :)
lib/ecto/query/builder.ex
Outdated
escape_call(call, List.duplicate(type, arity), params, vars, env) | ||
end | ||
|
||
defp escape_call({name, _, args}, types, params, vars, env) when is_list(types) and length(types) >= length(args) do |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have also added support of function with different arity. If you approve it, we can replace tuples here to lists like [out_type | in_types]
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Anber I don't think we should generalize the length(types) >= length(args)
as we are getting a behaviour specific to two functions and generalizing it to the whole codebase. This is bound to cause issues further down the road. I think you should move the zip logic to before you call escape_call.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In fact, I would prefer if for now we only added row_number
and the *_rank
functions, treat them as aggregators and keep this simple for now. Otherwise there is a chance for this PR to only continue growing in size, making it harder to review and merge. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, it doesn't matter how many window functions we will implement, because even row_number
has zero arity and it can't be implemented by using current call_type
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Anber I see. So maybe we should do this:
-
Change call_type to return
{[in], out}
-
Have a specific annotation that tells all functions with variadic arity. For example
@varargs [:lag, :lead, :row_number]
What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Modification of call_type
definitely looks better than my solution, but I didn't want to touch it without your blessing :)
Maybe I can just modify my @over_agg
and use it in call_type
?
defp call_type(agg, 1) when agg in @over_agg_names, do: Map.get(@over_agg, agg)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Anber that's a linear cost. We should implement the proper call_type/2
clauses and not rely on the map.
However, given the previous discussion about over+window functions, maybe we shouldn't use call_type
at all and keep them completely distinct, as they seem to have a bunch of behaviour that is specific to them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that's a linear cost
And what if we unfold call_type on compile-time?
for {agg, types} <- @over_agg do
defp call_type(unquote(agg), 1), do: unquote(types)
end
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's fine but given the solution above, I think we won't be overriding call_type anyway, will we?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like we won't.
Thank you for your patience!
Hi @josevalim!
I hope I have fixed it, but I don't like how I escape windows. Typically macros are injected to the AST on escape stage for expanding to |
@Anber looks great to me! Now we need to go ahead with MySQL support and docs. :) |
Done! |
Hi @josevalim |
@Anber shall we discuss it in another issue? I also think this behaviour may been fixed on master. Btw, we should merge this PR next week, thanks! :) |
@josevalim It definitely will be another issue and another PR :) |
Hi @josevalim |
@Anber it is blocking on us right now. No worries. :) |
@Anber damn you. This is awesome PR and obsoletes all my work that I have done on supporting window functions via fragments. Now I need to focus on adding grouping sets to Ecto. |
@hauleth btw, I haven't implemented |
Hi @josevalim! |
eb00b50
to
3306b30
Compare
I have started working on this on a branch and it should be merged later today. :) |
{expr_cache, {params, cacheable?}} = | ||
Enum.map_reduce exprs, {params, true}, fn {_, expr}, {params, cacheable?} -> | ||
{params, current_cacheable?} = cast_and_merge_params(:windows, query, expr, params, adapter) | ||
{expr_to_cache(expr), {params, cacheable? and current_cacheable?}} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Key must be part of the cache.
@@ -1023,6 +1049,29 @@ defmodule Ecto.Query.Planner do | |||
{type, [expr | fields], from} | |||
end | |||
|
|||
# OVER () | |||
defp collect_fields({:over, _, [call, nil]} = expr, fields, from, query, take) do |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should probably make this arity 1.
escape(quote(do: lag(:a, 1)), [], __ENV__) | ||
end | ||
|
||
assert_raise Ecto.Query.CompileError, ~r"window function lag/0 is undefined.", fn -> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove trailing dot.
assert all(query) == ~s{SELECT count(s0."x") OVER "w2" FROM "schema" AS s0 WINDOW "w1" AS (PARTITION BY s0."a"), "w2" AS (PARTITION BY s0."x" ORDER BY s0."a", s0."b" DESC)} | ||
end | ||
|
||
test "count over unknown window" do |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be mover to the planner suite.
assert all(query) == ~s{SELECT count(s0.`x`) OVER `w2` FROM `schema` AS s0 WINDOW `w1` AS (PARTITION BY s0.`a`), `w2` AS (PARTITION BY s0.`x` ORDER BY s0.`a`, s0.`b` DESC)} | ||
end | ||
|
||
test "count over unknown window" do |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be mover to the planner suite.
Thank you @Anber! I have added some notes but I can tackle them locally as they are minor. Amazing work overall! |
❤️ 💚 💙 💛 💜 |
Hi @Anber! I have noticed that SQL databases allow you to specify
This looks very nice with over:
So if you want to go ahead and add support for ranges, the infrastructure is there and we can start this discussion. The only thing left from this PR is to allow dynamic fields in |
Hi @josevalim! It looks like you've rewritten almost whole my commit :) So, now I need some time to rewrite our application from fork to master. It will take some hours and if everything is ok, I will start to implement ranges. |
Hi
PR implements window functions for PostgreSQL.
It can be used instead of fragments for the more clear and flexible solution of problems like coderplanets/coderplanets_server#16 or absinthe-graphql/absinthe_relay#100 (comment)
MySQL also supports window functions since 8.0, but we don't know version at compile time. So I can add the same implementation, but it will throw runtime errors if a user uses MySQL <8, or I can add compile-time errors like "… is not supported by MySQL".
If you are ok with this feature, I will add documentation and some more unit tests.
:windows