-
Notifications
You must be signed in to change notification settings - Fork 43
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
chore: support timestamp subtractions #1346
Conversation
bigframes/core/compile/compiler.py
Outdated
@@ -58,6 +58,7 @@ def compile_sql( | |||
# TODO: get rid of output_ids arg | |||
assert len(output_ids) == len(list(node.fields)) | |||
node = set_output_names(node, output_ids) | |||
node = nodes.bottom_up(node, rewrites.op_dynamic_dispatch) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm pretty sure this will need to be top-down rather than bottom-up
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, though I think it shouldn't matter much, because all node schemas are already stable at this point.
bigframes/core/compile/compiler.py
Outdated
# Need to dispatch op before compilation to keep it consistent with the compile_sql() call | ||
return self._compile_node(nodes.bottom_up(node, rewrites.op_dynamic_dispatch)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lets not run this on every node, instead, lets revive the dead _preprocess
helper and apply all the pre-transforms there to the entire tree before running compile_node
on the root
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SG. Moved the code to _preprocess
bigframes/core/rewrite/__init__.py
Outdated
from bigframes.core.rewrite.order import pull_up_order | ||
from bigframes.core.rewrite.slices import pullup_limit_from_slice, rewrite_slice | ||
|
||
__all__ = [ | ||
"legacy_join_as_projection", | ||
"try_row_join", | ||
"rewrite_slice", | ||
"op_dynamic_dispatch", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think something like "convert_duration_to_int" capture the high level intent best
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I named it "rewrite_timedelta_ops" to better indicate that we are replacing the operators, not the values.
# TODO(b/394354614): FilterByNode and OrderNode also contain expressions. Need to update them too. | ||
return root |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as long as we get support those nodes before anybody starts using this!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PR soon to follow!
if isinstance(expr, ex.OpExpression): | ||
updated_inputs = tuple( | ||
map(lambda x: _rewrite_expressions(x, schema), expr.inputs) | ||
) | ||
return _rewrite_op_expr(expr, updated_inputs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe this will also need to be top-down rather than bottom-up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it's possible to do this top-down, because we cannot get the input types by first processing the parent node. The parent node output type can only be decided once we have rewrite all the subtrees.
bigframes/operations/datetime_ops.py
Outdated
if not dtypes.is_datetime_like(input_types[0]): | ||
raise TypeError("expected timestamp input") | ||
|
||
return dtypes.TIMEDETLA_DTYPE |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice catch. I'm glad we haven't officially announced this feature
bigframes/series.py
Outdated
def sub( | ||
self, other: float | int | pandas.Timestamp | datetime.datetime | Series | ||
) -> Series: | ||
return self._apply_binary_op(other, ops.sub_op) | ||
|
||
def rsub(self, other: float | int | Series) -> Series: | ||
def rsub( | ||
self, other: float | int | pandas.Timestamp | datetime.datetime | Series | ||
) -> Series: | ||
return self._apply_binary_op(other, ops.sub_op, reverse=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We might want to consider giving up on annotating other
allowed dtypes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It makes sense. The operators themselves will perform type check for us anyway.
bigframes/series.py
Outdated
def _has_timestamp_type(input: typing.Any) -> bool: | ||
if isinstance(input, Series): | ||
return bigframes.dtypes.is_datetime_like(input.dtype) | ||
|
||
return isinstance(input, (pandas.Timestamp, datetime.datetime)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dead code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed
* chore: support timestamp subtractions * Fix format * use tree rewrites to dispatch timestamp_diff operator * add TODO for more node updates * polish the code and fix typos * fix comment * add rewrites to compile_raw and compile_peek_sql
* feat: add GeoSeries.from_xy * add from_xy test and update ibis types * update geoseries notebook with from_xy * Update docstring example * fix doctstring lint error * return GeometryDtype() for all ibis geo types * chore: support timestamp subtractions (#1346) * chore: support timestamp subtractions * Fix format * use tree rewrites to dispatch timestamp_diff operator * add TODO for more node updates * polish the code and fix typos * fix comment * add rewrites to compile_raw and compile_peek_sql * chore: add a tool to upload tpcds data to bigquery. (#1367) * chore: add a tool to upload tpcds data to bigquery. * update error type * update docstring --------- Co-authored-by: Shenyang Cai <[email protected]> Co-authored-by: Huan Chen <[email protected]>
* feat: add GeoSeries.from_xy * add from_xy test and update ibis types * update geoseries notebook with from_xy * Update docstring example * fix doctstring lint error * return GeometryDtype() for all ibis geo types * chore: support timestamp subtractions (#1346) * chore: support timestamp subtractions * Fix format * use tree rewrites to dispatch timestamp_diff operator * add TODO for more node updates * polish the code and fix typos * fix comment * add rewrites to compile_raw and compile_peek_sql * chore: add a tool to upload tpcds data to bigquery. (#1367) * chore: add a tool to upload tpcds data to bigquery. * update error type * update docstring --------- Co-authored-by: Shenyang Cai <[email protected]> Co-authored-by: Huan Chen <[email protected]>
This PR enables subtraction operations for for Timestamp and datetime types.
We don't support mix-match timestamp and datetime values in the same operations. It's not allowed in Ibis anyway.