chore: support timestamp subtractions #1346

Merged
sycai merged 13 commits into main from sycai_timestamp_diff
Feb 5, 2025

Conversation

Contributor

@sycai sycai commented Jan 31, 2025

This PR enables subtraction operations for Timestamp and datetime types.

We don't support mixing timestamp and datetime values in the same operation. Ibis doesn't allow it anyway.
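
For readers less familiar with the semantics, here is a minimal sketch using only Python's standard-library `datetime` module. It is an analogy, not the bigframes API: naive datetimes stand in for datetime values, timezone-aware ones stand in for timestamp values, and the helper name `mixed_subtraction_fails` is made up for illustration. Subtracting two values of the same kind gives a duration, while mixing the two kinds is rejected.

```python
from datetime import datetime, timedelta, timezone

# Two naive datetimes (used here as an analogy for datetime values).
naive_a = datetime(2025, 2, 5, 12, 0, 0)
naive_b = datetime(2025, 1, 31, 0, 0, 0)
naive_diff = naive_a - naive_b  # result is a timedelta

# Two timezone-aware datetimes (used here as an analogy for timestamp values).
aware_a = datetime(2025, 2, 5, 12, 0, 0, tzinfo=timezone.utc)
aware_b = datetime(2025, 1, 31, 0, 0, 0, tzinfo=timezone.utc)
aware_diff = aware_a - aware_b  # also a timedelta


def mixed_subtraction_fails() -> bool:
    """Return True if subtracting a naive value from an aware one raises."""
    try:
        aware_a - naive_b
    except TypeError:
        return True
    return False
```

Plain Python refuses to mix the two kinds as well, which matches the restriction described above.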

@product-auto-label product-auto-label bot added the "size: m" and "api: bigquery" labels Jan 31, 2025
@sycai sycai marked this pull request as ready for review January 31, 2025 00:59
@sycai sycai requested review from a team as code owners January 31, 2025 00:59
@sycai sycai requested a review from shobsi January 31, 2025 00:59
@sycai sycai requested review from tswast and TrevorBergeron and removed request for shobsi January 31, 2025 00:59
@sycai sycai changed the title from "chore: support timestamp subtractions" to "chore: support timestamp subtractions for series" Feb 4, 2025
@sycai sycai changed the title from "chore: support timestamp subtractions for series" to "chore: support timestamp subtractions" Feb 4, 2025
@@ -58,6 +58,7 @@ def compile_sql(
# TODO: get rid of output_ids arg
assert len(output_ids) == len(list(node.fields))
node = set_output_names(node, output_ids)
node = nodes.bottom_up(node, rewrites.op_dynamic_dispatch)
Contributor

I'm pretty sure this will need to be top-down rather than bottom-up

Contributor Author

Sure, though I think it shouldn't matter much, because all node schemas are already stable at this point.

Comment on lines 113 to 114
# Need to dispatch op before compilation to keep it consistent with the compile_sql() call
return self._compile_node(nodes.bottom_up(node, rewrites.op_dynamic_dispatch))
Contributor

Let's not run this on every node. Instead, let's revive the dead _preprocess helper and apply all the pre-transforms there to the entire tree before running compile_node on the root.

Contributor Author

SG. Moved the code to _preprocess
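
A rough sketch of the shape being suggested here, for anyone following along. Everything in it is hypothetical: `Node`, `rename_sub`, `PRE_TRANSFORMS`, `preprocess`, `compile_node` and `compile_root` are toy stand-ins, not the actual bigframes compiler classes. The point is only that the rewrites run once over the whole tree, and compilation then starts from the already rewritten root.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass(frozen=True)
class Node:
    """Toy plan node: a name plus child nodes (hypothetical)."""
    name: str
    children: Tuple["Node", ...] = ()


def bottom_up(node: Node, transform: Callable[[Node], Node]) -> Node:
    """Rewrite the children first, then apply the transform to the node."""
    new_children = tuple(bottom_up(child, transform) for child in node.children)
    return transform(Node(node.name, new_children))


def rename_sub(node: Node) -> Node:
    # Stand-in for a real rewrite such as operator dispatch.
    return Node("timestamp_diff", node.children) if node.name == "sub" else node


# All pre-transforms live in one place and run once per tree.
PRE_TRANSFORMS = (rename_sub,)


def preprocess(root: Node) -> Node:
    for transform in PRE_TRANSFORMS:
        root = bottom_up(root, transform)
    return root


def compile_node(node: Node) -> str:
    """Recursive compile step; it does not rewrite anything itself."""
    if not node.children:
        return node.name
    args = ", ".join(compile_node(child) for child in node.children)
    return f"{node.name}({args})"


def compile_root(root: Node) -> str:
    # Preprocess the entire tree once, then compile from the root.
    return compile_node(preprocess(root))
```

With this layout, `compile_node` can recurse freely without re-running the rewrites at every level.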

from bigframes.core.rewrite.order import pull_up_order
from bigframes.core.rewrite.slices import pullup_limit_from_slice, rewrite_slice

__all__ = [
    "legacy_join_as_projection",
    "try_row_join",
    "rewrite_slice",
    "op_dynamic_dispatch",
Contributor

I think something like "convert_duration_to_int" captures the high-level intent best.

Contributor Author

I named it "rewrite_timedelta_ops" to better indicate that we are replacing the operators, not the values.

Comment on lines +39 to +40
# TODO(b/394354614): FilterByNode and OrderNode also contain expressions. Need to update them too.
return root
Contributor

As long as we support those nodes before anybody starts using this!

Contributor Author

PR soon to follow!

Comment on lines +51 to +55
if isinstance(expr, ex.OpExpression):
    updated_inputs = tuple(
        map(lambda x: _rewrite_expressions(x, schema), expr.inputs)
    )
    return _rewrite_op_expr(expr, updated_inputs)
Contributor

I believe this will also need to be top-down rather than bottom-up.

Contributor Author

@sycai sycai Feb 5, 2025

I don't think it's possible to do this top-down, because we cannot get the input types by processing the parent node first. The parent node's output type can only be decided once we have rewritten all the subtrees.
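
To make the dependency concrete, here is a toy sketch. It is not the code in this PR: `Ref`, `Op`, `rewrite` and the string type names are hypothetical. The rewrite returns both the new expression and its output type, so a parent can only choose its operator after its inputs have been rewritten.

```python
from dataclasses import dataclass
from typing import Dict, Tuple, Union


@dataclass(frozen=True)
class Ref:
    """Reference to a column in the schema (hypothetical)."""
    column: str


@dataclass(frozen=True)
class Op:
    """Operator applied to input expressions (hypothetical)."""
    name: str
    inputs: Tuple["Expr", ...]


Expr = Union[Ref, Op]


def rewrite(expr: Expr, schema: Dict[str, str]) -> Tuple[Expr, str]:
    """Return the rewritten expression together with its output type."""
    if isinstance(expr, Ref):
        return expr, schema[expr.column]

    # Children first: their output types decide what the parent becomes.
    rewritten = [rewrite(child, schema) for child in expr.inputs]
    inputs = tuple(child for child, _ in rewritten)
    types = [dtype for _, dtype in rewritten]

    if expr.name == "sub":
        if types == ["timestamp", "timestamp"]:
            return Op("timestamp_diff", inputs), "timedelta"
        if types[0] == types[1]:
            return Op("sub", inputs), types[0]
    raise TypeError(f"unsupported operand types for {expr.name}: {types}")
```

For `(a - b) - (c - d)` over four timestamp columns, the two inner subtractions become `timestamp_diff` with a timedelta output, and only then is it known that the outer node is a plain subtraction of two timedeltas.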

if not dtypes.is_datetime_like(input_types[0]):
    raise TypeError("expected timestamp input")

return dtypes.TIMEDETLA_DTYPE
Contributor

typo

Contributor Author

Nice catch. I'm glad we haven't officially announced this feature

Comment on lines 808 to 816
  def sub(
      self, other: float | int | pandas.Timestamp | datetime.datetime | Series
  ) -> Series:
      return self._apply_binary_op(other, ops.sub_op)

- def rsub(self, other: float | int | Series) -> Series:
+ def rsub(
+     self, other: float | int | pandas.Timestamp | datetime.datetime | Series
+ ) -> Series:
      return self._apply_binary_op(other, ops.sub_op, reverse=True)
Contributor

We might want to consider giving up on annotating other allowed dtypes

Contributor Author

That makes sense. The operators themselves will perform the type check for us anyway.

Comment on lines 2089 to 2093
def _has_timestamp_type(input: typing.Any) -> bool:
    if isinstance(input, Series):
        return bigframes.dtypes.is_datetime_like(input.dtype)

    return isinstance(input, (pandas.Timestamp, datetime.datetime))
Contributor

dead code?

Contributor Author

removed

@sycai sycai requested a review from TrevorBergeron February 5, 2025 01:34
@sycai sycai enabled auto-merge (squash) February 5, 2025 18:59
@sycai sycai merged commit 86b7e72 into main Feb 5, 2025
21 of 23 checks passed
@sycai sycai deleted the sycai_timestamp_diff branch February 5, 2025 20:34
arwas11 pushed a commit that referenced this pull request Feb 6, 2025
* chore: support timestamp subtractions

* Fix format

* use tree rewrites to dispatch timestamp_diff operator

* add TODO for more node updates

* polish the code and fix typos

* fix comment

* add rewrites to compile_raw and compile_peek_sql
tswast pushed a commit that referenced this pull request Feb 6, 2025
* feat: add GeoSeries.from_xy

* add from_xy test and update ibis types

* update geoseries notebook with from_xy

* Update docstring example

* fix doctstring lint error

* return GeometryDtype() for all ibis geo types

* chore: support timestamp subtractions (#1346)

* chore: support timestamp subtractions

* Fix format

* use tree rewrites to dispatch timestamp_diff operator

* add TODO for more node updates

* polish the code and fix typos

* fix comment

* add rewrites to compile_raw and compile_peek_sql

* chore: add a tool to upload tpcds data to bigquery. (#1367)

* chore: add a tool to upload tpcds data to bigquery.

* update error type

* update docstring

---------

Co-authored-by: Shenyang Cai <[email protected]>
Co-authored-by: Huan Chen <[email protected]>
shuoweil pushed a commit that referenced this pull request Feb 6, 2025
* feat: add GeoSeries.from_xy

* add from_xy test and update ibis types

* update geoseries notebook with from_xy

* Update docstring example

* fix doctstring lint error

* return GeometryDtype() for all ibis geo types

* chore: support timestamp subtractions (#1346)

* chore: support timestamp subtractions

* Fix format

* use tree rewrites to dispatch timestamp_diff operator

* add TODO for more node updates

* polish the code and fix typos

* fix comment

* add rewrites to compile_raw and compile_peek_sql

* chore: add a tool to upload tpcds data to bigquery. (#1367)

* chore: add a tool to upload tpcds data to bigquery.

* update error type

* update docstring

---------

Co-authored-by: Shenyang Cai <[email protected]>
Co-authored-by: Huan Chen <[email protected]>
Labels
api: bigquery Issues related to the googleapis/python-bigquery-dataframes API. size: m Pull request size is medium.
3 participants