Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore: support addition between a timestamp and a timedelta #1369

Merged
merged 20 commits into from
Feb 10, 2025

Conversation

sycai
Copy link
Contributor

@sycai sycai commented Feb 6, 2025

No description provided.

@product-auto-label product-auto-label bot added size: l Pull request size is large. api: bigquery Issues related to the googleapis/python-bigquery-dataframes API. labels Feb 6, 2025
@sycai sycai marked this pull request as ready for review February 6, 2025 17:48
@sycai sycai requested review from a team as code owners February 6, 2025 17:48
@sycai sycai requested a review from tswast February 6, 2025 17:48
@@ -402,7 +404,12 @@ def literal_to_ibis_scalar(
return bigframes_vendored.ibis.null()

scalar_expr = bigframes_vendored.ibis.literal(literal)
if ibis_dtype:
if isinstance(literal, datetime.timedelta):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we allow pandas, numpy, and pyarrow scalars here, too?

Suggested change
if isinstance(literal, datetime.timedelta):
if isinstance(literal, (datetime.timedelta, numpy.timedelta64, pandas.Timedelta, pyarrow.DurationScalar)):

Copy link
Contributor Author

@sycai sycai Feb 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure! Pandas timedelta is a subclass of datetime.timedelta, but I can make it more explicit.

Added support for numpy.

I eventually decided not to implement pyarrow duration because:

  • It causes more unrelated failures than I expected in Python 3.9 and 3.10 envs for some reasons.
  • Pandas does not support adding pyarrow duration literals to series

Maybe we can add support for pyarrow in the future, but at this moment it may not be worth it

def timedelta_to_micros(td: typing.Union[pd.Timedelta, datetime.timedelta]) -> int:
if isinstance(td, pd.Timedelta):
# td.value returns total nanoseconds.
return td.value // 1000
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like pandas.Timedelta has units: https://pandas.pydata.org/docs/reference/api/pandas.Timedelta.html

Please make this robust to various units.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that is for the constructor arg, but as per the docs, "The .value attribute is always in ns."

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. Pandas always converts the input to values in nanoseconds, no matter what unit we use in the constructor.

@@ -578,6 +580,8 @@ def _infer_dtype_from_python_type(type: type) -> Dtype:
return DATE_DTYPE
if issubclass(type, datetime.time):
return TIME_DTYPE
if issubclass(type, datetime.timedelta):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Likewise, do we need numpy, pandas, and pyarrow object detection here too?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Left the PyArrow out for this one.

@sycai sycai force-pushed the sycai_timestamp_add branch from 146c690 to c680948 Compare February 7, 2025 04:23
@sycai sycai force-pushed the sycai_timestamp_add branch from c9a5021 to b5b69cc Compare February 7, 2025 17:21
Comment on lines 420 to 426
if isinstance(
literal,
(datetime.timedelta, pd.Timedelta, numpy.timedelta64),
):
# numpy timedelta is compatible with Ibis, so we process them separately.
return bigframes_vendored.ibis.literal(
utils.timedelta_to_micros(literal), ibis_dtype
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why don't we just replace literal expressions (as well as column defs in leaves) in the tree when we replace all the ops?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call. I moved the conversion to the rewrite module

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am going to rename the module rewrite.operators to something like rewrite.timedelta_expression to better reflect its behavior, perhaps in another PR

Comment on lines +745 to +747
@scalar_op_compiler.register_binary_op(ops.timestamp_add_op)
def timestamp_add_op_impl(x: ibis_types.TimestampValue, y: ibis_types.IntegerValue):
return x + y.to_interval("us")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does this syntax look like from this? Is it a clean TIMESTAMP_ADD(timestamp_col, INTERVAL duration MICROSECOND)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes https://cloud.google.com/bigquery/docs/reference/standard-sql/timestamp_functions#timestamp_add

Though this has less to do with SQL syntax than the ease of wiring code. Ibis does support interval + timestamp and timestamp + interval, but that means I also need to check which side of the input is integer here.

If I can standardize the order of types in the rewrite module, then I don't need to check ibis type here :)

@sycai sycai requested a review from tswast February 7, 2025 18:34
@sycai sycai requested a review from TrevorBergeron February 7, 2025 18:34
@sycai sycai merged commit b598aa8 into main Feb 10, 2025
22 of 23 checks passed
@sycai sycai deleted the sycai_timestamp_add branch February 10, 2025 23:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: bigquery Issues related to the googleapis/python-bigquery-dataframes API. size: l Pull request size is large.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants