Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

clib.conversion._to_numpy: Add tests for Python sequence of datetime-like objects #3758

Merged
merged 13 commits into from
Jan 13, 2025

Conversation

seisman
Copy link
Member

@seisman seisman commented Jan 9, 2025

Description of proposed changes

This PR adds tests for Python sequence (list or tuple) of datetime-like objects, which usually can't be converted to np.datetime64 dtype directly by np.ascontiguousarray.

Tested datetime-like objects include:

  • ISO8601 strings
  • Python datetime/date objects
  • np.datetime64
  • pd.Timestamp
  • mixed types

Need to need that, this PR still can't convert a sequence of pyarrow timestamp to np.datetime64, as shown below:

In [1]: import numpy as np

In [2]: import pyarrow as pa

In [3]: import datetime

In [4]: array =  [
   ...:     pa.scalar(datetime.datetime(2018, 1, 1)),
   ...:     pa.scalar(datetime.datetime(2018, 2, 1)),
   ...:     pa.scalar(datetime.datetime(2018, 3, 1)),
   ...:     pa.scalar(datetime.datetime(2018, 4, 1, 1, 2, 3)),
   ...: ]

In [5]: array
Out[5]: 
[<pyarrow.TimestampScalar: '2018-01-01T00:00:00.000000'>,
 <pyarrow.TimestampScalar: '2018-02-01T00:00:00.000000'>,
 <pyarrow.TimestampScalar: '2018-03-01T00:00:00.000000'>,
 <pyarrow.TimestampScalar: '2018-04-01T01:02:03.000000'>]

In [6]: np.ascontiguousarray(array)
Out[6]: 
array([<pyarrow.TimestampScalar: '2018-01-01T00:00:00.000000'>,
       <pyarrow.TimestampScalar: '2018-02-01T00:00:00.000000'>,
       <pyarrow.TimestampScalar: '2018-03-01T00:00:00.000000'>,
       <pyarrow.TimestampScalar: '2018-04-01T01:02:03.000000'>],
      dtype=object)

In [7]: np.ascontiguousarray(array, dtype=np.datetime64)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[7], line 1
----> 1 np.ascontiguousarray(array, dtype=np.datetime64)

ValueError: Could not convert object to NumPy datetime

Preview for affected tutorial: https://pygmt-dev--3758.org.readthedocs.build/en/3758/tutorials/advanced/date_time_charts.html#using-python-s-datetime

@weiji14
Copy link
Member

weiji14 commented Jan 9, 2025

Can confirm that this patch should fix the incorrect tutorial plot reported in #3757 (comment). See https://pygmt-dev--3758.org.readthedocs.build/en/3758/tutorials/advanced/date_time_charts.html#using-python-s-datetime

image

However, it doesn't fix the list[pd.Timestamp] bug originally reported in #3757 (comment)? Edit: Sorry, it does fix the bug, accidentally checked out the wrong branch.

pygmt/tests/test_clib_to_numpy.py Show resolved Hide resolved
pygmt/tests/test_clib_to_numpy.py Show resolved Hide resolved
pygmt/tests/test_clib_to_numpy.py Outdated Show resolved Hide resolved
@seisman seisman added the maintenance Boring but important stuff for the core devs label Jan 9, 2025
@seisman seisman changed the title clib.conversion._to_numpy: Add tests for arrays with Python built-in datetimes or ISO-strings clib.conversion._to_numpy: Add tests for arrays with Python sequence of datetime-like objects Jan 9, 2025
@seisman seisman changed the title clib.conversion._to_numpy: Add tests for arrays with Python sequence of datetime-like objects clib.conversion._to_numpy: Add tests for Python sequence of datetime-like objects Jan 9, 2025
@seisman seisman marked this pull request as ready for review January 9, 2025 02:51
@seisman
Copy link
Member Author

seisman commented Jan 9, 2025

This PR is now ready for review. I feel we should split this PR into two small PRs, one labelled as "bug", and another one labeled as "maintenance".

I guess we also need to release v0.14.1.

@weiji14
Copy link
Member

weiji14 commented Jan 9, 2025

I'm looking into why our unit tests didn't catch the bug at #3757, these lines in test_plot_datetime actually uses a list[datetime.datetime] object:

# the Python built-in datetime and date
x = [datetime.date(2018, 1, 1), datetime.datetime(2019, 1, 1)]
y = [8.5, 9.5]
fig.plot(x=x, y=y, style="i0.2c", pen="1p")

Maybe we should extend the unit test to include pandas.Timestamp? E.g. adding these lines:

    # list of pandas.Timestamp
    x = [pd.Timestamp("2020-1-1"), pd.Timestamp("2021-1-1")]
    y = [10.5, 11.5]
    fig.plot(x=x, y=y, style="h0.2c", pen="1p")

I guess we also need to release v0.14.1.

Yes, wish we hadn't merged some of those deprecation PRs yet 🙂 We could either revert those, or branch off of commit b1c984a to make a release perhaps? Maybe start a new issue for this.

@seisman
Copy link
Member Author

seisman commented Jan 9, 2025

Here is the callstack: virtualfile_from_vectors->vectors_to_arrays->_to_numpy.

The issue is that, any datetime-like Python sequences will be converted to np.object_ dtype, then the following codes convert np.object_ to np.str_:

# Check if a np.object_ array can be converted to np.str_.
if array.dtype == np.object_:
with contextlib.suppress(TypeError, ValueError):
return np.ascontiguousarray(array, dtype=np.str_)

So, we're actually passing datetime strings into GMT. In the test_plot_datetime test, x = [datetime.date(2018, 1, 1), datetime.datetime(2019, 1, 1)] is converted to strings array ["2018-01-01", "2019-01-01"], which can be recognized by GMT, but in our docs and the pd.timestamp case, the array is converted to a string array with whitespaces (e.g., ['2024-12-01 00:00:00', '2024-12-15 00:00:00']), so it can't be recognized by GMT.

@seisman
Copy link
Member Author

seisman commented Jan 9, 2025

Maybe we should extend the unit test to include pandas.Timestamp? E.g. adding these lines:

    # list of pandas.Timestamp
    x = [pd.Timestamp("2020-1-1"), pd.Timestamp("2021-1-1")]
    y = [10.5, 11.5]
    fig.plot(x=x, y=y, style="h0.2c", pen="1p")

See the explanations above. As long as _to_numpy can convert a datetime-like object into np.datetime64 array correctly, PyGMT should be able to plot/process it. So, I think the newly added tests are enough.

@seisman seisman added this to the 0.15.0 milestone Jan 9, 2025
@seisman seisman added the needs review This PR has higher priority and needs review. label Jan 9, 2025
@seisman
Copy link
Member Author

seisman commented Jan 9, 2025

This PR is now ready for review. I feel we should split this PR into two small PRs, one labelled as "bug", and another one labeled as "maintenance".

I've opened a separate PR at #3760 and also changed the base branch to fix/datetime.

@seisman seisman changed the base branch from main to fix/datetime January 9, 2025 11:15
@seisman seisman added the skip-changelog Skip adding Pull Request to changelog label Jan 9, 2025
@weiji14 weiji14 modified the milestones: 0.15.0, 0.14.1 Jan 9, 2025
Copy link
Contributor

github-actions bot commented Jan 10, 2025

Summary of changed images

This is an auto-generated report of images that have changed on the DVC remote

Status Path
modified pygmt/tests/baseline/test_plot_datetime.png

Image diff(s)

Added images

Modified images

Path Old New
test_plot_datetime.png

Report last updated at commit 8b8767e

@seisman seisman added final review call This PR requires final review and approval from a second reviewer and removed needs review This PR has higher priority and needs review. labels Jan 11, 2025
Base automatically changed from fix/datetime to main January 13, 2025 00:11
@seisman seisman removed the final review call This PR requires final review and approval from a second reviewer label Jan 13, 2025
@seisman seisman merged commit 900e8d4 into main Jan 13, 2025
16 checks passed
@seisman seisman deleted the to_numpy/datetime branch January 13, 2025 00:22
seisman added a commit that referenced this pull request Jan 17, 2025
seisman added a commit that referenced this pull request Jan 21, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
maintenance Boring but important stuff for the core devs skip-changelog Skip adding Pull Request to changelog
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants