DOC Investigate scipy-doctest for more convenient doctests #29027

Open
lesteve opened this issue May 16, 2024 · 7 comments · May be fixed by #30496

Comments

@lesteve
Member

lesteve commented May 16, 2024

I learned about the recent scipy-doctest release from the Scientific Python Discourse announcement. Apparently, scipy-doctest has been used internally in numpy and scipy for doctests for some time. In particular, it allows floating-point comparisons.

After a bit of work on our side to set everything up, this would give us a few sprint / good first issues.

There are quite a few places where we use the doctest ellipsis; the following quick-and-dirty regexp finds 595 lines:

git grep -P '\d+\.\.\.' | wc -l

If you are not sure what I am talking about, this is the ... that doctest accepts in the expected output of docstring examples, e.g. in the last line of this snippet:

>>> from sklearn import svm, datasets
>>> from sklearn.model_selection import cross_val_score
>>> X, y = datasets.load_iris(return_X_y=True)
>>> clf = svm.SVC(random_state=0)
>>> cross_val_score(clf, X, y, cv=5, scoring='recall_macro')
array([0.96..., 0.96..., 0.96..., 0.93..., 1.        ])
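To illustrate why the ellipsis is fragile, here is a minimal sketch using only the standard-library doctest module. The output strings are made up for illustration (shortened from the snippet above); only `doctest.OutputChecker.check_output` and the `doctest.ELLIPSIS` flag are real API.

```python
import doctest

checker = doctest.OutputChecker()

# Expected output as written in a docstring, with ellipses hiding digits.
want = "array([0.96..., 0.93..., 1.        ])\n"


def matches(got, flags=doctest.ELLIPSIS):
    """Return True if doctest would accept `got` for the expected `want`."""
    return checker.check_output(want, got, flags)


# Typical output: accepted, but only when the ELLIPSIS flag is enabled.
got_ok = "array([0.96666667, 0.93333333, 1.        ])\n"

# A value just below 0.96 prints as 0.95999999: the comparison is purely
# textual, so this tiny numerical change makes the doctest fail.
got_off = "array([0.95999999, 0.93333333, 1.        ])\n"

print(matches(got_ok))           # accepted with ELLIPSIS
print(matches(got_ok, flags=0))  # rejected without the flag
print(matches(got_off))          # rejected: the prefix "0.96" no longer matches
```

The ellipsis matches text, not numbers, so it cannot express "equal up to some tolerance".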

An example of a doctest with a spurious failure recently: #29140 (comment)

If you are wondering how this differs from pytest-doctestplus, look at this. It does seem a bit unfortunate to have both scipy/scipy_doctest and scientific-python/pytest-doctestplus, but oh well (full disclosure: I did not have time to look into the history) ...

@ev-br

ev-br commented Dec 15, 2024

If there's still interest in scipy-doctest, feel free to ping me

@lesteve
Member Author

lesteve commented Dec 16, 2024

Thanks! There is still some interest, I did have a quick go at one point but failed to make it work in the amount of time I had 😅

Maybe you can comment on the general approach? In particular, I am guessing that you can still run the previous doctests unchanged with scipy-doctest?

Otherwise I guess I (or someone else) would need to look at how numpy or scipy is doing it and adapt it for scikit-learn (for example we already have some logic to skip all doctests for numpy<2 or if matplotlib is not installed):

https://github.com/numpy/numpy/blob/8e6914d1d1586248aac518082d632839767eab91/numpy/conftest.py#L153-L232

https://github.com/scipy/scipy/blob/2fde036ec404512d695568a4259ba16f6c9156b5/scipy/conftest.py#L395-L551

From a quick look, the main things seem to be:

  • warnings control (probably because warnings are turned into errors)
  • managing a list of tests to skip or xfail
  • ignore some tests during collection

@ev-br

ev-br commented Dec 16, 2024

Yes, the goal is to be able to run doctests unmodified, make them whitespace-insensitive, allow # may vary instead of # doctest: +SKIP, not worry about numpy 2.2 vs 1.x output formatting, etc. Basically, the examples should be written for the reader, without having to wrangle with the tools.

To set it up:

When scipy-doctest is pip-installed in the environment, it automatically hooks into --doctest-modules.

The bulk of scipy/numpy conftest.py is indeed,

  • warnings control (probably because warnings are turned into errors)

Yes, that. We also like to filter out irrelevant known warnings in some cases. Those parts are not essential.

  • managing a list of tests to skip or xfail

Indeed. The list of things to skip is kept in the tool, not in the docstrings, either because the skip is irrelevant to a reader of the docs or because a function is deprecated (so its import emits a DeprecationWarning), etc.

  • ignore some tests during collection

Yes, these are service parts of the library. In numpy it is distutils, for example; scipy's dev.py has an additional list. pytest-extra-ignore is just syntactic sugar on top of pytest --ignore path/to/ignore.
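
As a rough illustration of those three pieces, a conftest.py could look like the sketch below. It is modelled on the numpy conftest.py linked earlier in the thread and is not a tested configuration: the attribute names `user_context_mgr`, `skiplist` and `pytest_extra_ignore` on `dt_config` are assumed to match the installed scipy-doctest version, and the `mypkg` names and the warning message are hypothetical placeholders.

```python
# conftest.py (sketch; check attribute names against the scipy-doctest docs)
import warnings
from contextlib import contextmanager

try:
    from scipy_doctest.conftest import dt_config
    HAVE_SCPDT = True
except ImportError:
    # scipy-doctest is optional: without it, doctests run with plain pytest.
    HAVE_SCPDT = False


@contextmanager
def warnings_as_errors(test=None):
    """Run one doctest with warnings turned into errors, minus known noise."""
    with warnings.catch_warnings():
        warnings.simplefilter("error")
        # Hypothetical example of an irrelevant, known warning to let through.
        warnings.filterwarnings("ignore", message="known noisy warning")
        yield


if HAVE_SCPDT:
    # 1. warnings control
    dt_config.user_context_mgr = warnings_as_errors
    # 2. names whose doctests are skipped (hypothetical entry)
    dt_config.skiplist = {"mypkg.deprecated_function"}
    # 3. paths ignored during collection (hypothetical entry)
    dt_config.pytest_extra_ignore = ["mypkg/_build_utils"]
```

Keeping the skip list here rather than in the docstrings is what lets the examples stay free of testing directives.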

I don't know if scikit-learn has ReST tutorials. If you do and if you want them to be doctested, numpy has $ spin check-tutorials.


If you want, we can have a chat in real time.

@ev-br

ev-br commented Dec 16, 2024

Also, while we're on the "general approach" topic: the setup in both scipy and numpy is to maintain a clear separation between doctests and actual unit testing. For one, --doctest-modules only runs doctests, not regular unit tests.

@lesteve
Member Author

lesteve commented Dec 16, 2024

Thanks a lot for the details, I need to take a closer look and I'll definitely ping you if I get stuck.

@lesteve lesteve linked a pull request Dec 17, 2024 that will close this issue
@lesteve
Member Author

lesteve commented Dec 17, 2024

Turns out it was slightly easier than anticipated. I opened #30496, which seems to work well enough and does not require any changes to our doctests. For now, it seems like we don't need as many tweaks as numpy or scipy, which is good.

The next step would be to open a meta-issue inviting contributors to clean things up one file at a time, removing ... in doctests by relying on the built-in scipy-doctest floating-point comparison.
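
To make the idea of a floating-point comparison concrete, here is a simplified, standard-library-only sketch of comparing expected and actual doctest output up to a tolerance. It is not scipy-doctest's implementation; the helper name `outputs_close` and the tolerance values are arbitrary choices for illustration.

```python
import math
import re

# Matches simple decimal literals such as "0.967", "1." or "-2.5e-3".
_FLOAT = re.compile(r"[-+]?\d+\.\d*(?:[eE][-+]?\d+)?")


def _skeleton(text):
    """Replace every float by '#' and drop all whitespace."""
    return "".join(_FLOAT.sub("#", text).split())


def outputs_close(want, got, rtol=1e-3, atol=1e-8):
    """Return True if `want` and `got` agree up to float tolerance.

    The non-numeric parts must be identical once whitespace is ignored.
    """
    if _skeleton(want) != _skeleton(got):
        return False
    want_nums = [float(x) for x in _FLOAT.findall(want)]
    got_nums = [float(x) for x in _FLOAT.findall(got)]
    return all(
        math.isclose(w, g, rel_tol=rtol, abs_tol=atol)
        for w, g in zip(want_nums, got_nums)
    )


# Expected output written with rounded values instead of ellipses.
want = "array([0.967, 0.967, 0.967, 0.933, 1.   ])"
got = "array([0.96666667, 0.96666667, 0.96666667, 0.93333333, 1.        ])"
print(outputs_close(want, got))
```

With a check of this kind, the docstring can show rounded values such as 0.967 instead of 0.96..., and small numerical differences do not cause spurious failures.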

@ev-br

ev-br commented Dec 17, 2024

Yay, great!

Re removing ... --- most likely tangentially related if at all --- scipy/scipy_doctest#147 was thinking about actually supporting doctest.ellipsis with numpy 2.x output format.

@lesteve lesteve changed the title DOC Investigate scipy-doctest for better doctests DOC Investigate scipy-doctest for more convenient doctests Dec 18, 2024