Generate table of scikit-image's entire runtime API #6905

lagru · 2023-04-21T16:41:47Z

Description

In previous discussions, it was suggested to create a resource that spells out scikit-image's complete Python API. This is meant as a first step in identifying common patterns in our API to help with the skimage2 transition and also with typing our API (see the proposal at scikit-image/skimage-archive#29). @jni suggested to aggregate this information in a spreadsheet and pointed to similar work in scikit-image/boilerplate-utils#5. Depending on how this shakes out, I think this could be a very useful tool in general to inspect and measure our API.

While still work in progress, I'm already happy to share the current state. Some notes on the current behavior:

The script assumes that there is exactly one unique source path for each object, but multiple discovery paths are possible. E.g. skimage.util.unique.unique_rows is the source path, while skimage.util.unique_rows and skimage.util.unique.unique_rows are discovery paths.
Methods that aren't overwritten in inherited classes, are considered additional discovery paths. Currently the script just uses the first discovered path as the source path here.
Decorated function (having a __wrapped__ attribute) are treated as an additional discovery path.
Variable declarations are detected and listed as well, e.g. ski.color.ahx_from_rgb. This was tricky to do.
lazy.attach_stub overwrites the the __dir__ function of modules. Because this function is used under the hood to crawl the object tree, things that are actually reachable but not discover-able at runtime are hidden from the script! E.g. img_as_float64 is available in the current skimage/__init__.py but not discover-able because it's not included in the matching PYI file. I'm slightly in favor of arguing that the script should only list objects that are part of the object tree as defined __dir__. So this is actually working as intended.

Checklist

Docstrings for all functions
Gallery example in ./doc/examples (new features only)
Benchmark in ./benchmarks, if your changes aren't covered by an
existing benchmark
Unit tests
Clean style in the spirit of PEP8
Descriptive commit messages (see below)

For reviewers

Check that the PR title is short, concise, and will make sense 1 year
later.
Check that new functions are imported in corresponding __init__.py.
Check that new features, API changes, and deprecations are mentioned in
doc/release/release_dev.rst.
There is a bot to help automate backporting a PR to an older branch. For
example, to backport to v0.19.x after merging, add the following in a PR
comment: @meeseeksdev backport to v0.19.x
To run benchmarks on a PR, add the run-benchmark label. To rerun, the label
can be removed and then added again. The benchmark output can be checked in
the "Actions" tab.

Class properties don't seem to be picked up yet...

Attributes of modules or classes whose module cannot be determined at runtime are still included as long as the module of the parent can be determined. This makes public variables visible. The visit_api function should probably be refactored and simplified going forward. Currently, print_full_api returns 14484 objects.

stefanv · 2023-04-21T20:41:23Z

I wasn't part of the initial conversation, so I'm not sure of the goal here. How is this different from what we publish in the API reference guide?

The basic guide is: __all__ on any submodule, and only one level deep, except for a few notable exceptions: filters.rank, future.* etc.

jni · 2023-04-24T03:06:25Z

@stefanv the goal here is to ensure that semantically-similar functions have a consistent API, so that it is easy to experiment switching one for another. (Same name for required arguments, same name and input types for similar optional keyword arguments, same output types, etc.) Kind of a functional equivalent to sklearn's consistent Classifier API having identical fit and predict methods.

Historically we have struggled with this — for example, watershed used to be in morphology instead of segmentation.

The first step is to identify all our public functions and group them according to the (currently imprecisely defined) criterion of semantic similarity.

This script is able to iterate the member tree of skimage. The discovered information can be exported into two CSV files, one on the general discovered tree and one on discovered parameters and return objects of callables.

lagru · 2023-04-24T20:41:05Z

How is this different from what we publish in the API reference guide?

Basically, I view this as a very useful tool and start to inspect our API. This script is able to walk the entire member tree of a module that is given to it. So the degree of automation is significantly higher and its easy to keep up with changes to our API. Additionally, the implementation is able to record public attributes (of modules and classes) that are not functions or classes themselves (e.g. the color tuples such as skimage.color.rgb_colors.bisque) which are not visible in our HTML API documentation.

This might also be something that we or even other projects can build on. In addition to this, I can think of a few applications of the bat that might find this useful:

Semi-automatically generate a diff of the API between releases. Might be something we want to include in some way in the release notes.
Check if API as defined by __all__ is consistent with what is reachable via public "import paths". E.g. the function warn_inconsistent__all__ in this script already does this. In the future I would want to go in a direction were there is no difference between both "public API definitions". This script can help with testing these kind of assumptions.
Find signatures with missing type annotations / automated testing of these.

I've created a Google Sheet with the output from python tools/inspect_api.py --csv -v -- skimage here: https://docs.google.com/spreadsheets/d/1sfF0MsotNDqOHA3WldyKYk0gZKRhqAbR-HVVGfLLcJY
For some reason the filters are not showing, though you can create temporary filter in the Data > Filter Views menu.

stefanv · 2023-04-24T20:45:44Z

I have a vague memory that @Carreau has worked on something similar.

stefanv · 2023-04-24T20:46:25Z

This might also be something that we or even other projects can build on.

If we can figure out how this can be generally useful, we can host it under scientific-python.

stefanv · 2023-04-24T20:47:14Z

I also think @Erotemic may be interested.

lagru · 2023-04-24T20:53:51Z

If we can figure out how this can be generally useful, we can host it under scientific-python.

Happy to do so. I can already apply this to SciPy, NumPy, Matplotlib, etc (though I have to limit recursion depth for the former two). With a bit of polishing this should be generally applicable. 😊

Carreau · 2023-04-25T07:36:52Z

@stefanv yes I originally worked on https://github.com/carreau/frappuccino, but I would suggest looking at https://github.com/mkdocstrings/griffe which seem more maintained and does so.

lagru · 2023-04-25T09:20:58Z

Wow, griffe looks very cool! Thanks for recommending it. I wasn't able to find any of these tools when I checked but should have known that I was reinventing the wheel a little bit. 🙈

griffe already seems to collect all the information that this little script does. It exports to JSON but it shouldn't be to hard to transform it into a tabular format if we deem something like https://docs.google.com/spreadsheets/d/1sfF0MsotNDqOHA3WldyKYk0gZKRhqAbR-HVVGfLLcJY useful...

jni · 2023-05-02T01:05:48Z

@lagru cool!

Other automated columns I'd add:

number of required arguments
number of keyword-only arguments

We then have to think about manual (?) columns for our input types and return types, based on the typing stuff we've talked about: image, binary image, label image, coordinates, indices (different from coordinates — e.g. rr, cc tuple), spacing, dtype, etc.

jni · 2023-05-02T01:06:06Z

(or maybe we do typing first and then add types to the automated columns)

lagru · 2023-05-02T22:23:00Z

Other automated columns I'd add

Can do. After talking a bit with @tlambert03 on Zulip, I'm currently having a naive agenda in mind. It's not really a well defined or detailed list of bullet points yet, but useful to communicate my thoughts anyway 😅 :

Start a draft PR that adds a type to every public signature. Each type that we use goes into a list in skimage.types (or similar).
- The intention is to accumulate a list of used types, that forces and enables us to think about which types we want to support in our API.
- I think it's more natural to use our existing workflow and accumulate this list in a Python file. That way we can review and use code insight to jump between type declaration and their usages. Though, I intend to use the generated table to check for missing types, and comparison with our docstring type annotation.
- I don't intend to apply a type checker at this stage yet. PEP 646 is not supported by mypy yet. Instead I want to use simple placeholders, to get the discussion started about where we want to go. E.g. something like RGBArrayLike = ArrayLike[*Shape, L[3], *Shape] (not sure if that will be possible) but you get the idea.
Once we've settled on a set of types we intend to use, start moving our API and documentation in that direction.
- We don't have to wait for 1. to be finished here. Agreed on stuff can already be updated (or postponed to skimage2); though definitely in separate PRs.
At this stage support for things like PEP 646 will hopefully be more mature and we can start to actually apply and enforce type checking.

Of those 3 points, 1. is probably the one that will require the most communication and consensus seeking effort. That's why I want to start that as early as possible.

stefanv · 2023-05-02T23:23:36Z

The intention is to accumulate a list of used types, that forces and enables us to think about which types we want to support in our API.

I presume here we will not focus so much on the underlying type (class), but rather of the intent of the parameter (often an array). E.g., you would annotate with Image, Labels, Mask, etc.? Otherwise, you'll have a bunch of Array or Any.

This reverts commit 7576912. I think the runtime based approach might be more useful in the long-term. This might make it easier for packages such as NumPy to apply this script as well if they are interested.

Also fix the bug where None would show up as an empty default for parameters.

lagru · 2023-05-25T23:01:42Z

@jni, I added the suggested columns to the script and updated https://docs.google.com/spreadsheets/d/1sfF0MsotNDqOHA3WldyKYk0gZKRhqAbR-HVVGfLLcJY accordingly. I intend to stay with the runtime based approach for now (as opposed to using griffe's static approach) as this actually checking what the user sees. This might not make much of a difference for us but e.g. for a project like NumPy it would.

github-actions · 2023-11-23T02:25:57Z

Hello scikit-image core devs! There hasn't been any activity on this PR for more than 180 days. I have marked it as "dormant" to make it easy to find.
To our contributors, thank you for your contribution and apologies if this contribution fell through the cracks! Hopefully this ping will help bring some fresh attention to the review. If you need help, you can always reach out on our forum
If you think that this PR is no longer relevant, you may close it, or we may do it at some point (either way, it will be done manually). If you think the PR is valuable but you no longer have the bandwidth to update it, please let us know, so that someone can take it over. 🙏

lagru · 2023-11-23T09:53:27Z

Haha, I'm sure this will come in handy at some point. Though, rather as a standalone tool. Until I have the time to transfer it, I'd keep this open. :)

lagru added 3 commits April 21, 2023 12:08

Add first draft of api_table

ca7973a

Class properties don't seem to be picked up yet...

Deal with discovery loops and decorated objects

520cb6a

lagru added the 🤖 type: Infrastructure CI, packaging, tools and automation label Apr 21, 2023

Fix typo in print_public_api

d75f309

Complete alpha version of the script

15ef597

This script is able to iterate the member tree of skimage. The discovered information can be exported into two CSV files, one on the general discovered tree and one on discovered parameters and return objects of callables.

Include signature in _tree.csv as well

a398b83

lagru added 5 commits May 24, 2023 16:29

Merge branch 'main' into api-table

6c60eaa

WIP Use griffe instead of runtime walker

7576912

Revert "WIP Use griffe instead of runtime walker"

f88c252

This reverts commit 7576912. I think the runtime based approach might be more useful in the long-term. This might make it easier for packages such as NumPy to apply this script as well if they are interested.

Add param and status columns to tree table

7babea2

Also fix the bug where None would show up as an empty default for parameters.

Simplify script

e192ca7

Fix is_alias detection

a133865

lagru mentioned this pull request Oct 6, 2023

Include py.typed file in distribution (PEP 561) #7073

Merged

github-actions bot added the 😴 Dormant No recent activity label Nov 23, 2023

github-actions bot removed the 😴 Dormant No recent activity label Nov 24, 2023

lagru mentioned this pull request Oct 1, 2024

Basic infrastructure for dispatching to a backend #7520

Merged

5 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Generate table of scikit-image's entire runtime API #6905

Generate table of scikit-image's entire runtime API #6905

lagru commented Apr 21, 2023

stefanv commented Apr 21, 2023

jni commented Apr 24, 2023

lagru commented Apr 24, 2023 •

edited

Loading

stefanv commented Apr 24, 2023

stefanv commented Apr 24, 2023

stefanv commented Apr 24, 2023

lagru commented Apr 24, 2023

Carreau commented Apr 25, 2023

lagru commented Apr 25, 2023

jni commented May 2, 2023

jni commented May 2, 2023

lagru commented May 2, 2023

stefanv commented May 2, 2023 •

edited

Loading

lagru commented May 25, 2023

github-actions bot commented Nov 23, 2023

lagru commented Nov 23, 2023

Generate table of scikit-image's entire runtime API #6905

Are you sure you want to change the base?

Generate table of scikit-image's entire runtime API #6905

Conversation

lagru commented Apr 21, 2023

Description

Checklist

For reviewers

stefanv commented Apr 21, 2023

jni commented Apr 24, 2023

lagru commented Apr 24, 2023 • edited Loading

stefanv commented Apr 24, 2023

stefanv commented Apr 24, 2023

stefanv commented Apr 24, 2023

lagru commented Apr 24, 2023

Carreau commented Apr 25, 2023

lagru commented Apr 25, 2023

jni commented May 2, 2023

jni commented May 2, 2023

lagru commented May 2, 2023

stefanv commented May 2, 2023 • edited Loading

lagru commented May 25, 2023

github-actions bot commented Nov 23, 2023

lagru commented Nov 23, 2023

lagru commented Apr 24, 2023 •

edited

Loading

stefanv commented May 2, 2023 •

edited

Loading