Generate table of scikit-image's entire runtime API #6905
Class properties don't seem to be picked up yet...
Attributes of modules or classes whose module cannot be determined at runtime are still included as long as the module of the parent can be determined. This makes public variables visible. The visit_api function should probably be refactored and simplified going forward. Currently, print_full_api returns 14484 objects.
I wasn't part of the initial conversation, so I'm not sure of the goal here. How is this different from what we publish in the API reference guide? The basic guide is:
@stefanv the goal here is to ensure that semantically-similar functions have a consistent API, so that it is easy to experiment switching one for another. (Same name for required arguments, same name and input types for similar optional keyword arguments, same output types, etc.) Kind of a functional equivalent to sklearn's consistent Classifier API having identical fit and predict methods. Historically we have struggled with this; for example, watershed used to be in morphology instead of segmentation. The first step is to identify all our public functions and group them according to the (currently imprecisely defined) criterion of semantic similarity.
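To make the consistency goal above concrete, here is a minimal sketch of how two semantically similar callables could be compared by parameter name. The helper `compare_signatures` and the toy functions `threshold_a` and `threshold_b` are hypothetical stand-ins, not part of scikit-image or of the script in this PR.

```python
import inspect


def compare_signatures(func_a, func_b):
    """Return parameter names shared by both callables and those unique to each."""
    params_a = inspect.signature(func_a).parameters
    params_b = inspect.signature(func_b).parameters
    shared = [name for name in params_a if name in params_b]
    only_a = [name for name in params_a if name not in params_b]
    only_b = [name for name in params_b if name not in params_a]
    return shared, only_a, only_b


# Two toy functions standing in for semantically similar library functions.
def threshold_a(image, nbins=256):
    return image


def threshold_b(img, nbins=256, *, hist=None):
    return img


shared, only_a, only_b = compare_signatures(threshold_a, threshold_b)
print("shared:", shared)
print("only in first:", only_a)
print("only in second:", only_b)
```

Here the required argument is named `image` in one function and `img` in the other, which is the kind of mismatch a table of the full API should make easy to spot.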
This script is able to iterate the member tree of skimage. The discovered information can be exported into two CSV files, one on the general discovered tree and one on discovered parameters and return objects of callables.
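As a rough illustration of the idea (not the actual implementation in this PR), a runtime walk over a member tree could look like the sketch below. The function `walk_members`, the toy package `toy`, and the column names are all assumptions made for this example; the real script records more information and handles more edge cases.

```python
import csv
import inspect
import io
import types


def walk_members(obj, path, root_name, visited=None):
    """Yield (discovery_path, source_path, kind) for public members of obj."""
    visited = set() if visited is None else visited
    visited.add(id(obj))
    for name in dir(obj):
        if name.startswith("_"):
            continue
        member = getattr(obj, name, None)
        discovery_path = f"{path}.{name}"
        # The source path is only known if the member reports where it was defined.
        module = getattr(member, "__module__", None)
        qualname = getattr(member, "__qualname__", None)
        source_path = f"{module}.{qualname}" if module and qualname else ""
        yield discovery_path, source_path, type(member).__name__
        # Recurse into classes and into submodules of the root package only.
        is_container = inspect.isclass(member) or (
            inspect.ismodule(member) and member.__name__.startswith(root_name)
        )
        if is_container and id(member) not in visited:
            yield from walk_members(member, discovery_path, root_name, visited)


# Build a tiny in-memory package so the example runs without third-party imports.
toy = types.ModuleType("toy")
toy_util = types.ModuleType("toy.util")


def unique_rows(ar):
    return ar


unique_rows.__module__ = "toy.util"
toy_util.unique_rows = unique_rows
toy.util = toy_util
toy.unique_rows = unique_rows  # re-export: a second discovery path
toy.DEFAULT_COLOR = (1.0, 0.0, 0.0)  # public attribute, neither function nor class

rows = sorted(walk_members(toy, "toy", "toy"))

buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["discovery_path", "source_path", "kind"])
writer.writerows(rows)
print(buffer.getvalue())
```

The function is reported twice, once per discovery path, while the tuple and the submodule have no source path because they do not carry `__module__` and `__qualname__`.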
Basically, I view this as a very useful tool to start inspecting our API. This script is able to walk the entire member tree of a module that is given to it. So the degree of automation is significantly higher and it's easy to keep up with changes to our API. Additionally, the implementation is able to record public attributes (of modules and classes) that are not functions or classes themselves (e.g. the color tuples such as […]). This might also be something that we or even other projects can build on. In addition to this, I can think of a few applications off the bat that might find this useful:
I've created a Google Sheet with the output from […]
I have a vague memory that @Carreau has worked on something similar.
If we can figure out how this can be generally useful, we can host it under […]
I also think @Erotemic may be interested.
Happy to do so. I can already apply this to SciPy, NumPy, Matplotlib, etc. (though I have to limit recursion depth for the former two). With a bit of polishing this should be generally applicable. 😊
@stefanv yes I originally worked on https://github.com/carreau/frappuccino, but I would suggest looking at https://github.com/mkdocstrings/griffe which seems better maintained and does so.
Wow, griffe looks very cool! Thanks for recommending it. I wasn't able to find any of these tools when I checked but should have known that I was reinventing the wheel a little bit. 🙈 griffe already seems to collect all the information that this little script does. It exports to JSON but it shouldn't be too hard to transform it into a tabular format if we deem something like https://docs.google.com/spreadsheets/d/1sfF0MsotNDqOHA3WldyKYk0gZKRhqAbR-HVVGfLLcJY useful...
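For what it's worth, flattening a nested JSON description of an API into rows is only a few lines. The sketch below uses an invented, simplified schema (`name`, `kind`, `members`); it is an assumption for illustration and not necessarily the layout that griffe actually emits, so the keys would need adjusting against real output.

```python
import csv
import io
import json

# Invented example document; the keys are assumptions, not griffe's real schema.
api_json = """
{"name": "pkg", "kind": "module", "members": {
    "util": {"name": "util", "kind": "module", "members": {
        "unique_rows": {"name": "unique_rows", "kind": "function"}}},
    "DEFAULT": {"name": "DEFAULT", "kind": "attribute"}}}
"""


def flatten(node, prefix=""):
    """Yield one row per object, depth first, with its dotted path."""
    path = f"{prefix}.{node['name']}" if prefix else node["name"]
    yield {"path": path, "kind": node["kind"]}
    for child in node.get("members", {}).values():
        yield from flatten(child, path)


rows = list(flatten(json.loads(api_json)))

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["path", "kind"])
writer.writeheader()
writer.writerows(rows)
print(buffer.getvalue())
```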
@lagru cool! Other automated columns I'd add:
We then have to think about manual (?) columns for our input types and return types, based on the typing stuff we've talked about: image, binary image, label image, coordinates, indices (different from coordinates, e.g. an rr, cc tuple), spacing, dtype, etc.
(or maybe we do typing first and then add types to the automated columns)
Can do. After talking a bit with @tlambert03 on Zulip, I currently have a naive agenda in mind. It's not really a well-defined or detailed list of bullet points yet, but useful to communicate my thoughts anyway 😅 :
Of those 3 points, 1. is probably the one that will require the most communication and consensus-seeking effort. That's why I want to start that as early as possible.
I presume here we will not focus so much on the underlying type (class), but rather on the intent of the parameter (often an array). E.g., you would annotate with […]
This reverts commit 7576912. I think the runtime based approach might be more useful in the long-term. This might make it easier for packages such as NumPy to apply this script as well if they are interested.
Also fix the bug where None would show up as an empty default for parameters.
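For context, one plausible source of this kind of bug is conflating "no default" with a default of `None`. With `inspect.signature`, the two cases can be told apart via the `inspect.Parameter.empty` sentinel. The helper `describe_defaults` and the function `example` below are hypothetical and only illustrate the distinction; they are not the code from this PR.

```python
import inspect


def describe_defaults(func):
    """Map each parameter name to a printable default; '' means no default."""
    result = {}
    for name, param in inspect.signature(func).parameters.items():
        if param.default is inspect.Parameter.empty:
            result[name] = ""  # required parameter, nothing to show
        else:
            result[name] = repr(param.default)  # keeps None visible as 'None'
    return result


def example(image, mask=None, sigma=1.0):
    return image


defaults = describe_defaults(example)
print(defaults)
```

Comparing with `is inspect.Parameter.empty` rather than testing truthiness keeps falsy defaults such as `None`, `0`, or `False` from being reported as missing.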
@jni, I added the suggested columns to the script and updated https://docs.google.com/spreadsheets/d/1sfF0MsotNDqOHA3WldyKYk0gZKRhqAbR-HVVGfLLcJY accordingly. I intend to stay with the runtime-based approach for now (as opposed to using griffe's static approach) as this actually checks what the user sees. This might not make much of a difference for us but e.g. for a project like NumPy it would.
Hello scikit-image core devs! There hasn't been any activity on this PR for more than 180 days. I have marked it as "dormant" to make it easy to find.
Haha, I'm sure this will come in handy at some point. Though, rather as a standalone tool. Until I have the time to transfer it, I'd keep this open. :)
Description
In previous discussions, it was suggested to create a resource that spells out scikit-image's complete Python API. This is meant as a first step in identifying common patterns in our API to help with the skimage2 transition and also with typing our API (see the proposal at scikit-image/skimage-archive#29). @jni suggested to aggregate this information in a spreadsheet and pointed to similar work in scikit-image/boilerplate-utils#5. Depending on how this shakes out, I think this could be a very useful tool in general to inspect and measure our API.
While still work in progress, I'm already happy to share the current state. Some notes on the current behavior:
- […] `skimage.util.unique.unique_rows` is the source path, while `skimage.util.unique_rows` and `skimage.util.unique.unique_rows` are discovery paths.
- […] `__wrapped__` attribute) are treated as an additional discovery path.
- […] `ski.color.ahx_from_rgb`. This was tricky to do.
- `lazy.attach_stub` overwrites the `__dir__` function of modules. Because this function is used under the hood to crawl the object tree, things that are actually reachable but not discoverable at runtime are hidden from the script! E.g. `img_as_float64` is available in the current `skimage/__init__.py` but not discoverable because it's not included in the matching PYI file. I'm slightly in favor of arguing that the script should only list objects that are part of the object tree as defined by `__dir__`. So this is actually working as intended.
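The effect of a custom module-level `__dir__` described in the notes above can be reproduced in isolation. The sketch below builds a toy module by hand; it does not call `lazy.attach_stub` and makes no claim about how that function is implemented. It only shows that an attribute can be reachable while being hidden from `dir()`.

```python
import types

# Toy module with two attributes, only one of which is advertised.
mod = types.ModuleType("toy")
mod.visible = 1
mod.hidden = 2

# A module-level __dir__ (PEP 562) replaces the default listing used by dir().
mod.__dir__ = lambda: ["visible"]

listed = dir(mod)
reachable = hasattr(mod, "hidden")

print("listed by dir():", listed)
print("hidden is still reachable:", reachable)
```

Any crawler that relies on `dir()` will therefore only see what the module chooses to list, which matches the behavior described for `img_as_float64`.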