Getting lists of dependent packages #3
Comments
If there is interest, I can make a PR here with the script used to create the above. Otherwise I can just put it in a gist and link to it from here. Given that it is based on web scraping assuming specific HTML element names and not an official API, I am sure it is pretty fragile and will likely break at some point in the future when GitHub redesigns their website. Ideally this info will be made available through the official API at some future point.
Awesome! A PR in this repo would be great I think.
I exported the list above to a markdown table (
It is useful to know who is using scikit-image when planning funding proposals. I was looking a little into how one can extract information such as that presented in GitHub's dependents view.
So far, it seems that it is possible to query the dependencies of scikit-image via an experimental API, but there is no public API for querying the dependent packages. You can browse them manually, but that is tedious given that there are > 1,000 packages in our case!
However, I found that with some modifications to the web scraping script from this Stack Overflow post, we can extract this information into a list of packages along with the number of stars and forks for each dependent.
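As a rough illustration of the scraping step (not the actual script), the sketch below pulls repository names out of one page of dependents HTML using only the standard library. It assumes that each dependent is linked through an `<a>` element carrying `data-hovercard-type="repository"`; that attribute, the `Box-row` class in the sample, and the helper names `DependentLinkParser` and `parse_dependents` are assumptions for illustration, and GitHub's markup may differ or change. Star and fork counts are not extracted here.

```python
from html.parser import HTMLParser


class DependentLinkParser(HTMLParser):
    """Collect hrefs of anchors that are marked as repository links.

    Assumption: dependents are linked via
    <a data-hovercard-type="repository" href="/owner/name">.
    """

    def __init__(self):
        super().__init__()
        self.repos = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        if attr.get("data-hovercard-type") == "repository" and attr.get("href"):
            # "/owner/name" -> "owner/name"
            self.repos.append(attr["href"].strip("/"))


def parse_dependents(html):
    """Return the list of "owner/name" strings found in one page of HTML."""
    parser = DependentLinkParser()
    parser.feed(html)
    return parser.repos


# Hand-written sample row, not real GitHub output.
SAMPLE_HTML = (
    '<div class="Box-row">'
    '<a data-hovercard-type="user" href="/napari">napari</a> / '
    '<a data-hovercard-type="repository" href="/napari/napari">napari</a>'
    "</div>"
)
dependents = parse_dependents(SAMPLE_HTML)
```

A full run would also need to download each page and follow the pagination links, which is where this kind of scraper tends to break when the site layout changes.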
We can then combine that with use of PyGithub to retrieve the "topics" associated with each of these packages, so that we can sort by number of stars and filter down to only those packages containing certain terms in the repository name or topic list (e.g. "brain", "cell", "mri", "microscopy", etc.).
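A minimal sketch of the topic lookup, assuming a PyGithub-style client: `get_repo`, `get_topics`, and `stargazers_count` are PyGithub's names, while `fetch_repo_info` is a hypothetical helper and not part of the original script. The client is passed in so the function can be tried without network access.

```python
def fetch_repo_info(client, full_names):
    """Return {full_name: {"stars": int, "topics": list}} for each repository.

    `client` is assumed to behave like a PyGithub `Github` instance:
    `client.get_repo("owner/name")` returns an object with a
    `stargazers_count` attribute and a `get_topics()` method.
    """
    info = {}
    for full_name in full_names:
        repo = client.get_repo(full_name)
        info[full_name] = {
            "stars": repo.stargazers_count,
            "topics": repo.get_topics(),
        }
    return info


# Usage with PyGithub (needs network access and a token; not run here):
#   from github import Github
#   client = Github("<personal access token>")
#   info = fetch_repo_info(client, ["scikit-image/scikit-image"])
```

Each repository costs at least one API request, so a list of several hundred dependents is best fetched with an authenticated client to stay within the rate limit.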
Running this script on scikit-image gave a list of 857 packages that depend on scikit-image and are active (i.e. are not represented by a "ghost" icon in the web interface). Of these:
The numbers above are for ALL application areas. I excluded packages with < 5 stars and then filtered to retain only those that have names/topics related to bioimaging, microscopy, medical imaging, etc. This results in a final list of
Topic Terms Used to Determine Biological Application Status

    bioimage_search_terms = [
        'airways', 'anatomy', 'arteries', 'astrocytes',
        'atomic-force-microscopy', 'afm', 'axon', 'bioimage-informatics',
        'bioinformatics', 'biologists', 'biomedical-image-processing',
        'bionic-vision', 'biophysics', 'brain-connectivity', 'brain-imaging',
        'brain-mri', 'brain-tumor-segmentation', 'brats', 'calcium',
        'cancer-research', 'cell-biology', 'cell-detection',
        'cell-segmentation', 'computational-pathology', 'connectome',
        'connectomics', 'cryo-em', 'ct-data', 'deconvolution-microscopy',
        'dicom', 'dicom-rt', 'digital-pathology-data', 'digital-pathology',
        'digital-slide-archive', 'dmri', 'electron-microscopy',
        'electrophysiology', 'fluorescence', 'fluorescence-microscopy-imaging',
        'fmri', 'fmri-preprocessing', 'functional-connectomes',
        'healthcare-imaging', 'histology', 'voxel', 'microorganism-colonies',
        'microscopy', 'microscopy-images', 'neuroimaging', 'medical',
        'medical-image-computing', 'medical-image-processing',
        'medical-images', 'medical-imaging', 'mri', 'myelin',
        'neural-engineering', 'neuroanatomy', 'neuroimaging',
        'neuroimaging-analysis', 'neuropoly', 'neuroscience',
        'nih-brain-initiative', 'openslide', 'pathology', 'pathology-image',
        'radiation-oncology', 'radiation-physics', 'raman', 'retinal-implants',
        'scanning-probe-microscopy', 'scanning-tunnelling-microscopy',
        'single-cell-imaging', 'slide-images', 'spectroscopy', 'spinalcord',
        'stm', 'stem', 'stitching', 'structural-connectomes',
        'tissue-localization', 'tomography', 'volumetric-images',
        'whole-slide-image', 'whole-slide-imaging',
    ]

Search terms in project name string

    reponame_terms = [
        'brain', 'cell', 'ecg', 'eeg', 'medi', 'mri', 'neuro', 'pathol',
        'retin', 'slide', 'spectro', 'tissue', 'tomo',
    ]

A detailed list of dependent biology-related packages with 5 or more stars is given in the table in the next comment.
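The filtering described above (drop packages with fewer than 5 stars, keep those whose topics or repository name match, sort by stars) could be sketched roughly as follows. This is not the original script: the record layout (`name`, `stars`, `topics`), the function names, and the matching rules (exact match for topics, substring match on the repository name after the owner) are assumptions, and the term lists here are short subsets of the ones given above.

```python
def is_bio_related(name, topics, topic_terms, name_terms):
    """True if any topic is in `topic_terms` (exact match, assumed) or the
    repository name contains any of `name_terms` (substring match, assumed)."""
    repo_name = name.split("/")[-1].lower()  # ignore the owner part
    if any(term in repo_name for term in name_terms):
        return True
    return any(topic.lower() in topic_terms for topic in topics)


def filter_packages(packages, topic_terms, name_terms, min_stars=5):
    """Keep packages with at least `min_stars` stars that look
    biology-related, sorted by star count in descending order."""
    topic_terms = set(topic_terms)
    kept = [
        pkg for pkg in packages
        if pkg["stars"] >= min_stars
        and is_bio_related(pkg["name"], pkg["topics"], topic_terms, name_terms)
    ]
    return sorted(kept, key=lambda pkg: pkg["stars"], reverse=True)


# Made-up records for illustration only.
example_packages = [
    {"name": "org/cellseg", "stars": 40, "topics": []},
    {"name": "org/plots", "stars": 100, "topics": ["visualization"]},
    {"name": "lab/tools", "stars": 12, "topics": ["microscopy"]},
    {"name": "me/brainy", "stars": 2, "topics": ["mri"]},
]
selected = filter_packages(
    example_packages,
    topic_terms=["microscopy", "mri"],
    name_terms=["brain", "cell"],
)
selected_names = [pkg["name"] for pkg in selected]
```

In this example `org/plots` is dropped for having no matching term and `me/brainy` for having fewer than 5 stars. Substring matching on short stems such as `medi` or `tomo` is broad, so the results may still benefit from a manual pass.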
Two caveats:
1.) The above list is probably a lower bound. There may be other packages that did not list any "topic" terms and did not use an obvious biology-related term in the project name.
2.) The above list covers only downstream packages. There are probably an order of magnitude more one-off repositories from individual users who make use of scikit-image but do not package or distribute their code.