Larger-than-memory images: performant and scalable distributed implementation for workstations and clusters #1062
Open
GFleishman wants to merge 56 commits into MouseLand:main from GFleishman:main
+929 −14
…tributed update my local working branch with package changes, hopefully fixes logging
Merge branch 'main' of https://github.com/MouseLand/cellpose into distributed
…into distributed
…ogs. (1) io.logger_setup modified to accept alternative log file to stdout stream (2) distributed_eval creates datetime stamped log directory (3) individual workers create their own log files tagged with their name/index
…by default; no additional coding needed to leverage workstations with gpus
…tifffile - tifffile.imread(..., aszarr=True, ...) returns non-serializable array with single tiff input
…ust releases gpus and hard codes 1 cpu per worker - stitching is cheap, this will always fit
…anelia LSF cluster cases
…cases in best way available given limitations of tiff files
This reverts commit 509ffca.
This reverts commit 767b752.
…ytorch v2+ untested but may also require this change
…version of pytorch; should be conditional and submitted in separate PR
This PR solves #1061 by adding distributed_segmentation.py, a self-contained module that provides the ability to segment larger-than-memory images on a workstation or cluster. Images are partitioned into overlapping blocks that are each processed separately, either in parallel or in series (e.g. if you only have a single GPU). Per-block results are seamlessly stitched into a single segmentation of the entire larger-than-memory image. Windows, Linux, and macOS workstations, as well as LSF clusters, are supported automatically.
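The partition-and-stitch idea can be illustrated with a short NumPy sketch. It is not the PR's implementation. block_slices, crop_to_core, and merge_across_faces are hypothetical names. Each block is assumed to be segmented on its padded region (core plus overlap) and then cropped back to its core. Labels are assumed to already be globally unique (for example by adding a per-block offset). Stitching is reduced to a naive rule that merges any two labels touching across a block face.

```python
import itertools

import numpy as np


def block_slices(shape, blocksize, overlap):
    """Yield (core, padded) slice tuples that tile an N-d array.

    Core regions partition the array exactly; padded extends each core by
    `overlap` voxels per side (clipped at the array edges) so a cell cut by
    a block boundary is still seen whole by the segmenter.
    """
    grid = [range(0, n, b) for n, b in zip(shape, blocksize)]
    for starts in itertools.product(*grid):
        core = tuple(slice(s, min(s + b, n)) for s, b, n in zip(starts, blocksize, shape))
        padded = tuple(
            slice(max(c.start - overlap, 0), min(c.stop + overlap, n))
            for c, n in zip(core, shape)
        )
        yield core, padded


def crop_to_core(block_result, core, padded):
    """Cut the overlap off a result that was computed on the padded region."""
    local = tuple(slice(c.start - p.start, c.stop - p.start) for c, p in zip(core, padded))
    return block_result[local]


def merge_across_faces(labels, cores):
    """Naive stitching: union any two labels that touch across a block face."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for core in cores:
        for axis, sl in enumerate(core):
            if sl.stop >= labels.shape[axis]:
                continue  # array border: no neighbouring block on this side
            last = list(core)
            last[axis] = sl.stop - 1  # last plane inside this block
            first = list(core)
            first[axis] = sl.stop  # first plane of the neighbouring block
            a, b = labels[tuple(last)], labels[tuple(first)]
            touching = (a > 0) & (b > 0)
            for u, v in zip(a[touching].tolist(), b[touching].tolist()):
                ru, rv = find(u), find(v)
                if ru != rv:
                    parent[max(ru, rv)] = min(ru, rv)

    # Lookup table mapping every label to its merged representative
    lut = np.arange(labels.max() + 1, dtype=labels.dtype)
    for lab in parent:
        lut[lab] = find(lab)
    return lut[labels]
```

The face rule will also merge two distinct cells that merely touch at a block boundary. A more careful stitcher would use the overlap region (for example, how much two labels overlap there) to decide which pieces belong together. Labels are not renumbered consecutively after merging.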
Other cluster managers, such as SLURM or SGE, require implementing your own dask cluster class, which is a good opportunity to submit a PR and be added to the author list. I am happy to advise anyone doing this.
The preferred input is a Zarr or N5 array; however, folders full of tiff images are also supported. Single large tiff files can be converted to Zarr with the module itself. Your workstation or cluster can be partitioned into any number of workers with arbitrary resources, e.g. "10 workers, 2 CPU cores each, 1 GPU each" or, if you have a workstation with a single GPU, "1 worker with 8 CPU cores and 1 GPU." Computation never exceeds the given worker specification, so you can process huge datasets without occupying your entire machine.
It is compatible with any Cellpose model. Before committing to a big-data segmentation, you can test on small crops by directly calling the function that runs on each individual block. A foreground mask can be provided so that no time is wasted on voxels that do not contain sample. An arbitrary list of preprocessing steps can be distributed along with Cellpose itself, so if you need to smooth, sharpen, or do anything else before segmenting, you don't need to do it in advance and save a processed copy of your large data; you can just distribute those preprocessing functions along with the segmentation.
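Skipping empty blocks and applying per-block preprocessing can both be sketched in a few lines. This is an illustrative assumption, not the PR's API. block_has_foreground, apply_preprocessing, and the two example steps are hypothetical. The mask is assumed to be a boolean array that may be coarser than the image, with block coordinates rescaled onto it. Preprocessing steps are assumed to be (function, kwargs) pairs applied in order.

```python
import math

import numpy as np


def block_has_foreground(mask, core, image_shape):
    """Return True if the mask marks any sample inside this block.

    The mask may be lower resolution than the image, so the block's
    coordinates are scaled onto the mask grid, rounding outward.
    """
    mask_slices = []
    for sl, m, n in zip(core, mask.shape, image_shape):
        ratio = m / n
        start = math.floor(sl.start * ratio)
        stop = max(math.ceil(sl.stop * ratio), start + 1)  # never an empty slice
        mask_slices.append(slice(start, stop))
    return bool(mask[tuple(mask_slices)].any())


def apply_preprocessing(block, steps):
    """Apply (function, kwargs) pairs to a block, in order."""
    for func, kwargs in steps:
        block = func(block, **kwargs)
    return block


# Hypothetical example steps; any function taking and returning an array works.
def subtract_background(block, value):
    return np.clip(block - value, 0, None)


def rescale(block, factor):
    return block * factor
```

Because the steps travel with each block, a worker that skips a block (no foreground) also skips its preprocessing, and no preprocessed copy of the full dataset is ever written.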
Installation from scratch in a fresh conda environment was tested successfully by @snoreis on a machine with the following specs:
OS: Windows 11 Pro
CPU: 16-core Threadripper PRO 3955WX
GPU: NVIDIA RTX A5000
It was of course also tested in my own environments:
Workstation:
OS: Rocky Linux 9.3
CPU: 8-core Intel Sky Lake
GPU: 1x NVIDIA Tesla L4 15GB
Cluster:
OS: Rocky Linux 9.3
CPU: 100 cores Intel Sky Lake
GPU: 100x NVIDIA Tesla L4 15GB
List of functions provided; all have verbose docstrings covering all inputs and outputs:
distributed_eval: run Cellpose on a big image on any machine
process_block: the function that is run on each block of a big dataset; can be called on its own for testing
numpy_array_to_zarr: create a Zarr array, the preferred input to distributed_eval
wrap_folder_of_tiffs: represent a folder of tiff files as a Zarr array without duplicating data
New dependencies are set correctly and install successfully from source with:
pip install -e .[distributed]
Examples
Run distributed Cellpose on half the resources of a workstation with 16 CPUs, 1 GPU, and 128 GB of system memory:
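The arithmetic behind "half the resources" can be made explicit. This is a hypothetical helper, not part of the PR. Its dictionary keys are illustrative and are not guaranteed to match the cluster_kwargs that distributed_eval actually accepts (see its docstring). It assumes one worker per GPU used, with at least one worker, since a GPU cannot be split, and divides the CPU and memory budget evenly between workers.

```python
def worker_budget(total_cpus, total_gpus, total_memory_gb, fraction=0.5):
    """Split a fraction of a machine's resources into per-worker limits.

    Hypothetical helper: one worker per GPU used (minimum one worker);
    CPU cores and memory are divided evenly between those workers.
    """
    n_workers = max(1, int(total_gpus * fraction))
    cpus_per_worker = max(1, int(total_cpus * fraction) // n_workers)
    memory_per_worker = total_memory_gb * fraction / n_workers
    return {
        "n_workers": n_workers,
        "ncpus": cpus_per_worker,
        "memory_limit": f"{memory_per_worker:g}GB",
    }


# 16 CPUs, 1 GPU, 128 GB -> one worker with 8 cores and 64 GB
print(worker_budget(16, 1, 128))
```

With a single GPU, halving the machine therefore means one worker that owns the GPU but only 8 of the 16 cores and 64 GB, which leaves the rest of the workstation free.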
Run distributed Cellpose on an LSF cluster with 128 GPUs (e.g. the Janelia cluster):
(Note that this example is identical to the previous one, except for a few small changes to the cluster_kwargs; i.e. it is easy to go back and forth between workstations and clusters.)
Testing a single block before running a distributed computation:
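The value of testing a single block is that the same per-block function runs on a small crop that loads quickly. A sketch of that workflow follows. try_on_crop is a hypothetical helper, and threshold_segment is a toy stand-in for Cellpose. The PR's process_block has its own signature, documented in its docstring.

```python
import time

import numpy as np


def try_on_crop(image, center, size, segment_fn):
    """Run segment_fn on a small crop of a large array and time it.

    `image` can be anything sliceable (NumPy, Zarr, dask); only the crop is read.
    """
    slices = tuple(
        slice(max(c - s // 2, 0), min(c - s // 2 + s, n))
        for c, s, n in zip(center, size, image.shape)
    )
    crop = np.asarray(image[slices])
    t0 = time.perf_counter()
    labels = segment_fn(crop)
    elapsed = time.perf_counter() - t0
    return labels, slices, elapsed


def threshold_segment(block, threshold=0.5):
    """Hypothetical stand-in for Cellpose: one label for everything above threshold."""
    return (block > threshold).astype(np.uint32)
```

Timing one crop also gives a rough per-block cost, which helps when choosing a block size and the number of workers.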
Wrap a folder of tiff images/tiles into a single Zarr array:
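Representing a folder of images as one array without duplicating data amounts to reading files lazily on indexing. The PR's wrap_folder_of_tiffs presumably builds on tifffile and Zarr. The class below is only a hypothetical NumPy-level illustration of the idea. It assumes each file is one 2D z-plane with identical shape and dtype, and file names that sort into the correct order (e.g. zero-padded). It does not handle tiled XY layouts or negative/list indices.

```python
import numpy as np


class LazyPlaneStack:
    """Expose an ordered list of 2D image files as one read-only 3D array.

    Only the first plane is read at construction (to learn shape and dtype);
    indexing loads just the planes it touches.
    """

    def __init__(self, paths, loader):
        self.paths = list(paths)
        self.loader = loader  # e.g. tifffile.imread; any path -> 2D array callable
        first = np.asarray(loader(self.paths[0]))
        self.shape = (len(self.paths),) + first.shape
        self.dtype = first.dtype

    def __getitem__(self, key):
        if not isinstance(key, tuple):
            key = (key,)
        zkey, rest = key[0], key[1:]
        if isinstance(zkey, (int, np.integer)):
            return np.asarray(self.loader(self.paths[zkey]))[rest]
        # zkey is assumed to be a slice: read each selected plane, crop, stack
        planes = [np.asarray(self.loader(p))[rest] for p in self.paths[zkey]]
        return np.stack(planes)
```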
Converting a large single tiff image to Zarr:
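Converting a single large tiff amounts to copying it into a chunked Zarr array without holding the whole image in memory. Below is a minimal sketch of that copy loop. copy_in_chunks is a hypothetical name, not the PR's numpy_array_to_zarr. The source is assumed to support NumPy-style slicing that reads only the requested region (for example a memory-mapped, uncompressed tiff). The destination is assumed to be any array that supports slice assignment, such as a Zarr array.

```python
import itertools

import numpy as np


def copy_in_chunks(src, dst, chunks):
    """Copy src into dst one chunk at a time; only one chunk is in memory."""
    if src.shape != dst.shape:
        raise ValueError(f"shape mismatch: {src.shape} vs {dst.shape}")
    grid = [range(0, n, c) for n, c in zip(src.shape, chunks)]
    for starts in itertools.product(*grid):
        sl = tuple(slice(s, min(s + c, n)) for s, c, n in zip(starts, chunks, src.shape))
        dst[sl] = np.asarray(src[sl])
    return dst
```

Using the destination's own chunk shape as the copy stride means each write fills whole chunks rather than partially rewriting them.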