Releases: ml-struct-bio/cryodrgn
v3.4.3: Making movies, improving filtering interface, and fixes to landscape analysis
This is a minor release in which we are introducing a new utility for making volume movies using model analysis results, as well as making some fixes and improvements to existing features:
New visualizations
There is a new command `cryodrgn_utils make_movies` that automatically searches through the output folders created by commands such as `cryodrgn analyze` and `cryodrgn analyze_landscape` and produces .mp4 movies of reconstructed volumes using ChimeraX (which must be installed separately). For example, if volumes corresponding to k-means clusters were produced by `cryodrgn analyze ... --ksample 50`, `make_movies` will add `movie.mp4` under `analyze.<epoch>/kmeans50/` with an animation across the fifty k-means volumes.

See `cryodrgn_utils make_movies -h` for more details! We have also added some new types of plots (scree plots and grid plots of PCA components) to the landscape analysis Jupyter notebooks.
Improving interactive filtering
Thanks to help and feedback from the folks at Doppio (see #425, #426), we have improved the interface of the interactive particle filtering command `cryodrgn filter` by adding buttons for choosing whether or not to save the selection, rather than requiring an additional query step through the command line.
Addressing known issues
v3.4.2: AMP for ab-initio reconstruction; faster landscape analysis and pose parsing
In this patch release we have drastically improved the runtimes of several existing features, as well as addressed some known issues and bugs:
Improving Runtimes
- extended the use of mixed-precision training (as implemented in `torch.cuda.amp`), already the default for `train_nn` and `train_vae`, to the ab-initio reconstruction commands `abinit_homo` and `abinit_het`, resulting in observed speedups of 2-4x
- vectorized rotation matrix computation in `parse_pose_star` for a ~100x speedup of this step and a 2x speedup of the command as a whole (#143)
- returned volume evaluation in `analyze_landscape_full` to the GPU, resulting in a 10x speedup (#405)
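As a sketch of what such vectorization looks like, the per-particle loop over Euler angles can be replaced by building all rotation matrices at once from arrays of trig values. This is an illustrative reimplementation assuming a ZYZ Euler convention like RELION's; the function names are hypothetical, not cryoDRGN's code:

```python
import numpy as np

def rotmats_loop(angles_deg):
    """Per-particle loop: one 3x3 ZYZ rotation matrix per (rot, tilt, psi) row."""
    out = np.empty((len(angles_deg), 3, 3))
    for i, (rot, tilt, psi) in enumerate(np.deg2rad(angles_deg)):
        Rz1 = np.array([[np.cos(rot), -np.sin(rot), 0],
                        [np.sin(rot),  np.cos(rot), 0],
                        [0, 0, 1]])
        Ry = np.array([[np.cos(tilt), 0, np.sin(tilt)],
                       [0, 1, 0],
                       [-np.sin(tilt), 0, np.cos(tilt)]])
        Rz2 = np.array([[np.cos(psi), -np.sin(psi), 0],
                        [np.sin(psi),  np.cos(psi), 0],
                        [0, 0, 1]])
        out[i] = Rz2 @ Ry @ Rz1
    return out

def rotmats_vectorized(angles_deg):
    """Same matrices built for all particles at once from trig arrays."""
    rot, tilt, psi = np.deg2rad(np.asarray(angles_deg)).T
    ca, sa = np.cos(rot), np.sin(rot)
    cb, sb = np.cos(tilt), np.sin(tilt)
    cg, sg = np.cos(psi), np.sin(psi)
    R = np.empty((len(ca), 3, 3))
    # Entries of Rz(psi) @ Ry(tilt) @ Rz(rot), written out elementwise
    R[:, 0, 0] = cg * cb * ca - sg * sa
    R[:, 0, 1] = -cg * cb * sa - sg * ca
    R[:, 0, 2] = cg * sb
    R[:, 1, 0] = sg * cb * ca + cg * sa
    R[:, 1, 1] = -sg * cb * sa + cg * ca
    R[:, 1, 2] = sg * sb
    R[:, 2, 0] = -sb * ca
    R[:, 2, 1] = sb * sa
    R[:, 2, 2] = cb
    return R
```

The vectorized version does a constant number of numpy operations regardless of particle count, which is where the order-of-magnitude speedup comes from.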
Fixing Known Issues
- incorrect batch processing causing out-of-memory issues when using chunked output in `downsample` (#412)
- error when using `--flip` in `analyze_landscape_full` (#409)
- `parse_mrc` bug in landscape analysis notebook (#413)
Please let us know if you have any feedback or comments!
v3.4.1: Support for float16-formatted input
This is a patch release to address some minor issues and improve compatibility of cryoDRGN with the default output number format used by the most recent versions of RELION:
- adding support for `np.float16`-format input .mrcs files, which are now cast to `np.float32` as necessary for Fourier transform operations (#404)
- `models.PositionalDecoder.eval_volume()` now keeps volumes on the GPU
- better progress log messages in `backproject_voxel`; improved control over logging using `--log-interval` to match other reconstruction commands:

```
(INFO) (lattice.py) (03-Oct-24 10:52:54) Using circular lattice with radius=150
(INFO) (backproject_voxel.py) (03-Oct-24 10:52:55) fimage 0 — 0.0% done
(INFO) (backproject_voxel.py) (03-Oct-24 10:54:02) fimage 200 — 4.0% done
(INFO) (backproject_voxel.py) (03-Oct-24 10:55:10) fimage 400 — 8.0% done
(INFO) (backproject_voxel.py) (03-Oct-24 10:56:18) fimage 600 — 12.0% done
(INFO) (backproject_voxel.py) (03-Oct-24 10:57:26) fimage 800 — 16.0% done
```

- introducing `filter_cs` to replace `write_cs`, which is now considered deprecated with a suitable warning message, and fixing issues with filtering .cs files produced by the most recent cryoSPARC versions (#150)
- using the 0.5 * 0.143-threshold of the "No Mask" FSC curve to start applying phase-randomization correction to the "Tight Mask" FSC curve, instead of the 0.75 * 0.143-threshold of the "Tight Mask" curve, when the tight-mask curve never crosses the 0.143 threshold (previously this defaulted to the Nyquist limit):
(comparison of the resulting FSC plots: v3.4.0 vs. v3.4.1)
- fixing a bug with relative output paths given to `cryodrgn downsample`
- addressing `grid_sample` warning messages concerning the unspecified `align_corners` argument
- extending `analyze_landscape` to accept non-binary masks, ensuring compatibility with e.g. `cryodrgn_utils gen_mask`
- harmonizing use of `datadir` in .cs files with its use for .star files
- better error and log messages for mask operations, `filter_pkl`
- fixing a `do_pose_sgd` error for interactive filtering
- using virtual environments for the release GitHub workflow actions `release` and `beta_release`, getting rid of unnecessary `wheel` upgrading
v3.4.0: Plotting class labels, RELION 3.1 support, and phase-randomization for FSCs
In this minor release we are adding several new features and commands, as well as expanding a few existing ones and introducing some key refactorings to the codebase to make these changes easier to implement.
New features
- full support for RELION 3.1 .star files with optics values stored in a separate grouped table before or after the main table (#241, #40, #10)
  - the refactored `Starfile` class now has properties `.apix` and `.resolution` that return particle-wise optics values for commonly used parameters, as well as methods `.get_optics_values()` and `.set_optics_values()` for any parameter; these methods automatically use the optics table if available
  - `cryodrgn parse_ctf_star` can now load all particle-wise optics values from the .star file itself, instead of relying upon user input for parameters such as A/px, resolution, voltage, spherical aberration, etc., or just taking the first value found in the file
- `backproject_voxel` now computes FSC threshold values corrected for mask overfitting using high-resolution phase randomization, as done in cryoSPARC, as well as showing FSC curves and threshold values for various types of masks
- `cryodrgn_utils plot_classes` for creating plots of cryoDRGN results colored by a given set of particle class labels
  - for now, this only creates 2D kernel density plots of the latent space embeddings clustered using UMAP and PCA, but more plots will be added in the future:

    ```
    $ cryodrgn_utils plot_classes 002_train-vae_dim.256 9 --labels published_labels_major.pkl --palette viridis --svg
    ```

    (output: `analyze.9/umap_kde_classes.png`)
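The mask-overfitting correction mentioned above follows the standard high-resolution phase-randomization approach (as popularized by Chen et al. 2013 and used in cryoSPARC): beyond the shell at which phases are randomized, the tight-mask FSC is corrected by the FSC of the phase-randomized half-maps. A minimal sketch, assuming simple per-shell array inputs; the function name and signature are illustrative, not cryoDRGN's API:

```python
import numpy as np

def correct_fsc(fsc_masked, fsc_randomized, rand_start):
    """Correct a masked FSC curve for mask overfitting: from shell `rand_start`
    onward, replace FSC_masked with (FSC_masked - FSC_rand) / (1 - FSC_rand),
    where FSC_rand is the FSC of half-maps with phases randomized at high
    resolution. Shells below rand_start are left unchanged."""
    fsc_masked = np.asarray(fsc_masked, dtype=float)
    fsc_randomized = np.asarray(fsc_randomized, dtype=float)
    corrected = fsc_masked.copy()
    sl = slice(rand_start, None)
    corrected[sl] = (fsc_masked[sl] - fsc_randomized[sl]) / (1.0 - fsc_randomized[sl])
    return corrected
```

When the mask does not inflate correlations, the randomized FSC is near zero and the correction leaves the curve essentially untouched; a large randomized FSC pulls the reported curve down accordingly.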
Improvements to existing features
- `backproject_voxel` now creates a new directory given by `-o/--outdir` into which it places its output files, instead of naming all files after the output reconstructed volume `-o/--outfile`
  - files within this directory will always have the same names across runs:
    - `backproject.mrc`: the full reconstructed volume
    - `half_map_a.mrc`, `half_map_b.mrc`: reconstructed half-maps using an odd/even particle split
    - `fsc-vals.txt`: all five FSC curves in space-delimited format
    - `fsc-plot.png`: a plot of these five FSC curves, as shown above
- `downsample` can now downsample each of the individual files in a stack referenced by a .star or .txt file, returning a new .star or .txt file referencing the new downsampled stack
  - used by specifying a .star or .txt file as `-o/--outfile` when using a .star or .txt file as input: `cryodrgn downsample my_particle_stack.star -D 128 -o particles.128.star --datadir folder_with_subtilts/ --outdir my_new_datadir/`
- `cryodrgn_utils fsc` can now take three volumes as input, in which case the first volume will be used to generate masks to produce cryoSPARC-style FSC curve plots, including phase randomization for the "tight" mask (see New features above)
- `cryodrgn_utils plot_fsc` is now more flexible in the types of input files it accepts for plotting, including .txt files with the new type of cryoSPARC-style FSC curve output from `backproject_voxel`
- `cryodrgn filter --force` for less interactivity after the selection has been made
- `filter_mrcs` prints both the original and new number of particles, and generates the output file name automatically if not given
- `cryodrgn abinit_het` saves `configs` alongside model weights in `weights.pkl` for easier access and output checkpoint identification
Addressing bugs and other issues
- better axis labels for FSC plotting, passing Apix values from `backproject_voxel` (#385)
- `cryodrgn filter` no longer shows particle indices in hover text, as this proved visually distracting; we now show these indices in a text box in the corner of the plot
- `cryodrgn filter` saves chosen indices as a `np.array` instead of a standard Python `list` to prevent type issues in downstream analyses
- `commands_utils.translate_mrcs` was not working (it assumed `particles.images()` returned a numpy array instead of a torch Tensor); this has been fixed and tests added for translations of image stacks
- going back to explicitly listing the modules to be included in the `cryodrgn` and `cryodrgn_utils` command-line interfaces, as Python will sometimes install older modules into the corresponding folders, which confuses automated scanning for command modules
- fixing parsing of 8-bit and 16-bit .mrc files produced using e.g. `--outmode=int8` in EMAN2 (#113)
- adding support and continuous integration testing for Python 3.11
Refactoring classes that parse input files
There were some updates we wanted to make to the `ImageSource` class and its children, which were introduced in a refactoring of the processes used to load and parse input datasets in v3.0.0. We also sought to simplify and clean up the code in the methods used to parse .star file and .mrcs file data in `cryodrgn.starfile` and `cryodrgn.mrc` respectively.
- the code for the `ImageSource` base class and its children in `cryodrgn.source` has been cleaned up to improve code style, remove redundancies, and support the `Starfile` and `mrcfile` refactorings described below
- more consistent and sensible parsing of filenames with `datadir` for `_MRCDataFrameSource` classes such as `TxtFileSource` and `StarfileSource` (#386)
  - all of this logic is now contained in a new method `_MRCDataFrameSource.parse_filename`, which is applied in `__init__`:
    - If the `filename` by itself points to a file that exists, use `filename`.
    - Otherwise, if `os.path.join(datadir, newname)` exists, use that.
    - Finally, try `os.path.join(datadir, os.path.basename(newname))`.
    - If that doesn't exist, throw an error!
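The resolution order above can be sketched as a small standalone function; this is illustrative only, since the real `parse_filename` lives on `_MRCDataFrameSource` and operates on dataframe columns rather than a single path:

```python
import os

def parse_filename(filename, datadir=None):
    """Resolve an image file referenced by a .star/.txt entry, mirroring the
    lookup order described above:
    1) the path as given, 2) the path joined onto datadir,
    3) the path's basename under datadir; otherwise raise."""
    if os.path.exists(filename):
        return filename
    if datadir is not None:
        joined = os.path.join(datadir, filename)
        if os.path.exists(joined):
            return joined
        base = os.path.join(datadir, os.path.basename(filename))
        if os.path.exists(base):
            return base
    raise FileNotFoundError(f"could not resolve {filename!r} (datadir={datadir!r})")
```

The basename fallback is what lets a .star file written on another machine (with absolute or relative paths baked in) still resolve against a local `--datadir`.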
- adding an `ImageSource.orig_n` attribute, which is often useful for accessing the original number of particles in the stack before filtering was applied
- adding `ImageSource.write_mrc()` to avoid having to use `MRCFile.write()` for `ImageSource` objects; the `MRCFile.write()` use case for arrays has been replaced by `mrcfile.write_mrc` (see below)
  - see its use in a refactored `cryodrgn downsample` for batch writing to .mrc output
- adding `MRCFileSource.write()`, a wrapper for `mrcfile.write_mrc()`
- adding a `MRCFileSource.apix` property for convenient access to header metadata
- getting rid of `ArraySource`, whose behavior can be subsumed into `ImageSource` with `lazy=False`
- improving error messages in `ImageSource.from_file()`, `._convert_to_ndarray()`, and `.images()`
- `ImageSource.lazy` is now a property, not an attribute, and is dynamically dependent on whether `self.data` has actually been loaded or not
- adding the `_MRCDataFrameSource.sources` convenience iterator property
- `StarfileSource` now inherits directly from the `Starfile` class (as well as `_MRCDataFrameSource`) for better access to .star utilities than using a `Starfile` object as an attribute (`.df` in the old v3.3.3 class)
- .star file methods have been refactored to establish three clear ways of accessing and manipulating .star data at different levels of features, with RELION 3.1 operations now implemented in `Starfile` class methods:
  - `cryodrgn.starfile.parse_star` and `write_star` to get and perform simple operations on the main data table and/or the optics table, e.g. in `filter_star`:

    ```python
    stardf, data_optics = parse_star(args.input)
    ...
    write_star(args.o, data=filtered_df, data_optics=new_optics)
    ```

  - `cryodrgn.starfile.Starfile` for access to .star file utilities, like generating optics values for each particle in the main data table using parameters saved in the optics table, e.g. in `parse_ctf_star`:

    ```python
    stardata = Starfile(args.star)
    logger.info(f"{len(stardata)} particles")
    apix = stardata.apix
    resolution = stardata.resolution
    ...
    ctf_params[:, i + 2] = (
        stardata.get_optics_values(header)
        if header not in overrides
        else overrides[header]
    )
    ```

  - `cryodrgn.source.StarfileSource` for access to .star file utilities along with access to the images themselves using `ImageSource` methods like `.images()`
  - see our more detailed write-up for more information: Starfile Refactor
- for .mrc files, we removed `MRCFile`, as there are presently no analogues for the kinds of methods supported by `Starfile`; the operations on the image array requiring data from the image header are presently contained within `MRCFileSource`, reflecting the fact that .mrcs files are the image data themselves and not pointers to other files containing the data
  - `MRCFile`, which consisted solely of static `parse` and `write` methods, has been replaced by the old names of these methods (`parse_mrc` and `write_mrc`)
    - `MRCFile.write(out_mrc, vol)` → `write_mrc(out_mrc, vol)`
    - in the case when `vol` is an `ImageSource` object, we now do `ImageSource.write_mrc()`
  - in general, `parse_mrc` and `write_mrc` are for using the entire image stack as an array, while `MRCFileSource` is for accessing batches of images as tensors
  - the `mrc` module is now named `mrcfile` for better verbosity and to match the `starfile` module which ...
v3.3.3: RELION3.1 .star filtering, interactive tilt series filtering, and fixes to backprojection
This patch release fixes several outstanding issues:
- the `--ntilts` argument to `backproject_voxel` did not do anything, and all tilts were always used; this flag now behaves as expected (#379)
- `cryodrgn_utils filter_star` now includes the (filtered) input optics table in the output if present in the input (#370)
- `cryodrgn filter` now accepts experiment outputs using tilt-series particles (#335)
- fixing a numerical rounding bug showing up in transformations to poses used by `backproject_voxel` (#380)
We have also done more work to consolidate and expand our CI testing suite, with all of the `pytest` tests under `tests/` now using new data-loading fixtures that allow tests to be run in parallel using `pytest-xdist`. Datasets used in testing have also been moved from `testing/data/` to `tests/data/` to reflect that the old command-line tests under the former are now deprecated and are being replaced and rewritten as `pytest` tests in the latter folder.
Finally, we removed some remaining vestiges of the old way of handling large datasets difficult to fit into memory via `cryodrgn preprocess` (#348), as well as improving the docstrings for several modules.
v3.3.2: fixing notebook filtering, parse_pose_star optics groups, .txt inputs for write_star
This patch release makes some improvements to tools used in writing and parsing .star files, as well as addressing a few bugs that have recently come to our attention:
- the filtering notebook `cryoDRGN_filtering` was very slow to run when applied to experiments using `--ind`; we tracked this down to an incorrect approach to loading the dataset (#374)
- nicer FSC plots in `backproject_voxel`, using code refactored to apply the methods used in `fsc` and `plot_fsc`
- fixing an issue when the total particle count was one modulo the batch size, causing dimensionality errors with the final singleton batch due to how some `torch` and `numpy` operations handle singleton dimensions (#351)
- creating a stopgap for #346 while we figure out what upstream problems could be causing these issues with `analyze`
- adding `endpoint=True` to the `np.linspace` operation in `pc_traversal` for completeness
- properly supporting .txt files for `write_star`, with the correct file names now being written to the output, as well as `--ind` working correctly
- adding support for RELION 3.1 input files with multiple optics groups in `parse_pose_star`
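The singleton-batch issue noted above (#351) is of this general shape; a minimal numpy illustration, not the actual cryoDRGN code:

```python
import numpy as np

# Nine particles split into batches of four leave a final singleton batch.
particles = np.zeros((9, 64, 64))
batches = [particles[i:i + 4] for i in range(0, 9, 4)]
batch_sizes = [len(b) for b in batches]      # [4, 4, 1]

# An unqualified squeeze on the final (1, 64, 64) batch silently drops the
# leading batch axis, breaking code that expects (N, D, D) arrays...
collapsed = np.squeeze(batches[-1])          # shape (64, 64)

# ...whereas reshaping with an explicit batch axis keeps shapes uniform.
safe = batches[-1].reshape(-1, 64, 64)       # shape (1, 64, 64)
```

Keeping the batch axis explicit for the last batch avoids dimensionality errors downstream regardless of how many particles it happens to contain.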
We have also consolidated and improved upon several aspects of our continuous integration testing setup, including new tests covering the cases described above, refactoring the data fixtures used in existing tests, and testing across multiple `torch` versions after finding issues specific to v1.8 in `analyze_landscape_full`.
v3.3.1: fixes to backprojection and tilt with indices; per tomo star filtering
This is a patch release to address several bugs and issues that have come to our attention:
- adding a `--micrograph-files` argument to `filter_star` to create separate output files for each `_rlnMicrographName` encountered in the file
- `--ind` with `--encode-mode=tilt` wasn't working in the case where all particles had the same number of tilts, due to the `dtype=object` patch introduced earlier
  - fixed by storing the particle→tilt index produced by `TiltSeriesData.parse_particle_tilt()` as a list instead of an array; this is more robust in general and all downstream cases are agnostic to the change (see tests below)
- `backproject_voxel` was producing errors when trying to calculate threshold FSC values due to deprecated code used to parse the FSC matrix (#371)
  - fixed by copying over code already used in `commands/fsc`
- `train_nn` and `train_vae` would error out if inputs were not divisible by 8 when using AMP optimization (e.g. #353)
  - a warning here suffices, as AMP optimization is the default and this error was frustrating for many users
- better error message when the CTF file is missing from `write_star` inputs
- better error message when `backproject_voxel` output is not .mrc
- bug in the `ET_viz` notebook when `--ind` was not specified, caused by an inconsistent definition of `ind0`
- bug in the filtering notebook caused by using `ind=ind_orig` when loading the dataset and then trying to filter again (#363)
- `ZeroDivisionError` bugs in all notebooks when using small training datasets
- updating template analysis notebooks to use the given `kmeans` value in the copied-over notebook, similar to our auto-updating of notebook epoch numbers
In addition to making the required fixes, we have expanded and improved our deployment tests to cover these cases and close some gaps in our testing coverage:
- adding a stand-alone test of backprojection under `test_reconstruct`, applying both .mrcs and .star inputs
- more testing of `train_nn` cases with different `--amp`, `--batch-size`, and `--poses` values
- fixing a `check=True` issue in `utils.run_command()` that was allowing tests of backprojection to fail silently
- new deployment task schedule:
  - the `main` deployment task has been split into `tests` and `style`, for tests of code integrity and code linting respectively
  - run `tests` and `style` along with `beta-release` any time a patch version tag (`[0-9]+\.[0-9]+\.[0-9]+-*`) is pushed to any branch, to trigger a verified upload to TestPyPI
    - also run `tests` and `style` for any push to the `develop` branch to allow for testing before a beta release
  - update `release` to only run when a stable version tag (`^[0-9]+\.[0-9]+\.[0-9]+$`) is pushed to `main`
    - `tests` and `style` run on any push to `main` to allow for testing prior to release
Other changes include:
- applying `tmpdir_factory` to improve the `train_dir` and `AbinitioDir` fixtures used in tests, with more robust setup and teardown
- CodeFactor badge and a nicer TestPyPI installation command in the README
- dynamic update of plotted point sizes in the `cryoDRGN_filtering.ipynb` interactive filtering widget, useful for smaller datasets for which the default size is too small for points to be seen
- using `plt.close()` after `analyze` plotting for better memory management
v3.3.0: direct traversal, improved notebooks, TestPyPI auto-deployment
New Features
- `cryodrgn direct_traversal`, a tool for interpolating a path in the latent conformation space connecting two points in a direct line
- making the package available for installation using the TestPyPI distribution service:

  `pip install -i https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/ "cryodrgn<=3.3.0" --pre`

  - this will let us make both the development and stable versions of the package available for easy download using `pip`, as opposed to having to use `git clone` for the former
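The idea behind `direct_traversal` can be sketched as straight-line interpolation between two latent coordinates; a minimal numpy sketch, where the function name and defaults are illustrative, not cryoDRGN's implementation:

```python
import numpy as np

def direct_traversal(z_start, z_end, n=10):
    """Return n latent coordinates evenly spaced along the straight line from
    z_start to z_end (endpoints included); volumes generated at these points
    trace a direct path through conformation space."""
    z_start = np.asarray(z_start, dtype=float)
    z_end = np.asarray(z_end, dtype=float)
    t = np.linspace(0.0, 1.0, n)[:, None]    # (n, 1) interpolation weights
    return (1.0 - t) * z_start + t * z_end   # (n, zdim) path through latent space
```

Each returned coordinate can then be decoded into a volume, in contrast to `graph_traversal`, which routes the path through regions of latent space actually occupied by particles.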
Improving Existing Features
- updated interfaces for `cryodrgn graph_traversal` and `cryodrgn pc_traversal` so that the arguments, argument formats, and help docstrings of all three traversal methods are as clear and consistent as possible
  - `--ind` in `direct_traversal` is replaced with `--anchors`, as in `graph_traversal`, allowing both to take a list of integers as well as files containing lists of integers
  - `-o` now also has a more verbose alias `--outtxt` in `graph_traversal` and `direct_traversal`; updating its behavior in `graph_traversal` to save the latent-space coordinates and updating `--outind` to save path indices; a similarly verbose alias `--outdir` in `pc_traversal`
  - `-o` now also has a default value, used across all three traversal commands when the flag is given with no argument, to mean that we want to save output but don't have a file name
  - when `-o` is not given, all three commands display a prettier log message to screen with the traversal output
- epoch numbers are automatically updated to the epoch used in `cryodrgn analyze` in copied-over demo notebooks
- improving package status badges shown in the GitHub README: available versions, PyPI downloads
Addressing Issues and Bugs
- adding the `--datadir` flag to `cryodrgn abinit_homo`, addressing an oversight that complicated using .star files with this command (#343)
- fixing bugs and other issues found in our demonstration Jupyter notebooks (#363)
  - `analysis.plot_projections()` doesn't fail if the number of images is two or one
- makeover of the GitHub deployment workflow actions to fix errors and simplify release infrastructure
  - `master` -> `main` branch names
  - removing remaining errors in the continuous integration testing action so that it is again a useful tool for checking pull requests and protecting our `main` branch, especially with the now-expanded coverage of notebooks, traversal, etc.
    - last `pytest` bug fixed (`n=tilts` in `eval_images`)
    - switching off `pyright` for now, as type checks are not essential
    - leftover `pre-commit` formatting issue in `commands.filter`
- more lightweight Docs action, only releasing a new Sphinx autodocs version when a new version tag is pushed; not nuking these docs for now (#350)
- new Beta Release action for automatically deploying a release to TestPyPI whenever a new version tag is pushed
  - existing Release action still not working (needs updated credentials), but it is now also only deployed automatically when a new version tag is pushed
- fixing the `ntilts=10` default behaviour bug in `eval_images`, which was activating tilt mode
- officially removing support for outdated Python versions 3.7 and 3.8 (already implicitly not supported)
Testing
- renaming `test_quick` to `test_integration` and improving the coverage of the reconstruction-pipeline integration tests contained therein
  - adding integration tests for Jupyter demonstration notebooks to check that they execute successfully upon running `cryodrgn analyze` after `cryodrgn train_vae` with different types of inputs and parameters
- expanded fidelity and unit tests for all three traversal commands
- adding `CODEOWNERS`, letting @michal-g be e.g. automatically added to new issues
Version 3.2.0-beta: cleaning, half-map FSCs, mask generation, and RELION 3.1
New Features
- introducing `cryodrgn_utils clean`, a new tool for removing extraneous output files from completed experiments (#297)
- `backproject_voxel` now produces half-maps and a half-map FSC by default (#329)
- creating `cryodrgn_utils fsc` from the `fsc` analysis script for calculating Fourier shell correlations between two .mrc volume files, and likewise `cryodrgn_utils plot_fsc` based on `plotfsc`; making the latter available through the former using `-p`
- creating `cryodrgn_utils gen_mask` based on `cryoem_tools.gen_mask.py`, now with reparametrization in Angstroms
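The Fourier shell correlation computed by tools like `cryodrgn_utils fsc` follows the standard definition: correlate the Fourier transforms of the two volumes within concentric shells of spatial frequency. A minimal numpy sketch under that standard definition, not cryoDRGN's implementation:

```python
import numpy as np

def fsc_curve(vol1, vol2):
    """Fourier shell correlation between two cubic volumes of side D:
    for each radial shell, the normalized cross-correlation of the DFTs."""
    assert vol1.shape == vol2.shape
    D = vol1.shape[0]
    f1 = np.fft.fftshift(np.fft.fftn(vol1))
    f2 = np.fft.fftshift(np.fft.fftn(vol2))
    # Radius of each voxel from the (shifted) zero-frequency origin
    coords = np.indices(vol1.shape) - D // 2
    r = np.sqrt((coords ** 2).sum(axis=0)).round().astype(int)
    fsc = np.empty(D // 2)
    for shell in range(D // 2):
        m = r == shell
        num = (f1[m] * np.conj(f2[m])).sum()
        den = np.sqrt((np.abs(f1[m]) ** 2).sum() * (np.abs(f2[m]) ** 2).sum())
        fsc[shell] = (num / den).real
    return fsc
```

A volume correlated with itself gives 1.0 in every shell; independent noise drives the curve toward zero at high resolution, and the shell where the curve crosses 0.143 (for half-maps) is reported as the resolution estimate.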
Addressing Issues and Bugs
- fixing #358 and improving the I/O interface in both `cryodrgn_utils flip_hand` and `cryodrgn_utils invert_contrast`, so that the name of the output file and any parent directories are created automatically, with more unit tests for each
- making `write_star` use the RELION 3.1 format by default, with optics groups generated from image size, pixel size, voltage, spherical aberration, and amplitude contrast; use `--relion30` for the old format (#324)
- updating install setup to prevent use of Python 3.11 (#306)
- `abinit_homo` now saves a `config.yaml` with a summary of parameters used, like `abinit_het`, `train_vae`, and `train_nn`
- fixing `filter_star` to accept tilt series as well (#335)
- fixing an `affinity` bug in `analyze_landscape` (#345)
- fixing a beta-value bug in `train_vae` (#356)
- removing references to `scipy.ndimage.morphology`, which is deprecated
- fixing the `dtype=object` warning message in `TiltSeriesData.parse_particle_tilt()`
User Interface
- cleaner implementation of the command-line interface, defining both `cryodrgn` and `cryodrgn_utils` commands in one file, `cryodrgn/command_line.py`, and removing e.g. manually defined lists of modules containing commands
- better docstrings with some usage examples for commands (e.g. `cryodrgn abinit_homo -h`), with module-level docstrings being included explicitly in the automatically generated help screen
Testing
- using `conftest.py` to define a new setup/teardown routine for experiment output directories created by tests
- writing new tests for `abinit` and `train` methods by applying these routines
- fixing `test_dataset` to account for changes within `make_dataloader`
- updating unit tests that use `argparse.ArgumentParser()` directly for commands in which the `__main__` method was removed
- updating tests for new and updated commands `fsc`, `clean`, `gen_mask`, etc.
Version 3.1.0-b: interactive filtering
We have introduced a number of small fixes and feature updates since our last release v3.0.1-beta
:
- creating a new interactive command-line interface, `cryodrgn filter`, as an alternative to the buggy interface in the Jupyter filtering notebook (#323)
- making `cryodrgn analyze` produce a plot of the learning curve (#304)
- adding a cell to the `cryoDRGN_filtering` Jupyter notebook returned by `cryodrgn analyze` for filtering by UMAP/PC values (#313)
- fixing bugs with deprecated signatures in plotting functions (#322) and numpy dependency versioning (#318)