Commit

Initial commit
theofanis-insitro authored and ctk3b committed Oct 20, 2023
0 parents commit ea89fba
Showing 118 changed files with 20,213 additions and 0 deletions.
11 changes: 11 additions & 0 deletions .redun/redun.ini
@@ -0,0 +1,11 @@
# redun configuration.

[backend]
db_uri = sqlite:///redun.db

[executors.sweep_agent]
type = local
max_workers = 20
mode = thread

# can add custom executors below (eg AWS batch executor)
1 change: 1 addition & 0 deletions LICENSE
@@ -0,0 +1 @@
Copyright (C) 2023 Insitro, Inc. This software and any derivative works are licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International Public License (CC-BY-NC 4.0), accessible at https://creativecommons.org/licenses/by-nc/4.0/legalcode
75 changes: 75 additions & 0 deletions README.md
@@ -0,0 +1,75 @@
# Sparse Additive Mechanism Shift VAE (SAMS-VAE)

Code accompanying "Modeling Cellular Perturbations with Sparse Additive Mechanism Shift Variational Autoencoder" (Bereket & Karaletsos, NeurIPS 2023)

### Install Environment

Linux
```
conda create --name sams_vae --file env/conda-linux-64.lock
conda activate sams_vae
pip install -e .
```

Mac
```
conda create --name sams_vae --file env/conda-osx-arm64.lock
conda activate sams_vae
pip install -e .
```

The results in the paper were generated using the Linux environment.

### Download datasets

The Perturb-seq datasets analyzed in our paper can be downloaded by running:
```commandline
python download_datasets.py [--replogle] [--norman]
```
The Replogle dataset is approximately 550 MB and the Norman dataset approximately 1.6 GB. Each dataset will be saved to the `datasets/` directory.

To reuse these cached files while running experiments, set the environment variable `SAMS_VAE_DATASET_DIR` to the absolute path of `datasets/`.

To avoid setting the variable manually in each session, the following script configures it to be set automatically whenever the `sams_vae` environment is activated. Make sure to replace the placeholder path with the absolute path of this repository on your machine:
```commandline
conda activate sams_vae

cd $CONDA_PREFIX
mkdir -p ./etc/conda/activate.d
mkdir -p ./etc/conda/deactivate.d

echo "#!/bin/sh" > ./etc/conda/activate.d/env_vars.sh


### replace {/sams_vae_path} with the absolute path to this repository
echo "export SAMS_VAE_DATASET_DIR={/sams_vae_path}/datasets/" >> ./etc/conda/activate.d/env_vars.sh

echo "#!/bin/sh" > ./etc/conda/deactivate.d/env_vars.sh
echo "unset SAMS_VAE_DATASET_DIR" >> ./etc/conda/deactivate.d/env_vars.sh

# Need to reactivate the environment to see the changes
conda activate sams_vae
```
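As a sanity check, the dataset directory can then be resolved from Python. The sketch below shows one way to do this; `resolve_dataset_dir` is a hypothetical helper for illustration, not a function provided by this repository:

```python
import os
from pathlib import Path


def resolve_dataset_dir(env=os.environ):
    """Return the dataset directory, preferring SAMS_VAE_DATASET_DIR.

    Falls back to a relative ``datasets/`` directory when the
    environment variable is unset.
    """
    return Path(env.get("SAMS_VAE_DATASET_DIR", "datasets/")).expanduser()


if __name__ == "__main__":
    # Prints the directory the loaders would use in the current session.
    print(resolve_dataset_dir())
```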

## Training models

The easiest way to train a model is to specify a config file (e.g. `tests/models/sams_vae_correlated.yaml`) with data, model, and training hyperparameters
(including whether to record results locally or remotely on Weights and Biases). To train using a specified config, run

```commandline
python train.py --config [path/to/config.yaml]
```
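For orientation, a config of this kind typically groups the data, model, and training hyperparameters into separate sections. The sketch below is illustrative only; the field names are hypothetical, so consult `tests/models/sams_vae_correlated.yaml` for the actual schema:

```yaml
# Hypothetical config sketch -- the keys shown here are illustrative,
# not the repository's actual schema.
data:
  name: replogle
  batch_size: 512
model:
  name: sams_vae
  latent_dim: 100
training:
  max_epochs: 400
  lr: 1.0e-3
  logger: wandb   # or a local logger
```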

For larger experiments, we provide support for wandb sweeps using redun. To launch a training sweep, run
```commandline
redun run launch_sweep.py launch_sweep --config-path [path/to/sweep_config.yaml] --num-agents [max-agents]
```
redun can be used to run jobs in parallel on a compute cluster. To do so, add a redun executor in `.redun/redun.ini` and update the executors in `launch_sweep.py` (see https://insitro.github.io/redun/executors.html for more info on defining an executor).
By default, training jobs are run locally.
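For example, an AWS Batch executor could be declared in `.redun/redun.ini` roughly as follows. The image, queue, and bucket values are placeholders, and the exact option set should be checked against the redun executor documentation:

```ini
# Hypothetical AWS Batch executor -- replace the placeholder values
# with your container image, job queue, and scratch bucket.
[executors.batch]
type = aws_batch
image = 123456789.dkr.ecr.us-west-2.amazonaws.com/sams_vae:latest
queue = my-batch-job-queue
s3_scratch = s3://my-bucket/redun-scratch/
```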


## Replicating results

We provide sweep configurations, python scripts, and jupyter notebooks to replicate each analysis from the paper in the `paper/experiments/` directory.
Additionally, we provide our precomputed metrics and checkpoints for download to allow exploration of the results without rerunning all experiments.
Detailed instructions for replicating each analysis are available in the README files of the `paper/experiments/` directory.
18 changes: 18 additions & 0 deletions download_datasets.py
@@ -0,0 +1,18 @@
import argparse

from sams_vae.data.norman.download import download_norman_dataset
from sams_vae.data.replogle.download import download_replogle_dataset

if __name__ == "__main__":
parser = argparse.ArgumentParser()
parser.add_argument("--replogle", action="store_true")
parser.add_argument("--norman", action="store_true")
args = parser.parse_args()

print(args)

if args.replogle:
download_replogle_dataset()

if args.norman:
download_norman_dataset()
413 changes: 413 additions & 0 deletions env/conda-linux-64.lock

Large diffs are not rendered by default.

343 changes: 343 additions & 0 deletions env/conda-osx-arm64.lock

Large diffs are not rendered by default.

26 changes: 26 additions & 0 deletions env/environment-linux.yml
@@ -0,0 +1,26 @@
name: sams_vae
channels:
- pytorch
- nvidia
- conda-forge
- bioconda
- defaults
dependencies:
- anndata
- jupyter
- leidenalg
- numpy
- pandas
- pyarrow
- pyro-ppl
- pytest
- python=3.9.*
- pytorch
- pytorch-cuda=11.7
- pytorch-lightning
- redun
- scanpy
- scipy
- scikit-learn
- seaborn
- wandb
26 changes: 26 additions & 0 deletions env/environment-osx.yml
@@ -0,0 +1,26 @@
name: sams_vae
channels:
- pytorch
- nvidia
- conda-forge
- bioconda
- defaults
dependencies:
- anndata
- awscli>=2.0
- jupyter
- leidenalg
- numpy
- pandas
- pyarrow
- pyro-ppl
- pytest
- python=3.9.*
- pytorch
- pytorch-lightning
- redun
- scanpy
- scipy
- scikit-learn
- seaborn
- wandb