Commit

Initial commit
theofanis-insitro authored and ctk3b committed Oct 20, 2023
0 parents commit ea89fba
Showing 118 changed files with 20,213 additions and 0 deletions.
11 changes: 11 additions & 0 deletions .redun/redun.ini
@@ -0,0 +1,11 @@
# redun configuration.

[backend]
db_uri = sqlite:///redun.db

[executors.sweep_agent]
type = local
max_workers = 20
mode = thread

# can add custom executors below (eg AWS batch executor)
1 change: 1 addition & 0 deletions LICENSE
@@ -0,0 +1 @@
Copyright (C) 2023 Insitro, Inc. This software and any derivative works are licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International Public License (CC-BY-NC 4.0), accessible at https://creativecommons.org/licenses/by-nc/4.0/legalcode
75 changes: 75 additions & 0 deletions README.md
@@ -0,0 +1,75 @@
# Sparse Additive Mechanism Shift VAE (SAMS-VAE)

Code accompanying "Modeling Cellular Perturbations with Sparse Additive Mechanism Shift Variational Autoencoder" (Bereket & Karaletsos, NeurIPS 2023)

### Install Environment

Linux
```
conda create --name sams_vae --file env/conda-linux-64.lock
conda activate sams_vae
pip install -e .
```

Mac
```
conda create --name sams_vae --file env/conda-osx-arm64.lock
conda activate sams_vae
pip install -e .
```

The results in the paper were generated using the Linux environment.

### Download datasets

The Perturb-seq datasets analyzed in our paper can be downloaded by running:
```commandline
python download_datasets.py [--replogle] [--norman]
```
The Replogle dataset is approximately 550 MB and the Norman dataset approximately 1.6 GB. Each dataset will be saved to the `datasets/` directory.

To reuse these cached files while running experiments, set the environment variable `SAMS_VAE_DATASET_DIR` to the absolute path of `datasets/`.

To avoid setting the variable manually in each session, the following script configures it to be set automatically whenever the `sams_vae` environment is activated. Make sure to replace the placeholder path with the absolute path of this repository on your machine:
```commandline
conda activate sams_vae

cd $CONDA_PREFIX
mkdir -p ./etc/conda/activate.d
mkdir -p ./etc/conda/deactivate.d

echo "#!/bin/sh" > ./etc/conda/activate.d/env_vars.sh


### replace {/sams_vae_path} with the absolute path to this repository
echo "export SAMS_VAE_DATASET_DIR={/sams_vae_path}/datasets/" >> ./etc/conda/activate.d/env_vars.sh

echo "#!/bin/sh" > ./etc/conda/deactivate.d/env_vars.sh
echo "unset SAMS_VAE_DATASET_DIR" >> ./etc/conda/deactivate.d/env_vars.sh

# Need to reactivate the environment to see the changes
conda activate sams_vae
```
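As a sanity check, the dataset directory can then be resolved from Python. The sketch below shows one way to do this; `resolve_dataset_dir` is a hypothetical helper for illustration, not a function provided by this repository:

```python
import os
from pathlib import Path


def resolve_dataset_dir(env=os.environ):
    """Return the dataset directory, preferring SAMS_VAE_DATASET_DIR.

    Falls back to a relative ``datasets/`` directory when the
    environment variable is unset.
    """
    return Path(env.get("SAMS_VAE_DATASET_DIR", "datasets/")).expanduser()


if __name__ == "__main__":
    # Prints the directory the loaders would use in the current session.
    print(resolve_dataset_dir())
```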

## Training models

The easiest way to train a model is to specify a config file (e.g. `tests/models/sams_vae_correlated.yaml`) with data, model, and training hyperparameters
(including whether to record results locally or remotely on Weights and Biases). To train using a specified config, run

```commandline
python train.py --config [path/to/config.yaml]
```
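For orientation, a config of this kind typically groups the data, model, and training hyperparameters into separate sections. The sketch below is illustrative only; the field names are hypothetical, so consult `tests/models/sams_vae_correlated.yaml` for the actual schema:

```yaml
# Hypothetical config sketch -- the keys shown here are illustrative,
# not the repository's actual schema.
data:
  name: replogle
  batch_size: 512
model:
  name: sams_vae
  latent_dim: 100
training:
  max_epochs: 400
  lr: 1.0e-3
  logger: wandb   # or a local logger
```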

For larger experiments, we provide support for wandb sweeps using redun. To launch a training sweep, run
```commandline
redun run launch_sweep.py launch_sweep --config-path [path/to/sweep_config.yaml] --num-agents [max-agents]
```
redun can be used to run jobs in parallel on a compute cluster. To do so, add a redun executor in `.redun/redun.ini` and update the executors in `launch_sweep.py` (see https://insitro.github.io/redun/executors.html for more info on defining an executor).
By default, training jobs are run locally.
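For example, an AWS Batch executor could be declared in `.redun/redun.ini` roughly as follows. The image, queue, and bucket values are placeholders, and the exact option set should be checked against the redun executor documentation:

```ini
# Hypothetical AWS Batch executor -- replace the placeholder values
# with your container image, job queue, and scratch bucket.
[executors.batch]
type = aws_batch
image = 123456789.dkr.ecr.us-west-2.amazonaws.com/sams_vae:latest
queue = my-batch-job-queue
s3_scratch = s3://my-bucket/redun-scratch/
```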


## Replicating results

We provide sweep configurations, python scripts, and jupyter notebooks to replicate each analysis from the paper in the `paper/experiments/` directory.
Additionally, we provide our precomputed metrics and checkpoints for download to allow exploration of the results without rerunning all experiments.
Detailed instructions for replicating each analysis are available in the README files of the `paper/experiments/` directory.
18 changes: 18 additions & 0 deletions download_datasets.py
@@ -0,0 +1,18 @@
import argparse

from sams_vae.data.norman.download import download_norman_dataset
from sams_vae.data.replogle.download import download_replogle_dataset

if __name__ == "__main__":
parser = argparse.ArgumentParser()
parser.add_argument("--replogle", action="store_true")
parser.add_argument("--norman", action="store_true")
args = parser.parse_args()

print(args)

if args.replogle:
download_replogle_dataset()

if args.norman:
download_norman_dataset()
413 changes: 413 additions & 0 deletions env/conda-linux-64.lock

Large diffs are not rendered by default.

343 changes: 343 additions & 0 deletions env/conda-osx-arm64.lock

Large diffs are not rendered by default.

26 changes: 26 additions & 0 deletions env/environment-linux.yml
@@ -0,0 +1,26 @@
name: sams_vae
channels:
- pytorch
- nvidia
- conda-forge
- bioconda
- defaults
dependencies:
- anndata
- jupyter
- leidenalg
- numpy
- pandas
- pyarrow
- pyro-ppl
- pytest
- python=3.9.*
- pytorch
- pytorch-cuda=11.7
- pytorch-lightning
- redun
- scanpy
- scipy
- scikit-learn
- seaborn
- wandb
26 changes: 26 additions & 0 deletions env/environment-osx.yml
@@ -0,0 +1,26 @@
name: sams_vae
channels:
- pytorch
- nvidia
- conda-forge
- bioconda
- defaults
dependencies:
- anndata
- awscli>=2.0
- jupyter
- leidenalg
- numpy
- pandas
- pyarrow
- pyro-ppl
- pytest
- python=3.9.*
- pytorch
- pytorch-lightning
- redun
- scanpy
- scipy
- scikit-learn
- seaborn
- wandb