Run

To run the code, make sure you have Nextflow installed. We tested the workflow with Nextflow version 24.04.4.

Sanity Test Run

To run the code on a small set and check that everything works:

make run_sample

This will create the nf_output_sample folder with the results for mols_small.csv.

Main Run

To run on the main data:

make run

How to Run the Code with Custom Options

The workflow is launched with the following command:

nextflow run ./main.nf

If you need to calculate additional distance variants, such as the myopic-MCES distance or the motif-based edit distance, pass the corresponding options described below.

Options

You can customize the run by adding options. Here are some common options:

-resume: Resume a previous run.
-c <config file>: Pass a config file.
-w <work directory>: Set the Nextflow work directory.
--batch_x <int value>: The number of rows when dividing the work into batches.
--batch_y <int value>: The number of columns when dividing the work into batches.
--calculate_mces <1 or 0>: If 1, the myopic-MCES distance is also calculated.
--calculate_motif_based <1 or 0>: If 1, the motif-based edit distance is also calculated.
--mols_csv <path to csv file>: The path to the CSV file containing the molecule data. The CSV columns must contain the keys 'Smiles' and 'INCHI'.
--motifs_csv <path to csv file>: The path to the CSV file containing the SMARTS for the motifs. The CSV columns must contain the key 'smarts'.
--output_dir <directory>: The directory for the output results.
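Because the two input CSV files must carry specific column names, it can help to check the headers before launching a long run. The following is a minimal sketch using only the Python standard library. The helper name missing_columns and the in-memory sample data are hypothetical and not part of this repository, and the check assumes the column names are matched case-sensitively, which the workflow itself may or may not require.

```python
import csv
import io


def missing_columns(csv_text, required):
    """Return the required column names that are absent from the CSV header."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    return [name for name in required if name not in header]


# Hypothetical in-memory samples. For real inputs, read the text of the files
# passed via --mols_csv and --motifs_csv instead.
mols_sample = 'Smiles,INCHI\nCCO,"InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"\n'
motifs_sample = "smarts\n[OX2H]\n"

for label, text, required in [
    ("mols", mols_sample, ["Smiles", "INCHI"]),
    ("motifs", motifs_sample, ["smarts"]),
]:
    missing = missing_columns(text, required)
    status = "ok" if not missing else "missing " + ", ".join(missing)
    print(f"{label}: {status}")
```

An empty list means every required column is present in the header.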

Example:

nextflow run ./main.nf -resume -c nextflow.config --batch_x 10 --batch_y 10 \
    --output_dir nf_output_test --calculate_mces 1 --calculate_motif_based 1 \
    --mols_csv 'data/mols.csv' --motifs_csv 'data/motifs.csv'
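With --batch_x 10 and --batch_y 10, the pairwise work is divided into 10 rows by 10 columns, giving 100 grid cells. The sketch below illustrates one plausible way a flat grid index could map to the row and column ranges of molecules handled by a single cell. The function grid_cell_bounds, the row-major index order, and the even splitting rule are all assumptions made for illustration; the actual division used by main_edit_distance.py may differ.

```python
def grid_cell_bounds(n_items, batch_x, batch_y, index):
    """Map a flat grid index to half-open (start, end) ranges for rows and columns.

    Assumes row-major ordering of cells and near-equal chunk sizes.
    """
    if not 0 <= index < batch_x * batch_y:
        raise ValueError("index must be in [0, batch_x * batch_y)")

    # batch_x is the number of rows, batch_y the number of columns.
    row, col = divmod(index, batch_y)

    def span(k, parts):
        # Integer arithmetic spreads any remainder across the chunks.
        return (k * n_items) // parts, ((k + 1) * n_items) // parts

    return span(row, batch_x), span(col, batch_y)


# Example: 25 molecules on a 10 x 10 grid, looking at cell 37.
rows, cols = grid_cell_bounds(25, 10, 10, 37)
print("row range:", rows, "column range:", cols)
```

Under these assumptions every pair of molecules falls into exactly one cell, so each cell can be computed independently and the results combined afterwards.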

File Descriptions

main_edit_distance.py: The main script for calculating the edit distance. It receives all the data, the cached mols, how the data is divided into a grid, and the grid index, and outputs a CSV file with the pairwise edit distances for that grid cell.
motif_base_edit_distance.py: Contains the functions for the motif-based edit distance.
mol_utils.py: Utility functions used throughout the code to work with RDKit molecules.
data/: Directory containing sample input data files.
combine_csvs.py & merge_csvs2.py: Simple scripts to combine the CSV files generated by each grid cell process.
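To illustrate the combining step, here is a minimal sketch of concatenating per-cell CSV files that share a header, using only the Python standard library. It is not the code from combine_csvs.py or merge_csvs2.py; the function name combine_csvs and its behavior (sorting the inputs, failing on a mismatched header) are assumptions chosen for the example.

```python
import csv


def combine_csvs(parts, destination):
    """Concatenate CSV files that share a header, writing the header once.

    Returns the number of data rows written.
    """
    header = None
    rows_written = 0
    with open(destination, "w", newline="") as out:
        writer = csv.writer(out)
        for part in sorted(parts):
            with open(part, newline="") as handle:
                reader = csv.reader(handle)
                part_header = next(reader, None)
                if part_header is None:
                    continue  # skip completely empty files
                if header is None:
                    header = part_header
                    writer.writerow(header)
                elif part_header != header:
                    raise ValueError(f"{part} has a different header")
                for row in reader:
                    writer.writerow(row)
                    rows_written += 1
    return rows_written
```

A call such as combine_csvs(list_of_cell_files, "combined.csv") would then produce a single file, where list_of_cell_files is whatever collection of per-cell outputs the run produced.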

mshahneh/Edit_Distance_Workflow
