ONT-TCRconsensus creates and counts high-accuracy, full-length consensus sequences of unique TCR molecules.
- Clone the repo:

      git clone https://github.com/schumacherlab/nanopore_tcr_consensus.git

- Navigate to the project directory (git names it after the repository URL):

      cd nanopore_tcr_consensus

- Create conda environment:

      conda env create -f ont_tcr_consensus.yml
      conda activate ont_tcr_consensus_env

- Install dorado
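
Before submitting jobs, it can help to confirm that the tools installed above are visible on `PATH`. The following is a minimal sketch, not part of ONT-TCRconsensus: `find_tools` is a hypothetical helper, and it assumes the executables are named `conda` and `dorado`.

```python
import shutil


def find_tools(names):
    """Map each executable name to its resolved path, or None if not on PATH."""
    return {name: shutil.which(name) for name in names}


# Assumed executable names; adjust if your installation differs.
for tool, path in find_tools(["conda", "dorado"]).items():
    status = path if path is not None else "NOT FOUND on PATH"
    print(f"{tool}: {status}")
```

A tool reported as not found usually means the conda environment is not activated or the dorado install directory has not been added to `PATH`.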
ONT-TCRconsensus performance on a PromethION R10.4.1 run of ~70 M reads:

| CPU model | # CPUs | Memory | Approx. run time |
|---|---|---|---|
| Intel Xeon Silver | 110 | 275G | 20-24 h |
| Intel Xeon Gold | 128 | 800G | 5-6 h |

For basecalling performance, see the dorado documentation.
- Run basecalling using dorado:

      sbatch run_basecall_pipeline_multi-gpu.sh

- Generate a reference.fa file with the generate_assembly_nanopore_nt_refs function of the TCRtoolbox package.
- Update run_config.json by inserting paths and run ONT-TCRconsensus:

      # SLURM cluster (conda):
      sbatch run_tcr_consensus_slurm.sh run_config.json

      # Any cluster (conda):
      mkdir -p ./logs
      source ~/miniconda3/etc/profile.d/conda.sh
      conda activate ont_tcr_consensus
      tcr_consensus run_config.json

- After the ONT-TCRconsensus pipeline has finished running, QC plots and a filtered UMI count .csv can be generated for each barcode with analysis.ipynb. The results are written to a common outs directory created in the nanopore run directory:
  - Open analysis.ipynb and set nanopore_project_dir = to the path of the nanopore run directory
  - Add a libraries.csv:

        # Without ref_library_name
        barcode,library_name,ref_library_name,log_umi_counts_filter_threshold
        barcode02,baseline_b,,1
        barcode05,ylq_cd69_pos_b,,1
        barcode08,glc_cd69_pos_b,,1
        barcode11,glc_cd69_neg_b,,1

        # With ref_library_name (TCR library identifier in reference.fa fasta header names when multiple libraries are multiplexed in a single ONT run)
        barcode,library_name,ref_library_name,log_umi_counts_filter_threshold
        barcode14,NSCLC57,NSCLC57,2.5
        barcode15,N03LAM397,N03LAM397,1.5
        barcode16,YWE,ywe,1.5
        barcode17,blos_c,blos_c,1
        barcode18,blos_p,blos_p,1
        barcode19,mlm_m,mlm_m,2
        barcode20,mlm_d,mlm_d,1
        barcode21,mlm_t,mlm_t,2
        barcode22,str_b,str_b,1

  - Adjust the following plotting parameters if needed:

        umi_count_xlim_max = 900
        umi_count_bin_size = 10
        umi_count_ylim_max = 0.1
        log_umi_count_xlim_max = 8
        log_umi_count_bin_size = 0.25
        log_umi_count_ylim_max = 1.0
        most_similar_blast_id_threshold = 0.99925  # for zooming

  - (Optional) add a custom reference name set to additionally color in plots:

        import re  # only needed if not already imported in the notebook

        gil_plate_1_duplicate_1_ref_set = set()
        pattern = r"^[A-H](?:[1-9]|1[0-9]|2[0-4])$"
        for ref in non_n_ref_names_set:
            if (
                ref.endswith("GILGFVFTL")
                and ref.startswith("1_1")
                and re.match(pattern, ref.split("_")[2])
            ):
                gil_plate_1_duplicate_1_ref_set.add(ref)
        custom_tcr_set_dict = {"gil_plate_1_tcrs": gil_plate_1_duplicate_1_ref_set}

  - Run the QC plotting and count generation code. Results are written to outs.
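
A malformed libraries.csv is easier to catch before opening the notebook than halfway through plotting. The sketch below checks the four-column layout shown in the examples above. `load_libraries` is a hypothetical helper and not part of ONT-TCRconsensus; it assumes the file has a single header row and no `#` comment lines, and that an empty ref_library_name means "no reference library filter".

```python
import csv
import io

EXPECTED_COLUMNS = [
    "barcode",
    "library_name",
    "ref_library_name",
    "log_umi_counts_filter_threshold",
]


def load_libraries(csv_text):
    """Parse libraries.csv text into one dict per barcode, validating each row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    if reader.fieldnames != EXPECTED_COLUMNS:
        raise ValueError(f"unexpected header: {reader.fieldnames}")
    rows, seen = [], set()
    for line_no, row in enumerate(reader, start=2):
        barcode = (row["barcode"] or "").strip()
        library = (row["library_name"] or "").strip()
        if not barcode or not library:
            raise ValueError(f"line {line_no}: barcode and library_name are required")
        if barcode in seen:
            raise ValueError(f"line {line_no}: duplicate barcode {barcode}")
        seen.add(barcode)
        # float() raises ValueError if the threshold is missing or not numeric.
        threshold = float(row["log_umi_counts_filter_threshold"] or "")
        rows.append({
            "barcode": barcode,
            "library_name": library,
            # Assumption: an empty field means no ref_library_name is used.
            "ref_library_name": (row["ref_library_name"] or "").strip() or None,
            "log_umi_counts_filter_threshold": threshold,
        })
    return rows


example = """barcode,library_name,ref_library_name,log_umi_counts_filter_threshold
barcode02,baseline_b,,1
barcode14,NSCLC57,NSCLC57,2.5
"""
libraries = load_libraries(example)
for lib in libraries:
    print(lib)
```

To check a real file, pass its contents instead of `example`, for instance `load_libraries(open("libraries.csv").read())`.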
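
The optional custom reference set relies on a regular expression that accepts well positions A1 through H24. The sketch below shows how that pattern behaves on a few made-up reference names. The names, and the reading of the first three underscore-separated fields as plate, duplicate, and well, are assumptions for illustration only; `select_refs` is a hypothetical helper. Unlike the `startswith("1_1")` check, it compares the fields exactly, so a name starting with `1_10_` is not selected.

```python
import re

# Rows A-H, columns 1-24 (same pattern as in the notebook snippet).
WELL_PATTERN = re.compile(r"^[A-H](?:[1-9]|1[0-9]|2[0-4])$")


def select_refs(ref_names, plate, duplicate, suffix):
    """Return reference names matching plate, duplicate, a valid well, and suffix."""
    selected = set()
    for ref in ref_names:
        fields = ref.split("_")
        if len(fields) < 3:
            continue  # too few fields to contain a well position
        if (
            fields[0] == plate
            and fields[1] == duplicate
            and WELL_PATTERN.match(fields[2])
            and ref.endswith(suffix)
        ):
            selected.add(ref)
    return selected


# Hypothetical reference names, not taken from a real reference.fa.
non_n_ref_names_set = {
    "1_1_A1_GILGFVFTL",
    "1_1_H24_GILGFVFTL",
    "1_1_H25_GILGFVFTL",   # column 25 is outside 1-24
    "1_1_I3_GILGFVFTL",    # row I is outside A-H
    "1_10_A1_GILGFVFTL",   # different second field
    "1_1_B7_NLVPMVATV",    # different suffix
}

gil_set = select_refs(non_n_ref_names_set, "1", "1", "GILGFVFTL")
custom_tcr_set_dict = {"gil_plate_1_tcrs": gil_set}
print(sorted(gil_set))
```

Whether exact field matching is preferable to the prefix check depends on how reference names are built in your reference.fa, so verify against your own headers before relying on either.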
Distributed under the Apache 2.0 License.