Prepare config files for samples with biological replicates

PASS data must be processed with replicates together.

A configuration (config) file is a JSON file that specifies the input parameters required to run the ATAC-seq pipeline. Find comprehensive documentation of definable parameters here.

This tutorial walks you through the steps to quickly generate config files for a large number of samples with the same runtime parameters (e.g. memory requirements, reference genome).

Prerequisites:

R
R data.table package
R optparse package
R bit64 package
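
To confirm that the prerequisites above are in place before you start, a check along the following lines may help. This is a minimal sketch, not part of the repository: `check_r_packages` is a hypothetical helper name, and it assumes `Rscript` is on your `PATH`.

```shell
# Hypothetical helper (not part of the repository): report which of the
# required R packages cannot be loaded from the current R library.
check_r_packages() {
  Rscript -e '
    pkgs <- c("data.table", "optparse", "bit64")
    ok <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
    if (!all(ok)) {
      cat("Missing packages:", paste(pkgs[!ok], collapse = ", "), "\n")
      quit(status = 1)
    }
    cat("All required packages are installed\n")
  '
}
```

Call `check_r_packages` once in your shell. If it reports missing packages, they can be installed from within R with `install.packages(c("data.table", "optparse", "bit64"))`.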

You will need a few things for this tutorial:

gitdir: The absolute path to this repository, e.g. ~/ATAC_PIPELINE/motrpac-atac-seq-pipeline
base_json: A truncated JSON file with parameters that are constant for all samples in this batch. Find an example here. /path/to/genome.tsv refers to the path to either the "motrpac_rn6.tsv" or the "hg38.tsv" file generated in Step 3. Note that you must include the following parameters for consistency within MoTrPAC:

    "atac.genome_tsv" : "/path/to/genome.tsv",
    "atac.multimapping" : 4,
    "atac.auto_detect_adapter" : true,
    "atac.enable_idr" : true,
    "atac.enable_tss_enrich" : true,
    "atac.paired_end" : true,

dmaqc_meta: A copy of the DMAQC metadata corresponding to the samples in this batch (e.g. ANI830-10009.csv). Each CAS site has a batching officer who is able to retrieve this metadata from the web API. Note that you may have to concatenate multiple DMAQC metadata files into a single file if you are generating config files for a NovaSeq run where the sequenced samples were received in multiple tranches.
ref_standards: A copy of the Reference Standards metadata from Russ, converted from an Excel file to a TXT file (e.g. Stanford_StandardReferenceMaterial_0129191.txt). Note that you may have to concatenate multiple Reference Standard metadata files into a single file if you are generating config files for a NovaSeq run where the sequenced samples were received in multiple tranches.
fastq_dir: The path to the FASTQ files for all samples in this batch. Note that FASTQ files should be named by vial label, e.g. 90013015505_R1.fastq.gz.
config_dir: The path to the desired output directory for the generated config files.
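
The items listed above can be collected as shell variables before running the script. The sketch below is illustrative only: every `/path/to/...` value is a placeholder to replace with your own location, and `check_base_json` is a hypothetical helper (not part of the repository) that reports any of the six required MoTrPAC parameters missing from the base JSON.

```shell
# Placeholder locations; replace every path with your own.
gitdir=~/ATAC_PIPELINE/motrpac-atac-seq-pipeline
base_json=/path/to/base.json
dmaqc_meta=/path/to/ANI830-10009.csv
ref_standards=/path/to/Stanford_StandardReferenceMaterial_0129191.txt
fastq_dir=/path/to/fastq
config_dir=/path/to/configs

# Hypothetical helper: print each required parameter that is absent from
# the file given as $1. Returns non-zero if at least one is missing.
check_base_json() {
  local key rc=0
  for key in atac.genome_tsv atac.multimapping atac.auto_detect_adapter \
             atac.enable_idr atac.enable_tss_enrich atac.paired_end; do
    # -F matches the quoted key as a fixed string, not a regular expression.
    if ! grep -qF -- "\"${key}\"" "$1"; then
      echo "missing: ${key}"
      rc=1
    fi
  done
  return "$rc"
}
```

Run it as `check_base_json "${base_json}"`. The helper uses a plain string search rather than a JSON parser because the base file is truncated and therefore may not parse as complete JSON. It only checks that each key is present, not that its value matches the required setting.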

When you have the absolute file paths to all of the files mentioned above, run the following command:

$ Rscript src/make_json_replicates.R  -g ${gitdir} \
                                        -j ${base_json} \
                                        -m ${dmaqc_meta} \
                                        -r ${ref_standards} \
                                        -f ${fastq_dir} \
                                        -o ${config_dir}

Add the --gcp flag if ${fastq_dir} points to a GCP bucket, not a local path.

The result will be one JSON-formatted config file in ${config_dir} for every combination of tissue, sex, timepoint, and intervention/exercise protocol among the samples included in ${fastq_dir}. Each file will be named ${sampleTypeCode}_${Protocol}_${intervention}_${sex}_${sacrificeTime} according to the corresponding DMAQC metadata. Click here to see an example of what these config files should look like.
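
As a quick sanity check on the output, you might confirm that every generated file parses as JSON. The following is a sketch under stated assumptions: `validate_configs` is a hypothetical helper (not part of the repository), and it assumes `python3` is available, using its standard `json.tool` module as the parser.

```shell
# Hypothetical helper: try to parse every regular file in the directory
# given as $1 as JSON and print one line per file.
# Returns non-zero if any file fails to parse.
validate_configs() {
  local dir=$1 f rc=0
  for f in "$dir"/*; do
    [ -f "$f" ] || continue
    if python3 -m json.tool "$f" > /dev/null 2>&1; then
      echo "ok: $(basename "$f")"
    else
      echo "invalid JSON: $(basename "$f")"
      rc=1
    fi
  done
  return "$rc"
}
```

Run it as `validate_configs "${config_dir}"`. This only checks syntax; it does not verify that the parameters inside each file are correct for the pipeline.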
