The docker of MetaGP is available in, https://hub.docker.com/r/aritramhp/metagp
Use docker pull aritramhp/metagp:latest
to download and install the image
-
Input: All the data should be kept in
<input_directory>/Data
directory. -
Requirement: To run
wrapper_metagp.py
python3.8
or more andpandas
-
Execution: All the functions are implemented in the metagp_core.py file. Each function performs a specific task present in the docker. For easy to execute,
wrapper_metagp.py
file has been provided. These files can be downloaded fromhttps://github.com/aritramhp/MetaGP
. -
Basic command:
python wrapper_metagp.py -i <input_directory> [option]
The options can be
--pre|--qc|--taxo|--div|--func
For advanced arguments:
Use
python wrapper_metagp.py -h
The previous steps are enough to run MetaGP. For further understanding, each step that works behind is explained here.
This script makes a mapping file and a configuration file that will be used in the next steps
-
Basic command:
bash MetaGP config -i <input_directory>
-
Output:
<input_directory>/Analysis/config.info
<input_directory>/Analysis/mapping_file.tab
-
For advanced arguments:
bash MetaGP config -h
Or,
python make_config.py -h
This is an optional step, but is recommended to check the quality of data and to determine if some low quality nucleotides from 5’ end need to be trimmed.
-
Basic command:
bash MetaGP pre_exec -s <samplename> -f <forward_file> -r <reverse_file> -c <config_file>
-
Output:
All the output stores in the directory
<input_directory>/Analysis/1_quality_control/1.0_rawdata/
*.html
– FastQC reports for each sample.
-
For advanced arguments:
bash MetaGP pre_exec -h
Or,
python quality_check.py -h
This script generates a table and a plot to represent the number of reads.
-
Basic command:
python qcheck_stats.py -c <config_file>
-
Output:
All the output stores in the directory <input_directory>/Analysis/1_quality_control/1.0_rawdata/
-
raw_readcount.tab
– A table with the number of reads of each sample -
raw_readcount.png
– Plot of the number of reads for each sample
-
-
For advanced arguments:
python qcheck_stats.py -h
The configuration file may need to be modified based on the pre-execution check. A configuration file contains all the parameter values required to run the following steps. It automatically generates from the previous step and comes with default values. In the configuration file, a user can choose which part of the pipeline they want to execute. By default it is set to pre-execution checking. However, after performing the step, a user needs to choose their next step along with the parameters. To decontaminate the host reads, currently the pipeline is able to remove human and mouse reads based on the reference genomes HG38 and C57BL_6NJ. However, it can be easily modified by putting an updated reference genome(s) in the host database directory.
-
Basic command:
bash MetaGP pre_qc -s <samplename> -f <forward_file> -r <reverse_file> -c <config_file>
-
Output:
All the output stores in the directory
<input_directory>/Analysis/1_quality_control/
1.1_adapter_trimming/
1.2_decontamination/
-
For advanced arguments:
bash MetaGP pre_qc -h
Or,python quality_control.py -h
This script generates a table and a plot with the number of reads remaining after each QC step.
-
Basic command:
python qcheck_stats.py -c <config_file> -p
-
Output:
All the output stores in the directory <input_directory>/Analysis/1_quality_control/
-
readcounts.tab
– A table with the read counts after each step of QC -
readcounts.png
– A plot representing the read counts statistics -
samples_to_process.tab
– A list of samples that has more reads than a threshold after the QC steps.
-
-
For advanced arguments:
python qcheck_stats.py -h
This script takes the list of samples that have a considerable number of reads (based on the threshold) after QC. These samples are listed in samples_to_process.tab
(see previous step). The taxonomy profiles are generated using MetaPhlAn4 with known and unknown SGBs separately.
-
Basic command:
bash MetaGP exec_taxprof -s <samplename> -f <forward_file> -r <reverse_file> -c <config_file>
-
Output:
All the output stores in the directory
<input_directory>/Analysis/2_taxonomic_profile/
-
ignore_usgb/
– Taxonomy profiles without unknown SGB -
usgb/
– Taxonomy profiles with unknown SGB
-
-
For advanced arguments:
bash MetaGP exec_taxprof -h
Or,
python taxonomic_profiling.py -h
This script merges the taxonomy profiles for each sample and bins based on the taxonomy ranks. Finally, plot the profiles for different taxonomy ranks. If a metadata is provided, this script also combines the profiles based on the groups.
-
Basic command:
python taxoprof_stats.py -c <config_file>
-
Output:
All the output stores in the directory
<input_directory>/Analysis/2_taxonomic_profile/
OTUtable.rel_abundance.tab
– A table with the taxonomy profiles of all the samples.*/Taxonomic_binning/
– Stores the profiles for each taxonomy ranks with respective plots.
-
For advanced arguments:
python taxoprof_stats.py -h
This script computes both Alpha and Beta diversity based on a specific taxonomy rank.
-
Basic command:
python diversity.py -c <config_file>
-
Output:
All the output stores in the directory
<input_directory>/Analysis/3_diversity/
-
3.1_alpha_diversity/
-
3.2_beta_diversity/
-
-
For advanced arguments:
python diversity.py -h
Similar to the taxonomy profiling, this script takes the list of samples from samples_to_process.tab
(see previous step) and generates the functional profiles of each sample using HUMAnN3.
-
Basic command:
bash MetaGP exec_funcprof -s <samplename> -f <forward_file> -r <reverse_file> -c <config_file>
-
Output:
All the output stores in the directory
<input_directory>/Analysis/4_functional_profile/
-
For advanced arguments:
bash MetaGP exec_funcprof -h
Or,
python func_profiling.py -h
This script merges the functional profiles for each sample and normalizes them based on copies per million (cpm) and relative abundance (relab).
- Basic command:
python funcprof_stats.py -c <config_file>
-
Output:
All the output stores in the directory
<input_directory>/Analysis/4_functional_profile/profiles
-
For advanced arguments:
python funcprof_stats.py -h