Commit 317dac0

2020.11 docs (#3053)
* changes after deploy for 2020.11
* fix test
* addressing @ElDeveloper commenets
* minor doc changes that we missed
* clean and simplify docs, mainly processing recommendations
* Apply suggestions from code review

Co-authored-by: Yoshiki Vázquez Baeza <[email protected]>
Co-authored-by: Yoshiki Vázquez Baeza <[email protected]>
1 parent 07324b3 commit 317dac0

7 files changed

+29
-86
lines changed

README.rst

Lines changed: 1 addition & 7 deletions
@@ -39,13 +39,7 @@ Current features
 * Easy long-term sequence data deposition to the European Nucleotide Archive (ENA),
 part of the European Bioinformatics Institute (EBI) for private and public
 studies.
-* Raw data processing for:
-
-* Target gene data: we support deblur against GreenGenes (13_8) and close
-reference picking against GreenGenes (13_8) and Silva.
-* Metagenomic and Metatranscriptomic data: we support Shogun processing.
-* biom files can be added as new preparation templates for downstream
-analyses; however, this cannot be made public.
+* Raw data processing for `Target Gene, Metagenomic, Metabolomic and BIOM files <https://qiita.ucsd.edu/static/doc/html/processingdata/index.html#processing-recommendations>`. BIOM files can be added as new preparation files for downstream analyses; however, this cannot be made public.

 * Basic downstream analyses using Qiime2.
 * Basic study search in the study listing page.

qiita_pet/support_files/doc/source/checklist-for-ebi-ena-submission.rst

Lines changed: 1 addition & 4 deletions
@@ -44,10 +44,7 @@ For each preparation that needs to be uploaded to EBI-ENA we will check:
 1. Data processing

 a. Only datasets where raw sequences are available and linked to the preparation can be submitted. Studies where the starting point is a BIOM table cannot be submitted, since EBI is a sequence archive
-b. The data is processed and the owner confirms the data is correct:
-
-1. For target gene: data is demultiplexed (review split_library_log to make sure each sample has roughly the expected number of sequences) and there is at least a closed-reference (GG for 16S, Silva for 18S, UNITE for ITS) or trim/deblur artifacts. Trimming should be done with 90, 100 and 150 base pairs (preferred)
-2. For shotgun: data is uploaded via per_sample_FASTQ and processed using Shogun/utree. Remember to remove sequencing data for any human subject via `the HMP SOP <https://www.hmpdacc.org/hmp/doc/HumanSequenceRemoval_SOP.pdf>`__ or `the Knight Lab SOP <https://github.com/qiita-spots/qp-shogun/blob/master/notebooks/host_filtering.rst>`__
+b. The data is processed and the owner confirms the data is correct and followed our :doc:`processingdata/processing-recommendations`.

 2. Verify the sample information

qiita_pet/support_files/doc/source/dev/resource_allocation.rst

Lines changed: 1 addition & 1 deletion
@@ -19,7 +19,7 @@ separate possible name conflicts while at the same time keeping this separation
 simple.

 #. RESOURCE_PARAMS_COMMAND: This is the most common entry as it defines the allocation
-for a specific command name, like "Shogun v1.0.7" or "Beta diversity (phylogenetic)",
+for a specific command name, like "deblur" or "Beta diversity (phylogenetic)",
 for the complete list of commands visit: `Qiita Software <https://qiita.ucsd.edu/software/>`__
 #. COMPLETE_JOBS_RESOURCE_PARAM: When a RESOURCE_PARAMS_COMMAND completes, it will define if the job
 finished successfully and a set of artifact(s) that need to be validated and then added to Qiita -

qiita_pet/support_files/doc/source/faq.rst

Lines changed: 2 additions & 2 deletions
@@ -30,8 +30,8 @@ What kind of data can I upload to Qiita for processing?
 Processing in Qiita requires 3 things: raw data, sample and prep information
 files. `Here <https://github.com/biocore/qiita/blob/master/README.rst#accepted-raw-files>`__
 you can find a list of currently supported raw files files. Note that we are
-accepting any kind of target gene (16S, 18S, ITS, whatever). You can also upload
-and process WGS via Shogun. Check our :doc:`processingdata/processing-recommendations`.
+accepting any kind of target gene (16S, 18S, ITS, whatever), Whole Genome and
+Metatranscriptomic. Check our :doc:`processingdata/processing-recommendations`.


 What's the difference between a sample and a prep information file?

qiita_pet/support_files/doc/source/index.rst

Lines changed: 0 additions & 1 deletion
@@ -76,4 +76,3 @@ If you intend to deploy or develop Qiita we recommend that you have a look at th
 qiita-philosophy/index.rst
 admin/index.rst
 dev/index.rst
-resources.rst

qiita_pet/support_files/doc/source/processingdata/processing-recommendations.rst

Lines changed: 24 additions & 15 deletions
@@ -8,8 +8,10 @@ Currently, Qiita supports the processing of raw data from:
 #. Metatranscriptome sequencing


-Note that the selected processing are mainly guided so we can perform meta-analyses, this is combine different studies,
-even from different wet lab techniques or sequencing technologies.
+Note that the selected processing recommendations are mainly guided towards performing meta-analyses,
+this is combine different studies, even from different wet lab techniques or
+sequencing technologies. However, these parameters shouldn't prevent you using the
+resulting tables as your primary analytical source.


 Target gene barcoded sequencing
@@ -41,13 +43,15 @@ Currently, we have the reference databases: Greengenes version 3_8-97, Silva 119
 Shotgun sequencing
 ------------------

-Qiita currently has one shotgun metagenomics data analysis pipeline: `Shogun <https://msystems.asm.org/content/3/6/e00069-18>`_.
+Qiita currently has one active shotgun metagenomics data analysis pipeline: a per sample
+bowtie2 alignment step with Woltka classification using either the WoLr1 or Rep200 databases.
+Below you will find more information about each of these options.

 The current workflow is as follows:

 #. Removal of adapter sequence and quality control: `Atropos <https://github.com/jdidion/atropos/>`_
 #. Removal of host contamination using `Bowtie2 <http://bowtie-bio.sourceforge.net/bowtie2/index.shtml>`_
-#. Taxonomy profiling using choice of three different aligners and two different reference databases; see sections below
+#. Taxonomy profiling using bowtie2 as an aligner and two different reference databases; see sections below

 Note that we recommend only uploading sequences that have already been through QC and human sequence removal. However, we
 recommend that all sequence files go through adapter and quality control within the system to ensure they are ready for
@@ -63,21 +67,24 @@ we recommend using the `--nextseq-trim 30` parameter.
 For host removal we currently support *Danio Rerio* (zebrafish), *Drosophila Melanogaster* (fruit fly), *Mus Musculus* (mouse),
 *Rattus Norvegicus* (rat), and Enterobacteria phage phiX174 (the Illumina spike-in control).

-Note that the Shogun command produces 4 output artifacts:
-- The Alignment Profile BIOM artifact, which contains the alignment files
-- A Taxonomic Prediction - phylum BIOM artifact, which contains the taxonomic predictions based on the alignment
-- A Taxonomic Prediction - genus BIOM artifact, which contains the taxonomic predictions based on the alignment
-- A Taxonomic Prediction - species BIOM artifact, which contains the taxonomic predictions based on the alignment
-The 3 Taxonomic Prediction files can be used for subsequent analysis and visualization.
+Note that the command produces up to 6 output artifacts based on the aligner and database selected:
+- Alignment Profile: contains the raw alignment file and the no rank classification BIOM table
+- Taxonomic Prediction - phylum: contains the phylum level taxonomic predictions BIOM table
+- Taxonomic Prediction - genus: contains the genus level taxonomic predictions BIOM table
+- Taxonomic Prediction - species: contains the genus level taxonomic predictions BIOM table
+- Per genome Predictions: contains the per genome level taxonomic predictions BIOM table
+- Per gene Predictions: Only WoLr1, contains the per gene level taxonomic predictions BIOM table

-Shogun aligners
-^^^^^^^^^^^^^^^
+Aligners
+^^^^^^^^
+
+Note that some of these are legacy option but not available for new processing.

 #. Bowtie2: The classical ultrafast short sequence aligner. Based on-FM indexing of genome sequences to achieve
 efficient memory and CPU performance. We tuned the parameter setting for Bowtie2 to achieve optimal
 alignment accuracy for typical shotgun metagenome datasets.

-- Version: 2.3.5.1
+- Version: 2.4.2
 - Alignment file format: SAM
 - Website: http://bowtie-bio.sourceforge.net/bowtie2/index.shtml
 - Citation: Langmead B, Salzberg S. Fast gapped-read alignment with Bowtie 2. Nature Methods. 2012, 9:357-359.
@@ -99,8 +106,10 @@ Shogun aligners
 - Website: https://github.com/knights-lab/UTree
 - Citation: Gabriel Al-Ghalith and Dan Knights. Faster and lower-memory metagenomic profiling with UTree. DOI: 10.5281/zenodo.998252

-Shogun reference databases
-^^^^^^^^^^^^^^^^^^^^^^^^^^
+Reference databases
+^^^^^^^^^^^^^^^^^^^
+
+Note that some of these are legacy option but not available for new processing.

 #. WoLr1 ("Web of Life" release 1): An even representation of microbial diversity, selected using an prototype
 selection algorithm based on the MinHash distance matrix among all non-redundant bacterial and archaeal genomes

qiita_pet/support_files/doc/source/resources.rst

Lines changed: 0 additions & 56 deletions
This file was deleted.
