2020.11 docs (#3053)

antgonza · ElDeveloper · web-flow · commit 317dac0cc1e7 · 2020-11-24T16:29:51.000-08:00
* changes after deploy for 2020.11
* fix test
* addressing @ElDeveloper comments
* minor doc changes that we missed
* clean and simplify docs, mainly processing recommendations
* Apply suggestions from code review

Co-authored-by: Yoshiki Vázquez Baeza <yoshiki@ucsd.edu>
diff --git a/README.rst b/README.rst
@@ -39,13 +39,7 @@ Current features
 * Easy long-term sequence data deposition to the European Nucleotide Archive (ENA),
   part of the European Bioinformatics Institute (EBI) for private and public
   studies.
-* Raw data processing for:
-
-  * Target gene data: we support deblur against GreenGenes (13_8) and close
-    reference picking against GreenGenes (13_8) and Silva.
-  * Metagenomic and Metatranscriptomic data: we support Shogun processing.
-  * biom files can be added as new preparation templates for downstream
-    analyses; however, this cannot be made public.
+* Raw data processing for `Target Gene, Metagenomic, Metabolomic and BIOM files <https://qiita.ucsd.edu/static/doc/html/processingdata/index.html#processing-recommendations>`__. BIOM files can be added as new preparation files for downstream analyses; however, these cannot be made public.
 
 * Basic downstream analyses using Qiime2.
 * Basic study search in the study listing page.
diff --git a/qiita_pet/support_files/doc/source/checklist-for-ebi-ena-submission.rst b/qiita_pet/support_files/doc/source/checklist-for-ebi-ena-submission.rst
@@ -44,10 +44,7 @@ For each preparation that needs to be uploaded to EBI-ENA we will check:
   1. Data processing
 
     a. Only datasets where raw sequences are available and linked to the preparation can be submitted. Studies where the starting point is a BIOM table cannot be submitted, since EBI is a sequence archive
-    b. The data is processed and the owner confirms the data is correct:
-
-      1. For target gene: data is demultiplexed (review split_library_log to make sure each sample has roughly the expected number of sequences) and there is at least a closed-reference (GG for 16S, Silva for 18S, UNITE for ITS) or trim/deblur artifacts. Trimming should be done with 90, 100 and 150 base pairs (preferred)
-      2. For shotgun: data is uploaded via per_sample_FASTQ and processed using Shogun/utree. Remember to remove sequencing data for any human subject via `the HMP SOP <https://www.hmpdacc.org/hmp/doc/HumanSequenceRemoval_SOP.pdf>`__ or `the Knight Lab SOP <https://github.com/qiita-spots/qp-shogun/blob/master/notebooks/host_filtering.rst>`__
+    b. The data is processed following our :doc:`processingdata/processing-recommendations`, and the owner confirms that the data is correct.
 
   2. Verify the sample information
 
diff --git a/qiita_pet/support_files/doc/source/dev/resource_allocation.rst b/qiita_pet/support_files/doc/source/dev/resource_allocation.rst
@@ -19,7 +19,7 @@ separate possible name conflicts while at the same time keeping this separation
 simple.
 
 #. RESOURCE_PARAMS_COMMAND: This is the most common entry as it defines the allocation
-   for a specific command name, like "Shogun v1.0.7" or "Beta diversity (phylogenetic)",
+   for a specific command name, like "deblur" or "Beta diversity (phylogenetic)",
    for the complete list of commands visit: `Qiita Software <https://qiita.ucsd.edu/software/>`__
 #. COMPLETE_JOBS_RESOURCE_PARAM: When a RESOURCE_PARAMS_COMMAND completes, it will define if the job
    finished successfully and a set of artifact(s) that need to be validated and then added to Qiita -
diff --git a/qiita_pet/support_files/doc/source/faq.rst b/qiita_pet/support_files/doc/source/faq.rst
@@ -30,8 +30,8 @@ What kind of data can I upload to Qiita for processing?
 Processing in Qiita requires 3 things: raw data, sample and prep information
 files. `Here <https://github.com/biocore/qiita/blob/master/README.rst#accepted-raw-files>`__
 you can find a list of currently supported raw files. Note that we are
-accepting any kind of target gene (16S, 18S, ITS, whatever). You can also upload
-and process WGS via Shogun. Check our :doc:`processingdata/processing-recommendations`.
+accepting any kind of target gene (16S, 18S, ITS, etc.), as well as whole genome and
+metatranscriptomic data. Check our :doc:`processingdata/processing-recommendations`.
 
 
 What's the difference between a sample and a prep information file?
diff --git a/qiita_pet/support_files/doc/source/index.rst b/qiita_pet/support_files/doc/source/index.rst
@@ -76,4 +76,3 @@ If you intend to deploy or develop Qiita we recommend that you have a look at th
    qiita-philosophy/index.rst
    admin/index.rst
    dev/index.rst
-   resources.rst
diff --git a/qiita_pet/support_files/doc/source/processingdata/processing-recommendations.rst b/qiita_pet/support_files/doc/source/processingdata/processing-recommendations.rst
@@ -8,8 +8,10 @@ Currently, Qiita supports the processing of raw data from:
 #. Metatranscriptome sequencing
 
 
-Note that the selected processing are mainly guided so we can perform meta-analyses, this is combine different studies,
-even from different wet lab techniques or sequencing technologies.
+Note that the selected processing recommendations are mainly geared towards performing meta-analyses,
+that is, combining different studies, even from different wet lab techniques or
+sequencing technologies. However, these parameters shouldn't prevent you from using the
+resulting tables as your primary analytical source.
 
 
 Target gene barcoded sequencing
@@ -41,13 +43,15 @@ Currently, we have the reference databases: Greengenes version 3_8-97, Silva 119
 Shotgun sequencing
 ------------------
 
-Qiita currently has one shotgun metagenomics data analysis pipeline: `Shogun <https://msystems.asm.org/content/3/6/e00069-18>`_.
+Qiita currently has one active shotgun metagenomics data analysis pipeline: a per-sample
+Bowtie2 alignment step with Woltka classification using either the WoLr1 or Rep200 database.
+Below you will find more information about each of these options.
 
 The current workflow is as follows:
 
 #. Removal of adapter sequence and quality control: `Atropos <https://github.com/jdidion/atropos/>`_
 #. Removal of host contamination using `Bowtie2 <http://bowtie-bio.sourceforge.net/bowtie2/index.shtml>`_
-#. Taxonomy profiling using choice of three different aligners and two different reference databases; see sections below
+#. Taxonomy profiling using Bowtie2 as the aligner and one of two reference databases; see sections below
 
 Note that we recommend only uploading sequences that have already been through QC and human sequence removal. However, we
 recommend that all sequence files go through adapter and quality control within the system to ensure they are ready for
@@ -63,21 +67,24 @@ we recommend using the `--nextseq-trim 30` parameter.
 For host removal we currently support *Danio Rerio* (zebrafish), *Drosophila Melanogaster* (fruit fly), *Mus Musculus* (mouse),
 *Rattus Norvegicus* (rat), and Enterobacteria phage phiX174 (the Illumina spike-in control).
 
-Note that the Shogun command produces 4 output artifacts:
-- The Alignment Profile BIOM artifact, which contains the alignment files
-- A Taxonomic Prediction - phylum BIOM artifact, which contains the taxonomic predictions based on the alignment
-- A Taxonomic Prediction - genus BIOM artifact, which contains the taxonomic predictions based on the alignment
-- A Taxonomic Prediction - species BIOM artifact, which contains the taxonomic predictions based on the alignment
-The 3 Taxonomic Prediction files can be used for subsequent analysis and visualization.
+Note that the command produces up to 6 output artifacts, depending on the aligner and database selected:
+- Alignment Profile: contains the raw alignment file and the no rank classification BIOM table
+- Taxonomic Prediction - phylum: contains the phylum level taxonomic predictions BIOM table
+- Taxonomic Prediction - genus: contains the genus level taxonomic predictions BIOM table
+- Taxonomic Prediction - species: contains the species level taxonomic predictions BIOM table
+- Per genome Predictions: contains the per genome level taxonomic predictions BIOM table
+- Per gene Predictions: WoLr1 only; contains the per gene level taxonomic predictions BIOM table
 
-Shogun aligners
-^^^^^^^^^^^^^^^
+Aligners
+^^^^^^^^
+
+Note that some of these are legacy options and are not available for new processing.
 
 #. Bowtie2: The classical ultrafast short sequence aligner. Based on FM indexing of genome sequences to achieve
    efficient memory and CPU performance. We tuned the parameter setting for Bowtie2 to achieve optimal
    alignment accuracy for typical shotgun metagenome datasets.
 
-   - Version: 2.3.5.1
+   - Version: 2.4.2
    - Alignment file format: SAM
    - Website: http://bowtie-bio.sourceforge.net/bowtie2/index.shtml
    - Citation: Langmead B, Salzberg S. Fast gapped-read alignment with Bowtie 2. Nature Methods. 2012, 9:357-359.
@@ -99,8 +106,10 @@ Shogun aligners
    - Website: https://github.com/knights-lab/UTree
    - Citation: Gabriel Al-Ghalith and Dan Knights. Faster and lower-memory metagenomic profiling with UTree. DOI: 10.5281/zenodo.998252
 
-Shogun reference databases
-^^^^^^^^^^^^^^^^^^^^^^^^^^
+Reference databases
+^^^^^^^^^^^^^^^^^^^
+
+Note that some of these are legacy options and are not available for new processing.
 
 #. WoLr1 ("Web of Life" release 1): An even representation of microbial diversity, selected using a prototype
    selection algorithm based on the MinHash distance matrix among all non-redundant bacterial and archaeal genomes
diff --git a/qiita_pet/support_files/doc/source/resources.rst b/qiita_pet/support_files/doc/source/resources.rst