From e5a14350d6bf16ab5dc42f73b6b98b873734b1ea Mon Sep 17 00:00:00 2001
From: SamGurr
Date: Fri, 1 Nov 2024 10:23:31 -0400
Subject: [PATCH] added tagseq multiqc stuff

---
 .../multiQC/adapter_only/multiqc_report.html | 7257 +++++++++++++++++
 .../multiQC/clean/multiqc_report.html        | 7257 +++++++++++++++++
 .../multiQC/raw/multiqc_report.html          | 7257 +++++++++++++++++
 3 files changed, 21771 insertions(+)
 create mode 100644 HPC_analysis/output/Transriptomics/TagSeq_F2_juveniles/multiQC/adapter_only/multiqc_report.html
 create mode 100644 HPC_analysis/output/Transriptomics/TagSeq_F2_juveniles/multiQC/clean/multiqc_report.html
 create mode 100644 HPC_analysis/output/Transriptomics/TagSeq_F2_juveniles/multiQC/raw/multiqc_report.html

diff --git a/HPC_analysis/output/Transriptomics/TagSeq_F2_juveniles/multiQC/adapter_only/multiqc_report.html b/HPC_analysis/output/Transriptomics/TagSeq_F2_juveniles/multiQC/adapter_only/multiqc_report.html
new file mode 100644
index 0000000..1d2709d
--- /dev/null
+++ b/HPC_analysis/output/Transriptomics/TagSeq_F2_juveniles/multiQC/adapter_only/multiqc_report.html
@@ -0,0 +1,7257 @@

MultiQC Report

        Note that additional data was saved in multiqc_data when this report was generated.

        If you use plots from MultiQC in a publication or presentation, please cite:

MultiQC: Summarize analysis results for multiple tools and samples in a single report
Philip Ewels, Måns Magnusson, Sverker Lundin and Max Käller
Bioinformatics (2016)
doi: 10.1093/bioinformatics/btw354
PMID: 27312411

        Tool Citations

        Please remember to cite the tools that you use in your analysis.

        About MultiQC

This report was generated using MultiQC, version 1.12.

You can see a YouTube video describing how to use MultiQC reports here: https://youtu.be/qPbIlO_KWN0

For more information about MultiQC, including other videos and extensive documentation, please visit http://multiqc.info

You can report bugs, suggest improvements and find the source code for MultiQC on GitHub: https://github.com/ewels/MultiQC

A modular tool to aggregate results from bioinformatics analyses across many samples into a single report.

Report generated on 2024-10-30, 14:38, based on data in:
/home/sgurr/Airradians_F2Juveniles_TagSeq/output/trim/adapters_only

        General Statistics

Showing 44/44 rows and 4/5 columns.

Sample Name                                  % Dups  % GC  Read Length  M Seqs
adapters.7206-B11-HiSeqTranscript_S4_R1_001  78.0%   40%   136 bp       5.8
adapters.7206-B11-HiSeqTranscript_S4_R2_001  75.6%   40%   136 bp       5.5
adapters.7206-B12-HiSeqTranscript_S5_R1_001  43.7%   40%   140 bp       4.8
adapters.7206-B12-HiSeqTranscript_S5_R2_001  40.1%   40%   139 bp       4.4
adapters.7206-C1-HiSeqTranscript_S1_R1_001   42.8%   40%   146 bp       5.3
adapters.7206-C1-HiSeqTranscript_S1_R2_001   41.4%   40%   145 bp       5.5
adapters.7206-C11-HiSeqTranscript_S7_R1_001  44.5%   40%   141 bp       5.9
adapters.7206-C11-HiSeqTranscript_S7_R2_001  41.7%   41%   140 bp       5.6
adapters.7206-C12-HiSeqTranscript_S4_R1_001  51.4%   41%   146 bp       6.0
adapters.7206-C12-HiSeqTranscript_S4_R2_001  49.9%   41%   146 bp       5.8
adapters.7206-C2-HiSeqTranscript_S5_R1_001   65.6%   39%   149 bp       3.5
adapters.7206-C2-HiSeqTranscript_S5_R2_001   64.1%   39%   149 bp       3.4
adapters.7206-C3-HiSeqTranscript_S5_R1_001   52.5%   41%   139 bp       8.1
adapters.7206-C3-HiSeqTranscript_S5_R2_001   52.1%   41%   139 bp       8.7
adapters.7206-C4-HiSeqTranscript_S8_R1_001   47.4%   41%   144 bp       5.6
adapters.7206-C4-HiSeqTranscript_S8_R2_001   46.2%   41%   144 bp       5.5
adapters.7206-C5-HiSeqTranscript_S9_R1_001   49.0%   41%   138 bp       5.3
adapters.7206-C5-HiSeqTranscript_S9_R2_001   47.1%   41%   137 bp       5.1
adapters.7206-C6-HiSeqTranscript_S4_R1_001   52.5%   41%   141 bp       9.7
adapters.7206-C6-HiSeqTranscript_S4_R2_001   51.9%   41%   140 bp       10.2
adapters.7206-C7-HiSeqTranscript_S3_R1_001   47.5%   41%   142 bp       5.2
adapters.7206-C7-HiSeqTranscript_S3_R2_001   44.7%   41%   142 bp       4.8
adapters.7206-C8-HiSeqTranscript_S4_R1_001   48.4%   40%   142 bp       5.0
adapters.7206-C8-HiSeqTranscript_S4_R2_001   45.6%   40%   142 bp       4.7
adapters.7206-C9-HiSeqTranscript_S2_R1_001   51.2%   41%   144 bp       9.7
adapters.7206-C9-HiSeqTranscript_S2_R2_001   50.8%   41%   143 bp       10.3
adapters.7206-D1-HiSeqTranscript_S5_R1_001   47.3%   40%   147 bp       5.1
adapters.7206-D1-HiSeqTranscript_S5_R2_001   45.7%   40%   147 bp       4.9
adapters.7206-D2-HiSeqTranscript_S5_R1_001   60.4%   40%   146 bp       23.5
adapters.7206-D2-HiSeqTranscript_S5_R2_001   58.9%   40%   145 bp       22.8
adapters.7206-D3-HiSeqTranscript_S3_R1_001   46.1%   40%   146 bp       7.1
adapters.7206-D3-HiSeqTranscript_S3_R2_001   43.7%   40%   146 bp       6.8
adapters.7206-D4-HiSeqTranscript_S38_R1_001  55.4%   40%   144 bp       5.8
adapters.7206-D4-HiSeqTranscript_S38_R2_001  53.8%   40%   144 bp       5.7
adapters.7206-D5-HiSeqTranscript_S37_R1_001  42.1%   40%   139 bp       5.3
adapters.7206-D5-HiSeqTranscript_S37_R2_001  40.0%   41%   139 bp       5.1
adapters.7206-D6-HiSeqTranscript_S1_R1_001   51.2%   41%   144 bp       9.7
adapters.7206-D6-HiSeqTranscript_S1_R2_001   49.4%   41%   144 bp       9.5
adapters.7206-D7-HiSeqTranscript_S2_R1_001   59.0%   40%   145 bp       10.0
adapters.7206-D7-HiSeqTranscript_S2_R2_001   57.2%   40%   145 bp       9.7
adapters.7206-D8-HiSeqTranscript_S10_R1_001  47.4%   40%   144 bp       5.0
adapters.7206-D8-HiSeqTranscript_S10_R2_001  45.7%   40%   144 bp       4.8
adapters.7206-D9-HiSeqTranscript_S3_R1_001   41.7%   40%   147 bp       5.5
adapters.7206-D9-HiSeqTranscript_S3_R2_001   41.5%   41%   146 bp       5.9

        FastQC

        FastQC is a quality control tool for high throughput sequence data, written by Simon Andrews at the Babraham Institute in Cambridge.

Sequence Counts

        Sequence counts for each sample. Duplicate read counts are an estimate only.

This plot shows the total number of reads, broken down into unique and duplicate reads where possible (only more recent versions of FastQC report duplicate information).

You can read more about duplicate calculation in the FastQC documentation. A small part has been copied here for convenience:

Only sequences which first appear in the first 100,000 sequences in each file are analysed. This should be enough to get a good impression for the duplication levels in the whole file. Each sequence is tracked to the end of the file to give a representative count of the overall duplication level.

The duplication detection requires an exact sequence match over the whole length of the sequence. Any reads over 75bp in length are truncated to 50bp for this analysis.
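
The sampling rule quoted above can be sketched in a few lines of Python. This is a minimal illustration, assuming each read is available as a plain sequence string; the function names and the `track_limit` parameter are hypothetical, and unlike FastQC it does not extrapolate counts for sequences first seen after the tracking window, so its output should not be expected to match the % Dups values in the General Statistics table.

```python
from collections import Counter
from typing import Iterable


def truncate_for_dedup(seq: str) -> str:
    """Apply the documented rule: reads longer than 75 bp are cut to 50 bp."""
    return seq[:50] if len(seq) > 75 else seq


def estimate_duplicate_percent(reads: Iterable[str], track_limit: int = 100_000) -> float:
    """Percentage of duplicate reads among sequences first seen in the first
    `track_limit` reads. Simplified: no extrapolation to untracked sequences."""
    counts = Counter()
    for i, read in enumerate(reads):
        key = truncate_for_dedup(read)
        if i < track_limit or key in counts:
            # New sequences are only admitted during the tracking window;
            # sequences already admitted keep being counted to the end of the file.
            counts[key] += 1
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # Every copy beyond the first of each distinct sequence is a duplicate.
    return 100.0 * (total - len(counts)) / total
```

Passing it the sequence lines of a FASTQ file (the second line of each four-line record) gives a rough per-file estimate.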

Sequence Quality Histograms

        The mean quality value across each base position in the read.

To enable multiple samples to be plotted on the same graph, only the mean quality scores are plotted (unlike the box plots seen in FastQC reports).

Taken from the FastQC help:

The y-axis on the graph shows the quality scores. The higher the score, the better the base call. The background of the graph divides the y axis into very good quality calls (green), calls of reasonable quality (orange), and calls of poor quality (red). The quality of calls on most platforms will degrade as the run progresses, so it is common to see base calls falling into the orange area towards the end of a read.
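
As a rough illustration of what the plotted mean is, the following sketch averages decoded quality scores at each read position. It assumes Phred+33 quality strings (the `offset` parameter is an assumption; change it for other encodings) and is not MultiQC's or FastQC's code.

```python
from typing import Iterable, List


def mean_quality_per_position(quals: Iterable[str], offset: int = 33) -> List[float]:
    """Mean Phred score at each read position. Each position is averaged only
    over the reads long enough to reach it."""
    sums: List[int] = []
    counts: List[int] = []
    for q in quals:
        for i, ch in enumerate(q):
            if i == len(sums):  # first read to reach this position
                sums.append(0)
                counts.append(0)
            sums[i] += ord(ch) - offset  # decode the ASCII quality character
            counts[i] += 1
    return [s / c for s, c in zip(sums, counts)]
```

For example, the quality strings `"II#"` and `"5I"` decode to 40, 40, 2 and 20, 40, giving per-position means of 30, 40 and 2.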

Per Sequence Quality Scores

The number of reads with each average quality score. Shows whether a subset of reads has poor quality.

From the FastQC help:

The per sequence quality score report allows you to see if a subset of your sequences have universally low quality values. It is often the case that a subset of sequences will have universally poor quality; however, these should represent only a small percentage of the total sequences.
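
A minimal sketch of the per-sequence view, assuming Phred+33 quality strings: each read is binned by its mean quality (rounded down, a choice made here for illustration), and `fraction_below` is a hypothetical helper for checking how large the poor-quality subset is. The threshold of 20 is only an example value.

```python
from collections import Counter
from typing import Dict, Iterable


def per_sequence_quality(quals: Iterable[str], offset: int = 33) -> Dict[int, int]:
    """Histogram of reads keyed by their mean Phred score (rounded down)."""
    hist = Counter()
    for q in quals:
        if not q:
            continue
        mean_q = sum(ord(ch) - offset for ch in q) / len(q)
        hist[int(mean_q)] += 1
    return dict(sorted(hist.items()))


def fraction_below(hist: Dict[int, int], threshold: int = 20) -> float:
    """Share of reads whose mean quality falls below `threshold`."""
    total = sum(hist.values())
    low = sum(n for score, n in hist.items() if score < threshold)
    return low / total if total else 0.0
```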

Per Base Sequence Content

        The proportion of each base position for which each of the four normal DNA bases has been called.

To enable multiple samples to be shown in a single plot, the base composition data is shown as a heatmap. The colours represent the balance between the four bases: an even distribution should give an even muddy brown colour. Hover over the plot to see the percentage of the four bases under the cursor.

To see the data as a line plot, as in the original FastQC graph, click on a sample track.

From the FastQC help:

Per Base Sequence Content plots out the proportion of each base position in a file for which each of the four normal DNA bases has been called.

In a random library you would expect that there would be little to no difference between the different bases of a sequence run, so the lines in this plot should run parallel with each other. The relative amount of each base should reflect the overall amount of these bases in your genome, but in any case they should not be hugely imbalanced from each other.

It's worth noting that some types of library will always produce biased sequence composition, normally at the start of the read. Libraries produced by priming using random hexamers (including nearly all RNA-Seq libraries) and those which were fragmented using transposases inherit an intrinsic bias in the positions at which reads start. This bias does not concern an absolute sequence, but instead provides enrichment of a number of different K-mers at the 5' end of the reads. Whilst this is a true technical bias, it isn't something which can be corrected by trimming and in most cases doesn't seem to adversely affect the downstream analysis.
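
The per-position percentages behind this plot can be sketched as below. This is an illustration only, assuming reads as plain strings; leaving N calls out of the denominator is an assumption of this sketch, and `max_imbalance` is a hypothetical helper that reports the largest A/T or G/C gap, one simple way to spot the 5' bias described above.

```python
from typing import Dict, Iterable, List

BASES = "ACGT"


def base_composition(reads: Iterable[str]) -> List[Dict[str, float]]:
    """Percentage of A, C, G and T at each position (N and other symbols are
    left out of the denominator in this sketch)."""
    counts: List[Dict[str, int]] = []
    for read in reads:
        for i, base in enumerate(read.upper()):
            if i == len(counts):
                counts.append({b: 0 for b in BASES})
            if base in counts[i]:
                counts[i][base] += 1
    result = []
    for c in counts:
        total = sum(c.values())
        result.append({b: (100.0 * n / total if total else 0.0) for b, n in c.items()})
    return result


def max_imbalance(composition: List[Dict[str, float]]) -> float:
    """Largest A-vs-T or G-vs-C gap (percentage points) at any position."""
    return max(
        (max(abs(p["A"] - p["T"]), abs(p["G"] - p["C"])) for p in composition),
        default=0.0,
    )
```

In a well-mixed library every position sits near 25% per base, so `max_imbalance` stays small; large values confined to the first few positions match the start-of-read bias discussed above.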

Per Sequence GC Content

The average GC content of reads. Normal random libraries typically have a roughly normal distribution of GC content.

From the FastQC help:

This module measures the GC content across the whole length of each sequence in a file and compares it to a modelled normal distribution of GC content.

In a normal random library you would expect to see a roughly normal distribution of GC content where the central peak corresponds to the overall GC content of the underlying genome. Since we don't know the GC content of the genome, the modal GC content is calculated from the observed data and used to build a reference distribution.

An unusually shaped distribution could indicate a contaminated library or some other kind of biased subset. A normal distribution which is shifted indicates some systematic bias which is independent of base position. If there is a systematic bias which creates a shifted normal distribution then this won't be flagged as an error by the module since it doesn't know what your genome's GC content should be.
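
A rough sketch of the comparison described above: per-read GC%, a 0 to 100 histogram, and a normal curve centred on the modal bin and scaled to the same number of reads. Estimating the spread around the mode and summarising the mismatch with `percent_deviation` are assumptions of this illustration, not FastQC's exact procedure.

```python
import math
from typing import List, Optional, Sequence


def gc_percent(read: str) -> Optional[float]:
    """GC% of one read over its A/C/G/T calls; None if it has none."""
    acgt = [b for b in read.upper() if b in "ACGT"]
    if not acgt:
        return None
    return 100.0 * sum(b in "GC" for b in acgt) / len(acgt)


def gc_histogram(reads: Sequence[str]) -> List[int]:
    """Read counts in 101 bins, one per whole GC percentage (0..100)."""
    hist = [0] * 101
    for read in reads:
        gc = gc_percent(read)
        if gc is not None:
            hist[round(gc)] += 1
    return hist


def modelled_normal(hist: List[int]) -> List[float]:
    """Normal curve centred on the modal bin, scaled to the same read total."""
    total = sum(hist)
    if total == 0:
        return [0.0] * len(hist)
    mode = max(range(len(hist)), key=lambda i: hist[i])
    # Spread measured around the mode rather than the mean (a sketch choice).
    sd = math.sqrt(sum(n * (i - mode) ** 2 for i, n in enumerate(hist)) / total) or 1.0
    density = [math.exp(-0.5 * ((i - mode) / sd) ** 2) for i in range(len(hist))]
    scale = total / sum(density)
    return [d * scale for d in density]


def percent_deviation(hist: List[int]) -> float:
    """Summed |observed - expected| as a percentage of all reads."""
    total = sum(hist)
    if total == 0:
        return 0.0
    expected = modelled_normal(hist)
    return 100.0 * sum(abs(o - e) for o, e in zip(hist, expected)) / total
```

A contaminating organism with a different GC content would show up as a second peak, which inflates `percent_deviation`; a uniformly shifted curve would not, for the reason given above.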

Per Base N Content

        The percentage of base calls at each position for which an N was called.

From the FastQC help:

If a sequencer is unable to make a base call with sufficient confidence then it will normally substitute an N rather than a conventional base call. This graph shows the percentage of base calls at each position for which an N was called.

It's not unusual to see a very low proportion of Ns appearing in a sequence, especially nearer the end of a sequence. However, if this proportion rises above a few percent it suggests that the analysis pipeline was unable to interpret the data well enough to make valid base calls.
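
The per-position N percentage can be computed as in the sketch below, assuming reads as plain strings. `positions_above` is a hypothetical helper, and its default of 5% merely stands in for the "few percent" mentioned above.

```python
from typing import Iterable, List


def n_content_per_position(reads: Iterable[str]) -> List[float]:
    """Percentage of N calls at each position, over reads reaching that position."""
    n_counts: List[int] = []
    totals: List[int] = []
    for read in reads:
        for i, base in enumerate(read.upper()):
            if i == len(totals):
                n_counts.append(0)
                totals.append(0)
            totals[i] += 1
            if base == "N":
                n_counts[i] += 1
    return [100.0 * n / t for n, t in zip(n_counts, totals)]


def positions_above(percentages: List[float], threshold: float = 5.0) -> List[int]:
    """0-based positions whose N percentage exceeds `threshold`."""
    return [i for i, pct in enumerate(percentages) if pct > threshold]
```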

Sequence Length Distribution

The distribution of fragment sizes (read lengths) found. See the FastQC help.

Sequence Duplication Levels

        The relative level of duplication found for every sequence.

From the FastQC Help:

In a diverse library most sequences will occur only once in the final set. A low level of duplication may indicate a very high level of coverage of the target sequence, but a high level of duplication is more likely to indicate some kind of enrichment bias (e.g. PCR over-amplification). This graph shows the degree of duplication for every sequence in a library: the relative number of sequences with different degrees of duplication.

Only sequences which first appear in the first 100,000 sequences in each file are analysed. This should be enough to get a good impression for the duplication levels in the whole file. Each sequence is tracked to the end of the file to give a representative count of the overall duplication level.

The duplication detection requires an exact sequence match over the whole length of the sequence. Any reads over 75bp in length are truncated to 50bp for this analysis.

In a properly diverse library most sequences should fall into the far left of the plot in both the red and blue lines. A general level of enrichment, indicating broad oversequencing in the library, will tend to flatten the lines, lowering the low end and generally raising other categories. More specific enrichments of subsets, or the presence of low complexity contaminants, will tend to produce spikes towards the right of the plot.
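
A minimal sketch of how reads can be grouped into duplication levels, assuming exact whole-read matching with no truncation or sampling. The bin labels imitate the style of the FastQC plot but are an assumption here; the function returns two views, one relative to distinct sequences and one relative to all reads.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Bin labels imitating the FastQC duplication plot (an assumption here).
UPPER_BINS = [(10000, ">10k"), (5000, ">5k"), (1000, ">1k"),
              (500, ">500"), (100, ">100"), (50, ">50"), (10, ">10")]


def duplication_bin(copies: int) -> str:
    """Label for a sequence seen `copies` times."""
    for edge, label in UPPER_BINS:
        if copies >= edge:
            return label
    return str(copies)


def duplication_levels(reads: Iterable[str]) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Per bin: % of distinct sequences and % of all reads."""
    copies_per_seq = Counter(reads)  # how many times each distinct sequence occurs
    distinct, total = Counter(), Counter()
    for copies in copies_per_seq.values():
        label = duplication_bin(copies)
        distinct[label] += 1
        total[label] += copies
    n_distinct = sum(distinct.values())
    n_reads = sum(total.values())
    return (
        {k: 100.0 * v / n_distinct for k, v in distinct.items()},
        {k: 100.0 * v / n_reads for k, v in total.items()},
    )
```

With five reads where one sequence appears three times, that sequence is a third of the distinct sequences but 60% of the reads, which is why the two views diverge as duplication rises.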

Overrepresented sequences

        The total amount of overrepresented sequences found in each library.

FastQC calculates and lists overrepresented sequences in FastQ files. It would not be possible to show this for all samples in a MultiQC report, so instead this plot shows the number of sequences categorized as overrepresented.

Sometimes, a single sequence may account for a large number of reads in a dataset. To show this, the bars are split into two: the first shows the overrepresented reads that come from the single most common sequence. The second shows the total count from all remaining overrepresented sequences.

From the FastQC Help:

A normal high-throughput library will contain a diverse set of sequences, with no individual sequence making up more than a tiny fraction of the whole. Finding that a single sequence is very overrepresented in the set either means that it is highly biologically significant, or indicates that the library is contaminated, or not as diverse as you expected.

FastQC lists all of the sequences which make up more than 0.1% of the total. To conserve memory, only sequences which appear in the first 100,000 sequences are tracked to the end of the file. It is therefore possible that a sequence which is overrepresented but doesn't appear at the start of the file for some reason could be missed by this module.
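
The 0.1% rule and the first-100,000 tracking window can be sketched as follows. The function names and parameters are hypothetical; `split_top` mirrors the two-part bar described above (the single most common sequence versus all the others). This is an illustration, not FastQC's implementation.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def overrepresented(reads: Iterable[str], min_fraction: float = 0.001,
                    track_limit: int = 100_000) -> List[Tuple[str, int, float]]:
    """Sequences above `min_fraction` of all reads, as (seq, count, percent),
    most common first. Only sequences first seen in the first
    `track_limit` reads are tracked."""
    counts = Counter()
    total = 0
    for i, read in enumerate(reads):
        total += 1
        if i < track_limit or read in counts:
            counts[read] += 1
    if total == 0:
        return []
    hits = [(seq, n, 100.0 * n / total) for seq, n in counts.items() if n / total > min_fraction]
    return sorted(hits, key=lambda h: h[1], reverse=True)


def split_top(hits: List[Tuple[str, int, float]]) -> Tuple[float, float]:
    """(% from the single most common sequence, % from all the others)."""
    if not hits:
        return 0.0, 0.0
    return hits[0][2], sum(h[2] for h in hits[1:])
```

The tracking window explains the caveat above: a sequence that only becomes common after the first `track_limit` reads is never admitted to `counts`.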

Adapter Content

The cumulative percentage count of the proportion of your library which has seen each of the adapter sequences at each position.

Note that only samples with ≥ 0.1% adapter contamination are shown.

There may be several lines per sample, as one is shown for each adapter detected in the file.

From the FastQC Help:

The plot shows a cumulative percentage count of the proportion of your library which has seen each of the adapter sequences at each position. Once a sequence has been seen in a read it is counted as being present right through to the end of the read so the percentages you see will only increase as the read length goes on.
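
The cumulative counting rule can be sketched as follows, assuming reads as plain strings and exact substring matching. The probe `AGATCGGAAGAG` (the start of the Illumina Universal Adapter) is used only as an example; FastQC checks its own list of adapters. For reads of different lengths, this sketch carries each hit forward to the last plotted position so the curve never decreases.

```python
from typing import List, Sequence

# Illumina Universal Adapter prefix, used here only as an example probe.
EXAMPLE_ADAPTER = "AGATCGGAAGAG"


def cumulative_adapter_content(reads: Sequence[str], adapter: str = EXAMPLE_ADAPTER) -> List[float]:
    """% of reads in which `adapter` has been seen at or before each position."""
    if not reads:
        return []
    max_len = max(len(r) for r in reads)
    seen = [0] * max_len
    for read in reads:
        start = read.find(adapter)
        if start == -1:
            continue
        # Once seen, count the adapter as present through to the last position.
        for pos in range(start, max_len):
            seen[pos] += 1
    return [100.0 * s / len(reads) for s in seen]
```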

Status Checks

Status for each FastQC section showing whether results seem entirely normal (green), slightly abnormal (orange) or very unusual (red).

FastQC assigns a status for each section of the report. These give a quick evaluation of whether the results of the analysis seem entirely normal (green), slightly abnormal (orange) or very unusual (red).

It is important to stress that although the analysis results appear to give a pass/fail result, these evaluations must be taken in the context of what you expect from your library. A 'normal' sample as far as FastQC is concerned is random and diverse. Some experiments may be expected to produce libraries which are biased in particular ways. You should therefore treat the summary evaluations as pointers to where you should concentrate your attention and understand why your library may not look random and diverse.

Specific guidance on how to interpret the output of each module can be found in the relevant report section, or in the FastQC help.

This heatmap summarises all of these statuses for a quick overview. Note that not all FastQC sections have plots in MultiQC reports, but all status checks are shown in this heatmap.
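
To build a heatmap like this yourself, the statuses can be gathered into a sample-by-module matrix. This sketch assumes the tab-separated `summary.txt` that FastQC writes for each input file (status, module name and file name on each line); the numeric scores in `STATUS_SCORE` are arbitrary choices for colouring.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# Numeric scores for a heatmap; the mapping itself is arbitrary.
STATUS_SCORE = {"PASS": 2, "WARN": 1, "FAIL": 0}


def parse_summary(lines: Iterable[str]) -> Dict[str, Dict[str, str]]:
    """{sample: {module: status}} from 'STATUS<TAB>module<TAB>file' lines."""
    table: Dict[str, Dict[str, str]] = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        status, module, sample = line.split("\t")
        table.setdefault(sample, {})[module] = status
    return table


def status_matrix(
    table: Dict[str, Dict[str, str]]
) -> Tuple[List[str], List[str], List[List[Optional[int]]]]:
    """Rows = samples, columns = modules; None where a module was not reported."""
    modules = sorted({m for mods in table.values() for m in mods})
    samples = sorted(table)
    rows = [[STATUS_SCORE.get(table[s].get(m, ""), None) for m in modules] for s in samples]
    return samples, modules, rows
```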

diff --git a/HPC_analysis/output/Transriptomics/TagSeq_F2_juveniles/multiQC/clean/multiqc_report.html b/HPC_analysis/output/Transriptomics/TagSeq_F2_juveniles/multiQC/clean/multiqc_report.html
new file mode 100644
index 0000000..0183d60
--- /dev/null
+++ b/HPC_analysis/output/Transriptomics/TagSeq_F2_juveniles/multiQC/clean/multiqc_report.html
@@ -0,0 +1,7257 @@

MultiQC Report

Report generated on 2024-10-30, 21:37, based on data in:
/home/sgurr/Airradians_F2Juveniles_TagSeq/output/trim/clean

              General Statistics

Showing 44/44 rows and 4/5 columns.

Sample Name                               % Dups  % GC  Read Length  M Seqs
clean.7206-B11-HiSeqTranscript_S4_R1_001  78.1%   40%   136 bp       5.8
clean.7206-B11-HiSeqTranscript_S4_R2_001  75.7%   40%   136 bp       5.5
clean.7206-B12-HiSeqTranscript_S5_R1_001  43.8%   40%   140 bp       4.8
clean.7206-B12-HiSeqTranscript_S5_R2_001  40.1%   40%   139 bp       4.4
clean.7206-C1-HiSeqTranscript_S1_R1_001   42.8%   40%   145 bp       5.3
clean.7206-C1-HiSeqTranscript_S1_R2_001   41.4%   40%   145 bp       5.5
clean.7206-C11-HiSeqTranscript_S7_R1_001  44.5%   41%   141 bp       5.9
clean.7206-C11-HiSeqTranscript_S7_R2_001  41.6%   41%   140 bp       5.6
clean.7206-C12-HiSeqTranscript_S4_R1_001  51.4%   41%   146 bp       6.0
clean.7206-C12-HiSeqTranscript_S4_R2_001  49.9%   41%   146 bp       5.8
clean.7206-C2-HiSeqTranscript_S5_R1_001   65.6%   39%   149 bp       3.5
clean.7206-C2-HiSeqTranscript_S5_R2_001   64.1%   40%   149 bp       3.4
clean.7206-C3-HiSeqTranscript_S5_R1_001   52.5%   41%   139 bp       8.1
clean.7206-C3-HiSeqTranscript_S5_R2_001   52.1%   41%   139 bp       8.6
clean.7206-C4-HiSeqTranscript_S8_R1_001   47.4%   41%   144 bp       5.6
clean.7206-C4-HiSeqTranscript_S8_R2_001   46.2%   41%   144 bp       5.5
clean.7206-C5-HiSeqTranscript_S9_R1_001   49.0%   41%   138 bp       5.3
clean.7206-C5-HiSeqTranscript_S9_R2_001   47.1%   41%   137 bp       5.1
clean.7206-C6-HiSeqTranscript_S4_R1_001   52.5%   41%   140 bp       9.7
clean.7206-C6-HiSeqTranscript_S4_R2_001   51.9%   41%   140 bp       10.2
clean.7206-C7-HiSeqTranscript_S3_R1_001   47.5%   41%   142 bp       5.2
clean.7206-C7-HiSeqTranscript_S3_R2_001   44.7%   41%   142 bp       4.8
clean.7206-C8-HiSeqTranscript_S4_R1_001   48.4%   40%   142 bp       5.0
clean.7206-C8-HiSeqTranscript_S4_R2_001   45.6%   40%   142 bp       4.7
clean.7206-C9-HiSeqTranscript_S2_R1_001   51.2%   41%   144 bp       9.7
clean.7206-C9-HiSeqTranscript_S2_R2_001   50.8%   41%   143 bp       10.3
clean.7206-D1-HiSeqTranscript_S5_R1_001   47.3%   40%   147 bp       5.1
clean.7206-D1-HiSeqTranscript_S5_R2_001   45.7%   40%   147 bp       4.9
clean.7206-D2-HiSeqTranscript_S5_R1_001   60.4%   40%   146 bp       23.5
clean.7206-D2-HiSeqTranscript_S5_R2_001   58.9%   40%   145 bp       22.8
clean.7206-D3-HiSeqTranscript_S3_R1_001   46.1%   40%   146 bp       7.1
clean.7206-D3-HiSeqTranscript_S3_R2_001   43.7%   40%   146 bp       6.8
clean.7206-D4-HiSeqTranscript_S38_R1_001  55.5%   40%   144 bp       5.8
clean.7206-D4-HiSeqTranscript_S38_R2_001  53.8%   40%   144 bp       5.7
clean.7206-D5-HiSeqTranscript_S37_R1_001  42.1%   40%   139 bp       5.3
clean.7206-D5-HiSeqTranscript_S37_R2_001  40.0%   41%   139 bp       5.1
clean.7206-D6-HiSeqTranscript_S1_R1_001   51.2%   41%   144 bp       9.7
clean.7206-D6-HiSeqTranscript_S1_R2_001   49.4%   41%   144 bp       9.5
              clean.7206-D7-HiSeqTranscript_S2_R1_001
              59.0%
              40%
              145 bp
              10.0
              clean.7206-D7-HiSeqTranscript_S2_R2_001
              57.2%
              40%
              145 bp
              9.7
              clean.7206-D8-HiSeqTranscript_S10_R1_001
              47.4%
              40%
              144 bp
              5.0
              clean.7206-D8-HiSeqTranscript_S10_R2_001
              45.7%
              40%
              144 bp
              4.8
              clean.7206-D9-HiSeqTranscript_S3_R1_001
              41.7%
              40%
              147 bp
              5.5
              clean.7206-D9-HiSeqTranscript_S3_R2_001
              41.5%
              41%
              146 bp
              5.9

              FastQC

              FastQC is a quality control tool for high-throughput sequence data, written by Simon Andrews at the Babraham Institute in Cambridge.

              Sequence Counts

              Sequence counts for each sample. Duplicate read counts are an estimate only.

              This plot shows the total number of reads, broken down into unique and duplicate reads where possible (only more recent versions of FastQC report duplicate information).

              You can read more about duplicate calculation in the FastQC documentation. A small part has been copied here for convenience:

              Only sequences which first appear in the first 100,000 sequences in each file are analysed. This should be enough to get a good impression of the duplication levels in the whole file. Each sequence is tracked to the end of the file to give a representative count of the overall duplication level.

              The duplication detection requires an exact sequence match over the whole length of the sequence. Any reads over 75 bp in length are truncated to 50 bp for this analysis.
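
The duplicate-tracking rules quoted above can be illustrated with a short Python sketch. This is not FastQC's implementation, which is more involved and is not reproduced here. The sketch applies only the two stated rules: new sequences are admitted for tracking only within the first 100,000 reads, and reads longer than 75 bp are compared on their first 50 bp. The names `dedup_key` and `estimate_percent_duplicates` are hypothetical.

```python
from collections import Counter
from typing import Iterable

# Parameters taken from the FastQC text quoted above.
TRACK_LIMIT = 100_000  # only sequences first seen in the first 100,000 reads are tracked
LONG_READ = 75         # reads longer than this many bases ...
TRUNCATE_TO = 50       # ... are truncated to this length before exact matching


def dedup_key(seq: str) -> str:
    """Return the string used for exact-match duplicate detection."""
    return seq[:TRUNCATE_TO] if len(seq) > LONG_READ else seq


def estimate_percent_duplicates(reads: Iterable[str], track_limit: int = TRACK_LIMIT) -> float:
    """Percentage of tracked reads that repeat an earlier tracked sequence.

    Simplified illustration only (hypothetical helper, not FastQC code).
    """
    counts = Counter()
    for index, seq in enumerate(reads):
        key = dedup_key(seq)
        if key in counts:
            counts[key] += 1  # already tracked: keep counting to the end of the file
        elif index < track_limit:
            counts[key] = 1   # new sequences are only admitted early in the file
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return 100.0 * (total - len(counts)) / total


print(estimate_percent_duplicates(["ACGT", "ACGT", "TTTT", "GGGG"]))  # 25.0
```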

              Sequence Quality Histograms

              The mean quality value across each base position in the read.

              To enable multiple samples to be plotted on the same graph, only the mean quality scores are plotted (unlike the box plots seen in FastQC reports).

              Taken from the FastQC help:

              The y-axis on the graph shows the quality scores. The higher the score, the better the base call. The background of the graph divides the y-axis into very good quality calls (green), calls of reasonable quality (orange), and calls of poor quality (red). The quality of calls on most platforms will degrade as the run progresses, so it is common to see base calls falling into the orange area towards the end of a read.
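
The sketch below roughly illustrates what the plotted line represents by averaging Phred scores at each read position across reads. It assumes Phred+33 encoding (standard for current Illumina output) and uses a hypothetical function name. It does not reproduce FastQC's handling of other encodings or any grouping of positions that FastQC may apply to long reads.

```python
from typing import List, Sequence

PHRED_OFFSET = 33  # assumption: Phred+33 quality encoding (Sanger / Illumina 1.8+)


def mean_quality_per_position(qualities: Sequence[str]) -> List[float]:
    """Mean Phred score at each position, over the reads long enough to reach it."""
    sums: List[int] = []
    counts: List[int] = []
    for qual in qualities:
        for pos, char in enumerate(qual):
            if pos == len(sums):  # first read to reach this position
                sums.append(0)
                counts.append(0)
            sums[pos] += ord(char) - PHRED_OFFSET
            counts[pos] += 1
    return [total / n for total, n in zip(sums, counts)]


print(mean_quality_per_position(["II", "5"]))  # [30.0, 40.0]
```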

              Per Sequence Quality Scores

              The number of reads at each average quality score. Shows whether a subset of reads has poor quality.

              From the FastQC help:

              The per sequence quality score report allows you to see if a subset of your sequences have universally low quality values. It is often the case that a subset of sequences will have universally poor quality; however, these should represent only a small percentage of the total sequences.
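
Here is a minimal sketch of the per-read summary behind this plot, assuming Phred+33 encoding. Each read is reduced to its mean quality, and reads are then counted by that value. The helper `fraction_of_reads_below` and the example cut-off of 20 are hypothetical. They show one way to quantify whether poor reads make up only "a small percentage".

```python
from collections import Counter
from typing import Dict, Iterable

PHRED_OFFSET = 33  # assumption: Phred+33 quality encoding


def per_read_quality_histogram(qualities: Iterable[str]) -> Dict[int, int]:
    """Number of reads at each mean Phred score (rounded down to an integer)."""
    histogram = Counter()
    for qual in qualities:
        if not qual:
            continue  # skip empty quality strings
        mean_q = sum(ord(c) - PHRED_OFFSET for c in qual) / len(qual)
        histogram[int(mean_q)] += 1
    return dict(sorted(histogram.items()))


def fraction_of_reads_below(histogram: Dict[int, int], threshold: int) -> float:
    """Fraction of reads whose mean quality falls below `threshold` (hypothetical helper)."""
    total = sum(histogram.values())
    low = sum(n for q, n in histogram.items() if q < threshold)
    return low / total if total else 0.0


hist = per_read_quality_histogram(["IIII", "####", "II##"])
print(hist)  # {2: 1, 21: 1, 40: 1}
print(fraction_of_reads_below(hist, 20))
```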

              Per Base Sequence Content

              The proportion of each base position for which each of the four normal DNA bases has been called.

              To enable multiple samples to be shown in a single plot, the base composition data is shown as a heatmap. The colours represent the balance between the four bases: an even distribution should give an even muddy brown colour. Hover over the plot to see the percentage of the four bases under the cursor.

              To see the data as a line plot, as in the original FastQC graph, click on a sample track.

              From the FastQC help:

              Per Base Sequence Content plots out the proportion of each base position in a file for which each of the four normal DNA bases has been called.

              In a random library you would expect that there would be little to no difference between the different bases of a sequence run, so the lines in this plot should run parallel with each other. The relative amount of each base should reflect the overall amount of these bases in your genome, but in any case they should not be hugely imbalanced from each other.

              It's worth noting that some types of library will always produce biased sequence composition, normally at the start of the read. Libraries produced by priming using random hexamers (including nearly all RNA-Seq libraries) and those which were fragmented using transposases inherit an intrinsic bias in the positions at which reads start. This bias does not concern an absolute sequence, but instead provides enrichment of a number of different K-mers at the 5' end of the reads. Whilst this is a true technical bias, it isn't something which can be corrected by trimming and in most cases doesn't seem to adversely affect the downstream analysis.
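
The per-position base percentages that the heatmap encodes can be computed roughly as follows. This is an illustrative sketch with a hypothetical function name. Excluding N calls from the denominator is an assumption of the sketch, not a documented FastQC rule.

```python
from typing import Dict, List, Sequence

BASES = "ACGT"


def per_base_content(reads: Sequence[str]) -> List[Dict[str, float]]:
    """Percentage of A, C, G and T calls at each read position.

    Assumption for this sketch: calls other than A/C/G/T (such as N) are
    left out of the denominator.
    """
    counts: List[Dict[str, int]] = []
    for read in reads:
        for pos, base in enumerate(read.upper()):
            if pos == len(counts):  # first read to reach this position
                counts.append({b: 0 for b in BASES})
            if base in BASES:
                counts[pos][base] += 1
    content = []
    for position_counts in counts:
        called = sum(position_counts.values())
        content.append(
            {b: (100.0 * position_counts[b] / called) if called else 0.0 for b in BASES}
        )
    return content


print(per_base_content(["AC", "AG", "TN"])[1])  # {'A': 0.0, 'C': 50.0, 'G': 50.0, 'T': 0.0}
```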

              Per Sequence GC Content

              The average GC content of reads. Normal random libraries typically have a roughly normal distribution of GC content.

              From the FastQC help:

              This module measures the GC content across the whole length of each sequence in a file and compares it to a modelled normal distribution of GC content.

              In a normal random library you would expect to see a roughly normal distribution of GC content where the central peak corresponds to the overall GC content of the underlying genome. Since we don't know the GC content of the genome, the modal GC content is calculated from the observed data and used to build a reference distribution.

              An unusually shaped distribution could indicate a contaminated library or some other kind of biased subset. A normal distribution which is shifted indicates some systematic bias which is independent of base position. If there is a systematic bias which creates a shifted normal distribution, then this won't be flagged as an error by the module since it doesn't know what your genome's GC content should be.
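
The idea of building a reference curve from the observed modal GC content can be sketched as follows. The function names are hypothetical. The sketch uses the observed population standard deviation as the spread of the reference normal curve, which is an assumption made for illustration. FastQC's exact model is not reproduced here.

```python
import statistics
from collections import Counter
from typing import Dict, Sequence, Tuple


def gc_percent(read: str) -> float:
    """GC content of one read, as a percentage of its length."""
    read = read.upper()
    return 100.0 * (read.count("G") + read.count("C")) / len(read) if read else 0.0


def gc_distribution(reads: Sequence[str]) -> Tuple[Dict[int, int], Dict[int, float]]:
    """Observed per-read GC histogram plus a normal reference centred on its mode."""
    observed = Counter(round(gc_percent(r)) for r in reads if r)
    if not observed:
        return {}, {}
    mode = max(observed, key=observed.get)  # modal GC taken from the data itself
    # Assumption: spread of the reference = observed SD (1.0 if all reads share one value).
    spread = statistics.pstdev(list(observed.elements())) or 1.0
    reference = statistics.NormalDist(mu=mode, sigma=spread)
    n_reads = sum(observed.values())
    expected = {gc: n_reads * reference.pdf(gc) for gc in range(101)}
    return dict(observed), expected


observed, expected = gc_distribution(["GGCC", "GCAT", "GCAT", "ATAT"])
print(observed)                            # {100: 1, 50: 2, 0: 1}
print(max(expected, key=expected.get))     # 50
```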

              Per Base N Content

              The percentage of base calls at each position for which an N was called.

              From the FastQC help:

              If a sequencer is unable to make a base call with sufficient confidence then it will normally substitute an N rather than a conventional base call. This graph shows the percentage of base calls at each position for which an N was called.

              It's not unusual to see a very low proportion of Ns appearing in a sequence, especially nearer the end of a sequence. However, if this proportion rises above a few percent it suggests that the analysis pipeline was unable to interpret the data well enough to make valid base calls.
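
Below is a small sketch of the per-position N percentage described above. The 5% cut-off in `positions_above` is a hypothetical stand-in for "a few percent", not a documented FastQC threshold. Both function names are hypothetical.

```python
from typing import List, Sequence


def percent_n_per_position(reads: Sequence[str]) -> List[float]:
    """Percentage of base calls at each position that are N."""
    n_calls: List[int] = []
    totals: List[int] = []
    for read in reads:
        for pos, base in enumerate(read.upper()):
            if pos == len(totals):  # first read to reach this position
                n_calls.append(0)
                totals.append(0)
            totals[pos] += 1
            if base == "N":
                n_calls[pos] += 1
    return [100.0 * n / t for n, t in zip(n_calls, totals)]


def positions_above(percentages: List[float], threshold: float = 5.0) -> List[int]:
    """1-based positions whose N percentage exceeds `threshold` (hypothetical cut-off)."""
    return [pos + 1 for pos, pct in enumerate(percentages) if pct > threshold]


pcts = percent_n_per_position(["ACGN", "ACNN"])
print(pcts)                   # [0.0, 0.0, 50.0, 100.0]
print(positions_above(pcts))  # [3, 4]
```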

              Sequence Length Distribution

              The distribution of fragment sizes (read lengths) found. See the FastQC help.

              Sequence Duplication Levels

              The relative level of duplication found for every sequence.

              From the FastQC Help:

              In a diverse library most sequences will occur only once in the final set. A low level of duplication may indicate a very high level of coverage of the target sequence, but a high level of duplication is more likely to indicate some kind of enrichment bias (e.g. PCR over-amplification). This graph shows the degree of duplication for every sequence in a library: the relative number of sequences with different degrees of duplication.

              Only sequences which first appear in the first 100,000 sequences in each file are analysed. This should be enough to get a good impression of the duplication levels in the whole file. Each sequence is tracked to the end of the file to give a representative count of the overall duplication level.

              The duplication detection requires an exact sequence match over the whole length of the sequence. Any reads over 75 bp in length are truncated to 50 bp for this analysis.

              In a properly diverse library most sequences should fall into the far left of the plot in both the red and blue lines. A general level of enrichment, indicating broad oversequencing in the library, will tend to flatten the lines, lowering the low end and generally raising other categories. More specific enrichments of subsets, or the presence of low-complexity contaminants, will tend to produce spikes towards the right of the plot.
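
The following sketch groups sequences by their copy number. It reports each bin twice: as a percentage of distinct sequences and as a percentage of all reads. The bin boundaries (1 to 9, then >10, >50, >100, >500, >1k, >5k, >10k) are assumed to mirror the categories on FastQC's duplication plot. For brevity, the sketch ignores the 100,000-sequence tracking limit and the 50 bp truncation described above. Function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# Assumed bin lower bounds, mirroring the duplication-level categories on FastQC's plot.
LEVELS: List[Tuple[int, str]] = [(n, str(n)) for n in range(1, 10)] + [
    (10, ">10"), (50, ">50"), (100, ">100"), (500, ">500"),
    (1000, ">1k"), (5000, ">5k"), (10000, ">10k"),
]


def level_label(copies: int) -> str:
    """Name of the highest bin whose lower bound `copies` reaches."""
    label = LEVELS[0][1]
    for lower, name in LEVELS:
        if copies >= lower:
            label = name
    return label


def duplication_levels(reads: Iterable[str]) -> Dict[str, Dict[str, float]]:
    """Per bin: % of distinct sequences and % of all reads at that duplication level."""
    copies = Counter(reads)
    distinct = Counter()
    total = Counter()
    for n in copies.values():
        label = level_label(n)
        distinct[label] += 1  # one distinct sequence in this bin
        total[label] += n     # all of its reads
    n_distinct = sum(distinct.values())
    n_total = sum(total.values())
    return {
        label: {
            "pct_distinct": 100.0 * distinct[label] / n_distinct,
            "pct_total": 100.0 * total[label] / n_total,
        }
        for _, label in LEVELS
        if distinct[label]  # only bins that contain at least one sequence
    }


print(level_label(12000))  # >10k
```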

              Overrepresented sequences

              The total amount of overrepresented sequences found in each library.

              FastQC calculates and lists overrepresented sequences in FastQ files. It would not be possible to show this for all samples in a MultiQC report, so instead this plot shows the number of sequences categorized as overrepresented.

              Sometimes, a single sequence may account for a large number of reads in a dataset. To show this, the bars are split into two: the first shows the overrepresented reads that come from the single most common sequence. The second shows the total count from all remaining overrepresented sequences.

              From the FastQC Help:

              A normal high-throughput library will contain a diverse set of sequences, with no individual sequence making up more than a tiny fraction of the whole. Finding that a single sequence is very overrepresented in the set either means that it is highly biologically significant, or indicates that the library is contaminated, or not as diverse as you expected.

              FastQC lists all of the sequences which make up more than 0.1% of the total. To conserve memory, only sequences which appear in the first 100,000 sequences are tracked to the end of the file. It is therefore possible that a sequence which is overrepresented but doesn't appear at the start of the file for some reason could be missed by this module.
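
This simplified sketch applies the two FastQC rules quoted above: the 0.1% reporting threshold, and tracking only sequences first seen in the first 100,000 reads. It also computes the split between the top sequence and the remainder that the MultiQC bars show. Function names are hypothetical. Expressing the split as percentages of all reads is an assumption about units.

```python
from collections import Counter
from typing import Iterable, List, Tuple

THRESHOLD_PCT = 0.1    # report sequences making up more than 0.1% of all reads
TRACK_LIMIT = 100_000  # only sequences first seen in the first 100,000 reads are tracked


def overrepresented(reads: Iterable[str], threshold_pct: float = THRESHOLD_PCT,
                    track_limit: int = TRACK_LIMIT) -> List[Tuple[str, float]]:
    """(sequence, % of all reads) pairs above the threshold, most common first."""
    counts = Counter()
    total = 0
    for index, seq in enumerate(reads):
        total += 1
        if seq in counts:
            counts[seq] += 1  # tracked sequences are counted to the end of the file
        elif index < track_limit:
            counts[seq] = 1   # new sequences are only admitted early in the file
    if total == 0:
        return []
    hits = [(seq, 100.0 * n / total) for seq, n in counts.items()]
    hits = [hit for hit in hits if hit[1] > threshold_pct]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


def split_top_vs_rest(hits: List[Tuple[str, float]]) -> Tuple[float, float]:
    """Percent from the single most common sequence vs. all other overrepresented ones."""
    if not hits:
        return 0.0, 0.0
    return hits[0][1], sum(pct for _, pct in hits[1:])


hits = overrepresented(["AAAA"] * 5 + ["CCCC"] * 3 + ["GGGG"] * 2, threshold_pct=25.0)
print(split_top_vs_rest(hits))  # (50.0, 30.0)
```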

              Adapter Content

              The cumulative percentage count of the proportion of your library which has seen each of the adapter sequences at each position.

              Note that only samples with ≥ 0.1% adapter contamination are shown.

              There may be several lines per sample, as one is shown for each adapter detected in the file.

              From the FastQC Help:

              The plot shows a cumulative percentage count of the proportion of your library which has seen each of the adapter sequences at each position. Once a sequence has been seen in a read it is counted as being present right through to the end of the read so the percentages you see will only increase as the read length goes on.
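
The cumulative counting rule ("once seen, counted through to the end of the read") can be sketched as below. The sketch makes three simplifying assumptions for illustration. The adapter string is assumed to be the 12 bp Illumina Universal Adapter prefix. Only exact matches are counted. Every read stays in the denominator at every position. The hypothetical helper `shown_in_report` applies the ≥ 0.1% display note above to the curve's maximum, which is also an assumption about how that note is evaluated.

```python
from typing import List, Sequence

# Assumption: the 12 bp Illumina Universal Adapter prefix, used here for illustration.
ADAPTER = "AGATCGGAAGAG"


def cumulative_adapter_content(reads: Sequence[str], adapter: str = ADAPTER) -> List[float]:
    """Cumulative % of reads in which `adapter` has been seen by each position."""
    if not reads:
        return []
    length = max(len(r) for r in reads)
    first_seen = [0] * length
    for read in reads:
        pos = read.find(adapter)  # first exact occurrence only
        if pos != -1:
            first_seen[pos] += 1
    curve: List[float] = []
    running = 0
    for pos in range(length):
        running += first_seen[pos]  # once seen, a read stays adapter-positive from here on
        curve.append(100.0 * running / len(reads))
    return curve


def shown_in_report(curve: List[float], min_pct: float = 0.1) -> bool:
    """Hypothetical check of the >= 0.1% display note against the curve's maximum."""
    return bool(curve) and max(curve) >= min_pct


curve = cumulative_adapter_content(["AGATCGGAAGAGTT", "TTAGATCGGAAGAG", "T" * 14])
print(shown_in_report(curve))  # True
```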

              Status Checks

              Status for each FastQC section showing whether results seem entirely normal (green), slightly abnormal (orange) or very unusual (red).

              FastQC assigns a status for each section of the report. These give a quick evaluation of whether the results of the analysis seem entirely normal (green), slightly abnormal (orange) or very unusual (red).

              It is important to stress that although the analysis results appear to give a pass/fail result, these evaluations must be taken in the context of what you expect from your library. A 'normal' sample as far as FastQC is concerned is random and diverse. Some experiments may be expected to produce libraries which are biased in particular ways. You should therefore treat the summary evaluations as pointers to where you should concentrate your attention and understand why your library may not look random and diverse.

              Specific guidance on how to interpret the output of each module can be found in the relevant report section, or in the FastQC help.

              This heatmap summarises all of these checks for a quick overview. Note that not all FastQC sections have plots in MultiQC reports, but all status checks are shown in this heatmap.
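
The status heatmap is essentially a matrix of pass/warn/fail values, with one row per sample and one column per module. The sketch below builds that matrix from lines shaped like FastQC's per-sample `summary.txt`. The tab-separated layout (STATUS, module name, file name) is an assumption about that file. The colour mapping follows the text above, and the function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable

# Colours as described above: normal, slightly abnormal, very unusual.
STATUS_COLOUR = {"PASS": "green", "WARN": "orange", "FAIL": "red"}


def status_matrix(summary_lines: Iterable[str]) -> Dict[str, Dict[str, str]]:
    """sample -> module -> colour, from lines shaped like FastQC's summary.txt.

    Assumed line format: STATUS<TAB>Module name<TAB>file name.
    """
    matrix: Dict[str, Dict[str, str]] = defaultdict(dict)
    for line in summary_lines:
        line = line.rstrip("\n")
        if not line:
            continue  # ignore blank lines
        status, module, sample = line.split("\t")
        matrix[sample][module] = STATUS_COLOUR[status]
    return dict(matrix)


print(status_matrix(["WARN\tPer base sequence content\tS1.fastq.gz"]))
# {'S1.fastq.gz': {'Per base sequence content': 'orange'}}
```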
diff --git a/HPC_analysis/output/Transriptomics/TagSeq_F2_juveniles/multiQC/raw/multiqc_report.html b/HPC_analysis/output/Transriptomics/TagSeq_F2_juveniles/multiQC/raw/multiqc_report.html
new file mode 100644
index 0000000..7dda426
--- /dev/null
+++ b/HPC_analysis/output/Transriptomics/TagSeq_F2_juveniles/multiQC/raw/multiqc_report.html
@@ -0,0 +1,7257 @@
+MultiQC Report

              + + Highlight Samples +

              + +
              + + + +
              +

              + Regex mode off + + +

              +
                +
                + + +

                Rename Samples

                Paste two columns of a tab-delimited table here (e.g. from Excel).

                The first column should be the old name, and the second column the new name.

                  Show / Hide Samples

                    Export Plots

                    Download the raw data used to create the plots in this report below:

                    Note that additional data was saved in multiqc_data when this report was generated.

                    If you use plots from MultiQC in a publication or presentation, please cite:

                    MultiQC: Summarize analysis results for multiple tools and samples in a single report
                    Philip Ewels, Måns Magnusson, Sverker Lundin and Max Käller
                    Bioinformatics (2016)
                    doi: 10.1093/bioinformatics/btw354
                    PMID: 27312411

                    Save Settings

                    You can save the toolbox settings for this report to the browser.

                    Load Settings

                    Choose a saved report profile from the dropdown box below:

                    Tool Citations

                    Please remember to cite the tools that you use in your analysis.

                    To help with this, you can download publication details of the tools mentioned in this report:

                    About MultiQC

                    This report was generated using MultiQC, version 1.12

                    You can see a YouTube video describing how to use MultiQC reports here: https://youtu.be/qPbIlO_KWN0

                    For more information about MultiQC, including other videos and extensive documentation, please visit http://multiqc.info

                    You can report bugs, suggest improvements and find the source code for MultiQC on GitHub: https://github.com/ewels/MultiQC

                    MultiQC is published in Bioinformatics:

                    MultiQC: Summarize analysis results for multiple tools and samples in a single report
                    Philip Ewels, Måns Magnusson, Sverker Lundin and Max Käller
                    Bioinformatics (2016)
                    doi: 10.1093/bioinformatics/btw354
                    PMID: 27312411

                    A modular tool to aggregate results from bioinformatics analyses across many samples into a single report.

                    Report generated on 2024-10-25, 21:27 based on data in: /home/sgurr/Airradians_F2Juveniles_TagSeq/output/rawqc

                    General Statistics

                    Showing 44/44 rows and 3/5 columns.
                    Sample Name | % Dups | % GC | M Seqs
                    7206-B11-HiSeqTranscript_S4_R1_001 | 78.6% | 42% | 6.0
                    7206-B11-HiSeqTranscript_S4_R2_001 | 72.2% | 42% | 6.0
                    7206-B12-HiSeqTranscript_S5_R1_001 | 43.9% | 41% | 4.9
                    7206-B12-HiSeqTranscript_S5_R2_001 | 37.5% | 41% | 4.9
                    7206-C1-HiSeqTranscript_S1_R1_001 | 39.8% | 40% | 6.0
                    7206-C1-HiSeqTranscript_S1_R2_001 | 38.1% | 41% | 6.0
                    7206-C11-HiSeqTranscript_S7_R1_001 | 45.0% | 42% | 6.1
                    7206-C11-HiSeqTranscript_S7_R2_001 | 39.3% | 42% | 6.1
                    7206-C12-HiSeqTranscript_S4_R1_001 | 51.7% | 41% | 6.2
                    7206-C12-HiSeqTranscript_S4_R2_001 | 48.4% | 41% | 6.2
                    7206-C2-HiSeqTranscript_S5_R1_001 | 65.1% | 39% | 3.6
                    7206-C2-HiSeqTranscript_S5_R2_001 | 62.4% | 40% | 3.6
                    7206-C3-HiSeqTranscript_S5_R1_001 | 48.1% | 41% | 9.3
                    7206-C3-HiSeqTranscript_S5_R2_001 | 48.4% | 42% | 9.3
                    7206-C4-HiSeqTranscript_S8_R1_001 | 48.5% | 42% | 5.8
                    7206-C4-HiSeqTranscript_S8_R2_001 | 44.4% | 42% | 5.8
                    7206-C5-HiSeqTranscript_S9_R1_001 | 52.8% | 44% | 5.8
                    7206-C5-HiSeqTranscript_S9_R2_001 | 44.5% | 45% | 5.8
                    7206-C6-HiSeqTranscript_S4_R1_001 | 49.7% | 42% | 11.3
                    7206-C6-HiSeqTranscript_S4_R2_001 | 48.7% | 43% | 11.3
                    7206-C7-HiSeqTranscript_S3_R1_001 | 50.6% | 43% | 5.6
                    7206-C7-HiSeqTranscript_S3_R2_001 | 41.0% | 43% | 5.6
                    7206-C8-HiSeqTranscript_S4_R1_001 | 52.5% | 44% | 5.5
                    7206-C8-HiSeqTranscript_S4_R2_001 | 41.8% | 44% | 5.5
                    7206-C9-HiSeqTranscript_S2_R1_001 | 48.1% | 42% | 11.3
                    7206-C9-HiSeqTranscript_S2_R2_001 | 47.4% | 43% | 11.3
                    7206-D1-HiSeqTranscript_S5_R1_001 | 47.5% | 41% | 5.2
                    7206-D1-HiSeqTranscript_S5_R2_001 | 44.3% | 41% | 5.2
                    7206-D2-HiSeqTranscript_S5_R1_001 | 60.1% | 41% | 23.9
                    7206-D2-HiSeqTranscript_S5_R2_001 | 57.6% | 41% | 23.9
                    7206-D3-HiSeqTranscript_S3_R1_001 | 46.2% | 41% | 7.3
                    7206-D3-HiSeqTranscript_S3_R2_001 | 42.0% | 41% | 7.3
                    7206-D4-HiSeqTranscript_S38_R1_001 | 56.5% | 41% | 6.1
                    7206-D4-HiSeqTranscript_S38_R2_001 | 51.8% | 41% | 6.1
                    7206-D5-HiSeqTranscript_S37_R1_001 | 42.6% | 41% | 5.4
                    7206-D5-HiSeqTranscript_S37_R2_001 | 39.1% | 41% | 5.4
                    7206-D6-HiSeqTranscript_S1_R1_001 | 51.5% | 42% | 9.9
                    7206-D6-HiSeqTranscript_S1_R2_001 | 48.1% | 42% | 9.9
                    7206-D7-HiSeqTranscript_S2_R1_001 | 59.2% | 41% | 10.2
                    7206-D7-HiSeqTranscript_S2_R2_001 | 55.4% | 41% | 10.2
                    7206-D8-HiSeqTranscript_S10_R1_001 | 48.1% | 41% | 5.1
                    7206-D8-HiSeqTranscript_S10_R2_001 | 44.3% | 41% | 5.1
                    7206-D9-HiSeqTranscript_S3_R1_001 | 38.5% | 41% | 6.4
                    7206-D9-HiSeqTranscript_S3_R2_001 | 38.6% | 41% | 6.4

                    FastQC

                    FastQC is a quality control tool for high-throughput sequence data, written by Simon Andrews at the Babraham Institute in Cambridge.

                    Sequence Counts

                    Sequence counts for each sample. Duplicate read counts are an estimate only.

                    This plot shows the total number of reads, broken down into unique and duplicate reads where possible (only more recent versions of FastQC report duplicate information).

                    You can read more about duplicate calculation in the FastQC documentation. A small part has been copied here for convenience:

                    Only sequences which first appear in the first 100,000 sequences in each file are analysed. This should be enough to get a good impression of the duplication levels in the whole file. Each sequence is tracked to the end of the file to give a representative count of the overall duplication level.

                    The duplication detection requires an exact sequence match over the whole length of the sequence. Any reads over 75 bp in length are truncated to 50 bp for this analysis.
                    + + + + + +
                    + +

                    + Sequence Quality Histograms + + + +

                    + +

                    The mean quality value across each base position in the read.

                    To enable multiple samples to be plotted on the same graph, only the mean quality scores are plotted (unlike the box plots seen in FastQC reports).

                    Taken from the FastQC help:

                    The y-axis on the graph shows the quality scores. The higher the score, the better the base call. The background of the graph divides the y-axis into very good quality calls (green), calls of reasonable quality (orange), and calls of poor quality (red). The quality of calls on most platforms will degrade as the run progresses, so it is common to see base calls falling into the orange area towards the end of a read.
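The mean-per-position curve that MultiQC plots can be sketched in Python as follows. This assumes Phred+33 encoded quality strings, which is the usual Illumina 1.8+ encoding. `mean_quality_per_position` is a hypothetical helper, not a FastQC or MultiQC function.

```python
from typing import Iterable, List


def mean_quality_per_position(qual_strings: Iterable[str], offset: int = 33) -> List[float]:
    """Mean Phred score at each base position across all reads.

    Reads of different lengths are allowed. Each position is averaged only
    over the reads that reach it.
    """
    sums: List[int] = []
    counts: List[int] = []
    for qual in qual_strings:
        for pos, ch in enumerate(qual):
            if pos == len(sums):  # first read to reach this position
                sums.append(0)
                counts.append(0)
            sums[pos] += ord(ch) - offset
            counts[pos] += 1
    return [s / c for s, c in zip(sums, counts)]


print(mean_quality_per_position(["II", "5"]))  # [30.0, 40.0]
```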

                    Per Sequence Quality Scores

                    The number of reads at each average quality score. This shows whether a subset of reads has poor quality.

                    From the FastQC help:

                    The per sequence quality score report allows you to see if a subset of your sequences has universally low quality values. It is often the case that a subset of sequences will have universally poor quality; however, these should represent only a small percentage of the total sequences.
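A per-read version of the same idea is a histogram of each read's mean quality. The minimal sketch below assumes Phred+33 encoding. Flooring each read's mean to an integer bin is this sketch's own choice and may not match how FastQC bins. `per_sequence_quality` is a hypothetical name.

```python
from collections import Counter
from typing import Iterable


def per_sequence_quality(qual_strings: Iterable[str], offset: int = 33) -> Counter:
    """Histogram: mean Phred score per read (floored to an integer) -> number of reads."""
    hist: Counter = Counter()
    for qual in qual_strings:
        if not qual:
            continue  # skip empty records rather than divide by zero
        scores = [ord(ch) - offset for ch in qual]
        hist[sum(scores) // len(scores)] += 1
    return hist
```

A small low-quality bump well to the left of the main peak is the "subset of reads" pattern described above.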

                    Per Base Sequence Content

                    The proportion of each base position for which each of the four normal DNA bases has been called.

                    To enable multiple samples to be shown in a single plot, the base composition data is shown as a heatmap. The colours represent the balance between the four bases: an even distribution should give an even muddy brown colour. Hover over the plot to see the percentage of the four bases under the cursor.


                    To see the data as a line plot, as in the original FastQC graph, click on a sample track.


                    From the FastQC help:

                    Per Base Sequence Content plots out the proportion of each base position in a file for which each of the four normal DNA bases has been called.

                    In a random library you would expect that there would be little to no difference between the different bases of a sequence run, so the lines in this plot should run parallel with each other. The relative amount of each base should reflect the overall amount of these bases in your genome, but in any case they should not be hugely imbalanced from each other.

                    It's worth noting that some types of library will always produce biased sequence composition, normally at the start of the read. Libraries produced by priming using random hexamers (including nearly all RNA-Seq libraries) and those which were fragmented using transposases inherit an intrinsic bias in the positions at which reads start. This bias does not concern an absolute sequence, but instead provides enrichment of a number of different K-mers at the 5' end of the reads. Whilst this is a true technical bias, it isn't something which can be corrected by trimming and in most cases doesn't seem to adversely affect the downstream analysis.
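The per-position base proportions described in this section can be computed as below. This is a sketch, not FastQC's code. It assumes N and any other non-ACGT symbols are excluded from the denominator. `per_base_content` is a hypothetical name.

```python
from typing import Dict, List, Sequence


def per_base_content(reads: Sequence[str]) -> List[Dict[str, float]]:
    """Percentage of A, C, G and T at each position (N and other symbols excluded)."""
    max_len = max((len(r) for r in reads), default=0)
    table: List[Dict[str, float]] = []
    for pos in range(max_len):
        # Only reads long enough to reach this position contribute.
        called = [r[pos] for r in reads if len(r) > pos and r[pos] in "ACGT"]
        n = len(called)
        table.append({base: (100.0 * called.count(base) / n if n else 0.0) for base in "ACGT"})
    return table
```

In a random library the four values in each row stay roughly constant along the read. The first rows of a hexamer-primed or transposase-fragmented library will show the 5' bias described above.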


                    Per Sequence GC Content

                    The average GC content of reads. A normal random library typically has a roughly normal distribution of GC content.

                    From the FastQC help:

                    This module measures the GC content across the whole length of each sequence in a file and compares it to a modelled normal distribution of GC content.

                    In a normal random library you would expect to see a roughly normal distribution of GC content where the central peak corresponds to the overall GC content of the underlying genome. Since we don't know the GC content of the genome, the modal GC content is calculated from the observed data and used to build a reference distribution.

                    An unusually shaped distribution could indicate a contaminated library or some other kind of biased subset. A normal distribution which is shifted indicates some systematic bias which is independent of base position. If there is a systematic bias which creates a shifted normal distribution then this won't be flagged as an error by the module, since it doesn't know what your genome's GC content should be.
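The idea of building a reference from the observed modal GC can be sketched in Python. This centres a normal curve on the modal GC bin and measures its spread around that mode. That spread choice is an assumption of the sketch, and FastQC's actual fitting may differ. `gc_histogram` and `modelled_normal` are hypothetical names.

```python
import math
from collections import Counter
from typing import Iterable, List


def gc_histogram(reads: Iterable[str]) -> Counter:
    """Per-read GC percentage (floored to an integer, N excluded) -> number of reads."""
    hist: Counter = Counter()
    for read in reads:
        called = [b for b in read.upper() if b in "ACGT"]
        if called:
            gc = sum(b in "GC" for b in called)
            hist[100 * gc // len(called)] += 1
    return hist


def modelled_normal(hist: Counter) -> List[float]:
    """Expected read counts for GC bins 0..100 under a normal curve centred on the modal bin."""
    total = sum(hist.values())
    if total == 0:
        return [0.0] * 101
    mode = max(hist, key=hist.get)
    # Spread is measured around the mode, which stands in for the unknown genome GC.
    sd = math.sqrt(sum(n * (gc - mode) ** 2 for gc, n in hist.items()) / total)
    if sd == 0:
        return [float(total) if gc == mode else 0.0 for gc in range(101)]
    norm = total / (sd * math.sqrt(2 * math.pi))
    return [norm * math.exp(-0.5 * ((gc - mode) / sd) ** 2) for gc in range(101)]
```

Comparing `gc_histogram(reads)` bin by bin against `modelled_normal(...)` highlights the kind of unusually shaped distribution mentioned above. A pure shift would go unnoticed, because the model is centred on the observed mode.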

                    Per Base N Content

                    The percentage of base calls at each position for which an N was called.


                    From the FastQC help:

                    If a sequencer is unable to make a base call with sufficient confidence then it will normally substitute an N rather than a conventional base call. This graph shows the percentage of base calls at each position for which an N was called.

                    It's not unusual to see a very low proportion of Ns appearing in a sequence, especially nearer the end of a sequence. However, if this proportion rises above a few percent it suggests that the analysis pipeline was unable to interpret the data well enough to make valid base calls.
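The N-content curve can be computed along these lines. This is a sketch with a hypothetical helper name. Only reads long enough to reach a position count towards that position's denominator.

```python
from typing import List, Sequence


def per_base_n_percent(reads: Sequence[str]) -> List[float]:
    """Percentage of N calls at each position among the reads that reach it."""
    max_len = max((len(r) for r in reads), default=0)
    percentages: List[float] = []
    for pos in range(max_len):
        column = [r[pos] for r in reads if len(r) > pos]
        percentages.append(100.0 * column.count("N") / len(column))
    return percentages


print(per_base_n_percent(["ANN", "AAN"]))  # [0.0, 50.0, 100.0]
```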

                    Sequence Length Distribution

                    The distribution of fragment sizes (read lengths) found. See the FastQC help.

                    Sequence Duplication Levels

                    The relative level of duplication found for every sequence.


                    From the FastQC Help:

                    In a diverse library most sequences will occur only once in the final set. A low level of duplication may indicate a very high level of coverage of the target sequence, but a high level of duplication is more likely to indicate some kind of enrichment bias (e.g. PCR over-amplification). This graph shows the degree of duplication for every sequence in a library: the relative number of sequences with different degrees of duplication.

                    Only sequences which first appear in the first 100,000 sequences in each file are analysed. This should be enough to get a good impression for the duplication levels in the whole file. Each sequence is tracked to the end of the file to give a representative count of the overall duplication level.

                    The duplication detection requires an exact sequence match over the whole length of the sequence. Any reads over 75bp in length are truncated to 50bp for this analysis.

                    In a properly diverse library most sequences should fall into the far left of the plot in both the red and blue lines. A general level of enrichment, indicating broad oversequencing in the library, will tend to flatten the lines, lowering the low end and generally raising other categories. More specific enrichments of subsets, or the presence of low complexity contaminants, will tend to produce spikes towards the right of the plot.
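The two views of duplication (share of distinct sequences versus share of all reads at each level) can be sketched from a mapping of sequence to copy number. Levels 1 to 9 are kept separate here and everything from 10 copies upwards is pooled into "10+". That binning is a simplification chosen for this sketch and is coarser than FastQC's own. `duplication_levels` is a hypothetical name.

```python
from collections import Counter
from typing import Dict, Mapping, Tuple


def duplication_levels(seq_counts: Mapping[str, int]) -> Dict[str, Tuple[float, float]]:
    """Map each duplication level to (% of distinct sequences, % of all reads)."""
    distinct: Counter = Counter()
    reads: Counter = Counter()
    for n in seq_counts.values():
        level = str(n) if n < 10 else "10+"
        distinct[level] += 1  # one distinct sequence at this level
        reads[level] += n     # all of its copies
    n_distinct = sum(distinct.values())
    n_reads = sum(reads.values())
    return {
        level: (100.0 * distinct[level] / n_distinct, 100.0 * reads[level] / n_reads)
        for level in distinct
    }
```

With `{"A": 1, "B": 1, "C": 2, "D": 12}`, half of the distinct sequences are singletons, but the single highly duplicated sequence `D` accounts for 75% of all reads. This is the divergence between the two lines that signals enrichment.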

                    Overrepresented sequences

                    The total amount of overrepresented sequences found in each library.

                    FastQC calculates and lists overrepresented sequences in FastQ files. It would not be possible to show this for all samples in a MultiQC report, so instead this plot shows the number of sequences categorized as overrepresented.

                    Sometimes, a single sequence may account for a large number of reads in a dataset. To show this, the bars are split into two: the first shows the overrepresented reads that come from the single most common sequence. The second shows the total count from all remaining overrepresented sequences.


                    From the FastQC Help:

                    A normal high-throughput library will contain a diverse set of sequences, with no individual sequence making up more than a tiny fraction of the whole. Finding that a single sequence is very overrepresented in the set either means that it is highly biologically significant, or indicates that the library is contaminated, or not as diverse as you expected.

                    FastQC lists all of the sequences which make up more than 0.1% of the total. To conserve memory, only sequences which appear in the first 100,000 sequences are tracked to the end of the file. It is therefore possible that a sequence which is overrepresented but doesn't appear at the start of the file for some reason could be missed by this module.
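The 0.1% rule and the "most common sequence versus the rest" split used in the bar plot can be sketched as below. This is an illustration under the stated assumptions, not FastQC's implementation. `overrepresented` and `top_and_rest` are hypothetical names.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def overrepresented(reads: Iterable[str], threshold_pct: float = 0.1,
                    track_limit: int = 100_000) -> Dict[str, float]:
    """Sequences making up more than `threshold_pct` percent of all reads."""
    counts: Counter = Counter()
    total = 0
    for i, seq in enumerate(reads):
        total += 1
        # Only sequences seen in the first `track_limit` reads are tracked to the end.
        if i < track_limit or seq in counts:
            counts[seq] += 1
    if total == 0:
        return {}
    return {s: 100.0 * c / total for s, c in counts.items() if 100.0 * c / total > threshold_pct}


def top_and_rest(over: Dict[str, float]) -> Tuple[float, float]:
    """Split the overrepresented percentage into (most common sequence, all others)."""
    if not over:
        return 0.0, 0.0
    top = max(over.values())
    return top, sum(over.values()) - top
```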

                    Adapter Content

                    The cumulative percentage count of the proportion of your library which has seen each of the adapter sequences at each position.


                    Note that only samples with ≥ 0.1% adapter contamination are shown.

                    There may be several lines per sample, as one is shown for each adapter detected in the file.


                    From the FastQC Help:

                    The plot shows a cumulative percentage count of the proportion of your library which has seen each of the adapter sequences at each position. Once a sequence has been seen in a read it is counted as being present right through to the end of the read, so the percentages you see will only increase as the read length goes on.
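The cumulative counting rule can be illustrated with a single adapter string and exact substring matching. Both are simplifying assumptions: FastQC uses its own adapter list and matching rules, which are not reproduced here. `cumulative_adapter_content` is a hypothetical name.

```python
from typing import List, Sequence


def cumulative_adapter_content(reads: Sequence[str], adapter: str) -> List[float]:
    """Percentage of reads in which `adapter` has been seen at or before each position."""
    if not adapter:
        raise ValueError("adapter must be a non-empty string")
    max_len = max((len(r) for r in reads), default=0)
    starts = [0] * max_len
    for read in reads:
        pos = read.find(adapter)  # first exact occurrence only
        if pos != -1:
            starts[pos] += 1
    curve: List[float] = []
    running = 0
    for pos in range(max_len):
        running += starts[pos]  # once seen, a hit stays counted to the end of the read
        curve.append(100.0 * running / len(reads))
    return curve
```

Because `running` never decreases, the returned curve is non-decreasing, matching the "only increase" behaviour described above.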

                    Status Checks

                    Status for each FastQC section showing whether results seem entirely normal (green), slightly abnormal (orange) or very unusual (red).

                    FastQC assigns a status for each section of the report. These give a quick evaluation of whether the results of the analysis seem entirely normal (green), slightly abnormal (orange) or very unusual (red).

                    It is important to stress that although the analysis results appear to give a pass/fail result, these evaluations must be taken in the context of what you expect from your library. A 'normal' sample as far as FastQC is concerned is random and diverse. Some experiments may be expected to produce libraries which are biased in particular ways. You should therefore treat the summary evaluations as pointers to where you should concentrate your attention and understand why your library may not look random and diverse.

                    Specific guidance on how to interpret the output of each module can be found in the relevant report section, or in the FastQC help.

                    This heatmap summarises all of these statuses for a quick overview. Note that not all FastQC sections have plots in MultiQC reports, but all status checks are shown in this heatmap.
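A heatmap like this can be assembled from per-file status lines. The sketch below assumes FastQC's three-column, tab-separated `summary.txt` layout (status, module name, file name). The numeric encoding in `STATUS_SCORE` is a hypothetical choice and does not reproduce MultiQC's own colour mapping. `status_matrix` is a hypothetical name.

```python
from typing import Dict

# Hypothetical numeric encoding for heatmap cells (not MultiQC's own mapping).
STATUS_SCORE = {"PASS": 1.0, "WARN": 0.5, "FAIL": 0.0}


def status_matrix(summary_text: str) -> Dict[str, Dict[str, float]]:
    """Build {file name: {module: score}} from tab-separated FastQC summary lines."""
    matrix: Dict[str, Dict[str, float]] = {}
    for line in summary_text.splitlines():
        if not line.strip():
            continue  # tolerate blank lines, e.g. when several summaries are concatenated
        status, module, filename = line.split("\t")
        matrix.setdefault(filename, {})[module] = STATUS_SCORE[status]
    return matrix
```

Each row of the resulting mapping is one sample and each key within it is one FastQC section, which is the layout of the heatmap described above. The file names in any example input would be placeholders.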
