A modular tool to aggregate results from bioinformatics analyses across many samples into a single report.

Report
generated on 2024-10-30, 14:38
based on data in:
/home/sgurr/Airradians_F2Juveniles_TagSeq/output/trim/adapters_only

General Statistics
Showing 44/44 rows and 4/5 columns.

Sample Name | % Dups | % GC | Read Length | M Seqs |
---|---|---|---|---|
adapters.7206-B11-HiSeqTranscript_S4_R1_001 | 78.0% | 40% | 136 bp | 5.8 |
adapters.7206-B11-HiSeqTranscript_S4_R2_001 | 75.6% | 40% | 136 bp | 5.5 |
adapters.7206-B12-HiSeqTranscript_S5_R1_001 | 43.7% | 40% | 140 bp | 4.8 |
adapters.7206-B12-HiSeqTranscript_S5_R2_001 | 40.1% | 40% | 139 bp | 4.4 |
adapters.7206-C1-HiSeqTranscript_S1_R1_001 | 42.8% | 40% | 146 bp | 5.3 |
adapters.7206-C1-HiSeqTranscript_S1_R2_001 | 41.4% | 40% | 145 bp | 5.5 |
adapters.7206-C11-HiSeqTranscript_S7_R1_001 | 44.5% | 40% | 141 bp | 5.9 |
adapters.7206-C11-HiSeqTranscript_S7_R2_001 | 41.7% | 41% | 140 bp | 5.6 |
adapters.7206-C12-HiSeqTranscript_S4_R1_001 | 51.4% | 41% | 146 bp | 6.0 |
adapters.7206-C12-HiSeqTranscript_S4_R2_001 | 49.9% | 41% | 146 bp | 5.8 |
adapters.7206-C2-HiSeqTranscript_S5_R1_001 | 65.6% | 39% | 149 bp | 3.5 |
adapters.7206-C2-HiSeqTranscript_S5_R2_001 | 64.1% | 39% | 149 bp | 3.4 |
adapters.7206-C3-HiSeqTranscript_S5_R1_001 | 52.5% | 41% | 139 bp | 8.1 |
adapters.7206-C3-HiSeqTranscript_S5_R2_001 | 52.1% | 41% | 139 bp | 8.7 |
adapters.7206-C4-HiSeqTranscript_S8_R1_001 | 47.4% | 41% | 144 bp | 5.6 |
adapters.7206-C4-HiSeqTranscript_S8_R2_001 | 46.2% | 41% | 144 bp | 5.5 |
adapters.7206-C5-HiSeqTranscript_S9_R1_001 | 49.0% | 41% | 138 bp | 5.3 |
adapters.7206-C5-HiSeqTranscript_S9_R2_001 | 47.1% | 41% | 137 bp | 5.1 |
adapters.7206-C6-HiSeqTranscript_S4_R1_001 | 52.5% | 41% | 141 bp | 9.7 |
adapters.7206-C6-HiSeqTranscript_S4_R2_001 | 51.9% | 41% | 140 bp | 10.2 |
adapters.7206-C7-HiSeqTranscript_S3_R1_001 | 47.5% | 41% | 142 bp | 5.2 |
adapters.7206-C7-HiSeqTranscript_S3_R2_001 | 44.7% | 41% | 142 bp | 4.8 |
adapters.7206-C8-HiSeqTranscript_S4_R1_001 | 48.4% | 40% | 142 bp | 5.0 |
adapters.7206-C8-HiSeqTranscript_S4_R2_001 | 45.6% | 40% | 142 bp | 4.7 |
adapters.7206-C9-HiSeqTranscript_S2_R1_001 | 51.2% | 41% | 144 bp | 9.7 |
adapters.7206-C9-HiSeqTranscript_S2_R2_001 | 50.8% | 41% | 143 bp | 10.3 |
adapters.7206-D1-HiSeqTranscript_S5_R1_001 | 47.3% | 40% | 147 bp | 5.1 |
adapters.7206-D1-HiSeqTranscript_S5_R2_001 | 45.7% | 40% | 147 bp | 4.9 |
adapters.7206-D2-HiSeqTranscript_S5_R1_001 | 60.4% | 40% | 146 bp | 23.5 |
adapters.7206-D2-HiSeqTranscript_S5_R2_001 | 58.9% | 40% | 145 bp | 22.8 |
adapters.7206-D3-HiSeqTranscript_S3_R1_001 | 46.1% | 40% | 146 bp | 7.1 |
adapters.7206-D3-HiSeqTranscript_S3_R2_001 | 43.7% | 40% | 146 bp | 6.8 |
adapters.7206-D4-HiSeqTranscript_S38_R1_001 | 55.4% | 40% | 144 bp | 5.8 |
adapters.7206-D4-HiSeqTranscript_S38_R2_001 | 53.8% | 40% | 144 bp | 5.7 |
adapters.7206-D5-HiSeqTranscript_S37_R1_001 | 42.1% | 40% | 139 bp | 5.3 |
adapters.7206-D5-HiSeqTranscript_S37_R2_001 | 40.0% | 41% | 139 bp | 5.1 |
adapters.7206-D6-HiSeqTranscript_S1_R1_001 | 51.2% | 41% | 144 bp | 9.7 |
adapters.7206-D6-HiSeqTranscript_S1_R2_001 | 49.4% | 41% | 144 bp | 9.5 |
adapters.7206-D7-HiSeqTranscript_S2_R1_001 | 59.0% | 40% | 145 bp | 10.0 |
adapters.7206-D7-HiSeqTranscript_S2_R2_001 | 57.2% | 40% | 145 bp | 9.7 |
adapters.7206-D8-HiSeqTranscript_S10_R1_001 | 47.4% | 40% | 144 bp | 5.0 |
adapters.7206-D8-HiSeqTranscript_S10_R2_001 | 45.7% | 40% | 144 bp | 4.8 |
adapters.7206-D9-HiSeqTranscript_S3_R1_001 | 41.7% | 40% | 147 bp | 5.5 |
adapters.7206-D9-HiSeqTranscript_S3_R2_001 | 41.5% | 41% | 146 bp | 5.9 |
FastQC
FastQC is a quality control tool for high throughput sequence data, written by Simon Andrews at the Babraham Institute in Cambridge.

Sequence Counts
Sequence counts for each sample. Duplicate read counts are an estimate only.
This plot shows the total number of reads, broken down into unique and duplicate reads where possible (only more recent versions of FastQC give duplicate info).
You can read more about duplicate calculation in the FastQC documentation. A small part has been copied here for convenience:
Only sequences which first appear in the first 100,000 sequences in each file are analysed. This should be enough to get a good impression of the duplication levels in the whole file. Each sequence is tracked to the end of the file to give a representative count of the overall duplication level.
The duplication detection requires an exact sequence match over the whole length of the sequence. Any reads over 75bp in length are truncated to 50bp for this analysis.
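
The tracking rule quoted above can be sketched in a few lines of Python. This is a simplified illustration only, not FastQC's own implementation, which may differ in detail. The function name `estimate_duplication` and its parameters are hypothetical; the defaults (100,000 tracked reads, reads over 75bp truncated to 50bp) are taken from the quoted text.

```python
from collections import Counter


def estimate_duplication(reads, track_limit=100_000, long_read=75, truncate_to=50):
    """Return an approximate percentage of duplicate reads (simplified sketch)."""
    counts = Counter()
    for index, read in enumerate(reads):
        # Reads longer than `long_read` are compared on their first `truncate_to` bases.
        key = read[:truncate_to] if len(read) > long_read else read
        if key in counts:
            # Already tracked: keep counting it to the end of the file.
            counts[key] += 1
        elif index < track_limit:
            # Only sequences first seen within the first `track_limit` reads are tracked.
            counts[key] = 1
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # Every occurrence beyond the first copy of a tracked sequence is a duplicate.
    return 100.0 * (total - len(counts)) / total
```

Because untracked sequences are ignored entirely, the result is an estimate based on the tracked subset, which is why the report describes duplicate counts as "an estimate only".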

Sequence Quality Histograms
The mean quality value across each base position in the read.
To enable multiple samples to be plotted on the same graph, only the mean quality scores are plotted (unlike the box plots seen in FastQC reports).
Taken from the FastQC help:
The y-axis on the graph shows the quality scores. The higher the score, the better the base call. The background of the graph divides the y axis into very good quality calls (green), calls of reasonable quality (orange), and calls of poor quality (red). The quality of calls on most platforms will degrade as the run progresses, so it is common to see base calls falling into the orange area towards the end of a read.
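
As a rough illustration of what "mean quality per base position" means, the sketch below averages Phred scores column by column over a set of FASTQ quality strings. It assumes Phred+33 encoding (the `offset` parameter), which is an assumption about the input and not something stated in this report; the function name is hypothetical.

```python
def mean_quality_per_position(quality_strings, offset=33):
    """Return the mean Phred score at each read position (assumes Phred+33)."""
    sums = []
    counts = []
    for qual in quality_strings:
        for pos, char in enumerate(qual):
            # Grow the accumulators the first time a position is seen,
            # so reads of different lengths are handled.
            if pos == len(sums):
                sums.append(0)
                counts.append(0)
            sums[pos] += ord(char) - offset
            counts[pos] += 1
    return [s / c for s, c in zip(sums, counts)]
```

Positions near the end of the read are averaged over fewer reads when lengths vary, as they do in this trimmed dataset.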

Per Sequence Quality Scores
The number of reads at each mean quality score. Shows if a subset of reads has poor quality.
From the FastQC help:
The per sequence quality score report allows you to see if a subset of your sequences have universally low quality values. It is often the case that a subset of sequences will have universally poor quality, however these should represent only a small percentage of the total sequences.

Per Base Sequence Content
The proportion of each base position for which each of the four normal DNA bases has been called.
To enable multiple samples to be shown in a single plot, the base composition data is shown as a heatmap. The colours represent the balance between the four bases: an even distribution should give an even muddy brown colour. Hover over the plot to see the percentage of the four bases under the cursor.
To see the data as a line plot, as in the original FastQC graph, click on a sample track.
From the FastQC help:
Per Base Sequence Content plots out the proportion of each base position in a file for which each of the four normal DNA bases has been called.
In a random library you would expect that there would be little to no difference between the different bases of a sequence run, so the lines in this plot should run parallel with each other. The relative amount of each base should reflect the overall amount of these bases in your genome, but in any case they should not be hugely imbalanced from each other.
It's worth noting that some types of library will always produce biased sequence composition, normally at the start of the read. Libraries produced by priming using random hexamers (including nearly all RNA-Seq libraries) and those which were fragmented using transposases inherit an intrinsic bias in the positions at which reads start. This bias does not concern an absolute sequence, but instead provides enrichment of a number of different K-mers at the 5' end of the reads. Whilst this is a true technical bias, it isn't something which can be corrected by trimming and in most cases doesn't seem to adversely affect the downstream analysis.

Per Sequence GC Content
The average GC content of reads. A normal random library typically has a roughly normal distribution of GC content.
From the FastQC help:
This module measures the GC content across the whole length of each sequence in a file and compares it to a modelled normal distribution of GC content.
In a normal random library you would expect to see a roughly normal distribution of GC content where the central peak corresponds to the overall GC content of the underlying genome. Since we don't know the GC content of the genome the modal GC content is calculated from the observed data and used to build a reference distribution.
An unusually shaped distribution could indicate a contaminated library or some other kinds of biased subset. A normal distribution which is shifted indicates some systematic bias which is independent of base position. If there is a systematic bias which creates a shifted normal distribution then this won't be flagged as an error by the module since it doesn't know what your genome's GC content should be.
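
The per-read GC measurement and the modal value used as the centre of the reference distribution can be sketched as follows. This is a minimal illustration with hypothetical function names; binning to whole percentages is an assumption made here for simplicity, and fitting the normal reference curve itself is left out.

```python
from collections import Counter


def gc_percent(read):
    """Percentage of G and C bases over the whole length of one read."""
    if not read:
        return 0.0
    gc = sum(1 for base in read.upper() if base in "GC")
    return 100.0 * gc / len(read)


def gc_distribution(reads):
    """Count reads per whole-percent GC bin (binning is an assumption of this sketch)."""
    return Counter(round(gc_percent(read)) for read in reads)


def modal_gc(reads):
    """Most common GC bin, the value a reference distribution would be centred on."""
    dist = gc_distribution(reads)
    return max(dist, key=dist.get)
```

For the samples in the General Statistics table, the mean GC content sits at 39% to 41%, so a well-behaved library would be expected to peak near that range.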

Per Base N Content
The percentage of base calls at each position for which an N was called.
From the FastQC help:
If a sequencer is unable to make a base call with sufficient confidence then it will normally substitute an N rather than a conventional base call. This graph shows the percentage of base calls at each position for which an N was called.
It's not unusual to see a very low proportion of Ns appearing in a sequence, especially nearer the end of a sequence. However, if this proportion rises above a few percent it suggests that the analysis pipeline was unable to interpret the data well enough to make valid base calls.
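
The per-position N percentage described above amounts to a simple column-wise count. A minimal sketch, with a hypothetical function name:

```python
def n_percent_per_position(reads):
    """Percentage of base calls at each read position that are N."""
    n_counts = []
    totals = []
    for read in reads:
        for pos, base in enumerate(read):
            # Extend the accumulators when a longer read is encountered.
            if pos == len(totals):
                n_counts.append(0)
                totals.append(0)
            totals[pos] += 1
            if base in "Nn":
                n_counts[pos] += 1
    return [100.0 * n / t for n, t in zip(n_counts, totals)]
```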

Sequence Length Distribution
The distribution of fragment sizes (read lengths) found. See the FastQC help.

Sequence Duplication Levels
The relative level of duplication found for every sequence.
From the FastQC Help:
In a diverse library most sequences will occur only once in the final set. A low level of duplication may indicate a very high level of coverage of the target sequence, but a high level of duplication is more likely to indicate some kind of enrichment bias (e.g. PCR over-amplification). This graph shows the degree of duplication for every sequence in a library: the relative number of sequences with different degrees of duplication.
Only sequences which first appear in the first 100,000 sequences in each file are analysed. This should be enough to get a good impression of the duplication levels in the whole file. Each sequence is tracked to the end of the file to give a representative count of the overall duplication level.
The duplication detection requires an exact sequence match over the whole length of the sequence. Any reads over 75bp in length are truncated to 50bp for this analysis.
In a properly diverse library most sequences should fall into the far left of the plot in both the red and blue lines. A general level of enrichment, indicating broad oversequencing in the library, will tend to flatten the lines, lowering the low end and generally raising other categories. More specific enrichments of subsets, or the presence of low complexity contaminants, will tend to produce spikes towards the right of the plot.

Overrepresented sequences
The total amount of overrepresented sequences found in each library.
FastQC calculates and lists overrepresented sequences in FastQ files. It would not be possible to show this for all samples in a MultiQC report, so instead this plot shows the number of sequences categorized as overrepresented.
Sometimes, a single sequence may account for a large number of reads in a dataset. To show this, the bars are split into two: the first shows the overrepresented reads that come from the single most common sequence. The second shows the total count from all remaining overrepresented sequences.
From the FastQC Help:
A normal high-throughput library will contain a diverse set of sequences, with no individual sequence making up more than a tiny fraction of the whole. Finding that a single sequence is very overrepresented in the set either means that it is highly biologically significant, or indicates that the library is contaminated, or not as diverse as you expected.
FastQC lists all of the sequences which make up more than 0.1% of the total. To conserve memory only sequences which appear in the first 100,000 sequences are tracked to the end of the file. It is therefore possible that a sequence which is overrepresented but doesn't appear at the start of the file for some reason could be missed by this module.
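
The 0.1% rule and the memory-saving caveat can be illustrated with a short sketch. The function name and parameters are hypothetical, and this is not FastQC's actual code; the defaults (0.1% threshold, first 100,000 reads tracked) come from the quoted help text.

```python
from collections import Counter


def overrepresented(reads, threshold=0.001, track_limit=100_000):
    """Map each overrepresented sequence to its percentage of all reads."""
    counts = Counter()
    total = 0
    for index, read in enumerate(reads):
        total += 1
        if read in counts:
            counts[read] += 1
        elif index < track_limit:
            # Sequences first seen after `track_limit` reads are never tracked.
            counts[read] = 1
    cutoff = threshold * total
    return {seq: 100.0 * n / total for seq, n in counts.items() if n > cutoff}
```

Calling `overrepresented(["AAAA", "CCCC", "CCCC", "CCCC"], threshold=0.5, track_limit=1)` returns an empty dictionary even though `CCCC` makes up 75% of the reads, which is exactly the "could be missed" case the help text warns about.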

Adapter Content
The cumulative percentage count of the proportion of your library which has seen each of the adapter sequences at each position.
Note that only samples with ≥ 0.1% adapter contamination are shown.
There may be several lines per sample, as one is shown for each adapter detected in the file.
From the FastQC Help:
The plot shows a cumulative percentage count of the proportion of your library which has seen each of the adapter sequences at each position. Once a sequence has been seen in a read it is counted as being present right through to the end of the read so the percentages you see will only increase as the read length goes on.
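
The "counted as present right through to the end of the read" behaviour is a running total over first-occurrence positions. The sketch below shows that idea for a single adapter; the function name and the exact-match search with `str.find` are assumptions of this example, not a description of how FastQC matches adapters.

```python
def cumulative_adapter_percent(reads, adapter, read_length):
    """Cumulative % of reads in which `adapter` has been seen by each position."""
    first_seen = [0] * read_length
    for read in reads:
        pos = read.find(adapter)  # first exact occurrence, or -1 if absent
        if 0 <= pos < read_length:
            first_seen[pos] += 1
    total = len(reads)
    result = []
    running = 0
    for count in first_seen:
        # Once seen, a read stays counted at every later position,
        # so the curve can only stay flat or rise.
        running += count
        result.append(100.0 * running / total if total else 0.0)
    return result
```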

Status Checks
Status for each FastQC section showing whether results seem entirely normal (green), slightly abnormal (orange) or very unusual (red).
FastQC assigns a status for each section of the report. These give a quick evaluation of whether the results of the analysis seem entirely normal (green), slightly abnormal (orange) or very unusual (red).
It is important to stress that although the analysis results appear to give a pass/fail result, these evaluations must be taken in the context of what you expect from your library. A 'normal' sample as far as FastQC is concerned is random and diverse. Some experiments may be expected to produce libraries which are biased in particular ways. You should therefore treat the summary evaluations as pointers to where you should concentrate your attention and understand why your library may not look random and diverse.
Specific guidance on how to interpret the output of each module can be found in the relevant report section, or in the FastQC help.
This heatmap summarises all of these status checks for a quick overview. Note that not all FastQC sections have plots in MultiQC reports, but all status checks are shown in this heatmap.