docs/Viralrecon-service.md
+51-4
@@ -74,6 +74,9 @@ Once the pipelines have been executed for all references, you can continue with
74
74
75
75
Once finished, repeat the process if there are any other hosts.
76
76
77
+
> [!WARNING]
78
+
> If there is more than 1 host, please remember to use the appropriate kraken database. Full info on which organisms are associated with each kraken database can be found in **`/data/bi/references/kraken/README`**.
79
+
77
80
---
78
81
79
82
On completion of the pipeline, it is strongly recommended to review different files to check that it all worked properly. Among others, the 3 essential ones are the following:
@@ -86,7 +89,10 @@ On completion of the pipeline, it is strongly recommended to review different fi
86
89
87
90
---
88
91
89
-
Meantime, you can access the `YYMMDD_ANALYSIS_0X_MAG` folder and execute the process following its [manual](https://github.com/BU-ISCIII/BU-ISCIII/wiki/MAG-service).
92
+
In the meantime, you can access the `YYMMDD_ANALYSIS_0X_MAG` folder and execute the process following its [manual](https://github.com/BU-ISCIII/BU-ISCIII/wiki/MAG-service).
93
+
94
+
> [!WARNING]
95
+
> Please take into account that, if there are both samples with **single-end reads** and **paired-end reads** associated with the service, you'll have to run **MAG** for single-end and paired-end reads **separately** (use the `--single_end` parameter when running MAG with the samples that contain single-end reads).
90
96
91
97
If the pipeline has **successfully finished**, move to the `../RESULTS` folder.
92
98
@@ -96,6 +102,9 @@ Check the lablog and execute it.
96
102
97
103
Access the newly created folder and execute the scripts in order.
98
104
105
+
> [!WARNING]
106
+
> If the researcher asked you to provide them with the **reads without host**, create a folder within `/RESULTS/*_entrega01/`, and copy these reads inside this folder, so that the researcher can find them easily.
107
+
99
108
If everything is correct and the necessary files and links have been generated, you can proceed with the service completion. To do this, execute the finish module of buisciii-tools.
100
109
101
110
$ bu-isciii finish SRVCNMXXX.X
@@ -113,6 +122,7 @@ Lastly, remember to remove all the files related to this service from `scratch_t
113
122
$ bu-isciii scratch SRVCNMXXX.X
114
123
> remove_scratch
115
124
125
+
---
116
126
117
127
### How to create scheme.bed
118
128
@@ -141,13 +151,47 @@ This will generate a series of alignments with your primers sequences in `blast.
141
151
Once you have selected the corresponding lines, you can execute an auxiliary script called `blast_parser.py` (you may find it in `/data/bi/references/auxiliar_scripts/`) to do the rest of the work:
* Make sure that the `samples_ref` file does not contain any spaces, as it is read as a tab-separated file. You can use `cat -A samples_ref.txt` to check this (`\t` is shown as `^I` in this case).
158
+
159
+
* When sample IDs mix fully numeric names and strings (e.g. `87439.fastq.gz` and `SARS_01.fastq.gz`), the pipeline may crash at the `MULTIQC` step. This is caused by a bug in MultiQC ([MultiQC issue](https://github.com/nf-core/viralrecon/issues/345)) that can be temporarily worked around by adding any non-numerical character to the sample IDs. Nevertheless, you can follow the instructions in [this tutorial](https://drive.google.com/drive/u/0/folders/1-GafpZR2HVlecNaAsXIslK3aecHplD4z) to properly correct this error.
145
160
146
-
Make sure that `samples_ref` file does not contain any spaces as it's read as a tab-separated file. You can use `cat -A samples_ref.txt` to ensure this (`\t` is shown as `^I` in this case).
161
+
* When running `bash _01_run_<reference1>.sh` and checking the `.log` file, you might see the following message related to Bowtie2, where `XXX` is the number of reads mapped against the reference for each sample:
162
+
> -[nf-core/viralrecon] X samples skipped since they failed Bowtie2 1000 mapped read threshold: <br> XXX: SAMPLE1 <br> XXX: SAMPLE2 <br> ... </br>
163
+
164
+
If this happens, you'll have to determine whether it is still worth analysing these samples (**we want to find out if the number of mapped reads for each sample (XXX above) is greater than or equal to the theoretical number of reads needed for a 10x coverage of the reference genome**). To do so, do the following calculation:
165
+
> A) **PAIRED-END READS**: number of reads = (genome size * 10) / (read length * 2) <br> B) **SINGLE-END READS**: number of reads = (genome size * 10) / (read length) </br>
147
166
148
-
When there's a mix of full-numerical and strings in sample IDs (e.g. `87439.fastq.gz` and `SARS_01.fastq.gz`) the pipeline may crush in `MULTIQC` step. This is caused because there's a bug with MultiQC ([MultiQC issue](https://github.com/nf-core/viralrecon/issues/345)) that can be temporarily fixed by adding any non-numerical character to the sample IDs. Nevertheless, you can follow the instructions in [this tutorial](https://drive.google.com/drive/u/0/folders/1-GafpZR2HVlecNaAsXIslK3aecHplD4z) to properly correct this error.
167
+
If the obtained number is greater than the number of mapped reads for a sample (XXX), that sample can be **omitted** from the analysis. Otherwise, it might be worth decreasing `--min_mapped_reads` (default value: **1000**) when running viralrecon so that this sample is not skipped by Bowtie2.
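The calculation above can be sketched in a few lines of Python. This is only an illustration, not part of buisciii-tools or viralrecon: the function names are hypothetical, the genome size and read length in the example are made-up values, and rounding the result up to a whole read is an assumption added here (it does not change the outcome of the comparison for integer read counts).

```python
import math


def min_reads_for_coverage(genome_size, read_length, paired_end, coverage=10):
    """Theoretical number of reads needed to reach `coverage`x over the genome.

    Paired-end: (genome size * coverage) / (read length * 2)
    Single-end: (genome size * coverage) / (read length)
    The result is rounded up to a whole read (assumption of this sketch).
    """
    bases_per_unit = read_length * (2 if paired_end else 1)
    return math.ceil(genome_size * coverage / bases_per_unit)


def is_worth_analysing(mapped_reads, genome_size, read_length, paired_end):
    """True if the mapped reads (XXX in the log) reach the 10x threshold."""
    return mapped_reads >= min_reads_for_coverage(genome_size, read_length, paired_end)


if __name__ == "__main__":
    # Hypothetical values: a 30,000 bp reference and 150 bp reads.
    print(min_reads_for_coverage(30_000, 150, paired_end=True))
    print(min_reads_for_coverage(30_000, 150, paired_end=False))
    # A paired-end sample reported with 850 mapped reads in the log.
    print(is_worth_analysing(850, 30_000, 150, paired_end=True))
```

If `is_worth_analysing` returns `True` for a skipped sample, that is the case where lowering `--min_mapped_reads` may be worth considering.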
149
168
169
+
* If the service contains samples from **multiple runs**, the corresponding `./mapping_illumina_yyyymmdd.xlsx` will contain, in its first column, only one value: the name of the first run that was found by the script, which is used in this file for every sample even though that's not correct. Make sure to change this manually in this file so that every sample is associated with its correct run.
150
170
171
+
* If your service contains samples with single-end reads, the `%unmapped_reads` column in `./mapping_illumina_yyyymmdd.xlsx` will have **negative** values, which makes no sense. If this happens, you'll have to correct the following columns of this file so that the data is right:
172
+
* Column `readshostR1` must be equal to `readshost` for these samples.
* Make sure no **Ns** are found in the `variants_long_table.csv` file.
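
Two of the checks in the list above (spaces in `samples_ref`, and sample IDs that mix fully numeric names with strings) can be scripted before launching the pipeline. The following is a minimal sketch in Python; the function names are hypothetical and it is not part of the pipeline. It flags any line that contains a space character and assumes "fully numeric" means the ID consists only of digits.

```python
def lines_with_spaces(lines):
    """Return (line_number, text) for every line that contains a space.

    `lines` can be an open file handle or any iterable of strings.
    """
    flagged = []
    for number, line in enumerate(lines, start=1):
        if " " in line:
            flagged.append((number, line.rstrip("\n")))
    return flagged


def mixes_numeric_and_text_ids(sample_ids):
    """True if some IDs are digits only while others are not."""
    has_numeric = any(sample_id.isdigit() for sample_id in sample_ids)
    has_text = any(not sample_id.isdigit() for sample_id in sample_ids)
    return has_numeric and has_text


if __name__ == "__main__":
    # Made-up rows: the second one has a space instead of a tab.
    example = ["S1\tref\thost\n", "S2 ref\thost\n"]
    print(lines_with_spaces(example))
    # The two sample IDs used as an example in the list above.
    print(mixes_numeric_and_text_ids(["87439", "SARS_01"]))
```

For a real file, the first function can be called as `lines_with_spaces(open("samples_ref.txt"))`; an empty list means no spaces were found.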
178
+
179
+
---
180
+
181
+
### What if the researcher wants me to send them the no host reads?
182
+
183
+
If the researcher, when requesting the service, asks you to provide them with the **.fastq.gz files with no host**, you'll have to add this to `/DOC/viralrecon.config`:
The no host reads will be inside **`/*_mapping/kraken2/`** for each host and each reference.
193
+
194
+
---
151
195
152
196
## Viralrecon report template
153
197
@@ -163,4 +207,7 @@ When there's a mix of full-numerical and strings in sample IDs (e.g. `87439.fast
163
207
-**Cantidad de lecturas**: 0,5M - 2.4M
164
208
-**Calidad general**: Buena
165
209
-**Incidencia muestras individuales**:
166
-
- 2 muestras no consiguen mapear (CONTROLNEGATIVO y POSUL54). Tienen muy pocas lecturas, baja calidad y elevado porcentaje de secuencias sobrerrepresentadas.
210
+
- 2 muestras no consiguen mapear (CONTROLNEGATIVO y POSUL54). Tienen muy pocas lecturas, baja calidad y elevado porcentaje de secuencias sobrerrepresentadas.