
Commit 296d0cf

committed
Updated viralrecon documentation
1 parent 57a8170 commit 296d0cf


1 file changed, +51 -4 lines changed


docs/Viralrecon-service.md

+51-4
@@ -74,6 +74,9 @@ Once the pipelines have been executed for all references, you can continue with
7474
7575
Once finished, repeat the process for any other hosts.
7676

77+
> [!WARNING]
78+
> If there is more than one host, remember to use the appropriate Kraken database for each one. Full information on which organisms are associated with each Kraken database can be found in **`/data/bi/references/kraken/README`**.
79+
7780
---
7881

7982
Once the pipeline has finished, it is strongly recommended to review several files to check that everything worked properly. The three essential ones are the following:
@@ -86,7 +89,10 @@ On completion of the pipeline, it is strongly recommended to review different fi
8689

8790
---
8891

89-
Meantime, you can access the `YYMMDD_ANALYSIS_0X_MAG` folder and execute the process following its [manual](https://github.com/BU-ISCIII/BU-ISCIII/wiki/MAG-service).
92+
In the meantime, you can access the `YYMMDD_ANALYSIS_0X_MAG` folder and execute the process following its [manual](https://github.com/BU-ISCIII/BU-ISCIII/wiki/MAG-service).
93+
94+
> [!WARNING]
95+
> Note that if the service includes both samples with **single-end reads** and samples with **paired-end reads**, you'll have to run **MAG** **separately** for each group (use the `--single_end` parameter when running MAG on the samples with single-end reads).
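
A minimal sketch of how the samples could be split by read layout before launching MAG twice. It assumes a hypothetical naming convention (`<sample>_R1.fastq.gz`, plus `<sample>_R2.fastq.gz` for paired-end samples) and a hypothetical helper name, `split_by_layout`; adapt the regular expression to the actual file names in the service.

```python
import re
from collections import defaultdict

# Hypothetical naming convention: <sample>_R1.fastq.gz and <sample>_R2.fastq.gz
FASTQ_NAME = re.compile(r"^(?P<sample>.+)_R(?P<mate>[12])\.fastq\.gz$")


def split_by_layout(filenames):
    """Return (single_end, paired_end) sorted lists of sample names."""
    mates = defaultdict(set)
    for name in filenames:
        match = FASTQ_NAME.match(name)
        if match is None:
            raise ValueError(f"Unexpected FASTQ file name: {name}")
        mates[match.group("sample")].add(match.group("mate"))

    single_end, paired_end = [], []
    for sample, found in sorted(mates.items()):
        if found == {"1", "2"}:
            paired_end.append(sample)
        elif found == {"1"}:
            single_end.append(sample)
        else:
            # Only an R2 file was found: the reads were probably copied incompletely
            raise ValueError(f"Sample {sample} has an R2 file but no R1 file")
    return single_end, paired_end


if __name__ == "__main__":
    files = ["S1_R1.fastq.gz", "S1_R2.fastq.gz", "S2_R1.fastq.gz"]
    single_end, paired_end = split_by_layout(files)
    print("Run MAG with --single_end on:", single_end)
    print("Run MAG (paired-end) on:", paired_end)
```

Each resulting group would then be run through MAG on its own, adding `--single_end` only for the first one.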
9096
9197
If the pipeline has **successfully finished**, move to the `../RESULTS` folder.
9298

@@ -96,6 +102,9 @@ Check the lablog and execute it.
96102

97103
Access the newly created folder and execute the scripts in order.
98104

105+
> [!WARNING]
106+
> If the researcher has asked for the **host-free reads**, create a folder within `/RESULTS/*_entrega01/` and copy the reads into it so that the researcher can find them easily.
107+
99108
If everything is correct and the necessary files and links have been generated, you can proceed to complete the service. To do this, run the `finish` module of buisciii-tools:
100109

101110
$ bu-isciii finish SRVCNMXXX.X
@@ -113,6 +122,7 @@ Lastly, remember to remove all the files related to this service from `scratch_t
113122
$ bu-isciii scratch SRVCNMXXX.X
114123
> remove_scratch
115124

125+
---
116126

117127
### How to create scheme.bed
118128

@@ -141,13 +151,47 @@ This will generate a series of alignments with your primers sequences in `blast.
141151
Once you have selected the corresponding lines, you can run an auxiliary script called `blast_parser.py` (found in `/data/bi/references/auxiliar_scripts/`) to do the rest of the work:
142152
```python3 blast_parser.py blast_mod.txt scheme.bed```
143153

154+
---
155+
144156
### Common errors while running the service
157+
* Make sure that the `samples_ref` file does not contain any spaces, as it is read as a tab-separated file. You can run `cat -A samples_ref.txt` to check this (tabs are shown as `^I`).
158+
159+
* When sample IDs mix fully numeric and alphanumeric names (e.g. `87439.fastq.gz` and `SARS_01.fastq.gz`), the pipeline may crash at the `MULTIQC` step. This is caused by a MultiQC bug ([nf-core/viralrecon issue #345](https://github.com/nf-core/viralrecon/issues/345)), which can be temporarily worked around by adding any non-numeric character to the sample IDs. To properly fix this error, follow the instructions in [this tutorial](https://drive.google.com/drive/u/0/folders/1-GafpZR2HVlecNaAsXIslK3aecHplD4z).
145160

146-
Make sure that `samples_ref` file does not contain any spaces as it's read as a tab-separated file. You can use `cat -A samples_ref.txt` to ensure this (`\t` is shown as `^I` in this case).
161+
* When running `bash _01_run_<reference1>.sh` and checking the `.log` file, you might see the following Bowtie2-related message, where `XXX` is the number of reads mapped to the reference for each sample:
162+
> -[nf-core/viralrecon] X samples skipped since they failed Bowtie2 1000 mapped read threshold:<br>XXX: SAMPLE1<br>XXX: SAMPLE2<br>...
163+
164+
If this happens, you'll have to decide whether it is still worth analysing these samples: **check whether the number of mapped reads for each sample (`XXX` above) is greater than or equal to the theoretical number of reads needed for 10x coverage of the reference genome**. To do so, use the following calculation:
165+
> A) **PAIRED-END READS**: number of reads = (genome size × 10) / (read length × 2)<br>B) **SINGLE-END READS**: number of reads = (genome size × 10) / read length
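
The formulas above can be sketched in Python, together with the comparison against the number of mapped reads (`XXX`) reported in the log. The function names and example values are hypothetical: the example uses the SARS-CoV-2 reference length (29,903 bp), 2 × 150 bp reads and 850 mapped reads purely for illustration, and rounding the result up to a whole number of reads is an assumption of this sketch.

```python
import math


def reads_for_coverage(genome_size, read_length, paired_end, coverage=10):
    """Theoretical number of reads needed for the given coverage."""
    # Paired-end: each read in the formula contributes read_length * 2 bases
    bases_per_read = read_length * 2 if paired_end else read_length
    # Round up, since a fraction of a read cannot be sequenced (assumption)
    return math.ceil(genome_size * coverage / bases_per_read)


def keep_sample(mapped_reads, genome_size, read_length, paired_end):
    """True if the sample has at least the reads needed for 10x coverage."""
    return mapped_reads >= reads_for_coverage(genome_size, read_length, paired_end)


if __name__ == "__main__":
    needed = reads_for_coverage(29903, 150, paired_end=True)
    print(needed)  # 997
    print(keep_sample(850, 29903, 150, paired_end=True))  # False
```

With these values the theoretical threshold is 997 reads, so a sample with 850 mapped reads would fall below it.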
147166
148-
When there's a mix of full-numerical and strings in sample IDs (e.g. `87439.fastq.gz` and `SARS_01.fastq.gz`) the pipeline may crush in `MULTIQC` step. This is caused because there's a bug with MultiQC ([MultiQC issue](https://github.com/nf-core/viralrecon/issues/345)) that can be temporarily fixed by adding any non-numerical character to the sample IDs. Nevertheless, you can follow the instructions in [this tutorial](https://drive.google.com/drive/u/0/folders/1-GafpZR2HVlecNaAsXIslK3aecHplD4z) to properly correct this error.
167+
If the theoretical number is greater than the sample's number of mapped reads (`XXX`), the sample can be **omitted** from the analysis. Otherwise, it may be worth lowering `--min_mapped_reads` (default value: **1000**) when running viralrecon so that the sample is not skipped at the Bowtie2 step.
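
The two input checks at the start of this list (spaces in `samples_ref` and a mix of numeric and alphanumeric sample IDs) can also be scripted. This is a minimal sketch: the `samples_ref` example lines, the reference name in them, the FASTQ suffixes stripped by `sample_id` and all function names are hypothetical, so adapt them to the files in the service. `cat -A` remains a perfectly good manual check.

```python
def find_spaces(lines):
    """Return the 1-based line numbers of samples_ref lines that contain spaces."""
    return [number for number, line in enumerate(lines, start=1) if " " in line]


def sample_id(fastq_name):
    """Strip the read-file suffix from a FASTQ name (hypothetical naming convention)."""
    for suffix in ("_R1.fastq.gz", "_R2.fastq.gz", ".fastq.gz"):
        if fastq_name.endswith(suffix):
            return fastq_name[: -len(suffix)]
    return fastq_name


def has_mixed_ids(sample_ids):
    """True if fully numeric IDs are mixed with alphanumeric ones (MultiQC bug)."""
    numeric = [sample for sample in sample_ids if sample.isdigit()]
    return 0 < len(numeric) < len(sample_ids)


if __name__ == "__main__":
    # In practice: with open("samples_ref.txt") as handle: find_spaces(handle)
    example = ["SAMPLE1\tMN908947.3\n", "SAMPLE2 MN908947.3\n"]
    print(find_spaces(example))  # [2]

    ids = [sample_id(name) for name in ["87439.fastq.gz", "SARS_01.fastq.gz"]]
    print(has_mixed_ids(ids))  # True
```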
149168

169+
* If the service contains samples from **multiple runs**, the first column of `./mapping_illumina_yyyymmdd.xlsx` will contain a single value for every sample: the name of the first run found by the script, which is wrong for samples from any other run. Make sure to edit this column manually so that every sample is associated with its correct run.
150170

171+
* If your service contains samples with single-end reads, the `%unmappedreads` column in `./mapping_illumina_yyyymmdd.xlsx` will contain **negative** values, which is clearly wrong. If this happens, recalculate the following columns for these samples so that the data are correct:
172+
* Column `readshostR1` must be equal to `readshost` for these samples.
173+
* `%readshost` = (`readshost` * 100) / `totalreads`
174+
* `unmappedreads` = `totalreads` - (`readshost` + `readsvirus`)
175+
* `%unmappedreads` = (`unmappedreads` * 100) / `totalreads`
176+
177+
* Make sure no **Ns** are found in the `variants_long_table.csv` file.
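
The single-end correction and the Ns check from this list can be sketched in plain Python. This is a hypothetical helper, not part of buisciii-tools: a row is a dictionary keyed by the column names used above (reading and writing the `.xlsx` file is left out). For `variants_long_table.csv`, the sketch assumes the alleles are stored in `REF` and `ALT` columns and checks only those, since other columns (e.g. gene names such as the SARS-CoV-2 `N` gene) may legitimately contain the letter N.

```python
import csv


def fix_single_end_row(row):
    """Recompute the host/unmapped columns of a single-end sample."""
    fixed = dict(row)
    total = fixed["totalreads"]
    # readshostR1 must equal readshost for single-end samples
    fixed["readshostR1"] = fixed["readshost"]
    fixed["%readshost"] = fixed["readshost"] * 100 / total
    fixed["unmappedreads"] = total - (fixed["readshost"] + fixed["readsvirus"])
    fixed["%unmappedreads"] = fixed["unmappedreads"] * 100 / total
    return fixed


def rows_with_ns(lines, allele_columns=("REF", "ALT")):
    """Return the 1-based data-row numbers whose allele columns contain an N."""
    reader = csv.DictReader(lines)
    return [
        number
        for number, row in enumerate(reader, start=1)
        if any("N" in (row.get(column) or "").upper() for column in allele_columns)
    ]


if __name__ == "__main__":
    # Hypothetical values for one single-end sample with a negative %unmappedreads
    row = {"totalreads": 1000, "readshost": 200, "readshostR1": 0,
           "readsvirus": 700, "%unmappedreads": -5.0}
    print(fix_single_end_row(row))

    # In practice: with open("variants_long_table.csv", newline="") as handle: rows_with_ns(handle)
    table = ["SAMPLE,POS,REF,ALT", "S1,241,C,T", "S2,3037,C,N"]
    print(rows_with_ns(table))  # [2]
```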
178+
179+
---
180+
181+
### What if the researcher wants me to send them the host-free reads?
182+
183+
If the researcher asks for the **host-free `.fastq.gz` files** when requesting the service, you'll have to add the following to `/DOC/viralrecon.config`:
184+
185+
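```groovy
// Add inside the process { } scope of viralrecon.config
withName: 'KRAKEN2_KRAKEN2' {
    publishDir = [
        // path and mode follow the usual nf-core convention; keep your config's values if they differ
        path: { "${params.outdir}/kraken2" },
        mode: params.publish_dir_mode,
        pattern: "*.{unclassified.fastq.gz,unclassified_1.fastq.gz,unclassified_2.fastq.gz,txt}"
    ]
}
```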
192+
The host-free reads will then be in **`/*_mapping/kraken2/`** for each host and reference.
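
The copy step described in the results warning (creating a folder inside `/RESULTS/*_entrega01/` and copying the host-free reads into it) can be sketched as follows. The function name, the destination folder and the file-renaming scheme are hypothetical; the sketch assumes it is pointed at the analysis folder that contains the `*_mapping` directories, and prefixes each file with its mapping folder name so that reads from different hosts or references do not overwrite each other.

```python
import shutil
from pathlib import Path


def collect_host_free_reads(analysis_dir, delivery_dir):
    """Copy Kraken2 unclassified (host-free) reads from every *_mapping folder."""
    analysis_dir, delivery_dir = Path(analysis_dir), Path(delivery_dir)
    delivery_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for fastq in sorted(analysis_dir.glob("*_mapping/kraken2/*unclassified*.fastq.gz")):
        # Prefix with the *_mapping folder name to keep hosts/references apart
        target = delivery_dir / f"{fastq.parent.parent.name}_{fastq.name}"
        shutil.copy2(fastq, target)
        copied.append(target)
    return copied


if __name__ == "__main__":
    import sys

    if len(sys.argv) == 3:
        for path in collect_host_free_reads(sys.argv[1], sys.argv[2]):
            print(path)
    else:
        print("usage: python3 collect_host_free_reads.py ANALYSIS_DIR DELIVERY_DIR")
```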
193+
194+
---
151195

152196
## Viralrecon report template
153197

@@ -163,4 +207,7 @@ When there's a mix of full-numerical and strings in sample IDs (e.g. `87439.fast
163207
- **Number of reads**: 0.5M - 2.4M
164208
- **Overall quality**: Good
165209
- **Issues with individual samples**:
166-
- 2 samples fail to map (CONTROLNEGATIVO and POSUL54). They have very few reads, low quality and a high percentage of overrepresented sequences.
210+
- 2 samples fail to map (CONTROLNEGATIVO and POSUL54). They have very few reads, low quality and a high percentage of overrepresented sequences.
211+
212+
213+
---
