Commit 48cf9f5

Added PikaVirus report
1 parent 1674491 commit 48cf9f5

1 file changed

+74
-0
lines changed

docs/PikaVirus-service.md

@@ -0,0 +1,74 @@
# How to perform a PikaVirus Service

This is a brief tutorial on how to perform a PikaVirus service as a member of the ISCIII's Bioinformatics Unit. PikaVirus is associated with the _Viral: Detection and characterization of viral genomes within metagenomic data_ service from the service catalog.

[PikaVirus](https://github.com/BU-ISCIII/PikaVirus) is a bioinformatics best-practice analysis pipeline for metagenomic analysis that follows a new approach based on eliminatory k-mer analysis, followed by assembly and subsequent contig binning. This service allows us to identify the viral species present in a sample.

Let's get started with the service. When performing a PikaVirus service, remember that the typical acronym is `VIRAL-DISCOVERY`, but this may differ depending on the service.

First of all, follow the [first steps to create a service](/link/to/tools/and/iskylims/TODO). Once `new-service` has finished, you'll have a new folder containing six subfolders: `ANALYSIS`, `DOC`, `RAW`, `REFERENCES`, `RESULTS` and `TMP` (as explained [here](https://github.com/BU-ISCIII/BU-ISCIII/wiki/bioinformatics#33-services_and_collaborations)). We should check the following folders before going any further:

- `DOC`: This folder should contain a `hpc_slurm_pikavirus.config` file with the specific configuration for PikaVirus in our HPC and some custom parameters, such as database paths.
- `RAW`: Check that the number of files in the `RAW` folder equals twice the number of samples specified in [iskyLIMS](https://iskylims.isciii.es/) when the reads are paired-end.
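
The `RAW` check can be scripted. The following is a minimal sketch and not part of the BU-ISCIII tooling: the function name `check_raw_count` is hypothetical, and it assumes that the reads are `.fastq.gz` files (or links to them) sitting directly inside `RAW` and that the service is paired-end.

```bash
# Hypothetical helper: compare the number of read files in a folder with
# twice the expected number of samples (paired-end reads).
# Assumption: reads end in .fastq.gz and are not nested in subfolders.
check_raw_count() {
    local raw_dir="$1" n_samples="$2" n_files
    n_files=$(find "$raw_dir" -maxdepth 1 \( -type f -o -type l \) -name '*.fastq.gz' | wc -l)
    n_files=$((n_files))   # normalise any padding added by wc
    if [ "$n_files" -eq $((2 * n_samples)) ]; then
        echo "OK: $n_files files for $n_samples paired-end samples"
    else
        echo "MISMATCH: found $n_files files, expected $((2 * n_samples))"
        return 1
    fi
}
```

For a service with 12 samples in iskyLIMS, `check_raw_count RAW 12` should report 24 files.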

If everything is OK, we can move into the `ANALYSIS` folder, where we'll find the following items:

- `lablog_pikavirus`: an executable file that creates the `00-reads` folder, moves into it, creates renamed symbolic links to the reads and renames the `ANALYSIS01_PIKAVIRUS` folder.
- `samples_id.txt`: a `.txt` file containing all the sample names, one per line, so it has as many lines as there are samples associated with our service.
- `ANALYSIS01_PIKAVIRUS`: folder with the main PikaVirus analysis files.

Now we can execute the lablog:

```bash
bash lablog_pikavirus
```
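
After the lablog has run, it can be worth confirming that the links in `00-reads` are usable. The sketch below is an illustration only: `check_reads_links` is a hypothetical helper, and it assumes that each link name starts with the sample name followed by an underscore, which may not match the naming used by `lablog_pikavirus`.

```bash
# Hypothetical helper: report broken symbolic links in a reads folder and
# samples from the sample list that have no matching file in that folder.
check_reads_links() {
    local reads_dir="$1" samples_file="$2" status=0 link sample
    for link in "$reads_dir"/*; do
        # A symlink whose target does not exist is a broken link.
        if [ -L "$link" ] && [ ! -e "$link" ]; then
            echo "BROKEN: $link"
            status=1
        fi
    done
    while IFS= read -r sample || [ -n "$sample" ]; do
        [ -z "$sample" ] && continue
        # Assumption: link names look like <sample>_<something>.
        if ! ls "$reads_dir/${sample}_"* >/dev/null 2>&1; then
            echo "MISSING: $sample"
            status=1
        fi
    done < "$samples_file"
    return "$status"
}
```

From the `ANALYSIS` folder this could be called as `check_reads_links 00-reads samples_id.txt`; it prints nothing and returns 0 when every sample has at least one link and no link is broken.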

> [!WARNING]
> If PikaVirus is not the only analysis in your service, don't forget to run the other `lablogs` before the next steps.

After executing this file, if everything is OK, we can proceed with the next BU-ISCIII tool, `scratch`, as explained [here](/link/to/tools/and/iskylims/TODO).

Once `scratch` has finished, we should go into the `scratch_tmp` folder and then into the `ANALYSIS01_PIKAVIRUS` folder associated with our service. Inside, we will see the following folders/files:

- `lablog`: creates symbolic links to `00-reads` and the `samples_id.txt` file, and creates the `pikavirus.sbatch`, `_01_nf_pikavirus.sh` and `samplesheet.csv` files.

Let's execute the `lablog` file:

```bash
bash lablog
```

Now we should make sure all the required dependencies are loaded and then run the metagenomic analysis:

```bash
module load Nextflow/21.10.6 singularity
bash _01_nf_pikavirus.sh
```

After this, the analysis will start, and we can follow its progress with:

```bash
tail -f DATE_pikavirus01.log
```
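
`DATE` in the log name is a placeholder for the date prefix of the run. As a convenience, the newest log can be located automatically. This is a small sketch under the assumption that the logs in the current folder follow the `DATE_pikavirus01.log` pattern shown above; `latest_pikavirus_log` is a hypothetical helper name.

```bash
# Hypothetical helper: print the most recently modified pikavirus log
# found in a folder (defaults to the current folder).
latest_pikavirus_log() {
    local dir="${1:-.}"
    # ls -t sorts by modification time, newest first.
    ls -t "$dir"/*_pikavirus01.log 2>/dev/null | head -n 1
}
```

It can then be combined with `tail`, for example `tail -f "$(latest_pikavirus_log)"`.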

Once we have checked in `DATE_pikavirus01.log` that everything has finished OK, the `DATE_ANALYSIS01_PIKAVIRUS` folder will contain the following:

- `01-PikaVirus-results/`: Results of the PikaVirus pipeline.
- `multiqc_report.html`: MultiQC's HTML report of the PikaVirus pipeline.
- `mash_results`: Mash results for each sample; we need to check them if there is any problem with the hits.
- `<sample_name>`: Results specific to the sample (quality control, MultiQC, coverage...).
- `<sample_name>_results.html`: HTML results with the stats for the sample (hits found, coverage, stats...).
- `all_samples_virus_table.tsv`: Table with the results and stats for all the samples, in `.tsv` format.

The file `all_samples_virus_table.tsv` usually contains some duplicated entries and phages. To clean the results in this table, we should run:

```bash
grep -v 'genome' all_samples_virus_table.tsv | grep -v 'phage' > all_samples_virus_table_filtered.tsv
```
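
To see what this filter does, it can be tried on a toy table first. The example below is only an illustration: the two columns and the rows are made up and do not reflect the real layout of `all_samples_virus_table.tsv`.

```bash
# Build a small toy table (hypothetical columns and values, illustration only).
demo_dir=$(mktemp -d)
table="$demo_dir/all_samples_virus_table.tsv"
{
    printf 'sample\tname\n'
    printf 'S1\tHuman mastadenovirus C\n'
    printf 'S1\tHuman mastadenovirus C, complete genome\n'
    printf 'S1\tEscherichia phage T4\n'
} > "$table"

# Same filter as in the tutorial: drop every line containing "genome" or "phage".
grep -v 'genome' "$table" | grep -v 'phage' > "$demo_dir/all_samples_virus_table_filtered.tsv"

# Compare the number of lines before and after filtering.
wc -l < "$table"
wc -l < "$demo_dir/all_samples_virus_table_filtered.tsv"
```

In this toy case the filtered file keeps the header and the first data row. Note that `grep` matches are case-sensitive and apply to the whole line, so a header containing the word `genome` would also be dropped, and a name written as `Phage` would be kept.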

Then convert this table to `.xlsx` if this step is not included in the `RESULTS` `lablog`.

In this service we should check:

- `01-PikaVirus-results/multiqc_report.html`: FastQC/fastp reports to assess the quality of the reads, the presence of adapters, etc.
- `all_samples_virus_table.tsv`: Check that the species are the expected ones and that they have enough coverage. Viruses with more than 50% of their genome covered at a depth of 10X can be used as reference genomes for `nf-core/viralrecon` in further analysis.
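
Candidate reference genomes can be shortlisted with a simple column filter. This is a sketch under loud assumptions: `filter_by_column` is a hypothetical helper, and the column that holds the fraction of the genome covered at 10X, as well as whether it is stored as a fraction or a percentage, must be looked up in the header of the real table.

```bash
# Hypothetical helper: keep the header line plus every row whose value in
# column COL is strictly greater than MIN (tab-separated input).
filter_by_column() {
    local file="$1" col="$2" min="$3"
    awk -F'\t' -v col="$col" -v min="$min" \
        'NR == 1 || ($col + 0) > (min + 0)' "$file"
}
```

For instance, if column 5 held the fraction covered at 10X (an assumption, not the documented layout), `filter_by_column all_samples_virus_table_filtered.tsv 5 0.5` would list the viruses above the 50% threshold.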
