# How to perform a PikaVirus Service

This is a brief tutorial on how to perform a PikaVirus service as a member of the ISCIII's Bioinformatics Unit! PikaVirus is associated with the _Viral: Detection and characterization of viral genomes within metagenomic data_ service from the service catalog.

[PikaVirus](https://github.com/BU-ISCIII/PikaVirus) is a bioinformatics best-practice analysis pipeline for metagenomic analysis following a new approach, based on eliminatory k-mer analysis, followed by assembly and subsequent contig binning. This service will allow us to identify the viral species present in a sample.

Let's get started with the service. When performing a PikaVirus service, remember that the typical acronym is `VIRAL-DISCOVERY`, but this may differ depending on the service.
| 8 | + |
First of all, follow the [first steps to create a service](/link/to/tools/and/iskylims/TODO). Once the `new-service` is finished, you'll have a new folder with 6 subfolders: `ANALYSIS`, `DOC`, `RAW`, `REFERENCES`, `RESULTS` and `TMP` (as explained [here](https://github.com/BU-ISCIII/BU-ISCIII/wiki/bioinformatics#33-services_and_collaborations)). We should check the following folders before going any further:

- `DOC`: This folder should contain a `hpc_slurm_pikavirus.config` file with the specific configuration for PikaVirus in our HPC and some custom parameters like database paths.
- `RAW`: Check that the number of files contained within the `RAW` folder is equal to the number of samples specified in [iskyLIMS](https://iskylims.isciii.es/) multiplied by 2, in the case of paired-end reads (a quick sanity check is sketched below).
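
For instance, a quick way to compare both numbers from the service folder could be something like the following (it assumes the raw reads are gzipped FASTQ files; adjust the pattern if needed):

```bash
# number of read files actually present in RAW (assuming *.fastq.gz naming)
find RAW/ -name "*.fastq.gz" | wc -l
# expected number of files for paired-end data: number of samples x 2
echo $(( $(wc -l < ANALYSIS/samples_id.txt) * 2 ))
```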

If everything is OK, we can get into the `ANALYSIS` folder, where we'll find the following items:

- `lablog_pikavirus`: an executable file that creates the `00-reads` folder, moves inside it, creates renamed symbolic links to the reads and renames the `ANALYSIS01_PIKAVIRUS` folder (a rough sketch of these steps is shown after this list).
- `samples_id.txt`: a `.txt` file containing all the sample names, one per line, so there will be as many lines as samples associated with our service.
- `ANALYSIS01_PIKAVIRUS`: folder with the main PikaVirus analysis files.
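
Just for orientation, the kind of steps `lablog_pikavirus` performs could look roughly like the sketch below; the exact commands, read-naming pattern and date format are assumptions, so the real lablog is always the source of truth:

```bash
# Illustrative sketch only, run from the ANALYSIS folder
mkdir 00-reads && cd 00-reads
# symlink and rename the raw reads, assuming paired-end files named <sample>*_R1/_R2*.fastq.gz
while read -r sample; do
    ln -s ../../RAW/${sample}*_R1*.fastq.gz "${sample}_R1.fastq.gz"
    ln -s ../../RAW/${sample}*_R2*.fastq.gz "${sample}_R2.fastq.gz"
done < ../samples_id.txt
cd ..
# rename the analysis folder so it carries the analysis date (hypothetical date format)
mv ANALYSIS01_PIKAVIRUS "$(date '+%Y%m%d')_ANALYSIS01_PIKAVIRUS"
```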

Now we can execute the lablog:

```bash
bash lablog_pikavirus
```

> [!WARNING]
> If PikaVirus is not the only analysis in your service, don't forget to run the other `lablogs` before moving on to the next steps.

After executing this file, if everything is OK, we can now proceed with the next BU-ISCIII tool: `scratch`, as explained [here](/link/to/tools/and/iskylims/TODO).

Once this function is finished, we should go into the `scratch_tmp` folder and then into the specific `ANALYSIS01_PIKAVIRUS` folder associated with our service. Once we're inside, we will see the following folders/files:

- `lablog`: creates symbolic links to `00-reads` and the `samples_id.txt` file, and creates the `pikavirus.sbatch`, `_01_nf_pikavirus.sh` and `samplesheet.csv` files.

Let's execute the `lablog` file:

```bash
bash lablog
```

Now we should check that we've loaded all the needed dependencies and launch the metagenomic analysis:

```bash
module load Nextflow/21.10.6 singularity
bash _01_nf_pikavirus.sh
```
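
For reference, `_01_nf_pikavirus.sh` is expected to submit the `pikavirus.sbatch` job created by the `lablog`, which in turn wraps the Nextflow run with the HPC configuration from `DOC`. The sketch below is a hypothetical simplification (the PikaVirus path, profile and options are assumptions); the real files are the ones generated by the `lablog`:

```bash
# _01_nf_pikavirus.sh (hypothetical): submit the SLURM batch script
sbatch pikavirus.sbatch

# pikavirus.sbatch (hypothetical, simplified): run the pipeline with the service-specific config
nextflow run /path/to/PikaVirus/main.nf \
    -c ../../DOC/hpc_slurm_pikavirus.config \
    -profile singularity \
    -resume
```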

After this, the analysis will start, and we'll be able to check the status of the process with:

```bash
tail -f DATE_pikavirus01.log
```
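
Since the pipeline is submitted as a SLURM job, we can also keep an eye on the job itself:

```bash
# list our jobs in the SLURM queue and their current state
squeue -u $USER
```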

Once we've checked that everything finished OK in the `DATE_pikavirus01.log`, we'll have the following content within the `DATE_ANALYSIS01_PIKAVIRUS` folder:

- `01-PikaVirus-results/`: Results of the PikaVirus pipeline.
  - `multiqc_report.html`: MultiQC's HTML report of the PikaVirus pipeline.
  - `mash_results`: Mash results for each sample; we should check them if there is any problem with the hits.
  - `<sample_name>`: Results specific to each sample (quality control, MultiQC, coverage...).
  - `<sample_name>_results.html`: HTML report with the stats for the sample (hits found, coverage, stats...).
  - `all_samples_virus_table.tsv`: Table with the results and stats for all the samples in `.tsv` format.
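
To take a quick look at the main table from the terminal, something like this can be handy:

```bash
# display the TSV aligned by columns; scroll horizontally with the arrow keys
column -t -s$'\t' all_samples_virus_table.tsv | less -S
```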

The file `all_samples_virus_table.tsv` usually contains some duplicated entries and phages. In order to clean up the results in this table, we should run:

```bash
grep -v 'genome' all_samples_virus_table.tsv | grep -v 'phage' > all_samples_virus_table_filtered.tsv
```

Then, convert this table to `.xlsx` if this step is not already included in the `lablog` of the `RESULTS` folder.
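
If we do need to convert it by hand, a minimal sketch using Python/pandas could be the following (it assumes `pandas` and `openpyxl` are available in the environment; the output filename is illustrative):

```bash
# convert the filtered TSV into an Excel file (assumes pandas + openpyxl are installed)
python3 -c "import pandas as pd; pd.read_csv('all_samples_virus_table_filtered.tsv', sep='\t').to_excel('all_samples_virus_table_filtered.xlsx', index=False)"
```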

In this service we should check:

- `01-PikaVirus-results/multiqc_report.html`: FastQC/fastp reports to assess the quality of the reads, the presence of adapters, etc.
- `all_samples_virus_table.tsv`: Check that the species found are the expected ones and that they have enough coverage. Those viruses covered >50% at a depth of 10X can be used as reference genomes for `nf-core/viralrecon` for further analysis (a hypothetical filter is sketched below).
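
Purely as an illustration of that criterion, a command-line filter could look like the one below; the column positions and the scale of the coverage fields are assumptions, so check the actual header of the table before relying on anything like this:

```bash
# HYPOTHETICAL: assumes the covered-genome fraction is column 5 (0-1 scale) and the mean depth is column 6
awk -F'\t' 'NR==1 || ($5 > 0.5 && $6 >= 10)' all_samples_virus_table_filtered.tsv
```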