Add new tutorial for deciphering viral populations using SNV and baculovirus isolates (Variant analysis) #5700
base: main
Very cool! Welcome @wennj!
I have added a few batches of comments; if you agree with them, you can accept them from inside the GitHub interface.
Please do not change this file.
- Transform the output (VCF file) to a readable table format using tools available at Galaxy only. | ||
- Interpret the SNV data to analysis the intra-isolate variability of baculovirus isolates or samples. |
- Transform the output (VCF file) to a readable table format | |
- Interpret the SNV data to analyse the intra-isolate variability of baculovirus isolates or samples. |
Not sure if the transform step is really an objective.
SRA: Sequence Read Archive | ||
key_points: | ||
- Baculovirus populations are heterogeneous and show genetic variability. | ||
- Some SNV positions occur only in certain isolates and are therefore specific for that isolate. |
- Some SNV positions occur only in certain isolates and are therefore specific to that isolate. |
> The genome of the isolate Autographa californica multiple nucleopolyhedrovirus isolate C6 (AcMNPV-C6) | ||
> (family *Baculoviridae*, genus *Alphabaculovirus*, species *Alphabaculovirus aucalifornicae*) is one of the | ||
> best-studied baculovirus genomes and is 133,894 bp long ({% cite Ayres1994 %}). | ||
> It is the first fully sequenced genon of baculoviruses and today hundreds of genomes are fully sequenced |
> It is the first fully sequenced genome of baculoviruses and today hundreds of genomes are fully sequenced |
The genome size makes it difficult to study genetic variation within a baculovirus population, | ||
especially since the most commonly used sequencing technique (Illumina sequencing) | ||
generates only short reads and requires genome assembly. Since genome assembly generates | ||
a consensus sequence that reflects the majority of a sequenced virus population, | ||
occuring genetic variation is masked. Tools for haplotype-sensitive assembly are | ||
available but so far have been establisehd for viruses with a relatively short genome | ||
but not for baculoviruses or other large dsDNA viruses (REFERENCES). Using Nanopore sequencing, | ||
it is possible to sequence signficiant fragments of baculovirus genomes to determine major | ||
haplotypes ({% cite Wennmann2024 %}). |
The genome size makes it difficult to study genetic variation within a baculovirus population, | |
especially since the most commonly used sequencing technique (Illumina sequencing) | |
generates only short reads and requires genome assembly. Since genome assembly generates | |
a consensus sequence that reflects the majority of a sequenced virus population, | |
occurring genetic variation is masked. Tools for haplotype-sensitive assembly are | |
available but so far have been established for viruses with a relatively short genome | |
but not for baculoviruses or other large dsDNA viruses (REFERENCES). Using Nanopore sequencing, | |
it is possible to sequence significant fragments of baculovirus genomes to determine major | |
haplotypes ({% cite Wennmann2024 %}). |
|
||
## Replace SRA Names with Virus Abbreviations | ||
|
||
One thing that stands out are the SAMPLE names, which were taken automatically from the NCBI SRA datasets. Since it is difficult to remember which virus isolate is behind which SRA number, we can replace the accession numbers with proper names. This makes the table even easier to read and later we can use the information directly to display the SNV positions. |
One thing that stands out is the SAMPLE names, which were taken automatically from the NCBI SRA datasets. Since it is difficult to remember which virus isolate is behind which SRA number, we can replace the accession numbers with proper names. This makes the table even easier to read and later we can use the information directly to display the SNV positions. |
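The renaming step described above can be sketched outside Galaxy in a few lines of Python. This is only an illustration under stated assumptions: the accession numbers are placeholders (not the tutorial's real SRA runs), and the `POS` and `AF` column names are hypothetical; only the `SAMPLE` column comes from the text.

```python
# Hypothetical mapping from SRA run accession to isolate abbreviation.
# The accessions are placeholders, not the run IDs used in the tutorial.
SAMPLE_NAMES = {
    "SRR0000001": "CpGV-E2",
    "SRR0000002": "CpGV-S",
    "SRR0000003": "CpGV-V15",
}


def rename_samples(rows, mapping):
    """Return copies of the rows with SAMPLE replaced where a mapping exists."""
    renamed = []
    for row in rows:
        new_row = dict(row)  # copy so the input table is left untouched
        # Unknown accessions are kept as-is rather than dropped.
        new_row["SAMPLE"] = mapping.get(row["SAMPLE"], row["SAMPLE"])
        renamed.append(new_row)
    return renamed


# Toy table; POS and AF are assumed column names with made-up values.
table = [
    {"POS": 100, "SAMPLE": "SRR0000001", "AF": 0.98},
    {"POS": 100, "SAMPLE": "SRR0000003", "AF": 0.51},
    {"POS": 250, "SAMPLE": "SRR9999999", "AF": 0.10},
]

renamed_table = rename_samples(table, SAMPLE_NAMES)
for row in renamed_table:
    print(row["POS"], row["SAMPLE"], row["AF"])
```

Keeping unmapped accessions unchanged makes it easy to spot samples that are missing from the mapping.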
|
||
# Visualizing SNV Variability Across Isolates | ||
|
||
The final table should be much easier to read and contain all the information we need to perform an analysis of the intra-isolate specific variability. Here is an short overview about the final table (selected columns) and its first three SNV positions: |
The final table should be much easier to read and contain all the information we need to perform an analysis of the intra-isolate specific variability. Here is a short overview of the final table (selected columns) and its first three SNV positions: |
|
||
![SNV specificity plot for CpGV-V15](../../images/baculovirus-isolate-variation/SNV_specificity_plot_CpGV-V15.png "Deciphering the composition of the isolate CpGV-V15 by using SNV specificities.") | ||
|
||
Most of the SNV positions specific for CpGV-E2 have a frequency around 0.5 (50%) (Fig. 5, SNV specificity: CpGV-E2). Looking at the positions specific for CpGV-S, we also see a SNV cloud at around 0.5 (50 %) (Fig. 5, SNV specificity: CpGV-S). From these two observations, we can conclude that CpGV-E2 and CpGV-S occur at a ratio of 50% each. As a control, we look at the markers (SNV positions) that are specific for both CpGV-E2 and CpGV-S (Fig. 5, SNV specificity: CpGV-E2 + CpGV-S). Here we see that most SNV positions have a relative frequency of ~1 (100%). This also makes sense, because if CpGV-V15 is a mixture of CpGV-E2 and CpGV-S, then the markers that indicate both isolates should be at 100%, which is the case. Based on the SNV analysis, it can be concluded that CpGV-V15 is mainly a mixture of CpGV-S and CpGV-E2. |
Most of the SNV positions specific to CpGV-E2 have a frequency of around 0.5 (50%) (Fig. 5, SNV specificity: CpGV-E2). Looking at the positions specific to CpGV-S, we also see an SNV cloud at around 0.5 (50%) (Fig. 5, SNV specificity: CpGV-S). From these two observations, we can conclude that CpGV-E2 and CpGV-S occur at a ratio of 50% each. As a control, we look at the markers (SNV positions) that are specific to both CpGV-E2 and CpGV-S (Fig. 5, SNV specificity: CpGV-E2 + CpGV-S). Here we see that most SNV positions have a relative frequency of ~1 (100%). This also makes sense, because if CpGV-V15 is a mixture of CpGV-E2 and CpGV-S, then the markers that indicate both isolates should be at 100%, which is the case. Based on the SNV analysis, it can be concluded that CpGV-V15 is mainly a mixture of CpGV-S and CpGV-E2. |
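The reasoning in this paragraph (isolate-specific markers near 0.5 each, shared markers near 1) can be checked numerically by summarising the relative frequencies per specificity group. The sketch below is a minimal illustration, not the tutorial's method: the `SPECIFICITY` and `REL_FREQ` field names are assumptions, and the frequencies are made-up toy values chosen to mimic the pattern described for CpGV-V15.

```python
from statistics import median


def median_frequency_by_specificity(rows):
    """Group SNV frequencies by specificity label and return the median per group."""
    groups = {}
    for row in rows:
        groups.setdefault(row["SPECIFICITY"], []).append(row["REL_FREQ"])
    return {spec: median(freqs) for spec, freqs in groups.items()}


# Toy SNV frequencies for one sample; values are invented for illustration.
snvs = [
    {"SPECIFICITY": "CpGV-E2", "REL_FREQ": 0.48},
    {"SPECIFICITY": "CpGV-E2", "REL_FREQ": 0.52},
    {"SPECIFICITY": "CpGV-E2", "REL_FREQ": 0.50},
    {"SPECIFICITY": "CpGV-S", "REL_FREQ": 0.47},
    {"SPECIFICITY": "CpGV-S", "REL_FREQ": 0.51},
    {"SPECIFICITY": "CpGV-S", "REL_FREQ": 0.53},
    {"SPECIFICITY": "CpGV-E2 + CpGV-S", "REL_FREQ": 0.99},
    {"SPECIFICITY": "CpGV-E2 + CpGV-S", "REL_FREQ": 1.00},
    {"SPECIFICITY": "CpGV-E2 + CpGV-S", "REL_FREQ": 0.98},
]

summary = median_frequency_by_specificity(snvs)
for spec, freq in summary.items():
    print(f"{spec}: median relative frequency {freq:.2f}")
```

The median is used here instead of the mean so that a few outlying SNV positions do not shift the estimate; if the two isolate-specific medians sum to roughly the shared-marker median, a two-component mixture is a plausible reading.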
- Baculovirus populations are heterogeneous and show genetic variability. | ||
- Some SNV positions occur only in certain isolates and are therefore specific for that isolate. | ||
- SNV specificities can be used as markers to identify isolates and determine mixtures. | ||
contributors: |
You can extend this, for example, by:
contributions:
  authorship:
    - wennj
  editing:
    - your-student
layout: tutorial_hands_on | ||
|
||
title: Deciphering Virus Populations - Single Nucleotide Variants (SNVs) and Specificities in Baculovirus Isolates | ||
subtopic: '' |
subtopic: 'one-health' |
Maybe?
Dear GTN Team,
I am a new contributor and have created a tutorial for analysing virus populations using SNVs, with large dsDNA viruses as the example. The workflow is based on published results and follows a scientific approach. I have followed the GTN instructions as closely as I could, as far as I understand them. Please excuse any mistakes in using GitHub. The tutorial has also been reviewed by a student.
Key changes:
Checked:
Help:
TODO:
Best,
JTW