Add new tutorial for deciphering viral populations using SNV and baculovirus isolates (Variant analysis) #5700

Open
wants to merge 9 commits into
base: main

@wennj wennj commented Jan 17, 2025

Dear GTN Team,

I am a new contributor and have created a tutorial on analysing virus populations using SNVs, with large dsDNA viruses as the example. The workflow is based on published results and follows a scientific approach. I have followed the GTN instructions as closely as I could, as far as I understand them. Please excuse any mistakes in using GitHub. The tutorial has also been reviewed by a student.

Key changes:

  • Added a new step-by-step tutorial (variant-analysis/baculovirus-isolate-variation) for analysing viral populations, using baculovirus isolates as the example.
  • CONTRIBUTORS.yaml was changed.

Checked:

  • Images can be re-hosted.

Help:

  • One image in the introduction is linked to ICTV.global. Is this allowed?

TODO:

  • The bcftools mpileup / bcftools call tools use the DPR annotation (deprecated but functional). This needs to be changed to AD in the future.
  • Awaiting feedback and suggestions for improvement.
  • Refine the tutorial based on reviewer comments.
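
For context on the DPR to AD item, here is a minimal Python sketch of how a relative allele frequency could be derived from an AD value. It assumes AD holds comma-separated integer read depths with the reference allele first, as in the VCF specification. The function name `alt_allele_frequencies` is made up for this illustration, and how the tutorial workflow actually consumes the annotation is not shown in this thread.

```python
def alt_allele_frequencies(ad_field: str) -> list[float]:
    """Return the relative frequency of each ALT allele from one AD value.

    Assumption: ad_field looks like "60,40", i.e. per-allele read depths as
    comma-separated integers with the reference allele first. Missing values
    (".") are not handled in this sketch.
    """
    depths = [int(value) for value in ad_field.split(",")]
    total = sum(depths)
    if total == 0:
        # No informative reads at this position: frequencies are undefined.
        return [float("nan")] * (len(depths) - 1)
    # Skip the first entry (reference allele) and normalise by total depth.
    return [depth / total for depth in depths[1:]]


print(alt_allele_frequencies("60,40"))  # [0.4]
```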

Best,
JTW

Member

@bgruening bgruening left a comment

Very cool! Welcome @wennj!

I have added a batch of comments; if you agree with them, you can accept them from inside the GitHub interface.

Please do not change this file.

Comment on lines +15 to +16
- Transform the output (VCF file) to a readable table format using tools available at Galaxy only.
- Interpret the SNV data to analysis the intra-isolate variability of baculovirus isolates or samples.

Suggested change
- Transform the output (VCF file) to a readable table format using tools available at Galaxy only.
- Interpret the SNV data to analysis the intra-isolate variability of baculovirus isolates or samples.
- Transform the output (VCF file) to a readable table format
- Interpret the SNV data to analyse the intra-isolate variability of baculovirus isolates or samples.

Not sure if the transform step is really an objective.

SRA: Sequence Read Archive
key_points:
- Baculovirus populations are heterogeneous and show genetic variability.
- Some SNV positions occur only in certain isolates and are therefore specific for that isolate.

Suggested change
- Some SNV positions occur only in certain isolates and are therefore specific for that isolate.
- Some SNV positions occur only in certain isolates and are therefore specific to that isolate.

> The genome of the isolate Autographa californica multiple nucleopolyhedrovirus isolate C6 (AcMNPV-C6)
> (family *Baculoviridae*, genus *Alphabaculovirus*, species *Alphabaculovirus aucalifornicae*) is one of the
> best-studied baculovirus genomes and is 133,894 bp long ({% cite Ayres1994 %}).
> It is the first fully sequenced genon of baculoviruses and today hundreds of genomes are fully sequenced

Suggested change
> It is the first fully sequenced genon of baculoviruses and today hundreds of genomes are fully sequenced
> It is the first fully sequenced genome of baculoviruses and today hundreds of genomes are fully sequenced

Comment on lines +60 to +68
The genome size makes it difficult to study genetic variation within a baculovirus population,
especially since the most commonly used sequencing technique (Illumina sequencing)
generates only short reads and requires genome assembly. Since genome assembly generates
a consensus sequence that reflects the majority of a sequenced virus population,
occuring genetic variation is masked. Tools for haplotype-sensitive assembly are
available but so far have been establisehd for viruses with a relatively short genome
but not for baculoviruses or other large dsDNA viruses (REFERENCES). Using Nanopore sequencing,
it is possible to sequence signficiant fragments of baculovirus genomes to determine major
haplotypes ({% cite Wennmann2024 %}).

Suggested change
The genome size makes it difficult to study genetic variation within a baculovirus population,
especially since the most commonly used sequencing technique (Illumina sequencing)
generates only short reads and requires genome assembly. Since genome assembly generates
a consensus sequence that reflects the majority of a sequenced virus population,
occuring genetic variation is masked. Tools for haplotype-sensitive assembly are
available but so far have been establisehd for viruses with a relatively short genome
but not for baculoviruses or other large dsDNA viruses (REFERENCES). Using Nanopore sequencing,
it is possible to sequence signficiant fragments of baculovirus genomes to determine major
haplotypes ({% cite Wennmann2024 %}).
The genome size makes it difficult to study genetic variation within a baculovirus population,
especially since the most commonly used sequencing technique (Illumina sequencing)
generates only short reads and requires genome assembly. Since genome assembly generates
a consensus sequence that reflects the majority of a sequenced virus population,
occurring genetic variation is masked. Tools for haplotype-sensitive assembly are
available but so far have been established for viruses with a relatively short genome
but not for baculoviruses or other large dsDNA viruses (REFERENCES). Using Nanopore sequencing,
it is possible to sequence significant fragments of baculovirus genomes to determine major
haplotypes ({% cite Wennmann2024 %}).
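
The point that a consensus sequence masks minority variants can be illustrated with a minimal Python sketch. The pileup below is entirely hypothetical (three positions, ten reads each) and the helper `consensus_base` is made up for this illustration; it is not part of the tutorial workflow.

```python
from collections import Counter


def consensus_base(column: str) -> str:
    """Return the most frequent base in one pileup column."""
    return Counter(column).most_common(1)[0][0]


# Hypothetical pileup: each string holds the bases that ten reads show
# at one genome position.
pileup = ["AAAAAAAAAA", "CCCCCCCTTT", "GGGGGGGGGG"]

# The consensus keeps only the majority base per position.
consensus = "".join(consensus_base(column) for column in pileup)

# Minority bases that the consensus hides: (position, base, frequency).
minor = [
    (pos, base, count / len(column))
    for pos, column in enumerate(pileup, start=1)
    for base, count in Counter(column).items()
    if base != consensus[pos - 1]
]

print(consensus)  # ACG
print(minor)      # [(2, 'T', 0.3)]
```

The T at position 2 is carried by 30% of the reads but leaves no trace in the consensus, which is why the SNV frequencies have to be read from the mapped reads rather than from the assembly.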


## Replace SRA Names with Virus Abbreviations

One thing that stands out are the SAMPLE names, which were taken automatically from the NCBI SRA datasets. Since it is difficult to remember which virus isolate is behind which SRA number, we can replace the accession numbers with proper names. This makes the table even easier to read and later we can use the information directly to display the SNV positions.

Suggested change
One thing that stands out are the SAMPLE names, which were taken automatically from the NCBI SRA datasets. Since it is difficult to remember which virus isolate is behind which SRA number, we can replace the accession numbers with proper names. This makes the table even easier to read and later we can use the information directly to display the SNV positions.
One thing that stands out is the SAMPLE names, which were taken automatically from the NCBI SRA datasets. Since it is difficult to remember which virus isolate is behind which SRA number, we can replace the accession numbers with proper names. This makes the table even easier to read and later we can use the information directly to display the SNV positions.
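
Outside of Galaxy, the renaming step described here could be sketched in plain Python as a dictionary lookup. The accession numbers below are placeholders, their assignment to isolates is invented for illustration, and `rename_samples` is a hypothetical helper; only the isolate abbreviations appear in the tutorial text.

```python
# Hypothetical mapping: placeholder accessions, not the real SRA numbers.
SRA_TO_ISOLATE = {
    "SRR0000001": "CpGV-E2",
    "SRR0000002": "CpGV-S",
    "SRR0000003": "CpGV-V15",
}


def rename_samples(rows, mapping):
    """Return copies of the rows with SAMPLE accessions replaced by names.

    Accessions missing from the mapping are left unchanged, so an
    incomplete mapping never silently drops or mislabels a row.
    """
    return [
        {**row, "SAMPLE": mapping.get(row["SAMPLE"], row["SAMPLE"])}
        for row in rows
    ]


table = [
    {"SAMPLE": "SRR0000001", "POS": 100},
    {"SAMPLE": "SRR9999999", "POS": 100},  # not in the mapping
]
renamed = rename_samples(table, SRA_TO_ISOLATE)
print(renamed)
```

Returning new rows instead of editing in place keeps the original accession numbers available in case they are needed later.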


# Visualizing SNV Variability Across Isolates

The final table should be much easier to read and contain all the information we need to perform an analysis of the intra-isolate specific variability. Here is an short overview about the final table (selected columns) and its first three SNV positions:

Suggested change
The final table should be much easier to read and contain all the information we need to perform an analysis of the intra-isolate specific variability. Here is an short overview about the final table (selected columns) and its first three SNV positions:
The final table should be much easier to read and contain all the information we need to perform an analysis of the intra-isolate specific variability. Here is a short overview of the final table (selected columns) and its first three SNV positions:


![SNV specificity plot for CpGV-V15](../../images/baculovirus-isolate-variation/SNV_specificity_plot_CpGV-V15.png "Deciphering the composition of the isolate CpGV-V15 by using SNV specificities.")

Most of the SNV positions specific for CpGV-E2 have a frequency around 0.5 (50%) (Fig. 5, SNV specificity: CpGV-E2). Looking at the positions specific for CpGV-S, we also see a SNV cloud at around 0.5 (50 %) (Fig. 5, SNV specificity: CpGV-S). From these two observations, we can conclude that CpGV-E2 and CpGV-S occur at a ratio of 50% each. As a control, we look at the markers (SNV positions) that are specific for both CpGV-E2 and CpGV-S (Fig. 5, SNV specificity: CpGV-E2 + CpGV-S). Here we see that most SNV positions have a relative frequency of ~1 (100%). This also makes sense, because if CpGV-V15 is a mixture of CpGV-E2 and CpGV-S, then the markers that indicate both isolates should be at 100%, which is the case. Based on the SNV analysis, it can be concluded that CpGV-V15 is mainly a mixture of CpGV-S and CpGV-E2.

Suggested change
Most of the SNV positions specific for CpGV-E2 have a frequency around 0.5 (50%) (Fig. 5, SNV specificity: CpGV-E2). Looking at the positions specific for CpGV-S, we also see a SNV cloud at around 0.5 (50 %) (Fig. 5, SNV specificity: CpGV-S). From these two observations, we can conclude that CpGV-E2 and CpGV-S occur at a ratio of 50% each. As a control, we look at the markers (SNV positions) that are specific for both CpGV-E2 and CpGV-S (Fig. 5, SNV specificity: CpGV-E2 + CpGV-S). Here we see that most SNV positions have a relative frequency of ~1 (100%). This also makes sense, because if CpGV-V15 is a mixture of CpGV-E2 and CpGV-S, then the markers that indicate both isolates should be at 100%, which is the case. Based on the SNV analysis, it can be concluded that CpGV-V15 is mainly a mixture of CpGV-S and CpGV-E2.
Most of the SNV positions specific for CpGV-E2 have a frequency of around 0.5 (50%) (Fig. 5, SNV specificity: CpGV-E2). Looking at the positions specific to CpGV-S, we also see a SNV cloud at around 0.5 (50 %) (Fig. 5, SNV specificity: CpGV-S). From these two observations, we can conclude that CpGV-E2 and CpGV-S occur at a ratio of 50% each. As a control, we look at the markers (SNV positions) that are specific for both CpGV-E2 and CpGV-S (Fig. 5, SNV specificity: CpGV-E2 + CpGV-S). Here we see that most SNV positions have a relative frequency of ~1 (100%). This also makes sense, because if CpGV-V15 is a mixture of CpGV-E2 and CpGV-S, then the markers that indicate both isolates should be at 100%, which is the case. Based on the SNV analysis, it can be concluded that CpGV-V15 is mainly a mixture of CpGV-S and CpGV-E2.
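
The reasoning in this paragraph can be turned into a small consistency check. This is a minimal Python sketch under stated assumptions: the frequency values are invented to mimic the pattern described for Fig. 5 and are not taken from the tutorial data, and the 0.05 tolerance is an arbitrary choice for illustration.

```python
from statistics import median

# Hypothetical relative SNV frequencies in CpGV-V15, grouped by the
# isolate(s) each SNV position is specific to.
frequencies = {
    "CpGV-E2": [0.48, 0.52, 0.50, 0.47],
    "CpGV-S": [0.51, 0.49, 0.53, 0.50],
    "CpGV-E2 + CpGV-S": [0.99, 1.00, 0.98, 1.00],
}


def summarize(freqs_by_specificity):
    """Return the median frequency for each specificity group."""
    return {
        label: median(values)
        for label, values in freqs_by_specificity.items()
    }


summary = summarize(frequencies)

# If the sample is a mixture of the two isolates, the share estimated from
# each isolate's own markers should add up to the frequency of the markers
# that both isolates share.
combined = summary["CpGV-E2"] + summary["CpGV-S"]
shared = summary["CpGV-E2 + CpGV-S"]
is_consistent = abs(combined - shared) < 0.05  # arbitrary tolerance

print(summary)
print(is_consistent)
```

The median is used here rather than the mean so that a few outlying SNV positions do not shift the estimated share of each isolate.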

- Baculovirus populations are heterogeneous and show genetic variability.
- Some SNV positions occur only in certain isolates and are therefore specific for that isolate.
- SNV specificities can be used as markers to identify isolates and determine mixtures.
contributors:

You can extend this for example by:

contributions:
  authorship:
    - wennj
  editing:
    - your-student

layout: tutorial_hands_on

title: Deciphering Virus Populations - Single Nucleotide Variants (SNVs) and Specificities in Baculovirus Isolates
subtopic: ''

Suggested change
subtopic: ''
subtopic: 'one-health'

Maybe?

2 participants