Unclear effect of the protein2genome model #22

Open
cmayer opened this issue Aug 12, 2021 · 1 comment

cmayer commented Aug 12, 2021

The manual says:
"protein2genome
This model allows alignment of a protein sequence to genomic DNA. This is similar to the protein2dna model, with the addition of modelling of introns and intron phases. This model is simliar to those used by genewise."

I could not identify any difference between the protein2genome and protein2dna models.

I was wondering which model to use for data that should contain mostly coding sequence but could also contain introns, UTRs and regions beyond the genes, e.g. hybrid enrichment data where the bait region lies within the genes but the sequences can extend beyond the coding region.
Here, modeling the introns could help in principle.
As far as I understand the manual, protein2genome should be favoured for the described scenario. How are introns "modeled" in the protein2dna and protein2genome cases?

@hyphaltip
Contributor

I believe protein2genome incorporates a model with intron states, while protein2dna only models gaps and frameshifts in a protein-to-DNA alignment.

https://github.com/nathanweeks/exonerate/blob/master/doc/man/man1/exonerate.1

protein2dna
This model compares a protein sequence to a DNA sequence,
incorporating all the appropriate gaps and frameshifts.

protein2dna:bestfit
This is a bestfit version of the protein2dna model,
with which the entire protein is included in the alignment.
It is currently only available when using exhaustive alignment.

protein2genome
This model allows alignment of a protein sequence to genomic
DNA. This is similar to the protein2dna model,
with the addition of modelling of introns and intron phases.
This model is simliar to those used by genewise.
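
One way to check whether the two models really behave differently on a given data set is to run both and look for intron-related operations in the "vulgar" output lines. The following is only a minimal sketch, assuming the vulgar labels described in the exonerate man page (`M` match, `G` gap, `F` frameshift, `5`/`3` splice sites, `I` intron, `S` split codon). The file names, the helper names and the example vulgar line are made up for illustration and are not real exonerate output.

```python
# Sketch: build comparable exonerate command lines for both models and
# count intron operations in a "vulgar" output line.

def exonerate_command(model, query="proteins.fa", target="target.fa"):
    """Return an exonerate argument list; the file names are placeholders."""
    return [
        "exonerate",
        "--model", model,
        "--query", query,
        "--target", target,
        "--showalignment", "no",
        "--showvulgar", "yes",
    ]


def parse_vulgar(line):
    """Split a vulgar line into 9 header fields and (label, qlen, tlen) triples."""
    fields = line.split()
    if not fields or fields[0] != "vulgar:":
        raise ValueError("not a vulgar line")
    header, rest = fields[1:10], fields[10:]
    if len(header) != 9 or len(rest) % 3 != 0:
        raise ValueError("malformed vulgar line")
    triples = [
        (rest[i], int(rest[i + 1]), int(rest[i + 2]))
        for i in range(0, len(rest), 3)
    ]
    return header, triples


def intron_summary(line):
    """Return (number of introns, target bases in splice sites plus introns)."""
    _, triples = parse_vulgar(line)
    n_introns = sum(1 for label, _, _ in triples if label == "I")
    intron_bases = sum(t for label, _, t in triples if label in {"5", "3", "I"})
    return n_introns, intron_bases


# Made-up example line, not real exonerate output.
EXAMPLE = (
    "vulgar: prot1 0 50 . contig1 100 350 + 200 "
    "M 30 90 5 0 2 I 0 96 3 0 2 M 20 60"
)

print(exonerate_command("protein2genome"))
print(intron_summary(EXAMPLE))  # (1, 100)
```

If the target really contains introns, the expectation under this reading of the manual is that protein2genome output includes `5`, `I` and `3` operations, whereas protein2dna can only describe the same region with gaps and frameshifts or by ending the local alignment early.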
