Skip to content

Conversation

TheGreatJack
Copy link

Adding softclip and hardclip hadling to output correct reads with those kind of alignments

I was testing the "bam2msa" script, and it was really helpful, specially for the insertion ignoring functionality. But i found some problems regarding softclipped alignments as the program doesn't skip the necessary bases before writing to the sequence record. If you look at the third row of this msa that i made using BioExt v0.21.9 you will see that it doesn't fit quite well. In particular that read had this CIGAR string "22S56M63S" :

Before_zoom_01

I simply modified the CIGAR matching regex so that it also matches "S" and "H" for softclippled or hardclippled bases. The downstream code logic seems to be able to handle this modification without major problems. After said modification the same row in the previous alignment now fits as expected:

After_zoom_01

I also found good fit in other reads that also had softclipping. I understand that these heavily softclipped alignments may be a signal for chimeric reads or alignment errors. But that is something that can be filtered out before using bam2msa(See samclip). A warning to the user may be good. But in case someone wants to include softclipped reads or uses the program without filtering, this would be less of a headache.

I briefly checked for the effects on the function "gapful" and "clip" from misc. And the code logic seems robust to this change. I'm not too familiar with testing. I just directly tested "bam2msa"

Adding softclip and hardclip hadling to output correct reads with these kind of alignments
Hard clipping removed. Hard clipping doesn't include the "clipped" sequence for a bam record. So no need to skip over the query sequence. Adding "H" it in the match would clip things that it shouldn't in hardclipped alignments.
@TheGreatJack
Copy link
Author

Had to commit a second time because matching "H" would cause unnecessary skipping over the query sequence. This is because hardclipped sections of a query sequence arent included in the alignment record.. Silly mistake on my part.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant