Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Report position in read for reads in a pileup #260

Closed
wants to merge 2 commits into from

Conversation

DaGaMs
Copy link

@DaGaMs DaGaMs commented Jun 22, 2022

No description provided.

@DaGaMs DaGaMs requested review from alumi and a team as code owners June 22, 2022 14:55
@DaGaMs DaGaMs requested review from athos and removed request for a team June 22, 2022 14:55
Copy link
Member

@alumi alumi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you very much for submitting your PR. Also I would like to apologize for the long delay in replying to your message.
I think your suggestion of storing the position in read in cljam.io.pileup.PileupBase is useful, as there are certainly many cases where we can utilize that information, such as variant calls and filtering. However, on the other hand, I have three concerns and would like to merge the PRs when they have been resolved.

The first is that the existing test lein test :all fails.
The second is the increase in memory due to the addition of fields to PosBase and PileupBase. Since billions of instances of these classes would be created, I'm concerned about the performance degradation associated with the increase in memory consumed. It would be appreciated if we could obtain benchmarks for typical use cases and confirm that the differences are negligible enough to be compared with the current situation.
The third is a mismatch with the serialization format. cljam.io.pileup supports serialization to samtools mpileup format strings, but there is no pos equivalent fields there. Since the type of the pos field is a primitive int, it cannot be assigned nil, so it will continue to have an invalid value when read from a file. I think some action is needed, such as changing the type to java.lang.Integer while accepting the performance degradation, or introducing a type with an inheritance relationship to PileupBase.
In addition, regarding your comment that the position of deletion? in cljam.algo.pileup/resolve-base may be misplaced, please note that deletion? here expresses "whether or not a deletion is included immediately after the base at the position of interest" like the notation A-1N in the samtools mpileup format, and not "whether or not a base at the position of interest is missing". The process of assigning a deletion to the immediately preceding base is done within cljam.io.sam.util.cigar/to-index*. Since the naming of the local binding is ambiguous, as you say, I think it is worth renaming.

I would appreciate your consideration of the above.
Or we can convert this PR into an issue and put it on our team's backlog.

@DaGaMs
Copy link
Author

DaGaMs commented May 30, 2023

I wrote this PR last year while I was toying around with cljam for a test project. I've not used it since, so I won't have time to work on this further. Feel free to move this into an issue rather than merging 🤷‍♂️

@alumi
Copy link
Member

alumi commented May 30, 2023

I just opened #274. Thank you very much again for your interest in our project.

@alumi alumi closed this May 30, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants