Basic regions support #16
Comments
Agreed. See sgkit-dev/vcf-zarr-spec#21 and sgkit-dev/vcf-zarr-spec#22 for previous thoughts on how we support this efficiently. I hadn't thought of storing a length array; that's an excellent idea. Would that really suffice to answer overlap queries efficiently?
Yes. I would actually store an array of variant end positions, and then use pyranges to efficiently compute the overlap. For ...
Bioframe is another option for overlap queries.
Thanks @tomwhite. I think the first step is to do ... I think we do need to consider what would be an efficient index, one that we could store in a single chunk or a small number of chunks, that would allow us to implement range queries (sgkit-dev/vcf-zarr-spec#21, sgkit-dev/vcf-zarr-spec#23).
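To make the index idea concrete, here is a minimal sketch of one possible shape for it: a pair of small arrays holding, for each variant chunk, the minimum start and maximum end position. This is only an illustration under stated assumptions, not the design agreed in the linked spec issues. The function names, the toy data, and the single-contig, 1-based inclusive coordinates are all hypothetical.

```python
import numpy as np

def build_chunk_index(position, length, chunk_size):
    """Summarise each chunk of variants as (min start, max end).

    Assumes all variants are on one contig and uses 1-based inclusive ends.
    """
    end = position + length - 1
    n_chunks = -(-len(position) // chunk_size)  # ceiling division
    chunk_start = np.empty(n_chunks, dtype=position.dtype)
    chunk_end = np.empty(n_chunks, dtype=position.dtype)
    for i in range(n_chunks):
        sl = slice(i * chunk_size, (i + 1) * chunk_size)
        chunk_start[i] = position[sl].min()
        chunk_end[i] = end[sl].max()
    return chunk_start, chunk_end

def chunks_overlapping(chunk_start, chunk_end, start, end):
    """Indices of chunks that may contain a variant overlapping [start, end]."""
    return np.flatnonzero((chunk_start <= end) & (chunk_end >= start))

# Hypothetical toy data: six variants stored in chunks of two.
position = np.array([100, 150, 200, 300, 400, 500])
length = np.array([1, 60, 1, 5, 1, 1])

chunk_start, chunk_end = build_chunk_index(position, length, chunk_size=2)
hits = chunks_overlapping(chunk_start, chunk_end, 205, 250)
```

The index is conservative: a chunk it returns is only a candidate and still needs a per-variant check, but chunks it excludes never have to be read.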
We should add the equivalent of the bcftools `-r/--regions` option to filter by regions. The work in sgkit-dev/sgkit#658 (which was never merged) could form the basis for the implementation.
Also, it would be simpler to implement `-t/--targets` first, since unlike `-r/--regions` it only needs to check position, and not the variant length. To support the latter we probably need to store the variant length as a separate Zarr array in order to do efficient queries.
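The difference between the two filters can be sketched with plain NumPy. This is an illustration rather than the proposed implementation: the array names, the toy data, and the 1-based inclusive coordinates are assumptions, and the arrays stand in for whatever is actually stored in Zarr.

```python
import numpy as np

# Hypothetical toy data for a single contig: 1-based start positions and
# variant lengths (reference allele length).
variant_position = np.array([100, 150, 200, 300])
variant_length = np.array([1, 60, 1, 5])

def targets_mask(position, start, end):
    """Targets-style filter: keep variants whose position lies in [start, end]."""
    return (position >= start) & (position <= end)

def regions_mask(position, length, start, end):
    """Regions-style filter: keep variants whose span overlaps [start, end]."""
    variant_end = position + length - 1  # inclusive end coordinate
    return (position <= end) & (variant_end >= start)

# Query the interval 180-250 with both filters.
targets = targets_mask(variant_position, 180, 250)
regions = regions_mask(variant_position, variant_length, 180, 250)
```

In this example the variant at position 150 spans 150-209, so the regions-style filter keeps it even though its position lies outside the query interval, while the targets-style filter does not. That is why the second filter needs the length (or end) array.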