computed identifiers using pydantic model serializers #342
@theferrit32 @korikuzma @larrybabb @andreasprlic @ehclark please note this in-progress refactor of the serialization code. It is close to ready for review; the primary thing I still want to address is checking the translator extras test suite, after which I plan to hand this off for review. I don't plan to address the enref/deref code refactor as part of this work, so I have disabled the tests for those methods. @theferrit32 I tagged you under "Assigned" here in case those are important to you and you want to address them as part of this PR.
Working on updating VCF unit tests I ran across this. This looks problematic to me, but wanted to run it by others before I dig in too deep.
It seems to me that the sequence returned by This is the reference sequence starting at position 289450
This is the sequence with the insertion as defined by the HGVS expression
This is the sequence derived from the RLE
Another example:
Great catch, agreed that this needs to be addressed. I don't think we accounted for this case (ambiguous insertion of non-reference-derived sequence) in our RLE normalization logic. I'll take a look at this tomorrow.
… derived from the reference sequence
I went ahead and implemented a change so that LSE is now used for ambiguous insertion of non-reference-derived bases. I could not think of a better method than just running the derivation logic and comparing. It's not great from a performance standpoint, so if you have a better idea @ahwagner please go ahead and improve on things.
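For anyone following along, here is a rough sketch of the "derive and compare" check described above. It is not the vrs-python implementation: the class and function names (`derive_from_reference`, `choose_state`) are made up for illustration, and it assumes the allele has already been fully justified to a reference span `[start, end)` with alternate sequence `alt`. The idea is to rebuild the alternate sequence by cycling a reference-derived repeat subunit, and fall back to a literal sequence if the result does not match.

```python
from dataclasses import dataclass
from itertools import cycle, islice


@dataclass(frozen=True)
class LiteralSequenceExpression:
    # Hypothetical stand-in for an LSE: stores the sequence verbatim.
    sequence: str


@dataclass(frozen=True)
class ReferenceLengthExpression:
    # Hypothetical stand-in for an RLE: the sequence is implied by the reference.
    length: int
    repeat_subunit_length: int


def derive_from_reference(ref_seq: str, start: int, subunit_len: int, length: int) -> str:
    """Cycle the reference subunit beginning at `start` until `length` bases are produced."""
    subunit = ref_seq[start:start + subunit_len]
    return "".join(islice(cycle(subunit), length))


def choose_state(ref_seq: str, start: int, end: int, alt: str):
    """Pick RLE only when `alt` can be reproduced from the reference; otherwise use LSE."""
    ref_len = end - start
    subunit_len = abs(len(alt) - ref_len)
    if subunit_len == 0:
        # Same-length substitution: nothing to derive, keep the literal sequence.
        return LiteralSequenceExpression(alt)
    derived = derive_from_reference(ref_seq, start, subunit_len, len(alt))
    if derived == alt:
        return ReferenceLengthExpression(len(alt), subunit_len)
    # Ambiguous insertion of novel bases: the RLE would describe the wrong sequence.
    return LiteralSequenceExpression(alt)


# Reference-derived repeat expansion: a CA insertion into GCACACAT.
print(choose_state("GCACACAT", 1, 7, "CACACACA"))
# Novel ATG insertion into GCATT that can shift by two bases (span "AT", alt "ATGAT").
print(choose_state("GCATT", 2, 4, "ATGAT"))
```

In the second call the cycled reference subunit is `ATT`, which yields `ATTAT` rather than `ATGAT`, so the sketch falls back to a literal sequence. The cost is one extra derivation per ambiguous allele, which is the performance concern mentioned above.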
…annot be derived from the reference sequence" This reverts commit f407fd0.
This is ready for hand-off. I updated the VRS normalization algorithm (ga4gh/vrs@7871872) to account for ambiguous novel sequence insertions. Most tests are passing, though there are some failures in the @theferrit32 and @korikuzma please discuss and assign investigation of those test failures, after which this PR is ready for review. |
@theferrit32 I'm fine wrapping this up if you don't have time / want to. Just let me know!
I was overthinking this and want to make another edit. Converting to draft for a bit while I work on it.
Alright, ball is back in your court @theferrit32 and @korikuzma. Implemented, up-to-date algo is here. Still need someone to look at the test_vcf_annotation failures.
+1 @ahwagner Should we wait until the test_vcf_annotation failures are corrected before merging? If not, I'll merge this PR.
@larrybabb Correct. Wait to merge until tests are passing. We should add this in our branch protection rules.
@korikuzma I just added branch protections to require status pass for all tests.
@ahwagner can you remind me why we are disabling genotype?
@ahwagner tests are passing. In the test files I changed, chr19-54220999-A-A listed `C` as the actual ref seq, but it should have been `T`.
+1 Thanks @korikuzma
@ehclark sorry for the delays. This is the version of vrs-python we will be moving forward with. We assume there will be no more foreseeable changes to the digest for 2.0 (but there are no guarantees of course).
@larrybabb @ehclark we don't foresee any changes to Allele or SequenceLocation digests. However, Haplotype, Adjacency, and other structures under discussion for the SV work are still likely to change.
WIP: addresses #341, #335, #334. Uses the pydantic `.model_dump_json()` call to serialize for computed digests, and adds some logic to cache digests and computed identifiers in appropriate object fields when calculated.
Adds tests to validate pydantic models match data class and field names from VRS Schema.
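As a rough illustration of the "serialize, digest, cache" flow described above, here is a stdlib-only sketch. It deliberately swaps pydantic's `.model_dump_json()` for `json.dumps` with sorted keys so that it runs without dependencies, and the class `SequenceLocationSketch`, its fields, and its serialization are hypothetical simplifications rather than the actual VRS models or serialization rules. The `sha512t24u` helper follows the GA4GH truncated-digest convention (SHA-512, first 24 bytes, base64url).

```python
import base64
import hashlib
import json
from dataclasses import dataclass
from typing import Optional


def sha512t24u(blob: bytes) -> str:
    """GA4GH-style truncated digest: SHA-512, first 24 bytes, base64url-encoded."""
    digest = hashlib.sha512(blob).digest()
    return base64.urlsafe_b64encode(digest[:24]).decode("ascii")


@dataclass
class SequenceLocationSketch:
    # Hypothetical, simplified model; field names are illustrative only.
    sequence_accession: str
    start: int
    end: int
    digest: Optional[str] = None  # cache slot, filled on first computation

    def canonical_json(self) -> bytes:
        """Serialize identifying fields only: sorted keys, no whitespace, no digest."""
        payload = {
            "end": self.end,
            "sequence_accession": self.sequence_accession,
            "start": self.start,
            "type": "SequenceLocation",
        }
        return json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")

    def compute_digest(self) -> str:
        """Return the cached digest, computing and storing it if absent."""
        if self.digest is None:
            self.digest = sha512t24u(self.canonical_json())
        return self.digest

    def compute_identifier(self) -> str:
        """Build a CURIE-style identifier from the (possibly cached) digest."""
        return f"ga4gh:SL.{self.compute_digest()}"


loc = SequenceLocationSketch("SQ.example", 10, 20)
print(loc.compute_identifier())
print(loc.digest)  # populated by the call above
```

Keeping the cached `digest` out of the serialized payload matters: otherwise the digest would change the moment it was stored. Note that a cache like this goes stale if identifying fields are mutated afterwards, so a real implementation would need to invalidate or recompute it.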
Disables Genotypes.
Remaining work: