Enable longer reads by default for longer insert protocols #54
Interesting! ... and good to know about the protocol. The problem, however, is that this will worsen the accuracy of the WF in the common ARTIC primer case. A length of 2000 nts would definitely be indicative of a chimeric read. The current defaults are taken from https://artic.network/ncov-2019/ncov2019-bioinformatics-sop.html, specifically the Read Filtering section (sorry, no permalink to the chapter available). What would be cool to do in the WF would be to analyze the primer scheme and calculate the length limits from it (as suggested in that Read Filtering section). If you have time to think about an implementation for this, it would be a most welcome improvement. Otherwise, the only thing I think should/could be done here is to explain in the WF param help what reasonable defaults are (you could just copy/paste that from the ARTIC page I guess, or come up with your own rule).
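A minimal sketch of the primer-scheme idea, not an implementation from this repository. It assumes an ARTIC-style primer BED file (tab-separated, primer names such as `nCoV-2019_1_LEFT` and `nCoV-2019_1_RIGHT`, optional `_alt` suffixes) and takes the shortest amplicon as the lower limit and the longest amplicon plus a padding as the upper limit. The function names and the 200 nt default padding are assumptions modeled on the ARTIC rule of thumb and should be checked against the SOP.

```python
import re
from collections import defaultdict

# Assumed ARTIC-style primer naming: <scheme>_<amplicon number>_<LEFT|RIGHT>[_suffix]
PRIMER_NAME = re.compile(r"^.+_(?P<num>\d+)_(?P<side>LEFT|RIGHT)(?:_.*)?$")


def amplicon_lengths(bed_lines):
    """Return {amplicon number: length}, primers included, from primer BED lines."""
    starts = defaultdict(list)
    ends = defaultdict(list)
    for line in bed_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        start, end, name = int(fields[1]), int(fields[2]), fields[3]
        match = PRIMER_NAME.match(name)
        if match is None:
            continue  # skip names that do not follow the assumed pattern
        num = int(match.group("num"))
        if match.group("side") == "LEFT":
            starts[num].append(start)
        else:
            ends[num].append(end)
    # With alternative primers, use the outermost coordinates of each amplicon.
    return {n: max(ends[n]) - min(starts[n]) for n in starts if n in ends}


def length_limits(bed_lines, max_padding=200):
    """Hypothetical helper: (min_length, max_length) for read filtering."""
    lengths = list(amplicon_lengths(bed_lines).values())
    if not lengths:
        raise ValueError("no complete LEFT/RIGHT primer pairs found")
    return min(lengths), max(lengths) + max_padding
```

For a longer-insert protocol the same calculation would yield higher limits automatically, so the workflow would not need one fixed default for every primer scheme.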
Chimeras are not very toxic since we have a reference genome to help here. I compared 700 vs. 2000 max length on some real ARTIC ONT datasets. A max length of 2000 filters out 148 reads vs. ~11k reads with the 700 default. Coverage with upper limit 2000: [coverage plot]. Otherwise the consensus sequences are identical. I think it's wise to increase this default for all protocols. [Plots: max 2000 data, max 700 data]
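A comparison like the one above can be reproduced by counting how many reads each length window keeps. The sketch below is an illustration only: it assumes uncompressed FASTQ with exactly four lines per record, and the function names are hypothetical.

```python
from itertools import islice


def read_lengths(fastq_lines):
    """Yield the sequence length of each record in 4-line FASTQ input."""
    lines = iter(fastq_lines)
    while True:
        record = list(islice(lines, 4))
        if len(record) < 4:
            break  # end of input (or a truncated final record)
        yield len(record[1].strip())


def count_filtered(lengths, min_len, max_len):
    """Count reads kept and discarded by an inclusive length window."""
    counts = {"kept": 0, "too_short": 0, "too_long": 0}
    for n in lengths:
        if n < min_len:
            counts["too_short"] += 1
        elif n > max_len:
            counts["too_long"] += 1
        else:
            counts["kept"] += 1
    return counts
```

Running `count_filtered` once with `max_len=700` and once with `max_len=2000` on the same reads gives the number of reads each setting discards.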
Hmm, I need to think about this and discuss it with a few people. I do understand that a higher upper threshold leads to better coverage, but I'm less sure that this never decreases variant detection/consensus accuracy.
I could imagine some mis-clipping at chimera boundaries, but I think these effects will be eliminated by ivar trim.
@@ -103,7 +103,7 @@
 "y": 710.5
 },
 "tool_id": null,
-"tool_state": "{\"default\": 700, \"parameter_type\": \"integer\", \"optional\": true}",
+"tool_state": "{\"default\": 2000, \"parameter_type\": \"integer\", \"optional\": true}",
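In the diff above, `tool_state` is itself a JSON document stored as a string inside the workflow JSON, which is why its quotes are escaped. A minimal sketch of changing the default programmatically, assuming a step dictionary shaped like the fragment shown (the helper name is hypothetical):

```python
import json


def set_parameter_default(step, new_default):
    """Return a copy of a workflow step with the default in tool_state changed."""
    state = json.loads(step["tool_state"])  # decode the embedded JSON string
    state["default"] = new_default
    updated = dict(step)
    updated["tool_state"] = json.dumps(state)  # re-encode as a string
    return updated


step = {
    "tool_id": None,
    "tool_state": '{"default": 700, "parameter_type": "integer", "optional": true}',
}
new_step = set_parameter_default(step, 2000)
```

The original `step` is left unchanged, so both variants can be kept side by side for comparison.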
I think if there is some benefit in doing this, we should probably make this a workflow parameter?
I'm fine with a workflow parameter, but I think it's good to also set the default based on this data (or generate more data?)
No description provided.