Add tool to detect circular sequences #7296
I do not understand why it fails at the "Combine chunked test results" step: https://github.com/galaxyproject/tools-iuc/actions/runs/17918115645?pr=7296
Co-authored-by: Saim Momin <[email protected]>
Can you publish this Python script separately (extra git repo + Bioconda recipe) or re-use existing implementations? I think we should limit publishing software via IUC (except maybe for trivial cases) and restrict it to Galaxy tools. IUC already seems busy with the tool wrappers, and I'm afraid of the additional workload caused by scripts.
| description: Detect circular sequences (e.g. circular contigs) in a FASTA file by k-mer matching | ||
| long_description: | | ||
| Detect circular sequences (e.g. circular contigs) by looking for exact identical k-mer at the two | ||
| ends on a cadre sequence of the sequences prodvide in fasta file. In order to be able |
cadre?
| Args: | ||
| seq (str): sequence to format. | ||
| """ | ||
| return textwrap.wrap(seq, width=60, break_on_hyphens=False) |
This will probably cause problems with headers longer than 60 characters.
Use Biopython for I/O?
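To illustrate the concern, here is a minimal sketch using only the standard library. It assumes the wrapping helper could be handed a header line as well as a sequence, which the excerpt does not confirm; `wrap_with_textwrap`, `wrap_fixed`, and the sample header are hypothetical names and data introduced for the demonstration.

```python
import textwrap


def wrap_with_textwrap(text, width=60):
    # Same call as in the diff excerpt: wraps at whitespace when there is any,
    # and only falls back to hard breaks for words longer than `width`.
    return textwrap.wrap(text, width=width, break_on_hyphens=False)


def wrap_fixed(text, width=60):
    # Plain fixed-width slicing: never looks at whitespace, never drops characters.
    return [text[i:i + width] for i in range(0, len(text), width)]


# A sequence has no whitespace, so both approaches produce identical chunks.
seq = "ACGT" * 40

# A hypothetical header longer than 60 characters that contains spaces.
header = (
    ">contig_1 length=5000 coverage=12.5 "
    "sample=soil_metagenome_replicate_3 assembler=example"
)

wrapped_header = wrap_with_textwrap(header)
```

With this input, `textwrap.wrap` splits the header over several lines and drops the space at the break, so the wrapped pieces no longer join back into the original header, whereas fixed-width slicing round-trips exactly.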
| return textwrap.wrap(seq, width=60, break_on_hyphens=False) | ||
| def one_line_fasta(input_fp, output_fp): |
Biopython instead of this workaround?
| @@ -0,0 +1,58 @@ | |||
| <tool id="detect_circular_sequences" name="Detect circular sequences" version="@TOOL_VERSION@+galaxy@VERSION_SUFFIX@" profile="@PROFILE@"> | |||
| <description>(e.g. circular contigs) in a FASTA file by k-mer matching</description> | |||
I would suggest avoiding the term k-mer. The word has too many implications.
The tool just checks for exact sequence identity of a single sequence.
| Returns: | ||
| : True if circular, False otherwise | ||
| """ | ||
| try: |
This try just removes the traceback that could be useful for debugging.
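A small sketch of the difference, under the assumption that the `try` wraps the comparison and returns `False` on any exception (the excerpt does not show the body). Both function names are hypothetical, and `k` is assumed to be a positive integer.

```python
def ends_match_quiet(seq, k):
    # Pattern the comment objects to: any error is reduced to a one-line
    # message and a False result, so the traceback is lost.
    try:
        return seq[:k] == seq[-k:]
    except Exception as exc:
        print(f"error: {exc}")
        return False


def ends_match(seq, k):
    # No try/except: a bad input raises, and the full traceback
    # points at the failing line.
    return seq[:k] == seq[-k:]
```

Calling `ends_match_quiet(None, 3)` reports the sequence as "not circular", which is indistinguishable from a legitimate negative result, while `ends_match(None, 3)` raises a `TypeError` that shows where the bad value came from.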
| begin (): | ||
| end (): | ||
| """ | ||
| pattern = re.compile(re.escape(begin)) |
Why use a regexp if you can use a simple string search?
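For comparison, a minimal sketch assuming the compiled pattern is only used to locate the first occurrence of `begin` inside `end` (the rest of the function is not shown in the excerpt). The function names are hypothetical.

```python
import re


def find_with_regex(begin, end):
    # Approach from the diff: escape the literal, compile it, then search.
    pattern = re.compile(re.escape(begin))
    match = pattern.search(end)
    return match.start() if match else -1


def find_with_str(begin, end):
    # Plain substring search: same result for a literal pattern.
    return end.find(begin)
```

Because `re.escape` turns `begin` into a pure literal, both functions return the same index, and `str.find` (or `begin in end` when only a yes/no answer is needed) avoids the regex machinery entirely.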
FOR CONTRIBUTOR: