Feature/podp_downloaded_bgc_data #316
Open: liannette wants to merge 50 commits into NPLinker:dev from liannette:feature/podp_check_downloaded_bgc_data
Diff: +1,097 −264
Commits (50, all by liannette):

- `881e3eb` Merge pull request #4 from NPLinker/dev
- `2430868` Refactor: Simplify Path handling
- `1f07246` Refactor: Improve extract path handling to ensure that non-empty dirs…
- `8d7238c` Refactor: Extract file cleanup logic into a separate function for bet…
- `d5e64eb` Refactor: Move extract_path preparation logic into a seperate functio…
- `66f13d6` Refactor: Separate genome assembly resolution and antiSMASH data retr…
- `706f841` Refactor: Improve genome ID handling and logging
- `b25506e` Refactor: Move logging for antiSMASH data retrieval errors and succes…
- `2ed8ac6` simplify comment
- `a1e4962` Enhance logging messages in antiSMASH data retrieval
- `e8dd391` test: adapt test to changed logging info message
- `ea159ca` Feat: Add antiSMASH API functionality
- `a972a70` Feat: Add antiSMASH API functionality
- `0569976` Merge branch 'feature/antismash-jobs' of https://github.com/liannette…
- `f3a159f` fix: improve logging message for start of antiSMASH API process
- `da4da3c` docs: improve docstring for download_and_extract_ncbi_genome function
- `af094e4` fix: update logging messages for antiSMASH data retrieval failures
- `99d99a2` add logging after antiSMASH job submission
- `1c83934` refactor: rename refseq_id to genome_assembly_acc
- `496db38` improve genome download process with validation and retry logic
- `cc6573c` test: add unit tests for download_and_extract_ncbi_genome function
- `51e4817` refactor: rename verify_ncbi_dataset_md5_sums to _verify_ncbi_dataset…
- `6b564ea` refactor: move _verify_ncbi_dataset_md5_sums function to a new locati…
- `7570852` feat: handle already download antiSMASH results
- `9881254` fix mistake in docstring
- `0e1ec54` fix: update return type of submit_antismash_job to str
- `f53f1a7` fix: update return type of download_and_extract_ncbi_genome to Path
- `5e5b2bd` update submit_antismash_job to return job ID as string and improve er…
- `909ca63` chore: add types-requests to development dependencies
- `252a177` fix: update return type of _verify_ncbi_dataset_md5_sums to None and …
- `81a0102` fix: update _verify_ncbi_dataset_md5_sums to accept str or PathLike f…
- `50890a7` fix: convert extract_path to Path in _prepare_extract_path for consis…
- `b784476` chore: update typing dependencies in format-typing-check workflow
- `bdebd96` fix: clarify return value documentation for submit_antismash_job func…
- `7334c90` fix: enable postponed evaluation of type annotations in ncbi_download…
- `2f0d074` feat: add genome accession resolver for NCBI assembly accessions
- `5d13fea` test: add unit tests for genome accession resolver functions
- `5a201cc` fix: ensure no mypy typing errors
- `a8ed146` feat: use the new genome ID resolver
- `4ba85d0` refactor: change resolved_refseq_id to resolved_id
- `398d700` fix: skip antiSMASH DB retrieval for non refseq ids
- `a2e6071` refactor: change resolve_attempted to failed_previously in GenomeStatus
- `8387650` feat: save updated genome status to json after each genome
- `3779815` fix: assert failed_previously is False in caching test
- `a579945` refactor: remove unneccessary if statement
- `092c94a` update type hint for genome_id_data to comply with mypy
- `47f5618` check if BGC data already downloaded
- `91020c9` fix: correct spelling of antiSMASH in logging and comments
- `bf6b6c5` fix: add bgc_path to genome status to ensure correct extraction path
- `f197fef` refactor: use original genome ID from the genome status object for co…
`@@ -1,16 +1,28 @@`

```python
from .antismash_downloader import download_and_extract_antismash_data
from .antismash_api_client import antismash_job_is_done
from .antismash_api_client import submit_antismash_job
from .antismash_downloader import download_and_extract_from_antismash_api
from .antismash_downloader import download_and_extract_from_antismash_db
from .antismash_downloader import extract_antismash_data
from .antismash_loader import AntismashBGCLoader
from .antismash_loader import parse_bgc_genbank
from .genome_accession_resolver import resolve_genome_accession
from .ncbi_downloader import download_and_extract_ncbi_genome
from .podp_antismash_downloader import GenomeStatus
from .podp_antismash_downloader import get_best_available_genome_id
from .podp_antismash_downloader import podp_download_and_extract_antismash_data


__all__ = [
    "download_and_extract_antismash_data",
    "extract_antismash_data",
    "resolve_genome_accession",
    "download_and_extract_from_antismash_api",
    "download_and_extract_from_antismash_db",
    "AntismashBGCLoader",
    "parse_bgc_genbank",
    "GenomeStatus",
    "get_best_available_genome_id",
    "podp_download_and_extract_antismash_data",
    "download_and_extract_ncbi_genome",
    "submit_antismash_job",
    "antismash_job_is_done",
]
```
`@@ -0,0 +1,81 @@`

```python
from __future__ import annotations
import logging
from os import PathLike
from pathlib import Path
import requests


logger = logging.getLogger(__name__)


def submit_antismash_job(genbank_filepath: str | PathLike) -> str:
    """Submits an antiSMASH job using the provided GenBank file.

    This function sends a GenBank file to the antiSMASH API
    and retrieves the job ID if the submission is successful.

    Args:
        genbank_filepath (str | PathLike): The path to the GenBank file to be submitted.

    Returns:
        str: The job ID of the submitted antiSMASH job.

    Raises:
        requests.exceptions.RequestException: If there is an issue with the HTTP request.
        RuntimeError: If the API response does not contain a job ID.
    """
    url = "https://antismash.secondarymetabolites.org/api/v1.0/submit"
    genbank_filepath = Path(genbank_filepath)

    with open(genbank_filepath, "rb") as file:
        files = {"seq": file}
        response = requests.post(url, files=files)
        response.raise_for_status()  # Raise an exception for HTTP errors

    data = response.json()
    if "id" not in data:
        raise RuntimeError("No antiSMASH job ID returned")
    return str(data["id"])


def antismash_job_is_done(job_id: str) -> bool:
    """Determines if the antiSMASH job has completed by checking its status.

    This function queries the antiSMASH API to retrieve the current state
    of the job and determines whether it has finished successfully, is still
    in progress, or has encountered an error.

    Args:
        job_id (str): The unique identifier of the antiSMASH job.

    Returns:
        bool: True if the job is completed successfully, False if it is still
            running or queued.

    Raises:
        RuntimeError: If the job has failed or if the API response indicates an error.
        ValueError: If the job state is missing or an unexpected state is encountered
            in the API response.
        requests.exceptions.HTTPError: If an HTTP error occurs during the API request.
    """
    url = f"https://antismash.secondarymetabolites.org/api/v1.0/status/{job_id}"

    response = requests.get(url, timeout=10)
    response.raise_for_status()  # Raise exception for HTTP errors
    respose_data = response.json()

    if "state" not in respose_data:
        raise ValueError(f"Job state missing in response for job_id: {job_id}")

    job_state = respose_data["state"]
    if job_state in ("running", "queued"):
        return False
    if job_state == "done":
        return True
    if job_state == "failed":
        job_status = respose_data.get("status", "No error message provided")
        raise RuntimeError(f"AntiSMASH job {job_id} failed with an error: {job_status}")
    else:
        raise ValueError(
            f"Unexpected job state for antismash job ID {job_id}. Job state: {job_state}"
        )
```
Review comment: The variable `respose_data` seems to be misspelled; consider renaming it to `response_data` for clarity.