feat: Registry v2 artifact searcher #805
base: main
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,332 @@ | ||
| import json | ||
| import re | ||
| from typing import Dict, Optional | ||
|
|
||
| from envgenehelper import logger | ||
|
|
||
| from artifact_searcher.utils.models import AuthConfig, Registry | ||
|
|
||
| try: | ||
| from qubership_pipelines_common_library.v1.maven_client import MavenArtifactSearcher | ||
| except ImportError: | ||
| MavenArtifactSearcher = None | ||
|
Collaborator
Better to raise an error here instead of assigning None, so it is clear that the library is missing. With None, an exception is still raised later, at line 81, but failing early makes the error clearer. |
||
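One way to fail early, as suggested in the comment above, is sketched below. The helper name `require_module` and the module name `hypothetical_missing_library` are illustrative assumptions, not part of this PR.

```python
import importlib


def require_module(module_name: str, purpose: str):
    """Import a module or fail immediately with an actionable message."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        # Re-raise with context so the missing dependency is obvious at startup.
        raise ImportError(
            f"'{module_name}' is required for {purpose} but is not installed"
        ) from exc


# A module that is present is returned as usual.
json_module = require_module("json", "credential parsing")

# A missing module fails at import time instead of at first use.
try:
    require_module("hypothetical_missing_library", "artifact search")
except ImportError as exc:
    message = str(exc)
```

With this shape there is no `None` sentinel to check later, so the guard inside `create_maven_searcher` would become unnecessary.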
|
|
||
| try: | ||
| from google.oauth2 import service_account | ||
|
Collaborator
Why do we need these import checks at all if we hide the error anyway? |
||
| from google.auth.transport.requests import Request | ||
| GCP_AUTH_AVAILABLE = True | ||
| except ImportError: | ||
| GCP_AUTH_AVAILABLE = False | ||
|
|
||
|
|
||
| # Timeout for MavenArtifactSearcher: (connect_timeout, read_timeout) | ||
| DEFAULT_SEARCHER_TIMEOUT = (30, 60) | ||
|
Collaborator
Specify all timeouts in one place, with defaults, perhaps in artifact_searcher.utils.constants. |
||
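A minimal sketch of what a shared constants module could look like, assuming the values from this diff (30 s connect, 60 s read). The `Timeouts` class name and its field names are hypothetical; only `DEFAULT_SEARCHER_TIMEOUT` comes from the PR.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Timeouts:
    """Single place for timeout defaults, in seconds."""

    connect_seconds: int = 30
    read_seconds: int = 60

    def as_tuple(self) -> Tuple[int, int]:
        # (connect_timeout, read_timeout), the shape used in this diff.
        return (self.connect_seconds, self.read_seconds)


DEFAULT_SEARCHER_TIMEOUT = Timeouts().as_tuple()
```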
|
|
||
|
|
||
| class CloudAuthHelper: | ||
| """V2 authentication helper for cloud registries. | ||
|
|
||
| Supports: AWS (access keys), GCP (SA JSON), Artifactory/Nexus (user/pass or anonymous). | ||
| Creates configured MavenArtifactSearcher per provider. | ||
| """ | ||
|
|
||
| @staticmethod | ||
| def resolve_auth_config(registry: Registry, artifact_type: str = "maven") -> Optional[AuthConfig]: | ||
| """Find auth settings for this registry. | ||
|
Collaborator
Comments on methods should be short and meaningful. |
||
|
|
||
| Looks up authConfig based on maven_config reference. | ||
| Returns AuthConfig or None. | ||
| """ | ||
| if artifact_type != "maven": | ||
| return None | ||
|
Collaborator
What is this condition for? artifact_type always has the default value maven. What is the point of artifact_type if it is not used anywhere else?
Collaborator
This also quietly returns None without logging. |
||
|
|
||
| auth_ref = getattr(registry.maven_config, 'auth_config', None) | ||
|
Collaborator
You can just use registry.maven_config.auth_config, because you check the value right here. |
||
| if not auth_ref: | ||
| return None | ||
|
Collaborator
We quietly return None without logging. |
||
|
|
||
| if not registry.auth_config: | ||
| logger.warning(f"No authConfig dict but maven config references '{auth_ref}'") | ||
| return None | ||
|
|
||
| auth_config = registry.auth_config.get(auth_ref) | ||
| if not auth_config: | ||
| logger.error(f"AuthConfig '{auth_ref}' not found. Available: {list(registry.auth_config.keys())}") | ||
|
Collaborator
If you only log, why not use a warning? If it is an error, why not raise an exception? |
||
| return None | ||
|
|
||
| logger.info(f"Resolved authConfig '{auth_ref}' -> provider: {auth_config.provider}") | ||
| return auth_config | ||
|
|
||
| @staticmethod | ||
| def resolve_credentials(auth_config: AuthConfig, env_creds: Optional[Dict[str, dict]]) -> Optional[dict]: | ||
| """Get credentials from vault using authConfig's credentials ID. | ||
|
|
||
| Handles: usernamePassword (returns dict), secret (GCP), empty creds (anonymous). | ||
| Returns dict or None for anonymous. | ||
| """ | ||
| cred_id = auth_config.credentials_id | ||
| if not cred_id: | ||
| logger.info("No credentialsId specified, using anonymous access") | ||
| return None | ||
|
|
||
| if not env_creds or cred_id not in env_creds: | ||
| raise KeyError(f"Credential '{cred_id}' not found in env_creds") | ||
|
Collaborator
The user doesn't know anything about env_creds. |
||
|
|
||
| cred_entry = env_creds[cred_id] | ||
|
|
||
| # Credentials can be structured as {"type": "usernamePassword", "data": {"username": "..."}} | ||
| # or as a flat dict {"username": "...", "password": "..."} | ||
|
Collaborator
Are you sure the flat format case occurs in this file? |
||
| cred_type = cred_entry.get("type") if isinstance(cred_entry, dict) else None | ||
| cred_data = cred_entry.get("data", cred_entry) if isinstance(cred_entry, dict) else cred_entry | ||
|
|
||
| # For Nexus/Artifactory: empty username+password means anonymous/public access | ||
| if cred_type == "usernamePassword": | ||
| username = cred_data.get("username", "") | ||
| password = cred_data.get("password", "") | ||
| if not username and not password: | ||
| logger.info(f"Credential '{cred_id}' is anonymous (empty username/password)") | ||
| return None | ||
| creds = {"username": username, "password": password} | ||
| elif cred_type == "secret": | ||
| # For GCP service account JSON or other secret-based credentials | ||
| if "secret" in cred_data: | ||
| creds = cred_data | ||
| else: | ||
| # Handle case where data itself is the secret | ||
|
Collaborator
Does such a case even exist? |
||
| creds = {"secret": cred_data} | ||
| else: | ||
| # Fallback for unknown credential types | ||
| creds = cred_data | ||
|
Collaborator
We don't need any fallbacks; we need to fail with a clear error. |
||
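A sketch of credential resolution without fallbacks, assuming only the structured `{"type": ..., "data": ...}` format is supported. The function name `normalize_credentials` is hypothetical; the two type names are the ones that appear in this diff.

```python
from typing import Optional


def normalize_credentials(cred_id: str, cred_entry: dict) -> Optional[dict]:
    """Resolve credentials strictly by type; unknown types are an error."""
    cred_type = cred_entry.get("type")
    data = cred_entry.get("data") or {}

    if cred_type == "usernamePassword":
        username = data.get("username", "")
        password = data.get("password", "")
        if not username and not password:
            return None  # empty username and password mean anonymous access
        return {"username": username, "password": password}

    if cred_type == "secret":
        if "secret" not in data:
            raise ValueError(f"Credential '{cred_id}' of type 'secret' has no 'secret' field")
        return {"secret": data["secret"]}

    # No silent fallback: report exactly what was wrong.
    raise ValueError(f"Credential '{cred_id}' has unsupported type {cred_type!r}")
```

Provider-specific checks (AWS, GCP) are deliberately left out here, in line with the later comments about moving them to the provider methods.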
|
|
||
| logger.info(f"Resolved credentials for '{cred_id}' (type: {cred_type})") | ||
|
|
||
| # Validate credential format per provider | ||
| if auth_config.provider == "aws": | ||
|
Collaborator
Move this out to the appropriate AWS method. Here you resolve credentials by type, not by provider. |
||
| if "username" not in creds or "password" not in creds: | ||
| raise ValueError(f"AWS credentials must have 'username' and 'password'") | ||
| # GCP needs a service account JSON file (stored as 'secret') | ||
| elif auth_config.provider == "gcp" and auth_config.auth_method == "service_account": | ||
|
Collaborator
Move this out to the appropriate provider method (GCP here). Here you resolve credentials by type, not by provider. |
||
| if "secret" not in creds: | ||
| raise ValueError(f"GCP service_account credentials must have 'secret'") | ||
|
|
||
| return creds | ||
|
|
||
| @staticmethod | ||
| def _extract_repository_name(url: str) -> str: | ||
|
Collaborator
Use urlparse instead of parsing the raw string.
Collaborator
Rename this: a segment of the URL path is not a repo. In the doc, repo is |
||
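A sketch of the urlparse-based variant suggested above. The function name `last_path_segment` is a hypothetical, neutral placeholder. Note that the string-splitting version in this diff returns the host name for a URL with an empty path (for example `https://host/`), whereas this version raises.

```python
from urllib.parse import urlparse


def last_path_segment(url: str) -> str:
    """Return the last non-empty segment of the URL path."""
    parsed = urlparse(url)
    segments = [s for s in parsed.path.split("/") if s]
    if not parsed.netloc or not segments:
        raise ValueError(f"URL has no host or no path segments: {url}")
    return segments[-1]
```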
| """Extract repository name from registry URL (last path segment).""" | ||
| parts = [p for p in url.rstrip('/').split('/') if p] | ||
| if parts: | ||
| repo_name = parts[-1] | ||
| logger.debug(f"Extracted repository name: {repo_name} from URL: {url}") | ||
| return repo_name | ||
| raise ValueError(f"Could not extract repository name from URL: {url}") | ||
|
|
||
| @staticmethod | ||
| def _extract_region(url: str, auth_config: AuthConfig) -> str: | ||
| """Get AWS region from authConfig, URL, or default to us-east-1.""" | ||
| if auth_config.provider == "aws" and auth_config.aws_region: | ||
| logger.debug(f"Using explicit AWS region: {auth_config.aws_region}") | ||
| return auth_config.aws_region | ||
| aws_match = re.search(r'\.([a-z0-9-]+)\.amazonaws\.com', url) | ||
|
Collaborator
Can it simply fail explicitly when the region is not passed, instead of relying on hidden logic that is not described in the doc? |
||
| if aws_match: | ||
| region = aws_match.group(1) | ||
| logger.debug(f"Extracted AWS region from URL: {region}") | ||
| return region | ||
| logger.debug("AWS region not found in URL, defaulting to us-east-1") | ||
| return "us-east-1" | ||
|
|
||
| @staticmethod | ||
| def _extract_gcp_region(url: str) -> str: | ||
|
Collaborator
Why is the region not described in the doc? |
||
| """Extract GCP region from URL (format: us-east1, not us-east-1).""" | ||
| match = re.search(r'https://([a-z0-9-]+)-maven\.pkg\.dev', url) | ||
| if match: | ||
| region = match.group(1) | ||
| logger.debug(f"Extracted GCP region from URL: {region}") | ||
| return region | ||
| logger.warning(f"Could not extract GCP region from URL: {url}, defaulting to us-central1") | ||
| return "us-central1" | ||
|
|
||
| @staticmethod | ||
|
Collaborator
The provider will be mandatory, so this method makes no sense. Delete it and determine the provider by type; new types will be added to the schema. |
||
| def _detect_provider(url: str, auth_config: AuthConfig) -> Optional[str]: | ||
| """Auto-detect provider from URL (Nexus/Artifactory only; AWS/GCP need explicit).""" | ||
| # If provider is explicitly set, use it | ||
| if auth_config.provider: | ||
| logger.debug(f"Using explicit provider: {auth_config.provider}") | ||
| return auth_config.provider | ||
|
|
||
| url_lower = url.lower() | ||
|
|
||
| # Auto-detect ONLY for on-premise registries (Nexus and Artifactory) | ||
| # AWS and GCP must be explicitly specified | ||
|
|
||
| # Artifactory patterns | ||
| if "artifactory" in url_lower or "/artifactory/" in url_lower: | ||
| logger.info(f"Auto-detected provider: artifactory from URL pattern") | ||
| return "artifactory" | ||
|
|
||
| # Nexus patterns | ||
| if "nexus" in url_lower or "/nexus/" in url_lower or "/service/rest/" in url_lower: | ||
| logger.info(f"Auto-detected provider: nexus from URL pattern") | ||
| return "nexus" | ||
|
|
||
| # AWS and GCP require explicit provider - no auto-detection | ||
| logger.warning(f"Could not auto-detect provider from URL: {url}. AWS and GCP require explicit provider specification.") | ||
| return None | ||
|
|
||
| @staticmethod | ||
| def create_maven_searcher(registry: Registry, env_creds: Optional[Dict[str, dict]]) -> 'MavenArtifactSearcher': | ||
| """Create configured MavenArtifactSearcher for this registry. | ||
|
|
||
| Resolves provider, loads credentials, configures searcher. | ||
| """ | ||
| if MavenArtifactSearcher is None: | ||
| raise ImportError("qubership_pipelines_common_library not available") | ||
|
|
||
| auth_config = CloudAuthHelper.resolve_auth_config(registry, "maven") | ||
| if not auth_config: | ||
| raise ValueError("Could not resolve authConfig for maven artifacts") | ||
|
|
||
| registry_url = registry.maven_config.repository_domain_name | ||
|
|
||
| # Try to detect provider if not explicitly set | ||
|
Collaborator
There are too many comments everywhere about the same thing. |
||
| # Auto-detection works for Nexus and Artifactory (on-premise registries) | ||
| # AWS and GCP must be explicitly specified | ||
| provider = CloudAuthHelper._detect_provider(registry_url, auth_config) | ||
| if not provider: | ||
| logger.error(f"V2 fallback: Could not determine provider for registry '{registry.name}'. Please specify provider in authConfig or use recognizable URL pattern (nexus/artifactory)") | ||
|
Collaborator
The log is redundant if an exception is raised. |
||
| raise ValueError(f"Could not determine provider for registry '{registry.name}'") | ||
|
|
||
| if provider not in ["aws", "gcp", "artifactory", "nexus"]: | ||
|
Collaborator
Use an enum. |
||
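A sketch of the enum suggested above, using the four provider strings from this diff. The names `Provider` and `parse_provider` are assumptions for illustration.

```python
from enum import Enum


class Provider(str, Enum):
    AWS = "aws"
    GCP = "gcp"
    ARTIFACTORY = "artifactory"
    NEXUS = "nexus"


def parse_provider(value: str) -> Provider:
    """Convert a config string to Provider, listing valid values on failure."""
    try:
        return Provider(value)
    except ValueError:
        supported = ", ".join(p.value for p in Provider)
        raise ValueError(
            f"Unsupported provider: {value!r}. Supported: {supported}"
        ) from None
```

Because `Provider` subclasses `str`, comparisons such as `provider == "aws"` keep working during a gradual migration.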
| raise ValueError(f"Unsupported provider: {provider}") | ||
|
|
||
| # Nexus: remove /repository/ suffix for search API compatibility | ||
| if provider == "nexus" and registry_url.endswith("/repository/"): | ||
|
Collaborator
Use urlparse instead of parsing the raw string. |
||
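A sketch of the same Nexus adjustment done on the parsed path. The function name `strip_repository_suffix` is hypothetical. Unlike the `endswith` check in this diff, it also handles a missing trailing slash, and it always returns a URL whose path ends with `/`.

```python
from urllib.parse import urlparse, urlunparse


def strip_repository_suffix(url: str) -> str:
    """Drop a trailing 'repository' path segment, keeping a trailing slash."""
    parsed = urlparse(url)
    segments = [s for s in parsed.path.split("/") if s]
    if segments and segments[-1] == "repository":
        segments.pop()
    path = ("/" + "/".join(segments) + "/") if segments else "/"
    return urlunparse(parsed._replace(path=path))
```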
| registry_url = registry_url[:-len("repository/")] | ||
| logger.info(f"Nexus: adjusted registry URL to {registry_url} for search API") | ||
|
|
||
| # Get the credentials (or None if anonymous access is allowed) | ||
| creds = CloudAuthHelper.resolve_credentials(auth_config, env_creds) | ||
|
|
||
| # Create the base searcher object - provider-specific config comes next | ||
| searcher = MavenArtifactSearcher(registry_url, params={"timeout": DEFAULT_SEARCHER_TIMEOUT}) | ||
|
|
||
| # AWS/GCP require authentication (no anonymous access) | ||
| if provider in ["aws", "gcp"] and creds is None: | ||
| raise ValueError(f"{provider.upper()} requires credentials - anonymous access not supported") | ||
|
|
||
| if provider == "aws": | ||
| return CloudAuthHelper._configure_aws(searcher, auth_config, creds, registry_url) | ||
| elif provider == "gcp": | ||
| return CloudAuthHelper._configure_gcp(searcher, auth_config, creds, registry_url) | ||
| elif provider == "artifactory": | ||
| return CloudAuthHelper._configure_artifactory(searcher, creds) | ||
| else: # nexus | ||
| return CloudAuthHelper._configure_nexus(searcher, creds, registry) | ||
|
|
||
| @staticmethod | ||
| def _configure_aws(searcher: 'MavenArtifactSearcher', auth_config: AuthConfig, | ||
|
Collaborator
Validation needs to be added. |
||
| creds: dict, registry_url: str) -> 'MavenArtifactSearcher': | ||
| """Configure searcher for AWS CodeArtifact (access key, secret, domain, region, repo).""" | ||
| if not auth_config.aws_domain: | ||
| raise ValueError("AWS auth requires awsDomain in authConfig") | ||
| region = CloudAuthHelper._extract_region(registry_url, auth_config) | ||
| repo_name = CloudAuthHelper._extract_repository_name(registry_url) | ||
| logger.info(f"Configuring AWS CodeArtifact: domain={auth_config.aws_domain}, region={region}") | ||
| return searcher.with_aws_code_artifact( | ||
| access_key=creds["username"], | ||
| secret_key=creds["password"], | ||
| domain=auth_config.aws_domain, | ||
| region_name=region, | ||
| repository=repo_name | ||
| ) | ||
|
|
||
| @staticmethod | ||
| def _configure_gcp(searcher: 'MavenArtifactSearcher', auth_config: AuthConfig, | ||
| creds: dict, registry_url: str) -> 'MavenArtifactSearcher': | ||
| """Configure searcher for GCP Artifact Registry (SA JSON, project, region, repo).""" | ||
| if auth_config.auth_method != "service_account": | ||
| raise ValueError(f"GCP auth_method '{auth_config.auth_method}' not supported") | ||
|
|
||
| # Extract project from authConfig or URL | ||
| project = auth_config.gcp_reg_project | ||
| if not project: | ||
|
Collaborator
There is also nothing in the doc about this implicit logic. |
||
| # Extract from GCP URL pattern: https://<region>-maven.pkg.dev/<project>/<repo> | ||
| # The project is the first path segment after pkg.dev/ | ||
| match = re.search(r'pkg\.dev/([^/]+)', registry_url) | ||
| if match: | ||
| project = match.group(1) | ||
| logger.info(f"Extracted GCP project from URL: {project}") | ||
| else: | ||
| raise ValueError("GCP auth requires gcpRegProject in authConfig or valid GCP URL format (https://<region>-maven.pkg.dev/<project>/<repo>)") | ||
|
|
||
| sa_data = creds["secret"] | ||
| sa_json = json.dumps(sa_data) if isinstance(sa_data, dict) else sa_data | ||
| region = CloudAuthHelper._extract_gcp_region(registry_url) | ||
| repo_name = CloudAuthHelper._extract_repository_name(registry_url) | ||
|
|
||
| logger.info(f"Configuring GCP Artifact Registry: project={project}, region={region}") | ||
| return searcher.with_gcp_artifact_registry( | ||
| credential_params={"service_account_key": sa_json}, | ||
| project=project, | ||
| region_name=region, | ||
| repository=repo_name | ||
| ) | ||
|
|
||
| @staticmethod | ||
| def _configure_artifactory(searcher: 'MavenArtifactSearcher', creds: Optional[dict]) -> 'MavenArtifactSearcher': | ||
| """Set up the searcher to work with Artifactory. | ||
|
|
||
| Artifactory is simpler - just username and password. | ||
| Can work anonymously if the repository allows public access. | ||
| """ | ||
| if creds is None: | ||
| logger.info("Configuring Artifactory with anonymous access (no credentials)") | ||
| return searcher.with_artifactory(username=None, password=None) | ||
|
|
||
| return searcher.with_artifactory( | ||
| username=creds.get("username", ""), | ||
| password=creds.get("password", "") | ||
| ) | ||
|
|
||
| @staticmethod | ||
| def _configure_nexus(searcher: 'MavenArtifactSearcher', creds: Optional[dict], registry: Registry) -> 'MavenArtifactSearcher': | ||
|
Collaborator
registry is not used in this method. |
||
| """Configure searcher for Nexus (username/password or anonymous). | ||
|
|
||
| Note: Library searches all Nexus repos (cannot limit to specific repo). | ||
| """ | ||
| if creds is None: | ||
| logger.info("Configuring Nexus with anonymous access (no credentials)") | ||
| return searcher.with_nexus(username=None, password=None) | ||
|
|
||
| return searcher.with_nexus( | ||
| username=creds.get("username", ""), | ||
| password=creds.get("password", "") | ||
| ) | ||
|
|
||
| @staticmethod | ||
| def get_gcp_access_token(service_account_json: str) -> Optional[str]: | ||
|
Collaborator
Why do we use a self-written authorization method if we have to use the third-party library from qubership? |
||
| """Generate fresh GCP OAuth access token from service account JSON.""" | ||
| if not GCP_AUTH_AVAILABLE: | ||
| return None | ||
|
Collaborator
This quietly returns None without logging. |
||
| try: | ||
| sa_info = json.loads(service_account_json) if isinstance(service_account_json, str) else service_account_json | ||
| credentials = service_account.Credentials.from_service_account_info( | ||
| sa_info, scopes=['https://www.googleapis.com/auth/cloud-platform'] | ||
| ) | ||
| credentials.refresh(Request()) | ||
| return credentials.token | ||
| except Exception as e: | ||
| logger.error(f"Failed to generate GCP access token: {e}") | ||
| return None | ||
|
|
||
| @staticmethod | ||
| def get_gcp_credentials_from_registry(registry: Registry, env_creds: Optional[Dict[str, dict]]) -> Optional[str]: | ||
|
Collaborator
Why only for GCP? This is superfluous. |
||
| """Extract GCP service account JSON from registry for token generation.""" | ||
| auth_config = CloudAuthHelper.resolve_auth_config(registry, "maven") | ||
| if not auth_config or auth_config.provider != "gcp": | ||
| return None | ||
| try: | ||
| creds = CloudAuthHelper.resolve_credentials(auth_config, env_creds) | ||
| sa_data = creds.get("secret") | ||
| return json.dumps(sa_data) if isinstance(sa_data, dict) else sa_data | ||
| except Exception: | ||
| return None | ||
The contract for using MavenArtifactSearcher needs to be described.
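One possible way to write that contract down is a `typing.Protocol`. The sketch below is an assumption: the method names and keyword arguments are copied from the call sites in this diff, not from the library's documentation, and `MavenSearcherContract`, `FakeSearcher`, and `configure_anonymous_nexus` are hypothetical names.

```python
from typing import Optional, Protocol, runtime_checkable


@runtime_checkable
class MavenSearcherContract(Protocol):
    """Methods this module calls on the searcher, as seen in the diff."""

    def with_nexus(self, username: Optional[str], password: Optional[str]): ...

    def with_artifactory(self, username: Optional[str], password: Optional[str]): ...

    def with_aws_code_artifact(self, access_key: str, secret_key: str,
                               domain: str, region_name: str, repository: str): ...

    def with_gcp_artifact_registry(self, credential_params: dict, project: str,
                                   region_name: str, repository: str): ...


class FakeSearcher:
    """Test double that records calls instead of contacting a registry."""

    def __init__(self):
        self.calls = []

    def with_nexus(self, username, password):
        self.calls.append(("nexus", username, password))
        return self

    def with_artifactory(self, username, password):
        self.calls.append(("artifactory", username, password))
        return self

    def with_aws_code_artifact(self, access_key, secret_key, domain, region_name, repository):
        self.calls.append(("aws", domain, region_name, repository))
        return self

    def with_gcp_artifact_registry(self, credential_params, project, region_name, repository):
        self.calls.append(("gcp", project, region_name, repository))
        return self


def configure_anonymous_nexus(searcher: MavenSearcherContract):
    # Anonymous access is expressed as username=None, password=None in the diff.
    return searcher.with_nexus(username=None, password=None)
```

A protocol like this also gives the unit tests a seam: the provider-specific configuration can be exercised against `FakeSearcher` without the real library installed.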