
feat: Registry v2 artifact searcher #805

Open
BackendBits wants to merge 86 commits into main from registry-v2-artifact-searcher

Conversation

Collaborator

@BackendBits BackendBits commented Nov 18, 2025

Summary

Add support for Registry V2 cloud registries in the Python artifact searcher and fix build_env tests so CI passes.

Issue

Internal need to:

  • Use AWS CodeArtifact and GCP Artifact Registry via RegDef V2.
  • Unblock the open-source CI pipeline, which was failing in scripts/build_env tests due to a [tests] package conflict.

Breaking Change?

  • Yes
  • No

No existing flows are changed; V2 support is used only when configured.

Scope / Project

  • python/artifact-searcher
  • scripts/build_env

Implementation Notes

  • Added CloudAuthHelper in artifact_searcher to:

    • Resolve authConfig entries from RegDef V2.
    • Resolve credentials from env_creds.
    • Configure MavenArtifactSearcher for:
      • AWS CodeArtifact.
      • GCP Artifact Registry (service account).
  • Updated build_env RegDef V2 handling to:

    • Detect version 2.0.
    • Validate V2 RegDefs with the correct schema.
    • Check authConfig references in V2 config sections.
  • Fixed build_env tests by:

    • Adding [scripts/build_env/tests/__init__.py] so the local [tests] package is used in CI.
    • Keeping imports as from tests.test_helpers import TestHelpers.
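
The RegDef V2 handling listed above (version detection plus the authConfig reference check) could be sketched as follows. This is an illustration only: the function names, the schema file names, and the assumed layout (a top-level `authConfig` map referenced by name from config sections such as `mavenConfig`) are assumptions, not the code in this PR.

```python
from typing import Any

# Hypothetical schema file names; the real paths live in the PR's constants.
SCHEMA_BY_VERSION = {"1.0": "regdef-v1.schema.json", "2.0": "regdef-v2.schema.json"}


def detect_regdef_version(regdef: dict[str, Any]) -> str:
    """Return the RegDef version as a string, defaulting to 1.0 when absent."""
    return str(regdef.get("version", "1.0"))


def find_missing_auth_refs(regdef: dict[str, Any]) -> list[str]:
    """List config sections whose authConfig reference is not defined at top level."""
    defined = set(regdef.get("authConfig") or {})
    missing = []
    for section, config in regdef.items():
        # Assumption: only dict-valued sections (e.g. mavenConfig) carry references.
        if section == "authConfig" or not isinstance(config, dict):
            continue
        ref = config.get("authConfig")
        if ref is not None and ref not in defined:
            missing.append(f"{section} -> {ref}")
    return missing
```

A validator would pick the schema via `SCHEMA_BY_VERSION[detect_regdef_version(regdef)]` and fail the build when `find_missing_auth_refs` returns a non-empty list.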

Tests / Evidence

  • Existing pytest suites for:

    • python/envgene/envgenehelper
    • scripts/build_env

    run in the GitHub Actions [tests] job.

  • After the changes, the previous ModuleNotFoundError in scripts/build_env tests is resolved and the suite runs to completion (subject to any functional test failures).

Additional Notes

  • No new external dependencies beyond what artifact_searcher already requires.
  • [CloudAuthHelper] logs clear errors when configuration or credentials are incomplete, making misconfigurations easier to diagnose.

@BackendBits BackendBits force-pushed the registry-v2-artifact-searcher branch 3 times, most recently from 7047995 to f4408c0 Compare November 21, 2025 07:22
@BackendBits BackendBits force-pushed the registry-v2-artifact-searcher branch from f4408c0 to 4c67632 Compare November 30, 2025 13:01
@BackendBits BackendBits force-pushed the registry-v2-artifact-searcher branch from 4c67632 to 43d5a27 Compare December 1, 2025 09:58
@github-actions github-actions bot added the bug label Dec 1, 2025
Detect version field in RegDef files and validate against V2 schema when version is 2.0.
Use logger for validation messages and move schema paths to constants.
- Add AuthConfig model for V2 authentication configuration
- Add version and auth_config fields to Registry model
- Add V2 routing in check_artifact_async with fallback to V1
- Add CloudAuthHelper for AWS/GCP cloud registry authentication
- Add environment credential loading for V2 cloud registries
- Add V2 cloud registry dependencies (boto3, google-auth)
- Add comprehensive tests for V2 models and routing

Preserves all V1 functionality including:
- Nexus detection and URL conversion
- Snapshot version resolution
- URL-based artifact search
@BackendBits BackendBits force-pushed the registry-v2-artifact-searcher branch from 5ffc9b1 to f556b46 Compare December 2, 2025 23:57
@BackendBits BackendBits force-pushed the registry-v2-artifact-searcher branch from 0f21053 to e2bc8bf Compare December 3, 2025 06:31
@BackendBits BackendBits force-pushed the registry-v2-artifact-searcher branch from c4a4175 to 05467d1 Compare December 14, 2025 06:28
- Simplified retry logic (2 retries, 5s fixed delay)
- Reduced timeouts to reasonable values (60s search, 120s download)
- Removed debug/diagnostic code
- Removed unnecessary test files
- Cleaned up code style
@BackendBits BackendBits force-pushed the registry-v2-artifact-searcher branch from 05467d1 to 186f166 Compare December 14, 2025 06:55

auth_ref = getattr(registry.maven_config, 'auth_config', None)
if not auth_ref:
    return None
Collaborator

We silently return None here without logging anything.
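
One way to address this, sketched under assumptions: `resolve_auth_ref` is a hypothetical helper name, and `SimpleNamespace` objects stand in for the real Registry model, which is not reproduced here.

```python
import logging
from types import SimpleNamespace
from typing import Any, Optional

logger = logging.getLogger(__name__)


def resolve_auth_ref(registry: Any) -> Optional[str]:
    """Return the auth_config reference of the Maven config, logging when absent."""
    auth_ref = getattr(registry.maven_config, "auth_config", None)
    if not auth_ref:
        # Make the early exit visible instead of returning None silently.
        logger.warning(
            "Registry %r has no auth_config reference in maven_config",
            getattr(registry, "name", "<unknown>"),
        )
        return None
    return auth_ref


# Usage with stand-in objects: this call logs a warning and returns None.
registry = SimpleNamespace(name="demo", maven_config=SimpleNamespace(auth_config=None))
result = resolve_auth_ref(registry)
```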

try:
    from qubership_pipelines_common_library.v1.maven_client import MavenArtifactSearcher
except ImportError:
    MavenArtifactSearcher = None
Collaborator

Better to raise an error here instead of assigning None, so it is clear that the library is missing. With None, an exception will still be raised later at line 81, but failing early would make the error clearer.
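
A minimal sketch of the fail-early pattern, assuming a hypothetical `require_module` helper. It uses only the standard library, so the real import of `MavenArtifactSearcher` is not reproduced here.

```python
import importlib
from types import ModuleType


def require_module(module_name: str, purpose: str) -> ModuleType:
    """Import a required module or fail immediately with an actionable message."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        # Re-raise with context so the missing dependency is obvious at startup.
        raise ImportError(
            f"{module_name} is required for {purpose}; install it before using this feature"
        ) from exc
```

Called once at module load, this surfaces the missing dependency at import time rather than at the first call that needs it.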

helm_config: Optional[HelmConfig] = None
helm_app_config: Optional[HelmAppConfig] = None
version: Optional[str] = "1.0"
auth_config: Optional[dict[str, AuthConfig]] = None
Collaborator

I see you haven't created separate models for RegDef V2. Do not mix two models with different schemas in one; use polymorphism.
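
The suggestion could look roughly like this. It is a sketch only: standard-library dataclasses stand in for the project's model classes, and the class names, field names, and raw keys (`credentialsId`, `authConfig`) are assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class BaseRegistry:
    """Fields assumed to be common to both RegDef versions."""
    name: str


@dataclass
class RegistryV1(BaseRegistry):
    credentials_id: Optional[str] = None
    version: str = "1.0"


@dataclass
class RegistryV2(BaseRegistry):
    auth_config: dict[str, dict[str, Any]] = field(default_factory=dict)
    version: str = "2.0"


def parse_registry(raw: dict[str, Any]) -> BaseRegistry:
    """Pick the model class from the version field instead of mixing schemas."""
    version = str(raw.get("version", "1.0"))
    if version == "2.0":
        return RegistryV2(name=raw["name"], auth_config=raw.get("authConfig") or {})
    return RegistryV1(name=raw["name"], credentials_id=raw.get("credentialsId"))
```

Callers can then branch on `isinstance(registry, RegistryV2)` rather than checking optional fields that only exist in one schema.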



# Timeout for MavenArtifactSearcher: (connect_timeout, read_timeout)
DEFAULT_SEARCHER_TIMEOUT = (30, 60)
Collaborator

Specify all timeouts in one place with defaults, perhaps in artifact_searcher.utils.constants.
Make it possible to set them from the pipeline via ARTIFACT_SEARCHER_CONFIG: map.
Don't forget to add this new pipeline parameter here and to the job parameters.
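
A sketch of what a single timeout table with pipeline overrides might look like. The key names, the `load_timeouts` helper, and the assumption that ARTIFACT_SEARCHER_CONFIG arrives as a JSON object in an environment variable are all hypothetical; the default values are the ones mentioned in this PR.

```python
import json
import os
from typing import Mapping, Optional

# Defaults in one place; values taken from the timeouts mentioned in this PR.
DEFAULT_TIMEOUTS = {
    "connect_timeout": 30,
    "read_timeout": 60,
    "search_timeout": 60,
    "download_timeout": 120,
}


def load_timeouts(env: Optional[Mapping[str, str]] = None) -> dict[str, int]:
    """Merge overrides from ARTIFACT_SEARCHER_CONFIG (assumed JSON map) over defaults."""
    env = os.environ if env is None else env
    raw = env.get("ARTIFACT_SEARCHER_CONFIG", "")
    overrides = json.loads(raw) if raw.strip() else {}
    if not isinstance(overrides, dict):
        raise ValueError("ARTIFACT_SEARCHER_CONFIG must be a JSON object")
    unknown = set(overrides) - set(DEFAULT_TIMEOUTS)
    if unknown:
        # Reject typos early instead of silently ignoring them.
        raise ValueError(f"Unknown timeout keys: {sorted(unknown)}")
    return {**DEFAULT_TIMEOUTS, **{key: int(value) for key, value in overrides.items()}}
```

Rejecting unknown keys is a design choice: a misspelled override fails the job immediately instead of leaving the default in effect unnoticed.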

Returns AuthConfig or None.
"""
if artifact_type != "maven":
    return None
Collaborator

What is this condition for? artifact_type always has the default value maven. What is the point of artifact_type if it is not used anywhere else?

max_retries = 2
last_error = None
local_path = None
maven_url = None
Collaborator

not used

searcher = await loop.run_in_executor(None, CloudAuthHelper.create_maven_searcher, app.registry, env_creds)

urls = await asyncio.wait_for(
    loop.run_in_executor(None, partial(searcher.find_artifact_urls, artifact=maven_artifact)),
Collaborator

Why do we use a third-party synchronous library to search for an artifact if we already have an asynchronous method in our library for V1? We just need to add auth for the other registries, and that's all. We need to reconsider the use of this library.

maven_url = None

# Retry on transient errors (401, timeout, expired)
for attempt in range(max_retries):
Collaborator

Is it necessary to add a retry mechanism for both V1 and V2?
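
If retries are wanted on both paths, one option is a single shared helper. The sketch below is an assumption-laden illustration: `with_retries` and its parameters are hypothetical names, and the defaults follow the "2 retries, 5s fixed delay" wording from the commit message.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def with_retries(
    operation: Callable[[], Awaitable[T]],
    max_retries: int = 2,
    delay_seconds: float = 5.0,
    retry_on: tuple[type[BaseException], ...] = (
        asyncio.TimeoutError,
        TimeoutError,
        ConnectionError,
    ),
) -> T:
    """Run an async operation, retrying on the given errors with a fixed delay."""
    attempts = max_retries + 1  # the first call plus max_retries retries
    for attempt in range(1, attempts + 1):
        try:
            return await operation()
        except retry_on:
            if attempt == attempts:
                raise  # out of retries: surface the last error unchanged
            await asyncio.sleep(delay_seconds)
    raise AssertionError("unreachable")
```

Note that this helper counts retries after the first call, whereas the quoted loop `for attempt in range(max_retries)` with `max_retries = 2` makes two attempts in total; whichever meaning is intended should be stated explicitly.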

local_path = os.path.join(create_app_artifacts_local_path(app.name, version), os.path.basename(maven_url))
os.makedirs(os.path.dirname(local_path), exist_ok=True)

download_success = await _v2_download_with_fallback(
Collaborator

No need to waste time downloading the artifact when searching; this method only needs to find it.

Collaborator

You can look at check_artifact_by_full_url_async as an example.

return await _check_artifact_v1_async(app, artifact_extension, version, cred=None, classifier="")

# Try to extract useful HTTP error details for debugging
if hasattr(e, 'response'):
Collaborator

Use the same method for error handling in V1 and V2.
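
A shared helper along these lines could serve both paths. This is a sketch under assumptions: `describe_error` is a hypothetical name, and it only relies on the `response` attribute that the quoted code already checks with `hasattr`.

```python
def describe_error(error: BaseException) -> str:
    """Build one log-friendly message, adding the HTTP status when the error carries one."""
    message = f"{type(error).__name__}: {error}"
    response = getattr(error, "response", None)
    if response is None:
        return message
    # Some HTTP clients name the attribute status_code, others status.
    status = getattr(response, "status_code", None)
    if status is None:
        status = getattr(response, "status", None)
    return f"{message} (HTTP status: {status})" if status is not None else message
```

Both the V1 and V2 code paths could then log `describe_error(e)` in their `except` blocks instead of each extracting response details separately.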


try:
    await asyncio.wait_for(
        loop.run_in_executor(None, lambda: searcher.download_artifact(url, str(local_path))),
Collaborator

We already have a method for downloading artifacts, so why do we use the one from the third-party library again?

)

@staticmethod
def get_gcp_access_token(service_account_json: str) -> Optional[str]:
Collaborator

Why do we use a self-written authorization method if we have to use the third-party library from qubership?


try:
    headers = {}
    if auth_config.provider == "gcp":
Collaborator

Why is there a third download implementation? We already have a download method in our library and in the third-party one.

return None

@staticmethod
def get_gcp_credentials_from_registry(registry: Registry, env_creds: Optional[Dict[str, dict]]) -> Optional[str]:
Collaborator

Why only for GCP? This is superfluous.

if 'SNAPSHOT' in version:
    raise ValueError("SNAPSHOT is not supported version of Solution Descriptor artifacts")
# TODO: check if job would fail without plugins
def download_sd_by_appver(app_name: str, version: str, plugins: PluginEngine, env: Environment = None) -> dict[str, object]:
Collaborator

env: Environment = None is not used.


# V1 fallback path or non-V2 registry - need credentials for HTTP download
cred = None
if app_def.registry.credentials_id and env_creds:
Collaborator

Remove the fallback here as well.

if artifact_info:
template_url, _ = artifact_info
template_url, repo_info = artifact_info
# V2 optimization: artifact already downloaded by MavenArtifactSearcher
Collaborator

@miyamuraga miyamuraga Feb 12, 2026

The optimization is not needed here.

template_url, _ = artifact_info
template_url, repo_info = artifact_info
# V2 optimization: artifact already downloaded by MavenArtifactSearcher
if isinstance(repo_info, tuple) and len(repo_info) == 2 and repo_info[0] == "v2_downloaded":
Collaborator

@miyamuraga miyamuraga Feb 12, 2026

What is v2_downloaded? A repository name can't be v2_downloaded.

template_url, _ = artifact_info
template_url, repo_info = artifact_info
# V2 optimization: artifact already downloaded by MavenArtifactSearcher
if isinstance(repo_info, tuple) and len(repo_info) == 2 and repo_info[0] == "v2_downloaded":
Collaborator

isinstance(repo_info, tuple) and len(repo_info) == 2 is needless.

5 participants