
PLTF-2501: Update openhands charts script for agent-server images#622

Draft
aivong-openhands wants to merge 33 commits into main from av/update-openhands-charts-for-agent-server-image

@aivong-openhands
Contributor

aivong-openhands commented May 11, 2026

Summary

  • Switch update_openhands_charts.py to read the runtime image tag from sandbox_spec_service.py in the OpenHands enterprise repo at the cloud release tag, instead of from OPENHANDS_RUNTIME_IMAGE_TAG in the deploy workflow env vars
  • Remove openhands_runtime_image_tag from DeployConfig — the field is no longer needed
  • Update RUNTIME_TAG_PATTERN and WARM_RUNTIMES_TAG_PATTERN to match ghcr.io/openhands/agent-server (charts switched from openhands/runtime)
  • Add --skip-version-check flag to continue updating image tags even when charts are already at the latest version
  • fix: use correct agent-server image tag format (X.Y.Z-python) — replace the cloud-X.Y.Z-nikolaik format inherited from the old runtime convention with the X.Y.Z-python suffix that agent-server images actually use; centralize the value in shared RUNTIME_IMAGE_TAG / NEW_RUNTIME_IMAGE_TAG constants
  • refactor: unify error_if_missing handling in update_tag_in_content — add error_if_missing: bool = True to update_tag_in_content mirroring its sibling update_all_tags_in_content; replace two ... if re.search(PATTERN, content) else content ternaries at the replicated-wrapper call sites with explicit error_if_missing=False, removing double regex evaluation and restoring API symmetry between the two helpers
  • Add a PR check that runs the update_openhands_charts script's tests

Coverage gaps closed

  • TestUpdateRuntimeApiWorkflow + TestUpdateOpenhandsWorkflow — the two orchestration functions previously had no direct tests; new tests exercise their wiring contract (has_changes propagation from values into the chart bump, positional-argument pass-through, dry_run threading, return values)
  • TestResolveOpenhandsVersion (4 tests) and TestProcessUpdates (3 tests) covering guard-clause branches that were previously untested

Test design improvements

  • Split multi-behavior tests into focused ones per the Granular property: test_openhands_version_bump_when_has_changes, test_bump_runtime_api_version, test_returns_deploy_config_on_success, and a 7-assertion replicated-wrapper test were each split so a failure points at one concern
  • Parametrize idempotency is_unchanged checks across TestUpdateValues, TestUpdateReplicatedOpenhandsWrapperValues, and TestUpdateRuntimeApiValues so failure messages name the diverging key in the test ID
  • Parametrize cloud_tag_exists ref-format check, main() output format check, and UpdateResult count-property check (with explicit names rather than derived .rstrip("s") + "_count")
  • Consolidate three "preserved after update" tests into two parametrize-driven tests (scalar fields, list lengths); collapse three shared-input image-tag tests into one
  • Replace test_no_redirect_message_in_output (which asserted on PyGithub's internal log strings) with a direct assertion on the github logger level
  • Derive expected versions via bump_patch_version(RUNTIME_API_CHART_FULL_VERSION) instead of hardcoding "0.1.21" so tests survive fixture baseline changes
  • Move 5 get_deploy_config mock response factories (make_http_error_response, make_json_error_response, make_missing_key_response, make_invalid_base64_response, make_invalid_yaml_response) into conftest.py; use the shared make_workflow_response factory in TestGetDeployConfig; extract stub fixtures for resolve_openhands_version and the process_updates chain; reuse mock_main_early_exit in TestSkipVersionCheck
  • Drop TestAssertVersionBumped, TestGetDependencyVersion, TestGetChartValue — these tested conftest assertion helpers rather than production code; misuse surfaces immediately in real tests, and the helpers are still exercised
  • Drop a redundant ?ref= substring assertion implied by exact-URL equality; compare list counts to pre-update values instead of literals
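The version-derivation change can be illustrated with a minimal sketch. It assumes `bump_patch_version` increments the last component of a plain `MAJOR.MINOR.PATCH` string and that the fixture baseline is `0.1.20` (inferred from the previously hardcoded `"0.1.21"`); both the re-implementation and the baseline value are assumptions, and the real helper may accept more version formats.

```python
def bump_patch_version(version: str) -> str:
    """Hypothetical re-implementation: increment PATCH in a MAJOR.MINOR.PATCH string."""
    major, minor, patch = version.split(".")
    return f"{major}.{minor}.{int(patch) + 1}"


# Assumed fixture baseline; the real conftest value may differ.
RUNTIME_API_CHART_FULL_VERSION = "0.1.20"

# Derive the expectation from the baseline instead of hardcoding "0.1.21",
# so moving the fixture baseline does not silently break the test.
expected_version = bump_patch_version(RUNTIME_API_CHART_FULL_VERSION)
```

If the baseline later moves to, say, `0.1.25`, the expectation follows automatically.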

Test plan

  • 187 tests passing
  • Manually verified --dry-run --skip-version-check produces no errors against current charts
  • Mutation testing on get_runtime_image_tag_from_sandbox_spec — 23/24 killed, 1 equivalent mutant ("utf-8" vs "UTF-8")
  • Test with next release

aivong-openhands and others added 5 commits May 10, 2026 19:06
Co-authored-by: openhands <openhands@all-hands.dev>
Remove OPENHANDS_RUNTIME_IMAGE_TAG from DeployConfig and deploy.yaml parsing.
Instead, fetch the AGENT_SERVER_IMAGE constant from sandbox_spec_service.py
in the OpenHands enterprise repo at the cloud release tag.
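A minimal sketch of reading the tag from that constant, assuming the file is fetched through GitHub's contents API (which returns the file body base64-encoded in a `content` field) and that the constant is a single-line string such as `AGENT_SERVER_IMAGE = "ghcr.io/openhands/agent-server:1.2.3-python"`. The function name and the exact constant layout are hypothetical; `get_runtime_image_tag_from_sandbox_spec` in the script may parse it differently.

```python
import base64
import re

# Assumed layout of the constant in sandbox_spec_service.py (hypothetical):
#   AGENT_SERVER_IMAGE = "ghcr.io/openhands/agent-server:1.2.3-python"
AGENT_SERVER_IMAGE_PATTERN = re.compile(
    r"""^AGENT_SERVER_IMAGE\s*=\s*["']ghcr\.io/openhands/agent-server:([^"']+)["']""",
    re.MULTILINE,
)


def extract_agent_server_tag(contents_response: dict) -> str:
    """Return the image tag from a contents-API payload (hypothetical helper)."""
    # The contents API base64-encodes file bodies; b64decode ignores the
    # embedded newlines GitHub inserts into that field.
    source = base64.b64decode(contents_response["content"]).decode("utf-8")
    match = AGENT_SERVER_IMAGE_PATTERN.search(source)
    if match is None:
        # Name the constant in the error so a failure is easy to diagnose.
        raise ValueError("Could not find AGENT_SERVER_IMAGE in sandbox_spec_service.py")
    return match.group(1)
```

Including the constant's name in the error message is what lets a test distinguish a real failure from a mutant that blanks the message.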
Kill mutant that changed ValueError message to None — assert error
output contains "AGENT_SERVER_IMAGE" not just the prefix.

Also add setup.cfg for mutmut v3 (also_copy conftest.py, tests_dir).
When charts are already up to date, the script exits early. This flag
bypasses that check so image tags are always re-fetched and applied.
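A sketch of how such a flag can gate the early exit. Only `--dry-run` and `--skip-version-check` come from this PR; `should_continue` and the plain string-equality version comparison are simplified, hypothetical stand-ins for the script's actual logic.

```python
import argparse


def parse_args(argv=None):
    # The two flags named in this PR; the rest of the CLI is omitted.
    parser = argparse.ArgumentParser(description="Update OpenHands charts (sketch)")
    parser.add_argument("--dry-run", action="store_true")
    parser.add_argument("--skip-version-check", action="store_true")
    return parser.parse_args(argv)


def should_continue(current_version: str, latest_version: str, skip_version_check: bool) -> bool:
    """Return False where the script would exit early because charts are current."""
    if current_version == latest_version and not skip_version_check:
        return False
    # Either a newer release exists or the caller asked to re-apply image tags anyway.
    return True
```

argparse exposes `--skip-version-check` as `args.skip_version_check`, so the parsed value can be passed straight into the check.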
Charts switched from ghcr.io/openhands/runtime to
ghcr.io/openhands/agent-server; update RUNTIME_TAG_PATTERN and
WARM_RUNTIMES_TAG_PATTERN to match.
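To illustrate the repository switch, here is a hypothetical pattern of the same shape. It assumes `values.yaml` places `tag:` on the line directly after the agent-server `repository:` line; the real `RUNTIME_TAG_PATTERN` and `WARM_RUNTIMES_TAG_PATTERN` may be written differently.

```python
import re
from typing import Optional

# Hypothetical stand-in for RUNTIME_TAG_PATTERN: group 1 is the prefix,
# group 2 is the tag value that the update script would replace.
AGENT_SERVER_TAG_PATTERN = re.compile(
    r"(repository:[ \t]*ghcr\.io/openhands/agent-server[ \t]*\n[ \t]*tag:[ \t]*)(\S+)"
)


def find_agent_server_tag(values_yaml: str) -> Optional[str]:
    match = AGENT_SERVER_TAG_PATTERN.search(values_yaml)
    return match.group(2) if match else None


new_chart = """\
runtime:
  image:
    repository: ghcr.io/openhands/agent-server
    tag: 1.0.0-python
"""

# A chart still pointing at the old runtime image no longer matches.
old_chart = new_chart.replace("agent-server", "runtime")
```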
Contributor

all-hands-bot left a comment


🟢 Good taste - Clean refactoring with excellent test coverage (158 tests + mutation testing at 96% kill rate). The switch to reading image tags from sandbox_spec_service.py is a pragmatic improvement that uses a more authoritative source.

[RISK ASSESSMENT]

  • Overall PR: ⚠️ Risk Assessment: 🟡 MEDIUM

This PR changes the source of truth for runtime image tags and removes DeployConfig.openhands_runtime_image_tag, which could impact downstream consumers. However, risk is well-mitigated by comprehensive test coverage (including mutation testing), proper error handling in get_runtime_image_tag_from_sandbox_spec(), and manual dry-run verification. The architectural change is sound and the implementation is clean.

VERDICT:
Worth merging - Well-tested refactoring that solves a real problem

KEY INSIGHT:
Mutation testing (23/24 killed) demonstrates unusually high test quality for infrastructure code.

@aivong-openhands
Contributor Author

aivong-openhands commented May 11, 2026

  • Test with next release

- Split 7-assertion replicated wrapper test into focused parametrized tests
  (3 file-content cases + 4 result-key cases)
- Add TestResolveOpenhandsVersion (4 tests) and TestProcessUpdates (3 tests)
  covering guard-clause branches previously untested
- Collapse 3-parametrize scalar metadata test into single assert_file_contains_all
- Consolidate 3 shared-input image tag tests into one parametrized test

169 tests, 0 failures.
Replace cloud-X.Y.Z-nikolaik with X.Y.Z-python throughout test fixtures,
test inputs, and docstrings. Agent-server images use a -python suffix,
not the old runtime -nikolaik convention.

Add RUNTIME_IMAGE_TAG and NEW_RUNTIME_IMAGE_TAG constants to conftest.py
so all tests reference one definition.
Moves 5 module-level mock factory helpers
(_make_http_error_response, _make_json_error_response,
_make_missing_key_response, _make_invalid_base64_response,
_make_invalid_yaml_response) from test_update_openhands_charts.py into
conftest.py, drops their underscore prefix, and removes the redundant
Mock/base64 parameters now that conftest.py imports them directly.

Improves Maintainable/Understandable test properties: the helpers now sit
alongside the other test-support functions, are discoverable by future
test classes, and the parametrize lambdas shrink from
``lambda Mock, _: _make_http_error_response(Mock, 401, "Unauthorized")``
to ``lambda: make_http_error_response(401, "Unauthorized")``.
Removes the derivation ``count_property = field.rstrip("s") + "_count"``
in test_count_properties_return_correct_counts and instead passes each
count property name as an explicit pytest.param argument. The previous
form coupled the test to a naming convention on the UpdateResult
dataclass: a rename like ``errors`` -> ``errors_list`` or
``error_count`` -> ``errors_count`` would have made the test fail for a
naming reason rather than a logic reason. Explicit names also make each
parametrize row read like a small spec.
Splits test_returns_deploy_config_on_success into two atomic tests:
- test_returns_deploy_config_instance_on_success checks the type-level
  contract (isinstance DeployConfig).
- test_runtime_api_sha_parsed_from_workflow_env checks the
  value-extraction contract (RUNTIME_API_SHA -> runtime_api_sha).

The original test mixed three assertions (not None, isinstance, value
equality), which gave three possible failure-diagnosis paths from a
single named test. Splitting them improves the Granular property: a
failure now points directly at the broken concern.
Adds an ``error_if_missing: bool = True`` parameter to
update_tag_in_content, mirroring the existing parameter on its
sibling update_all_tags_in_content. The two call sites for
REPLICATED_PROXY_WARM_RUNTIME_IMAGE_PATTERN and
REPLICATED_LOCAL_WARM_RUNTIME_IMAGE_PATTERN previously wrapped the call
in a ``... if re.search(PATTERN, content) else content`` ternary to
suppress spurious "Could not find" errors when the replicated wrapper
patterns are absent from upstream values.yaml. That hand-coded guard
double-evaluated the regex and made the call shape inconsistent with the
neighbouring update_all_tags_in_content(..., error_if_missing=False)
calls. Both ternaries are now replaced with explicit
error_if_missing=False, restoring API symmetry between the two helpers.

Behaviour-preserving — all 170 tests still pass.
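A sketch of the unified call shape, assuming the helper takes a two-group `(prefix)(tag)` pattern and collects messages in an errors list. The `errors` parameter and the message text are hypothetical; only the `error_if_missing: bool = True` parameter comes from this PR. Using `re.subn` gives the substitution count from the same single regex pass, which is what removes the separate `re.search` guard.

```python
import re
from typing import List


def update_tag_in_content(
    content: str,
    pattern: str,
    new_tag: str,
    errors: List[str],
    error_if_missing: bool = True,
) -> str:
    """Sketch only: replace the tag captured by the pattern's second group.

    Assumes `pattern` is exactly (prefix)(tag) with nothing after the tag group.
    """
    # One pass: subn both rewrites the content and reports whether it matched.
    new_content, count = re.subn(pattern, lambda m: m.group(1) + new_tag, content)
    if count == 0 and error_if_missing:
        errors.append(f"Could not find pattern {pattern!r}")
    return new_content


# Before: update_tag_in_content(...) if re.search(PATTERN, content) else content
# After: one call, with the "may be absent" intent stated explicitly.
errors: List[str] = []
content = update_tag_in_content(
    "no replicated wrapper here\n", r"(tag:[ \t]*)(\S+)", "1.2.3-python",
    errors, error_if_missing=False,
)
```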
…onfig

Replace the local mock_successful_response fixture with the make_workflow_response
factory from conftest. Eliminates duplicate response-construction logic and the
test file's direct base64 import.
Remove TestAssertVersionBumped, TestGetDependencyVersion, and TestGetChartValue.
These verified conftest assertion helpers rather than production code; misuse
in real tests surfaces immediately as wrong assertion values. The helpers
themselves remain in conftest and are still exercised by the suite.
Split the two-SUT-call test_handles_various_tag_formats into a parametrized
test so each tag-format case is reported and diagnosed independently.
For each idempotency test in TestUpdateValues, TestUpdateReplicatedOpenhandsWrapperValues,
and TestUpdateRuntimeApiValues: introduce a fixture that does the apply-then-reapply
work once, then split assertions into one has_changes check plus a parametrized
per-key is_unchanged check. Failure messages now name the diverging key in the
test ID rather than blaming an opaque assert line.
Add a TestUpdateRuntimeApiWorkflow + TestUpdateOpenhandsWorkflow pair so the
two orchestration functions that previously had no direct coverage are
exercised against their wiring contract (has_changes propagation,
positional argument pass-through, dry_run threading, return values).
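The wiring contract those tests pin down can be sketched with injected steps and recording fakes. The function, the `ValuesResult` type, and the injected-callable design are hypothetical simplifications. The real workflow functions call module-level helpers that the tests mock instead.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class ValuesResult:
    has_changes: bool


def update_openhands_workflow_sketch(
    tag: str,
    dry_run: bool,
    update_values: Callable[[str, bool], ValuesResult],
    bump_chart: Callable[[bool], None],
) -> ValuesResult:
    """Hypothetical orchestration: update values, bump the chart only on change."""
    result = update_values(tag, dry_run)  # positional arguments passed straight through
    if result.has_changes:  # has_changes decides whether the chart is bumped
        bump_chart(dry_run)  # dry_run is threaded into every step
    return result


# Recording fakes let a test assert on call order and arguments.
calls: List[Tuple] = []


def fake_update_values(tag: str, dry_run: bool) -> ValuesResult:
    calls.append(("values", tag, dry_run))
    return ValuesResult(has_changes=True)


def fake_bump_chart(dry_run: bool) -> None:
    calls.append(("bump", dry_run))


returned = update_openhands_workflow_sketch("1.2.3-python", True, fake_update_values, fake_bump_chart)
```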

Tighten existing tests:
- Derive expected versions via bump_patch_version(RUNTIME_API_CHART_FULL_VERSION)
  instead of hardcoding "0.1.21", so tests survive fixture baseline changes.
- Split test_openhands_version_bump_when_has_changes and
  test_bump_runtime_api_version so each test asserts one behavior
  (file content vs. result/return value).
- Drop a redundant "?ref=" substring assertion implied by exact-URL equality.
- Replace test_no_redirect_message_in_output (which asserted on PyGithub's
  internal log strings) with a direct assertion on the github logger level.
- Collapse three "preserved after update" tests in TestUpdateChart into
  two parametrize-driven tests (scalar fields, list lengths).
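A sketch of the logger-level approach. It assumes the script quiets PyGithub's redirect messages by raising the threshold of the `github` logger. The helper name and the choice of `WARNING` are assumptions. Asserting on configuration the script owns avoids coupling the test to the wording of a third-party log line.

```python
import logging


def quiet_github_logging(level: int = logging.WARNING) -> None:
    """Hypothetical helper: raise the threshold of the "github" logger.

    Loggers under the "github." namespace that keep the default NOTSET level
    inherit this threshold, so their INFO-level messages are dropped.
    """
    logging.getLogger("github").setLevel(level)


quiet_github_logging()
```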
all-hands-bot

This comment was marked as outdated.

@aivong-openhands
Contributor Author

@OpenHands add a PR check to run the update openhands charts tests

@openhands-ai

openhands-ai Bot commented May 11, 2026

I'm on it! aivong-openhands can track my progress at all-hands.dev

@aivong-openhands aivong-openhands changed the title Update chart script for agent-server images and sandbox spec image tag PLTF-2501: Update chart script for agent-server images and sandbox spec image tag May 11, 2026
Co-authored-by: openhands <openhands@all-hands.dev>
Contributor Author

Added a dedicated PR workflow in .github/workflows/test-update-openhands-charts.yml that runs uv run scripts/update_openhands_charts/test_update_openhands_charts.py for changes under scripts/update_openhands_charts/**.

Pushed in commit 8196276.

This comment was created by an AI agent (OpenHands) on behalf of the user.

@openhands-ai

openhands-ai Bot commented May 11, 2026

I'm on it! aivong-openhands can track my progress at all-hands.dev

Co-authored-by: openhands <openhands@all-hands.dev>
Contributor Author

Updated the PR check so it now runs all tests under scripts/ instead of only scripts/update_openhands_charts/test_update_openhands_charts.py.

Included in commit 1f43e1a.

Validation: uv run --with pytest --with requests --with PyGithub --with fastapi --with uvicorn --with httpx --with ruamel.yaml python -m pytest scripts -q → 313 passed.

This comment was created by an AI agent (OpenHands) on behalf of the user.

@openhands-ai

openhands-ai Bot commented May 11, 2026

Final update since the last summary:

  • Addressed the new PR comment to broaden the PR check
  • Updated .github/workflows/test-update-openhands-charts.yml so it now:
    • triggers on scripts/**
    • runs all tests under scripts/, not just scripts/update_openhands_charts/test_update_openhands_charts.py
  • Validated the new workflow command locally: 313 passed
  • Committed and pushed the change in 1f43e1a with message: ci: run all scripts tests in PR check
  • Posted a PR reply with the update and AI disclosure

Conciseness:

  • The change set remained concise
  • Only the workflow file was modified
  • No extraneous repository changes were introduced

Current status:

  • The updated GitHub Actions check is pending on the PR.

@aivong-openhands aivong-openhands marked this pull request as ready for review May 11, 2026 19:59
all-hands-bot

This comment was marked as outdated.

@aivong-openhands
Contributor Author

This script generated the updates in #631, except for the replicated-folder enterprise image override and the automations chart updates.

@aivong-openhands
Contributor Author

aivong-openhands commented May 13, 2026

  • Follow up on why the agent-server image versions in the replicated folder were not updated by the script (765e4f2)

aivong-openhands and others added 6 commits May 15, 2026 17:25
…#623)

* upgrade laminar helm chart to address noisy postgres error

* bump openhands chart version to 0.7.11
Adds a custom_model field to the KOTS config, plumbs the existing
custom_base_url / custom_api_key + new custom_model through to:
- OpenHands env (LLM_API_KEY, LLM_BASE_URL, LLM_MODEL=openai/<model>)
- LiteLLM env secrets (CUSTOM_API_KEY, CUSTOM_API_BASE)
- LiteLLM proxy model_list entry under model_name 'custom-llm', routed
  via the generic openai/* provider so any OpenAI-compatible endpoint
  (OpenRouter, vLLM, Ollama, LM Studio, LiteLLM gateway) works.

Sets LITELLM_DEFAULT_MODEL=litellm_proxy/custom-llm so new users land
on the configured model by default.
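The env wiring listed above can be modelled as a plain mapping. This is a Python sketch of the relationships only, not the KOTS/Helm template itself. The helper name and the grouping keys are hypothetical, while the variable names and values come from the commit message.

```python
from typing import Dict


def custom_llm_env(base_url: str, api_key: str, model: str) -> Dict[str, Dict[str, str]]:
    """Hypothetical model of how custom_base_url/custom_api_key/custom_model fan out."""
    return {
        "openhands": {
            "LLM_API_KEY": api_key,
            "LLM_BASE_URL": base_url,
            # Generic openai/* provider, so any OpenAI-compatible endpoint works.
            "LLM_MODEL": f"openai/{model}",
        },
        "litellm_secrets": {
            "CUSTOM_API_KEY": api_key,
            "CUSTOM_API_BASE": base_url,
        },
        "defaults": {
            # Points new users at the proxy's "custom-llm" model_list entry.
            "LITELLM_DEFAULT_MODEL": "litellm_proxy/custom-llm",
        },
    }
```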
Add a regex validation to plugin_directory_marketplace_source so the
Replicated console rejects URIs that won't resolve. Only github://,
https://, and http:// are supported by the catalog loader; file://
was previously mentioned in help text but isn't a valid runtime input.
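The accepted-scheme rule can be expressed as a Python regex for illustration. The actual KOTS validation regex is not shown in this PR, so this pattern is an assumption that only encodes the three supported schemes.

```python
import re

# Assumed equivalent of the rule: github://, https://, or http:// followed by
# at least one non-whitespace character. file:// and other schemes are rejected.
MARKETPLACE_SOURCE_RE = re.compile(r"(?:github|https?)://\S+")


def is_supported_marketplace_source(uri: str) -> bool:
    return MARKETPLACE_SOURCE_RE.fullmatch(uri) is not None
```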
…rdening + configmap-checksum auto-rollout) (#625)

* PLTF-2504: upgrade laminar helm chart to address rabbitmq disk and quickwit memory issues

Bumps the laminar dependency from 0.1.9 to 0.1.10, which pulls in
lmnr-ai/lmnr-helm#24 — RabbitMQ self-protection
(disk_free_limit.absolute, vm_memory_high_watermark.absolute, larger PVC)
and Quickwit memory/PVC tuning. These address out-of-space and memory-limit
errors observed in SaaS prod; SaaS staging and Replicated deployments were
unaffected.

Bumps openhands chart 0.7.11 -> 0.7.12.

* PLTF-2504: pin laminar rabbitmq PVC size to 50Gi to unblock helm upgrade

The 0.1.10 chart raises the laminar-rabbitmq volumeClaimTemplates
storage default from 50Gi to 100Gi. K8s forbids changing
volumeClaimTemplates on an existing StatefulSet, so `helm upgrade` from 0.7.11 →
0.7.12 fails with "spec: Forbidden" (observed on a Replicated install).

Pinning the value in the parent chart keeps the rendered template
identical to what's already deployed at 0.1.9, letting the upgrade
go through while picking up every other 0.1.10 fix (RabbitMQ
self-protection, console logging, Quickwit hardening, StorageClass
allowVolumeExpansion).

To grow the actual rabbitmq disk where the backing storage enforces
size (e.g. GKE hyperdisk-balanced), patch the live PVC out of band.

* PLTF-2504: upgrade laminar helm chart to 0.1.11 for configmap-checksum auto-rollout

Bumps laminar from 0.1.10 to 0.1.11, which pulls in
lmnr-ai/lmnr-helm#25. That PR adds
`checksum/config` pod-template annotations to the rabbitmq StatefulSet
and all five quickwit workloads (control-plane, indexer, searcher,
metastore, janitor), so `helm upgrade` now rolls those pods
automatically whenever their configmaps change.

Resolves the operator step required by 0.1.10 in this PR: a manual
`kubectl rollout restart sts laminar-rabbitmq` was needed after every
helm upgrade that altered the rabbitmq configmap (subPath/init-copy
mounts don't live-update from configmap changes). With 0.1.11 the
admin-console-driven helm upgrade is sufficient end to end.

The rabbitmq.persistence.size=50Gi pin from the prior commit stays —
0.1.11 does not revert the 0.1.10 PVC default bump to 100Gi, and
StatefulSet volumeClaimTemplates remain immutable.
@aivong-openhands
Contributor Author

@OpenHands resolve conflicts

@openhands-ai

openhands-ai Bot commented May 15, 2026

@aivong-openhands your session has expired. Please login again at OpenHands Cloud and try again.

The four REPLICATED_* tag patterns lived inside update_openhands_values,
which only ever ran against charts/openhands/values.yaml — where the
KOTS templating they target never appears. They no-op'd silently
(error_if_missing=False), so cloud releases shipped the new
agent-server tag in the chart values but left replicated/openhands.yaml
on the old tag. The bump had to be applied by hand each release
(most recently 765e4f2).

- Extract the four replicated patterns into a dedicated
  update_replicated_openhands_values() targeted at REPLICATED_OPENHANDS_PATH
  with error_if_missing=True, so future pattern drift fails loudly.
- Wire it into update_openhands_workflow between the chart-values update
  and the Chart.yaml bump. Chart bump decision still keys off chart-values
  changes only — the replicated wrapper isn't part of the helm artifact.
- Allow optional comment lines between repository: and tag: in
  REPLICATED_PROXY_AGENT_SERVER_TAG_PATTERN; the real file documents the
  non-proxy fallback with a commented-out repository line, which broke
  the old \s*\n\s* gap.
- Mock update_replicated_openhands_values in TestUpdateOpenhandsWorkflow
  fixtures — without this, dry_run=False workflow tests would write to
  the real replicated/openhands.yaml on disk.
- Update sample fixture to mirror the production file's comment layout
  so the regex change is exercised end-to-end.
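The regex change can be reproduced on a simplified sample. The pattern builder is a hypothetical stand-in for `REPLICATED_PROXY_AGENT_SERVER_TAG_PATTERN`, and `proxy.example` is a placeholder host. The real `replicated/openhands.yaml` uses KOTS templating. Anchoring on `^[ \t]*repository:` with `re.MULTILINE` keeps the commented-out fallback line from being matched as the repository line itself.

```python
import re

# Old gap between the repository: and tag: lines: whitespace and one newline only.
OLD_GAP = r"\s*\n\s*"
# New gap: additionally allows any number of whole comment lines in between.
NEW_GAP = r"\s*\n(?:[ \t]*#[^\n]*\n)*[ \t]*"


def agent_server_tag_pattern(gap):
    # Group 1: everything up to the tag value; group 2: the tag value itself.
    return re.compile(
        r"^([ \t]*repository:[^\n]*agent-server[^\n]*" + gap + r"tag:[ \t]*)(\S+)",
        re.MULTILINE,
    )


# Simplified sample mirroring the described layout (placeholder values).
SAMPLE = (
    "        repository: proxy.example/openhands/agent-server\n"
    "        # repository: ghcr.io/openhands/agent-server\n"
    "        tag: 1.0.0-python\n"
)
```

With `OLD_GAP` the search returns `None` on this sample, which is the silent miss described above; with `NEW_GAP` it captures `1.0.0-python`.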
@aivong-openhands
Contributor Author

aivong-openhands commented May 21, 2026

  • test with next release

@aivong-openhands
Contributor Author

@OpenHands resolve conflicts

@openhands-ai

openhands-ai Bot commented May 21, 2026

I'm on it! aivong-openhands can track my progress at all-hands.dev

…harts-for-agent-server-image

# Conflicts:
#	charts/openhands/Chart.lock
#	charts/openhands/Chart.yaml
Contributor Author

Merged main into the PR branch to resolve conflicts.

  • Conflicts were in charts/openhands/Chart.lock and charts/openhands/Chart.yaml (chart/laminar version bumps from main). Took main's newer values (appVersion: cloud-1.29.1, chart version: 0.7.22, laminar: 0.1.13) since this PR only touches the update script, not chart versions themselves.
  • charts/openhands/values.yaml and replicated/openhands.yaml auto-merged cleanly.
  • Re-ran uv run ... python -m pytest scripts -q after the merge → 317 passed.

Merge commit: 822064e.

This comment was created by an AI agent (OpenHands) on behalf of the user.

@openhands-ai

openhands-ai Bot commented May 21, 2026

Summary

Resolved the merge conflicts on PR #622 as requested.

Checklist

  • Fetched origin/main and merged into av/update-openhands-charts-for-agent-server-image
  • Resolved the two conflicted files:
    • charts/openhands/Chart.lock — took main's side (newer laminar: 0.1.13, updated digest, new plugin-directory and integrations-hub entries)
    • charts/openhands/Chart.yaml — took main's side (appVersion: cloud-1.29.1, chart version: 0.7.22, laminar: 0.1.13)
    • Rationale: this PR's scope is the update script, not chart versions, so main's newer chart metadata should win
  • Verified the other touched files (charts/openhands/values.yaml, replicated/openhands.yaml, replicated/secrets.yaml) auto-merged cleanly with no manual intervention
  • Re-ran the script test suite after the merge: 317 passed
  • Committed the merge (822064e) and pushed to the PR branch
  • Posted an AI-disclosed status comment on the PR

Concise change set

The only new commit on the branch is the merge commit itself. No script, workflow, or chart contents were modified beyond what conflict resolution required — nothing extraneous to revert.

Contributor

all-hands-bot left a comment


🟢 Good taste - Well-architected refactoring with exceptional test coverage (191 tests, 96% mutation kill rate). The shift to reading runtime image tags from sandbox_spec_service.py as the authoritative source is cleaner and more maintainable than parsing workflow env vars.

[RISK ASSESSMENT]

  • [Overall PR] ⚠️ Risk Assessment: 🟡 MEDIUM

Changes the source of truth for runtime image tags from workflow env vars to sandbox_spec_service.py, and removes openhands_runtime_image_tag from DeployConfig. The coupling to AGENT_SERVER_IMAGE constant format creates a fragile dependency, but this is acceptable internal coupling within the same organization. Risk is well-mitigated by comprehensive test coverage (191 tests including mutation testing), clear error messages when the pattern isn't found, and no external consumers of the removed field.

VERDICT:
Worth merging: Solid engineering with excellent test discipline. The architectural decision to use sandbox_spec_service.py as the source of truth is pragmatic and reduces indirection.

KEY INSIGHT:
Reading configuration from its point of use (sandbox_spec_service.py) rather than copying it through workflow env vars eliminates a synchronization problem and makes the system more resilient to drift.


Workflow run: https://github.com/OpenHands/OpenHands-Cloud/actions/runs/26205625289



@aivong-openhands
Contributor Author

  • Check why these changes had to be made manually (00cca3a)

@aivong-openhands aivong-openhands marked this pull request as draft May 26, 2026 20:16

5 participants