PLTF-2501: Update openhands charts script for agent-server images#622
PLTF-2501: Update openhands charts script for agent-server images#622aivong-openhands wants to merge 33 commits into
Conversation
Co-authored-by: openhands <openhands@all-hands.dev>
Remove OPENHANDS_RUNTIME_IMAGE_TAG from DeployConfig and deploy.yaml parsing. Instead, fetch the AGENT_SERVER_IMAGE constant from sandbox_spec_service.py in the OpenHands enterprise repo at the cloud release tag.
Kill mutant that changed ValueError message to None — assert error output contains "AGENT_SERVER_IMAGE" not just the prefix. Also add setup.cfg for mutmut v3 (also_copy conftest.py, tests_dir).
When charts are already up to date, the script exits early. This flag bypasses that check so image tags are always re-fetched and applied.
Charts switched from ghcr.io/openhands/runtime to ghcr.io/openhands/agent-server; update RUNTIME_TAG_PATTERN and WARM_RUNTIMES_TAG_PATTERN to match.
all-hands-bot
left a comment
There was a problem hiding this comment.
🟢 Good taste - Clean refactoring with excellent test coverage (158 tests + mutation testing at 96% kill rate). The switch to reading image tags from sandbox_spec_service.py is a pragmatic improvement that uses a more authoritative source.
[RISK ASSESSMENT]
- Overall PR:
⚠️ Risk Assessment: 🟡 MEDIUM
This PR changes the source of truth for runtime image tags and removes DeployConfig.openhands_runtime_image_tag, which could impact downstream consumers. However, risk is well-mitigated by comprehensive test coverage (including mutation testing), proper error handling in get_runtime_image_tag_from_sandbox_spec(), and manual dry-run verification. The architectural change is sound and the implementation is clean.
VERDICT:
✅ Worth merging - Well-tested refactoring that solves a real problem
KEY INSIGHT:
Mutation testing (23/24 killed) demonstrates unusually high test quality for infrastructure code.
|
- Split 7-assertion replicated wrapper test into focused parametrized tests (3 file-content cases + 4 result-key cases) - Add TestResolveOpenhandsVersion (4 tests) and TestProcessUpdates (3 tests) covering guard-clause branches previously untested - Collapse 3-parametrize scalar metadata test into single assert_file_contains_all - Consolidate 3 shared-input image tag tests into one parametrized test 169 tests, 0 failures.
Replace cloud-X.Y.Z-nikolaik with X.Y.Z-python throughout test fixtures, test inputs, and docstrings. Agent-server images use a -python suffix, not the old runtime -nikolaik convention. Add RUNTIME_IMAGE_TAG and NEW_RUNTIME_IMAGE_TAG constants to conftest.py so all tests reference one definition.
Moves 5 module-level mock factory helpers (_make_http_error_response, _make_json_error_response, _make_missing_key_response, _make_invalid_base64_response, _make_invalid_yaml_response) from test_update_openhands_charts.py into conftest.py, drops their underscore prefix, and removes the redundant Mock/base64 parameters now that conftest.py imports them directly. Improves Maintainable/Understandable test properties: the helpers now sit alongside the other test-support functions, are discoverable by future test classes, and the parametrize lambdas shrink from ``lambda Mock, _: _make_http_error_response(Mock, 401, "Unauthorized")`` to ``lambda: make_http_error_response(401, "Unauthorized")``.
Removes the derivation ``count_property = field.rstrip("s") + "_count"``
in test_count_properties_return_correct_counts and instead passes each
count property name as an explicit pytest.param argument. The previous
form coupled the test to a naming convention on the UpdateResult
dataclass: a rename like ``errors`` -> ``errors_list`` or
``error_count`` -> ``errors_count`` would have made the test fail for a
naming reason rather than a logic reason. Explicit names also make each
parametrize row read like a small spec.
Splits test_returns_deploy_config_on_success into two atomic tests: - test_returns_deploy_config_instance_on_success checks the type-level contract (isinstance DeployConfig). - test_runtime_api_sha_parsed_from_workflow_env checks the value-extraction contract (RUNTIME_API_SHA -> runtime_api_sha). The original test mixed three assertions (not None, isinstance, value equality), which gave three possible failure-diagnosis paths from a single named test. Splitting them improves the Granular property: a failure now points directly at the broken concern.
Adds an ``error_if_missing: bool = True`` parameter to update_tag_in_content, mirroring the existing parameter on its sibling update_all_tags_in_content. The two call sites for REPLICATED_PROXY_WARM_RUNTIME_IMAGE_PATTERN and REPLICATED_LOCAL_WARM_RUNTIME_IMAGE_PATTERN previously wrapped the call in a ``... if re.search(PATTERN, content) else content`` ternary to suppress spurious "Could not find" errors when the replicated wrapper patterns are absent from upstream values.yaml. That hand-coded guard double-evaluated the regex and made the call shape inconsistent with the neighbouring update_all_tags_in_content(..., error_if_missing=False) calls. Both ternaries are now replaced with explicit error_if_missing=False, restoring API symmetry between the two helpers. Behaviour-preserving — all 170 tests still pass.
…onfig Replace the local mock_successful_response fixture with the make_workflow_response factory from conftest. Eliminates duplicate response-construction logic and the test file's direct base64 import.
Remove TestAssertVersionBumped, TestGetDependencyVersion, and TestGetChartValue. These verified conftest assertion helpers rather than production code; misuse in real tests surfaces immediately as wrong assertion values. The helpers themselves remain in conftest and are still exercised by the suite.
Split the two-SUT-call test_handles_various_tag_formats into a parametrized test so each tag-format case is reported and diagnosed independently.
For each idempotency test in TestUpdateValues, TestUpdateReplicatedOpenhandsWrapperValues, and TestUpdateRuntimeApiValues: introduce a fixture that does the apply-then-reapply work once, then split assertions into one has_changes check plus a parametrized per-key is_unchanged check. Failure messages now name the diverging key in the test ID rather than blaming an opaque assert line.
Add a TestUpdateRuntimeApiWorkflow + TestUpdateOpenhandsWorkflow pair so the two orchestration functions that previously had no direct coverage are exercised against their wiring contract (has_changes propagation, positional argument pass-through, dry_run threading, return values). Tighten existing tests: - Derive expected versions via bump_patch_version(RUNTIME_API_CHART_FULL_VERSION) instead of hardcoding "0.1.21", so tests survive fixture baseline changes. - Split test_openhands_version_bump_when_has_changes and test_bump_runtime_api_version so each test asserts one behavior (file content vs. result/return value). - Drop a redundant "?ref=" substring assertion implied by exact-URL equality. - Replace test_no_redirect_message_in_output (which asserted on PyGithub's internal log strings) with a direct assertion on the github logger level. - Collapse three "preserved after update" tests in TestUpdateChart into two parametrize-driven tests (scalar fields, list lengths).
|
@OpenHands add a PR check to run the update openhands charts tests |
|
I'm on it! aivong-openhands can track my progress at all-hands.dev |
Co-authored-by: openhands <openhands@all-hands.dev>
|
Added a dedicated PR workflow in Pushed in commit This comment was created by an AI agent (OpenHands) on behalf of the user. |
|
I'm on it! aivong-openhands can track my progress at all-hands.dev |
Co-authored-by: openhands <openhands@all-hands.dev>
|
Updated the PR check so it now runs all tests under Included in commit Validation: This comment was created by an AI agent (OpenHands) on behalf of the user. |
|
Final update since the last summary:
Conciseness:
Current status:
|
|
This script generated the updates in #631 except for the replicated folder enterprise image override and the automations chart updates |
|
…#623) * upgrade laminar helm chart to address noisy postgres error * bump openhands chart version to 0.7.11
Adds a custom_model field to the KOTS config, plumbs the existing custom_base_url / custom_api_key + new custom_model through to: - OpenHands env (LLM_API_KEY, LLM_BASE_URL, LLM_MODEL=openai/<model>) - LiteLLM env secrets (CUSTOM_API_KEY, CUSTOM_API_BASE) - LiteLLM proxy model_list entry under model_name 'custom-llm', routed via the generic openai/* provider so any OpenAI-compatible endpoint (OpenRouter, vLLM, Ollama, LM Studio, LiteLLM gateway) works. Sets LITELLM_DEFAULT_MODEL=litellm_proxy/custom-llm so new users land on the configured model by default.
Add a regex validation to plugin_directory_marketplace_source so the Replicated console rejects URIs that won't resolve. Only github://, https://, and http:// are supported by the catalog loader; file:// was previously mentioned in help text but isn't a valid runtime input.
…rdening + configmap-checksum auto-rollout) (#625) * PLTF-2504: upgrade laminar helm chart to address rabbitmq disk and quickwit memory issues Bumps the laminar dependency from 0.1.9 to 0.1.10, which pulls in lmnr-ai/lmnr-helm#24 — RabbitMQ self-protection (disk_free_limit.absolute, vm_memory_high_watermark.absolute, larger PVC) and Quickwit memory/PVC tuning. These address out-of-space and memory-limit errors observed in SaaS prod; SaaS staging and Replicated deployments were unaffected. Bumps openhands chart 0.7.11 -> 0.7.12. * PLTF-2504: pin laminar rabbitmq PVC size to 50Gi to unblock helm upgrade The 0.1.10 chart raises the laminar-rabbitmq volumeClaimTemplates storage default from 50Gi to 100Gi. K8s forbids changing volumeClaim- Templates on an existing StatefulSet, so `helm upgrade` from 0.7.11 → 0.7.12 fails with "spec: Forbidden" (observed on a Replicated install). Pinning the value in the parent chart keeps the rendered template identical to what's already deployed at 0.1.9, letting the upgrade go through while picking up every other 0.1.10 fix (RabbitMQ self- protection, console logging, Quickwit hardening, StorageClass allowVolumeExpansion). To grow the actual rabbitmq disk where the backing storage enforces size (e.g. GKE hyperdisk-balanced), patch the live PVC out of band. * PLTF-2504: upgrade laminar helm chart to 0.1.11 for configmap-checksum auto-rollout Bumps laminar from 0.1.10 to 0.1.11, which pulls in lmnr-ai/lmnr-helm#25. That PR adds `checksum/config` pod-template annotations to the rabbitmq StatefulSet and all five quickwit workloads (control-plane, indexer, searcher, metastore, janitor), so `helm upgrade` now rolls those pods automatically whenever their configmaps change. Resolves the operator step required by 0.1.10 in this PR: a manual `kubectl rollout restart sts laminar-rabbitmq` was needed after every helm upgrade that altered the rabbitmq configmap (subPath/init-copy mounts don't live-update from configmap changes). With 0.1.11 the admin-console-driven helm upgrade is sufficient end to end. The rabbitmq.persistence.size=50Gi pin from the prior commit stays — 0.1.11 does not revert the 0.1.10 PVC default bump to 100Gi, and StatefulSet volumeClaimTemplates remain immutable.
|
@OpenHands resolve conflicts |
|
@aivong-openhands your session has expired. Please login again at OpenHands Cloud and try again. |
The four REPLICATED_* tag patterns lived inside update_openhands_values, which only ever ran against charts/openhands/values.yaml — where the KOTS templating they target never appears. They no-op'd silently (error_if_missing=False), so cloud releases shipped the new agent-server tag in the chart values but left replicated/openhands.yaml on the old tag. The bump had to be applied by hand each release (most recently 765e4f2). - Extract the four replicated patterns into a dedicated update_replicated_openhands_values() targeted at REPLICATED_OPENHANDS_PATH with error_if_missing=True, so future pattern drift fails loudly. - Wire it into update_openhands_workflow between the chart-values update and the Chart.yaml bump. Chart bump decision still keys off chart-values changes only — the replicated wrapper isn't part of the helm artifact. - Allow optional comment lines between repository: and tag: in REPLICATED_PROXY_AGENT_SERVER_TAG_PATTERN; the real file documents the non-proxy fallback with a commented-out repository line, which broke the old \s*\n\s* gap. - Mock update_replicated_openhands_values in TestUpdateOpenhandsWorkflow fixtures — without this, dry_run=False workflow tests would write to the real replicated/openhands.yaml on disk. - Update sample fixture to mirror the production file's comment layout so the regex change is exercised end-to-end.
|
|
@OpenHands resolve conflicts |
|
I'm on it! aivong-openhands can track my progress at all-hands.dev |
…harts-for-agent-server-image # Conflicts: # charts/openhands/Chart.lock # charts/openhands/Chart.yaml
|
Merged
Merge commit: This comment was created by an AI agent (OpenHands) on behalf of the user. |
SummaryResolved the merge conflicts on PR #622 as requested. Checklist
Concise change setThe only new commit on the branch is the merge commit itself. No script, workflow, or chart contents were modified beyond what conflict resolution required — nothing extraneous to revert. |
all-hands-bot
left a comment
There was a problem hiding this comment.
🟢 Good taste - Well-architected refactoring with exceptional test coverage (191 tests, 96% mutation kill rate). The shift to reading runtime image tags from sandbox_spec_service.py as the authoritative source is cleaner and more maintainable than parsing workflow env vars.
[RISK ASSESSMENT]
- [Overall PR]
⚠️ Risk Assessment: 🟡 MEDIUM
Changes the source of truth for runtime image tags from workflow env vars to sandbox_spec_service.py, and removes openhands_runtime_image_tag from DeployConfig. The coupling to AGENT_SERVER_IMAGE constant format creates a fragile dependency, but this is acceptable internal coupling within the same organization. Risk is well-mitigated by comprehensive test coverage (191 tests including mutation testing), clear error messages when the pattern isn't found, and no external consumers of the removed field.
VERDICT:
✅ Worth merging: Solid engineering with excellent test discipline. The architectural decision to use sandbox_spec_service.py as the source of truth is pragmatic and reduces indirection.
KEY INSIGHT:
Reading configuration from its point of use (sandbox_spec_service.py) rather than copying it through workflow env vars eliminates a synchronization problem and makes the system more resilient to drift.
Was this automated review useful? React with 👍 or 👎 to this review to help us measure review quality.
Workflow run: https://github.com/OpenHands/OpenHands-Cloud/actions/runs/26205625289
Improve this review? If any feedback above seems incorrect or irrelevant to this repository, you can teach the reviewer to do better:
- Add a
.agents/skills/custom-codereview-guide.mdfile to your branch (or edit it if one already exists) with the/codereviewtrigger and the context the reviewer is missing (e.g., "Security concerns about X do not apply here because Y"). See the customization docs for the required frontmatter format.- Re-request a review - the reviewer reads guidelines from the PR branch, so your changes take effect immediately.
- When your PR is merged, the guideline file goes through normal code review by repository maintainers.
Resolve with AI? Install the iterate skill in your agent and run
/iterateto automatically drive this PR through CI, review, and QA until it's merge-ready.
|
Summary
update_openhands_charts.pyto read the runtime image tag fromsandbox_spec_service.pyin the OpenHands enterprise repo at the cloud release tag, instead of fromOPENHANDS_RUNTIME_IMAGE_TAGin the deploy workflow env varsopenhands_runtime_image_tagfromDeployConfig— the field is no longer neededRUNTIME_TAG_PATTERNandWARM_RUNTIMES_TAG_PATTERNto matchghcr.io/openhands/agent-server(charts switched fromopenhands/runtime)--skip-version-checkflag to continue updating image tags even when charts are already at the latest versionfix: use correct agent-server image tag format (X.Y.Z-python)— replace thecloud-X.Y.Z-nikolaikformat inherited from the old runtime convention with theX.Y.Z-pythonsuffix that agent-server images actually use; centralize the value in sharedRUNTIME_IMAGE_TAG/NEW_RUNTIME_IMAGE_TAGconstantsrefactor: unify error_if_missing handling in update_tag_in_content— adderror_if_missing: bool = Truetoupdate_tag_in_contentmirroring its siblingupdate_all_tags_in_content; replace two... if re.search(PATTERN, content) else contentternaries at the replicated-wrapper call sites with expliciterror_if_missing=False, removing double regex evaluation and restoring API symmetry between the two helpersCoverage gaps closed
TestUpdateRuntimeApiWorkflow+TestUpdateOpenhandsWorkflow— the two orchestration functions previously had no direct tests; new tests exercise their wiring contract (has_changespropagation from values into the chart bump, positional-argument pass-through,dry_runthreading, return values)TestResolveOpenhandsVersion(4 tests) andTestProcessUpdates(3 tests) covering guard-clause branches that were previously untestedTest design improvements
test_openhands_version_bump_when_has_changes,test_bump_runtime_api_version,test_returns_deploy_config_on_success, and a 7-assertion replicated-wrapper test were each split so a failure points at one concernis_unchangedchecks acrossTestUpdateValues,TestUpdateReplicatedOpenhandsWrapperValues, andTestUpdateRuntimeApiValuesso failure messages name the diverging key in the test IDcloud_tag_existsref-format check,main()output format check, andUpdateResultcount-property check (with explicit names rather than derived.rstrip("s") + "_count")test_no_redirect_message_in_output(which asserted on PyGithub's internal log strings) with a direct assertion on the github logger levelbump_patch_version(RUNTIME_API_CHART_FULL_VERSION)instead of hardcoding"0.1.21"so tests survive fixture baseline changesget_deploy_configmock response factories (make_http_error_response,make_json_error_response,make_missing_key_response,make_invalid_base64_response,make_invalid_yaml_response) intoconftest.py; use the sharedmake_workflow_responsefactory inTestGetDeployConfig; extract stub fixtures forresolve_openhands_versionand theprocess_updateschain; reusemock_main_early_exitinTestSkipVersionCheckTestAssertVersionBumped,TestGetDependencyVersion,TestGetChartValue— these tested conftest assertion helpers rather than production code; misuse surfaces immediately in real tests, and the helpers are still exercised?ref=substring assertion implied by exact-URL equality; compare list counts to pre-update values instead of literalsTest plan
--dry-run --skip-version-checkproduces no errors against current chartsget_runtime_image_tag_from_sandbox_spec— 23/24 killed, 1 equivalent mutant ("utf-8"vs"UTF-8")