8 changes: 4 additions & 4 deletions .claude/skills/kagent-dev/references/database-migrations.md
@@ -133,7 +133,7 @@ Files must follow `NNNNNN_description.up.sql` / `NNNNNN_description.down.sql` wi

Every `.up.sql` must have a corresponding `.down.sql` that exactly reverses it. Down migrations are used for rollbacks and by automatic rollback on migration failure. They must be **idempotent** — the two-track rollback logic (roll back core if vector fails) may call them more than once in failure scenarios.

-A down file that never runs is a down file you cannot trust. There are no up-only migrations — a working down has shipped with every migration since the golang-migrate adoption. Exercising every migration up → down → up against the real migration set, to prove the reversal rather than assume it, is a *Target — not yet enforced* (see [Upgrade and rollback testing](#upgrade-and-rollback-testing)).
+A down file that never runs is a down file you cannot trust. There are no up-only migrations — a working down has shipped with every migration since the golang-migrate adoption. The reversal is proven, not assumed: the upgrade round-trip applies `HEAD`'s migrations over a prior release and then reverses them back, asserting the reverted schema matches a clean install of that release and that seeded data survives (see [Upgrade and rollback testing](#upgrade-and-rollback-testing)).

## One Linear History

@@ -205,11 +205,11 @@ These tests catch policy violations at PR time without needing a running databas

## Upgrade and rollback testing

-Static analysis covers file *content*; round-trip tests cover *behavior* against a real Postgres. Beyond `runner_test.go` (rollback and concurrency), two release-to-release tests make the rollback promise real. Both are *Target — not yet enforced*.
+Static analysis covers file *content*; round-trip tests cover *behavior* against a real Postgres. Beyond `runner_test.go` (rollback and concurrency), release-to-release tests make the rollback promise real.

-**Previous-minor round-trip.** Seed a database at the previous minor's latest release with representative data, apply migrations up to `HEAD`, and assert the schema matches a clean `HEAD` install and the data survives; then reverse to the previous minor and assert the schema matches a clean previous-minor install and the data survives. This exercises every changed down file rather than only reviewing it.
+**Previous-release round-trip** (enforced by `TestUpgrade`, run by the `upgrade-tests` CI job). Seed a database at a prior release with representative data, apply migrations up to `HEAD`, and assert the controller rolls out without crashing, the schema matches a clean `HEAD` install, and the data survives; then reverse the migrations back to the prior release and assert the schema matches a clean install of that release and the data survives. It runs against two prior versions — the latest release reachable from `HEAD` and the previous stable line's latest patch (the `release/vX.Y.x` tip) — and `TestRollingUpgradeCompatibility` (the `rolling-upgrade-tests` job) additionally exercises the old-code/new-schema window, with the prior release's controller serving while `HEAD`'s migrations are applied.

-**Query-level backward compatibility.** Run the previous minor's database test suite against a `HEAD`-migrated schema, proving old code's queries run against the newer schema — the exact property [ahead-schema tolerance](#rollback-and-ahead-schema-tolerance) relies on.
+**Query-level backward compatibility** (*Target — not yet enforced*). Run the previous minor's database test suite against a `HEAD`-migrated schema, proving old code's queries run against the newer schema — the exact property [ahead-schema tolerance](#rollback-and-ahead-schema-tolerance) relies on.

## Downstream Extension Model

94 changes: 94 additions & 0 deletions .github/actions/upgrade-test-setup/action.yaml
@@ -0,0 +1,94 @@
name: Upgrade Test Setup

description: >-
  Shared prelude for the upgrade-tests and rolling-upgrade-tests jobs: resolve
  the upgrade-from version (with the prev-stable == adjacent skip) and bring up
  the build/cluster toolchain. Exposes the resolved version and skip flag so the
  caller can gate its test step. The caller MUST run actions/checkout (with
  fetch-depth: 0 + fetch-tags) before this action — a local action is loaded
  from the checked-out workspace and the version resolvers need full history.

inputs:
  upgrade-from:
    description: 'Which release to upgrade from: "adjacent" or "prev-stable".'
    required: true
  buildx-builder-name:
    description: Name for the docker/setup-buildx-action builder.
    required: true
  buildx-version:
    description: Buildx version for docker/setup-buildx-action.
    required: true

outputs:
  skip:
    description: '"true" when this leg is redundant (prev-stable == adjacent) and the caller should skip its test step.'
    value: ${{ steps.resolve.outputs.skip }}
  version:
    description: The resolved upgrade-from version (empty when skip is true).
    value: ${{ steps.resolve.outputs.version }}

runs:
  using: "composite"
  steps:
    # The caller must run actions/checkout (fetch-depth: 0 + fetch-tags) before
    # this action: a local action is loaded from the checked-out workspace, so
    # its files don't exist on the runner until checkout has run.
    - name: Resolve upgrade-from version
      id: resolve
      shell: bash
      run: |
        ADJ="$(./scripts/upgrade-from-version.sh)"
        if [ "${{ inputs.upgrade-from }}" = "prev-stable" ]; then
          V="$(./scripts/prev-stable-version.sh)"
          if [ -z "$V" ]; then
            echo "no stable line below the current line; skipping prev-stable leg."
            echo "skip=true" >> "$GITHUB_OUTPUT"
            exit 0
          fi
          if [ "$V" = "$ADJ" ]; then
            echo "prev-stable ($V) == adjacent; skipping (covered by the adjacent leg)."
            echo "skip=true" >> "$GITHUB_OUTPUT"
            exit 0
          fi
        else
          V="$ADJ"
        fi
        echo "Upgrade-from target: $V"
        echo "version=$V" >> "$GITHUB_OUTPUT"
    - name: Initialize Environment
      if: steps.resolve.outputs.skip != 'true'
      uses: ./.github/actions/initialize-environment
    - name: Allow unprivileged user namespaces
      if: steps.resolve.outputs.skip != 'true'
      shell: bash
      run: |
        sudo sysctl -w kernel.apparmor_restrict_unprivileged_userns=0 || true
    - name: Set up QEMU
      if: steps.resolve.outputs.skip != 'true'
      uses: docker/setup-qemu-action@v4
      with:
        platforms: linux/amd64,linux/arm64
    - name: Set up Docker Buildx
      if: steps.resolve.outputs.skip != 'true'
      uses: docker/setup-buildx-action@v4
      with:
        name: ${{ inputs.buildx-builder-name }}
        version: ${{ inputs.buildx-version }}
        platforms: linux/amd64,linux/arm64
        use: "true"
        driver-opts: network=host
    - name: Set up Helm
      if: steps.resolve.outputs.skip != 'true'
      uses: azure/setup-helm@v5.0.0
      with:
        version: v3.18.0
    - name: Install Kind
      if: steps.resolve.outputs.skip != 'true'
      uses: helm/kind-action@ef37e7f390d99f746eb8b610417061a60e82a6cc
      with:
        install_only: true
    - name: Create Kind cluster
      if: steps.resolve.outputs.skip != 'true'
      shell: bash
      run: |
        make create-kind-cluster
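
The skip decision in the `Resolve upgrade-from version` step has three outcomes, and it can be easier to reason about them outside a workflow run. The following is a minimal sketch, not code from this PR: `resolve_upgrade_from` is a hypothetical helper that takes the two resolver results as arguments (instead of calling `scripts/upgrade-from-version.sh` and `scripts/prev-stable-version.sh`) and prints the `key=value` line instead of appending it to `$GITHUB_OUTPUT`. The version strings in the usage lines are made up.

```shell
#!/usr/bin/env bash
# resolve_upgrade_from MODE ADJACENT PREV_STABLE
# Hypothetical helper (not part of the repository). It mirrors the branch
# structure of the composite action's resolve step:
#   - MODE=prev-stable and PREV_STABLE empty      -> skip
#   - MODE=prev-stable and PREV_STABLE == ADJACENT -> skip
#   - MODE=prev-stable otherwise                   -> version=PREV_STABLE
#   - any other MODE                               -> version=ADJACENT
resolve_upgrade_from() {
  local mode="$1" adjacent="$2" prev_stable="$3"
  if [ "$mode" = "prev-stable" ]; then
    if [ -z "$prev_stable" ] || [ "$prev_stable" = "$adjacent" ]; then
      echo "skip=true"
      return 0
    fi
    echo "version=$prev_stable"
  else
    echo "version=$adjacent"
  fi
}

# Made-up version numbers, for illustration only.
resolve_upgrade_from adjacent    0.8.1 0.7.5   # prints version=0.8.1
resolve_upgrade_from prev-stable 0.8.1 0.7.5   # prints version=0.7.5
resolve_upgrade_from prev-stable 0.8.1 0.8.1   # prints skip=true
resolve_upgrade_from prev-stable 0.8.1 ""      # prints skip=true
```

Passing the resolver results in as arguments keeps the sketch independent of git history and tags, which the real step needs and which is why the caller must check out with `fetch-depth: 0`.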
96 changes: 96 additions & 0 deletions .github/workflows/ci.yaml
@@ -160,6 +160,102 @@ jobs:
          echo "::error::Kubectl logs -n kagent deployment/kagent-controller"
          kubectl logs -n kagent deployment/kagent-controller

  upgrade-tests:
    needs:
      - setup
    env:
      VERSION: v0.0.1-test
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix:
        # adjacent: the latest release reachable from HEAD (upgrade-from-version.sh)
        # prev-stable: the previous stable line's latest patch (prev-stable-version.sh)
        upgrade-from: [adjacent, prev-stable]
    steps:
      - name: Checkout repository
        uses: actions/checkout@v6
        with:
          # Full history + tags so the version resolvers can derive the
          # upgrade-from release, and so the local action below is on disk.
          fetch-depth: 0
          fetch-tags: true
      - name: Prepare upgrade test environment
        id: prep
        uses: ./.github/actions/upgrade-test-setup
        with:
          upgrade-from: ${{ matrix.upgrade-from }}
          buildx-builder-name: ${{ env.BUILDX_BUILDER_NAME }}
          buildx-version: ${{ env.BUILDX_VERSION }}
      - name: Run upgrade tests
        if: steps.prep.outputs.skip != 'true'
        env:
          OPENAI_API_KEY: fake
          BUILDX_BUILDER_NAME: ${{ env.BUILDX_BUILDER_NAME }}
          KAGENT_HELM_EXTRA_ARGS: --cleanup-on-fail=false
          DOCKER_BUILD_ARGS: >-
            --cache-from=type=gha,scope=${{ needs.setup.outputs.cache-key }}-e2e
            --cache-from=type=gha,scope=${{ env.CACHE_KEY_PREFIX }}-main-e2e
            --platform=linux/amd64
            --push
        run: |
          make run-upgrade-tests UPGRADE_FROM_VERSION="${{ steps.prep.outputs.version }}"
      - name: fail print info
        if: failure() && steps.prep.outputs.skip != 'true'
        run: |
          echo "::error::Failed to run upgrade tests"
          kubectl describe pods -n kagent
          kubectl get events -n kagent
          kubectl logs -n kagent deployment/kagent-controller || true

  rolling-upgrade-tests:
    needs:
      - setup
    env:
      VERSION: v0.0.1-test
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix:
        # adjacent: the latest release reachable from HEAD (upgrade-from-version.sh)
        # prev-stable: the previous stable line's latest patch (prev-stable-version.sh)
        upgrade-from: [adjacent, prev-stable]
    steps:
      - name: Checkout repository
        uses: actions/checkout@v6
        with:
          # Full history + tags so the version resolvers can derive the
          # upgrade-from release, and so the local action below is on disk.
          fetch-depth: 0
          fetch-tags: true
      - name: Prepare upgrade test environment
        id: prep
        uses: ./.github/actions/upgrade-test-setup
        with:
          upgrade-from: ${{ matrix.upgrade-from }}
          buildx-builder-name: ${{ env.BUILDX_BUILDER_NAME }}
          buildx-version: ${{ env.BUILDX_VERSION }}
      - name: Run rolling upgrade tests
        if: steps.prep.outputs.skip != 'true'
        env:
          OPENAI_API_KEY: fake
          BUILDX_BUILDER_NAME: ${{ env.BUILDX_BUILDER_NAME }}
          KAGENT_HELM_EXTRA_ARGS: --cleanup-on-fail=false
          DOCKER_BUILD_ARGS: >-
            --cache-from=type=gha,scope=${{ needs.setup.outputs.cache-key }}-e2e
            --cache-from=type=gha,scope=${{ env.CACHE_KEY_PREFIX }}-main-e2e
            --platform=linux/amd64
            --push
        run: |
          make run-rolling-upgrade-tests UPGRADE_FROM_VERSION="${{ steps.prep.outputs.version }}"
      - name: fail print info
        if: failure() && steps.prep.outputs.skip != 'true'
        run: |
          echo "::error::Failed to run rolling upgrade tests"
          kubectl describe pods -n kagent
          kubectl get events -n kagent
          kubectl logs -n kagent deployment/kagent-controller || true

  go-unit-tests:
    runs-on: ubuntu-latest
    steps:
91 changes: 91 additions & 0 deletions Makefile
@@ -460,6 +460,97 @@ helm-uninstall: ## Uninstall kagent and kagent-crds Helm releases from the kind
	helm uninstall kagent --namespace kagent --kube-context kind-$(KIND_CLUSTER_NAME) --wait
	helm uninstall kagent-crds --namespace kagent --kube-context kind-$(KIND_CLUSTER_NAME) --wait

# Upgrade test targets install the previous released kagent chart from the public
# OCI registry, build the current images, then run the e2e assertions in
# go/core/test/e2e/upgrade. The Go test performs the actual upgrade to the current
# build by invoking `make helm-install-provider`. UPGRADE_FROM_VERSION defaults to
# the latest release reachable from HEAD (scripts/upgrade-from-version.sh); CI runs
# this against two targets via a matrix — that adjacent release and the previous
# stable line's latest patch (scripts/prev-stable-version.sh) — and you can pin
# either locally, e.g. `UPGRADE_FROM_VERSION=$$(./scripts/prev-stable-version.sh)`.
# The previous install pins the bundled Postgres image to whatever the
# upgrade-from release's own install target shipped (see PREV_DB_SET_FLAGS), so
# the baseline matches how that release actually runs rather than a hardcoded
# guess; the upgrade then exercises the real app/migration (and any DB image)
# change between that release and the current build.
#
# Prerequisite (provided by CI as a separate step; run it locally first): a kind
# cluster (make create-kind-cluster). agent-sandbox is not required — the
# controller tolerates the missing CRD and these tests create no SandboxAgents.
UPGRADE_FROM_VERSION ?= $(shell ./scripts/upgrade-from-version.sh)

# The bundled-Postgres image is selected by the install target's --set flags, not
# by the chart defaults (the chart ships a non-vector image). So the previous
# install must use the exact pins the upgrade-from release shipped — otherwise the
# baseline DB would differ from how that release actually runs, and the upgrade
# would conflate a DB swap with the migration change under test. Read those flags
# straight from that release's own helm-install-provider target (via its tagged
# Makefile) instead of hardcoding values that drift as the bundled image changes.
# Assumes the flags are literal (no make/env variables); the guard in
# install-previous-release fails loudly if they can't be read.
PREV_DB_SET_FLAGS = $(shell git show v$(UPGRADE_FROM_VERSION):Makefile 2>/dev/null | \
	grep -oE '\-\-set[[:space:]]+database\.postgres\.[^[:space:]\\]+')
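
To see what the `grep -oE` extraction keeps and what it drops, the pipeline can be tried on a small input without any git history. This is a sketch under stated assumptions: `extract_db_set_flags` and `sample_makefile` are hypothetical names, the sample recipe and its `database.postgres.image.*` keys and values are invented for illustration, and the pattern is passed with `-e '--set…'` rather than the Makefile's `'\-\-set…'` spelling (same match, but it avoids escaping the dashes).

```shell
#!/usr/bin/env bash
# extract_db_set_flags: read Makefile text on stdin and print every
# "--set database.postgres.<key>=<value>" flag, one match per line.
# The value stops at whitespace or at a backslash, so a trailing
# line-continuation "\" is never captured as part of the flag.
extract_db_set_flags() {
  grep -oE -e '--set[[:space:]]+database\.postgres\.[^[:space:]\\]+'
}

# Hypothetical excerpt standing in for `git show v<version>:Makefile`.
# The chart keys and values below are made up.
sample_makefile() {
  cat <<'EOF'
helm-install-provider:
    helm upgrade --install kagent ./helm/kagent \
      --set ui.service.type=LoadBalancer \
      --set database.postgres.image.name=example-pg \
      --set database.postgres.image.tag=17\
      --timeout 5m
EOF
}

# Prints only the two database.postgres flags; the ui.service flag and
# the continuation backslashes are left out.
sample_makefile | extract_db_set_flags
```

Because the match is purely textual, a flag whose value is a make or environment variable would be copied unexpanded, which is the literal-flags assumption the Makefile comment calls out.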

.PHONY: install-previous-release
install-previous-release: ## Install the previous released kagent + kagent-crds charts from the public OCI registry
	test -n "$(UPGRADE_FROM_VERSION)" || { echo "UPGRADE_FROM_VERSION is empty; set it explicitly or ensure git tags are fetched." >&2; exit 1; }
	test -n "$(strip $(PREV_DB_SET_FLAGS))" || { echo "Could not read bundled-Postgres --set flags from v$(UPGRADE_FROM_VERSION):Makefile; the upgrade-from release's install target may have moved or renamed them." >&2; exit 1; }
	@echo "=== Installing previous release: $(UPGRADE_FROM_VERSION) ==="
	@echo " bundled-Postgres flags (from v$(UPGRADE_FROM_VERSION) install target): $(PREV_DB_SET_FLAGS)"
	helm upgrade --install kagent-crds $(HELM_REPO)/kagent/helm/kagent-crds \
		--version $(UPGRADE_FROM_VERSION) \
		--namespace kagent --create-namespace \
		--kube-context kind-$(KIND_CLUSTER_NAME) \
		--timeout 5m --wait
	helm upgrade --install kagent $(HELM_REPO)/kagent/helm/kagent \
		--version $(UPGRADE_FROM_VERSION) \
		--namespace kagent --create-namespace \
		--kube-context kind-$(KIND_CLUSTER_NAME) \
		--timeout 5m --wait \
		--set ui.service.type=LoadBalancer \
		--set controller.service.type=LoadBalancer \
		--set providers.default=openAI \
		--set providers.openAI.apiKey="$${OPENAI_API_KEY:-test}" \
		$(PREV_DB_SET_FLAGS) \
		$(UPGRADE_PREV_EXTRA_ARGS)

# run-upgrade-tests installs the previous release, builds the current images, and
# runs the DB-layer upgrade scenario in TestUpgrade: seed -> upgrade -> controller
# rollout (no crash) -> data survival -> schema-equivalence (upgraded == clean
# install) -> reverse schema to target (down files) + data survival.
# Prerequisite (provided by CI as a separate step; run it locally first): a kind
# cluster (make create-kind-cluster). The controller tolerates the missing
# agent-sandbox CRD (the owned-resource watch is skipped), and these tests create
# no SandboxAgents, so agent-sandbox is not required.
.PHONY: run-upgrade-tests
run-upgrade-tests: ## Install the previous release, build current images, and run the upgrade test (migration round-trip)
	test -n "$(UPGRADE_FROM_VERSION)" || { echo "UPGRADE_FROM_VERSION is empty; set it explicitly or ensure git tags are fetched." >&2; exit 1; }
	$(MAKE) build
	$(MAKE) install-previous-release
	@echo "=== Upgrade test: $(UPGRADE_FROM_VERSION) -> $(VERSION) (registry=$(DOCKER_REGISTRY)) ==="
	cd go && \
		RUN_UPGRADE_TESTS=true \
		UPGRADE_FROM_VERSION=$(UPGRADE_FROM_VERSION) \
		VERSION=$(VERSION) \
		DOCKER_REGISTRY=$(DOCKER_REGISTRY) \
		KIND_CLUSTER_NAME=$(KIND_CLUSTER_NAME) \
		OPENAI_API_KEY="$${OPENAI_API_KEY:-test}" \
		go test ./core/test/e2e/upgrade -run TestUpgrade -count=1 -timeout=20m -v

.PHONY: run-rolling-upgrade-tests
run-rolling-upgrade-tests: ## Install the previous release with 2 controller replicas, build the current images, and run the rolling upgrade e2e test
	$(MAKE) build
	$(MAKE) install-previous-release UPGRADE_PREV_EXTRA_ARGS="--set controller.replicas=2"
	@echo "=== Rolling upgrade test: $(UPGRADE_FROM_VERSION) -> $(VERSION) (registry=$(DOCKER_REGISTRY)) ==="
	cd go && \
		RUN_ROLLING_UPGRADE_TESTS=true \
		UPGRADE_FROM_VERSION=$(UPGRADE_FROM_VERSION) \
		VERSION=$(VERSION) \
		DOCKER_REGISTRY=$(DOCKER_REGISTRY) \
		KIND_CLUSTER_NAME=$(KIND_CLUSTER_NAME) \
		OPENAI_API_KEY="$${OPENAI_API_KEY:-test}" \
		go test ./core/test/e2e/upgrade -run TestRollingUpgradeCompatibility -count=1 -timeout=20m -v
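
The Makefile comments describe a two-step local flow: create the kind cluster first, then run the upgrade target with a pinned `UPGRADE_FROM_VERSION`. A minimal sketch of that sequence follows. `run_upgrade_locally` is a hypothetical wrapper, and `MAKE_BIN` is an assumption introduced only so the sequence can be dry-run (for example with `MAKE_BIN=echo`); neither exists in the repository.

```shell
#!/usr/bin/env bash
# run_upgrade_locally VERSION
# Hypothetical wrapper (not part of the repository) around the two make
# targets the Makefile comments name: create-kind-cluster, then
# run-upgrade-tests with UPGRADE_FROM_VERSION pinned on the command line.
# MAKE_BIN is an illustrative override; it defaults to the real `make`.
run_upgrade_locally() {
  local from_version="$1"
  local make_bin="${MAKE_BIN:-make}"
  if [ -z "$from_version" ]; then
    echo "usage: run_upgrade_locally <upgrade-from-version>" >&2
    return 1
  fi
  # Stop if the cluster cannot be created; the tests need it.
  "$make_bin" create-kind-cluster &&
    "$make_bin" run-upgrade-tests "UPGRADE_FROM_VERSION=$from_version"
}
```

From the repository root, `run_upgrade_locally "$(./scripts/prev-stable-version.sh)"` would pin the previous stable line, matching the example in the Makefile comment. Swapping `run-upgrade-tests` for `run-rolling-upgrade-tests` gives the rolling variant.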

.PHONY: helm-publish
helm-publish: ## Package and push all Helm charts to the OCI registry
helm-publish: helm-version