Add Steve watch stress test with SQLite caching by Copilot · Pull Request #148 · rancher/dartboard

Copilot · 2025-12-15T12:34:47Z

Implements a k6 stress test for Steve's watch functionality with SQLite caching enabled, converting the stress test gist into a parametrized, parallel k6 test.

Test Architecture

Five parallel scenarios:

Watchers: 2000 concurrent WebSocket connections (configurable via COUNT) subscribing to configmaps, secrets, and CRDs. Connections persist for full test duration (default 10 min) with ±5% jitter.
Create/Delete Events: Continuous churn at 10 ops/sec, creating and deleting configmaps, secrets, and CRD instances with 1MB payloads via the Kubernetes API.
CRD Updates: Schema updates every 3 seconds to stress cache invalidation.
Light Read Test: 1 req/sec to ensure Steve stays responsive (<100ms p95).
WAL Size Check: Monitors SQLite WAL file size every 10 seconds via Kubernetes exec API.
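The five scenarios map naturally onto k6's `scenarios` option. The sketch below illustrates that mapping and is not the PR's code. The scenario keys, the exec function names other than `checkWALSizeScenario`, and the pre-allocated VU counts are assumptions. It also shows one way to compute the ±5% jittered watcher lifetime.

```javascript
// Sketch only: scenario keys, exec names (except checkWALSizeScenario) and
// preAllocatedVUs values are assumptions, not taken from the PR.
function buildScenarios(count = 2000, durationMinutes = 10) {
  const durationSec = durationMinutes * 60;
  const duration = `${durationSec}s`;
  // Watchers may live up to 5% longer than the test duration; add 30s slack.
  const graceSec = Math.ceil(durationSec / 20) + 30;

  // Shared shape for the fixed-rate scenarios.
  const atRate = (exec, rate, timeUnit, preAllocatedVUs) => ({
    executor: 'constant-arrival-rate',
    exec,
    rate,
    timeUnit,
    duration,
    preAllocatedVUs,
  });

  return {
    // One VU per watcher: each opens a single Steve WebSocket and holds it.
    watchers: {
      executor: 'per-vu-iterations',
      exec: 'watchScenario', // hypothetical name
      vus: count,
      iterations: 1,
      maxDuration: `${durationSec + graceSec}s`,
    },
    createDeleteEvents: atRate('createDeleteScenario', 10, '1s', 20), // hypothetical name
    crdUpdates: atRate('updateCRDScenario', 1, '3s', 2), // hypothetical name
    lightRead: atRate('lightReadScenario', 1, '1s', 2), // hypothetical name
    walSizeCheck: atRate('checkWALSizeScenario', 1, '10s', 1),
  };
}

// Lifetime of one watcher connection: baseMs scaled by a uniform factor in
// [1 - fraction, 1 + fraction). `rand` is injectable for testing.
function jitteredDurationMs(baseMs, fraction = 0.05, rand = Math.random) {
  return baseMs * (1 + (2 * rand() - 1) * fraction);
}
```

In a k6 script this would typically be wired up as `export const options = { scenarios: buildScenarios(Number(__ENV.COUNT || 2000)) }`, with each watcher VU closing its socket after `jitteredDurationMs(...)`. Using `per-vu-iterations` with a single iteration per VU keeps exactly COUNT connections open. The churn VU counts would likely need tuning, since 1MB creates can take longer than the arrival interval.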

Success Criteria

Steve p95 response time < 100ms
SQLite WAL file size < 10MB (validated automatically via Kubernetes API)
Operation success rate > 95%
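These criteria could be expressed as k6 thresholds so that a run fails automatically when any of them is violated. A minimal sketch, assuming the light-read scenario is keyed `lightRead`, `sqlite_wal_size_bytes` is recorded as a k6 `Gauge`, "10MB" means 10 MiB, and operation success is tracked through `check()` calls:

```javascript
// Sketch only. Assumptions: scenario key "lightRead", sqlite_wal_size_bytes
// is a Gauge, 10MB = 10 MiB, operation success is measured via check().
const WAL_LIMIT_BYTES = 10 * 1024 * 1024;

function buildThresholds(walLimitBytes = WAL_LIMIT_BYTES) {
  return {
    // Only requests tagged with the light-read scenario count towards this p95
    // (k6 adds the "scenario" tag to every sample by default).
    'http_req_duration{scenario:lightRead}': ['p(95)<100'],
    // For a Gauge, "value" is the last recorded sample, i.e. the WAL size
    // at the end of the run.
    sqlite_wal_size_bytes: [`value<${walLimitBytes}`],
    // Built-in Rate metric fed by check() results.
    checks: ['rate>0.95'],
  };
}
```

If the WAL limit must hold at every sample rather than at the end, recording the size into a `Trend` and using `max<` would be the stricter variant.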

Implementation

k6/generic/k8s.js

Added an exec() function implementing the Kubernetes exec API over WebSocket
Uses the Kubernetes WebSocket streaming protocol (v4.channel.k8s.io subprotocol)
Handles stdout/stderr streams and returns structured output
Reusable for any k6 test needing pod command execution
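With the v4.channel.k8s.io subprotocol, the API server multiplexes all streams over one WebSocket: the first byte of each binary frame is the channel number (1 = stdout, 2 = stderr, 3 = error channel carrying a JSON `Status` when the command ends). The sketch below shows how such frames could be turned into structured output. `demuxExecFrames` is a hypothetical name and this is not necessarily how `k8s.exec()` is structured.

```javascript
// Channel numbers used by the Kubernetes WebSocket streaming protocol.
const STDOUT = 1;
const STDERR = 2;
const ERROR = 3;

// Decode bytes one char per byte; fine for short ASCII output such as
// `stat` results, not for arbitrary UTF-8.
function bytesToString(bytes) {
  let s = '';
  for (let i = 0; i < bytes.length; i++) s += String.fromCharCode(bytes[i]);
  return s;
}

// Split binary frames (ArrayBuffer or Uint8Array) into per-stream output.
function demuxExecFrames(frames) {
  const out = { stdout: '', stderr: '', status: null };
  let errorText = '';
  for (const frame of frames) {
    const bytes = frame instanceof Uint8Array ? frame : new Uint8Array(frame);
    if (bytes.length === 0) continue;
    const payload = bytesToString(bytes.subarray(1));
    switch (bytes[0]) {
      case STDOUT: out.stdout += payload; break;
      case STDERR: out.stderr += payload; break;
      case ERROR: errorText += payload; break;
      default: break; // other channels (stdin, resize) are ignored here
    }
  }
  // The error channel holds a JSON Status object (e.g. {"status":"Success"}).
  if (errorText) out.status = JSON.parse(errorText);
  return out;
}
```

In k6, the frames would come from the `binaryMessage` event of a `k6/ws` socket, which delivers an `ArrayBuffer`.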

k6/tests/steve_watch_stress_test.js (380 lines)

All CRUD operations via Kubernetes API, all watches via Steve WebSocket API
Pre-creates 1MB base64 blob (750KB raw * 4/3) for efficiency
Proper error handling for resource deletions to prevent leaks
Uses implicit TLS auth from k6 options (consistent with existing tests)
Integrated WAL size monitoring via checkWALSizeScenario() using k8s.exec()
Clean, concise code with minimal comments
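The blob sizing can be sketched as follows. Base64 output is `4 * ceil(n / 3)` characters for `n` raw bytes, so 750 KiB of raw data (an assumption about what "750KB" means) encodes to 1,024,000 characters, just under the 1 MiB size limit for a ConfigMap or Secret. Using a run of `'A'` characters (valid base64 for zero bytes) instead of encoding random data is a simplification of this sketch; with real data, `k6/encoding`'s `b64encode` could be used. The manifest builders are hypothetical names.

```javascript
// Length of the padded base64 encoding of n raw bytes.
function base64Length(n) {
  return 4 * Math.ceil(n / 3);
}

// Built once at init time and reused by every create call.
// Assumption: 750KB means 750 KiB.
const RAW_BYTES = 750 * 1024;
const BLOB = 'A'.repeat(base64Length(RAW_BYTES)); // decodes to RAW_BYTES zero bytes

// Hypothetical manifest builders for the churn scenario.
function configMapManifest(name, namespace) {
  return {
    apiVersion: 'v1',
    kind: 'ConfigMap',
    metadata: { name, namespace },
    data: { blob: BLOB }, // any string is valid ConfigMap data
  };
}

function secretManifest(name, namespace) {
  return {
    apiVersion: 'v1',
    kind: 'Secret',
    type: 'Opaque',
    metadata: { name, namespace },
    data: { blob: BLOB }, // Secret data values must already be base64
  };
}
```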

k6/tests/STEVE_WATCH_STRESS_TEST.md

Usage examples, configuration reference, troubleshooting
Documents all five scenarios including automated WAL checking

Example Usage

k6 run \
  --env STEVE_URL=https://rancher.example.com \
  --env KUBECONFIG=/path/to/kubeconfig \
  --env CONTEXT=my-context \
  --env USERNAME=admin \
  --env PASSWORD=secret \
  --env COUNT=2000 \
  k6/tests/steve_watch_stress_test.js
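The `--env` flags above end up in k6's `__ENV` object. A hypothetical helper validating them could look like this; which variables are required (and treating CONTEXT as optional) is an assumption of the sketch, while the COUNT default of 2000 comes from the original prompt.

```javascript
// Hypothetical config parser; pass __ENV in k6, or any string map elsewhere.
// Assumption: STEVE_URL, KUBECONFIG, USERNAME and PASSWORD are mandatory.
function parseConfig(env) {
  const required = ['STEVE_URL', 'KUBECONFIG', 'USERNAME', 'PASSWORD'];
  const missing = required.filter((k) => !env[k]);
  if (missing.length > 0) {
    throw new Error(`missing required env: ${missing.join(', ')}`);
  }

  const count = env.COUNT === undefined ? 2000 : Number(env.COUNT);
  if (!Number.isInteger(count) || count <= 0) {
    throw new Error(`COUNT must be a positive integer, got ${env.COUNT}`);
  }

  return {
    steveUrl: env.STEVE_URL.replace(/\/+$/, ''), // drop trailing slashes
    kubeconfig: env.KUBECONFIG,
    context: env.CONTEXT || '', // empty: fall back to the kubeconfig default (assumption)
    username: env.USERNAME,
    password: env.PASSWORD,
    count,
  };
}
```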

The test automatically monitors SQLite WAL file size and reports it as the sqlite_wal_size_bytes metric.
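One way the WAL check could obtain a size is to run `stat -c %s <path>` in the Rancher pod and parse its stdout. This is a sketch under assumptions: the command choice, the availability of GNU `stat` in the Rancher image, and `WAL_PATH` (a placeholder, since the database location inside the pod is not given here) are all hypothetical.

```javascript
// Placeholder path: the real location of Steve's SQLite database inside the
// Rancher pod is not stated here. SQLite names the WAL "<db file>-wal".
const WAL_PATH = '/path/to/steve.db-wal';

// argv for the exec call (assumes GNU coreutils `stat` in the container).
function walSizeCommand(path = WAL_PATH) {
  return ['stat', '-c', '%s', path];
}

// Parse `stat -c %s` output into bytes; null if it is not a single
// non-negative integer (e.g. the WAL file does not exist yet).
function parseWalSize(stdout) {
  const text = stdout.trim();
  return /^\d+$/.test(text) ? Number(text) : null;
}
```

A non-null result would then be recorded with a `Gauge` from `k6/metrics` named `sqlite_wal_size_bytes`.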

Also Included

Cherry-picked commits from #145: Makefile verification targets and AGENTS.md.

Original prompt

We want to add a new Steve (https://github.com/rancher/steve) watch stress test - specifically with the SQLite caching option on.

The main point is turning https://gist.github.com/aruiz14/cf279761268a1458cb3838e6f41388ac into a new k6 test file, with Javascript equivalents of create-events.sh, main.go, update-crds.sh all run in parallel and well parametrized (especially the -count parameter, default it to 2000).

Take inspiration from k6/tests/steve_watch_benchmark.js for general structure and coding style. Take into account that all resource/CRD creation, deletion, and updates must be done via the Kubernetes API (not via Steve); FYI, steve_watch_benchmark supports both (see changeApi), or check other scripts in this repo. The subscribing/unsubscribing/watching in main.go has to go through Steve instead, as that is actually the main part under stress test.

Keep in mind that the gist assumes it runs against standalone Steve (e.g. cd steve; go run main.go -debug -https-listen-port 0 -sql-cache), while in the context of this project we are testing the Steve instance running inside Rancher; behavior should be the same nevertheless.

The script should check, as success criteria:

the size of the SQLite WAL file backing Steve, inside the Rancher pod. It should stay below 10 MB after 10 minutes of stress (overall time limit). Take inspiration from this gist https://gist.github.com/moio/55b1d0e1012cdb5fc04fc158864c71ac on how to run commands inside a Rancher pod, and do the equivalent via the Kubernetes API

in parallel with the stress test tasks above, also add a light Steve read test (1 every second), reusing code from k6/tests/api_benchmark.js, to prove Steve does not get overwhelmed. IOW, check via this light test that Steve continues to respond in reasonable time throughout, let's say no more than 100ms to read all from test-configmaps
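The prompt's "no more than 100ms" is implemented in the PR as a p95 bound. For illustration, a nearest-rank percentile shows what that check means on a set of read durations; k6 computes `p(95)` itself and may interpolate differently, and both function names are hypothetical.

```javascript
// Nearest-rank percentile of an array of numbers (p in 0..100).
function percentile(values, p) {
  if (values.length === 0) return NaN;
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.max(0, rank - 1)];
}

// True if 95% of Steve reads finished under the limit.
function steveStaysResponsive(durationsMs, limitMs = 100) {
  return percentile(durationsMs, 95) < limitMs;
}
```

With a p95 bound, up to 5% of reads may exceed 100ms (e.g. during a burst of CRD updates) without failing the run, which is more lenient than a hard per-request limit.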

Before you begin:

cherry-pick commits from Refactor CI: Makefile-driven workflows with unified developer/CI/agent instructions #145, which will be merged before this one:

after that, re-read AGENTS.md and follow its contents

ask if you have any questions


Co-authored-by: moio <[email protected]>

moio

@copilot instead of proposing a shell script to check the WAL size, is it possible to implement in the k6 script an equivalent of kubectl exec, given k6 has full Kubernetes API access to the cluster running Rancher?

If that is possible, add such code to k6/generic/k8s.js so that it stays usable.

When you are done, look at k6/tests/steve_watch_stress_test.js and remove any redundant, stale, cruft code/variables and avoid overcommenting.

Copilot · 2025-12-16T08:33:45Z


Implemented k8s.exec() in k6/generic/k8s.js using a WebSocket with the v4.channel.k8s.io subprotocol. The function handles command execution in pods via the Kubernetes API.
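For reference, the exec endpoint is the `pods/exec` subresource, with each argv element passed as a separate `command` query parameter and the subprotocol negotiated via the `Sec-WebSocket-Protocol` header. A sketch of the URL construction follows; `buildExecUrl` is a hypothetical helper, not necessarily what `k8s.exec()` does internally.

```javascript
// Subprotocol requested in the Sec-WebSocket-Protocol header.
const EXEC_SUBPROTOCOL = 'v4.channel.k8s.io';

// Hypothetical helper: WebSocket URL for running `command` in a pod.
function buildExecUrl(apiServer, namespace, pod, command, container) {
  // https -> wss, http -> ws; drop trailing slashes.
  const base = apiServer.replace(/^http/, 'ws').replace(/\/+$/, '');
  // One `command` parameter per argv element, in order.
  const params = command.map((c) => `command=${encodeURIComponent(c)}`);
  if (container) params.push(`container=${encodeURIComponent(container)}`);
  params.push('stdout=true', 'stderr=true');
  const path = `/api/v1/namespaces/${encodeURIComponent(namespace)}` +
    `/pods/${encodeURIComponent(pod)}/exec`;
  return `${base}${path}?${params.join('&')}`;
}
```

With `k6/ws`, the URL would be passed to `ws.connect()` together with `{ headers: { 'Sec-WebSocket-Protocol': EXEC_SUBPROTOCOL } }`, plus whatever authentication the cluster requires.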

WAL size checking is now integrated in the test via checkWALSizeScenario() - runs every 10 seconds and reports sqlite_wal_size_bytes metric with 10MB threshold.

Cleaned up steve_watch_stress_test.js by removing excessive comments, simplifying scenario implementations, and making code more concise.

Commits: 92d92d9, a6a2d63

- Created k6/tests/steve_watch_stress_test.js with parallel scenarios
- Implements watchers, create/delete events, CRD updates, and light reads
- Added check_wal_size.sh monitoring script for external WAL size validation
- Added comprehensive documentation in STEVE_WATCH_STRESS_TEST.md
- Test parametrized with COUNT defaulting to 2000 concurrent watchers
- All resource operations via Kubernetes API, watches via Steve

Co-authored-by: moio <[email protected]>

Copilot AI assigned Copilot and moio Dec 15, 2025

Copilot started work on behalf of moio December 15, 2025 12:35

Copilot AI changed the title from "[WIP] Add new stress test for Steve with SQLite caching" to "Add Steve watch stress test with SQLite caching" Dec 15, 2025

Copilot AI requested a review from moio December 15, 2025 13:01

Copilot finished work on behalf of moio December 15, 2025 13:01

Go build and lint: fix up previous situation

5db7886

Co-authored-by: moio <[email protected]>

moio force-pushed the copilot/add-steve-watch-stress-test branch from 7963354 to bade267 December 16, 2025 08:17

moio requested changes Dec 16, 2025


Copilot started work on behalf of moio December 16, 2025 08:26

Copilot AI requested a review from moio December 16, 2025 08:35

Copilot finished work on behalf of moio December 16, 2025 08:35

moio force-pushed the copilot/add-steve-watch-stress-test branch from a6a2d63 to 7059632 December 16, 2025 17:33

moio marked this pull request as ready for review December 16, 2025 17:33

Copilot wants to merge 2 commits into main from copilot/add-steve-watch-stress-test