Add Steve watch stress test with SQLite caching#148

Open
Copilot wants to merge 2 commits into main from copilot/add-steve-watch-stress-test

Conversation

Contributor

Copilot AI commented Dec 15, 2025

Implements a k6 stress test for Steve's watch functionality with SQLite caching enabled, converting the stress test gist into a parametrized, parallel k6 test.

Test Architecture

Five parallel scenarios:

  • Watchers: 2000 concurrent WebSocket connections (configurable via COUNT) subscribing to configmaps, secrets, and CRDs. Connections persist for full test duration (default 10 min) with ±5% jitter.
  • Create/Delete Events: Continuous churn at 10 ops/sec - creates/deletes configmaps, secrets, and CRD instances with 1MB payloads via Kubernetes API.
  • CRD Updates: Schema updates every 3 seconds to stress cache invalidation.
  • Light Read Test: 1 req/sec to ensure Steve stays responsive (<100ms p95).
  • WAL Size Check: Monitors SQLite WAL file size every 10 seconds via Kubernetes exec API.
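A minimal sketch of how the five scenarios above could be declared with k6 executors. The scenario keys and every `exec` name except `checkWALSizeScenario` are hypothetical, as is the `preAllocatedVUs` value; the PR's actual layout may differ. The object is built by a plain function so it can be assigned to `options.scenarios` in a k6 script.

```javascript
// Sketch only: scenario keys and most exec function names are hypothetical.

// Apply the +/-5% jitter mentioned for watcher connection lifetimes.
// `random` is injectable so the behaviour can be checked deterministically.
function jitteredSeconds(baseSeconds, random = Math.random) {
  return baseSeconds * (0.95 + 0.1 * random());
}

function buildScenarios(count = 2000, durationSeconds = 600) {
  const duration = `${durationSeconds}s`;
  // Fixed-rate scenario: `rate` iterations started per `every` time unit.
  const periodic = (exec, rate, every) => ({
    executor: 'constant-arrival-rate',
    exec,
    rate,
    timeUnit: every,
    duration,
    preAllocatedVUs: 5, // assumed value, tune for the target cluster
  });
  return {
    // One long-lived iteration per VU, each holding one WebSocket open.
    watchers: {
      executor: 'per-vu-iterations',
      exec: 'watchScenario',
      vus: count,
      iterations: 1,
      // 10% headroom so the +5% jitter never hits the hard limit.
      maxDuration: `${Math.ceil((durationSeconds * 11) / 10)}s`,
    },
    createDeleteEvents: periodic('createDeleteScenario', 10, '1s'),
    crdUpdates: periodic('updateCRDsScenario', 1, '3s'),
    lightRead: periodic('lightReadScenario', 1, '1s'),
    walSizeCheck: periodic('checkWALSizeScenario', 1, '10s'),
  };
}
```

In a k6 script this would typically be used as `export const options = { scenarios: buildScenarios(Number(__ENV.COUNT || 2000)) }`.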

Success Criteria

  • Steve p95 response time < 100ms
  • SQLite WAL file size < 10MB (validated automatically via Kubernetes API)
  • Operation success rate > 95%

Implementation

k6/generic/k8s.js

  • Added exec() function implementing Kubernetes exec API via WebSocket
  • Uses SPDY protocol (v4.channel.k8s.io subprotocol)
  • Handles stdout/stderr streams and returns structured output
  • Reusable for any k6 test needing pod command execution
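To illustrate what such an exec helper has to do, here is a rough sketch of the two protocol-level pieces: building the pod exec URL and splitting received frames by channel. It is not the PR's code. The function names are hypothetical, and the decoder assumes ASCII output. In the channel.k8s.io family of subprotocols, the first byte of each binary message is the channel id (1 for stdout, 2 for stderr, 3 for the status/error channel).

```javascript
// Channel ids used by the channel.k8s.io subprotocols.
const CHANNELS = { 1: 'stdout', 2: 'stderr', 3: 'status' };

// Build the WebSocket URL for the pod exec subresource.
// Each argv element becomes its own repeated `command` query parameter.
function buildExecUrl(baseUrl, namespace, pod, container, command) {
  const args = command.map((c) => `command=${encodeURIComponent(c)}`).join('&');
  const wsBase = baseUrl.replace(/^http/, 'ws'); // https -> wss, http -> ws
  return `${wsBase}/api/v1/namespaces/${namespace}/pods/${pod}/exec` +
    `?${args}&container=${encodeURIComponent(container)}&stdout=true&stderr=true`;
}

// Split binary frames into per-channel text.
// Assumption: payloads are ASCII; multi-byte UTF-8 would need a real decoder.
function demuxExecFrames(frames) {
  const out = { stdout: '', stderr: '', status: '' };
  for (const frame of frames) {
    const bytes = new Uint8Array(frame);
    if (bytes.length === 0) continue;
    const name = CHANNELS[bytes[0]];
    if (!name) continue; // skip stdin (0), resize (4) and unknown channels
    let text = '';
    for (let i = 1; i < bytes.length; i++) {
      text += String.fromCharCode(bytes[i]);
    }
    out[name] += text;
  }
  return out;
}
```

A caller would open the URL with the `v4.channel.k8s.io` subprotocol requested, collect binary messages until the socket closes, then pass them to `demuxExecFrames`.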

k6/tests/steve_watch_stress_test.js (380 lines)

  • All CRUD operations via Kubernetes API, all watches via Steve WebSocket API
  • Pre-creates 1MB base64 blob (750KB raw * 4/3) for efficiency
  • Proper error handling for resource deletions to prevent leaks
  • Uses implicit TLS auth from k6 options (consistent with existing tests)
  • Integrated WAL size monitoring via checkWALSizeScenario() using k8s.exec()
  • Clean, concise code with minimal comments
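The blob sizing in the list above follows from base64 encoding every 3 raw bytes as 4 characters. A short worked sketch of that arithmetic, with hypothetical names and assuming "750KB" means 750 KiB:

```javascript
// Base64 emits 4 output characters per 3 input bytes, padding the last group.
function base64EncodedLength(rawBytes) {
  return 4 * Math.ceil(rawBytes / 3);
}

// Assumption: "750KB" is 750 KiB = 768,000 bytes.
const RAW_BYTES = 750 * 1024;
const ENCODED_BYTES = base64EncodedLength(RAW_BYTES); // 1,024,000 characters

// Build the payload once and reuse it for every create call.
// A run of 'A' characters is valid base64 that decodes to zero bytes;
// the decoded size equals rawBytes only when rawBytes is a multiple of 3.
function makeBase64Blob(rawBytes) {
  return 'A'.repeat(base64EncodedLength(rawBytes));
}

const BLOB = makeBase64Blob(RAW_BYTES);
```

Building the string once at init time avoids re-encoding roughly 1MB on each of the 10 operations per second.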

k6/tests/STEVE_WATCH_STRESS_TEST.md

  • Usage examples, configuration reference, troubleshooting
  • Documents all five scenarios including automated WAL checking

Example Usage

k6 run \
  --env STEVE_URL=https://rancher.example.com \
  --env KUBECONFIG=/path/to/kubeconfig \
  --env CONTEXT=my-context \
  --env USERNAME=admin \
  --env PASSWORD=secret \
  --env COUNT=2000 \
  k6/tests/steve_watch_stress_test.js

The test automatically monitors SQLite WAL file size and reports it as the sqlite_wal_size_bytes metric.
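For illustration, the `--env` flags in the command could be read inside the script roughly as follows. This is a sketch, not the PR's code: the `parseConfig` name and the returned field names are hypothetical, and only the `COUNT` default of 2000 comes from the PR. In k6 the environment is exposed as the global `__ENV` object.

```javascript
// Parse test parameters from an environment-like object (k6: __ENV).
function parseConfig(env) {
  const count = parseInt(env.COUNT || '2000', 10);
  if (!Number.isInteger(count) || count <= 0) {
    throw new Error(`COUNT must be a positive integer, got "${env.COUNT}"`);
  }
  return {
    // Strip trailing slashes so paths can be appended safely.
    steveUrl: (env.STEVE_URL || '').replace(/\/+$/, ''),
    kubeconfig: env.KUBECONFIG,
    context: env.CONTEXT,
    username: env.USERNAME,
    password: env.PASSWORD,
    count,
  };
}
```

Within a k6 script the call would be `const config = parseConfig(__ENV);`.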

Also Included

Cherry-picked commits from #145: Makefile verification targets and AGENTS.md.

Original prompt

We want to add a new Steve (https://github.com/rancher/steve) watch stress test - specifically with the SQLite caching option on.

The main point is turning https://gist.github.com/aruiz14/cf279761268a1458cb3838e6f41388ac into a new k6 test file, with JavaScript equivalents of create-events.sh, main.go, and update-crds.sh all run in parallel and well parametrized (especially the -count parameter, default it to 2000).

Take inspiration from k6/tests/steve_watch_benchmark.js for general structure and coding style. Take into account all resource/CRD creation/deletion/update must be done via the Kubernetes API (not via Steve): FYI steve_watch_benchmark supports both, see changeApi. Or check other scripts in this repo. The subscribing/unsubscribing/watching in main.go has to go through Steve instead, as that is actually the main part under stress test.

Keep in mind that the gist assumes it runs against standalone Steve (e.g. cd steve; go run main.go -debug -https-listen-port 0 -sql-cache), while in the context of this project we are testing the Steve instance running inside of Rancher. Behavior should be the same nevertheless.

The script should check, as success criteria:

  • the size of the SQLite WAL file backing Steve, inside of the Rancher pod. It should stay below 10 MB after 10 minutes of stress (overall time limit). Take inspiration from this gist https://gist.github.com/moio/55b1d0e1012cdb5fc04fc158864c71ac on how to run commands inside of a Rancher pod, and do the equivalent via the Kubernetes API.
  • in parallel to the stress test tasks above, also add a light Steve read test (1 request every second), reusing code from k6/tests/api_benchmark.js, to prove Steve does not get overwhelmed. In other words, check via this light test that Steve continues to respond in reasonable time throughout, say no more than 100ms to read everything from test-configmaps.

Before you begin:


Copilot AI changed the title from "[WIP] Add new stress test for Steve with SQLite caching" to "Add Steve watch stress test with SQLite caching" on Dec 15, 2025
Copilot AI requested a review from moio December 15, 2025 13:01
moio force-pushed the copilot/add-steve-watch-stress-test branch from 7963354 to bade267 on December 16, 2025 08:17
Contributor

@moio moio left a comment

@copilot instead of proposing a shell script to check the WAL size, is it possible to implement in the k6 script an equivalent of kubectl exec, given k6 has full Kubernetes API access to the cluster running Rancher?

If that is possible, add such code to k6/generic/k8s.js so that it stays usable.

When you are done, look at k6/tests/steve_watch_stress_test.js and remove any redundant, stale, cruft code/variables and avoid overcommenting.

Contributor Author

Copilot AI commented Dec 16, 2025

@copilot instead of proposing a shell script to check the WAL size, is it possible to implement in the k6 script an equivalent of kubectl exec, given k6 has full Kubernetes API access to the cluster r...

Implemented k8s.exec() in k6/generic/k8s.js using WebSocket with SPDY protocol (v4.channel.k8s.io subprotocol). The function handles command execution in pods via the Kubernetes API.

WAL size checking is now integrated in the test via checkWALSizeScenario() - runs every 10 seconds and reports sqlite_wal_size_bytes metric with 10MB threshold.
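As a rough sketch of the check step, the exec output can be turned into a byte count and compared with the limit. Everything here is an assumption for illustration: the function names are hypothetical, the remote command is assumed to print a bare byte count (as `stat -c %s <file>` does on Linux), and "10MB" is read as 10 MiB. The PR does not state which command or path it uses.

```javascript
// Assumption: "10MB" is interpreted as 10 MiB.
const WAL_LIMIT_BYTES = 10 * 1024 * 1024;

// Parse stdout that is expected to contain only a decimal byte count.
// Returns null when the output is not a plain integer (e.g. file missing).
function parseWalSizeBytes(stdout) {
  const match = /^\s*(\d+)\s*$/.exec(stdout);
  return match ? parseInt(match[1], 10) : null;
}

// True only when a size was parsed and it is below the limit.
function walWithinLimit(stdout, limitBytes = WAL_LIMIT_BYTES) {
  const size = parseWalSizeBytes(stdout);
  return size !== null && size < limitBytes;
}
```

In the scenario function, the parsed value would be recorded on the `sqlite_wal_size_bytes` metric (for example via a k6 Trend's `add()`), leaving the pass/fail decision to the threshold.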

Cleaned up steve_watch_stress_test.js by removing excessive comments, simplifying scenario implementations, and making code more concise.

Commits: 92d92d9, a6a2d63

Copilot AI requested a review from moio December 16, 2025 08:35
- Created k6/tests/steve_watch_stress_test.js with parallel scenarios
- Implements watchers, create/delete events, CRD updates, and light reads
- Added check_wal_size.sh monitoring script for external WAL size validation
- Added comprehensive documentation in STEVE_WATCH_STRESS_TEST.md
- Test parametrized with COUNT defaulting to 2000 concurrent watchers
- All resource operations via Kubernetes API, watches via Steve

Co-authored-by: moio <[email protected]>
moio force-pushed the copilot/add-steve-watch-stress-test branch from a6a2d63 to 7059632 on December 16, 2025 17:33
moio marked this pull request as ready for review on December 16, 2025 17:33