From 07a63eb339f02acb75d55dacb5bd3182180837e1 Mon Sep 17 00:00:00 2001
From: Bilal Mutahir <170749017+bong000@users.noreply.github.com>
Date: Sat, 30 May 2026 19:30:03 +0300
Subject: [PATCH] Add AI regression triage article

Signed-off-by: Bilal Mutahir <170749017+bong000@users.noreply.github.com>
---
 ...0260530_ai_regression_triage_in_daytona.md | 264 ++++++++++++++++++
 ..._regression_triage_in_daytona_workflow.svg |  57 ++++
 authors/bilal_mutahir.md                      |  13 +
 ...0260530_definition_ai_regression_triage.md |  23 ++
 4 files changed, 357 insertions(+)
 create mode 100644 articles/20260530_ai_regression_triage_in_daytona.md
 create mode 100644 articles/assets/20260530_ai_regression_triage_in_daytona_workflow.svg
 create mode 100644 authors/bilal_mutahir.md
 create mode 100644 definitions/20260530_definition_ai_regression_triage.md

diff --git a/articles/20260530_ai_regression_triage_in_daytona.md b/articles/20260530_ai_regression_triage_in_daytona.md
new file mode 100644
index 00000000..e93e8ecf
--- /dev/null
+++ b/articles/20260530_ai_regression_triage_in_daytona.md
@@ -0,0 +1,264 @@
+---
+title: 'AI Regression Triage in Daytona'
+description: 'Run Omni Engineer and Claude Engineer in Daytona workspaces to reproduce, patch, and review regressions without leaking secrets.'
+date: 2026-05-30
+author: 'Bilal Mutahir'
+tags: ['Daytona', 'Dev Containers', 'AI Engineering']
+---
+
+# AI Regression Triage in Daytona
+
+When a regression appears in a repository, the slowest part is often not the
+patch. It is rebuilding the same environment twice, collecting enough context
+for a reviewer, and proving that the fix was not made against a machine-specific
+setup. A [Daytona workspace](/definitions/20240819_definition_daytona workspace.md)
+is useful here because every agent starts from the same repository, the same
+[development container](/definitions/20240819_definition_development container.md),
+and the same environment-variable contract.
+
+This article shows a practical workflow for running
+[Omni Engineer](https://github.com/Doriandarko/omni-engineer) and
+[Claude Engineer](https://github.com/Doriandarko/claude-engineer) inside Daytona
+as two separate AI engineering workspaces. Omni Engineer is used for first-pass
+regression triage: reproduce the failure, find the likely file, and sketch the
+patch. Claude Engineer is used as a second-pass reviewer: check the patch plan,
+look for missing tests, and write a short release note. The goal is not to let
+two agents make uncontrolled changes. The goal is to keep the investigation
+reproducible, reviewable, and easy to reset.
+
+The companion Dev Container contributions for this workflow are:
+
+- [Omni Engineer Dev Container PR](https://github.com/Doriandarko/omni-engineer/pull/42)
+- [Claude Engineer Dev Container PR](https://github.com/Doriandarko/claude-engineer/pull/266)
+
+![AI regression triage workflow](assets/20260530_ai_regression_triage_in_daytona_workflow.svg)
+
+## TL;DR
+
+- Use Daytona to create isolated Omni Engineer and Claude Engineer workspaces.
+- Pass `OPENROUTER_API_KEY`, `ANTHROPIC_API_KEY`, and optional `E2B_API_KEY`
+  through environment variables, not committed files.
+- Ask Omni Engineer to reproduce and map the regression before patching.
+- Ask Claude Engineer to review the proposed patch and test plan from a clean
+  workspace.
+- Commit only human-reviewed code, validation commands, and concise notes.
+
+## What the Dev Containers Add
+
+Both upstream repositories are Python projects. The Dev Container files add the
+same basic contract to each project:
+
+- use the official Python 3.11 Dev Containers image;
+- install dependencies from the repository's `requirements.txt`;
+- make Git available in the workspace;
+- expose only environment-variable names, never secret values;
+- give the developer a short attach message with the command to run.
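+
+As a rough illustration of that contract, a `devcontainer.json` along these
+lines would cover all five points. This is a sketch, not the file from either
+PR: the image tag, the Git feature, and the attach message are assumptions, and
+`OPENROUTER_API_KEY` stands in for whichever key the project needs.
+
+```jsonc
+{
+  // Assumed tag for the official Python 3.11 Dev Containers image.
+  "image": "mcr.microsoft.com/devcontainers/python:3.11",
+  // Assumption: Git is provided by the standard Dev Container feature.
+  "features": {
+    "ghcr.io/devcontainers/features/git:1": {}
+  },
+  // Install the project's dependencies after the container is created.
+  "postCreateCommand": "pip install -r requirements.txt",
+  // Only the variable name is committed; the value comes from the local environment.
+  "containerEnv": {
+    "OPENROUTER_API_KEY": "${localEnv:OPENROUTER_API_KEY}"
+  },
+  // Hypothetical attach message pointing at the command to run.
+  "postAttachCommand": "echo 'Run: python main.py'"
+}
+```
+
+Because the file references the variable by name only, it can be committed and
+reviewed without exposing a key.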
+
+For Omni Engineer, the Dev Container passes `OPENROUTER_API_KEY` from the local
+environment:
+
+```json
+"containerEnv": {
+  "OPENROUTER_API_KEY": "${localEnv:OPENROUTER_API_KEY}"
+}
+```
+
+For Claude Engineer, the Dev Container passes `ANTHROPIC_API_KEY` and the
+optional `E2B_API_KEY`, and forwards port `5000` for the Flask web interface:
+
+```json
+"containerEnv": {
+  "ANTHROPIC_API_KEY": "${localEnv:ANTHROPIC_API_KEY}",
+  "E2B_API_KEY": "${localEnv:E2B_API_KEY}"
+}
+```
+
+This keeps the workspace reproducible without putting API keys in
+`.env.example`, commit history, screenshots, or prompt transcripts.
+
+## Set the Required Secrets in Daytona
+
+Start by storing the keys as Daytona environment variables. Use the values from
+your provider dashboards.
+
+```bash
+daytona env set OPENROUTER_API_KEY=sk-or-your-openrouter-key
+daytona env set ANTHROPIC_API_KEY=sk-ant-your-anthropic-key
+daytona env set E2B_API_KEY=your-optional-e2b-key
+```
+
+Verify the variables are available before creating workspaces:
+
+```bash
+daytona env list
+```
+
+If `E2B_API_KEY` is not part of your workflow, leave it unset. Claude Engineer
+can still run its local CLI and web interface without an E2B key.
+
+## Create the Omni Engineer Workspace
+
+After the Dev Container PR is merged, create a Daytona workspace from Omni
+Engineer:
+
+```bash
+daytona create https://github.com/Doriandarko/omni-engineer
+```
+
+The Dev Container installs the Python packages from `requirements.txt`. When the
+workspace opens, run:
+
+```bash
+python main.py
+```
+
+Use Omni Engineer for the first pass. For a regression triage workflow, give it
+a narrow job:
+
+```text
+We are investigating a failing test in a small Python package.
+First, inspect the repository structure. Then identify the failing command,
+run only the focused test, and explain the smallest patch you would try.
+Do not edit files until the failure is reproduced.
+```
+
+This instruction keeps the first workspace focused on evidence gathering. Ask
+for file paths, failing commands, and the suspected behavior change. If Omni
+Engineer proposes a patch before reproducing the failure, stop it and ask for
+the failing command first.
+
+## Create the Claude Engineer Review Workspace
+
+Create a second Daytona workspace from Claude Engineer:
+
+```bash
+daytona create https://github.com/Doriandarko/claude-engineer
+```
+
+Run the CLI:
+
+```bash
+python ce3.py
+```
+
+Or run the web interface:
+
+```bash
+python app.py
+```
+
+Daytona will forward port `5000`, so the browser UI is available without manual
+port wiring. Use this workspace for review, not duplicate patching. Paste a
+short summary from the Omni workspace:
+
+```text
+Review this regression plan:
+- failing command: pytest tests/test_parser.py::test_quotes
+- suspected file: parser/tokenizer.py
+- proposed patch: preserve escaped quotes before splitting tokens
+- proposed tests: add escaped single and double quote cases
+
+Look for missing edge cases and suggest a minimal validation checklist.
+```
+
+This second-pass review is valuable because it starts from a clean environment.
+It is less likely to inherit a half-edited workspace state, local caches, or
+untracked files from the triage run.
+
+## Example: Patch a Parser Regression
+
+Use a small, realistic target project for the actual fix. A parser regression
+works well because it has a clear failing input and focused tests.
+
+Clone the target repository in both workspaces:
+
+```bash
+git clone https://github.com/example/parser-demo.git
+cd parser-demo
+python -m venv .venv
+. .venv/bin/activate
+pip install -e ".[test]"
+```
+
+In Omni Engineer, reproduce the failure:
+
+```bash
+pytest tests/test_parser.py::test_escaped_quotes -q
+```
+
+Ask Omni Engineer to explain the smallest safe patch.
+After reviewing the plan,
+make the code change manually or let the agent propose a diff that you inspect
+line by line. Then run:
+
+```bash
+pytest tests/test_parser.py::test_escaped_quotes -q
+pytest tests/test_parser.py -q
+git diff --check
+```
+
+In Claude Engineer, review the final diff and ask for a release note:
+
+```text
+Review this diff for a parser regression fix. Check whether the tests cover
+escaped single quotes, escaped double quotes, and normal quoted strings. Then
+write a two-sentence release note that does not mention implementation details.
+```
+
+The outcome should be a small patch, a focused regression test, and a concise
+review note that can be pasted into the pull request.
+
+## Keep Agent Work Reviewable
+
+AI engineering work is easier to trust when the repository history stays boring.
+Use these guardrails:
+
+| Step | What to Keep | What to Avoid |
+| --- | --- | --- |
+| Environment | Dev Container config and environment variable names | API keys, `.env` files, private prompts |
+| Triage | Failing command, focused file paths, suspected cause | Broad rewrites before reproducing the bug |
+| Patch | Minimal diff plus regression test | Formatting churn and unrelated refactors |
+| Review | Validation commands and reviewer notes | Raw chat transcripts or tool logs |
+
+If an agent suggests touching files outside the failure path, ask it to justify
+the change. If it cannot connect the change to a failing test or acceptance
+criterion, leave it out of the pull request.
+
+## Troubleshooting
+
+If Omni Engineer starts but returns no model response, check that
+`OPENROUTER_API_KEY` is present in the Daytona workspace. If Claude Engineer
+raises a missing API key error, check `ANTHROPIC_API_KEY`.
+
+If the Claude Engineer web UI is not reachable, confirm that `python app.py` is
+running and that Daytona forwarded port `5000`.
+If package installation fails,
+open the Dev Container logs and rerun:
+
+```bash
+python -m pip install --upgrade pip
+pip install -r requirements.txt
+```
+
+If a workspace becomes noisy after several agent runs, create a fresh Daytona
+workspace from the same repository. That is the main advantage of this setup:
+resetting the investigation is cheaper than cleaning up an uncertain local
+state.
+
+## Conclusion
+
+Running Omni Engineer and Claude Engineer in Daytona gives you two clean
+perspectives on the same regression. One workspace can focus on reproduction
+and the smallest patch. The other can review the plan, strengthen the test
+checklist, and prepare release notes. Because both workspaces are created from
+Dev Containers, the workflow is repeatable without committing secrets or
+machine-specific setup.
+
+Use this pattern when a regression needs more than a quick fix: one agent to
+map the failure, one agent to challenge the patch, and Daytona to keep both
+workspaces disposable.
+
+## References
+
+- [Daytona](https://github.com/daytonaio/daytona)
+- [Omni Engineer](https://github.com/Doriandarko/omni-engineer)
+- [Claude Engineer](https://github.com/Doriandarko/claude-engineer)
+- [Dev Containers](https://containers.dev/)
diff --git a/articles/assets/20260530_ai_regression_triage_in_daytona_workflow.svg b/articles/assets/20260530_ai_regression_triage_in_daytona_workflow.svg
new file mode 100644
index 00000000..ccdeab07
--- /dev/null
+++ b/articles/assets/20260530_ai_regression_triage_in_daytona_workflow.svg
@@ -0,0 +1,57 @@
+
+ AI regression triage workflow in Daytona
+ A workflow diagram showing Daytona environment variables flowing into Omni Engineer and Claude Engineer workspaces, then into a reviewed pull request.
+
+ AI Regression Triage in Daytona
+ Use isolated workspaces for reproduction, patch planning, and review.
+
+ Daytona env
+ OPENROUTER_API_KEY
+ ANTHROPIC_API_KEY
+
+ Omni workspace
+ Reproduce failure
+ Find suspect files
+ Draft smallest patch
+
+ Claude workspace
+ Review patch plan
+ Check regression tests
+ Write release notes
+
+ Human PR
+ Minimal diff
+ Focused tests
+ Validation notes
+
+ Secrets stay in Daytona environment variables. Agent work stays reviewable through commands, diffs, and tests.
+
diff --git a/authors/bilal_mutahir.md b/authors/bilal_mutahir.md
new file mode 100644
index 00000000..6d78bd08
--- /dev/null
+++ b/authors/bilal_mutahir.md
@@ -0,0 +1,13 @@
+Author: Bilal Mutahir
+Title: Open Source Contributor
+Description: Bilal Mutahir is an open-source contributor focused on practical
+developer workflows, reproducible environments, and small, reviewable changes.
+Bilal writes about using automation carefully while keeping code, secrets, and
+validation steps easy for maintainers to inspect.
+Author Image: ![Bilal Mutahir](https://avatars.githubusercontent.com/u/170749017?v=4)
+Author LinkedIn:
+Author Twitter:
+Company Name: Independent
+Company Description: Independent open-source contributor
+Company Logo Dark:
+Company Logo White:
diff --git a/definitions/20260530_definition_ai_regression_triage.md b/definitions/20260530_definition_ai_regression_triage.md
new file mode 100644
index 00000000..f7f8128a
--- /dev/null
+++ b/definitions/20260530_definition_ai_regression_triage.md
@@ -0,0 +1,23 @@
+---
+title: 'AI Regression Triage'
+description: 'A workflow for using AI tools to reproduce, isolate, review, and document software regressions in a controlled environment.'
+date: 2026-05-30
+author: 'Bilal Mutahir'
+---
+
+# AI Regression Triage
+
+## Definition
+
+AI regression triage is a debugging workflow where an AI assistant helps
+reproduce a failing behavior, identify the smallest relevant code path, propose
+a focused patch, and review the validation checklist before a human submits the
+final change.
+
+## Context and Usage
+
+AI regression triage is most useful when the development environment is
+reproducible and disposable. With Daytona workspaces, teams can give separate AI
+assistants the same repository and dependency setup while keeping secrets in
+environment variables. One assistant can investigate the failure, while another
+reviews the proposed patch and test coverage from a clean workspace.