daytonaio · bong000 · May 30, 2026
diff --git a/articles/20260530_ai_regression_triage_in_daytona.md b/articles/20260530_ai_regression_triage_in_daytona.md
@@ -0,0 +1,264 @@
+---
+title: 'AI Regression Triage in Daytona'
+description: 'Run Omni Engineer and Claude Engineer in Daytona workspaces to reproduce, patch, and review regressions without leaking secrets.'
+date: 2026-05-30
+author: 'Bilal Mutahir'
+tags: ['Daytona', 'Dev Containers', 'AI Engineering']
+---
+
+# AI Regression Triage in Daytona
+
+When a regression appears in a repository, the slowest part is often not the
+patch. It is rebuilding the same environment twice, collecting enough context
+for a reviewer, and proving that the fix was not made against a machine-specific
+setup. A [Daytona workspace](/definitions/20240819_definition_daytona%20workspace.md)
+is useful here because every agent starts from the same repository, the same
+[development container](/definitions/20240819_definition_development%20container.md),
+and the same environment-variable contract.
+
+This article shows a practical workflow for running
+[Omni Engineer](https://github.com/Doriandarko/omni-engineer) and
+[Claude Engineer](https://github.com/Doriandarko/claude-engineer) inside Daytona
+as two separate AI engineering workspaces. Omni Engineer is used for first-pass
+regression triage: reproduce the failure, find the likely file, and sketch the
+patch. Claude Engineer is used as a second-pass reviewer: check the patch plan,
+look for missing tests, and write a short release note. The goal is not to let
+two agents make uncontrolled changes. The goal is to keep the investigation
+reproducible, reviewable, and easy to reset.
+
+The companion Dev Container contributions for this workflow are:
+
+- [Omni Engineer Dev Container PR](https://github.com/Doriandarko/omni-engineer/pull/42)
+- [Claude Engineer Dev Container PR](https://github.com/Doriandarko/claude-engineer/pull/266)
+
+![AI regression triage workflow](assets/20260530_ai_regression_triage_in_daytona_workflow.svg)
+
+## TL;DR
+
+- Use Daytona to create isolated Omni Engineer and Claude Engineer workspaces.
+- Pass `OPENROUTER_API_KEY`, `ANTHROPIC_API_KEY`, and optional `E2B_API_KEY`
+  through environment variables, not committed files.
+- Ask Omni Engineer to reproduce and map the regression before patching.
+- Ask Claude Engineer to review the proposed patch and test plan from a clean
+  workspace.
+- Commit only human-reviewed code, validation commands, and concise notes.
+
+## What the Dev Containers Add
+
+Both upstream repositories are Python projects. The Dev Container files add the
+same basic contract to each project:
+
+- use the official Python 3.11 Dev Containers image;
+- install dependencies from the repository's `requirements.txt`;
+- make Git available in the workspace;
+- expose only environment-variable names, never secret values;
+- give the developer a short attach message with the command to run.
+
+For Omni Engineer, the Dev Container passes `OPENROUTER_API_KEY` from the local
+environment:
+
+```json
+"containerEnv": {
+  "OPENROUTER_API_KEY": "${localEnv:OPENROUTER_API_KEY}"
+}
+```
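+
+Put together, the five contract points above could be expressed in a single
+`.devcontainer/devcontainer.json`. The following is a minimal sketch for Omni
+Engineer, not the file from the companion PR: the `name`, the Git feature
+reference, and the attach message are assumptions chosen for illustration.
+Dev Container files accept comments, so the assumptions are marked inline.
+
+```jsonc
+{
+  // Hypothetical display name; any label works.
+  "name": "omni-engineer",
+  // Official Python 3.11 image from the Dev Containers project.
+  "image": "mcr.microsoft.com/devcontainers/python:3.11",
+  // Assumed way to make Git available; the base image may already ship it.
+  "features": {
+    "ghcr.io/devcontainers/features/git:1": {}
+  },
+  // Install dependencies once, when the container is created.
+  "postCreateCommand": "pip install -r requirements.txt",
+  // Only the variable name is committed; the value comes from the environment.
+  "containerEnv": {
+    "OPENROUTER_API_KEY": "${localEnv:OPENROUTER_API_KEY}"
+  },
+  // Assumed attach message that points at the command to run.
+  "postAttachCommand": "echo 'Run: python main.py'"
+}
+```
+
+Keeping the install step in `postCreateCommand` means it runs once per
+workspace, while the attach message repeats each time a tool connects.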
+
+For Claude Engineer, the Dev Container passes `ANTHROPIC_API_KEY` and the
+optional `E2B_API_KEY`. It also forwards port `5000` for the Flask web
+interface, a setting that the excerpt below omits:
+
+```json
+"containerEnv": {
+  "ANTHROPIC_API_KEY": "${localEnv:ANTHROPIC_API_KEY}",
+  "E2B_API_KEY": "${localEnv:E2B_API_KEY}"
+}
+```
+
+The configuration references variable names only, so the workspace stays
+reproducible and no API key has to appear in `.env.example`, commit history,
+screenshots, or prompt transcripts.
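+
+The port forwarding for Claude Engineer is not part of the excerpts above. One
+way to declare it, assuming the standard `forwardPorts` and `portsAttributes`
+properties from the Dev Container specification are what the PR uses, is the
+fragment below. The label text is a hypothetical example.
+
+```jsonc
+{
+  // Forward the port used by the Flask web interface.
+  "forwardPorts": [5000],
+  // Optional metadata for the forwarded port; the label is illustrative.
+  "portsAttributes": {
+    "5000": {
+      "label": "Claude Engineer web UI",
+      "onAutoForward": "notify"
+    }
+  }
+}
+```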
+
+## Set the Required Secrets in Daytona
+
+Start by storing the keys as Daytona environment variables. Replace the
+placeholder values below with the keys from your provider dashboards.
+
+```bash
+daytona env set OPENROUTER_API_KEY=sk-or-your-openrouter-key
+daytona env set ANTHROPIC_API_KEY=sk-ant-your-anthropic-key
+daytona env set E2B_API_KEY=your-optional-e2b-key
+```
+
+Verify the variables are available before creating workspaces:
+
+```bash
+daytona env list
+```
+
+If `E2B_API_KEY` is not part of your workflow, skip that command and leave the
+variable unset. Claude Engineer can still run its local CLI and web interface
+without an E2B key.
+
+## Create the Omni Engineer Workspace
+
+After the Dev Container PR is merged, create a Daytona workspace from Omni
+Engineer:
+
+```bash
+daytona create https://github.com/Doriandarko/omni-engineer
+```
+
+The Dev Container installs the Python packages from `requirements.txt`. When the
+workspace opens, run:
+
+```bash
+python main.py
+```
+
+Use Omni Engineer for the first pass. For a regression triage workflow, give it
+a narrow job:
+
+```text
+We are investigating a failing test in a small Python package.
+First, inspect the repository structure. Then identify the failing command,
+run only the focused test, and explain the smallest patch you would try.
+Do not edit files until the failure is reproduced.
+```
+
+This instruction keeps the first workspace focused on evidence gathering. Ask
+for file paths, failing commands, and the suspected behavior change. If Omni
+Engineer proposes a patch before reproducing the failure, stop it and ask for
+the failing command first.
+
+## Create the Claude Engineer Review Workspace
+
+Create a second Daytona workspace from Claude Engineer:
+
+```bash
+daytona create https://github.com/Doriandarko/claude-engineer
+```
+
+Run the CLI:
+
+```bash
+python ce3.py
+```
+
+Or run the web interface:
+
+```bash
+python app.py
+```
+
+Daytona will forward port `5000`, so the browser UI is available without manual
+port wiring. Use this workspace for review, not duplicate patching. Paste a
+short summary from the Omni workspace:
+
+```text
+Review this regression plan:
+- failing command: pytest tests/test_parser.py::test_escaped_quotes
+- suspected file: parser/tokenizer.py
+- proposed patch: preserve escaped quotes before splitting tokens
+- proposed tests: add escaped single and double quote cases
+
+Look for missing edge cases and suggest a minimal validation checklist.
+```
+
+This second-pass review is valuable because it starts from a clean environment.
+It is less likely to inherit a half-edited workspace state, local caches, or
+untracked files from the triage run.
+
+## Example: Patch a Parser Regression
+
+Use a small, realistic target project for the actual fix. A parser regression
+works well because it has a clear failing input and focused tests.
+
+Clone the target repository in both workspaces:
+
+```bash
+git clone https://github.com/example/parser-demo.git
+cd parser-demo
+python -m venv .venv
+. .venv/bin/activate
+pip install -e ".[test]"
+```
+
+In the Omni Engineer workspace, reproduce the failure:
+
+```bash
+pytest tests/test_parser.py::test_escaped_quotes -q
+```
+
+Ask Omni Engineer to explain the smallest safe patch. After reviewing the plan,
+make the code change manually or let the agent propose a diff that you inspect
+line by line. Then rerun the focused test, run the whole parser test file, and
+check the diff for whitespace errors:
+
+```bash
+pytest tests/test_parser.py::test_escaped_quotes -q
+pytest tests/test_parser.py -q
+git diff --check
+```
+
+In Claude Engineer, ask for a review of the final diff and a release note:
+
+```text
+Review this diff for a parser regression fix. Check whether the tests cover
+escaped single quotes, escaped double quotes, and normal quoted strings. Then
+write a two-sentence release note that does not mention implementation details.
+```
+
+The outcome should be a small patch, a focused regression test, and a concise
+review note that can be pasted into the pull request.
+
+## Keep Agent Work Reviewable
+
+AI engineering work is easier to trust when the repository history stays boring.
+Use these guardrails:
+
+| Step | What to Keep | What to Avoid |
+| --- | --- | --- |
+| Environment | Dev Container config and environment variable names | API keys, `.env` files, private prompts |
+| Triage | Failing command, focused file paths, suspected cause | Broad rewrites before reproducing the bug |
+| Patch | Minimal diff plus regression test | Formatting churn and unrelated refactors |
+| Review | Validation commands and reviewer notes | Raw chat transcripts or tool logs |
+
+If an agent suggests touching files outside the failure path, ask it to justify
+the change. If it cannot connect the change to a failing test or acceptance
+criterion, leave it out of the pull request.
+
+## Troubleshooting
+
+If Omni Engineer starts but returns no model response, check that
+`OPENROUTER_API_KEY` is set in the Daytona workspace. If Claude Engineer
+raises a missing API key error, check `ANTHROPIC_API_KEY`.
+
+If the Claude Engineer web UI is not reachable, confirm that `python app.py` is
+running and that Daytona forwarded port `5000`. If package installation fails,
+open the Dev Container logs and rerun:
+
+```bash
+python -m pip install --upgrade pip
+pip install -r requirements.txt
+```
+
+If a workspace becomes noisy after several agent runs, create a fresh Daytona
+workspace from the same repository. That is the main advantage of this setup:
+resetting the investigation is cheaper than cleaning up an uncertain local
+state.
+
+## Conclusion
+
+Running Omni Engineer and Claude Engineer in Daytona gives you two clean
+perspectives on the same regression. One workspace can focus on reproduction
+and the smallest patch. The other can review the plan, strengthen the test
+checklist, and prepare release notes. Because both workspaces are created from
+Dev Containers, the workflow is repeatable without committing secrets or
+machine-specific setup.
+
+Use this pattern when a regression needs more than a quick fix: one agent to
+map the failure, one agent to challenge the patch, and Daytona to keep both
+workspaces disposable.
+
+## References
+
+- [Daytona](https://github.com/daytonaio/daytona)
+- [Omni Engineer](https://github.com/Doriandarko/omni-engineer)
+- [Claude Engineer](https://github.com/Doriandarko/claude-engineer)
+- [Dev Containers](https://containers.dev/)
diff --git a/articles/assets/20260530_ai_regression_triage_in_daytona_workflow.svg b/articles/assets/20260530_ai_regression_triage_in_daytona_workflow.svg
diff --git a/authors/bilal_mutahir.md b/authors/bilal_mutahir.md
@@ -0,0 +1,13 @@
+Author: Bilal Mutahir
+Title: Open Source Contributor
+Description: Bilal Mutahir is an open-source contributor focused on practical
+developer workflows, reproducible environments, and small, reviewable changes.
+Bilal writes about using automation carefully while keeping code, secrets, and
+validation steps easy for maintainers to inspect.
+Author Image: ![Bilal Mutahir](https://avatars.githubusercontent.com/u/170749017?v=4)
+Author LinkedIn:
+Author Twitter:
+Company Name: Independent
+Company Description: Independent open-source contributor
+Company Logo Dark:
+Company Logo White:
diff --git a/definitions/20260530_definition_ai_regression_triage.md b/definitions/20260530_definition_ai_regression_triage.md
@@ -0,0 +1,23 @@
+---
+title: 'AI Regression Triage'
+description: 'A workflow for using AI tools to reproduce, isolate, review, and document software regressions in a controlled environment.'
+date: 2026-05-30
+author: 'Bilal Mutahir'
+---
+
+# AI Regression Triage
+
+## Definition
+
+AI regression triage is a debugging workflow where an AI assistant helps
+reproduce a failing behavior, identify the smallest relevant code path, propose
+a focused patch, and review the validation checklist before a human submits the
+final change.
+
+## Context and Usage
+
+AI regression triage is most useful when the development environment is
+reproducible and disposable. In a Daytona workspace, teams can give separate AI
+assistants the same repository and dependency setup while keeping secrets in
+environment variables. One assistant can investigate the failure, while another
+reviews the proposed patch and test coverage from a clean workspace.