From 07a63eb339f02acb75d55dacb5bd3182180837e1 Mon Sep 17 00:00:00 2001
From: Bilal Mutahir <170749017+bong000@users.noreply.github.com>
Date: Sat, 30 May 2026 19:30:03 +0300
Subject: [PATCH] Add AI regression triage article

Signed-off-by: Bilal Mutahir <170749017+bong000@users.noreply.github.com>
---
 ...0260530_ai_regression_triage_in_daytona.md | 264 ++++++++++++++++++
 ..._regression_triage_in_daytona_workflow.svg |  57 ++++
 authors/bilal_mutahir.md                      |  13 +
 ...0260530_definition_ai_regression_triage.md |  23 ++
 4 files changed, 357 insertions(+)
 create mode 100644 articles/20260530_ai_regression_triage_in_daytona.md
 create mode 100644 articles/assets/20260530_ai_regression_triage_in_daytona_workflow.svg
 create mode 100644 authors/bilal_mutahir.md
 create mode 100644 definitions/20260530_definition_ai_regression_triage.md

diff --git a/articles/20260530_ai_regression_triage_in_daytona.md b/articles/20260530_ai_regression_triage_in_daytona.md
new file mode 100644
index 00000000..e93e8ecf
--- /dev/null
+++ b/articles/20260530_ai_regression_triage_in_daytona.md
@@ -0,0 +1,264 @@
+---
+title: 'AI Regression Triage in Daytona'
+description: 'Run Omni Engineer and Claude Engineer in Daytona workspaces to reproduce, patch, and review regressions without leaking secrets.'
+date: 2026-05-30
+author: 'Bilal Mutahir'
+tags: ['Daytona', 'Dev Containers', 'AI Engineering']
+---
+
+# AI Regression Triage in Daytona
+
+When a regression appears in a repository, the slowest part is often not the
+patch. It is rebuilding the same environment twice, collecting enough context
+for a reviewer, and proving that the fix was not made against a machine-specific
+setup. A [Daytona workspace](</definitions/20240819_definition_daytona workspace.md>)
+is useful here because every agent starts from the same repository, the same
+[development container](</definitions/20240819_definition_development container.md>),
+and the same environment-variable contract.
+
+This article shows a practical workflow for running
+[Omni Engineer](https://github.com/Doriandarko/omni-engineer) and
+[Claude Engineer](https://github.com/Doriandarko/claude-engineer) inside Daytona
+as two separate AI engineering workspaces. Omni Engineer is used for first-pass
+regression triage: reproduce the failure, find the likely file, and sketch the
+patch. Claude Engineer is used as a second-pass reviewer: check the patch plan,
+look for missing tests, and write a short release note. The goal is not to let
+two agents make uncontrolled changes. The goal is to keep the investigation
+reproducible, reviewable, and easy to reset.
+
+The companion Dev Container contributions for this workflow are:
+
+- [Omni Engineer Dev Container PR](https://github.com/Doriandarko/omni-engineer/pull/42)
+- [Claude Engineer Dev Container PR](https://github.com/Doriandarko/claude-engineer/pull/266)
+
+![AI regression triage workflow](assets/20260530_ai_regression_triage_in_daytona_workflow.svg)
+
+## TL;DR
+
+- Use Daytona to create isolated Omni Engineer and Claude Engineer workspaces.
+- Pass `OPENROUTER_API_KEY`, `ANTHROPIC_API_KEY`, and optional `E2B_API_KEY`
+  through environment variables, not committed files.
+- Ask Omni Engineer to reproduce and map the regression before patching.
+- Ask Claude Engineer to review the proposed patch and test plan from a clean
+  workspace.
+- Commit only human-reviewed code, validation commands, and concise notes.
+
+## What the Dev Containers Add
+
+Both upstream repositories are Python projects. The Dev Container files add the
+same basic contract to each project:
+
+- use the official Python 3.11 Dev Containers image;
+- install dependencies from the repository's `requirements.txt`;
+- make Git available in the workspace;
+- expose only environment-variable names, never secret values;
+- give the developer a short attach message with the command to run.
+
+For Omni Engineer, the Dev Container passes `OPENROUTER_API_KEY` from the local
+environment:
+
+```json
+"containerEnv": {
+  "OPENROUTER_API_KEY": "${localEnv:OPENROUTER_API_KEY}"
+}
+```
+
+For Claude Engineer, the Dev Container passes `ANTHROPIC_API_KEY` and the
+optional `E2B_API_KEY`, and forwards port `5000` for the Flask web interface:
+
+```json
+"containerEnv": {
+  "ANTHROPIC_API_KEY": "${localEnv:ANTHROPIC_API_KEY}",
+  "E2B_API_KEY": "${localEnv:E2B_API_KEY}"
+},
+"forwardPorts": [5000]
+```
+
+This keeps the workspace reproducible without putting API keys in
+`.env.example`, commit history, screenshots, or prompt transcripts.
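+
+You can check the "names, never values" rule mechanically before you commit a
+Dev Container change. The helper below is a hypothetical sketch. It is not part
+of either PR or of the Dev Containers tooling. It takes an already-parsed
+`containerEnv` mapping and reports every key whose value is not a single
+`${localEnv:NAME}` reference. The input must be pre-parsed because real
+`devcontainer.json` files may contain comments, which Python's `json` module
+rejects.
+
+```python
+import re
+
+# A value counts as safe only if it is exactly one ${localEnv:NAME} reference.
+LOCAL_ENV_REF = re.compile(r"\$\{localEnv:[A-Za-z_][A-Za-z0-9_]*\}")
+
+
+def literal_env_values(container_env: dict[str, str]) -> list[str]:
+    """Return, sorted, the keys whose values are not a bare localEnv reference.
+
+    A key listed here may be carrying a literal value, possibly a secret,
+    instead of forwarding a variable from the host environment.
+    """
+    return sorted(
+        key
+        for key, value in container_env.items()
+        if not LOCAL_ENV_REF.fullmatch(value)
+    )
+
+
+if __name__ == "__main__":
+    claude_env = {
+        "ANTHROPIC_API_KEY": "${localEnv:ANTHROPIC_API_KEY}",
+        "E2B_API_KEY": "${localEnv:E2B_API_KEY}",
+    }
+    leaky_env = {"OPENROUTER_API_KEY": "sk-or-pasted-by-mistake"}
+    print(literal_env_values(claude_env))  # []
+    print(literal_env_values(leaky_env))  # ['OPENROUTER_API_KEY']
+```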
+
+## Set the Required Secrets in Daytona
+
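+Start by storing the keys as Daytona environment variables. The values below
+are placeholders; use the real keys from your provider dashboards. A key typed
+directly on the command line can end up in your shell history, so clear that
+entry afterwards if your shell records it.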
+
+```bash
+daytona env set OPENROUTER_API_KEY=sk-or-your-openrouter-key
+daytona env set ANTHROPIC_API_KEY=sk-ant-your-anthropic-key
+daytona env set E2B_API_KEY=your-optional-e2b-key
+```
+
+Verify the variables are available before creating workspaces:
+
+```bash
+daytona env list
+```
+
+If `E2B_API_KEY` is not part of your workflow, leave it unset. Claude Engineer
+can still run its local CLI and web interface without an E2B key.
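+
+Inside each workspace, a short preflight script can confirm that the variables
+arrived without printing their values into a terminal log or agent transcript.
+This is a minimal sketch; the file name (for example `check_env.py`) is up to
+you. The lists below assume the Claude Engineer workspace. For Omni Engineer,
+make `OPENROUTER_API_KEY` the only required name. An empty value is treated the
+same as a missing one.
+
+```python
+import os
+import sys
+from collections.abc import Mapping
+
+# Claude Engineer workspace; for Omni Engineer use ["OPENROUTER_API_KEY"] and [].
+REQUIRED = ["ANTHROPIC_API_KEY"]
+OPTIONAL = ["E2B_API_KEY"]
+
+
+def env_report(
+    names: list[str], environ: Mapping[str, str] = os.environ
+) -> dict[str, bool]:
+    """Map each variable name to whether it holds a non-empty value.
+
+    Only booleans are returned, so secret values never reach stdout.
+    """
+    return {name: bool(environ.get(name, "").strip()) for name in names}
+
+
+def main() -> int:
+    for name, ok in env_report(REQUIRED + OPTIONAL).items():
+        print(f"{name}: {'set' if ok else 'missing'}")
+    missing = [name for name, ok in env_report(REQUIRED).items() if not ok]
+    # A non-zero exit status lets scripts stop before starting an agent.
+    return 1 if missing else 0
+
+
+if __name__ == "__main__":
+    sys.exit(main())
+```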
+
+## Create the Omni Engineer Workspace
+
+After the Dev Container PR is merged, create a Daytona workspace from Omni
+Engineer:
+
+```bash
+daytona create https://github.com/Doriandarko/omni-engineer
+```
+
+The Dev Container installs the Python packages from `requirements.txt`. When the
+workspace opens, run:
+
+```bash
+python main.py
+```
+
+Use Omni Engineer for the first pass. For a regression triage workflow, give it
+a narrow job:
+
+```text
+We are investigating a failing test in a small Python package.
+First, inspect the repository structure. Then identify the failing command,
+run only the focused test, and explain the smallest patch you would try.
+Do not edit files until the failure is reproduced.
+```
+
+This instruction keeps the first workspace focused on evidence gathering. Ask
+for file paths, failing commands, and the suspected behavior change. If Omni
+Engineer proposes a patch before reproducing the failure, stop it and ask for
+the failing command first.
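+
+You can also enforce the "reproduce first" rule outside the prompt. The helper
+below is a hypothetical sketch, not an Omni Engineer feature. It runs a focused
+pytest command and counts the failure as reproduced only when pytest exits with
+status `1` (tests failed). Other non-zero statuses usually mean the command
+itself is wrong, such as `4` for a usage error or `5` when no tests were
+collected. The node ID is the placeholder used later in this article.
+
+```python
+import subprocess
+import sys
+
+PYTEST_TESTS_FAILED = 1  # pytest's exit status when at least one test failed
+
+
+def failure_reproduced(args: list[str], timeout: float = 300.0) -> bool:
+    """Run a focused test command and report whether its tests failed."""
+    result = subprocess.run(args, capture_output=True, text=True, timeout=timeout)
+    return result.returncode == PYTEST_TESTS_FAILED
+
+
+if __name__ == "__main__":
+    # Use the workspace interpreter so the same virtual environment is tested.
+    cmd = [sys.executable, "-m", "pytest",
+           "tests/test_parser.py::test_escaped_quotes", "-q"]
+    if failure_reproduced(cmd):
+        print("Failure reproduced: ask for the smallest patch.")
+    else:
+        print("Not reproduced: do not patch yet; check the command and output.")
+```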
+
+## Create the Claude Engineer Review Workspace
+
+Create a second Daytona workspace from Claude Engineer:
+
+```bash
+daytona create https://github.com/Doriandarko/claude-engineer
+```
+
+Run the CLI:
+
+```bash
+python ce3.py
+```
+
+Or run the web interface:
+
+```bash
+python app.py
+```
+
+Daytona will forward port `5000`, so the browser UI is available without manual
+port wiring. Use this workspace for review, not duplicate patching. Paste a
+short summary from the Omni workspace:
+
+```text
+Review this regression plan:
+- failing command: pytest tests/test_parser.py::test_escaped_quotes
+- suspected file: parser/tokenizer.py
+- proposed patch: preserve escaped quotes before splitting tokens
+- proposed tests: add escaped single and double quote cases
+
+Look for missing edge cases and suggest a minimal validation checklist.
+```
+
+This second-pass review is valuable because it starts from a clean environment.
+It is less likely to inherit a half-edited workspace state, local caches, or
+untracked files from the triage run.
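+
+The hand-off always has the same four fields, so it can help to keep it
+structured rather than retyping it from memory. The dataclass below is a
+hypothetical sketch. Its field names mirror the review prompt above, and its
+output is plain text to paste into Claude Engineer.
+
+```python
+from dataclasses import dataclass, field
+
+
+@dataclass
+class TriageSummary:
+    """Evidence collected in the Omni Engineer workspace."""
+
+    failing_command: str
+    suspected_file: str
+    proposed_patch: str
+    proposed_tests: list[str] = field(default_factory=list)
+
+    def to_review_prompt(self) -> str:
+        """Render the summary in the review-prompt format used above."""
+        tests = "; ".join(self.proposed_tests) or "none proposed yet"
+        return "\n".join(
+            [
+                "Review this regression plan:",
+                f"- failing command: {self.failing_command}",
+                f"- suspected file: {self.suspected_file}",
+                f"- proposed patch: {self.proposed_patch}",
+                f"- proposed tests: {tests}",
+                "",
+                "Look for missing edge cases and suggest a minimal "
+                "validation checklist.",
+            ]
+        )
+
+
+if __name__ == "__main__":
+    summary = TriageSummary(
+        failing_command="pytest tests/test_parser.py::test_escaped_quotes",
+        suspected_file="parser/tokenizer.py",
+        proposed_patch="preserve escaped quotes before splitting tokens",
+        proposed_tests=["escaped single quote", "escaped double quote"],
+    )
+    print(summary.to_review_prompt())
+```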
+
+## Example: Patch a Parser Regression
+
+Use a small, realistic target project for the actual fix. A parser regression
+works well because it has a clear failing input and focused tests.
+
+Clone the target repository in both workspaces. The URL below is a placeholder
+for your own project:
+
+```bash
+git clone https://github.com/example/parser-demo.git
+cd parser-demo
+python -m venv .venv
+. .venv/bin/activate
+pip install -e ".[test]"
+```
+
+In Omni Engineer, reproduce the failure:
+
+```bash
+pytest tests/test_parser.py::test_escaped_quotes -q
+```
+
+Ask Omni Engineer to explain the smallest safe patch. After reviewing the plan,
+make the code change manually or let the agent propose a diff that you inspect
+line by line. Then run:
+
+```bash
+pytest tests/test_parser.py::test_escaped_quotes -q
+pytest tests/test_parser.py -q
+git diff --check
+```
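+
+`parser-demo` is a placeholder, so its real code is not shown here. For
+illustration only, the sketch below defines a hypothetical `split_tokens`
+tokenizer that honors backslash-escaped quotes. It pairs that tokenizer with
+the focused regression tests the review prompt asks about: escaped single
+quotes, escaped double quotes, and normal quoted strings. In a real project the
+tokenizer would live in the package (for example `parser/tokenizer.py`) and
+only the tests would go in `tests/test_parser.py`.
+
+```python
+def split_tokens(text: str) -> list[str]:
+    """Split on whitespace, honoring quotes and backslash-escaped quotes."""
+    tokens: list[str] = []
+    current: list[str] = []
+    quote = None  # the active quote character, if any
+    in_token = False  # distinguishes "" (an empty token) from no token
+    i = 0
+    while i < len(text):
+        ch = text[i]
+        # An escaped quote or backslash is kept literally, in or out of quotes.
+        if ch == "\\" and i + 1 < len(text) and text[i + 1] in "'\"\\":
+            current.append(text[i + 1])
+            in_token = True
+            i += 2
+            continue
+        if quote:
+            if ch == quote:
+                quote = None
+            else:
+                current.append(ch)
+        elif ch in "'\"":
+            quote = ch
+            in_token = True
+        elif ch.isspace():
+            if in_token:
+                tokens.append("".join(current))
+                current, in_token = [], False
+        else:
+            current.append(ch)
+            in_token = True
+        i += 1
+    if quote:
+        raise ValueError("unterminated quote")
+    if in_token:
+        tokens.append("".join(current))
+    return tokens
+
+
+def test_escaped_quotes():
+    assert split_tokens(r'say "a \"b\""') == ["say", 'a "b"']
+    assert split_tokens(r"it\'s fine") == ["it's", "fine"]
+
+
+def test_normal_quoted_strings():
+    assert split_tokens("say 'hello world'") == ["say", "hello world"]
+    assert split_tokens('""') == [""]
+```
+
+The `in_token` flag lets `""` produce an empty token instead of disappearing.
+That kind of edge case is worth asking the reviewer to confirm explicitly.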
+
+In Claude Engineer, review the final diff and ask for a release note:
+
+```text
+Review this diff for a parser regression fix. Check whether the tests cover
+escaped single quotes, escaped double quotes, and normal quoted strings. Then
+write a two-sentence release note that does not mention implementation details.
+```
+
+The outcome should be a small patch, a focused regression test, and a concise
+review note that can be pasted into the pull request.
+
+## Keep Agent Work Reviewable
+
+AI engineering work is easier to trust when the repository history stays boring.
+Use these guardrails:
+
+| Step | What to Keep | What to Avoid |
+| --- | --- | --- |
+| Environment | Dev Container config and environment variable names | API keys, `.env` files, private prompts |
+| Triage | Failing command, focused file paths, suspected cause | Broad rewrites before reproducing the bug |
+| Patch | Minimal diff plus regression test | Formatting churn and unrelated refactors |
+| Review | Validation commands and reviewer notes | Raw chat transcripts or tool logs |
+
+If an agent suggests touching files outside the failure path, ask it to justify
+the change. If it cannot connect the change to a failing test or acceptance
+criterion, leave it out of the pull request.
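+
+You can make that check concrete by comparing the changed files with the paths
+named during triage. The helper below is a hypothetical sketch, not part of Git
+or either agent. It reads `git diff --name-only` and returns every changed path
+that is neither an allowed file nor inside an allowed directory.
+
+```python
+import subprocess
+
+
+def changed_files(base: str = "HEAD") -> list[str]:
+    """List tracked paths changed relative to `base`.
+
+    Untracked files are not listed; check `git status` for new files too.
+    """
+    out = subprocess.run(
+        ["git", "diff", "--name-only", base],
+        capture_output=True,
+        text=True,
+        check=True,
+    ).stdout
+    return [line for line in out.splitlines() if line]
+
+
+def out_of_scope(changed: list[str], allowed: list[str]) -> list[str]:
+    """Return changed paths that are not an allowed file or under an allowed dir."""
+
+    def in_scope(path: str) -> bool:
+        return any(
+            path == prefix or path.startswith(prefix.rstrip("/") + "/")
+            for prefix in allowed
+        )
+
+    return [path for path in changed if not in_scope(path)]
+
+
+if __name__ == "__main__":
+    allowed = ["parser/tokenizer.py", "tests/"]
+    for path in out_of_scope(changed_files(), allowed):
+        print(f"needs justification: {path}")
+```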
+
+## Troubleshooting
+
+If Omni Engineer starts but never returns a model response, check that
+`OPENROUTER_API_KEY` is present in the Daytona workspace. If Claude Engineer
+raises a missing API key error, check `ANTHROPIC_API_KEY`.
+
+If the Claude Engineer web UI is not reachable, confirm that `python app.py` is
+running and that Daytona forwarded port `5000`. If package installation fails,
+open the Dev Container logs and rerun:
+
+```bash
+python -m pip install --upgrade pip
+python -m pip install -r requirements.txt
+```
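+
+To tell "the app is not running" apart from "the port is not forwarded", check
+from inside the Claude Engineer workspace whether anything accepts connections
+on port `5000`. This is a minimal standard-library sketch. If it reports the
+port as listening but the browser still cannot connect, the problem is more
+likely in the forwarding step than in the Flask app.
+
+```python
+import socket
+
+
+def port_is_listening(port: int, host: str = "127.0.0.1", timeout: float = 1.0) -> bool:
+    """Return True if a TCP connection to host:port succeeds."""
+    try:
+        with socket.create_connection((host, port), timeout=timeout):
+            return True
+    except OSError:  # refused, timed out, or otherwise unreachable
+        return False
+
+
+if __name__ == "__main__":
+    state = "listening" if port_is_listening(5000) else "not listening"
+    print(f"port 5000 is {state}")
+```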
+
+If a workspace becomes noisy after several agent runs, create a fresh Daytona
+workspace from the same repository. That is the main advantage of this setup:
+resetting the investigation is cheaper than cleaning up an uncertain local
+state.
+
+## Conclusion
+
+Running Omni Engineer and Claude Engineer in Daytona gives you two clean
+perspectives on the same regression. One workspace can focus on reproduction
+and the smallest patch. The other can review the plan, strengthen the test
+checklist, and prepare release notes. Because both workspaces are created from
+Dev Containers, the workflow is repeatable without committing secrets or
+machine-specific setup.
+
+Use this pattern when a regression needs more than a quick fix: one agent to
+map the failure, one agent to challenge the patch, and Daytona to keep both
+workspaces disposable.
+
+## References
+
+- [Daytona](https://github.com/daytonaio/daytona)
+- [Omni Engineer](https://github.com/Doriandarko/omni-engineer)
+- [Claude Engineer](https://github.com/Doriandarko/claude-engineer)
+- [Dev Containers](https://containers.dev/)
diff --git a/articles/assets/20260530_ai_regression_triage_in_daytona_workflow.svg b/articles/assets/20260530_ai_regression_triage_in_daytona_workflow.svg
new file mode 100644
index 00000000..ccdeab07
--- /dev/null
+++ b/articles/assets/20260530_ai_regression_triage_in_daytona_workflow.svg
@@ -0,0 +1,57 @@
+<svg xmlns="http://www.w3.org/2000/svg" width="1200" height="640" viewBox="0 0 1200 640" role="img" aria-labelledby="title desc">
+  <title id="title">AI regression triage workflow in Daytona</title>
+  <desc id="desc">A workflow diagram showing Daytona environment variables flowing into Omni Engineer and Claude Engineer workspaces, then into a reviewed pull request.</desc>
+  <defs>
+    <style>
+      .bg { fill: #f7fafc; }
+      .panel { fill: #ffffff; stroke: #1f2937; stroke-width: 2; rx: 14; }
+      .accent { fill: #2563eb; }
+      .accent2 { fill: #0f766e; }
+      .accent3 { fill: #7c3aed; }
+      .text { font-family: Arial, sans-serif; fill: #111827; font-size: 24px; font-weight: 700; }
+      .small { font-family: Arial, sans-serif; fill: #374151; font-size: 17px; }
+      .tiny { font-family: Arial, sans-serif; fill: #4b5563; font-size: 15px; }
+      .arrow { stroke: #374151; stroke-width: 3; fill: none; marker-end: url(#arrowhead); }
+    </style>
+    <marker id="arrowhead" markerWidth="12" markerHeight="8" refX="10" refY="4" orient="auto">
+      <path d="M0,0 L12,4 L0,8 z" fill="#374151" />
+    </marker>
+  </defs>
+  <rect class="bg" x="0" y="0" width="1200" height="640" />
+  <text class="text" x="72" y="70">AI Regression Triage in Daytona</text>
+  <text class="small" x="72" y="105">Use isolated workspaces for reproduction, patch planning, and review.</text>
+
+  <rect class="panel" x="70" y="155" width="250" height="145" />
+  <circle class="accent" cx="105" cy="195" r="13" />
+  <text class="text" x="130" y="203">Daytona env</text>
+  <text class="small" x="95" y="238">OPENROUTER_API_KEY</text>
+  <text class="small" x="95" y="265">ANTHROPIC_API_KEY</text>
+
+  <rect class="panel" x="450" y="110" width="300" height="165" />
+  <circle class="accent2" cx="487" cy="150" r="13" />
+  <text class="text" x="512" y="158">Omni workspace</text>
+  <text class="small" x="475" y="195">Reproduce failure</text>
+  <text class="small" x="475" y="223">Find suspect files</text>
+  <text class="small" x="475" y="251">Draft smallest patch</text>
+
+  <rect class="panel" x="450" y="365" width="300" height="165" />
+  <circle class="accent3" cx="487" cy="405" r="13" />
+  <text class="text" x="512" y="413">Claude workspace</text>
+  <text class="small" x="475" y="450">Review patch plan</text>
+  <text class="small" x="475" y="478">Check regression tests</text>
+  <text class="small" x="475" y="506">Write release notes</text>
+
+  <rect class="panel" x="880" y="245" width="250" height="150" />
+  <circle class="accent" cx="915" cy="285" r="13" />
+  <text class="text" x="940" y="293">Human PR</text>
+  <text class="small" x="905" y="330">Minimal diff</text>
+  <text class="small" x="905" y="357">Focused tests</text>
+  <text class="small" x="905" y="384">Validation notes</text>
+
+  <path class="arrow" d="M320 225 C370 225, 395 190, 445 190" />
+  <path class="arrow" d="M320 235 C370 235, 395 445, 445 445" />
+  <path class="arrow" d="M750 195 C815 210, 830 270, 875 300" />
+  <path class="arrow" d="M750 445 C815 430, 830 370, 875 340" />
+
+  <text class="tiny" x="78" y="555">Secrets stay in Daytona environment variables. Agent work stays reviewable through commands, diffs, and tests.</text>
+</svg>
diff --git a/authors/bilal_mutahir.md b/authors/bilal_mutahir.md
new file mode 100644
index 00000000..6d78bd08
--- /dev/null
+++ b/authors/bilal_mutahir.md
@@ -0,0 +1,13 @@
+Author: Bilal Mutahir
+Title: Open Source Contributor
+Description: Bilal Mutahir is an open-source contributor focused on practical
+developer workflows, reproducible environments, and small, reviewable changes.
+Bilal writes about using automation carefully while keeping code, secrets, and
+validation steps easy for maintainers to inspect.
+Author Image: ![Bilal Mutahir](https://avatars.githubusercontent.com/u/170749017?v=4)
+Author LinkedIn:
+Author Twitter:
+Company Name: Independent
+Company Description: Independent open-source contributor
+Company Logo Dark:
+Company Logo White:
diff --git a/definitions/20260530_definition_ai_regression_triage.md b/definitions/20260530_definition_ai_regression_triage.md
new file mode 100644
index 00000000..f7f8128a
--- /dev/null
+++ b/definitions/20260530_definition_ai_regression_triage.md
@@ -0,0 +1,23 @@
+---
+title: 'AI Regression Triage'
+description: 'A workflow for using AI tools to reproduce, isolate, review, and document software regressions in a controlled environment.'
+date: 2026-05-30
+author: 'Bilal Mutahir'
+---
+
+# AI Regression Triage
+
+## Definition
+
+AI regression triage is a debugging workflow where an AI assistant helps
+reproduce a failing behavior, identify the smallest relevant code path, propose
+a focused patch, and review the validation checklist before a human submits the
+final change.
+
+## Context and Usage
+
+AI regression triage is most useful when the development environment is
+reproducible and disposable. In a Daytona workspace, teams can give separate AI
+assistants the same repository and dependency setup while keeping secrets in
+environment variables. One assistant can investigate the failure, while another
+reviews the proposed patch and test coverage from a clean workspace.