264 changes: 264 additions & 0 deletions articles/20260530_ai_regression_triage_in_daytona.md
@@ -0,0 +1,264 @@
---
title: 'AI Regression Triage in Daytona'
description: 'Run Omni Engineer and Claude Engineer in Daytona workspaces to reproduce, patch, and review regressions without leaking secrets.'
date: 2026-05-30
author: 'Bilal Mutahir'
tags: ['Daytona', 'Dev Containers', 'AI Engineering']
---

# AI Regression Triage in Daytona

When a regression appears in a repository, the slowest part is often not the
patch. It is rebuilding the same environment twice, collecting enough context
for a reviewer, and proving that the fix was not made against a machine-specific
setup. A [Daytona workspace](/definitions/20240819_definition_daytona%20workspace.md)
is useful here because every agent starts from the same repository, the same
[development container](/definitions/20240819_definition_development%20container.md),
and the same environment-variable contract.

This article shows a practical workflow for running
[Omni Engineer](https://github.com/Doriandarko/omni-engineer) and
[Claude Engineer](https://github.com/Doriandarko/claude-engineer) inside Daytona
as two separate AI engineering workspaces. Omni Engineer is used for first-pass
regression triage: reproduce the failure, find the likely file, and sketch the
patch. Claude Engineer is used as a second-pass reviewer: check the patch plan,
look for missing tests, and write a short release note. The goal is not to let
two agents make uncontrolled changes. The goal is to keep the investigation
reproducible, reviewable, and easy to reset.

The companion Dev Container contributions for this workflow are:

- [Omni Engineer Dev Container PR](https://github.com/Doriandarko/omni-engineer/pull/42)
- [Claude Engineer Dev Container PR](https://github.com/Doriandarko/claude-engineer/pull/266)

![AI regression triage workflow](assets/20260530_ai_regression_triage_in_daytona_workflow.svg)

## TL;DR

- Use Daytona to create isolated Omni Engineer and Claude Engineer workspaces.
- Pass `OPENROUTER_API_KEY`, `ANTHROPIC_API_KEY`, and optional `E2B_API_KEY`
through environment variables, not committed files.
- Ask Omni Engineer to reproduce and map the regression before patching.
- Ask Claude Engineer to review the proposed patch and test plan from a clean
workspace.
- Commit only human-reviewed code, validation commands, and concise notes.

## What the Dev Containers Add

Both upstream repositories are Python projects. The Dev Container files add the
same basic contract to each project:

- use the official Python 3.11 Dev Containers image;
- install dependencies from the repository's `requirements.txt`;
- make Git available in the workspace;
- expose only environment-variable names, never secret values;
- give the developer a short attach message with the command to run.

For Omni Engineer, the Dev Container passes `OPENROUTER_API_KEY` from the local
environment:

```json
"containerEnv": {
  "OPENROUTER_API_KEY": "${localEnv:OPENROUTER_API_KEY}"
}
```

For Claude Engineer, the Dev Container passes `ANTHROPIC_API_KEY` and the
optional `E2B_API_KEY`, and forwards port `5000` for the Flask web interface:

```json
"containerEnv": {
  "ANTHROPIC_API_KEY": "${localEnv:ANTHROPIC_API_KEY}",
  "E2B_API_KEY": "${localEnv:E2B_API_KEY}"
}
```

This keeps the workspace reproducible without putting API keys in
`.env.example`, commit history, screenshots, or prompt transcripts.

## Set the Required Secrets in Daytona

Start by storing the keys as Daytona environment variables. Use the values from
your provider dashboards.

```bash
daytona env set OPENROUTER_API_KEY=sk-or-your-openrouter-key
daytona env set ANTHROPIC_API_KEY=sk-ant-your-anthropic-key
daytona env set E2B_API_KEY=your-optional-e2b-key
```

Verify the variables are available before creating workspaces:

```bash
daytona env list
```

If `E2B_API_KEY` is not part of your workflow, skip that command and leave the
variable unset. Claude Engineer can still run its local CLI and web interface
without an E2B key.
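
Before launching either agent, it can help to confirm from inside the workspace that the expected variables actually arrived. The following is a minimal sketch, not part of either upstream project: the `REQUIRED` and `OPTIONAL` mappings and the function names are assumptions made for illustration, and only the variable names come from this article. It reports names only and never prints a value.

```python
import os

# Variable names come from the article; the required/optional split and the
# workspace labels used as dictionary keys are assumptions for this sketch.
REQUIRED = {
    "omni-engineer": ["OPENROUTER_API_KEY"],
    "claude-engineer": ["ANTHROPIC_API_KEY"],
}
OPTIONAL = {"claude-engineer": ["E2B_API_KEY"]}


def missing_keys(workspace, env=None):
    """Return the required variable names that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED[workspace] if not env.get(name, "").strip()]


def report(workspace, env=None):
    """Build a status line that names variables but never includes their values."""
    env = os.environ if env is None else env
    missing = missing_keys(workspace, env)
    if missing:
        return f"{workspace}: missing {', '.join(missing)}"
    unset = [name for name in OPTIONAL.get(workspace, []) if not env.get(name)]
    suffix = f" (optional unset: {', '.join(unset)})" if unset else ""
    return f"{workspace}: ok{suffix}"


if __name__ == "__main__":
    for workspace in REQUIRED:
        print(report(workspace))
```

Running the script in each workspace gives a quick yes/no answer that is safe to paste into a pull request comment, because the output contains variable names only.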

## Create the Omni Engineer Workspace

After the Dev Container PR is merged, create a Daytona workspace from Omni
Engineer:

```bash
daytona create https://github.com/Doriandarko/omni-engineer
```

The Dev Container installs the Python packages from `requirements.txt`. When the
workspace opens, run:

```bash
python main.py
```

Use Omni Engineer for the first pass. For a regression triage workflow, give it
a narrow job:

```text
We are investigating a failing test in a small Python package.
First, inspect the repository structure. Then identify the failing command,
run only the focused test, and explain the smallest patch you would try.
Do not edit files until the failure is reproduced.
```

This instruction keeps the first workspace focused on evidence gathering. Ask
for file paths, failing commands, and the suspected behavior change. If Omni
Engineer proposes a patch before reproducing the failure, stop it and ask for
the failing command first.

## Create the Claude Engineer Review Workspace

Create a second Daytona workspace from Claude Engineer:

```bash
daytona create https://github.com/Doriandarko/claude-engineer
```

Run the CLI:

```bash
python ce3.py
```

Or run the web interface:

```bash
python app.py
```

Daytona will forward port `5000`, so the browser UI is available without manual
port wiring. Use this workspace for review, not duplicate patching. Paste a
short summary from the Omni workspace:

```text
Review this regression plan:
- failing command: pytest tests/test_parser.py::test_escaped_quotes
- suspected file: parser/tokenizer.py
- proposed patch: preserve escaped quotes before splitting tokens
- proposed tests: add escaped single and double quote cases

Look for missing edge cases and suggest a minimal validation checklist.
```

This second-pass review is valuable because it starts from a clean environment.
It is less likely to inherit a half-edited workspace state, local caches, or
untracked files from the triage run.
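
To keep the hand-off between the two workspaces consistent, the summary can be generated from a small record instead of typed freehand. This is a sketch under assumptions: `TriageSummary` and its field names are hypothetical and mirror the bullet labels in the prompt above; neither agent requires this format.

```python
from dataclasses import dataclass


@dataclass
class TriageSummary:
    """Evidence gathered in the triage workspace (hypothetical structure)."""

    failing_command: str
    suspected_file: str
    proposed_patch: str
    proposed_tests: str

    def render(self) -> str:
        """Format the summary as the review prompt, refusing an empty repro."""
        if not self.failing_command.strip():
            raise ValueError("reproduce the failure before requesting a review")
        return "\n".join(
            [
                "Review this regression plan:",
                f"- failing command: {self.failing_command}",
                f"- suspected file: {self.suspected_file}",
                f"- proposed patch: {self.proposed_patch}",
                f"- proposed tests: {self.proposed_tests}",
                "",
                "Look for missing edge cases and suggest a minimal validation checklist.",
            ]
        )


summary = TriageSummary(
    failing_command="pytest tests/test_parser.py::test_escaped_quotes -q",
    suspected_file="parser/tokenizer.py",
    proposed_patch="preserve escaped quotes before splitting tokens",
    proposed_tests="add escaped single and double quote cases",
)
prompt = summary.render()
```

Raising an error when the failing command is empty enforces the rule from the triage step: no review request without a reproduced failure.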

## Example: Patch a Parser Regression

Use a small, realistic target project for the actual fix. A parser regression
works well because it has a clear failing input and focused tests.

Clone the target repository in both workspaces:

```bash
git clone https://github.com/example/parser-demo.git
cd parser-demo
python -m venv .venv
. .venv/bin/activate
pip install -e ".[test]"
```

In the Omni Engineer workspace, reproduce the failure:

```bash
pytest tests/test_parser.py::test_escaped_quotes -q
```

Ask Omni Engineer to explain the smallest safe patch. After reviewing the plan,
make the code change manually or let the agent propose a diff that you inspect
line by line. Then run:

```bash
pytest tests/test_parser.py::test_escaped_quotes -q
pytest tests/test_parser.py -q
git diff --check
```
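
The three validation commands can also be wrapped in a small runner so that the same sequence is executed in both workspaces. This is a minimal sketch using only the standard library; `run_checks` and `all_passed` are hypothetical helper names, and the command list assumes the script is run from the root of the target repository.

```python
import subprocess

# Commands copied from the article; run them from the target repository root.
CHECKS = [
    ["pytest", "tests/test_parser.py::test_escaped_quotes", "-q"],
    ["pytest", "tests/test_parser.py", "-q"],
    ["git", "diff", "--check"],
]


def run_checks(commands):
    """Run commands in order and stop at the first non-zero exit code."""
    results = []
    for cmd in commands:
        try:
            code = subprocess.run(cmd).returncode
        except FileNotFoundError:
            code = 127  # shell convention for "command not found"
        results.append((" ".join(cmd), code))
        if code != 0:
            break
    return results


def all_passed(results, expected_count):
    """True only if every expected command ran and exited with 0."""
    return len(results) == expected_count and all(code == 0 for _, code in results)
```

Calling `run_checks(CHECKS)` and then `all_passed(results, len(CHECKS))` gives a single pass/fail answer. Stopping at the first failure keeps the output short enough to paste into a review note.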

In Claude Engineer, review the final diff and ask for a release note:

```text
Review this diff for a parser regression fix. Check whether the tests cover
escaped single quotes, escaped double quotes, and normal quoted strings. Then
write a two-sentence release note that does not mention implementation details.
```

The outcome should be a small patch, a focused regression test, and a concise
review note that can be pasted into the pull request.
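
To make the expected outcome concrete, here is an illustrative sketch of what such a patch and its regression tests might look like. The `tokenize` function below is entirely hypothetical; it is not the code of the example repository, which is not shown in this article. It only demonstrates the idea of treating a backslash-escaped quote as a literal character before deciding where tokens end.

```python
def tokenize(text):
    """Split on whitespace, keeping quoted spans together.

    A backslash followed by a quote character is treated as a literal quote.
    """
    tokens, current = [], []
    quote = None        # the quote character that opened the current span
    in_token = False    # True once the current token has started
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\\" and i + 1 < len(text) and text[i + 1] in "'\"":
            current.append(text[i + 1])  # escaped quote stays in the token
            in_token = True
            i += 2
            continue
        if quote:
            if ch == quote:
                quote = None             # closing quote ends the span
            else:
                current.append(ch)
        elif ch in "'\"":
            quote = ch                   # opening quote starts a span
            in_token = True
        elif ch.isspace():
            if in_token:
                tokens.append("".join(current))
                current, in_token = [], False
        else:
            current.append(ch)
            in_token = True
        i += 1
    if quote:
        raise ValueError("unterminated quote")
    if in_token:
        tokens.append("".join(current))
    return tokens


def test_escaped_double_quotes():
    assert tokenize('say "a \\"b\\" c" end') == ["say", 'a "b" c', "end"]


def test_escaped_single_quotes():
    assert tokenize("it\\'s 'a b'") == ["it's", "a b"]


def test_normal_quoted_strings():
    assert tokenize('x "y z"') == ["x", "y z"]
```

The three tests map directly onto the cases named in the review prompt: escaped double quotes, escaped single quotes, and normal quoted strings.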

## Keep Agent Work Reviewable

AI engineering work is easier to trust when the repository history stays boring.
Use these guardrails:

| Step | What to Keep | What to Avoid |
| --- | --- | --- |
| Environment | Dev Container config and environment variable names | API keys, `.env` files, private prompts |
| Triage | Failing command, focused file paths, suspected cause | Broad rewrites before reproducing the bug |
| Patch | Minimal diff plus regression test | Formatting churn and unrelated refactors |
| Review | Validation commands and reviewer notes | Raw chat transcripts or tool logs |

If an agent suggests touching files outside the failure path, ask it to justify
the change. If it cannot connect the change to a failing test or acceptance
criterion, leave it out of the pull request.
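
The "What to Avoid" column for the environment row can be partly automated by scanning a diff for key-shaped strings before committing. The sketch below is an assumption-laden illustration: the `sk-or-` and `sk-ant-` prefixes are taken from the placeholder values earlier in this article, while the minimum length and character set in the pattern are guesses rather than a documented key format. `find_secret_like` is a hypothetical helper name.

```python
import re

# Prefixes come from the placeholder keys in this article; the length and
# character set are assumptions, not a documented key format.
SECRET_PATTERN = re.compile(r"\b(sk-(?:or|ant))-[A-Za-z0-9_-]{8,}")


def find_secret_like(text):
    """Return (line number, redacted match) for each key-shaped string."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in SECRET_PATTERN.finditer(line):
            # Keep only the provider prefix so the report itself leaks nothing.
            hits.append((lineno, match.group(1) + "-***"))
    return hits
```

Feeding the output of `git diff --cached` into `find_secret_like` before each commit gives a cheap last check. An empty result does not prove that a diff is clean, so it complements human review and does not replace it.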

## Troubleshooting

If Omni Engineer starts without a model response, check that
`OPENROUTER_API_KEY` is present in the Daytona workspace. If Claude Engineer
raises a missing API key error, check `ANTHROPIC_API_KEY`.

If the Claude Engineer web UI is not reachable, confirm that `python app.py` is
running and that Daytona forwarded port `5000`. If package installation fails,
open the Dev Container logs and rerun:

```bash
python -m pip install --upgrade pip
pip install -r requirements.txt
```
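
For the web UI check described above, a quick way to tell "the app is not running" apart from "the port is not forwarded" is to test the port from inside the workspace first. This is a minimal standard-library sketch; `port_open` is a hypothetical helper, and the host and port defaults assume the Flask interface listens on port `5000` as stated in this article.

```python
import socket


def port_open(host="127.0.0.1", port=5000, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

If `port_open()` returns `True` inside the workspace but the browser still cannot reach the UI, the problem is more likely in port forwarding than in the application itself.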

If a workspace becomes noisy after several agent runs, create a fresh Daytona
workspace from the same repository. That is the main advantage of this setup:
resetting the investigation is cheaper than cleaning up an uncertain local
state.

## Conclusion

Running Omni Engineer and Claude Engineer in Daytona gives you two clean
perspectives on the same regression. One workspace can focus on reproduction
and the smallest patch. The other can review the plan, strengthen the test
checklist, and prepare release notes. Because both workspaces are created from
Dev Containers, the workflow is repeatable without committing secrets or
machine-specific setup.

Use this pattern when a regression needs more than a quick fix: one agent to
map the failure, one agent to challenge the patch, and Daytona to keep both
workspaces disposable.

## References

- [Daytona](https://github.com/daytonaio/daytona)
- [Omni Engineer](https://github.com/Doriandarko/omni-engineer)
- [Claude Engineer](https://github.com/Doriandarko/claude-engineer)
- [Dev Containers](https://containers.dev/)
13 changes: 13 additions & 0 deletions authors/bilal_mutahir.md
@@ -0,0 +1,13 @@
Author: Bilal Mutahir
Title: Open Source Contributor
Description: Bilal Mutahir is an open-source contributor focused on practical
developer workflows, reproducible environments, and small, reviewable changes.
Bilal writes about using automation carefully while keeping code, secrets, and
validation steps easy for maintainers to inspect.
Author Image: ![Bilal Mutahir](https://avatars.githubusercontent.com/u/170749017?v=4)
Author LinkedIn:
Author Twitter:
Company Name: Independent
Company Description: Independent open-source contributor
Company Logo Dark:
Company Logo White:
23 changes: 23 additions & 0 deletions definitions/20260530_definition_ai_regression_triage.md
@@ -0,0 +1,23 @@
---
title: 'AI Regression Triage'
description: 'A workflow for using AI tools to reproduce, isolate, review, and document software regressions in a controlled environment.'
date: 2026-05-30
author: 'Bilal Mutahir'
---

# AI Regression Triage

## Definition

AI regression triage is a debugging workflow where an AI assistant helps
reproduce a failing behavior, identify the smallest relevant code path, propose
a focused patch, and review the validation checklist before a human submits the
final change.

## Context and Usage

AI regression triage is most useful when the development environment is
reproducible and disposable. In a Daytona workspace, teams can give separate AI
assistants the same repository and dependency setup while keeping secrets in
environment variables. One assistant can investigate the failure, while another
reviews the proposed patch and test coverage from a clean workspace.