From 5d0d5cbd2321a15ce43d02c6c1f9bc50fd23152f Mon Sep 17 00:00:00 2001
From: davidrsdiaz
Date: Thu, 28 May 2026 02:51:12 -0500
Subject: [PATCH] Add Daytona AI engineer rehearsal article

Signed-off-by: davidrsdiaz
---
 ...8_run_ai_engineer_rehearsals_in_daytona.md | 215 ++++++++++++++++++
 ...ngineer_rehearsals_in_daytona_workflow.svg |  24 ++
 authors/david_rsd.md                          |   6 +
 ...528_definition_prompt_to_patch_workflow.md |  24 ++
 4 files changed, 269 insertions(+)
 create mode 100644 articles/20260528_run_ai_engineer_rehearsals_in_daytona.md
 create mode 100644 articles/assets/20260528_run_ai_engineer_rehearsals_in_daytona_workflow.svg
 create mode 100644 authors/david_rsd.md
 create mode 100644 definitions/20260528_definition_prompt_to_patch_workflow.md

diff --git a/articles/20260528_run_ai_engineer_rehearsals_in_daytona.md b/articles/20260528_run_ai_engineer_rehearsals_in_daytona.md
new file mode 100644
index 00000000..2d7a4214
--- /dev/null
+++ b/articles/20260528_run_ai_engineer_rehearsals_in_daytona.md
@@ -0,0 +1,215 @@
+---
+title: 'Run AI Engineer Rehearsals in Daytona'
+description: 'Use Daytona workspaces to compare Omni Engineer and Claude Engineer on the same prompt-to-patch task.'
+date: 2026-05-28
+author: 'David RSD'
+tags: ['daytona', 'ai engineering', 'devcontainer', 'openrouter', 'claude']
+---
+
+# Run AI Engineer Rehearsals in Daytona
+
+## Introduction
+
+AI coding tools are easiest to evaluate when the environment stays the same.
+If one agent runs on a laptop with old packages, another runs in a fresh virtual
+environment, and a third runs without the same API keys, the comparison is not
+about the agents anymore. It is about drift. A repeatable
+[development container](/definitions/20240819_definition_development%20container.md)
+keeps that noise out of the experiment.
+
+This article shows how to run two open-source AI engineer projects in Daytona:
+[Omni Engineer](https://github.com/Doriandarko/omni-engineer) and
+[Claude Engineer](https://github.com/Doriandarko/claude-engineer). The goal is
+not to let an agent push directly to production. The goal is to create a
+controlled [prompt-to-patch workflow](/definitions/20260528_definition_prompt_to_patch_workflow.md)
+where each agent receives the same task, produces a proposed patch, and leaves
+the developer with a diff that can be reviewed and tested.
+
+The companion setup work is available in two pull requests:
+[Doriandarko/omni-engineer#38](https://github.com/Doriandarko/omni-engineer/pull/38)
+adds a Daytona-ready devcontainer for Omni Engineer, and
+[Doriandarko/claude-engineer#262](https://github.com/Doriandarko/claude-engineer/pull/262)
+adds the same kind of workspace entry point for Claude Engineer.
+
+![AI engineer rehearsal workflow](assets/20260528_run_ai_engineer_rehearsals_in_daytona_workflow.svg)
+
+## TL;DR
+
+- Use Daytona to create a clean workspace for each AI engineer project.
+- Forward only the API keys each tool needs from your local machine.
+- Give both agents the same small, reviewable coding task.
+- Compare the generated diffs, then run the same validation command in each
+  workspace before accepting either patch.
+
+## Why Rehearse AI Engineering Work
+
+AI coding agents are useful when they shorten the path from intent to reviewed
+code. They are risky when the developer cannot reproduce how the patch was
+created. A rehearsal makes the workflow observable. You keep the prompt, the
+repository state, the dependency install, the generated diff, and the validation
+commands close together.
+
+That structure matters for teams adopting AI-assisted development. Without it,
+one successful demo can hide fragile setup steps. With it, a maintainer can
+rerun the same task after a package update, compare models, or ask a second
+agent to solve the same issue without changing the surrounding environment.
+
+Daytona is a good fit because the workspace can be tied to the repository's setup
+rather than to an individual's laptop. When the repository has a devcontainer, the
+workspace knows which base image to use, which dependencies to install, which
+ports to expose, and which environment variables must be passed in at runtime.
+
+## Workspace Layout
+
+The Omni Engineer devcontainer uses a Python 3.11 image, installs the project's
+`requirements.txt`, and forwards `OPENROUTER_API_KEY` from the local machine.
+That matches Omni Engineer's OpenRouter-based client setup and keeps the key out
+of the repository. The attach command compiles `main.py` so a syntax problem is
+caught as soon as the workspace opens.
+
+The Claude Engineer devcontainer also uses Python 3.11 and installs from
+`requirements.txt`. It forwards `ANTHROPIC_API_KEY` for Claude access and
+`E2B_API_KEY` for the optional code execution tool. It also forwards port
+`5000`, where the Flask web interface documented by Claude Engineer listens.
+
+Both setup files intentionally avoid committing `.env` files. The workspace
+should receive secrets from the developer's machine or from a secret manager.
+That makes the setup reusable without turning the repository into a place to
+store personal credentials.
+
+## Step 1: Prepare Local Keys
+
+Export the keys you plan to use before opening the Daytona workspace. For Omni
+Engineer, set the OpenRouter key:
+
+```bash
+export OPENROUTER_API_KEY="your-openrouter-key"
+```
+
+For Claude Engineer, set the Anthropic key and, if needed, the E2B key:
+
+```bash
+export ANTHROPIC_API_KEY="your-anthropic-key"
+export E2B_API_KEY="your-e2b-key"
+```
+
+If you are testing the companion branches before they are merged, create a
+workspace from your fork or select the branch that contains
+`.devcontainer/devcontainer.json` in your Daytona project configuration. After
+the pull requests are merged, the upstream repositories can be used directly.
+
+## Step 2: Open Omni Engineer in Daytona
+
+Create a workspace for Omni Engineer:
+
+```bash
+daytona create https://github.com/Doriandarko/omni-engineer --code
+```
+
+When the devcontainer is present, Daytona builds the Python workspace and runs
+the dependency install command. From the workspace terminal, start the console:
+
+```bash
+python main.py
+```
+
+Use a small task for the first rehearsal. A good prompt asks for a narrow
+change, names the file, and defines the validation command. For example:
+
+```text
+Add a --version command that prints the package version. Keep the change
+minimal. After editing, show the diff and run python -m compileall main.py.
+```
+
+The point is not to have the agent build the largest possible feature. The point
+is to observe how it gathers context, edits the file, reports the diff, and
+responds to a failing validation command.
+
+## Step 3: Open Claude Engineer in Daytona
+
+Create a separate workspace for Claude Engineer:
+
+```bash
+daytona create https://github.com/Doriandarko/claude-engineer --code
+```
+
+Start either the Flask web interface (`app.py`) or the CLI (`ce3.py`):
+
+```bash
+python app.py
+```
+
+```bash
+python ce3.py
+```
+
+The web interface is useful when you want a browser-based chat surface and
+visual token feedback. The CLI is better for terminal-heavy work where the diff
+and validation commands are the main focus.
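Before starting either entry point, it can help to confirm that the keys forwarded by the devcontainer actually reached the workspace, so a missing key is reported before any prompt is sent. The sketch below assumes a POSIX-compatible shell with `printenv` available; `require_env` is a hypothetical helper, not part of Daytona or Claude Engineer.

```bash
# Hypothetical helper (not part of Daytona or Claude Engineer): report every
# named environment variable that is unset or empty, and return non-zero if
# any of them are missing.
require_env() {
  missing=0
  for name in "$@"; do
    # printenv prints nothing (and exits non-zero) when the variable is not
    # exported into the environment, so an empty result means "missing".
    if [ -z "$(printenv "$name")" ]; then
      echo "missing: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Claude Engineer needs ANTHROPIC_API_KEY. E2B_API_KEY is only used by the
# optional code execution tool, so its absence is reported as a warning.
if require_env ANTHROPIC_API_KEY; then
  require_env E2B_API_KEY ||
    echo "warning: E2B_API_KEY not set; the code execution tool may not work" >&2
  echo "ANTHROPIC_API_KEY present: start python app.py (web) or python ce3.py (CLI)"
fi
```

The check tests only for presence, never printing a key's value, so it is safe to run in a shared terminal or a recorded session.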
+
+Give Claude Engineer the same task you used with Omni Engineer. Keep the
+repository state equivalent. If one workspace has local edits, reset or recreate
+it before comparing results. Otherwise, you are measuring the difference between
+starting states instead of the difference between agent behavior.
+
+## Step 4: Compare the Diffs
+
+After each agent finishes, inspect the patch before running any validation:
+
+```bash
+git diff
+```
+
+Look for three things. First, check scope. The diff should touch only the files
+needed for the prompt. Second, check reversibility. A small patch is easier to
+discard or revise than a broad rewrite. Third, check explanation quality. A
+good AI engineer should be able to tell you what it changed, why it changed it,
+and how it validated the result.
+
+Then run the validation command in the same workspace where the patch was
+created. For Omni Engineer, a fast first check is compiling the entry point:
+
+```bash
+python -m compileall main.py
+```
+
+For Claude Engineer, include the main entry points and tool directories:
+
+```bash
+python -m compileall app.py ce3.py config.py tools prompts
+```
+
+Compilation does not prove the behavior is correct, but it catches syntax
+errors before you spend time on manual testing. For a real contribution, add the
+project's test command or a targeted smoke test.
+
+## Step 5: Keep the Best Patch
+
+Once both agents have produced a patch, keep the version that is smallest,
+clearest, and easiest to validate. If neither patch is good, that is still a
+useful result: the rehearsal has exposed a task that needs a better prompt, more
+context, or a human implementation.
+
+For team use, save the prompt, agent name, model, validation command, and final
+diff in the issue or pull request. That record turns an AI-generated patch into
+an auditable engineering artifact. Future reviewers can see the same inputs and
+rerun the same checks in Daytona.
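The record described above can be produced with a small shell function. This is a minimal sketch under stated assumptions: `record_rehearsal` is a hypothetical helper (not part of Daytona, Omni Engineer, or Claude Engineer), the diff arrives on standard input, and the output is a local Markdown file that you paste into the issue or pull request yourself.

```bash
# Hypothetical helper: append one rehearsal record, as Markdown, to a file.
# Arguments: output file, agent name, model, prompt, validation command.
# The diff is read from standard input, so the caller decides how to produce it.
record_rehearsal() {
  out="$1" agent="$2" model="$3" prompt="$4" check="$5"

  # Run the validation command with stdin detached (so it cannot consume the
  # diff) and keep only its exit status for the record.
  sh -c "$check" </dev/null >/dev/null 2>&1
  status=$?

  {
    printf '## Rehearsal: %s\n\n' "$agent"
    printf '%s\n' "- Model: $model"
    printf '%s\n\n' "- Validation: \`$check\` (exit $status)"
    # Tilde fences keep the nested blocks valid inside the generated Markdown.
    printf 'Prompt:\n\n~~~text\n%s\n~~~\n\n' "$prompt"
    printf 'Diff:\n\n~~~diff\n'
    cat
    printf '~~~\n\n'
  } >>"$out"

  # Propagate the validation result so callers can stop on failure.
  return "$status"
}

# Example, run in the workspace where the patch was created
# (prompt.txt and the model identifier are placeholders):
# git diff | record_rehearsal rehearsal.md "Omni Engineer" "your-model-id" \
#   "$(cat prompt.txt)" "python -m compileall main.py"
```

Returning the validation status lets a script chain the helper with `&&`, so a failed check stops the rehearsal before the next agent runs.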
+
+## Conclusion
+
+Omni Engineer and Claude Engineer can both be useful coding assistants, but they
+are more valuable when they run inside a controlled environment. Daytona gives
+each tool a repeatable workspace, forwards only the secrets it needs, and keeps
+the generated patch close to the validation commands.
+
+Use this pattern for small, reviewable tasks first. Once the team trusts the
+workflow, expand it to more complex fixes, test generation, and documentation
+updates. The discipline stays the same: stable environment, clear prompt, small
+diff, explicit validation, and human review before merge.
+
+## References
+
+- [Omni Engineer](https://github.com/Doriandarko/omni-engineer)
+- [Claude Engineer](https://github.com/Doriandarko/claude-engineer)
+- [Omni Engineer Daytona devcontainer PR](https://github.com/Doriandarko/omni-engineer/pull/38)
+- [Claude Engineer Daytona devcontainer PR](https://github.com/Doriandarko/claude-engineer/pull/262)
diff --git a/articles/assets/20260528_run_ai_engineer_rehearsals_in_daytona_workflow.svg b/articles/assets/20260528_run_ai_engineer_rehearsals_in_daytona_workflow.svg
new file mode 100644
index 00000000..1b128a98
--- /dev/null
+++ b/articles/assets/20260528_run_ai_engineer_rehearsals_in_daytona_workflow.svg
@@ -0,0 +1,24 @@
+
+  AI engineer rehearsal workflow in Daytona
+  A Daytona workspace prepares Omni Engineer and Claude Engineer for repeatable prompt-to-patch rehearsals.
+
+
+  Daytona workspace
+  One reproducible environment for setup, prompting, patch review, and validation.
+
+  Repo branch
+  devcontainer config
+
+
+
+  AI engineer
+  Omni Engineer
+  Claude Engineer
+
+
+
+  Patch diff
+  review and test
+
+  Repeat the same prompt with the same dependencies before trusting the result.
+
diff --git a/authors/david_rsd.md b/authors/david_rsd.md
new file mode 100644
index 00000000..7f48f1f9
--- /dev/null
+++ b/authors/david_rsd.md
@@ -0,0 +1,6 @@
+Author: David RSD Title: Software Engineer Description: David RSD is a software
+engineer focused on practical automation, developer tools, and AI-assisted
+workflows that turn messy operational work into maintainable systems. Author
+Image: Author LinkedIn:
+Author Twitter: Company Name: Independent Company Description: Independent
+software engineering and automation work.
diff --git a/definitions/20260528_definition_prompt_to_patch_workflow.md b/definitions/20260528_definition_prompt_to_patch_workflow.md
new file mode 100644
index 00000000..ebdcbc6b
--- /dev/null
+++ b/definitions/20260528_definition_prompt_to_patch_workflow.md
@@ -0,0 +1,24 @@
+---
+title: 'Prompt-to-Patch Workflow'
+description: 'A development workflow where an AI agent turns a scoped prompt into proposed code changes that a human can review.'
+date: 2026-05-28
+author: 'David RSD'
+---
+
+# Prompt-to-Patch Workflow
+
+## Definition
+
+A prompt-to-patch workflow is a software development loop where a developer gives
+an AI coding agent a scoped task, relevant project context, and constraints, then
+reviews the patch that the agent proposes. The output is not treated as complete
+until a human maintainer has inspected and tested it and then accepted, revised,
+or rejected it.
+
+## Context and Usage
+
+Teams use prompt-to-patch workflows for bug fixes, codebase exploration,
+prototype implementation, test generation, and repetitive maintenance work. A
+good workflow keeps the agent inside a reproducible development environment,
+captures the exact prompt, runs validation commands, and makes the resulting
+diff easy to compare against the original repository state.