Merge pull request #12 from runloopai/james/step1

james-rl · web-flow · commit 0ac2b47e1757 · 2026-01-07T11:27:03.000-08:00
added more details to readme
diff --git a/README.md b/README.md
@@ -9,7 +9,8 @@ Use agentic AI on Runloop to create a (fake) tax preparation service.
 This demo illustrates the power of Agentic AI on the Runloop platform
 through a mock tax preparation service. We take you through
 converting a simple manual tax preparation website into an automated
-tax preparation service based on a Codex-based AI agent.
+tax preparation service based on a Codex-based AI agent. After setting up
+a simple baseline service that invokes an agent on the Runloop platform, we extend the example to show how to use Runloop to measure and improve agent performance.
 
 ### Development Stages
 
@@ -20,15 +21,14 @@ prepare the customer's W2.
 
 **Step 1: Incorporating Codex.** Add a simple tax preparation agent to
 automatically convert customer tax information into formatted 1040
-forms. The agent code runs safely on a Runloop Devbox.
+forms. The agent code runs safely and in isolation on a Runloop Devbox.
 
 **Step 2: Add Benchmarks.** The first version of our tax prep agent is
-overly simple and makes a lot of mistakes. We can use Runloop
-Benchmarks to measure how well it is doing.
+overly simple and makes a lot of mistakes. The first step in improving agent performance is to measure it and generate a score. Here we introduce Runloop Benchmarks to measure how well the agent is doing.
 
 **Step 3: Iterate on Agent Improvements.** With Benchmarks set up, we
 can try out different agent implementations to see which one works the
-best.
+best. We can perform quick experiments to measure performance after making changes to our prompt, model, and arguments to Codex.
 
 ## Quick Start
 
@@ -77,6 +77,8 @@ best.
 
    Then open your browser and go to the [demo site](http://localhost:3000/).
 
+   Setup for Step 0 is complete after this step.
+
 3. **(Optional) Enable Weave Tracing**
 
    To track and monitor LLM calls with Weights & Biases Weave:
@@ -105,8 +107,13 @@ best.
 
 4. **Set up Runloop resources**
 
-- run the setup script to add agents and a blueprint
--
+   Continue the demo by proceeding to Step 1: run the setup script to add agents and a blueprint. Stop the process running `pnpm dev` and then run the following command:
+
+   ```bash
+   pnpm step1_runloop_setup
+   ```
+
+   Then restart the local webserver with `pnpm dev`.
 
 5. **Start the local webserver**
    TODO: replace with pnpm run_demo
@@ -154,26 +161,113 @@ When you visit the [step 0](http://localhost:3000/step0) landing page, you see t
 We want to use a Codex-based agent to replace the manual conversion of
 the client's tax information into form 1040 values. For this step, we
 will utilize a simple tax prep agent running on a Runloop devbox. To
-set up the Runloop environment, run `pnpm step1_runloop_setup`. This
-command will do the following:
+set up the Runloop environment, run `pnpm step1_runloop_setup`. Under the hood, this
+command does the following:
 
-1. Upload the demo Agent as a Runloop object
-2. Create the Agent from the uploaded object
-3. Create a Runloop Blueprint with the agent mounted and required
-   packages installed. (This ensures that we can launch Devboxes with
-   this agent very quickly)
+1. Uploads the demo Agent as a Runloop object
+2. Creates the Agent from the uploaded object
+3. Creates a Runloop Blueprint with the agent mounted and required
+   packages installed. Creating a blueprint ensures that we can launch Devboxes using
+   this agent quickly
 
-After running the setup script, visit [step 1](http://localhost:3000/step1) to see this in action.
-For this step, the interaction above has changed:
+After running the setup script, restart the service with `pnpm dev` and visit [step 1](http://localhost:3000/step1) to see this in action.
 
-- A client uses the site to upload their tax information
-- The server starts a Devbox using the Blueprint for this agent.
+After running the script, the user flow is as follows:
+
+- A client wanting to file their taxes uses the site to upload their tax information
+- After the client hits submit, the server starts a Devbox using the Blueprint for this agent.
 - The server uploads the tax info to the Devbox and uses an exec
   command to invoke the agent and produce the 1040 json values.
 - The server takes the form 1040 values output by the agent and
   prepares the 1040.pdf.
 
-TODO: add pointers to relevant code
+Note that the process of generating the 1040 values from input is now completely agent-driven and takes place on demand.
+
+**Key Code Snippets:**
+Step 1 largely reuses code from our original implementation: reading PDFs, parsing and rendering are untouched.
+
+- **API endpoint**:
+  You can walk through the streaming API route that orchestrates the entire processing flow:
+
+  ```84:238:packages/frontend/src/app/api/tax/process-step1-stream/route.ts
+  // Handles file upload, creates devbox, runs agent, generates PDF
+  // Returns Server-Sent Events for real-time progress updates
+  ```
+
+- **Devbox creation and agent execution**:
+  The API endpoint creates an instance of `TaxProcessingService` then calls `processTaxReturn`. In turn, this spins up the Runloop devbox, uploads files, and executes the agent:
+
+  ```70:178:packages/frontend/src/lib/tax-processing-service.ts
+  // Creates devbox from blueprint, uploads W-2 file and agent prompt,
+  // executes the agent via execAsync, and retrieves the JSON result
+  ```
+
+  Starting a devbox with the agent and wiring in our OpenAI secret is handled here:
+
+  ```ts
+  this.devbox = await this.runloop.devbox.create({
+    name: `tax-processing-${Date.now()}`,
+    // blueprint created by step1_runloop_setup.ts script
+    blueprint_id: blueprintId,
+    environment_variables: {
+      CODEX_SKIP_GIT_REPO_CHECK: 'true',
+      RUNLOOP_DEVBOX: '1',
+    },
+    // wire in the OpenAI key from the Runloop secret store
+    secrets: { OPENAI_API_KEY: 'OPENAI_API_KEY' },
+  });
+  ```
+
+  After the devbox has been started, we load tax processing instructions as a prompt to the agent here:
+
+  ```ts
+  await this.devbox.file.write({
+    file_path: '/home/user/agent-prompt.txt',
+    contents: agentPromptContent,
+  });
+  ```
+
+  Then we upload the specific user's W2 tax information:
+
+  ```ts
+  // Upload W2 file (use upload() for binary files like PDFs)
+  logger.log(`Uploading ${w2Filename} to devbox...`);
+  await this.devbox.file.upload({
+    path: `/home/user/input/${w2Filename}`,
+    file: w2File,
+  });
+  ```
+
+- **Agent execution script**:
+  After setting up the environment, we invoke the agent using a standalone script that runs on the devbox to process the W-2 and generate Form 1040 JSON:
+
+  ```11:97:packages/tax-processing/src/bin/run-agent-turn.ts
+  // Runs a single agent turn using CodexService to process W-2
+  // and write Form 1040 JSON output to the specified file
+  ```
+
+  Here the script uses a prompt to define the role and instruct the LLM to return output conforming to well-defined JSON schemas. The prompt and the W2 information from the user are used to repeatedly call Codex and stream the output. This is the core agent processing loop.
+
+  Rather than have the LLM perform calculations directly, we use the agent to process individual line items and return the results as JSON. This lets us use LLMs to do what they're best at while leveraging traditional code to perform the actual math and generate a PDF.
+
+  Importantly, because Runloop provides a secure, isolated environment, Codex can run
+  with broad permissions. The isolation removes the burden of deciding which commands are safe to run in the execution environment:
+
+  ```ts
+  // RUNLOOP_DEVBOX is set as env var during devbox startup
+  this.sandboxMode =
+    process.env.RUNLOOP_DEVBOX === '1'
+      ? 'danger-full-access'
+      : 'workspace-write';
+  ```
+
+- **PDF generation**:
+  After processing the input, the final step is to generate the 1040 PDF form:
+  ```29:50:packages/frontend/src/lib/pdf-generator.ts
+  // Loads IRS Form 1040 template, fills form fields with agent output,
+  // and saves the completed PDF to the output directory
+  ```
+  This code is the same as in Step 0.
 
 ### Step 2: Testing Tax Agent Performance
 
diff --git a/packages/frontend/src/lib/tax-processing-service.ts b/packages/frontend/src/lib/tax-processing-service.ts
@@ -78,11 +78,13 @@ export class TaxProcessingService {
       logger.log('Creating Devbox from tax blueprint...');
       this.devbox = await this.runloop.devbox.create({
         name: `tax-processing-${Date.now()}`,
+        // blueprint created by step1_runloop_setup.ts script
         blueprint_id: blueprintId,
         environment_variables: {
           CODEX_SKIP_GIT_REPO_CHECK: 'true',
           RUNLOOP_DEVBOX: '1',
         },
+        // wire in the OpenAI key from the Runloop secret store
         secrets: { OPENAI_API_KEY: 'OPENAI_API_KEY' },
       });
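
The `RUNLOOP_DEVBOX: '1'` variable set in `devbox.create` above is what the README's sandbox-mode snippet keys on. A minimal sketch of that selection as a standalone function might look like the following. `selectSandboxMode` and the `SandboxMode` type are hypothetical names used only for illustration. Only the two mode strings and the environment variable name come from the diff.

```ts
// Hypothetical helper: the names here are illustrative and not taken from the repo.
type SandboxMode = 'danger-full-access' | 'workspace-write';

// The environment is passed in as a parameter so the function can be
// exercised without reading or mutating process.env.
function selectSandboxMode(
  env: Record<string, string | undefined>,
): SandboxMode {
  // Broad permissions are granted only when the devbox marker is exactly '1'.
  return env.RUNLOOP_DEVBOX === '1' ? 'danger-full-access' : 'workspace-write';
}

console.log(selectSandboxMode({ RUNLOOP_DEVBOX: '1' })); // prints "danger-full-access"
console.log(selectSandboxMode({})); // prints "workspace-write"
```

With this shape, any value other than the exact string `'1'` (including an unset variable) falls back to the more restrictive `workspace-write` mode, so running outside a devbox stays on the conservative side.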