
Commit 0ac2b47

Merge pull request #12 from runloopai/james/step1
added more details to readme
2 parents 5ef2564 + 2cf7192 commit 0ac2b47

2 files changed

Lines changed: 115 additions & 19 deletions


README.md

Lines changed: 113 additions & 19 deletions
@@ -9,7 +9,8 @@ Use agentic AI on Runloop to create a (fake) tax preparation service.
 This demo illustrates the power of Agentic AI on the Runloop platform
 through a mock tax preparation service. We take you through
 converting a simple manual tax preparation website into an automated
-tax preparation service based on a Codex-based AI agent.
+tax preparation service based on a Codex-based AI agent. After setting up
+a simple baseline service that invokes an agent on the Runloop platform, we then extend the example to show how to use Runloop to measure and improve agent performance.
 
 ### Development Stages
 
@@ -20,15 +21,14 @@ prepare the customer's W2.
 
 **Step 1: Incorporating Codex.** Add a simple tax preparation agent to
 automatically convert customer tax information into formatted 1040
-forms. The agent code runs safely on a Runloop Devbox.
+forms. The agent code runs safely and in isolation on a Runloop Devbox.
 
 **Step 2: Add Benchmarks.** The first version of our tax prep agent is
-overly simple and makes a lot of mistakes. We can use Runloop
-Benchmarks to measure how well it is doing.
+overly simple and makes a lot of mistakes. The first step in improving agent performance is to measure it and generate a score. Here we introduce Runloop Benchmarks to measure how well it is doing.
 
 **Step 3: Iterate on Agent Improvements.** With Benchmarks set up, we
 can try out different agent implementations to see which one works the
-best.
+best. We can perform quick experiments to measure performance after making changes to our prompt, model, and arguments to Codex.
 
 ## Quick Start
 
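Step 2 in the hunk above describes scoring the agent before trying to improve it. The sketch below is a hypothetical illustration of that idea, not the Runloop Benchmarks API. It makes several assumptions:

- The agent's Form 1040 output is a flat JSON object that maps field names to dollar amounts.
- A field counts as correct when it is within a small tolerance of a hand-checked expected value.
- The function name `scoreForm1040`, the field names, and the 50-cent default tolerance are all assumptions.

```typescript
// Hypothetical shape of the agent's Form 1040 output: field name -> dollar amount.
type Form1040Values = Record<string, number>;

interface FieldResult {
  field: string;
  expected: number;
  actual: number | undefined;
  correct: boolean;
}

// Score one agent run against a hand-checked expected answer.
// A field is correct when the agent's value is within `tolerance` dollars
// of the expected value; fields the agent omitted count as wrong.
function scoreForm1040(
  expected: Form1040Values,
  actual: Form1040Values,
  tolerance = 0.5,
): { score: number; results: FieldResult[] } {
  const results: FieldResult[] = Object.entries(expected).map(([field, exp]) => {
    const act: number | undefined = actual[field];
    const correct = typeof act === 'number' && Math.abs(act - exp) <= tolerance;
    return { field, expected: exp, actual: act, correct };
  });
  const correctCount = results.filter((r) => r.correct).length;
  // Avoid dividing by zero when there is nothing to check.
  const score = results.length === 0 ? 0 : correctCount / results.length;
  return { score, results };
}
```

For example, `scoreForm1040({ wages: 50000, withholding: 4000 }, { wages: 50000, withholding: 3900 })` returns a score of 0.5, because only `wages` matches. The per-field `results` show which line items change after a prompt, model, or argument change. Step 3 relies on that kind of quick experiment.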
@@ -77,6 +77,8 @@ best.
 
 Then open your browser and go to the [demo site](http://localhost:3000/).
 
+Setup for Step 0 is complete after this step.
+
 3. **(Optional) Enable Weave Tracing**
 
 To track and monitor LLM calls with Weights & Biases Weave:
@@ -105,8 +107,13 @@ best.
 
 4. **Set up Runloop resources**
 
-- run the setup script to add agents and a blueprint
--
+Continue the demo by proceeding to Step 1: run the setup script to add agents and a blueprint. Stop the process running `pnpm dev`, then run the following command:
+
+```bash
+pnpm step1_runloop_setup
+```
+
+Then restart the local webserver with `pnpm dev`.
 
 5. **Start the local webserver**
 TODO: replace with pnpm run_demo
@@ -154,26 +161,113 @@ When you visit the [step 0](http://localhost:3000/step0) landing page, you see t
 We want to use a Codex-based agent to replace the manual conversion of
 the client's tax information into form 1040 values. For this step, we
 will utilize a simple tax prep agent running on a Runloop devbox. To
-set up the Runloop environment, run `pnpm step1_runloop_setup`. This
-command will do the following:
+set up the Runloop environment, run `pnpm step1_runloop_setup`. Under the hood, this
+command does the following:
 
-1. Upload the demo Agent as a Runloop object
-2. Create the Agent from the uploaded object
-3. Create a Runloop Blueprint with the agent mounted and required
-packages installed. (This ensures that we can launch Devboxes with
-this agent very quickly)
+1. Uploads the demo Agent as a Runloop object
+2. Creates the Agent from the uploaded object
+3. Creates a Runloop Blueprint with the agent mounted and required
+packages installed. Creating a blueprint ensures that we can launch Devboxes using
+this agent quickly.
 
-After running the setup script, visit [step 1](http://localhost:3000/step1) to see this in action.
-For this step, the interaction above has changed:
+After running the setup script, restart the service with `pnpm dev` and visit [step 1](http://localhost:3000/step1) to see this in action.
 
-- A client uses the site to upload their tax information
-- The server starts a Devbox using the Blueprint for this agent.
+After running the script, the user flow is as follows:
+
+- A client who wants to file their taxes uses the site to upload their tax information.
+- After the client hits submit, the server starts a Devbox using the Blueprint for this agent.
 - The server uploads the tax info to the Devbox and uses an exec
 command to invoke the agent and produce the 1040 JSON values.
 - The server takes the form 1040 values output by the agent and
 prepares the 1040.pdf.
 
-TODO: add pointers to relevant code
+Note that the process of generating the 1040 values from input is now completely agent-driven and takes place on demand.
+
+**Key Code Snippets:**
+Step 1 largely reuses code from our original implementation: reading PDFs, parsing, and rendering are untouched.
+
+- **API endpoint**:
+You can walk through the streaming API route (`packages/frontend/src/app/api/tax/process-step1-stream/route.ts`, lines 84-238) that orchestrates the entire processing flow:
+
+```ts
+// Handles file upload, creates devbox, runs agent, generates PDF
+// Returns Server-Sent Events for real-time progress updates
+```
+
+- **Devbox creation and agent execution**:
+The API endpoint creates an instance of `TaxProcessingService` and then calls `processTaxReturn`. This in turn spins up the Runloop devbox, uploads files, and executes the agent (`packages/frontend/src/lib/tax-processing-service.ts`, lines 70-178):
+
+```ts
+// Creates devbox from blueprint, uploads W-2 file and agent prompt,
+// executes the agent via execAsync, and retrieves the JSON result
+```
+
+Starting a devbox with the agent and wiring in our OpenAI secret is handled here:
+
+```ts
+this.devbox = await this.runloop.devbox.create({
+  name: `tax-processing-${Date.now()}`,
+  // blueprint created by step1_runloop_setup.ts script
+  blueprint_id: blueprintId,
+  environment_variables: {
+    CODEX_SKIP_GIT_REPO_CHECK: 'true',
+    RUNLOOP_DEVBOX: '1',
+  },
+  // wire in the OpenAI key from the Runloop secret store
+  secrets: { OPENAI_API_KEY: 'OPENAI_API_KEY' },
+});
+```
+
+After the devbox has started, we write the tax processing instructions to a prompt file for the agent:
+
+```ts
+await this.devbox.file.write({
+  file_path: '/home/user/agent-prompt.txt',
+  contents: agentPromptContent,
+});
+```
+
+Then we upload the user's W-2 tax information:
+
+```ts
+// Upload W2 file (use upload() for binary files like PDFs)
+logger.log(`Uploading ${w2Filename} to devbox...`);
+await this.devbox.file.upload({
+  path: `/home/user/input/${w2Filename}`,
+  file: w2File,
+});
+```
+
+- **Agent execution script**:
+After setting up the environment, we invoke the agent using a standalone script that runs on the devbox to process the W-2 and generate Form 1040 JSON (`packages/tax-processing/src/bin/run-agent-turn.ts`, lines 11-97):
+
+```ts
+// Runs a single agent turn using CodexService to process W-2
+// and write Form 1040 JSON output to the specified file
+```
+
+Here the script uses a prompt to define the agent's role and to instruct the LLM to return output conforming to well-defined JSON schemas. The prompt and the user's W-2 information are used to repeatedly call Codex and stream the output. This is the core agent processing loop.
+
+Rather than have the LLM perform calculations directly, we use the agent to process individual line items and return the results as JSON. This lets us use LLMs for what they do best while relying on traditional code to perform the actual math and generate the PDF.
+
+Importantly, since Runloop provides a secure, isolated environment, Codex can run
+with broad permissions, so we no longer have to decide which commands are safe to run in the execution environment:
+
+```ts
+// RUNLOOP_DEVBOX is set as env var during devbox startup
+this.sandboxMode =
+  process.env.RUNLOOP_DEVBOX === '1'
+    ? 'danger-full-access'
+    : 'workspace-write';
+```
+
+- **PDF generation**:
+After processing the input, the final step is to generate the 1040 PDF form (`packages/frontend/src/lib/pdf-generator.ts`, lines 29-50):
+```ts
+// Loads IRS Form 1040 template, fills form fields with agent output,
+// and saves the completed PDF to the output directory
+```
+This code is the same as in Step 0.
 
 ### Step 2: Testing Tax Agent Performance
 
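The README above says the agent only extracts individual line items as JSON, while ordinary code does the math. The sketch below shows that split. It is an assumption-based illustration:

- The JSON shape, with `wages`, `taxableInterest` and `federalIncomeTaxWithheld` fields, is hypothetical.
- The deduction amount is supplied by the caller.
- The helper names `parseLineItems`, `requireNumber` and `computeTotals` are hypothetical.
- The arithmetic is a simplified illustration, not a complete Form 1040 calculation.

```typescript
// Hypothetical line items the agent is asked to return as JSON
// (illustrative field names, not the demo's actual schema).
interface AgentLineItems {
  wages: number;
  taxableInterest: number;
  federalIncomeTaxWithheld: number;
}

// Read one numeric field, rejecting missing, non-numeric, or non-finite values.
function requireNumber(obj: Record<string, unknown>, key: string): number {
  const value = obj[key];
  if (typeof value !== 'number' || !Number.isFinite(value)) {
    throw new Error(`field "${key}" must be a finite number`);
  }
  return value;
}

// Parse and validate the agent's raw output so malformed JSON fails loudly
// instead of flowing into the PDF. JSON.parse itself throws on invalid JSON.
function parseLineItems(raw: string): AgentLineItems {
  const data: unknown = JSON.parse(raw);
  if (typeof data !== 'object' || data === null || Array.isArray(data)) {
    throw new Error('agent output must be a JSON object');
  }
  const obj = data as Record<string, unknown>;
  return {
    wages: requireNumber(obj, 'wages'),
    taxableInterest: requireNumber(obj, 'taxableInterest'),
    federalIncomeTaxWithheld: requireNumber(obj, 'federalIncomeTaxWithheld'),
  };
}

// The arithmetic happens in ordinary code, not in the LLM.
// `deduction` is supplied by the caller; this is a simplified illustration.
function computeTotals(items: AgentLineItems, deduction: number) {
  const totalIncome = items.wages + items.taxableInterest;
  const taxableIncome = Math.max(0, totalIncome - deduction);
  return {
    totalIncome,
    taxableIncome,
    withheld: items.federalIncomeTaxWithheld,
  };
}
```

Validating before computing means malformed agent output raises an error instead of silently producing a wrong PDF. Because the sums stay in code, any wrong total traces back to a specific extracted line item.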
packages/frontend/src/lib/tax-processing-service.ts

Lines changed: 2 additions & 0 deletions
@@ -78,11 +78,13 @@ export class TaxProcessingService {
     logger.log('Creating Devbox from tax blueprint...');
     this.devbox = await this.runloop.devbox.create({
       name: `tax-processing-${Date.now()}`,
+      // blueprint created by step1_runloop_setup.ts script
       blueprint_id: blueprintId,
       environment_variables: {
         CODEX_SKIP_GIT_REPO_CHECK: 'true',
         RUNLOOP_DEVBOX: '1',
       },
+      // wire in the OpenAI key from the Runloop secret store
       secrets: { OPENAI_API_KEY: 'OPENAI_API_KEY' },
     });
 
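The `devbox.create` call above sets `RUNLOOP_DEVBOX: '1'`. The agent code checks this variable before relaxing Codex's sandbox, as in the README's `this.sandboxMode` snippet. A slightly more testable variant passes the environment in as a parameter. The function name `selectSandboxMode` is hypothetical; the two mode strings are the ones the README uses.

```typescript
// The two sandbox modes used by the README's snippet.
type SandboxMode = 'danger-full-access' | 'workspace-write';

// Grant full access only inside a Runloop Devbox, which the server signals by
// setting RUNLOOP_DEVBOX=1 when it creates the Devbox. Anywhere else
// (for example a developer laptop), keep Codex limited to the workspace.
function selectSandboxMode(
  env: Record<string, string | undefined>,
): SandboxMode {
  return env['RUNLOOP_DEVBOX'] === '1' ? 'danger-full-access' : 'workspace-write';
}

// Usage inside the Devbox (requires Node's process global):
// const sandboxMode = selectSandboxMode(process.env);
```

The check compares against the exact string `'1'`. Any other value, including `'true'` or an unset variable, falls back to the more restrictive `workspace-write` mode.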