Merged
132 changes: 113 additions & 19 deletions README.md
@@ -9,7 +9,8 @@ Use agentic AI on Runloop to create a (fake) tax preparation service.
This demo illustrates the power of Agentic AI on the Runloop platform
through a mock tax preparation service. We take you through
converting a simple manual tax preparation website into an automated
tax preparation service based on a Codex-based AI agent.
tax preparation service based on a Codex-based AI agent. After setting up
a simple baseline service that invokes an agent on the Runloop platform, we extend the example to show how to use Runloop to measure and improve agent performance.

### Development Stages

@@ -20,15 +21,14 @@ prepare the customer's W2.

**Step 1: Incorporating Codex.** Add a simple tax preparation agent to
automatically convert customer tax information into formatted 1040
forms. The agent code runs safely on a Runloop Devbox.
forms. The agent code runs safely and in isolation on a Runloop Devbox.

**Step 2: Add Benchmarks.** The first version of our tax prep agent is
overly simple and makes a lot of mistakes. We can use Runloop
Benchmarks to measure how well it is doing.
overly simple and makes a lot of mistakes. The first step in improving agent performance is to measure it and generate a score. Here we introduce Runloop Benchmarks to measure how well the agent is doing.

**Step 3: Iterate on Agent Improvements.** With Benchmarks set up, we
can try out different agent implementations to see which one works the
best.
best. We can perform quick experiments to measure performance after making changes to our prompt, model, and arguments to Codex.

## Quick Start

@@ -77,6 +77,8 @@ best.

Then open your browser and go to the [demo site](http://localhost:3000/).

Setup for Step 0 is complete after this step.

3. **(Optional) Enable Weave Tracing**

To track and monitor LLM calls with Weights & Biases Weave:
@@ -105,8 +107,13 @@ best.

4. **Set up Runloop resources**

- run the setup script to add agents and a blueprint
-
Continue the demo by proceeding to Step 1: run the setup script to add agents and a blueprint. Stop the process running `pnpm dev` and then run the following command:

```bash
pnpm step1_runloop_setup
```

Then restart the local webserver with `pnpm dev`.

5. **Start the local webserver**
TODO: replace with pnpm run_demo
@@ -154,26 +161,113 @@ When you visit the [step 0](http://localhost:3000/step0) landing page, you see t
We want to use a Codex-based agent to replace the manual conversion of
the client's tax information into form 1040 values. For this step, we
will utilize a simple tax prep agent running on a Runloop devbox. To
set up the Runloop environment, run `pnpm step1_runloop_setup`. This
command will do the following:
set up the Runloop environment, run `pnpm step1_runloop_setup`. Under the hood, this
command does the following:

1. Upload the demo Agent as a Runloop object
2. Create the Agent from the uploaded object
3. Create a Runloop Blueprint with the agent mounted and required
packages installed. (This ensures that we can launch Devboxes with
this agent very quickly)
1. Uploads the demo Agent as a Runloop object
2. Creates the Agent from the uploaded object
3. Creates a Runloop Blueprint with the agent mounted and required
packages installed. Creating a blueprint ensures that we can launch Devboxes using
this agent quickly

After running the setup script, visit [step 1](http://localhost:3000/step1) to see this in action.
For this step, the interaction above has changed:
After running the setup script, restart the service with `pnpm dev` and visit [step 1](http://localhost:3000/step1) to see this in action.

- A client uses the site to upload their tax information
- The server starts a Devbox using the Blueprint for this agent.
After running the script, the user flow is as follows:

- A client wanting to file their taxes uses the site to upload their tax information
- After the client hits submit, the server starts a Devbox using the Blueprint for this agent.
- The server uploads the tax info to the Devbox and uses an exec
command to invoke the agent and produce the 1040 JSON values.
- The server takes the form 1040 values output by the agent and
prepares the 1040.pdf.

TODO: add pointers to relevant code
Note that the process of generating the 1040 values from input is now completely agent-driven and takes place on demand.

**Key Code Snippets:**
Step 1 largely reuses code from our original implementation: reading PDFs, parsing and rendering are untouched.

- **API endpoint**:
You can walk through the streaming API route that orchestrates the entire processing flow:

```84:238:packages/frontend/src/app/api/tax/process-step1-stream/route.ts
// Handles file upload, creates devbox, runs agent, generates PDF
// Returns Server-Sent Events for real-time progress updates
```
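The route reports progress as Server-Sent Events. As a minimal sketch of what one such update could look like on the wire, each event is an `event:` line, a `data:` line, and a terminating blank line. The `TaxProgressEvent` shape and the `formatSseEvent` helper are hypothetical names used for illustration; they are not taken from the demo's route.

```typescript
// Hypothetical event payload; the demo's real route may use different fields.
interface TaxProgressEvent {
  step: string;
  message: string;
}

// Encode one event in the Server-Sent Events wire format:
// an "event:" line, a "data:" line, then a blank line that ends the event.
// JSON.stringify emits no raw newlines, so a single "data:" line is enough.
function formatSseEvent(eventName: string, payload: TaxProgressEvent): string {
  const data = JSON.stringify(payload);
  return `event: ${eventName}\ndata: ${data}\n\n`;
}

const frame = formatSseEvent('progress', {
  step: 'devbox',
  message: 'Creating Devbox...',
});
```

A browser client could consume frames like this with `EventSource` or by reading the response body as a stream.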

- **Devbox creation and agent execution**:
The API endpoint creates an instance of `TaxProcessingService` then calls `processTaxReturn`. In turn, this spins up the Runloop devbox, uploads files, and executes the agent:

```70:178:packages/frontend/src/lib/tax-processing-service.ts
// Creates devbox from blueprint, uploads W-2 file and agent prompt,
// executes the agent via execAsync, and retrieves the JSON result
```

Starting a devbox with the agent and wiring in our OpenAI secret is handled here:

```ts
this.devbox = await this.runloop.devbox.create({
name: `tax-processing-${Date.now()}`,
// blueprint created by step1_runloop_setup.ts script
blueprint_id: blueprintId,
environment_variables: {
CODEX_SKIP_GIT_REPO_CHECK: 'true',
RUNLOOP_DEVBOX: '1',
},
// wire in the OpenAI key from the Runloop secret store
secrets: { OPENAI_API_KEY: 'OPENAI_API_KEY' },
});
```

After the devbox has been started, we load tax processing instructions as a prompt to the agent here:

```ts
await this.devbox.file.write({
file_path: '/home/user/agent-prompt.txt',
contents: agentPromptContent,
});
```

Then we upload the specific user's W2 tax information:

```ts
// Upload W2 file (use upload() for binary files like PDFs)
logger.log(`Uploading ${w2Filename} to devbox...`);
await this.devbox.file.upload({
path: `/home/user/input/${w2Filename}`,
file: w2File,
});
```

- **Agent execution script**:
After setting up the environment, we invoke the agent using a standalone script that runs on the devbox to process the W-2 and generate Form 1040 JSON:

```11:97:packages/tax-processing/src/bin/run-agent-turn.ts
// Runs a single agent turn using CodexService to process W-2
// and write Form 1040 JSON output to the specified file
```

Here the script uses a prompt to define the agent's role and instruct the LLM to return output conforming to well-defined JSON schemas. The prompt and the user's W2 information are used to call Codex repeatedly and stream the output. This is the core agent processing loop.

Rather than have the LLM perform calculations directly, we instead use the agent to process individual line items and return the results as JSON. This lets us use LLMs to do what they're best at while leveraging traditional code to perform the actual math and generate a PDF.
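As a rough sketch of this split, the code below validates the agent's JSON and then does the arithmetic in ordinary code. The `W2LineItems` shape, the helper names, and the deduction value are all hypothetical placeholders; the demo's real schema and calculations may differ.

```typescript
// Hypothetical shape for the agent's JSON output; the demo's real schema may differ.
interface W2LineItems {
  wages: number; // W-2 box 1
  federalTaxWithheld: number; // W-2 box 2
}

// Validate the agent's JSON before any arithmetic touches it.
function parseLineItems(raw: string): W2LineItems {
  const parsed: unknown = JSON.parse(raw);
  if (typeof parsed !== 'object' || parsed === null) {
    throw new Error('agent output must be a JSON object');
  }
  const { wages, federalTaxWithheld } = parsed as Record<string, unknown>;
  if (typeof wages !== 'number' || typeof federalTaxWithheld !== 'number') {
    throw new Error('agent output is missing numeric line items');
  }
  return { wages, federalTaxWithheld };
}

// Plain, deterministic arithmetic. `deduction` is a caller-supplied
// placeholder for illustration, not a real tax rule.
function taxableIncome(items: W2LineItems, deduction: number): number {
  return Math.max(0, items.wages - deduction);
}

const items = parseLineItems('{"wages": 50000, "federalTaxWithheld": 6000}');
const taxable = taxableIncome(items, 10000);
```

Keeping the math outside the model means the same extracted values always produce the same result, and a malformed agent response fails loudly at the parsing step.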

Importantly, because Runloop provides a secure, isolated environment, Codex can run
with broad permissions. This removes the burden of deciding which commands are safe to run in the execution environment:

```ts
// RUNLOOP_DEVBOX is set as env var during devbox startup
this.sandboxMode =
process.env.RUNLOOP_DEVBOX === '1'
? 'danger-full-access'
: 'workspace-write';
```
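One possible way to make this choice easy to unit test is to pass the environment in as a parameter instead of reading `process.env` directly. The `pickSandboxMode` helper below is a hypothetical refactoring for illustration, not code from the demo:

```typescript
type SandboxMode = 'danger-full-access' | 'workspace-write';

// Hypothetical helper: same decision as the check above, but the environment
// is an argument so it can be exercised without touching process.env.
function pickSandboxMode(env: Record<string, string | undefined>): SandboxMode {
  return env['RUNLOOP_DEVBOX'] === '1' ? 'danger-full-access' : 'workspace-write';
}

// Inside a devbox the flag is set, so the agent gets full access.
const onDevbox = pickSandboxMode({ RUNLOOP_DEVBOX: '1' });
// Anywhere else the flag is absent, so the agent stays in the restricted mode.
const onLaptop = pickSandboxMode({});
```

The restricted mode stays the default, so the agent only receives broad permissions when the devbox flag is explicitly present.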

- **PDF generation**:
After processing the input, the final step is to generate the 1040 PDF form:
```29:50:packages/frontend/src/lib/pdf-generator.ts
// Loads IRS Form 1040 template, fills form fields with agent output,
// and saves the completed PDF to the output directory
```
This code is the same as in Step 0.

### Step 2: Testing Tax Agent Performance

2 changes: 2 additions & 0 deletions packages/frontend/src/lib/tax-processing-service.ts
@@ -78,11 +78,13 @@ export class TaxProcessingService {
logger.log('Creating Devbox from tax blueprint...');
this.devbox = await this.runloop.devbox.create({
name: `tax-processing-${Date.now()}`,
// blueprint created by step1_runloop_setup.ts script
blueprint_id: blueprintId,
environment_variables: {
CODEX_SKIP_GIT_REPO_CHECK: 'true',
RUNLOOP_DEVBOX: '1',
},
// wire in the OpenAI key from the Runloop secret store
secrets: { OPENAI_API_KEY: 'OPENAI_API_KEY' },
});
