@@ -9,7 +9,8 @@ Use agentic AI on Runloop to create a (fake) tax preparation service.
9
9
This demo illustrates the power of Agentic AI on the Runloop platform
10
10
through a mock tax preparation service. We take you through
11
11
converting a simple manual tax preparation website into an automated
12
-
tax preparation service based on a Codex-based AI agent.
12
+
tax preparation service based on a Codex-based AI agent. After setting up
13
+
a simple baseline service that invokes an agent on the Runloop platform, we extend the example to show how to use Runloop to measure and improve agent performance.
13
14
14
15
### Development Stages
15
16
@@ -20,15 +21,14 @@ prepare the customer's W2.
20
21
21
22
**Step 1: Incorporating Codex.** Add a simple tax preparation agent to
22
23
automatically convert customer tax information into formatted 1040
23
-
forms. The agent code runs safely on a Runloop Devbox.
24
+
forms. The agent code runs safely and in isolation on a Runloop Devbox.
24
25
25
26
**Step 2: Add Benchmarks.** The first version of our tax prep agent is
26
-
overly simple and makes a lot of mistakes. We can use Runloop
27
-
Benchmarks to measure how well it is doing.
27
+
overly simple and makes many mistakes. The first step in improving agent performance is to measure it and generate a score, so here we introduce Runloop Benchmarks to measure how well the agent is doing.
28
28
29
29
**Step 3: Iterate on Agent Improvements.** With Benchmarks set up, we
30
30
can try out different agent implementations to see which one works the
31
-
best.
31
+
best. We can perform quick experiments to measure performance after making changes to our prompt, model, and arguments to Codex.
32
32
33
33
## Quick Start
34
34
@@ -77,6 +77,8 @@ best.
77
77
78
78
Then open your browser and go to the [demo site](http://localhost:3000/).
79
79
80
+
Setup for Step 0 is complete after this step.
81
+
80
82
3. **(Optional) Enable Weave Tracing**
81
83
82
84
To track and monitor LLM calls with Weights & Biases Weave:
@@ -105,8 +107,13 @@ best.
105
107
106
108
4. **Set up Runloop resources**
107
109
108
-
- run the setup script to add agents and a blueprint
109
-
-
110
+
Continue the demo by proceeding to Step 1: run the setup script to add agents and a blueprint. Stop the process running `pnpm dev` and then run the following command:
111
+
112
+
```bash
113
+
pnpm step1_runloop_setup
114
+
```
115
+
116
+
Then restart the local webserver with `pnpm dev`.
110
117
111
118
5. **Start the local webserver**
112
119
TODO: replace with pnpm run_demo
@@ -154,26 +161,113 @@ When you visit the [step 0](http://localhost:3000/step0) landing page, you see t
154
161
We want to use a Codex-based agent to replace the manual conversion of
155
162
the client's tax information into form 1040 values. For this step, we
156
163
will utilize a simple tax prep agent running on a Runloop devbox. To
157
-
set up the Runloop environment, run `pnpm step1_runloop_setup`. This
158
-
command will do the following:
164
+
set up the Runloop environment, run `pnpm step1_runloop_setup`. Under the hood, this
165
+
command does the following:
159
166
160
-
1.Upload the demo Agent as a Runloop object
161
-
2.Create the Agent from the uploaded object
162
-
3.Create a Runloop Blueprint with the agent mounted and required
163
-
packages installed. (This ensures that we can launch Devboxes with
164
-
this agent very quickly)
167
+
1. Uploads the demo Agent as a Runloop object
168
+
2. Creates the Agent from the uploaded object
169
+
3. Creates a Runloop Blueprint with the agent mounted and required
170
+
packages installed. Creating a Blueprint ensures that we can launch Devboxes using
171
+
this agent quickly.
165
172
166
-
After running the setup script, visit [step 1](http://localhost:3000/step1) to see this in action.
167
-
For this step, the interaction above has changed:
173
+
After running the setup script, restart the service with `pnpm dev` and visit [step 1](http://localhost:3000/step1) to see this in action.
168
174
169
-
- A client uses the site to upload their tax information
170
-
- The server starts a Devbox using the Blueprint for this agent.
175
+
After running the script, the user flow is as follows:
176
+
177
+
- A client wanting to file their taxes uses the site to upload their tax information.
178
+
- After the client hits submit, the server starts a Devbox using the Blueprint for this agent.
171
179
- The server uploads the tax info to the Devbox and uses an exec
172
180
command to invoke the agent and produce the 1040 JSON values.
173
181
- The server takes the form 1040 values output by the agent and
174
182
prepares the 1040.pdf.
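
The flow above can be sketched roughly as follows. This is a minimal illustration, not the demo's actual code: the `DevboxClient` interface, its method names, the file paths, and the `agentCommand` parameter are all hypothetical stand-ins for whatever Runloop client calls the server really makes.

```ts
// Hypothetical interface: a stand-in for the Runloop client used by the server.
interface DevboxClient {
  createFromBlueprint(blueprintName: string): Promise<{ id: string }>;
  writeFile(devboxId: string, path: string, contents: string): Promise<void>;
  exec(devboxId: string, command: string): Promise<{ exitCode: number; stdout: string }>;
  readFile(devboxId: string, path: string): Promise<string>;
  shutdown(devboxId: string): Promise<void>;
}

// Hypothetical file locations inside the Devbox.
const INPUT_PATH = '/tmp/tax_info.json';
const OUTPUT_PATH = '/tmp/form1040.json';

// Start a Devbox, upload the tax info, run the agent, and return the 1040 values.
async function runTaxAgent(
  client: DevboxClient,
  blueprintName: string,
  taxInfo: string,
  agentCommand: string, // assumed: the shell command that invokes the agent
): Promise<Record<string, unknown>> {
  const devbox = await client.createFromBlueprint(blueprintName);
  try {
    await client.writeFile(devbox.id, INPUT_PATH, taxInfo);
    const result = await client.exec(devbox.id, agentCommand);
    if (result.exitCode !== 0) {
      throw new Error(`agent exited with code ${result.exitCode}`);
    }
    const raw = await client.readFile(devbox.id, OUTPUT_PATH);
    return JSON.parse(raw) as Record<string, unknown>;
  } finally {
    // Always release the Devbox, even when the agent fails.
    await client.shutdown(devbox.id);
  }
}
```

Keeping the client behind an interface makes the flow easy to exercise with an in-memory fake before pointing it at a real Devbox.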
175
183
176
-
TODO: add pointers to relevant code
184
+
Note that the process of generating the 1040 values from input is now completely agent-driven and takes place on demand.
185
+
186
+
**Key Code Snippets:**
187
+
Step 1 largely reuses code from our original implementation: reading PDFs, parsing and rendering are untouched.
188
+
189
+
- **API endpoint**:
190
+
You can walk through the streaming API route that orchestrates the entire processing flow:
// Handles file upload, creates devbox, runs agent, generates PDF
194
+
// Returns Server-Sent Events for real-time progress updates
195
+
```
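
For readers unfamiliar with Server-Sent Events, each progress update is sent as a small text frame. The helper below is a minimal sketch of that framing, not the route's actual code; the function name and the `progress` event name in the usage note are assumptions.

```ts
// Serialize one progress update as a Server-Sent Events frame.
// An SSE frame is a set of "field: value" lines terminated by a blank line.
function formatSseEvent(event: string, data: unknown): string {
  // JSON.stringify does not emit raw newlines, so one data: line is enough.
  return `event: ${event}\ndata: ${JSON.stringify(data)}\n\n`;
}
```

A route could write `formatSseEvent('progress', { step: 'devbox_created' })` to the response stream each time a stage completes.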
196
+
197
+
- **Devbox creation and agent execution**:
198
+
The API endpoint creates an instance of `TaxService` then calls `processTaxReturn`. In turn, this spins up the Runloop devbox, uploads files, and executes the agent:
// Runs a single agent turn using CodexService to process W-2
246
+
// and write Form 1040 JSON output to the specified file
247
+
```
248
+
249
+
Here the script uses a prompt to define the role and instruct the LLM to return output conforming to well-defined JSON schemas. The prompt and the W2 information from the user are used to repeatedly call Codex and stream the output. This is the core agent processing loop.
250
+
251
+
Rather than have the LLM perform calculations directly, we instead use the agent to process individual line items and return the results as JSON. This lets us use LLMs to do what they're best at while leveraging traditional code to perform the actual math and generate a PDF.
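
As an illustration of this split, the sketch below validates the agent's JSON and then does the arithmetic in ordinary code. The `AgentLineItems` shape, the field names, and the caller-supplied `deduction` are hypothetical; the demo's real schema and tax logic may differ.

```ts
// Hypothetical shape of the agent's JSON output.
interface AgentLineItems {
  wages: number;
  federalTaxWithheld: number;
}

// Validate the agent's raw output before trusting any of its values.
function parseAgentOutput(raw: string): AgentLineItems {
  const parsed: unknown = JSON.parse(raw);
  if (typeof parsed !== 'object' || parsed === null) {
    throw new Error('agent output is not a JSON object');
  }
  const { wages, federalTaxWithheld } = parsed as Record<string, unknown>;
  if (typeof wages !== 'number' || !Number.isFinite(wages)) {
    throw new Error('wages must be a finite number');
  }
  if (typeof federalTaxWithheld !== 'number' || !Number.isFinite(federalTaxWithheld)) {
    throw new Error('federalTaxWithheld must be a finite number');
  }
  return { wages, federalTaxWithheld };
}

// Deterministic math in plain code. The deduction is a placeholder supplied
// by the caller, not a real tax figure.
function taxableIncome(items: AgentLineItems, deduction: number): number {
  return Math.max(0, items.wages - deduction);
}
```

Because the arithmetic never passes through the model, a malformed or missing value fails loudly at the parsing step instead of producing a plausible but wrong number on the form.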
252
+
253
+
Importantly, since Runloop provides a secure, isolated environment, Codex is allowed to run
254
+
with broad permissions, which removes the burden of deciding which commands are safe to run in the execution environment:
255
+
256
+
```ts
257
+
// RUNLOOP_DEVBOX is set as env var during devbox startup
258
+
this.sandboxMode =
259
+
  process.env.RUNLOOP_DEVBOX === '1'
260
+
    ? 'danger-full-access'
261
+
    : 'workspace-write';
262
+
```
263
+
264
+
- **PDF generation**:
265
+
After processing the input, the final step is to generate the 1040 PDF form: