Use agentic AI on Runloop to create a (fake) tax preparation service.
Full project docs live in docs/README.md.
This demo illustrates the power of agentic AI on the Runloop platform through a mock tax preparation service. We walk you through converting a simple manual tax preparation website into an automated tax preparation service built on a Codex-based AI agent. After setting up a simple baseline service that invokes the agent on the Runloop platform, we extend the example to show how to use Runloop to measure and improve agent performance.
Step 0: Manual Tax Website. We start with an existing tax preparation website that accepts W2s and lets the customer answer a few questions. A human tax preparer would then take this information and prepare the customer's tax return.
Step 1: Incorporating Codex. Add a simple tax preparation agent to automatically convert customer tax information into formatted 1040 forms. The agent code runs safely and in isolation on a Runloop Devbox.
Step 2: Add Benchmarks. The first version of our tax prep agent is overly simple and makes a lot of mistakes. Improving agent performance starts with measuring it and generating a score, so here we introduce Runloop Benchmarks to measure how well the agent is doing.
Step 3: Iterate on Agent Improvements. With Benchmarks set up, we can try out different agent implementations to see which one works best. We can run quick experiments to measure performance after changing the prompt, the model, or the arguments passed to Codex.
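Steps 2 and 3 hinge on turning agent output into a number. As a rough illustration only (this is not the Runloop Benchmarks API, and not the project's own scorer), the TypeScript sketch below grades two Form 1040 fields against reference values derived from the customer's W2s, then averages the per-case scores. The interfaces and field names are hypothetical simplifications; lines 1a (total of W-2 box 1 wages) and 25a (federal income tax withheld from W-2s) follow recent Form 1040 revisions.

```typescript
// Hypothetical shapes for illustration; not the project's actual schema.
interface W2 {
  box1Wages: number; // W-2 box 1: wages, tips, other compensation
  box2FederalWithheld: number; // W-2 box 2: federal income tax withheld
}

interface Form1040Fields {
  line1a: number; // total amount from Form(s) W-2, box 1
  line25a: number; // federal income tax withheld from Form(s) W-2
}

interface BenchmarkCase {
  w2s: W2[]; // inputs the customer uploaded
  agentOutput: Form1040Fields; // what the agent filled in
}

// Reference answer a grader can derive directly from the W2 inputs.
function expected1040(w2s: W2[]): Form1040Fields {
  return {
    line1a: w2s.reduce((sum, w) => sum + w.box1Wages, 0),
    line25a: w2s.reduce((sum, w) => sum + w.box2FederalWithheld, 0),
  };
}

// Fraction of fields within `tolerance` dollars of the reference (allows rounding).
function scoreFields(actual: Form1040Fields, expected: Form1040Fields, tolerance = 0.5): number {
  const keys = Object.keys(expected) as (keyof Form1040Fields)[];
  const correct = keys.filter((k) => Math.abs(actual[k] - expected[k]) <= tolerance).length;
  return correct / keys.length;
}

// Mean per-case score across a benchmark; 0 for an empty benchmark.
function benchmarkScore(cases: BenchmarkCase[]): number {
  if (cases.length === 0) return 0;
  const total = cases.reduce((sum, c) => sum + scoreFields(c.agentOutput, expected1040(c.w2s)), 0);
  return total / cases.length;
}

const sampleCases: BenchmarkCase[] = [
  {
    w2s: [{ box1Wages: 52000, box2FederalWithheld: 4800 }],
    agentOutput: { line1a: 52000, line25a: 4800 },
  },
  {
    // Two W2s, but the agent only used the first one.
    w2s: [
      { box1Wages: 30000, box2FederalWithheld: 2500 },
      { box1Wages: 12000, box2FederalWithheld: 900 },
    ],
    agentOutput: { line1a: 30000, line25a: 2500 },
  },
];

console.log(benchmarkScore(sampleCases)); // 0.5
```

Running the same cases against each prompt, model, or Codex-argument variant and comparing the resulting scores is the kind of quick experiment Step 3 describes; a real benchmark would cover many more fields and edge cases.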
Prerequisites:
- Node.js 24+
- pnpm 8+
- OpenAI API key (if required by Codex SDK)
- Runloop account and API key (as described in the Runloop Quickstart)
- (Optional) Weights & Biases account for LLM tracing with Weave
1. Install dependencies
pnpm install
2. Configure your environment
Set up your API keys in your local environment and configure a .env file.
First, visit the https://platform.runloop.ai/settings page on Runloop.
a. Create a Runloop API key on the https://platform.runloop.ai/settings page.
b. Generate an OpenAI API key in the OpenAI console: https://platform.openai.com/settings/organization/api-keys
c. On the https://platform.runloop.ai/settings page, create a Secret for the OpenAI key you just generated. Name the secret OPENAI_API_KEY and paste in the key value from the OpenAI site.
d. (Optional) Create a Secret for your W&B API key to enable Weave LLM tracing. Get your W&B API key from https://wandb.ai/authorize, then create a secret named WANDB_API_KEY on the https://platform.runloop.ai/settings page. This enables Weave tracing on Runloop devboxes. Note: the pnpm step1_runloop_setup script will also prompt you for this.
e. Now configure your environment:
export RUNLOOP_API_KEY=<your_runloop_api_key_here>
cp packages/tax-processing/.env.example packages/tax-processing/.env
Open the .env file and update the values where prompted.
f. Launch the environment to test the setup:
pnpm dev
Then open your browser and go to the demo site: http://localhost:3000/
Setup for Step 0 (baseline site with manual processing) is complete after this step.
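Before running pnpm dev, it can help to confirm that the .env file no longer contains placeholder values. The TypeScript sketch below is a hypothetical helper, not part of the project: it parses simple KEY=VALUE lines (no multi-line values or variable expansion, unlike full dotenv parsers) and reports required keys that are missing, empty, or still look like placeholders such as <your_runloop_api_key_here> or your-wandb-api-key-here. The key names in the example call are illustrative; the authoritative list is whatever packages/tax-processing/.env.example contains.

```typescript
// Hypothetical helper: a minimal KEY=VALUE parser, not a full dotenv implementation.
function parseDotenv(text: string): Map<string, string> {
  const vars = new Map<string, string>();
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "" || line.startsWith("#")) continue; // skip blanks and comments
    const eq = line.indexOf("=");
    if (eq <= 0) continue; // not a KEY=VALUE line
    const key = line.slice(0, eq).trim();
    let value = line.slice(eq + 1).trim();
    // Strip one pair of matching surrounding quotes.
    if (value.length >= 2 && (value[0] === '"' || value[0] === "'") && value.endsWith(value[0])) {
      value = value.slice(1, -1);
    }
    vars.set(key, value);
  }
  return vars;
}

// Heuristic: empty, <angle-bracketed>, or "your-..."/"your_..." values are treated as unfilled.
function looksLikePlaceholder(value: string): boolean {
  return value === "" || /^<.*>$/.test(value) || /^your[-_]/i.test(value);
}

// Returns the required keys that are absent or still hold a placeholder.
function findUnsetKeys(text: string, required: string[]): string[] {
  const vars = parseDotenv(text);
  return required.filter((key) => {
    const value = vars.get(key);
    return value === undefined || looksLikePlaceholder(value);
  });
}

// Illustrative .env contents; key names are examples only.
const exampleEnv = [
  "# hypothetical contents",
  "OPENAI_API_KEY=sk-example",
  "WANDB_API_KEY=your-wandb-api-key-here",
].join("\n");

// Lists WANDB_API_KEY (placeholder) and OTHER_KEY (missing).
console.log(findUnsetKeys(exampleEnv, ["OPENAI_API_KEY", "WANDB_API_KEY", "OTHER_KEY"]));
```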
3. (Optional) Enable Weave Tracing
The tax agent integrates with Weights & Biases Weave for comprehensive LLM call tracing and monitoring.
For Local Development:
- Get your W&B API key from https://wandb.ai/authorize
- Add the API key to packages/tax-processing/.env:
  WANDB_API_KEY=your-wandb-api-key-here
- When the agent server starts locally, you'll see one of the following:
  - Success: [CodexService] Weave tracing initialized successfully
  - Disabled: [CodexService] Weave tracing disabled: WANDB_API_KEY not set
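The two log lines above reflect a simple rule: tracing starts only when WANDB_API_KEY is set. A minimal TypeScript sketch of that gate follows, assuming the project checks the environment before initializing Weave. The real Weave initialization call is deliberately not reproduced; `initTracing` is a hypothetical injected callback standing in for it, and the project name comes from the Viewing Traces section below.

```typescript
// Hypothetical stand-in for the real Weave initialization call.
type TracingInit = (projectName: string) => Promise<void>;

// Project name as listed under "Viewing Traces".
const WEAVE_PROJECT = "tax-preparation-agent";

// Initializes tracing only when WANDB_API_KEY is present and returns the log line to print.
// Errors thrown by initTracing propagate to the caller.
async function maybeInitWeave(
  env: Record<string, string | undefined>,
  initTracing: TracingInit,
): Promise<string> {
  if (!env.WANDB_API_KEY) {
    return "[CodexService] Weave tracing disabled: WANDB_API_KEY not set";
  }
  await initTracing(WEAVE_PROJECT);
  return "[CodexService] Weave tracing initialized successfully";
}

// Example: no key in this environment object, so the callback is never invoked.
maybeInitWeave({}, async () => {}).then((message) => console.log(message));
```

Injecting the initializer keeps the gate testable without network access; in a server, the environment argument would typically be process.env.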
For Runloop Devboxes:
- The WANDB_API_KEY must be configured as a Runloop secret (see step 2d above).
- The pnpm step1_runloop_setup script will prompt you to add this secret.
- Weave initializes automatically when the agent runs on a devbox, provided the secret is present.
Viewing Traces:
- Navigate to https://wandb.ai/
- Select the project: tax-preparation-agent
- Explore traces, latency distributions, and token usage
Weave automatically captures:
- All OpenAI API calls made through the Codex SDK
- Complete input prompts and output responses
- Token usage, latency, and cost metrics
- Error traces and debugging information
- Agent tool usage (taxctl commands)
Note: Weave is completely optional. The system works without it.
4. Set up Runloop resources
Continue the demo by proceeding to Step 1: run the setup script to add agents and a blueprint. Stop the running pnpm dev process, then run the following command:
pnpm step1_runloop_setup
Then restart the local web server with:
pnpm dev
Now check out the demo dashboard again for Step 1 to see it in action.
License: MIT