Commit d38e36b

[integration]: first version of cartesia line form fill integration (#25)
* first version of cartesia line form fill integration
* remove agent toml
* make browser fill actions async
* in-prog async browser creation and form fill
* adding cartesia deploy to docs + toml file
* update integration code to be simplified
* update readme for line + config
* update to meet google code quality standards
* ruff + cartesia line formatting
* remove headless options, readme nits, adding image
1 parent 25753b7 commit d38e36b

11 files changed

+1080
-0
lines changed

Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
1+
# Gemini API Key for language model (default)
2+
GEMINI_API_KEY=your_gemini_api_key_here
3+
4+
# Browserbase API key and Project ID
5+
BROWSERBASE_API_KEY=your_browserbase_api_key_here
6+
BROWSERBASE_PROJECT_ID=your_browserbase_project_id_here
7+
8+
# Optional: Model configuration
9+
# MODEL_NAME=google/gemini-2.0-flash-exp
10+
# MODEL_API_KEY=your_model_api_key_here
Lines changed: 41 additions & 0 deletions
@@ -0,0 +1,41 @@
1+
# Byte-compiled / optimized / DLL files
2+
__pycache__/
3+
*.py[cod]
4+
*.pyd
5+
.Python
6+
7+
# Virtual environments
8+
.env
9+
.venv/
10+
venv/
11+
env/
12+
13+
virtualenv/
14+
15+
# Conda environments
16+
conda-env/
17+
envs/
18+
.conda/
19+
conda-meta/
20+
21+
# uv environments
22+
uv.lock
23+
.python-version
24+
25+
# Python package managers
26+
poetry.lock
27+
Pipfile.lock
28+
pip-log.txt
29+
30+
# pyenv
31+
.pyenv/
32+
33+
# Distribution / packaging
34+
*.egg-info/
35+
dist/
36+
build/
37+
38+
# Editor / OS files
39+
.DS_Store
40+
41+
.cartesia/
Lines changed: 143 additions & 0 deletions
@@ -0,0 +1,143 @@
1+
# Voice Agent with Real-time Web Form Filling
2+
3+
This project demonstrates a voice agent that conducts phone questionnaires while automatically filling out web forms in real time using Stagehand browser automation.
4+
5+
Here's what the system architecture looks like:
6+
7+
![Workflow](workflow_diagram.png)
8+
9+
## Features
10+
11+
- **Voice Conversations**: Natural voice interactions using Cartesia Line
12+
- **Real-time Form Filling**: Automatically fills web forms as answers are collected
13+
- **Browser Automation**: Uses Stagehand AI to interact with any web form
14+
- **Intelligent Mapping**: AI-powered mapping of voice answers to form fields
15+
- **Async Processing**: Non-blocking form filling maintains conversation flow - form fields are filled in background tasks without delaying voice responses
16+
- **Auto-submission**: Submits forms automatically when complete
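
The non-blocking behaviour listed above can be sketched with plain `asyncio`. This is a minimal illustration, not the code in `form_filling_node.py`: `fill_field`, `record_answer`, and `demo_background` are hypothetical names, and the sleep stands in for a real browser action.

```python
import asyncio


# Hypothetical stand-in for a slow browser action (not the project's real API).
async def fill_field(form: dict, name: str, value: str) -> None:
    await asyncio.sleep(0.05)  # simulate browser latency
    form[name] = value


def record_answer(form: dict, pending: set, name: str, value: str) -> None:
    """Schedule the fill as a background task and return immediately.

    Must be called while an event loop is running.
    """
    task = asyncio.create_task(fill_field(form, name, value))
    pending.add(task)  # keep a strong reference so the task is not garbage collected
    task.add_done_callback(pending.discard)


async def demo_background() -> dict:
    form: dict = {}
    pending: set = set()
    record_answer(form, pending, "role", "Customer Service")
    # The conversation could continue here; the field has not been filled yet.
    filled_immediately = "role" in form
    await asyncio.gather(*pending)  # before submitting, wait for outstanding fills
    return {"filled_immediately": filled_immediately, "form": form}
```

Running `asyncio.run(demo_background())` shows that the answer is recorded without waiting for the browser, and that the form is complete once the pending tasks have been awaited.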
17+
18+
## Architecture
19+
20+
```
21+
Voice Call (Cartesia) → Form Filling Node → Records Answer
22+
23+
Stagehand Browser API
24+
25+
Fills Web Form Field
26+
27+
Continues Conversation
28+
29+
Submits Form on Completion
30+
```
31+
32+
## Getting Started
33+
34+
First things first, here is what you will need:
35+
- A [Cartesia](https://play.cartesia.ai/agents) account and API key
36+
- A [Gemini API Key](https://aistudio.google.com/apikey)
37+
- A [Browserbase API Key and Project ID](https://www.browserbase.com/overview)
38+
39+
Add the API keys to your `.env` file or to the API keys section of your Cartesia account.
40+
41+
- Required packages:
42+
```text
43+
cartesia-line
44+
stagehand>=0.5.4
45+
google-genai>=1.26.0
46+
python-dotenv>=1.0.0
47+
PyYAML>=6.0.0
48+
loguru>=0.7.0
49+
aiohttp>=3.12.0
50+
pydantic>=2.0.0
51+
```
52+
53+
## Setup
54+
55+
1. Install dependencies:
56+
```bash
57+
pip install -r requirements.txt
58+
```
59+
60+
2. Set up environment variables - create a `.env` file:
61+
```bash
62+
GEMINI_API_KEY=your_gemini_api_key_here
63+
BROWSERBASE_API_KEY=your_browserbase_api_key_here
64+
BROWSERBASE_PROJECT_ID=your_browserbase_project_id_here
65+
```
66+
67+
3. Run the agent:
68+
```bash
69+
python main.py
70+
```
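
Before starting the agent, it can help to check that the variables from step 2 are actually set. The sketch below uses only the standard library and assumes the `.env` file has already been loaded into the process environment (for example with python-dotenv's `load_dotenv()`); `missing_env_vars` is a hypothetical helper, not part of this repository.

```python
import os

# Variable names taken from the .env example above.
REQUIRED_VARS = ("GEMINI_API_KEY", "BROWSERBASE_API_KEY", "BROWSERBASE_PROJECT_ID")


def missing_env_vars(env=None) -> list:
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

A startup script could call `missing_env_vars()` and exit with a clear message if the returned list is not empty, rather than failing later in the middle of a call.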
71+
72+
## Project Structure
73+
74+
### `main.py`
75+
Entry point for the voice agent. Handles call initialization with `VoiceAgentApp` class and orchestrates the conversation flow with form filling integration.
76+
77+
### `form_filling_node.py`
78+
ReasoningNode subclass customized for voice-optimized form filling. Integrates Stagehand browser automation and manages async form filling during conversation without blocking the voice flow. Provides status updates and error handling.
79+
80+
### `stagehand_form_filler.py`
81+
Browser automation manager that handles all web interactions. Opens and controls web forms, maps conversation data to form fields using AI, transforms voice answers to form-compatible formats, and handles form submission. Supports different field types (text, select, checkbox, etc.).
82+
83+
### `config.py`
84+
System configuration file containing the system prompt, default model ID, and temperature.
85+
86+
### `config.toml`
87+
Stores your Cartesia Line agent ID.
88+
89+
## Configuration
90+
91+
The system can be configured through multiple files:
92+
93+
- **`config.py`**: System prompts, model IDs (Gemini model selection), hyperparameters, and boolean flags for features
94+
- **`config.toml`** / **YAML files**: Cartesia Line agent ID (`config.toml`); questionnaire structure and question flow (YAML files)
95+
- **`cartesia.toml`**: Deployment configuration for the Cartesia platform (installs dependencies and runs the script)
96+
- **Variables**:
97+
- `FORM_URL`: Target web form to fill
98+
99+
## Example Flow
100+
101+
1. User calls the voice agent
102+
2. Agent asks: "What type of voice agent are you building?"
103+
3. User responds: "A customer service agent"
104+
4. System:
105+
- Records the answer
106+
- Opens browser to form (if not already open)
107+
- Fills "Customer Service" in the role selection field
108+
- Takes screenshot for debugging
109+
5. Agent asks next question
110+
6. The process continues until all questions are answered
111+
7. Form is automatically submitted
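
Step 4 above turns a spoken answer ("A customer service agent") into a form option ("Customer Service"). The project does this mapping with an AI model; the sketch below deliberately swaps in simple string similarity from the standard library to make the idea concrete. `match_option` and the option list are hypothetical.

```python
import difflib
from typing import Optional


def match_option(answer: str, options: list, cutoff: float = 0.6) -> Optional[str]:
    """Return the option that most closely resembles the spoken answer, or None."""
    by_lower = {opt.lower(): opt for opt in options}  # compare case-insensitively
    hits = difflib.get_close_matches(
        answer.lower().strip(), list(by_lower), n=1, cutoff=cutoff
    )
    return by_lower[hits[0]] if hits else None
```

When no option is similar enough, the function returns `None`, which is the point where an agent would ask the caller for clarification instead of guessing.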
112+
113+
## Advanced Features
114+
115+
- **Background Processing**: Form filling happens asynchronously using background tasks - conversation remains smooth and responsive
116+
- **Error Recovery**: Continues conversation even if form filling fails
117+
- **Progress Tracking**: Monitor form completion status
118+
- **Screenshot Debugging**: Captures screenshots after each field
119+
- **Flexible Mapping**: AI interprets answers for different field types
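
Error recovery and progress tracking can be combined by wrapping each fill so that a failure is recorded rather than raised. The following is a sketch under assumptions: `safe_fill`, `flaky_fill`, and the status strings are hypothetical and do not reflect the project's actual status format.

```python
import asyncio


async def safe_fill(fill, name: str, value: str, status: dict) -> None:
    """Run one fill coroutine and record the outcome instead of raising."""
    try:
        await fill(name, value)
        status[name] = "filled"
    except Exception as exc:  # a failed field must not end the call
        status[name] = f"failed: {exc}"


async def demo_recovery() -> dict:
    # Hypothetical filler that fails on one field.
    async def flaky_fill(name: str, value: str) -> None:
        if name == "email":
            raise RuntimeError("field not found")

    status: dict = {}
    await asyncio.gather(
        safe_fill(flaky_fill, "role", "Customer Service", status),
        safe_fill(flaky_fill, "email", "a@example.com", status),
    )
    return status
```

The `status` dictionary doubles as a progress record: it shows which fields were filled and which need another attempt, while the conversation carries on either way.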
120+
121+
## Deploying the Agent
122+
123+
The `cartesia.toml` file defines how your agent will be installed and run when deployed on the Cartesia platform. This file tells the platform to install dependencies from `requirements.txt` and execute `main.py`.
124+
125+
You can clone this repository and add it to your [agents dashboard](https://play.cartesia.ai/agents) along with your API Keys (set them in the Cartesia Platform's API keys section).
126+
127+
For detailed deployment instructions, see [how to deploy an agent from the Cartesia Docs](https://docs.cartesia.ai/line/start-building/talk-to-your-first-agent).
128+
129+
## Testing
130+
131+
Test with different scenarios:
132+
- Complete questionnaire flow
133+
- Interruptions and corrections
134+
- Various answer formats
135+
- Multi-page forms
136+
- Form validation errors
137+
138+
## Production Considerations
139+
140+
- Configure proper error logging
141+
- Add retry logic for form submission
142+
- Implement form validation checks
143+
- Consider rate limiting for API calls
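
As one possible shape for the retry logic mentioned above, the sketch below retries an async submission with exponential backoff. `submit_with_retry` is a hypothetical helper; the `submit` argument stands for whatever coroutine function performs the real submission.

```python
import asyncio


async def submit_with_retry(submit, attempts: int = 3, base_delay: float = 0.5):
    """Call `submit` until it succeeds, doubling the delay after each failure."""
    for attempt in range(attempts):
        try:
            return await submit()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            await asyncio.sleep(base_delay * 2 ** attempt)
```

In production it would be safer to catch only the exception types that indicate a transient failure, so that genuine bugs are not retried.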
Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
1+
[app]
2+
name = "form-filling"
3+
4+
[build]
5+
cmd = "pip install -r requirements.txt"
6+
7+
[run]
8+
cmd = "python main.py"
Lines changed: 35 additions & 0 deletions
@@ -0,0 +1,35 @@
1+
"""Configuration settings for the voice agent.
2+
3+
This module contains system prompts, model configurations, and
4+
hyperparameters for the Cartesia voice agent with form filling.
5+
"""
6+
7+
import os
8+
9+
DEFAULT_MODEL_ID = os.getenv("MODEL_ID", "gemini-2.5-flash")
10+
11+
DEFAULT_TEMPERATURE = 0.7
12+
SYSTEM_PROMPT = """
13+
### You and your role
14+
You are a friendly assistant conducting a questionnaire.
15+
Be professional but conversational. Confirm answers when appropriate.
16+
If a user's answer is unclear, ask for clarification.
17+
For sensitive information, be especially tactful and professional.
18+
19+
IMPORTANT: When you receive a clear answer from the user, use the
20+
record_answer tool to record their response.
21+
22+
### Your tone
23+
When having a conversation, you should:
24+
- Always be polite and respectful, even when users are challenging
25+
- Be concise and brief but never curt. Keep your responses to 1-2
26+
sentences and less than 35 words
27+
- When asking a question, be sure to ask in a short and concise manner
28+
- Only ask one question at a time
29+
30+
If the user is rude, or curses, respond with exceptional politeness
31+
and genuine curiosity. You should always be polite.
32+
33+
Remember, you're on the phone, so do not use emojis or abbreviations.
34+
Spell out units and dates.
35+
"""
Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
1+
agent-id = 'your-agent-id'
