| 1 | +# Voice Agent with Real-time Web Form Filling |
| 2 | + |
| 3 | +This project demonstrates an advanced voice agent that conducts phone questionnaires while automatically filling out web forms in real time using Stagehand browser automation. |
| 4 | + |
| 5 | +The Architecture section below outlines how the pieces fit together. |
| 6 | + |
| 7 | + |
| 8 | + |
| 9 | +## Features |
| 10 | + |
| 11 | +- **Voice Conversations**: Natural voice interactions using Cartesia Line |
| 12 | +- **Real-time Form Filling**: Automatically fills web forms as answers are collected |
| 13 | +- **Browser Automation**: Uses Stagehand AI to interact with any web form |
| 14 | +- **Intelligent Mapping**: AI-powered mapping of voice answers to form fields |
| 15 | +- **Async Processing**: Non-blocking form filling maintains conversation flow - form fields are filled in background tasks without delaying voice responses |
| 16 | +- **Auto-submission**: Submits forms automatically when complete |
| 17 | + |
| 18 | +## Architecture |
| 19 | + |
| 20 | +``` |
| 21 | +Voice Call (Cartesia) → Form Filling Node → Records Answer |
| 22 | + ↓ |
| 23 | + Stagehand Browser API |
| 24 | + ↓ |
| 25 | + Fills Web Form Field |
| 26 | + ↓ |
| 27 | + Continues Conversation |
| 28 | + ↓ |
| 29 | + Submits Form on Completion |
| 30 | +``` |
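The non-blocking pattern in the diagram above can be sketched with plain `asyncio`. This is a minimal illustration rather than the project's actual code: `fill_field` and `run_questionnaire` are hypothetical names, and the sleeps stand in for the browser call and for the agent speaking the next question.

```python
import asyncio


async def fill_field(field: str, value: str, filled: dict) -> None:
    """Hypothetical stand-in for the browser call that fills one form field."""
    await asyncio.sleep(0.05)  # simulate slow browser automation
    filled[field] = value


async def run_questionnaire(answers: dict) -> dict:
    """Record each answer, fill the form in the background, then return the result."""
    filled: dict = {}
    pending: set = set()
    for field, value in answers.items():
        # Schedule the fill without awaiting it, so the conversation is not delayed.
        task = asyncio.create_task(fill_field(field, value, filled))
        pending.add(task)  # keep a reference so the task is not garbage-collected
        task.add_done_callback(pending.discard)
        await asyncio.sleep(0.01)  # stand-in for the agent asking the next question
    # Before "submitting", wait for every outstanding fill to finish.
    await asyncio.gather(*pending)
    return filled


if __name__ == "__main__":
    print(asyncio.run(run_questionnaire({"role": "Customer Service", "name": "Ada"})))
```

Awaiting the outstanding tasks only at the end keeps each question independent of how long the previous field took to fill, while still ensuring nothing is submitted half-filled.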
| 31 | + |
| 32 | +## Getting Started |
| 33 | + |
| 34 | +First things first, here is what you will need: |
| 35 | +- A [Cartesia](https://play.cartesia.ai/agents) account and API key |
| 36 | +- A [Gemini API Key](https://aistudio.google.com/apikey) |
| 37 | +- A [Browserbase API Key and Project ID](https://www.browserbase.com/overview) |
| 38 | + |
| 39 | +Make sure to add the API keys in your `.env` file or to the API keys section in your Cartesia account. |
| 40 | + |
| 41 | +- Required packages: |
| 42 | + ```text |
| 43 | + cartesia-line |
| 44 | + stagehand>=0.5.4 |
| 45 | + google-genai>=1.26.0 |
| 46 | + python-dotenv>=1.0.0 |
| 47 | + PyYAML>=6.0.0 |
| 48 | + loguru>=0.7.0 |
| 49 | + aiohttp>=3.12.0 |
| 50 | + pydantic>=2.0.0 |
| 51 | + ``` |
| 52 | + |
| 53 | +## Setup |
| 54 | + |
| 55 | +1. Install dependencies: |
| 56 | +```bash |
| 57 | +pip install -r requirements.txt |
| 58 | +``` |
| 59 | + |
| 60 | +2. Set up environment variables - create a `.env` file: |
| 61 | +```bash |
| 62 | +GEMINI_API_KEY=your_gemini_api_key_here |
| 63 | +BROWSERBASE_API_KEY=your_browserbase_api_key_here |
| 64 | +BROWSERBASE_PROJECT_ID=your_browserbase_project_id_here |
| 65 | +``` |
| 66 | + |
| 67 | +3. Run the agent: |
| 68 | +```bash |
| 69 | +python main.py |
| 70 | +``` |
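Before running the agent, it can help to confirm that the variables from step 2 are actually set. The check below is a small sketch using only the standard library; `REQUIRED_VARS` and `missing_env_vars` are hypothetical names, and it assumes the `.env` file has already been loaded into the process environment (for example with `load_dotenv()` from `python-dotenv`).

```python
import os

# Variable names taken from the .env example in step 2.
REQUIRED_VARS = ("GEMINI_API_KEY", "BROWSERBASE_API_KEY", "BROWSERBASE_PROJECT_ID")


def missing_env_vars(env=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        raise SystemExit("Missing environment variables: " + ", ".join(missing))
    print("All required environment variables are set.")
```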
| 71 | + |
| 72 | +## Project Structure |
| 73 | + |
| 74 | +### `main.py` |
| 75 | +Entry point for the voice agent. Handles call initialization with the `VoiceAgentApp` class and orchestrates the conversation flow with form filling integration. |
| 76 | + |
| 77 | +### `form_filling_node.py` |
| 78 | +A `ReasoningNode` subclass customized for voice-optimized form filling. Integrates Stagehand browser automation and manages async form filling during the conversation without blocking the voice flow. Provides status updates and error handling. |
| 79 | + |
| 80 | +### `stagehand_form_filler.py` |
| 81 | +Browser automation manager that handles all web interactions. Opens and controls web forms, maps conversation data to form fields using AI, transforms voice answers to form-compatible formats, and handles form submission. Supports different field types (text, select, checkbox, etc.). |
| 82 | + |
| 83 | +### `config.py` |
| 84 | +System configuration file, including system prompts, model IDs, and temperature. |
| 85 | + |
| 86 | +### `config.toml` |
| 87 | +Contains your Cartesia Line agent ID. |
| 88 | + |
| 89 | +## Configuration |
| 90 | + |
| 91 | +The system can be configured through multiple files: |
| 92 | + |
| 93 | +- **`config.py`**: System prompts, model IDs (Gemini model selection), hyperparameters, and boolean flags for features |
| 94 | +- **`config.toml`** / **YAML files**: Questionnaire structure and question flow |
| 95 | +- **`cartesia.toml`**: Deployment configuration for Cartesia platform (installs dependencies and runs the script) |
| 96 | +- **Variables**: |
| 97 | + - `FORM_URL`: Target web form to fill |
| 98 | + |
| 99 | +## Example Flow |
| 100 | + |
| 101 | +1. User calls the voice agent |
| 102 | +2. Agent asks: "What type of voice agent are you building?" |
| 103 | +3. User responds: "A customer service agent" |
| 104 | +4. System: |
| 105 | + - Records the answer |
| 106 | + - Opens browser to form (if not already open) |
| 107 | + - Fills "Customer Service" in the role selection field |
| 108 | + - Takes screenshot for debugging |
| 109 | +5. Agent asks the next question |
| 110 | +6. The process continues until all questions are answered |
| 111 | +7. Form is automatically submitted |
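Step 4 turns a free-form spoken answer ("A customer service agent") into a value the form accepts ("Customer Service"). The project does this mapping with an AI model; the sketch below swaps in a much simpler word-overlap heuristic purely to illustrate the idea. `match_option` and the option list are hypothetical and are not part of the repository.

```python
import re
from typing import List, Optional


def match_option(answer: str, options: List[str]) -> Optional[str]:
    """Pick the option sharing the most words with the answer, or None if none overlap."""
    answer_words = set(re.findall(r"[a-z0-9]+", answer.lower()))
    best, best_score = None, 0
    for option in options:
        option_words = set(re.findall(r"[a-z0-9]+", option.lower()))
        score = len(answer_words & option_words)  # number of shared words
        if score > best_score:
            best, best_score = option, score
    return best


if __name__ == "__main__":
    roles = ["Sales", "Customer Service", "Technical Support"]
    print(match_option("A customer service agent", roles))
```

Returning `None` when nothing overlaps gives the caller a clear signal to ask a follow-up question instead of filling in a guess.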
| 112 | + |
| 113 | +## Advanced Features |
| 114 | + |
| 115 | +- **Background Processing**: Form filling happens asynchronously using background tasks - conversation remains smooth and responsive |
| 116 | +- **Error Recovery**: Continues conversation even if form filling fails |
| 117 | +- **Progress Tracking**: Monitor form completion status |
| 118 | +- **Screenshot Debugging**: Captures screenshots after each field |
| 119 | +- **Flexible Mapping**: AI interprets answers for different field types |
| 120 | + |
| 121 | +## Deploying the Agent |
| 122 | + |
| 123 | +The `cartesia.toml` file defines how your agent will be installed and run when deployed on the Cartesia platform. This file tells the platform to install dependencies from `requirements.txt` and execute `main.py`. |
| 124 | + |
| 125 | +You can clone this repository and add it to your [agents dashboard](https://play.cartesia.ai/agents) along with your API keys (set them in the API keys section of the Cartesia platform). |
| 126 | + |
| 127 | +For detailed deployment instructions, see [how to deploy an agent from the Cartesia Docs](https://docs.cartesia.ai/line/start-building/talk-to-your-first-agent). |
| 128 | + |
| 129 | +## Testing |
| 130 | + |
| 131 | +Test with different scenarios: |
| 132 | +- Complete questionnaire flow |
| 133 | +- Interruptions and corrections |
| 134 | +- Various answer formats |
| 135 | +- Multi-page forms |
| 136 | +- Form validation errors |
| 137 | + |
| 138 | +## Production Considerations |
| 139 | + |
| 140 | +- Configure proper error logging |
| 141 | +- Add retry logic for form submission |
| 142 | +- Implement form validation checks |
| 143 | +- Consider rate limiting for API calls |
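As one way to approach the retry item above, the sketch below wraps an async submission call in a retry loop with exponential backoff. It is an assumption-laden illustration: `submit_with_retry` is a hypothetical helper, and `submit` stands for whatever coroutine function performs the actual form submission.

```python
import asyncio


async def submit_with_retry(submit, attempts: int = 3, base_delay: float = 0.5):
    """Call the async `submit` function, retrying with exponential backoff on failure."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return await submit()
        except Exception as exc:  # in real code, catch only the errors worth retrying
            last_error = exc
            if attempt < attempts - 1:
                # Wait base_delay, then 2x, then 4x, ... before the next attempt.
                await asyncio.sleep(base_delay * 2 ** attempt)
    raise last_error
```

Re-raising the last error after the final attempt lets the caller log the failure and keep the conversation going, in line with the error recovery behavior described earlier.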