The application now supports text-to-speech functionality using the ElevenLabs API. AI responses are automatically converted to speech and sent back to the client for playback.
Create a .env file in the root directory (/news_app/.env) with your ElevenLabs API key:
# OpenAI API Configuration (existing)
OPENAI_API_KEY=your_openai_api_key_here
# ElevenLabs API Configuration (new)
ELEVENLABS_API_KEY=your_elevenlabs_api_key_here
ELEVENLABS_VOICE_ID=21m00Tcm4TlvDq8ikWAM

Get your ElevenLabs API key from: https://elevenlabs.io/app/settings/api-keys
The default voice ID (21m00Tcm4TlvDq8ikWAM) is Rachel's voice. You can:
- Use any ElevenLabs voice ID by setting ELEVENLABS_VOICE_ID
- Browse available voices at: https://elevenlabs.io/app/voice-library
- Leave it unset to use the default Rachel voice
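A startup check for these variables might look like the following sketch. The function name and config shape are illustrative, not the app's actual code; only the variable names and the default Rachel voice ID come from this guide.

```typescript
// Sketch: validating the ElevenLabs environment variables at startup.
// Pass `process.env` in a Node context.

const DEFAULT_VOICE_ID = "21m00Tcm4TlvDq8ikWAM"; // Rachel

interface ElevenLabsConfig {
  apiKey: string;
  voiceId: string;
}

function loadElevenLabsConfig(
  env: Record<string, string | undefined>
): ElevenLabsConfig {
  const apiKey = env.ELEVENLABS_API_KEY;
  if (!apiKey) {
    // Fail fast rather than discovering the missing key mid-conversation
    throw new Error("ELEVENLABS_API_KEY not found in environment variables");
  }
  return {
    apiKey,
    // Fall back to the default Rachel voice when unset
    voiceId: env.ELEVENLABS_VOICE_ID ?? DEFAULT_VOICE_ID,
  };
}
```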
# Terminal 1: Start LangGraph server
cd apps/agents && pnpm dev
# Terminal 2: Start web app
cd apps/web && pnpm dev

- User Input → Speech-to-Text (if voice) or direct to AI
- AI Processing → Simple LangChain model generates response
- TTS Processing → AI response converted to speech via ElevenLabs
- Client Playback → Audio sent to frontend for playback
__start__ → speech_to_text → ai_node → text_to_speech → __end__
- speech_to_text: Converts voice input to text using OpenAI Whisper (passes through if no audio)
- ai_node: Processes input and generates AI response using LangChain (no tools)
- text_to_speech: Converts AI response to speech using ElevenLabs
This is now a completely linear flow where every input goes through all three nodes:
- All inputs go to speech_to_text first (audio gets transcribed, text passes through)
- All inputs then go to ai_node for AI processing
- All AI responses go to text_to_speech for audio generation
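The linear flow can be pictured as three chained async functions. This is a plain-TypeScript sketch, not the actual LangGraph wiring; the node names mirror the graph above, and the node bodies are placeholders for the real Whisper, LangChain, and ElevenLabs calls.

```typescript
// Simplified state carried through the three-node pipeline
interface PipelineState {
  audioInput?: Uint8Array; // optional voice input
  text?: string;           // transcribed or typed user input
  aiResponse?: string;     // produced by ai_node
  ttsAudio?: Uint8Array;   // produced by text_to_speech
}

// speech_to_text: passes through unchanged if there is no audio
async function speechToText(state: PipelineState): Promise<PipelineState> {
  if (!state.audioInput) return state;
  return { ...state, text: "<transcript>" }; // placeholder for Whisper
}

// ai_node: placeholder for the LangChain model call
async function aiNode(state: PipelineState): Promise<PipelineState> {
  return { ...state, aiResponse: `Echo: ${state.text}` };
}

// text_to_speech: placeholder for the ElevenLabs call
async function textToSpeech(state: PipelineState): Promise<PipelineState> {
  if (!state.aiResponse) return state;
  return { ...state, ttsAudio: new TextEncoder().encode(state.aiResponse) };
}

// Every input runs through all three nodes, in order
async function runPipeline(state: PipelineState): Promise<PipelineState> {
  return textToSpeech(await aiNode(await speechToText(state)));
}
```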
- All AI responses are automatically converted to speech
- Local file saving - TTS outputs are saved to apps/agents/tts-outputs/ for testing
- Error handling - TTS failures don't break the conversation
- Base64 encoding for efficient transmission
- MP3 format for broad compatibility
- TTS audio files are automatically saved to apps/agents/tts-outputs/
- Files are named with timestamps: tts-output-2024-01-15T10-30-45-123Z.mp3
- You can play these files directly to test TTS quality
- Files are saved even if frontend playback fails
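The timestamped naming scheme above can be sketched like this. The function names and `tts-outputs` directory constant are illustrative; only the file-name pattern comes from this guide.

```typescript
import { mkdirSync, writeFileSync } from "node:fs";
import { join } from "node:path";

const OUTPUT_DIR = "tts-outputs"; // mirrors apps/agents/tts-outputs/

// Builds a name like tts-output-2024-01-15T10-30-45-123Z.mp3
function ttsOutputPath(dir: string, when: Date = new Date()): string {
  const stamp = when.toISOString().replace(/[:.]/g, "-");
  return join(dir, `tts-output-${stamp}.mp3`);
}

// Writes the audio buffer to disk, creating the directory if needed
function saveTtsOutput(audio: Uint8Array, dir: string = OUTPUT_DIR): string {
  mkdirSync(dir, { recursive: true });
  const file = ttsOutputPath(dir);
  writeFileSync(file, audio);
  return file;
}
```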
- Stability: 0.5 (balance between consistency and expressiveness)
- Similarity Boost: 0.5 (voice similarity to original)
- Style: 0.0 (neutral style)
- Speaker Boost: true (enhanced clarity)
- eleven_monolingual_v1 - Optimized for English, good quality/speed balance
- Can be changed in speech-processor.ts if needed
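Putting the voice settings and model together, the request to ElevenLabs might look like the sketch below. The endpoint path, `xi-api-key` header, and snake_case `voice_settings` fields follow the ElevenLabs REST API, but verify them against the current API docs; the helper itself is illustrative, not the app's actual code.

```typescript
interface VoiceSettings {
  stability: number;        // 0-1, higher = more consistent
  similarity_boost: number; // 0-1, higher = more similar to original
  style: number;            // 0-1, higher = more expressive
  use_speaker_boost: boolean;
}

// Builds the fetch arguments for an ElevenLabs text-to-speech call
function buildTtsRequest(text: string, voiceId: string, apiKey: string) {
  const voice_settings: VoiceSettings = {
    stability: 0.5,
    similarity_boost: 0.5,
    style: 0.0,
    use_speaker_boost: true,
  };
  return {
    url: `https://api.elevenlabs.io/v1/text-to-speech/${voiceId}`,
    init: {
      method: "POST",
      headers: {
        "xi-api-key": apiKey,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({
        text,
        model_id: "eleven_monolingual_v1",
        voice_settings,
      }),
    },
  };
}
```

The response body is the MP3 audio itself, which the agent buffers, saves locally, and base64-encodes for the client.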
The console will show:
🔑 ElevenLabs API key found, length: 32
If you see:
🔑 ELEVENLABS_API_KEY not found in environment variables
Then your .env file isn't being loaded correctly.
Successful processing logs:
🔊 TTS Debug - Messages in state: {
totalMessages: 2,
lastMessageType: "AIMessage",
lastMessageContent: "string",
lastMessagePreview: "Hello! How can I help you today?..."
}
🔊 Converting text to speech: {
textLength: 45,
textPreview: "Hello! How can I help you today?..."
}
🎵 Audio generated successfully
✅ Audio buffer created, size: 15234 bytes
💾 TTS audio saved locally: /path/to/tts-outputs/tts-output-2024-01-15T10-30-45-123Z.mp3
🔊 TTS processing completed successfully
🚨 Problem Signs:
🔊 No AI message to convert to speech - lastMessage: {
exists: true,
type: "HumanMessage",
isAIMessage: false
}
This means the AI node isn't generating AI messages properly.
If you see "No AI message to convert to speech", check:
- AI node is working: Look for successful AI responses in logs
- Message types: The debug logs show what type of message is being passed
- Message flow: Ensure the linear flow is working: speech_to_text → ai_node → text_to_speech
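The check the text_to_speech node performs can be sketched as a small guard. The message shape here is simplified for illustration; the real node inspects LangChain message instances.

```typescript
// Simplified message shape; the real app uses LangChain message classes
interface ChatMessage {
  type: "HumanMessage" | "AIMessage";
  content: string;
}

// Returns the text to synthesize, or null when the last message
// is not an AI message (the "No AI message to convert" case above)
function getTextToSpeak(messages: ChatMessage[]): string | null {
  const last = messages[messages.length - 1];
  if (!last || last.type !== "AIMessage") return null;
  return last.content;
}
```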
The frontend should receive TTS output in the state:
{
ttsOutput: {
audioData: "base64_encoded_audio_data",
mimeType: "audio/mpeg",
size: 15234
}
}

Cause: Missing or incorrect API key
Solutions:
- Check .env file exists in root directory
- Verify API key is correct (32 characters)
- Restart the server after adding the key
- Check API key permissions on ElevenLabs dashboard
Cause: Invalid voice ID
Solutions:
- Use default voice by removing ELEVENLABS_VOICE_ID from .env
- Check voice ID on ElevenLabs voice library
- Verify voice access (some voices require subscription)
Cause: API quota exceeded or network issues
Solutions:
- Check ElevenLabs quota on your dashboard
- Verify internet connection
- Check console logs for detailed error messages
- ✅ Environment variables loaded correctly
- ✅ ElevenLabs API key valid
- ✅ AI response triggers TTS processing
- ✅ Audio buffer generated successfully
- ✅ Base64 encoding completed
- ✅ TTS output received in state
- ✅ Audio playback initiated
- ✅ User hears AI response
- ✅ Error handling for TTS failures
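On the client side, decoding the `ttsOutput` payload shown earlier might look like this sketch. The function name is illustrative; `atob` is available in browsers and in Node 16+.

```typescript
// Shape of the ttsOutput field the frontend receives in the state
interface TtsOutput {
  audioData: string; // base64-encoded MP3
  mimeType: string;  // "audio/mpeg"
  size: number;      // decoded byte length
}

// Decodes the base64 payload and sanity-checks it against the size field
function decodeTtsAudio(tts: TtsOutput): Uint8Array {
  const binary = atob(tts.audioData);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) bytes[i] = binary.charCodeAt(i);
  if (bytes.length !== tts.size) {
    throw new Error(`decoded ${bytes.length} bytes, expected ${tts.size}`);
  }
  return bytes;
}
```

In the browser, the decoded bytes can then be wrapped in a `Blob` of the given `mimeType` and played via `URL.createObjectURL` and an `Audio` element.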
Edit speech-processor.ts to customize:
voiceSettings: {
stability: 0.7, // 0-1, higher = more consistent
similarityBoost: 0.8, // 0-1, higher = more similar to original
style: 0.2, // 0-1, higher = more expressive
useSpeakerBoost: true // Enhanced clarity
}

Available models:
- eleven_monolingual_v1 - English only, fast
- eleven_multilingual_v2 - Multiple languages, slower
- eleven_flash_v2_5 - Ultra-fast, lower quality
Popular voice IDs:
- 21m00Tcm4TlvDq8ikWAM - Rachel (default)
- AZnzlk1XvdvUeBnXmlld - Domi
- EXAVITQu4vr4xnSDxMaL - Bella
- ErXwobaYiN019PkySvjV - Antoni
- ✅ Chrome (desktop & mobile)
- ✅ Firefox (desktop & mobile)
- ✅ Safari (desktop & mobile)
- ✅ Edge (desktop)
- 🔊 Audio playback support
- 🌐 Base64 decoding support
- 📱 Modern browser (ES6+)
When everything is working correctly, you should see:
- 🔊 Console logs showing successful TTS processing
- 🎵 Audio playback of AI responses
- 📊 No TTS errors in console
- 🤖 Seamless voice conversation experience
If you encounter any issues not covered here, check the console logs for detailed error messages!