AI-powered research presentation generator that creates professional PowerPoint decks with factual content and proper citations. Combines Brave Search API, LlamaIndex RAG, and python-pptx to generate high-quality, research-backed presentations.
- Research-Driven Content: Uses Brave Search API to gather current, authoritative sources
- RAG-Enhanced Generation: LlamaIndex vector indexing for grounded, factual content
- Professional Quality: Complete thought validation, dynamic formatting, and citation tracking
- Template Support: Works with custom PowerPoint templates for branding consistency
- Verbose Progress Tracking: Real-time updates during the multi-minute generation process
- Phoenix Observability: Full LLM tracing with persistent storage for performance analysis and debugging
# Clone the repository
git clone https://github.com/jc7k/llm_pptx_deck_builder.git
cd llm_pptx_deck_builder
# Create and activate virtual environment with uv
uv venv
uv sync
# Set up environment variables
cp .env.example .env
# Edit .env with your API keys:
# BRAVE_API_KEY=brv-************************
# OPENAI_API_KEY=sk-************************
#
# Optional Phoenix settings (included in .env.example):
# ENABLE_PHOENIX=true
# PHOENIX_HOST=127.0.0.1
# PHOENIX_PORT=6006
# Generate a presentation
uv run python deck_builder_cli.py --topic "AI impact on job market trends 2025"
# Use a custom template
uv run python deck_builder_cli.py --topic "Market Analysis" --template corporate_template.pptx
# Toggle validation mode
# Strict (default): enforces complete thoughts and specificity
uv run python deck_builder_cli.py --topic "AI in Education" --strict-validation
# Lenient: faster, suitable for drafts/testing
uv run python deck_builder_cli.py --topic "AI in Education" --lenient-validation
# Or via env var
DECK_STRICT=0 uv run python deck_builder_cli.py --topic "AI in Education"
Host a simple web UI on Netlify Starter that triggers deck builds using GitHub Actions. Users bring their own API keys (OpenAI + Brave) per request; keys are never stored or logged by the app.
- How it works: static React app + Netlify Functions (dispatch GitHub Action) + GitHub Actions (runs Python CLI) + GitHub Artifacts (PPTX download).
- Security: Keys are entered in the browser as password fields, sent once to the Netlify Function to launch the job, masked in workflow logs, and cleared from UI memory immediately after dispatch.
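As a rough illustration of the dispatch step, the sketch below builds (but does not send) the GitHub REST request that starts a `workflow_dispatch` run. It is written in Python for consistency with the rest of this README and is not the project's Netlify Function; the helper `build_dispatch_request` and the `topic` input name are hypothetical, and the token is a placeholder.

```python
import json
import urllib.request


def build_dispatch_request(owner, repo, workflow, branch, token, inputs):
    """Build the POST request for GitHub's workflow-dispatch endpoint."""
    url = (
        f"https://api.github.com/repos/{owner}/{repo}"
        f"/actions/workflows/{workflow}/dispatches"
    )
    # "ref" selects the branch to run on; "inputs" carries workflow inputs.
    body = json.dumps({"ref": branch, "inputs": inputs}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
        },
    )


request = build_dispatch_request(
    owner="jc7k",
    repo="llm_pptx_deck_builder",
    workflow="build_deck.yml",
    branch="main",
    token="<GITHUB_TOKEN>",  # placeholder, never hard-code a real token
    inputs={"topic": "AI in Education"},  # hypothetical input name
)
# urllib.request.urlopen(request) would actually trigger the workflow run.
```

The owner, repo, workflow, and branch arguments correspond to the `GITHUB_OWNER`, `GITHUB_REPO`, `GITHUB_WORKFLOW`, and `GITHUB_BRANCH` settings described under deployment.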
Getting started:
- Read the setup guide: docs/netlify_setup.md
- Architecture, plan, and flow details: docs/netlify_web_ui.md
Local dev (frontend only):
cd web && npm install && npm run dev
Deploy (Netlify):
- Add Netlify env vars: GITHUB_TOKEN, GITHUB_OWNER, GITHUB_REPO, GITHUB_WORKFLOW=build_deck.yml, GITHUB_BRANCH=main.
- Push a branch and open a PR to get a Deploy Preview. Use the preview URL to submit a topic and your API keys, then download the generated PPTX when complete.
For comprehensive LLM performance monitoring and debugging:
# Start persistent Phoenix server (recommended for development)
uv run python scripts/start_phoenix.py &
# Phoenix UI will be available at: http://localhost:6006/
# Traces persist across sessions in: ~/.phoenix_llm_pptx/
# Now run your deck builder - all API calls are traced
uv run python deck_builder_cli.py --topic "Your topic"
# Analyze traces in Phoenix dashboard even after CLI exits
open http://localhost:6006/
# Stop Phoenix server when done
uv run python scripts/stop_phoenix.py
Phoenix Features:
- Persistent Traces: All traces stored in SQLite, survive server restarts
- API Monitoring: OpenAI, Brave Search, and LangChain call tracking
- Performance Analysis: Latency, token usage, and cost tracking
- Error Debugging: Full request/response logging for failed calls
- Quality Metrics: Content validation and generation success rates
Here's what you'll see when generating a presentation:
$ python deck_builder_cli.py --topic "AI impact on job market trends 2025"
🚀 Starting presentation generation for: AI impact on job market trends 2025
This may take several minutes...
[11:01:38] 🔍 RESEARCH: Starting web search for: AI impact on job market trends 2025
[11:01:38] 🔍 RESEARCH: Applying rate limiting for Brave Search API...
[11:01:38] 🔍 RESEARCH: Enhanced search query: AI impact on job market trends 2025 statistics data trends 2025 report analysis...
[11:01:38] 🔍 RESEARCH: Querying Brave Search API...
[11:01:41] 🔍 RESEARCH: ✅ Found 15 search results
[11:01:41] 🔍 RESEARCH: Top sources: https://www.nexford.edu/insights/how-will-ai-affect-jobs, https://thehill.com/policy/technology/5460357-ai-impact-on-job-market/, https://www.gartner.com/en/articles/hype-cycle-for-artificial-intelligence, https://id-times.com/finance/2025-job-market/, https://www.stlouisfed.org/on-the-economy/2025/aug/recent-college-grads-bear-brunt-labor-market-shifts
[11:01:41] 📄 DOCUMENTS: Processing 15 search results...
[11:01:41] 📄 DOCUMENTS: Extracting URLs from search results...
[11:01:41] 📄 DOCUMENTS: Found 15 URLs to process
[11:01:41] 📄 DOCUMENTS: Starting document loading with rate limiting...
[11:01:55] 📄 DOCUMENTS: ✅ Successfully loaded 14 documents
[11:01:55] 📄 DOCUMENTS: Total content: 226,564 characters
[11:01:55] 🧠 INDEXING: Creating vector index from 14 documents...
[11:01:55] 🧠 INDEXING: Converting documents to LlamaIndex format...
[11:01:55] 🧠 INDEXING: Building vector embeddings with OpenAI...
[11:01:57] 🧠 INDEXING: ✅ Vector index created: 14 docs, 66 chunks
[11:01:57] 🧠 INDEXING: Index ID: deck_builder_1756231317
[11:01:57] 📋 OUTLINE: Generating presentation outline for: AI impact on job market trends 2025
[11:01:57] 📋 OUTLINE: Querying vector index for relevant content...
[11:01:57] 📋 OUTLINE: Applying rate limiting for OpenAI API...
[11:02:21] 📋 OUTLINE: ✅ Generated outline with 11 slides
[11:02:21] 📋 OUTLINE: Slides: Title Slide, Agenda, Introduction, Key Concepts, Current Trends...
[11:02:21] 📋 OUTLINE: Estimated duration: 15 minutes
[11:02:21] 📝 CONTENT: Generating detailed content for 11 slides...
[11:02:21] 📝 CONTENT: Querying vector index for slide-specific content...
[11:02:21] 📝 CONTENT: Applying rate limiting for OpenAI API...
[11:02:21] 📝 CONTENT: Processing slides with RAG-based content generation...
Creating content allocation plan to eliminate repetition...
✅ Generated slide: Introduction → AI Market Growth Projections to 2025
✅ Generated slide: Key Concepts → Surge in AI Adoption Amid Labor Shortages
✅ Generated slide: Current Trends → Global AI Adoption in Healthcare and Workplaces
✅ Generated slide: Applications → AI's Economic Impact: Jobs and Revenue Growth
✅ Generated slide: Challenges → Global Workforce Impact of AI Distrust
✅ Generated slide: Future Outlook → AI's Economic Impact and Job Growth by 2030
✅ Generated slide: Conclusions → AI Market Growth and Workforce Concerns
✅ Generated slide: Next Steps → Building Trust and Skills for AI Success
Successfully generated 8 unique slides with no content repetition
[11:03:48] 📝 CONTENT: ✅ Generated content for 8 slides
[11:03:48] 📝 CONTENT: Collecting and processing citations...
[11:03:48] 📝 CONTENT: Slide 1: 3 citations
[11:03:48] 📝 CONTENT: Slide 2: 3 citations
[11:03:48] 📝 CONTENT: Slide 3: 3 citations
[11:03:48] 📝 CONTENT: Slide 4: 3 citations
[11:03:48] 📝 CONTENT: Slide 5: 3 citations
[11:03:48] 📝 CONTENT: Slide 6: 3 citations
[11:03:48] 📝 CONTENT: Slide 7: 3 citations
[11:03:48] 📝 CONTENT: Slide 8: 3 citations
[11:03:48] 📝 CONTENT: Total citations before deduplication: 24
[11:03:48] 📝 CONTENT: Deduplicating citations...
[11:03:48] 📝 CONTENT: ✅ Final unique citations: 3
[11:03:48] 📝 CONTENT: Generated 24 total bullet points across all slides
[11:03:48] 🎨 PRESENTATION: Creating PowerPoint file with 8 slides...
[11:03:48] 🎨 PRESENTATION: Using default PowerPoint template
[11:03:48] 🎨 PRESENTATION: Including 3 references
[11:03:48] 🎨 PRESENTATION: Initializing python-pptx presentation...
[11:03:48] 🎨 PRESENTATION: Rendering title slide...
[11:03:48] 🎨 PRESENTATION: Processing content slides...
Running final presentation validation...
❌ Presentation validation failed:
- Similar content detected: 'Global AI market projected between $244 billion and $757.6 billion by 2025' and 'AI market projected to reach $190 billion by 2025'
- Similar content detected: 'AI to add over $15 trillion to global revenue by 2030' and 'AI expected to contribute over $15 trillion to global revenue by 2030'
- Similar content detected: 'AI market projected to reach $2.4 trillion by 2032' and 'AI market projected to reach $190 billion by 2025'
Proceeding with presentation creation despite validation warnings...
[11:03:48] 🎨 PRESENTATION: ✅ Presentation saved to: output/AI Market Growth Projections to 2025_20250826_110348.pptx
[11:03:48] 🎨 PRESENTATION: File size: 0.0 MB
[11:03:48] 🎨 PRESENTATION: 🎉 Presentation generation complete!
✅ Presentation generated successfully!
📄 Output file: output/AI Market Growth Projections to 2025_20250826_110348.pptx
🎯 Slides created: 8
📚 References included: 3
The system implements a 6-node LangGraph workflow with proper state management:
START → research → load_docs → create_index → generate_outline → generate_content → create_presentation → END
- Research Node (research_node)
  - Uses Brave Search API for current information
  - Returns structured search results with URLs, titles, snippets
  - Error handling with empty results fallback
- Document Loading Node (document_loading_node)
  - LangChain WebBaseLoader for content extraction
  - BeautifulSoup4 + html2text for clean text parsing
  - Rate limiting and respectful crawling
- Indexing Node (indexing_node)
  - LlamaIndex VectorStoreIndex creation
  - OpenAI embeddings with configurable chunk size
  - Metadata tracking for query engine optimization
- Outline Generation Node (outline_generation_node)
  - RAG-based outline generation using vector store
  - Two-phase prompting: research → structured JSON
  - Fallback outline for JSON parsing failures
- Content Generation Node (content_generation_node)
  - Per-slide content generation with RAG queries
  - Citation tracking with inline reference markers
  - Structured JSON output with bullets and speaker notes
- Presentation Creation Node (presentation_creation_node)
  - Python-PPTX rendering with template support
  - Professional slide layouts and formatting
  - Automatic references slide generation
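The node sequence above can be pictured as a chain of functions that each read the shared state and return only the keys they update. The sketch below uses plain Python rather than LangGraph so that it runs without dependencies; the stub node bodies and the `run_pipeline` helper are hypothetical stand-ins, not the project's implementation.

```python
from typing import Callable, Dict, List

State = Dict[str, object]
Node = Callable[[State], State]


def research_node(state: State) -> State:
    # Stub: the real node queries Brave Search.
    return {"search_results": [{"url": "https://example.com", "title": "Example"}]}


def document_loading_node(state: State) -> State:
    # Stub: the real node fetches and cleans each URL.
    docs = [{"url": r["url"], "text": "..."} for r in state["search_results"]]
    return {"documents": docs}


def run_pipeline(nodes: List[Node], state: State) -> State:
    """Run nodes in order, merging each node's partial update into the state."""
    for node in nodes:
        state = {**state, **node(state), "status": node.__name__}
    return state


final_state = run_pipeline(
    [research_node, document_loading_node],
    {"user_request": "AI in Education"},
)
```

Each later node (indexing, outline, content, presentation) would slot into the list in the same way, reading what earlier nodes wrote.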
from typing import Dict, List, Optional, TypedDict
from langchain_core.messages import AnyMessage
class DeckBuilderState(TypedDict):
    messages: List[AnyMessage]
    user_request: str
    search_results: List[Dict]
    documents: List[Dict]
    vector_index: Optional[Dict]
    outline: Optional[Dict]
    slide_specs: List[Dict]
    references: List[str]
    template_path: Optional[str]
    output_path: str
    status: str
All tools use the @tool decorator pattern:
@tool
def search_web(query: str, count: int = 10) -> List[Dict]:
    """Search web using Brave Search API for current information."""
    # Implementation with error handling, rate limiting
Core Tools:
- search_web: Brave Search API integration
- load_web_documents: LangChain WebBaseLoader
- create_vector_index: LlamaIndex vector store creation
- generate_outline: RAG-based outline generation
- generate_slide_content: RAG-based slide content
- create_presentation: Python-PPTX rendering
Interested in contributing? Please read the contribution guide at CONTRIBUTING.md.
- Brave Search API: Web search for current information with rate limiting
- LangChain WebBaseLoader: Page fetching with SSL handling and retry logic
- LlamaIndex Vector Store: RAG-based content grounding and retrieval
- Content Quality System: Validates complete thoughts, prevents repetition
- Dynamic Formatting: Auto-adjusts font sizes and layouts for readability
- Citation Management: Tracks sources with inline references and bibliography
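Citation deduplication, as in the sample run where 24 per-slide citations collapse to 3 unique references, can be as simple as an order-preserving pass over the source URLs. This is a minimal sketch, assuming citations are plain URL strings and that surrounding whitespace and a trailing slash are the only variations worth normalizing; `dedupe_citations` is a hypothetical helper, not the project's function.

```python
from typing import Iterable, List


def dedupe_citations(urls: Iterable[str]) -> List[str]:
    """Return URLs in first-seen order, dropping repeats."""
    seen = set()
    unique = []
    for url in urls:
        key = url.strip().rstrip("/")  # assumed normalization, not the project's rule
        if key and key not in seen:
            seen.add(key)
            unique.append(key)
    return unique


per_slide = [
    ["https://a.example/report", "https://b.example/data/"],
    ["https://b.example/data", "https://a.example/report"],
]
references = dedupe_citations(url for slide in per_slide for url in slide)
```

Keeping first-seen order means reference numbers stay stable as later slides reuse earlier sources.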
# Required API Keys
BRAVE_API_KEY=brv-************************ # Brave Search API
OPENAI_API_KEY=sk-************************ # OpenAI GPT models
# Optional Settings
OPENAI_MODEL=gpt-4o-mini # Default: gpt-4o-mini
USER_AGENT=llm-pptx-deck-builder/1.0 # For web scraping
# Optional: LangSmith Tracing
LANGCHAIN_TRACING_V2=true
LANGCHAIN_API_KEY=ls__************************
# Model Configuration
EMBEDDING_MODEL=text-embedding-3-small
# Validation Mode (optional)
# Set to 1/true to enforce strict validation globally; 0/false for lenient
DECK_STRICT=1
# src/settings.py - Default values
max_search_results: int = 15 # Maximum search results to process
max_documents: int = 20 # Maximum documents to load
chunk_size: int = 1000 # Text chunk size for indexing
chunk_overlap: int = 200 # Text chunk overlap
similarity_top_k: int = 10 # Top K results for similarity search
default_output_dir: str = "output" # Default output directory
strict_validation: bool = False # The CLI sets this to True by default unless DECK_STRICT=0
- The test suite uses lightweight stubs under tests/_stubs so it runs without network access or heavy installs.
- To run with full features locally, install the extras:
uv pip install -e .[full]
This pulls in langgraph, langchain-community, langchain-openai, llama-index, python-pptx, and pydantic-ai.
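To make the `chunk_size` and `chunk_overlap` defaults from the settings above concrete, the sketch below splits text into fixed-size windows, each starting `chunk_size - chunk_overlap` units after the previous one. It counts characters for simplicity, whereas LlamaIndex's splitters typically measure tokens and respect sentence boundaries, so treat `chunk_text` as a hypothetical illustration only.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> List[str]:
    """Split text into overlapping character windows."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


chunks = chunk_text("x" * 2500)
```

With the defaults, consecutive chunks share 200 characters, which helps a retrieved chunk keep some of the context that preceded it.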
Production-ready rate limiting is implemented for all APIs:
- Brave Search: 0.5 req/sec, 30 req/min, 1800 req/hour
- OpenAI API: 0.33 req/sec, 20 req/min, 1000 req/hour
- Web Scraping: 2 req/sec, 120 req/min, 7200 req/hour
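One way to enforce a per-second limit such as the 0.5 req/sec Brave Search figure above is to keep a minimum interval between calls. The sketch below is a hypothetical `MinIntervalLimiter`, not the contents of `src/rate_limiter.py`; it covers only the per-second limit (not the per-minute or per-hour caps) and takes injectable clock and sleep functions so it can be exercised without waiting.

```python
import time
from typing import Callable, Optional


class MinIntervalLimiter:
    """Block so that consecutive calls are at least 1/rate seconds apart."""

    def __init__(
        self,
        requests_per_second: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.min_interval = 1.0 / requests_per_second
        self._clock = clock
        self._sleep = sleep
        self._last_call: Optional[float] = None  # no call made yet

    def wait(self) -> float:
        """Sleep if needed and return the number of seconds waited."""
        now = self._clock()
        delay = 0.0
        if self._last_call is not None:
            delay = max(0.0, self.min_interval - (now - self._last_call))
        if delay > 0:
            self._sleep(delay)
        self._last_call = now + delay
        return delay


# Demonstration with a fake clock so nothing actually sleeps.
fake_now = [100.0]
slept = []


def fake_sleep(seconds: float) -> None:
    slept.append(seconds)
    fake_now[0] += seconds


limiter = MinIntervalLimiter(0.5, clock=lambda: fake_now[0], sleep=fake_sleep)
first = limiter.wait()   # no previous call, so no delay
second = limiter.wait()  # immediately after: waits the full 2-second interval
```

In real use, `limiter.wait()` would be called right before each outgoing request with the default clock and sleep.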
- Ensures bullet points express complete thoughts (not necessarily complete sentences)
- Detects incomplete endings with prepositions, conjunctions, and articles
- Validates word count (4-15 words for optimal readability)
- Prevents markdown formatting artifacts
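The bullet checks above can be approximated with a few string rules. The following is a minimal sketch, not the project's validator: the word lists are a small illustrative subset, the markdown pattern covers only a few common artifacts, and `validate_bullet` is a hypothetical name.

```python
import re
from typing import List

# Illustrative subsets only; a real validator would use fuller lists.
DANGLING_WORDS = {
    "a", "an", "the",                                   # articles
    "and", "or", "but",                                 # conjunctions
    "of", "in", "on", "to", "for", "with", "by", "at",  # prepositions
}
MARKDOWN_ARTIFACT = re.compile(r"\*\*|__|`|^#+\s|^[-*]\s")


def validate_bullet(text: str, min_words: int = 4, max_words: int = 15) -> List[str]:
    """Return a list of problems; an empty list means the bullet passes."""
    problems = []
    words = text.split()
    if not min_words <= len(words) <= max_words:
        problems.append(f"word count {len(words)} outside {min_words}-{max_words}")
    if words and words[-1].strip(".,;:!?").lower() in DANGLING_WORDS:
        problems.append(f"ends with dangling word '{words[-1]}'")
    if MARKDOWN_ARTIFACT.search(text):
        problems.append("contains markdown formatting")
    return problems


ok = validate_bullet("AI adoption is reshaping entry-level hiring across sectors")
bad = validate_bullet("**Growth** driven by the")
```

Returning a list of problems rather than a boolean makes it easy to feed the reasons back into a regeneration prompt.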
- Content allocation planning prevents duplicate insights across slides
- Semantic similarity detection with automated retry mechanisms
- Specialized prompts for different slide types (Introduction, Applications, etc.)
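For intuition about the similarity check, the sketch below flags near-duplicate bullets using word-overlap (Jaccard) similarity. This is a deliberately simpler lexical stand-in for the semantic similarity detection the project describes; the 0.6 threshold and the function names are assumptions chosen for illustration.

```python
from itertools import combinations
from typing import List, Tuple


def jaccard(a: str, b: str) -> float:
    """Word-overlap similarity between two strings, from 0.0 to 1.0."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    if not words_a or not words_b:
        return 0.0
    return len(words_a & words_b) / len(words_a | words_b)


def find_similar(bullets: List[str], threshold: float = 0.6) -> List[Tuple[str, str]]:
    """Return every pair of bullets whose overlap meets the threshold."""
    return [
        (a, b) for a, b in combinations(bullets, 2) if jaccard(a, b) >= threshold
    ]


bullets = [
    "AI to add over $15 trillion to global revenue by 2030",
    "AI expected to contribute over $15 trillion to global revenue by 2030",
    "Remote work adoption slows in 2025",
]
pairs = find_similar(bullets)
```

The first two bullets, taken from the sample run's validation warnings, share 9 of 12 distinct words and are flagged; the third shares none with either.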
- Automatic title-content alignment for semantic harmony
- Replaces generic titles with specific, insight-driven alternatives
- Length management prevents title line wraps (8-word maximum)
- Quote removal and formatting cleanup
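The quote removal and 8-word limit on titles might look like the following. This is only a sketch: truncating at the word limit is an assumption (the project may instead regenerate an over-long title), and `clean_title` is a hypothetical helper.

```python
MAX_TITLE_WORDS = 8  # limit stated above


def clean_title(title: str, max_words: int = MAX_TITLE_WORDS) -> str:
    """Strip surrounding quotes and extra whitespace, then cap the word count."""
    stripped = title.strip().strip("\"'“”‘’")
    words = stripped.split()
    return " ".join(words[:max_words])


title = clean_title(
    '"AI Market Growth and Workforce Concerns Across Global Industries in 2025"'
)
```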
The project uses comprehensive testing with FakeLLM for development:
@pytest.fixture
def fake_llm():
    responses = [
        '{"topic": "AI in Education", "objective": "Overview...", "slide_titles": [...]}',
        '{"title": "Introduction", "bullets": [...], "speaker_notes": "..."}'
    ]
    return FakeLLM(responses=responses)
@patch('src.tools.requests.get')
def test_search_web_success(self, mock_get, mock_api_keys):
    mock_response = Mock()
    mock_response.json.return_value = {"web": {"results": [...]}}
    mock_get.return_value = mock_response
    # Test implementation
# Run all tests
uv run pytest
# Test content quality validation
uv run python test_final_validation.py
# Test thought completeness detection
uv run python test_validation_only.py
# Code quality checks
uv run ruff check .
uv run ruff check --fix .
# Full validation suite
python validate.py
src/
├── deck_builder_agent.py # Main LangGraph workflow
├── tools.py # Core generation and validation logic
├── models.py # Pydantic data models
├── dependencies.py # LlamaIndex and API setup
├── settings.py # Environment configuration
└── rate_limiter.py # API rate limiting utilities
tests/
├── test_agent.py # Agent workflow tests
├── test_tools.py # Tool function tests
└── test_integration.py # End-to-end tests
- Streaming: Real-time status updates during generation
- Caching: LlamaIndex vector store persistence
- Rate Limiting: Respectful API usage with backoff strategies
- Memory Management: Efficient document chunking and indexing
- Graceful Degradation: Continues workflow even with partial failures
- Retry Logic: Automatic retry for transient API failures
- Fallback Content: Default outline/content when AI generation fails
- Validation: Input sanitization and output schema validation
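Retry logic and fallback content combine naturally: retry a call a few times with growing delays, and return a default if every attempt fails. The sketch below assumes exponential backoff with a 1-second base and three attempts; these values and the `call_with_retry` helper are hypothetical, and a real implementation would likely catch only transient error types rather than every exception.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(
    func: Callable[[], T],
    fallback: T,
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, retrying with exponential backoff; return fallback if all attempts fail."""
    for attempt in range(max_attempts):
        try:
            return func()
        except Exception:  # broad on purpose for the sketch
            if attempt < max_attempts - 1:
                sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
    return fallback


attempts = []
delays = []


def flaky_outline():
    """Fail twice, then succeed, to imitate a transient API error."""
    attempts.append(1)
    if len(attempts) < 3:
        raise ConnectionError("transient failure")
    return {"slide_titles": ["Introduction"]}


outline = call_with_retry(
    flaky_outline, fallback={"slide_titles": []}, sleep=delays.append
)
```

Passing `sleep` as a parameter keeps the example instant to run; in production the default `time.sleep` applies.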
- Environment variable storage (never in code)
- Pydantic validation on startup
- Secure error messages (no key exposure)
- User prompt sanitization
- Search result count limits
- JSON schema validation with Pydantic
- File path validation for templates
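As an example of the file path validation mentioned above, a template check could confirm the extension and existence of the file before handing it to python-pptx. This is a minimal sketch under those two assumptions; `validate_template_path` is a hypothetical name and the project's actual checks may differ.

```python
from pathlib import Path
from typing import Optional


def validate_template_path(template_path: Optional[str]) -> Optional[Path]:
    """Return a resolved Path for a usable .pptx template, or None when no template is given."""
    if template_path is None:
        return None  # caller falls back to the default template
    path = Path(template_path).expanduser().resolve()
    if path.suffix.lower() != ".pptx":
        raise ValueError(f"Template must be a .pptx file: {path.name}")
    if not path.is_file():
        raise FileNotFoundError(f"Template not found: {path}")
    return path
```

Failing early with a specific error is friendlier than letting a bad path surface midway through a multi-minute generation run.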
Use corporate PowerPoint templates for branded presentations:
python deck_builder_cli.py \
  --topic "Q4 Financial Results" \
  --template templates/corporate_theme.pptx
- Inherits slide masters, themes, and branding
- Maintains corporate design consistency
- Supports all standard PowerPoint template features
The system can be integrated into other applications:
from src.deck_builder_agent import build_deck_sync
# Generate presentation synchronously
result = build_deck_sync(
    user_request="Market Analysis 2025",
    template_path="corporate_template.pptx"  # Optional
)
if result["success"]:
    print(f"Generated: {result['output_path']}")
    print(f"Total slides: {len(result['slide_specs'])}")
    print(f"References: {len(result['references'])}")
# Basic presentation generation
python deck_builder_cli.py --topic "Digital transformation in healthcare"
# Advanced usage with template and custom output
python deck_builder_cli.py \
  --topic "Sustainable energy solutions for small businesses" \
  --template corporate_template.pptx \
  --output sustainability_deck.pptx
- Python 3.11+
- uv package manager
- Brave Search API key
- OpenAI API key
- Internet connection for research and embedding generation
MIT License - see LICENSE file for details.
- Fork the repository
- Create a feature branch
- Add tests for new functionality
- Submit a pull request
- API Key Errors: Verify .env file setup with correct key formats
- Import Errors: Run uv sync to install all dependencies
- Test Failures: Run python validate.py for comprehensive diagnostics
- Template Issues: Ensure .pptx file format and file accessibility
- Rate Limiting: If you get rate limit errors, the system will automatically retry with backoff
python validate.py # Full validation suite
python deck_builder_cli.py --help # CLI help and options
uv run pytest tests/ -v # Run all tests with verbose output
uv run ruff check . # Code quality check
For detailed debugging information:
# Enable LangSmith tracing (also requires LANGCHAIN_API_KEY)
LANGCHAIN_TRACING_V2=true python deck_builder_cli.py --topic "Your topic"
# Run with maximum verbosity
python deck_builder_cli.py --topic "Your topic" --verbose
For issues and feature requests, please use the GitHub Issues page.