🎧 AudioBook Generator

Transform PDF, DOCX, and TXT documents into audiobooks with AI-powered text enhancement and high-quality text-to-speech.

✨ Features

📄 Multi-format Support: PDF, DOCX, TXT file processing
🤖 AI Enhancement: Gemini API transforms text into engaging audiobook narration
🎙️ Premium TTS: Edge TTS with five voice styles (storytelling, authoritative, conversational, narrative, dramatic)
🔍 Smart Q&A: RAG-powered document search and question answering
🌐 Modern UI: React frontend with real-time progress tracking
⚡ Fast Processing: Optimized pipeline with caching and batch processing

🚀 Quick Start

Prerequisites

Python 3.8+
Node.js 16+
Gemini API key

Installation

Clone the repository

```
git clone https://github.com/AabidMK/Audiobook_generator_-Infosys_Internship_Aug2025.git
cd Audiobook_generator_-Infosys_Internship_Aug2025
```

Install Python dependencies
```
pip install -r requirements.txt
```

Setup environment

```
# Create .env file (in Windows cmd, drop the quotes so they are not written to the file)
echo "GEMINI_API_KEY=your_api_key_here" > .env
```
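How the backend reads this file is not shown here. As a minimal sketch, assuming it only needs `GEMINI_API_KEY` in the process environment, a standard-library loader could look like the following. `load_env_file` and `require_gemini_key` are hypothetical helpers, not functions from this repository.

```python
import os
from pathlib import Path


def load_env_file(path=".env"):
    """Parse KEY=VALUE lines from a .env file into os.environ (hypothetical helper)."""
    loaded = {}
    env_path = Path(path)
    if not env_path.exists():
        return loaded
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        # Drop surrounding quotes, if any.
        value = value.strip().strip('"').strip("'")
        # Variables already set in the shell take precedence.
        os.environ.setdefault(key, value)
        loaded[key] = value
    return loaded


def require_gemini_key():
    """Return the Gemini API key or fail early with a clear message."""
    key = os.environ.get("GEMINI_API_KEY", "")
    if not key or key == "your_api_key_here":
        raise RuntimeError("GEMINI_API_KEY is missing or still set to the placeholder")
    return key
```

Calling `load_env_file()` followed by `require_gemini_key()` at startup surfaces a missing or placeholder key before any document is processed.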

Install frontend dependencies
```
cd frontend
npm install
cd ..
```

Running the Application

Option 1: Full Stack (Recommended)

```
# Windows
run_app.bat

# Cross-platform
python run_localhost.py
```

Option 2: Backend Only

```
python start_api.py
```

Option 3: Manual Setup

```
# Terminal 1 - Backend
python start_api.py

# Terminal 2 - Frontend
cd frontend
npm run dev
```

🎯 Usage

Upload Document: Drag & drop or select PDF/DOCX/TXT files
Generate Audiobook: Choose voice style and click generate
Download: Get enhanced text and high-quality audio files
Ask Questions: Use the RAG system to query the document's content
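The upload step accepts only PDF, DOCX, and TXT files, and this README gives 50 MB as the supported document size. A pre-upload check along those lines might look like the sketch below. `check_upload` is a hypothetical helper, and reading 50 MB as 50 × 1024 × 1024 bytes is an assumption.

```python
from pathlib import Path

ALLOWED_EXTENSIONS = {".pdf", ".docx", ".txt"}
# Assumption: 1 MB is taken as 1024 * 1024 bytes.
MAX_SIZE_BYTES = 50 * 1024 * 1024


def check_upload(filename, size_bytes):
    """Return (ok, reason) for a candidate upload (hypothetical helper)."""
    suffix = Path(filename).suffix.lower()
    if suffix not in ALLOWED_EXTENSIONS:
        return False, f"unsupported file type: {suffix or 'none'}"
    if size_bytes <= 0:
        return False, "file is empty"
    if size_bytes > MAX_SIZE_BYTES:
        return False, "file exceeds the 50 MB limit"
    return True, "ok"
```

The extension comparison is case-insensitive, so `Book.PDF` is accepted while `notes.epub` is rejected before any processing starts.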

🏗️ Architecture

```
┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│   React UI      │────│   FastAPI        │────│   Gemini API    │
│   (Frontend)    │    │   (Backend)      │    │   (Enhancement) │
└─────────────────┘    └──────────────────┘    └─────────────────┘
                              │
                 ┌────────────┼─────────────┐
                 │            │             │
           ┌─────▼─────┐ ┌────▼─────┐ ┌─────▼──────┐
           │ Edge TTS  │ │ ChromaDB │ │ Text       │
           │ (Audio)   │ │ (RAG)    │ │ Extraction │
           └───────────┘ └──────────┘ └────────────┘
```
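One way to read the diagram is as a backend that wires four stages together. The sketch below illustrates that idea with stand-in callables. The stage order, the fixed-size chunking, and every name in it (`run_pipeline`, `PipelineResult`, `chunk_size`) are assumptions made for illustration, not the project's actual code.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class PipelineResult:
    enhanced_text: str
    audio_path: str
    chunks_indexed: int


def run_pipeline(
    document_path: str,
    extract: Callable[[str], str],
    enhance: Callable[[str], str],
    synthesize: Callable[[str], str],
    index: Callable[[List[str]], int],
    chunk_size: int = 1000,
) -> PipelineResult:
    """Wire the four stages together; every stage is a stand-in callable."""
    raw_text = extract(document_path)  # Text Extraction box
    # Naive fixed-size chunking by character count (assumption).
    chunks = [raw_text[i:i + chunk_size] for i in range(0, len(raw_text), chunk_size)]
    chunks_indexed = index(chunks)  # ChromaDB (RAG) box
    enhanced = " ".join(enhance(chunk) for chunk in chunks)  # Gemini API box
    audio_path = synthesize(enhanced)  # Edge TTS box
    return PipelineResult(enhanced, audio_path, chunks_indexed)
```

Passing the stages in as callables keeps each box replaceable in tests: a lambda that returns a fixed string can stand in for the Gemini call or the TTS step.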

🛠️ Tech Stack

Backend: FastAPI, Python 3.8+
Frontend: React, Vite, Axios
AI/ML: Google Gemini API, ChromaDB, Sentence Transformers
TTS: Microsoft Edge TTS
Text Processing: PyMuPDF, python-docx, BeautifulSoup

📁 Project Structure

```
├── main.py                 # FastAPI backend server
├── audiobook_generator.py  # Core audiobook logic
├── enhanced_extraction.py  # Text extraction
├── rag.py                  # RAG system
├── frontend/               # React application
├── requirements.txt        # Python dependencies
└── README.md               # This file
```

🔧 Configuration

Environment Variables

```
GEMINI_API_KEY=your_gemini_api_key
LM_STUDIO_BASE_URL=http://localhost:1234  # Optional
```

Voice Styles

storytelling: Warm, expressive (default)
authoritative: Deep, confident
conversational: Natural, friendly
narrative: Smooth, professional
dramatic: Dynamic, emotional
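The list above does not say how each style maps onto Edge TTS settings. The sketch below shows one possible mapping using the `edge-tts` Python package. The voice names and the rate and pitch values are illustrative assumptions, not the settings this project uses, and `resolve_style` and `synthesize` are hypothetical helpers.

```python
# Illustrative mapping only: voices and prosody values are assumptions.
VOICE_STYLES = {
    "storytelling": {"voice": "en-US-AriaNeural", "rate": "-5%", "pitch": "+0Hz"},
    "authoritative": {"voice": "en-US-GuyNeural", "rate": "-10%", "pitch": "-10Hz"},
    "conversational": {"voice": "en-US-JennyNeural", "rate": "+0%", "pitch": "+0Hz"},
    "narrative": {"voice": "en-GB-RyanNeural", "rate": "-5%", "pitch": "+0Hz"},
    "dramatic": {"voice": "en-US-ChristopherNeural", "rate": "+5%", "pitch": "+5Hz"},
}
DEFAULT_STYLE = "storytelling"


def resolve_style(style):
    """Return the settings for a style name, falling back to the default style."""
    key = (style or "").strip().lower()
    return VOICE_STYLES.get(key, VOICE_STYLES[DEFAULT_STYLE])


async def synthesize(text, style, out_path):
    """Write speech for `text` to `out_path` using the chosen style."""
    import edge_tts  # third-party package: pip install edge-tts

    settings = resolve_style(style)
    communicate = edge_tts.Communicate(
        text,
        settings["voice"],
        rate=settings["rate"],
        pitch=settings["pitch"],
    )
    await communicate.save(out_path)
```

From synchronous code this would be called as `asyncio.run(synthesize("Hello", "dramatic", "hello.mp3"))`. Unknown or empty style names fall back to storytelling, matching the default noted in the list.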

🚨 Troubleshooting

Port Conflicts

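```
# Find the process listening on port 8000 (Windows)
netstat -ano | findstr :8000

# Kill that process, using the PID from the last column of the netstat output
taskkill /F /PID <PID>
```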
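The commands above are Windows-specific. For a cross-platform check before starting the backend, a small standard-library script can test whether anything is already listening on the port. This is a sketch: `is_port_in_use` is a hypothetical helper, and port 8000 is taken from the command above.

```python
import socket


def is_port_in_use(port, host="127.0.0.1"):
    """Return True if something is already accepting connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        # connect_ex returns 0 when the connection succeeds.
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    port = 8000
    state = "in use" if is_port_in_use(port) else "free"
    print(f"Port {port} is {state}")
```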

Common Issues

Indexing failed: Check file permissions and format
Audio generation failed: Verify that the edge-tts package is installed and that the machine is online (Edge TTS uses Microsoft's online service)
API errors: Validate Gemini API key

📊 Performance

Text Processing: ~2-5 seconds per page
AI Enhancement: ~10-30 seconds per chunk
Audio Generation: ~1-2x real-time speed
Supported File Size: Up to 50MB documents
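These ranges can be combined into a rough time budget for a document. The sketch below adds the extraction and enhancement figures, assuming the stages run sequentially with no caching. Audio generation is left out because its figure is relative to audio length rather than page count. `estimate_seconds` is a hypothetical helper.

```python
def estimate_seconds(pages, chunks):
    """Return (low, high) seconds for extraction plus enhancement."""
    low = pages * 2 + chunks * 10   # 2 s per page, 10 s per chunk
    high = pages * 5 + chunks * 30  # 5 s per page, 30 s per chunk
    return low, high


low, high = estimate_seconds(pages=100, chunks=40)
print(f"{low / 60:.0f} to {high / 60:.0f} minutes")  # 10 to 28 minutes
```

For a 100-page document split into 40 chunks, that gives 600 to 1,700 seconds before audio generation begins.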

🤝 Contributing

Fork the repository
Create feature branch (git checkout -b feature/amazing-feature)
Commit changes (git commit -m 'Add amazing feature')
Push to branch (git push origin feature/amazing-feature)
Open Pull Request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

Infosys Internship Program - Project opportunity
Google Gemini API - AI text enhancement
Microsoft Edge TTS - High-quality speech synthesis
ChromaDB - Vector database for RAG

Made with ❤️ during Infosys Internship 2025
