Language-Agnostic AI Voice Detection using acoustic features. Works across Tamil, Hindi, Telugu, Malayalam, Bengali, English, and all other languages.
- Training Accuracy: 99.97%
- Testing Accuracy: 100%
- Dataset: 11,663 samples (1,665 AI + 9,998 Human)
- Languages Tested: Tamil, Hindi, Telugu, Malayalam, Bengali, English
- Inference Time: ~1-2 seconds per audio
This API detects AI-generated voices using acoustic artifacts, not speech content. This approach:
- ✅ Works across all languages automatically
- ✅ Detects AI patterns in pitch, energy, and spectral characteristics
- ✅ No speech-to-text or language models needed
- ✅ Fast and explainable
- ✅ Language-agnostic by design
The model analyzes these acoustic features:
- Pitch variance - AI voices have unnaturally stable pitch
- Energy patterns - Humans have natural volume fluctuations
- Spectral features - AI has smoother frequency distribution
- Pause patterns - AI pauses are too regular
- Zero-crossing rate - Different voice texture patterns
These features work identically across all languages!
- Backend: Python 3.10+, FastAPI, Uvicorn
- Audio Processing: ffmpeg, soundfile, librosa
- ML: scikit-learn (RandomForestClassifier)
- Security: API key authentication
- Deployment: Railway-ready
- Python 3.10+
- ffmpeg (install on macOS via `brew install ffmpeg`)
- Virtual environment (recommended)
- Clone the repository:

```bash
git clone https://github.com/zaheer-zee/AVA.git
cd AVA/voice-detection-api
```

- Create a virtual environment:

```bash
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
```

- Install dependencies:

```bash
pip install -r requirements.txt
```

- Set up the environment:

```bash
# The API will auto-generate an API key on first run.
# Or create a .env file with a custom key:
echo "API_KEY=AVA-2026-YOUR-KEY" > .env
```

- Run the server:

```bash
python -m app.main
```

The API will start at http://localhost:8000
Open your browser and go to:
http://localhost:8000/static/index.html
Features:
- Drag & drop audio files
- Real-time AI detection
- Confidence visualization
- Multilingual support indicator
```
POST /api/voice-detection
x-api-key: AVA-2026-XXXXXX
Content-Type: application/json
```

Request body:

```json
{
  "audio_base64": "BASE64_ENCODED_AUDIO_STRING"
}
```

Response:

```json
{
  "classification": "AI_GENERATED",
  "confidence": 0.87,
  "language": "multilingual",
  "explanation": "Analysis indicates synthetic speech patterns: unnaturally stable pitch, consistent energy levels."
}
```

Example with curl:

```bash
# Convert MP3 to base64
BASE64_AUDIO=$(base64 -i sample.mp3)

# Make request
curl -X POST http://localhost:8000/api/voice-detection \
  -H "x-api-key: AVA-2026-910728" \
  -H "Content-Type: application/json" \
  -d "{\"audio_base64\": \"$BASE64_AUDIO\"}"
```

Example with Python:

```python
import base64
import requests

# Read and encode audio
with open('sample.mp3', 'rb') as f:
    audio_base64 = base64.b64encode(f.read()).decode('utf-8')

# Make request
response = requests.post(
    'http://localhost:8000/api/voice-detection',
    headers={
        'x-api-key': 'AVA-2026-910728',
        'Content-Type': 'application/json'
    },
    json={'audio_base64': audio_base64}
)

result = response.json()
print(f"Classification: {result['classification']}")
print(f"Confidence: {result['confidence']*100:.1f}%")
```

The confidence score is a probability (0-100%) representing how certain the model is that the audio is AI-generated.
| Confidence | Meaning | Classification |
|---|---|---|
| 0-50% | Low AI probability | HUMAN |
| 50-100% | High AI probability | AI_GENERATED |
Examples:
- 4% confidence = 96% sure it's HUMAN → Classification: HUMAN
- 87% confidence = 87% sure it's AI → Classification: AI_GENERATED
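As a sketch, the thresholding above amounts to a one-line mapping; the `classify` helper below is illustrative, not the API's actual code:

```python
def classify(ai_probability: float) -> str:
    """Map the model's P(AI) score to a label, per the table above.

    Illustrative helper only; the real service may handle the exact
    0.5 boundary slightly differently.
    """
    return "AI_GENERATED" if ai_probability >= 0.5 else "HUMAN"

print(classify(0.04))  # -> HUMAN
print(classify(0.87))  # -> AI_GENERATED
```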
The `language` field always reads "multilingual" because the model is language-agnostic:
- Works across ALL languages (Hindi, Urdu, Tamil, English, etc.)
- Analyzes acoustic features, not words
- No language-specific training needed
The API comes with a pre-trained model achieving 100% test accuracy. To retrain:

```bash
# Activate environment
source venv/bin/activate

# Download dataset and train (automated)
python scripts/download_dataset.py
python scripts/generate_ai_samples.py
python scripts/train_model.py
```

This will:
- Download 10,000 human voice samples from Kaggle
- Generate 2,000 AI voice samples using Google TTS
- Train a RandomForestClassifier
- Save the model to `models/voice_classifier.pkl`
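For reference, the core of such a training step might look like this minimal sketch. It uses random stand-in feature vectors rather than the real dataset, and it is not the repo's `train_model.py`:

```python
import joblib
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Stand-ins for the real 21-dimensional acoustic feature vectors:
# class 0 (human) clusters near 0, class 1 (AI) clusters near 1.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.3, (100, 21)),
               rng.normal(1.0, 0.3, (100, 21))])
y = np.array([0] * 100 + [1] * 100)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X_tr, y_tr)
print(f"Test accuracy: {clf.score(X_te, y_te):.2%}")

# Persist in the same pickle format the API loads at startup
joblib.dump(clf, "voice_classifier.pkl")
```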
Expected training output:

```
============================================================
Voice Detection Model Training
============================================================
Collecting Training Samples...
  AI samples: 1665
  Human samples: 9998
  Total samples: 11663
Training Model...
  Training Accuracy: 99.97%
  Testing Accuracy: 100.00%
Excellent accuracy! You're ready to win!
```
Place your audio files in:

```
training_data/
├── ai_generated/       # AI-generated voice samples
│   ├── sample1.mp3
│   └── sample2.mp3
└── human/              # Human voice samples
    ├── speaker1.mp3
    └── speaker2.mp3
```

Then run:

```bash
python scripts/train_model.py
```

Pre-built multi-platform image available!
```bash
# Pull the image (works on Intel, AMD, and ARM servers)
docker pull ghcr.io/zaheer-zee/ava-ml-api:latest

# Run it
docker run -d -p 8000:8000 --env-file .env ghcr.io/zaheer-zee/ava-ml-api:latest
```

- Platforms: linux/amd64, linux/arm64
- Auto-built: every push to the main branch
- Perfect for: college domains, cloud deployment, production use
Full deployment guide: see DEPLOYMENT.md
The easiest way to deploy the API is using Docker. All dependencies (including ffmpeg) are included in the container.
```bash
# Build and run
docker-compose up -d

# Check logs
docker-compose logs -f

# Stop
docker-compose down
```

The API will be available at http://localhost:8000
```bash
# Build the image
docker build -t voice-detection-api:latest .

# Run the container
docker run -d \
  -p 8000:8000 \
  --name voice-api \
  --env-file .env \
  voice-detection-api:latest

# Check logs
docker logs -f voice-api

# Stop and remove
docker stop voice-api && docker rm voice-api
```

Create a `.env` file:

```
API_KEY=AVA-2026-YOUR-KEY
HOST=0.0.0.0
PORT=8000
```

- Base Image: Python 3.10 slim
- System Dependencies: ffmpeg, libsndfile1
- Size: ~800MB (optimized)
- Security: Runs as non-root user
- Health Check: Built-in endpoint monitoring
- Push to GitHub:

```bash
git add .
git commit -m "Deploy AVA Voice Detection API"
git push origin main
```

- Deploy on Railway:
  - Go to railway.app
  - Create a new project from the GitHub repo
  - Add the environment variable `API_KEY=AVA-2026-YOUR-KEY`
  - Railway will auto-detect Python and deploy
- Configure:
  - Railway will provide a URL like `https://your-app.railway.app`
  - Update your frontend to use this URL
  - Test with `curl https://your-app.railway.app/health`
```
API_KEY=AVA-2026-910728   # Your API key
HOST=0.0.0.0              # Server host
PORT=8000                 # Server port
```

| Feature | Description | Why It Helps |
|---|---|---|
| Pitch Variance | Variation in voice pitch | AI voices are too stable |
| Pitch Mean/Std | Average pitch and deviation | AI has consistent pitch |
| Energy Variance | Volume fluctuation | Humans fluctuate naturally |
| Energy Mean/Std | Average energy levels | AI has uniform energy |
| Spectral Flatness | Frequency distribution smoothness | AI has smoother spectrum |
| Spectral Centroid | Brightness of sound | AI differs in timbre |
| Spectral Rolloff | High-frequency content | AI has different rolloff |
| Zero-Crossing Rate | Sign changes in waveform | AI patterns differ |
| Pause Regularity | Consistency of pauses | AI pauses unnaturally |
| Speech Rate | Speaking speed | AI is more consistent |
Total: 21 features analyzed per audio sample.
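To make two of these concrete, here is a numpy-only sketch of zero-crossing rate and energy variance. The repo's `feature_extractor.py` uses librosa and covers all 21 features, so treat this as an illustration of the idea rather than the actual implementation:

```python
import numpy as np

def zero_crossing_rate(y: np.ndarray) -> float:
    """Fraction of adjacent sample pairs where the waveform changes sign."""
    return float(np.mean(np.signbit(y[:-1]) != np.signbit(y[1:])))

def energy_variance(y: np.ndarray, frame: int = 512) -> float:
    """Variance of per-frame RMS energy, i.e. volume fluctuation."""
    n = len(y) // frame * frame
    frames = y[:n].reshape(-1, frame)
    rms = np.sqrt((frames ** 2).mean(axis=1))
    return float(rms.var())

sr = 16000
t = np.linspace(0, 1, sr, endpoint=False)
tone = np.sin(2 * np.pi * 200 * t)                 # steady 200 Hz tone
noise = np.random.default_rng(0).normal(size=sr)   # white noise

# Noise crosses zero far more often and fluctuates more in energy
# than a steady tone -- the kind of contrast the classifier learns.
print(zero_crossing_rate(tone), zero_crossing_rate(noise))
print(energy_variance(tone), energy_variance(noise))
```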
| Code | Meaning | Description |
|---|---|---|
| `200` | Success | Audio analyzed successfully |
| `400` | Bad Request | Invalid base64, corrupt audio, or unsupported format |
| `401` | Unauthorized | Missing API key |
| `403` | Forbidden | Invalid API key |
| `500` | Server Error | Internal processing error |
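On the client side, these codes can be handled explicitly. The snippet below is a hedged sketch; the URL default and the choice of exceptions are illustrative, not prescribed by the API:

```python
import requests

def detect(audio_base64: str, api_key: str,
           url: str = "http://localhost:8000/api/voice-detection") -> dict:
    """Call the endpoint and translate the status codes above into errors."""
    resp = requests.post(
        url,
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        json={"audio_base64": audio_base64},
        timeout=30,
    )
    if resp.status_code == 400:
        raise ValueError(f"Bad audio payload: {resp.text}")
    if resp.status_code in (401, 403):
        raise PermissionError("API key missing or invalid")
    resp.raise_for_status()  # covers 500 and anything unexpected
    return resp.json()       # 200: parsed classification result
```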
- ✅ API key required for all requests
- ✅ Keys validated via the `x-api-key` header
- ✅ No API key stored in code (environment variables)
- ✅ CORS enabled for web interface
- ✅ Input validation on all endpoints
```bash
# Using the quick test script
python quick_test.py training_data/ai_generated/english_00001.mp3

# Health check
curl http://localhost:8000/health

# Voice detection
python test_api.py sample.mp3 AVA-2026-910728
```

Project structure:

```
voice-detection-api/
├── app/
│   ├── main.py                 # FastAPI application
│   ├── config.py               # Configuration settings
│   └── models.py               # Request/response models
├── utils/
│   ├── audio_processor.py      # Audio conversion (ffmpeg)
│   ├── feature_extractor.py    # Feature extraction (librosa)
│   └── classifier.py           # ML classification
├── scripts/
│   ├── download_dataset.py     # Download Kaggle dataset
│   ├── generate_ai_samples.py  # Generate AI voices
│   └── train_model.py          # Train the model
├── models/
│   └── voice_classifier.pkl    # Trained model (712 KB)
├── static/
│   └── index.html              # Web interface
├── training_data/
│   ├── ai_generated/           # AI voice samples
│   └── human/                  # Human voice samples
├── requirements.txt            # Python dependencies
├── .env                        # Environment variables
└── README.md                   # This file
```
Fixed! We use ffmpeg for audio conversion instead of librosa's built-in loader.
Check that:
- The server is running and shows the API key in the console
- You're using the correct key in the `x-api-key` header
- The `.env` file has the correct `API_KEY` value
Ensure:
- ffmpeg is installed: `brew install ffmpeg` (macOS)
- The audio file is a valid MP3/WAV/M4A
- The audio is between 0.5 s and 300 s in duration
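For WAV files, the duration constraint can be checked locally before uploading. This stdlib-only sketch is illustrative; MP3/M4A would need ffmpeg or a library such as soundfile instead:

```python
import wave

def wav_duration_ok(path: str, lo: float = 0.5, hi: float = 300.0) -> bool:
    """Return True if the WAV file's duration falls within [lo, hi] seconds."""
    with wave.open(path, "rb") as w:
        duration = w.getnframes() / w.getframerate()
    return lo <= duration <= hi
```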
- Add more diverse samples
- Balance AI/Human samples (equal counts)
- Use multiple AI voice generators
- Include samples from all target languages
- ✅ Balance is key: equal AI and Human samples
- ✅ Diversity wins: mix languages, voices, styles
- ✅ Quality matters: clear audio, 3-10 seconds
- ✅ Test often: validate accuracy after training
- Use the pre-trained model (100% test accuracy)
- Demonstrate multilingual support
- Show the web interface for live demos
- Explain the language-agnostic approach
On startup, check the console for your API key:

```
============================================================
Voice Detection API Starting...
============================================================
API Key: AVA-2026-910728
Model Path: models/voice_classifier.pkl
Sample Rate: 16000 Hz
============================================================
```

Use this key in all API requests!
This is a hackathon project. Feel free to:
- Report issues
- Suggest improvements
- Add new features
- Improve documentation
MIT License - Built for Hackathon 2026
- Kaggle - Indian Languages Audio Dataset
- Google TTS - AI voice generation
- scikit-learn - Machine learning framework
- FastAPI - Modern web framework
- librosa - Audio analysis library
For questions or issues:
- Check this README
- Review the code comments
- Test with provided samples
- Check server logs for errors
Built with ❤️ for AVA Hackathon 2026

Ready to detect AI voices across all languages!