zaheer-zee/AVA
Repository files navigation

πŸŽ™οΈ AVA Voice Detection API

Language-Agnostic AI Voice Detection using acoustic features. Works across Tamil, Hindi, Telugu, Malayalam, Bengali, English, and all other languages.

Python 3.10+ FastAPI License: MIT

πŸ† Model Performance

  • Training Accuracy: 99.97%
  • Testing Accuracy: 100%
  • Dataset: 11,663 samples (1,665 AI + 9,998 Human)
  • Languages Tested: Tamil, Hindi, Telugu, Malayalam, Bengali, English
  • Inference Time: ~1-2 seconds per audio clip

🧠 Core Strategy

This API detects AI-generated voices using acoustic artifacts, not speech content. This approach:

  • βœ… Works across all languages automatically
  • βœ… Detects AI patterns in pitch, energy, spectral characteristics
  • βœ… No speech-to-text or language models needed
  • βœ… Fast and explainable
  • βœ… Language-agnostic by design

How It Works

The model analyzes these acoustic features:

  1. Pitch variance - AI voices have unnaturally stable pitch
  2. Energy patterns - Humans have natural volume fluctuations
  3. Spectral features - AI has smoother frequency distribution
  4. Pause patterns - AI pauses are too regular
  5. Zero-crossing rate - Different voice texture patterns

These features work identically across all languages!
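The repository extracts 21 such features with librosa; the numpy-only sketch below computes an illustrative subset (frame energy, zero-crossing rate, spectral centroid) to show the kind of statistics involved. Function and key names here are hypothetical, not the project's actual extractor.

```python
import numpy as np

def extract_features(audio: np.ndarray, sr: int = 16000) -> dict:
    """Compute a small, illustrative subset of the acoustic features above."""
    # Frame the signal into 25 ms windows with a 10 ms hop
    frame_len, hop = int(0.025 * sr), int(0.010 * sr)
    frames = [audio[i:i + frame_len] for i in range(0, len(audio) - frame_len, hop)]

    # Energy per frame: human speech fluctuates more than TTS output
    energy = np.array([np.sum(f ** 2) for f in frames])

    # Zero-crossing rate per frame: a cheap proxy for voice texture
    zcr = np.array([np.mean(np.abs(np.diff(np.sign(f))) > 0) for f in frames])

    # Spectral centroid: the "brightness" of the whole clip
    spectrum = np.abs(np.fft.rfft(audio))
    freqs = np.fft.rfftfreq(len(audio), d=1 / sr)
    centroid = np.sum(freqs * spectrum) / (np.sum(spectrum) + 1e-10)

    return {
        "energy_mean": float(energy.mean()),
        "energy_var": float(energy.var()),
        "zcr_mean": float(zcr.mean()),
        "spectral_centroid": float(centroid),
    }

# Demo on one second of synthetic audio (a 220 Hz sine wave)
t = np.linspace(0, 1, 16000, endpoint=False)
print(extract_features(np.sin(2 * np.pi * 220 * t)))
```

Because none of these statistics depend on what words are spoken, the same vector works for any language.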


πŸ› οΈ Tech Stack

  • Backend: Python 3.10+, FastAPI, Uvicorn
  • Audio Processing: ffmpeg, soundfile, librosa
  • ML: scikit-learn (RandomForestClassifier)
  • Security: API key authentication
  • Deployment: Railway-ready

πŸš€ Quick Start

Prerequisites

  • Python 3.10+
  • ffmpeg (on macOS: brew install ffmpeg)
  • Virtual environment (recommended)

Installation

  1. Clone the repository:
git clone https://github.com/zaheer-zee/AVA.git
cd AVA/voice-detection-api
  2. Create a virtual environment:
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
  3. Install dependencies:
pip install -r requirements.txt
  4. Set up the environment:
# The API will auto-generate an API key on first run,
# or create a .env file with a custom key:
echo "API_KEY=AVA-2026-YOUR-KEY" > .env
  5. Run the server:
python -m app.main

The API will start at http://localhost:8000

Web Interface

Open your browser and go to:

http://localhost:8000/static/index.html

Features:

  • πŸ“ Drag & drop audio files
  • 🎯 Real-time AI detection
  • πŸ“Š Confidence visualization
  • 🌍 Multilingual support indicator

πŸ“‘ API Usage

Endpoint

POST /api/voice-detection

Headers

x-api-key: AVA-2026-XXXXXX
Content-Type: application/json

Request Body

{
  "audio_base64": "BASE64_ENCODED_AUDIO_STRING"
}

Response

{
  "classification": "AI_GENERATED",
  "confidence": 0.87,
  "language": "multilingual",
  "explanation": "Analysis indicates synthetic speech patterns: unnaturally stable pitch, consistent energy levels."
}

Example with cURL

# Convert MP3 to base64
BASE64_AUDIO=$(base64 -i sample.mp3)

# Make request
curl -X POST http://localhost:8000/api/voice-detection \
  -H "x-api-key: AVA-2026-910728" \
  -H "Content-Type: application/json" \
  -d "{\"audio_base64\": \"$BASE64_AUDIO\"}"

Example with Python

import base64
import requests

# Read and encode audio
with open('sample.mp3', 'rb') as f:
    audio_base64 = base64.b64encode(f.read()).decode('utf-8')

# Make request
response = requests.post(
    'http://localhost:8000/api/voice-detection',
    headers={
        'x-api-key': 'AVA-2026-910728',
        'Content-Type': 'application/json'
    },
    json={'audio_base64': audio_base64}
)

result = response.json()
print(f"Classification: {result['classification']}")
print(f"Confidence: {result['confidence']*100:.1f}%")

πŸ“Š Understanding Results

Confidence Score

The confidence score is a probability (0-100%) representing how certain the model is that the audio is AI-generated.

Confidence   Meaning               Classification
0-50%        Low AI probability    HUMAN
50-100%      High AI probability   AI_GENERATED

Examples:

  • 4% confidence β†’ the model is 96% sure the audio is human β†’ Classification: HUMAN βœ…
  • 87% confidence β†’ the model is 87% sure the audio is AI β†’ Classification: AI_GENERATED βœ…
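The mapping from probability to label can be sketched as below. This is a hypothetical helper mirroring the table above; the real decision logic lives in utils/classifier.py and may differ.

```python
def classify(ai_probability: float, threshold: float = 0.5) -> dict:
    """Map the model's AI probability onto the API's two labels.

    Hypothetical helper: the shipped classifier may use a different
    threshold or calibration.
    """
    label = "AI_GENERATED" if ai_probability >= threshold else "HUMAN"
    return {"classification": label, "confidence": ai_probability}

print(classify(0.04))  # {'classification': 'HUMAN', 'confidence': 0.04}
print(classify(0.87))  # {'classification': 'AI_GENERATED', 'confidence': 0.87}
```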

Language Support

The model shows "multilingual" because it's language-agnostic:

  • Works across ALL languages (Hindi, Urdu, Tamil, English, etc.)
  • Analyzes acoustic features, not words
  • No language-specific training needed

🎯 Model Training

The API comes with a pre-trained model achieving 100% test accuracy. To retrain:

Quick Training (Using Kaggle Dataset)

# Activate environment
source venv/bin/activate

# Download dataset and train (automated)
python scripts/download_dataset.py
python scripts/generate_ai_samples.py
python scripts/train_model.py

This will:

  1. Download 10,000 human voice samples from Kaggle
  2. Generate 2,000 AI voice samples using Google TTS
  3. Train a RandomForestClassifier
  4. Save model to models/voice_classifier.pkl
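The general shape of that training step can be sketched with scikit-learn. The snippet below substitutes synthetic 21-dimensional feature vectors for the real extracted features (which require the downloaded audio), so the accuracy it prints is illustrative only, not the repository's reported numbers.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
N_FEATURES = 21  # the README's per-sample feature count

# Synthetic stand-ins for the real feature matrix: AI rows (label 1)
# cluster more tightly than human rows (label 0), loosely mimicking
# the "AI voices are too stable" intuition.
X_ai = rng.normal(loc=0.0, scale=0.3, size=(300, N_FEATURES))
X_human = rng.normal(loc=1.0, scale=1.0, size=(300, N_FEATURES))
X = np.vstack([X_ai, X_human])
y = np.array([1] * 300 + [0] * 300)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
clf = RandomForestClassifier(n_estimators=200, random_state=42)
clf.fit(X_train, y_train)
print(f"Test accuracy: {clf.score(X_test, y_test):.2%}")
```

In the real pipeline, the fitted model would then be pickled to models/voice_classifier.pkl.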

Training Output

πŸŽ™οΈ  Voice Detection Model Training
============================================================
πŸ“Š Collecting Training Samples...
   AI samples: 1665
   Human samples: 9998
   Total samples: 11663

πŸ€– Training Model...
   Training Accuracy: 99.97%
   Testing Accuracy: 100.00%

πŸ† Excellent accuracy! You're ready to win!

Custom Training Data

Place your audio files in:

training_data/
β”œβ”€β”€ ai_generated/     # AI-generated voice samples
β”‚   β”œβ”€β”€ sample1.mp3
β”‚   └── sample2.mp3
└── human/            # Human voice samples
    β”œβ”€β”€ speaker1.mp3
    └── speaker2.mp3

Then run:

python scripts/train_model.py

🚒 Deployment

GitHub Container Registry (GHCR)

Pre-built multi-platform image available!

# Pull the image (works on Intel, AMD, and ARM servers)
docker pull ghcr.io/zaheer-zee/ava-ml-api:latest

# Run it
docker run -d -p 8000:8000 --env-file .env ghcr.io/zaheer-zee/ava-ml-api:latest

Platforms: linux/amd64, linux/arm64
Auto-built: on every push to the main branch
Perfect for: college domains, cloud deployment, production use

πŸ“– Full deployment guide: See DEPLOYMENT.md


🐳 Docker Deployment (Recommended)

The easiest way to deploy the API is using Docker. All dependencies (including ffmpeg) are included in the container.

Option 1: Docker Compose (Easiest)

# Build and run
docker-compose up -d

# Check logs
docker-compose logs -f

# Stop
docker-compose down

The API will be available at http://localhost:8000

Option 2: Docker CLI

# Build the image
docker build -t voice-detection-api:latest .

# Run the container
docker run -d \
  -p 8000:8000 \
  --name voice-api \
  --env-file .env \
  voice-detection-api:latest

# Check logs
docker logs -f voice-api

# Stop and remove
docker stop voice-api && docker rm voice-api

Environment Variables for Docker

Create a .env file:

API_KEY=AVA-2026-YOUR-KEY
HOST=0.0.0.0
PORT=8000

Docker Image Details

  • Base Image: Python 3.10 slim
  • System Dependencies: ffmpeg, libsndfile1
  • Size: ~800MB (optimized)
  • Security: Runs as non-root user
  • Health Check: Built-in endpoint monitoring

☁️ Railway Deployment

  1. Push to GitHub:
git add .
git commit -m "Deploy AVA Voice Detection API"
git push origin main
  2. Deploy on Railway:

    • Go to railway.app
    • Create new project from GitHub repo
    • Add environment variable: API_KEY=AVA-2026-YOUR-KEY
    • Railway will auto-detect Python and deploy
  3. Configure:

    • Railway will provide a URL like https://your-app.railway.app
    • Update your frontend to use this URL
    • Test with: curl https://your-app.railway.app/health

Environment Variables

API_KEY=AVA-2026-910728    # Your API key
HOST=0.0.0.0               # Server host
PORT=8000                  # Server port

🧱 Features Extracted

Feature              Description                         Why It Helps
Pitch Variance       Variation in voice pitch            AI voices are too stable
Pitch Mean/Std       Average pitch and deviation         AI has consistent pitch
Energy Variance      Volume fluctuation                  Humans fluctuate naturally
Energy Mean/Std      Average energy levels               AI has uniform energy
Spectral Flatness    Frequency distribution smoothness   AI has a smoother spectrum
Spectral Centroid    Brightness of sound                 AI differs in timbre
Spectral Rolloff     High-frequency content              AI has different rolloff
Zero-Crossing Rate   Sign changes in the waveform        AI patterns differ
Pause Regularity     Consistency of pauses               AI pauses unnaturally
Speech Rate          Speaking speed                      AI is more consistent

Total: 21 features analyzed per audio sample.


πŸ“Š Response Codes

Code   Meaning        Description
200    Success        Audio analyzed successfully
400    Bad Request    Invalid base64, corrupt audio, or unsupported format
401    Unauthorized   Missing API key
403    Forbidden      Invalid API key
500    Server Error   Internal processing error

πŸ”’ Security

  • βœ… API key required for all requests
  • βœ… Keys validated via x-api-key header
  • βœ… No API key stored in code (environment variables)
  • βœ… CORS enabled for web interface
  • βœ… Input validation on all endpoints
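The key check behind the 401/403 distinction can be sketched framework-free. This is an illustrative helper, not the app's actual code; the real app wires an equivalent check into a FastAPI dependency and reads API_KEY from the environment (.env).

```python
import os
import secrets

def check_api_key(provided: "str | None") -> tuple:
    """Validate the x-api-key header value.

    Returns an (http_status, message) pair matching the response
    codes documented above. Illustrative sketch only.
    """
    expected = os.environ.get("API_KEY", "AVA-2026-910728")
    if provided is None:
        return 401, "Missing API key"
    # Constant-time comparison avoids leaking key prefixes via timing
    if not secrets.compare_digest(provided, expected):
        return 403, "Invalid API key"
    return 200, "OK"

print(check_api_key(None))  # (401, 'Missing API key')
```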

πŸ§ͺ Testing

Test with Sample Audio

# Using the quick test script
python quick_test.py training_data/ai_generated/english_00001.mp3

Test API Endpoint

# Health check
curl http://localhost:8000/health

# Voice detection
python test_api.py sample.mp3 AVA-2026-910728

πŸ“ Project Structure

voice-detection-api/
β”œβ”€β”€ app/
β”‚   β”œβ”€β”€ main.py              # FastAPI application
β”‚   β”œβ”€β”€ config.py            # Configuration settings
β”‚   └── models.py            # Request/response models
β”œβ”€β”€ utils/
β”‚   β”œβ”€β”€ audio_processor.py   # Audio conversion (ffmpeg)
β”‚   β”œβ”€β”€ feature_extractor.py # Feature extraction (librosa)
β”‚   └── classifier.py        # ML classification
β”œβ”€β”€ scripts/
β”‚   β”œβ”€β”€ download_dataset.py  # Download Kaggle dataset
β”‚   β”œβ”€β”€ generate_ai_samples.py # Generate AI voices
β”‚   └── train_model.py       # Train the model
β”œβ”€β”€ models/
β”‚   └── voice_classifier.pkl # Trained model (712 KB)
β”œβ”€β”€ static/
β”‚   └── index.html           # Web interface
β”œβ”€β”€ training_data/
β”‚   β”œβ”€β”€ ai_generated/        # AI voice samples
β”‚   └── human/               # Human voice samples
β”œβ”€β”€ requirements.txt         # Python dependencies
β”œβ”€β”€ .env                     # Environment variables
└── README.md               # This file

πŸ› Troubleshooting

"No module named 'aifc'"

Fixed! The aifc module was removed from the standard library in Python 3.13, so this project uses ffmpeg for audio conversion instead of librosa's built-in loader.

"Invalid API key"

Check that:

  1. Server is running and shows the API key in console
  2. You're using the correct key in x-api-key header
  3. .env file has the correct API_KEY value

"Failed to process audio"

Ensure:

  1. ffmpeg is installed: brew install ffmpeg (macOS)
  2. Audio file is valid MP3/WAV/M4A
  3. Audio is between 0.5s and 300s duration

Low accuracy after training

  • Add more diverse samples
  • Balance AI/Human samples (equal counts)
  • Use multiple AI voice generators
  • Include samples from all target languages

πŸ’‘ Pro Tips

For Best Accuracy:

βœ… Balance is key: Equal AI and Human samples
βœ… Diversity wins: Mix languages, voices, styles
βœ… Quality matters: Clear audio, 3-10 seconds
βœ… Test often: Validate accuracy after training

For Hackathon Success:

πŸ† Use the pre-trained model (100% test accuracy)
πŸ† Demonstrate multilingual support
πŸ† Show the web interface for live demos
πŸ† Explain the language-agnostic approach


πŸ“ API Key

On startup, check the console for your API key:

============================================================
πŸŽ™οΈ  Voice Detection API Starting...
============================================================
API Key: AVA-2026-910728
Model Path: models/voice_classifier.pkl
Sample Rate: 16000 Hz
============================================================

Use this key in all API requests!


🀝 Contributing

This is a hackathon project. Feel free to:

  • Report issues
  • Suggest improvements
  • Add new features
  • Improve documentation

πŸ“„ License

MIT License - Built for Hackathon 2026


πŸŽ‰ Acknowledgments

  • Kaggle - Indian Languages Audio Dataset
  • Google TTS - AI voice generation
  • scikit-learn - Machine learning framework
  • FastAPI - Modern web framework
  • librosa - Audio analysis library

πŸ“ž Support

For questions or issues:

  1. Check this README
  2. Review the code comments
  3. Test with provided samples
  4. Check server logs for errors

Built with ❀️ for AVA Hackathon 2026

Ready to detect AI voices across all languages! πŸš€
