🔍 Manuscript

The Open Source AI Content Detector That Respects Your Privacy

Detect AI-generated text, images, audio & video—100% offline, self-hosted, zero external calls.

🤔 The Problem

Most hosted AI detection services require you to upload your content to their servers. That's a dealbreaker for:

  • 🏥 Healthcare: HIPAA restricts sending patient data to third parties
  • ⚖️ Law Firms: Uploading documents to a third party can put attorney-client privilege at risk
  • 🏦 Finance: SOC 2 and PCI DSS requirements restrict data sharing
  • 🛡️ Government: Air-gapped networks and classified environments cannot reach external services
  • 🎓 Universities: Licensing for 100K+ students can cost $100K+ per year

Manuscript runs entirely on YOUR infrastructure. Your data never leaves your network.


⚡ Quick Start

Get running in under 30 seconds:

# Option 1: Docker (Recommended)
docker run -p 8080:8080 manuscript/manuscript

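# Option 2: Go Install
# go install names the binary after the last path element (cmd/api),
# so the installed command is `api`. Make sure $(go env GOPATH)/bin is on your PATH.
go install github.com/vinpatel/manuscript/cmd/api@latest
api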

# Option 3: Build from Source
git clone https://github.com/vinpatel/manuscript.git
cd manuscript && make run

Then detect AI content:

curl -X POST http://localhost:8080/verify \
  -H "Content-Type: application/json" \
  -d '{"text": "Your content here"}'

Response:

{
  "id": "hm_abc123",
  "verdict": "human",
  "confidence": 0.87,
  "signals": {
    "sentence_variance": 0.42,
    "vocabulary_richness": 0.78,
    "contraction_ratio": 0.15
  }
}
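
The same call can be made from a script. The following is a minimal sketch using only the Python standard library; it assumes the request and response shapes shown above, and the names `Verdict`, `build_verify_request`, `parse_verdict`, and `verify` are illustrative, not part of Manuscript.

```python
import json
import urllib.request
from dataclasses import dataclass


@dataclass
class Verdict:
    """Fields taken from the sample /verify response above."""
    id: str
    verdict: str
    confidence: float
    signals: dict


def build_verify_request(text: str, base_url: str = "http://localhost:8080") -> urllib.request.Request:
    """Build (but do not send) the POST /verify request from the curl example."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/verify",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_verdict(raw: bytes) -> Verdict:
    """Decode a /verify response body into a Verdict."""
    doc = json.loads(raw)
    return Verdict(
        id=doc["id"],
        verdict=doc["verdict"],
        confidence=float(doc["confidence"]),
        signals=doc.get("signals", {}),
    )


def verify(text: str, base_url: str = "http://localhost:8080", timeout: float = 5.0) -> Verdict:
    """Send the request to a running Manuscript instance and parse the reply."""
    request = build_verify_request(text, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_verdict(response.read())
```

With a local instance running, `verify("Your content here").verdict` would return the verdict string from the response.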

🎯 Use Cases

👨‍💻 For Developers

| Integration | What It Does |
|---|---|
| Content Platforms | Filter AI spam from UGC |
| Hiring Tools | Verify candidate samples |
| EdTech | Academic integrity checks |
| Social Apps | Flag synthetic profiles |
| CMS Plugins | WordPress, Ghost, Strapi |
| CI/CD | Lint content like code |
| Extensions | Detect AI on any webpage |

🏢 For Organizations

| Industry | Why Manuscript |
|---|---|
| Enterprise | Compliance (HIPAA, GDPR, SOC2) |
| Government | Air-gapped, classified envs |
| Legal | Protect client privilege |
| Healthcare | Patient data stays on-prem |
| Finance | Regulatory restrictions |
| Education | Scale without per-seat costs |
| Media | Own the tool, don't rent it |

🏆 Why Manuscript?

| Feature | Manuscript | GPTZero | Originality.ai | Turnitin |
|---|---|---|---|---|
| Self-hosted | | | | |
| Works Offline | | | | |
| Open Source | ✅ MIT | | | |
| Zero Cost | ✅ Forever | ❌ $$$ | ❌ $$$ | ❌ $$$$ |
| Privacy-First | ✅ No data leaves | ⚠️ Cloud-only | ⚠️ Cloud-only | ⚠️ Cloud-only |
| Multi-Modal | ✅ Text/Image/Audio/Video | ⚠️ Text only | ⚠️ Text only | ⚠️ Text only |
| API Limits | ∞ Unlimited | ⚠️ Tiered | ⚠️ Per-check | ⚠️ Per-student |

🔬 How It Works

Manuscript uses statistical and forensic analysis: no ML models, no GPU required, and results in milliseconds (see Performance).

📝 Text Detection Signals

| Signal | Human Writing | AI Writing |
|---|---|---|
| Sentence length variance | High (varied rhythm) | Low (uniform) |
| Vocabulary richness | Diverse, personal words | "Safe" common words |
| Contractions | "don't", "I'm", "we'll" | "do not", "I am" |
| Punctuation variety | !?;:—... | Mostly periods |
| AI phrases | Rare | "As an AI...", "It's important to note..." |
| Hedging language | Natural uncertainty | Excessive qualifiers |
| Repetition patterns | Organic callbacks | Mechanical repetition |

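
To make the first three signals concrete, here is a rough sketch of how they could be computed. These are assumed definitions (population variance of words per sentence, type-token ratio, contractions per word), not Manuscript's actual formulas, and the function names are illustrative.

```python
import re
from statistics import pvariance

# Matches common English contractions such as "don't", "I'm", "we'll".
# Caveat: the 's branch also matches possessives, and curly apostrophes are ignored.
CONTRACTION = re.compile(r"\b\w+'(?:t|s|m|re|ve|ll|d)\b", re.IGNORECASE)


def split_sentences(text: str) -> list[str]:
    """Naive sentence split on whitespace that follows '.', '!' or '?'."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def words(text: str) -> list[str]:
    """Lowercased word tokens; apostrophes are kept so contractions stay whole."""
    return re.findall(r"[A-Za-z']+", text.lower())


def sentence_length_variance(text: str) -> float:
    """Population variance of words per sentence; 0.0 for fewer than two sentences."""
    lengths = [len(words(s)) for s in split_sentences(text)]
    return float(pvariance(lengths)) if len(lengths) > 1 else 0.0


def vocabulary_richness(text: str) -> float:
    """Type-token ratio: distinct words divided by total words."""
    tokens = words(text)
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def contraction_ratio(text: str) -> float:
    """Contractions found divided by total words."""
    tokens = words(text)
    return len(CONTRACTION.findall(text)) / len(tokens) if tokens else 0.0
```

Under these definitions, text with uniform sentence lengths scores a variance of 0, which is the "Low (uniform)" end of the table.
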

🖼️ Image Detection Signals

| Signal | Real Photo | AI-Generated |
|---|---|---|
| EXIF metadata | Present (camera, GPS, date) | Missing or fake |
| Camera make | Apple, Canon, Sony, etc. | None or generic |
| Sensor noise | Natural grain patterns | Too clean or uniform |
| Compression artifacts | JPEG-consistent | Inconsistent patterns |
| Color distribution | Natural histogram | Artificial smoothing |

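
As an illustration of the EXIF signal only, the sketch below checks whether a JPEG carries an Exif APP1 segment at all. It is an assumed, simplified approach (it does not parse tags such as camera make, and it ignores rare padding bytes between segments), and `has_exif_segment` is a hypothetical helper name.

```python
import struct


def has_exif_segment(data: bytes) -> bool:
    """Return True if a JPEG byte string contains an APP1 segment tagged 'Exif'."""
    if data[:2] != b"\xff\xd8":  # SOI marker missing: not a JPEG
        return False
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            return False  # lost sync with the segment structure
        marker = data[pos + 1]
        if marker in (0xDA, 0xD9):  # SOS (image data begins) or EOI
            return False
        # The big-endian segment length counts its own two bytes.
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        if length < 2:
            return False
        payload = data[pos + 4:pos + 2 + length]
        if marker == 0xE1 and payload.startswith(b"Exif\x00\x00"):
            return True
        pos += 2 + length
    return False
```

A missing Exif segment is weak evidence on its own, since many tools and platforms strip metadata from real photos.
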
🎵 Audio/Video Detection

Analyzes format metadata, encoder signatures, and AI tool markers in:

  • File headers and container metadata
  • Encoding parameters and profiles
  • Generation tool fingerprints (e.g., ElevenLabs, Suno markers)
  • Temporal consistency patterns
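
The first item in the list, file headers, can be sketched as a magic-byte check. This is a simplified illustration using well-known container signatures; `sniff_container` is a hypothetical name, and the sketch does not attempt encoder or generation-tool fingerprinting.

```python
def sniff_container(header: bytes) -> str:
    """Guess the container format from the first bytes of a file."""
    if header[:4] == b"RIFF" and header[8:12] == b"WAVE":
        return "wav"
    if header[:4] == b"RIFF" and header[8:12] == b"AVI ":
        return "avi"
    if header[:3] == b"ID3":
        return "mp3"  # MP3 with a leading ID3v2 tag
    if header[:4] == b"fLaC":
        return "flac"
    if header[:4] == b"OggS":
        return "ogg"
    if header[4:8] == b"ftyp":
        return "mp4"  # ISO base media family: MP4, M4A, MOV
    if header[:4] == b"\x1a\x45\xdf\xa3":
        return "matroska"  # EBML header: MKV or WebM
    return "unknown"
```

Reading the first 12 bytes of a file is enough for every branch above.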

📖 API Reference

Endpoints

| Endpoint | Method | Description |
|---|---|---|
| /verify | POST | Analyze content for AI generation |
| /verify/{id} | GET | Retrieve analysis by ID |
| /batch | POST | Analyze multiple items |
| /health | GET | Health check |
| /metrics | GET | Prometheus metrics |

Detailed Analysis

curl -X POST "http://localhost:8080/verify?detailed=true" \
  -H "Content-Type: application/json" \
  -d '{"text": "Your content here"}'

Response with full signal breakdown:

{
  "id": "hm_xyz789",
  "verdict": "ai",
  "confidence": 0.92,
  "content_type": "text",
  "signals": {
    "sentence_variance": 0.12,
    "vocabulary_richness": 0.34,
    "contraction_ratio": 0.02,
    "ai_phrases_detected": ["It's important to note", "Additionally"],
    "hedging_score": 0.78
  },
  "processing_time_ms": 23
}
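
A caller might act on the detailed response like this. The sketch assumes the response fields shown above; the 0.80 threshold and the names `needs_review` and `summarize` are hypothetical and should be tuned or renamed for your own workflow.

```python
import json

REVIEW_THRESHOLD = 0.80  # hypothetical cut-off, not a Manuscript default


def needs_review(raw: str, threshold: float = REVIEW_THRESHOLD) -> bool:
    """Flag content when the verdict is "ai" and confidence meets the threshold."""
    doc = json.loads(raw)
    return doc["verdict"] == "ai" and doc["confidence"] >= threshold


def summarize(raw: str) -> str:
    """One-line summary: id, verdict, confidence, and any detected AI phrases."""
    doc = json.loads(raw)
    phrases = doc.get("signals", {}).get("ai_phrases_detected", [])
    line = f'{doc["id"]}: {doc["verdict"]} ({doc["confidence"]:.0%})'
    if phrases:
        line += "; phrases: " + ", ".join(phrases)
    return line
```

Keeping the threshold as a parameter lets different pipelines apply stricter or looser cut-offs to the same response.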

Image Analysis

curl -X POST http://localhost:8080/verify \
  -F "image=@photo.jpg"
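
For callers without curl, the same multipart upload can be assembled by hand. This is a sketch using only the Python standard library; the form field name `image` comes from the curl example, while `build_image_request` and its defaults are illustrative assumptions.

```python
import urllib.request
import uuid


def build_image_request(
    filename: str,
    content: bytes,
    base_url: str = "http://localhost:8080",
    mime: str = "image/jpeg",
) -> urllib.request.Request:
    """Build (but do not send) a multipart/form-data POST with one "image" part."""
    boundary = uuid.uuid4().hex  # random token that will not appear in the payload
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="image"; filename="{filename}"\r\n'
        f"Content-Type: {mime}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/verify",
        data=head + content + tail,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )
```

Passing the returned request to `urllib.request.urlopen` sends it to a running instance.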

⚙️ Configuration

| Variable | Default | Description |
|---|---|---|
| PORT | 8080 | Server port |
| HOST | 0.0.0.0 | Bind address |
| ENV | development | Environment mode |
| LOG_LEVEL | info | Logging verbosity |
| LOG_FORMAT | json | Log format (json/text) |
| METRICS_ENABLED | true | Enable Prometheus metrics |
| CORS_ORIGINS | * | Allowed CORS origins |
| MAX_TEXT_LENGTH | 100000 | Max text chars |
| MAX_IMAGE_SIZE | 10MB | Max image upload |
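
When scripting a deployment, it can help to preview which settings differ from the defaults in the table. The sketch below is a hypothetical helper, not Manuscript's own configuration loader: it merges overrides with the documented defaults and emits `-e KEY=VALUE` flags for `docker run`, rejecting misspelled variable names.

```python
import os

# Defaults copied from the configuration table.
DEFAULTS = {
    "PORT": "8080",
    "HOST": "0.0.0.0",
    "ENV": "development",
    "LOG_LEVEL": "info",
    "LOG_FORMAT": "json",
    "METRICS_ENABLED": "true",
    "CORS_ORIGINS": "*",
    "MAX_TEXT_LENGTH": "100000",
    "MAX_IMAGE_SIZE": "10MB",
}


def effective_config(environ=None) -> dict:
    """Documented defaults overlaid with any values set in the environment."""
    env = os.environ if environ is None else environ
    return {key: env.get(key, default) for key, default in DEFAULTS.items()}


def docker_env_flags(overrides: dict) -> list[str]:
    """Return `-e KEY=VALUE` arguments for settings that differ from the defaults."""
    unknown = sorted(set(overrides) - set(DEFAULTS))
    if unknown:
        raise KeyError(f"unknown setting(s): {', '.join(unknown)}")
    flags = []
    for key, value in overrides.items():
        if value != DEFAULTS[key]:
            flags += ["-e", f"{key}={value}"]
    return flags
```

For example, `docker_env_flags({"PORT": "9090"})` yields the two arguments `-e` and `PORT=9090`.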

Docker Compose

version: '3.8'
services:
  manuscript:
    image: manuscript/manuscript:latest
    ports:
      - "8080:8080"
    environment:
      - LOG_LEVEL=info
      - METRICS_ENABLED=true
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
      interval: 30s
      timeout: 10s
      retries: 3

Kubernetes

apiVersion: apps/v1
kind: Deployment
metadata:
  name: manuscript
spec:
  replicas: 3
  selector:
    matchLabels:
      app: manuscript
  template:
    metadata:
      labels:
        app: manuscript  # must match spec.selector.matchLabels
    spec:
      containers:
      - name: manuscript
        image: manuscript/manuscript:latest
        ports:
        - containerPort: 8080
        resources:
          requests:
            cpu: "100m"
            memory: "128Mi"
          limits:
            cpu: "500m"
            memory: "512Mi"

📊 Performance

Benchmarked on AWS c5.xlarge (4 vCPU, 8GB RAM):

| Metric | Manuscript | Alternative A | Alternative B |
|---|---|---|---|
| Requests/sec | 12,400 | N/A (cloud) | N/A (cloud) |
| Latency (p50) | 8ms | 180ms | 240ms |
| Latency (p99) | 23ms | 890ms | 1200ms |
| Memory usage | 45MB | N/A | N/A |
| Cold start | 150ms | N/A | N/A |
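
To check latency figures like these on your own hardware, a small timing harness is enough. The sketch below is an assumed approach, not the benchmark tool used for the table: it uses the nearest-rank percentile definition, and `percentile` and `time_calls` are illustrative names.

```python
import math
import time


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% of values at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def time_calls(fn, runs):
    """Call fn() `runs` times and return each call's duration in milliseconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        durations.append((time.perf_counter() - start) * 1000.0)
    return durations
```

Pass `time_calls` a function that sends one request to `/verify`, then read `percentile(durations, 50)` and `percentile(durations, 99)`.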

🗺️ Roadmap

  • Text detection (statistical analysis)
  • Image detection (EXIF + forensics)
  • Audio detection (metadata analysis)
  • Video detection (container analysis)
  • Docker support
  • Prometheus metrics
  • 🔜 Browser extension
  • 🔜 VS Code extension
  • 🔜 WordPress plugin
  • 🔜 Python SDK
  • 🔜 JavaScript SDK
  • 🔜 Webhook notifications
  • 🔜 Admin dashboard

See our project board for the full roadmap.


🤝 Contributing

We love contributions! Here's how to get involved:

  1. Star this repo ⭐: It helps more than you think!
  2. Report bugs: Open an issue
  3. Suggest features: Start a discussion
  4. Submit PRs: See CONTRIBUTING.md

Good First Issues

Looking to contribute? Check out issues labeled good first issue.

Help Us Improve Accuracy

Found a false positive or false negative? Submit a sample to help improve detection accuracy!


📜 License

MIT License — use it however you want. See LICENSE for details.



If Manuscript helps you, please consider giving it a ⭐

It takes 2 seconds and helps the project reach more people who need privacy-first AI detection.


Made with ❤️ by Vin Patel and contributors
