🔍 Manuscript

The Open Source AI Content Detector That Respects Your Privacy

Detect AI-generated text, images, audio & video—100% offline, self-hosted, zero external calls.

🌐 Website • 🚀 Quick Start • 📖 Documentation • 🎯 Use Cases • 💬 Discussions

🤔 The Problem

Most hosted AI detection services require you to upload your content to their servers. That's a dealbreaker for:

🏥 Healthcare — HIPAA compliance prohibits sending patient data externally
⚖️ Law Firms — Attorney-client privilege can't survive third-party uploads
🏦 Finance — SOC2/PCI requirements restrict data sharing
🛡️ Government — Air-gapped networks, classified environments
🎓 Universities — 100K+ students = $100K+ annual licensing

Manuscript runs entirely on YOUR infrastructure. Your data never leaves your network.

⚡ Quick Start

Get running in under 30 seconds:

# Option 1: Docker (Recommended)
docker run -p 8080:8080 manuscript/manuscript

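# Option 2: Go Install
# `go install` names the binary after the last path element, so the command is `api`
# (requires "$(go env GOPATH)/bin" on your PATH)
go install github.com/vinpatel/manuscript/cmd/api@latest
api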

# Option 3: Build from Source
git clone https://github.com/vinpatel/manuscript.git
cd manuscript && make run

Then detect AI content:

curl -X POST http://localhost:8080/verify \
  -H "Content-Type: application/json" \
  -d '{"text": "Your content here"}'

Response:

{
  "id": "hm_abc123",
  "verdict": "human",
  "confidence": 0.87,
  "signals": {
    "sentence_variance": 0.42,
    "vocabulary_richness": 0.78,
    "contraction_ratio": 0.15
  }
}
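
The same call can be made from a script. The sketch below uses only Python's standard library and assumes the server from Quick Start is listening on `localhost:8080`. The helper names `build_verify_request`, `parse_verdict`, and `verify` are illustrative and not part of Manuscript.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080"  # assumption: default host and port from Quick Start


def build_verify_request(text: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build the POST /verify request shown in the curl example."""
    payload = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/verify",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_verdict(body: bytes) -> tuple[str, float]:
    """Extract (verdict, confidence) from a /verify response body."""
    result = json.loads(body)
    return result["verdict"], float(result["confidence"])


def verify(text: str) -> tuple[str, float]:
    """Send text to a running Manuscript server and return its verdict."""
    with urllib.request.urlopen(build_verify_request(text), timeout=10) as resp:
        return parse_verdict(resp.read())
```

With a server running, `verify("Your content here")` would return a pair such as `("human", 0.87)` for the sample response above.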

🎯 Use Cases

👨‍💻 For Developers

Integration	What It Does
Content Platforms	Filter AI spam from UGC
Hiring Tools	Verify candidate samples
EdTech	Academic integrity checks
Social Apps	Flag synthetic profiles
CMS Plugins	WordPress, Ghost, Strapi
CI/CD	Lint content like code
Extensions	Detect AI on any webpage
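
As one way to picture the CI/CD row, a pipeline step could fail the build when a saved `/verify` response is flagged as AI with high confidence. This is a minimal sketch: the function name `should_fail` and the `0.9` threshold are assumptions for illustration, and only the `verdict` and `confidence` fields from the Quick Start response are used.

```python
def should_fail(result: dict, threshold: float = 0.9) -> bool:
    """Return True when the verdict is "ai" at or above the confidence threshold.

    `threshold` is a hypothetical policy value, not a Manuscript default.
    """
    is_ai = result.get("verdict") == "ai"
    confident = float(result.get("confidence", 0.0)) >= threshold
    return is_ai and confident
```

A CI job could call this on each parsed response and exit non-zero when it returns `True`.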

🏢 For Organizations

Industry	Why Manuscript
Enterprise	Compliance (HIPAA, GDPR, SOC2)
Government	Air-gapped, classified envs
Legal	Protect client privilege
Healthcare	Patient data stays on-prem
Finance	Regulatory restrictions
Education	Scale without per-seat costs
Media	Own the tool, don't rent it

🏆 Why Manuscript?

Feature	Manuscript	GPTZero	Originality.ai	Turnitin
Self-hosted	✅	❌	❌	❌
Works Offline	✅	❌	❌	❌
Open Source	✅ MIT	❌	❌	❌
Zero Cost	✅ Forever	❌ $$$	❌ $$$	❌ $$$$
Privacy-First	✅ No data leaves	⚠️ Cloud-only	⚠️ Cloud-only	⚠️ Cloud-only
Multi-Modal	✅ Text/Image/Audio/Video	⚠️ Text only	⚠️ Text only	⚠️ Text only
API Limits	∞ Unlimited	⚠️ Tiered	⚠️ Per-check	⚠️ Per-student

🔬 How It Works

Manuscript uses statistical and forensic analysis rather than ML models, so it needs no GPU and returns results in milliseconds (see Performance).

📝 Text Detection Signals

Signal	Human Writing	AI Writing
Sentence length variance	High (varied rhythm)	Low (uniform)
Vocabulary richness	Diverse, personal words	"Safe" common words
Contractions	"don't", "I'm", "we'll"	"do not", "I am"
Punctuation variety	!?;:—...	Mostly periods
AI phrases	Rare	"As an AI...", "It's important to note..."
Hedging language	Natural uncertainty	Excessive qualifiers
Repetition patterns	Organic callbacks	Mechanical repetition
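
To make the first three signals concrete, here is a rough sketch of how they might be computed. The formulas are assumptions chosen for illustration (population variance of sentence lengths, type-token ratio, share of words ending in a contraction suffix) and are not Manuscript's actual implementation. In particular, the API responses report values between 0 and 1, and the normalization that would produce them is not specified here, so the variance below is returned raw.

```python
import re
import statistics

# A word, optionally followed by one apostrophe part ("don't", "I'm").
WORD = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)?")
# Assumption: these suffixes approximate English contractions. "'s" also
# matches possessives, and curly apostrophes are not handled.
CONTRACTION_SUFFIXES = ("n't", "'m", "'re", "'ve", "'ll", "'d", "'s")


def text_signals(text: str) -> dict:
    """Compute three illustrative text signals for a passage."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    lengths = [len(WORD.findall(s)) for s in sentences]
    words = [w.lower() for w in WORD.findall(text)]
    if not words:
        return {"sentence_variance": 0.0, "vocabulary_richness": 0.0, "contraction_ratio": 0.0}
    contractions = sum(1 for w in words if w.endswith(CONTRACTION_SUFFIXES))
    return {
        # Spread of sentence lengths, in words squared (not normalized).
        "sentence_variance": float(statistics.pvariance(lengths)),
        # Distinct words divided by total words (type-token ratio).
        "vocabulary_richness": len(set(words)) / len(words),
        # Share of words that look like contractions.
        "contraction_ratio": contractions / len(words),
    }
```

Text with uniform sentence lengths and no contractions scores zero on both the variance and contraction signals, which matches the "AI Writing" column in the table.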

🖼️ Image Detection Signals

Signal	Real Photo	AI-Generated
EXIF metadata	Present (camera, GPS, date)	Missing or fake
Camera make	Apple, Canon, Sony, etc.	None or generic
Sensor noise	Natural grain patterns	Too clean or uniform
Compression artifacts	JPEG-consistent	Inconsistent patterns
Color distribution	Natural histogram	Artificial smoothing
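
The EXIF row is the simplest of these to illustrate. The sketch below walks the segments of a JPEG file and reports whether an APP1 segment carrying an `Exif` header is present. It is an assumption about how such a check could look, not Manuscript's code, and it ignores edge cases such as fill bytes between segments. Note that a missing EXIF block is weak evidence on its own, since many tools strip metadata from real photos.

```python
import struct

SOI = b"\xff\xd8"          # JPEG start-of-image marker
EXIF_HEADER = b"Exif\x00\x00"


def has_exif_segment(data: bytes) -> bool:
    """Return True if the JPEG bytes contain an APP1 segment with an Exif header."""
    if data[:2] != SOI:
        return False
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            return False  # not at a marker, so stop rather than guess
        marker = data[pos + 1]
        if marker in (0xDA, 0xD9):  # start of scan or end of image: no more headers
            return False
        # Segment length is big-endian and counts its own two bytes.
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        if length < 2:
            return False
        if marker == 0xE1 and data[pos + 4:pos + 10] == EXIF_HEADER:
            return True
        pos += 2 + length
    return False
```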

🎵 Audio/Video Detection

Manuscript looks for format metadata, encoder signatures, and AI tool markers by inspecting:

File headers and container metadata
Encoding parameters and profiles
Generation tool fingerprints (e.g., ElevenLabs, Suno markers)
Temporal consistency patterns
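
As a rough picture of header inspection, the sketch below identifies a container from its leading magic bytes and then searches the raw bytes for tool names. The magic numbers are the standard ones for each format, but the marker strings in `TOOL_MARKERS` are placeholders: whether real ElevenLabs or Suno output contains these strings is an assumption, and none of this is taken from Manuscript's source.

```python
def sniff_container(header: bytes) -> str:
    """Guess the container format from the first bytes of a file."""
    if header[:4] == b"RIFF" and header[8:12] == b"WAVE":
        return "wav"
    if header[:4] == b"fLaC":
        return "flac"
    if header[:4] == b"OggS":
        return "ogg"
    if header[:3] == b"ID3":
        return "mp3"  # only MP3 files that start with an ID3v2 tag
    if header[4:8] == b"ftyp":
        return "mp4"
    return "unknown"


# Hypothetical marker strings, used here only to show the lookup.
TOOL_MARKERS = {b"elevenlabs": "ElevenLabs", b"suno": "Suno"}


def find_tool_markers(data: bytes) -> list[str]:
    """Return the names of tools whose marker string appears in the bytes."""
    lowered = data.lower()
    return sorted(name for needle, name in TOOL_MARKERS.items() if needle in lowered)
```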

📖 API Reference

Endpoints

Endpoint	Method	Description
`/verify`	POST	Analyze content for AI generation
`/verify/{id}`	GET	Retrieve analysis by ID
`/batch`	POST	Analyze multiple items
`/health`	GET	Health check
`/metrics`	GET	Prometheus metrics

Detailed Analysis

curl -X POST "http://localhost:8080/verify?detailed=true" \
  -H "Content-Type: application/json" \
  -d '{"text": "Your content here"}'

Response with full signal breakdown:

{
  "id": "hm_xyz789",
  "verdict": "ai",
  "confidence": 0.92,
  "content_type": "text",
  "signals": {
    "sentence_variance": 0.12,
    "vocabulary_richness": 0.34,
    "contraction_ratio": 0.02,
    "ai_phrases_detected": ["It's important to note", "Additionally"],
    "hedging_score": 0.78
  },
  "processing_time_ms": 23
}

Image Analysis

curl -X POST http://localhost:8080/verify \
  -F "image=@photo.jpg"
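
For callers that cannot shell out to curl, the upload can be built by hand. This is a sketch using only the standard library. It keeps the `image` form field from the curl example, while the function name `build_image_request` and the default base URL are assumptions.

```python
import mimetypes
import urllib.request
import uuid


def build_image_request(
    filename: str, image_bytes: bytes, base_url: str = "http://localhost:8080"
) -> urllib.request.Request:
    """Build a multipart/form-data POST to /verify with one file field named "image"."""
    boundary = uuid.uuid4().hex
    content_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        f'Content-Disposition: form-data; name="image"; filename="{filename}"\r\n'.encode(),
        f"Content-Type: {content_type}\r\n\r\n".encode(),
        image_bytes,
        f"\r\n--{boundary}--\r\n".encode(),  # closing boundary ends the form
    ])
    return urllib.request.Request(
        f"{base_url}/verify",
        data=body,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )
```

Passing the returned request to `urllib.request.urlopen` sends it to a running server.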

⚙️ Configuration

Variable	Default	Description
`PORT`	`8080`	Server port
`HOST`	`0.0.0.0`	Bind address
`ENV`	`development`	Environment mode
`LOG_LEVEL`	`info`	Logging verbosity
`LOG_FORMAT`	`json`	Log format (json/text)
`METRICS_ENABLED`	`true`	Enable Prometheus metrics
`CORS_ORIGINS`	`*`	Allowed CORS origins
`MAX_TEXT_LENGTH`	`100000`	Maximum text length in characters
`MAX_IMAGE_SIZE`	`10MB`	Maximum image upload size
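
The table can be read as "environment value if set, otherwise the default". The sketch below shows that resolution in Python for illustration only. The server's real parsing rules are not documented here, so treating `10MB` as binary megabytes and accepting only `true` as a truthy value are both assumptions.

```python
import re
from typing import Mapping

# Defaults copied from the configuration table.
DEFAULTS = {
    "PORT": "8080",
    "HOST": "0.0.0.0",
    "ENV": "development",
    "LOG_LEVEL": "info",
    "LOG_FORMAT": "json",
    "METRICS_ENABLED": "true",
    "CORS_ORIGINS": "*",
    "MAX_TEXT_LENGTH": "100000",
    "MAX_IMAGE_SIZE": "10MB",
}

# Assumption: units are powers of 1024.
_UNITS = {"B": 1, "KB": 1024, "MB": 1024**2, "GB": 1024**3}


def parse_size(value: str) -> int:
    """Convert a size such as "10MB" to a number of bytes."""
    match = re.fullmatch(r"\s*(\d+)\s*([KMG]?B)?\s*", value, re.IGNORECASE)
    if match is None:
        raise ValueError(f"unrecognized size: {value!r}")
    unit = (match.group(2) or "B").upper()
    return int(match.group(1)) * _UNITS[unit]


def load_config(env: Mapping[str, str]) -> dict:
    """Overlay env on DEFAULTS and convert the numeric and boolean settings."""
    raw = {key: env.get(key, default) for key, default in DEFAULTS.items()}
    return {
        **raw,
        "PORT": int(raw["PORT"]),
        "METRICS_ENABLED": raw["METRICS_ENABLED"].strip().lower() == "true",
        "MAX_TEXT_LENGTH": int(raw["MAX_TEXT_LENGTH"]),
        "MAX_IMAGE_SIZE": parse_size(raw["MAX_IMAGE_SIZE"]),
    }
```

Calling `load_config(os.environ)` after `import os` would resolve the settings for the current process.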

Docker Compose

version: '3.8'
services:
  manuscript:
    image: manuscript/manuscript:latest
    ports:
      - "8080:8080"
    environment:
      - LOG_LEVEL=info
      - METRICS_ENABLED=true
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
      interval: 30s
      timeout: 10s
      retries: 3

Kubernetes

apiVersion: apps/v1
kind: Deployment
metadata:
  name: manuscript
spec:
  replicas: 3
  selector:
    matchLabels:
      app: manuscript
  template:
    metadata:
      labels:
        app: manuscript  # must match spec.selector.matchLabels
    spec:
      containers:
      - name: manuscript
        image: manuscript/manuscript:latest
        ports:
        - containerPort: 8080
        resources:
          requests:
            cpu: "100m"
            memory: "128Mi"
          limits:
            cpu: "500m"
            memory: "512Mi"

📊 Performance

Benchmarked on AWS c5.xlarge (4 vCPU, 8GB RAM):

Metric	Manuscript	Alternative A	Alternative B
Requests/sec	12,400	N/A (cloud)	N/A (cloud)
Latency (p50)	8ms	180ms	240ms
Latency (p99)	23ms	890ms	1200ms
Memory usage	45MB	N/A	N/A
Cold start	150ms	N/A	N/A

🗺️ Roadmap

See our project board for the full roadmap.

🤝 Contributing

We love contributions! Here's how to get involved:

Star this repo ⭐ — It helps more than you think!
Report bugs — Open an issue
Suggest features — Start a discussion
Submit PRs — See CONTRIBUTING.md

Good First Issues

Looking to contribute? Check out issues labeled `good first issue`.

Help Us Improve Accuracy

Found a false positive or false negative? Submit a sample to help improve detection accuracy!

💬 Community

💬 GitHub Discussions — Ask questions, share ideas
🐛 Issue Tracker — Report bugs
🐦 Twitter/X — Updates and announcements
📧 Email — Business inquiries

📜 License

MIT License — use it however you want. See LICENSE for details.

⭐ Star History

If Manuscript helps you, please consider giving it a ⭐

It takes 2 seconds and helps the project reach more people who need privacy-first AI detection.

Made with ❤️ by Vin Patel and contributors
