Public-Key Cryptography at the Image Sensor Level + C2PA-aware AI Media Detection + ML-enhanced Detection
Develop a privacy-preserving verification platform that determines whether an image or video was:
- Captured by a real camera (via sensor-level cryptographic proof and/or valid C2PA credentials),
- Likely AI-generated (Nano Banana, Sora, DALL·E, etc.), or
- Provenance unknown.
The system must integrate cryptographic proofs and machine learning classifiers to ensure robustness against adversarial manipulation.
In-scope
- Verification via C2PA manifests.
- Verification via sensor-level PKI signatures.
- ML/AI-based detection pipeline for AI media classification.
- Privacy-first operation: ephemeral processing only, no permanent storage.
Out-of-scope
- Persistent content storage or user profiling.
Frontend (React)
- Simple drag-and-drop or URL input.
- Displays authenticity result and rationale.
- Option to show cryptographic and ML-based verification details.
Backend (Python/FastAPI)
- Orchestration of verification pipeline.
- Calls C2PA validator, Sensor-PKI verifier, and ML classifiers.
- Stateless; horizontally scalable.
Database (MongoDB)
- Stores job states, extracted metadata, and ML features temporarily.
- TTL expiration (default: 24h).
Deployment (Docker)
- Containers: frontend, API, workers, Mongo, Nginx.
- Portable and easily deployed to cloud or on-prem environments.
Sensor-PKI
- Cameras sign capture hashes with secure hardware keys.
- Public key infrastructure maintained by manufacturers.
- Signatures embedded as COSE/CBOR envelopes or as C2PA assertions.
C2PA
- Validate provenance claims, trust chains, edit history.
- Expose results to users.
When cryptographic proofs are absent or stripped, use ML classifiers:
- Convolutional Neural Networks (CNNs) for pixel-level artifact detection.
- Noise Residual & PRNU-based models for identifying real vs synthetic sources.
- Transformers (Vision + Multimodal) for AI-model artifact detection.
- Metadata anomaly detection using ML to classify inconsistencies (e.g., frame cadence, EXIF tags).
Classifier Arms Race Strategy
- Continuously retrain models against new AI generators.
- Maintain benchmark sets of AI-generated vs. real content.
- Ensemble classifiers (cryptographic + forensic + ML) for final decision.
- Intake: File upload or URL provided.
- Hashing & Pre-checks.
- C2PA Verification (if present).
- Sensor-PKI Verification (if present).
- ML Detection Pipeline (if C2PA/PKI missing or inconclusive).
- Extract noise features, encoder toolmarks, metadata.
- Run ensemble classifiers.
- Decision Logic:
- Authentic (Camera)
- Authentic (C2PA)
- Likely AI-generated
- Unknown
- Return Results to frontend.
- Purge Artifacts after TTL expiry.
- No raw content retention.
- Ephemeral job storage in MongoDB.
- Explicit “Delete Now” option for users.
- Transparency page: clearly explains privacy guarantees.
POST /verify— Upload file/URL. Returns{ job_id }.GET /verify/{job_id}— Returns status, authenticity label, cryptographic proofs, ML classifier results.DELETE /verify/{job_id}— Force immediate deletion.GET /health— Service health check.
- jobs:
{ id, hash, status, label, reasons[], createdAt }(TTL). - proofs:
{ job_id, c2pa, sensor_pki, ml_results }. - metrics: anonymized aggregate counts (e.g., % AI, % authentic).
- Docker Compose for development.
- Kubernetes-ready for production scaling.
- Secrets managed via Vault/KMS.
- TLS via Nginx reverse proxy.
- Precision/Recall on benchmark datasets.
- Coverage: % with valid proofs (C2PA/PKI).
- Detection performance: ML classifier F1-score > 90%.
- Latency: <2s per image, <4s per short video.
- User comprehension: clear differentiation between “Unknown” and “Likely AI”.
- Weeks 1–2: Requirements, trust anchor setup.
- Weeks 3–6: C2PA verification implementation.
- Weeks 5–8: Sensor-PKI signature verification.
- Weeks 7–10: ML classifier integration (CNN + PRNU + metadata).
- Weeks 9–12: Ensemble decision logic + testing.
- Weeks 12–14: Security hardening, deployment, documentation.
- Rapid evolution of AI image generators.
- Partial adoption of C2PA by platforms.
- Potential adversarial bypass of ML classifiers.
- Governance of manufacturer trust anchors.
- System Whitepaper (cryptography + ML).
- Privacy & Security Policy.
- API Reference.
- ML Training Dataset Protocols.
- Threat Model Analysis.
- Positive proofs (C2PA, sensor-PKI) → strong cryptographic guarantees.
- ML classifiers → resilience against provenance stripping and AI artifacts.
- Ensemble decision → layered assurance, user-facing clarity.