Multi-Modal Watermark Dataset: 1,633 Files
Download: dataset.tar.gz (1.3 GB)
Summary
| Modality | Model | Watermark | Reference | Diverse | Total |
|---|---|---|---|---|---|
| Image | gemini/nano-banana-pro-preview | SynthID | 390 | 100 | 490 |
| Image | gemini/gemini-3.1-flash-image-preview | SynthID | 489 | 100 | 589 |
| Image | openai/dall-e-3 | C2PA metadata | 180 | 100 | 280 |
| Audio | gemini/gemini-2.5-flash-preview-tts | SynthID | 187 | 30 | 217 |
| Video | gemini/veo-3.1-generate-preview | SynthID | 18 | 10 | 28 |
| Video | gemini/veo-3.0-fast-generate-001 | SynthID | 6 | - | 6 |
| Video | openai/sora-2 | C2PA metadata | 13 | 10 | 23 |
| Total | | | 1,283 | 350 | 1,633 |
Only models with confirmed watermarking were used. Models without confirmed watermarks (OpenAI TTS, gpt-image-1/1.5, FLUX, DashScope) were intentionally excluded.
Image — 1,359 files
Nano Banana Pro SynthID (490 files)
Nano Banana Pro (NBP) is the model the project specifically requests contributions for. Reference images were generated via image-to-image: a pure-color PNG is sent to NBP, which recreates it with SynthID embedded.
Reference (390): 6 colors × 7 resolutions × ~10 each.
| Color | Hex | Purpose |
|---|---|---|
| Black | #000000 | Watermark ≈ entire pixel content |
| White | #FFFFFF | Cross-validation via phase inversion |
| Red | #FF0000 | R-channel carrier isolation |
| Green | #00FF00 | G-channel carrier isolation (strongest per project findings) |
| Blue | #0000FF | B-channel carrier isolation |
| Gray | #808080 | Mid-tone carrier behavior |
Diverse (100): Text-to-image with varied prompts.
Gemini 3.1 Flash SynthID (589 files)
Same approach as NBP, but with a different Gemini model, which allows comparing SynthID patterns across model variants.
Reference (489): 6 colors × 7 resolutions × ~10 each.
7 distinct output resolutions (each has different SynthID carrier frequency positions):
| Output Resolution | Input Size | Count (NBP) | Count (Flash) |
|---|---|---|---|
| 1024×1024 | 1024×1024 | 53 | 67 |
| 1195×896 | 640×480 | 49 | 71 |
| 1264×843 | 768×512 | 56 | 72 |
| 1365×768 | 1920×1080 | 53 | 69 |
| 1440×720 | 2048×1024 | 56 | 63 |
| 720×1440 | 1024×2048 | 54 | 72 |
| 768×1365 | 1080×1920 | 53 | 71 |
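The relationship between input size and output resolution in this table can be checked with a few lines of arithmetic. The sketch below only restates the table's numbers; the `describe` helper is a hypothetical name, and the observation it supports (outputs keep roughly the input aspect ratio and land near one megapixel) is read off these seven rows, not a documented property of the models.

```python
# (input size, output size) pairs copied from the resolution table.
PAIRS = [
    ((1024, 1024), (1024, 1024)),
    ((640, 480), (1195, 896)),
    ((768, 512), (1264, 843)),
    ((1920, 1080), (1365, 768)),
    ((2048, 1024), (1440, 720)),
    ((1024, 2048), (720, 1440)),
    ((1080, 1920), (768, 1365)),
]


def describe(pairs):
    """Return aspect ratios and output pixel counts for each table row."""
    rows = []
    for (iw, ih), (ow, oh) in pairs:
        rows.append({
            "input": f"{iw}x{ih}",
            "output": f"{ow}x{oh}",
            "aspect_in": iw / ih,
            "aspect_out": ow / oh,
            "megapixels_out": ow * oh / 1e6,
        })
    return rows


if __name__ == "__main__":
    for row in describe(PAIRS):
        print(f'{row["input"]:>10} -> {row["output"]:>10}  '
              f'aspect {row["aspect_in"]:.3f} -> {row["aspect_out"]:.3f}  '
              f'{row["megapixels_out"]:.2f} MP')
```

Because each output resolution is said to have its own carrier frequency positions, keeping files grouped by output size rather than by input size is probably the safer default for any spectral analysis.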
Diverse (100): Text-to-image with varied prompts. Primarily 1408×768.
DALL-E 3 C2PA Baseline (280 files)
Control group: C2PA metadata watermark (trivially removable via `exiftool -all= image.png`).
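To check whether a PNG still carries an embedded C2PA manifest, one option is to list its chunk types. This is a minimal sketch under one assumption: that the manifest is stored in a `caBX` chunk, which is how the C2PA specification describes PNG embedding. It does not validate the manifest or the chunk CRCs, and the function names are illustrative.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_chunk_types(data: bytes) -> list[str]:
    """Return the chunk type of every chunk in a PNG byte string, in order."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    types = []
    pos = len(PNG_SIGNATURE)
    while pos + 8 <= len(data):
        length, tag = struct.unpack(">I4s", data[pos:pos + 8])
        types.append(tag.decode("latin-1"))
        pos += 12 + length  # 4 length + 4 type + payload + 4 CRC
    return types


def has_c2pa_chunk(data: bytes) -> bool:
    """True if the PNG contains a caBX chunk (assumed C2PA manifest store)."""
    return "caBX" in png_chunk_types(data)
```

Running `has_c2pa_chunk(open("image.png", "rb").read())` before and after stripping metadata gives a quick way to confirm whether the manifest was actually removed from a given file.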
Reference (180): 6 colors × 3 sizes (1024×1024, 1024×1792, 1792×1024) × 10 each.
Diverse (100): Mixed sizes.
Audio — 217 files
Gemini TTS SynthID
The only confirmed audio watermarker. Same principle as black images: minimal utterances maximize watermark-to-content ratio.
Reference (187): 8 voices × 10 texts × 3 repeated generations (for differential extraction).
- Voices: Kore, Puck, Charon, Fenrir, Aoede, Leda, Orus, Zephyr
- Texts: "A.", "One.", "Ah.", "Hmm.", "Okay.", "Yes.", "No.", "Hello.", "The.", "Aaaaaaaah."
- Format: WAV, 24 kHz, 16-bit mono, 0.5–2.9s
Diverse (30): Natural sentences, 4.5–7.6s.
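The stated audio format (WAV, 24 kHz, 16-bit mono) and the duration ranges can be verified per file with Python's standard `wave` module. The helper names below are hypothetical; this is a sketch of a sanity check, not part of the dataset tooling.

```python
import wave


def wav_info(path):
    """Read basic format properties from a PCM WAV file."""
    with wave.open(path, "rb") as w:
        rate = w.getframerate()
        return {
            "channels": w.getnchannels(),
            "sample_width_bits": 8 * w.getsampwidth(),
            "sample_rate_hz": rate,
            "duration_s": w.getnframes() / rate,
        }


def matches_reference_format(info):
    """True if the file is 24 kHz, 16-bit, mono as stated above."""
    return (info["channels"] == 1
            and info["sample_width_bits"] == 16
            and info["sample_rate_hz"] == 24000)
```

A file from `audio/gemini-tts/reference/` would be expected to pass `matches_reference_format` and report a `duration_s` between roughly 0.5 and 2.9.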
Video — 57 files
Veo 3.1 SynthID (28 files)
Reference (18): Solid color screens × 3 each, 4s, 1280×720.
Diverse (10): Natural scenes, 8s.
Veo 3.0 Fast SynthID (6 files)
Reference (6): Black/white, 4s.
Sora 2 C2PA Baseline (23 files)
Reference (13): Solid colors (some blocked by moderation). Diverse (10): 8s.
Directory Structure
dataset/
├── image/
│   ├── nano-banana-pro/
│   │   ├── reference/    # 390 PNGs: {color}_{WxH}_{idx}.png
│   │   └── diverse/      # 100 PNGs: diverse_{idx}.png
│   ├── gemini/
│   │   ├── reference/    # 489 PNGs: {color}_{WxH}_{idx}.png
│   │   └── diverse/      # 100 PNGs: diverse_{idx}.png
│   └── dall-e-3/
│       ├── reference/    # 180 PNGs: {color}_{size}_{idx}.png
│       └── diverse/      # 100 PNGs: diverse_{size}_{idx}.png
├── audio/
│   └── gemini-tts/
│       ├── reference/    # 187 WAVs: ref{NN}_{voice}_{idx}.wav
│       └── diverse/      # 30 WAVs: diverse_{voice}_{idx}.wav
└── video/
    ├── veo-3.1/
    │   ├── reference/    # 18 MP4s: {color}_{idx}.mp4
    │   └── diverse/      # 10 MP4s: diverse_{idx}.mp4
    ├── veo-3.0-fast/
    │   └── reference/    # 6 MP4s
    └── sora-2/
        ├── reference/    # 13 MP4s
        └── diverse/      # 10 MP4s
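After extracting the archive, the per-directory counts listed in the tree can be checked against what is on disk. The sketch below assumes the archive unpacks to a `dataset/` root with exactly the layout shown; `EXPECTED` restates the counts from the tree, and the function names are illustrative.

```python
from pathlib import Path

# Expected file counts per directory, copied from the tree.
EXPECTED = {
    "image/nano-banana-pro/reference": 390,
    "image/nano-banana-pro/diverse": 100,
    "image/gemini/reference": 489,
    "image/gemini/diverse": 100,
    "image/dall-e-3/reference": 180,
    "image/dall-e-3/diverse": 100,
    "audio/gemini-tts/reference": 187,
    "audio/gemini-tts/diverse": 30,
    "video/veo-3.1/reference": 18,
    "video/veo-3.1/diverse": 10,
    "video/veo-3.0-fast/reference": 6,
    "video/sora-2/reference": 13,
    "video/sora-2/diverse": 10,
}


def count_files(root):
    """Count regular files in each expected directory (0 if it is missing)."""
    root = Path(root)
    counts = {}
    for rel in EXPECTED:
        folder = root / rel
        if folder.is_dir():
            counts[rel] = sum(1 for p in folder.iterdir() if p.is_file())
        else:
            counts[rel] = 0
    return counts


def mismatches(root):
    """Return {directory: (expected, found)} for every count that differs."""
    counts = count_files(root)
    return {rel: (EXPECTED[rel], found)
            for rel, found in counts.items() if found != EXPECTED[rel]}


if __name__ == "__main__":
    for rel, (want, got) in mismatches("dataset").items():
        print(f"{rel}: expected {want}, found {got}")
```

An empty result from `mismatches("dataset")` indicates that all 1,633 files landed in the expected directories.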
Watermark Evidence
Value
- Nano Banana Pro data — the specific model the project requests for contributions
- Two Gemini model variants (NBP + Flash) — compare SynthID patterns across model versions
- 7 new resolution profiles — existing dataset only has 1024×1024 and 2816×1536
- First audio watermark dataset — enables SynthID audio detection/removal research
- First video watermark dataset — enables SynthID multi-frame spectral analysis
- Per-channel R/G/B reference — direct per-channel carrier isolation
- C2PA baseline — control group for invisible vs. metadata watermarking
- Differential audio pairs — same text × voice × 3 gens for cross-correlation extraction
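As a starting point for working with the differential audio pairs, the reference WAVs can be grouped by text and voice so that repeated generations are compared against each other. This sketch assumes that in `ref{NN}_{voice}_{idx}.wav` the `NN` field identifies the text and `idx` the repeat; that reading of the naming scheme is an assumption, and the cross-correlation step itself is left out.

```python
import re
from collections import defaultdict
from itertools import combinations

# Assumed meaning of the fields: NN = text id, voice = voice name, idx = repeat.
REF_NAME = re.compile(r"^ref(?P<text>\d+)_(?P<voice>[A-Za-z]+)_(?P<idx>\d+)\.wav$")


def group_reference_takes(filenames):
    """Group reference file names by (text id, voice); other names are skipped."""
    groups = defaultdict(list)
    for name in filenames:
        m = REF_NAME.match(name)
        if m:
            groups[(m["text"], m["voice"])].append(name)
    return {key: sorted(names) for key, names in groups.items()}


def differential_pairs(groups):
    """List every pair of takes that share the same text and voice."""
    pairs = []
    for key in sorted(groups):
        pairs.extend(combinations(groups[key], 2))
    return pairs
```

With three takes per text and voice, each group contributes three pairs; groups with fewer takes contribute fewer pairs or none.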