Multi-modal watermark dataset contribution: 1,633 files across image/audio/video (SynthID + C2PA) #19

## Multi-Modal Watermark Dataset: 1,633 Files

**Download**: [dataset.tar.gz (1.3 GB)](https://github.com/biangacila/reverse-SynthID/releases/download/dataset-v1/dataset.tar.gz)

---

### Summary

| Modality | Model | Watermark | Reference | Diverse | Total |
|----------|-------|-----------|-----------|---------|-------|
| Image | `gemini/nano-banana-pro-preview` | **SynthID** | 390 | 100 | **490** |
| Image | `gemini/gemini-3.1-flash-image-preview` | **SynthID** | 489 | 100 | **589** |
| Image | `openai/dall-e-3` | C2PA metadata | 180 | 100 | **280** |
| Audio | `gemini/gemini-2.5-flash-preview-tts` | **SynthID** | 187 | 30 | **217** |
| Video | `gemini/veo-3.1-generate-preview` | **SynthID** | 18 | 10 | **28** |
| Video | `gemini/veo-3.0-fast-generate-001` | **SynthID** | 6 | — | **6** |
| Video | `openai/sora-2` | C2PA metadata | 13 | 10 | **23** |
| | | | **1,283** | **350** | **1,633** |

Only models with **confirmed watermarking** were used. Models without confirmed watermarks (OpenAI TTS, gpt-image-1/1.5, FLUX, DashScope) were intentionally excluded.
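As a quick sanity check on the table above, the per-model counts can be summed directly. This is only an arithmetic sketch; the dictionary below is copied from the summary table and `totals` is a hypothetical helper name.

```python
# Per-model (reference, diverse) counts copied from the summary table.
COUNTS = {
    "gemini/nano-banana-pro-preview": (390, 100),
    "gemini/gemini-3.1-flash-image-preview": (489, 100),
    "openai/dall-e-3": (180, 100),
    "gemini/gemini-2.5-flash-preview-tts": (187, 30),
    "gemini/veo-3.1-generate-preview": (18, 10),
    "gemini/veo-3.0-fast-generate-001": (6, 0),  # no diverse set
    "openai/sora-2": (13, 10),
}


def totals(counts):
    """Return (reference_total, diverse_total, grand_total)."""
    reference = sum(ref for ref, _ in counts.values())
    diverse = sum(div for _, div in counts.values())
    return reference, diverse, reference + diverse


print(totals(COUNTS))  # (1283, 350, 1633)
```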

---

### Image — 1,359 files

#### Nano Banana Pro SynthID (490 files)

This is the model the project specifically requests contributions for. Images were generated via image-to-image: a pure-color PNG is sent as input and NBP recreates it with SynthID embedded.

**Reference (390):** 6 colors × 7 resolutions × ~10 each.

| Color | Hex (RGB) | Purpose |
|-------|-----|---------|
| Black | `#000000` | Watermark ≈ entire pixel content |
| White | `#FFFFFF` | Cross-validation via phase inversion |
| Red | `#FF0000` | R-channel carrier isolation |
| Green | `#00FF00` | G-channel carrier isolation (strongest per project findings) |
| Blue | `#0000FF` | B-channel carrier isolation |
| Gray | `#808080` | Mid-tone carrier behavior |
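One way the solid-color references could be used, sketched under assumptions: average many images of the same nominal color, subtract that color, and compare the residual energy per channel. The function names are hypothetical, and the demo uses a synthetic stripe pattern as a stand-in; it is not real SynthID data, and the model's output is not guaranteed to match the nominal color exactly.

```python
import numpy as np


def mean_residual(images, nominal_rgb):
    """Average same-color images and subtract the nominal color.

    images: array-like of shape (N, H, W, 3).
    nominal_rgb: the solid color sent as input, e.g. (0, 0, 0).
    Structure shared by all images survives the averaging, while
    independent per-image noise shrinks roughly as 1/sqrt(N).
    """
    stack = np.asarray(images, dtype=np.float64)
    return stack.mean(axis=0) - np.asarray(nominal_rgb, dtype=np.float64)


def channel_energy(residual):
    """Root-mean-square of the residual in each of the R, G, B channels."""
    return np.sqrt((residual ** 2).mean(axis=(0, 1)))


# Synthetic demo (assumption: a fixed pattern in G plus random noise).
rng = np.random.default_rng(0)
pattern = np.zeros((8, 8, 3))
pattern[::2, :, 1] = 2.0  # stripes in the green channel only
demo_images = pattern + rng.normal(0.0, 0.5, size=(50, 8, 8, 3))

demo_residual = mean_residual(demo_images, (0, 0, 0))
demo_energy = channel_energy(demo_residual)
print(demo_energy.argmax())  # index of the strongest channel
```

With real files, `images` would be loaded from `reference/` for a single color and resolution, since mixing resolutions would blur any position-dependent structure.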

**Diverse (100):** Text-to-image with varied prompts.

#### Gemini 3.1 Flash SynthID (589 files)

Same approach as NBP but from a different Gemini model — allows comparing SynthID patterns across model variants.

**Reference (489):** 6 colors × 7 resolutions × ~10 each.

**7 distinct output resolutions** (each has different SynthID carrier frequency positions):

| Output Resolution | Input Size | Count (NBP) | Count (Flash) |
|-------------------|-----------|-------------|---------------|
| 1024×1024 | 1024×1024 | 53 | 67 |
| 1195×896 | 640×480 | 49 | 71 |
| 1264×843 | 768×512 | 56 | 72 |
| 1365×768 | 1920×1080 | 53 | 69 |
| 1440×720 | 2048×1024 | 56 | 63 |
| 720×1440 | 1024×2048 | 54 | 72 |
| 768×1365 | 1080×1920 | 53 | 71 |
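To compare carrier positions across these resolutions, one possible starting point is to look for the strongest bins in the 2-D FFT of an averaged residual. The sketch below is an assumption about how such an analysis might look, not the project's method; `top_spectral_peaks` is a hypothetical name and the demo input is a synthetic cosine.

```python
import numpy as np


def top_spectral_peaks(residual, k=5):
    """Return the k strongest non-DC frequency bins of a 2-D array.

    residual: (H, W) float array, e.g. one channel of an averaged image.
    Returns (fy, fx) pairs in cycles per image (signed), strongest first.
    """
    h, w = residual.shape
    spectrum = np.abs(np.fft.fft2(residual - residual.mean()))
    spectrum[0, 0] = 0.0  # ignore any remaining DC component
    order = np.argsort(spectrum, axis=None)[::-1][:k]
    ys, xs = np.unravel_index(order, spectrum.shape)
    # Bin indices above Nyquist correspond to negative frequencies.
    fy = np.where(ys > h // 2, ys - h, ys)
    fx = np.where(xs > w // 2, xs - w, xs)
    return [(int(a), int(b)) for a, b in zip(fy, fx)]


# Synthetic demo: a single plane wave at (5, 12) cycles per image.
H, W = 64, 96
yy, xx = np.mgrid[0:H, 0:W]
demo_wave = np.cos(2 * np.pi * (5 * yy / H + 12 * xx / W))
demo_peaks = top_spectral_peaks(demo_wave, k=2)
```

A real-valued input always produces each peak together with its mirror at the negated frequency, so peaks come in pairs.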

**Diverse (100):** Text-to-image with varied prompts. Primarily 1408×768.

#### DALL-E 3 C2PA Baseline (280 files)

Control group: DALL-E 3 marks its output with C2PA metadata rather than an invisible watermark, and that metadata is trivially removable (e.g. `exiftool -all= image.png`).
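To check whether a PNG still carries metadata before or after stripping, the chunk list can be inspected with the standard library alone. This is a minimal sketch: PNG marks ancillary (non-essential) chunks with a lowercase first letter, and XMP is normally stored in an `iTXt` chunk. That C2PA manifests sit in a `caBX` chunk is an assumption here and should be checked against the C2PA specification. The helper names and the demo bytes are hypothetical.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_chunk_types(data):
    """Return the chunk type names of a PNG byte string, in file order."""
    if data[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    types, pos = [], 8
    while pos + 8 <= len(data):
        length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
        types.append(ctype.decode("ascii"))
        pos += 12 + length  # 4 length + 4 type + payload + 4 CRC
        if ctype == b"IEND":
            break
    return types


def ancillary_chunks(types):
    """Chunks whose first letter is lowercase are ancillary per the PNG spec."""
    return [t for t in types if t[0].islower()]


def _chunk(ctype, payload=b""):
    """Build one PNG chunk (used only for the synthetic demo below)."""
    crc = zlib.crc32(ctype + payload)
    return struct.pack(">I", len(payload)) + ctype + payload + struct.pack(">I", crc)


# Synthetic demo: chunk layout only, not a decodable image.
demo_png = (
    PNG_SIGNATURE
    + _chunk(b"IHDR", b"\x00" * 13)
    + _chunk(b"caBX", b"placeholder")  # assumed C2PA chunk type
    + _chunk(b"IDAT", b"")
    + _chunk(b"IEND")
)
demo_types = png_chunk_types(demo_png)
print(demo_types, ancillary_chunks(demo_types))
```

For a real file, pass `Path("image.png").read_bytes()` and compare the ancillary chunk list before and after running the stripping command.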

**Reference (180):** 6 colors × 3 sizes (1024×1024, 1024×1792, 1792×1024) × 10 each.

**Diverse (100):** Mixed sizes.

---

### Audio — 217 files

#### Gemini TTS SynthID

Gemini TTS is the only audio model here with confirmed watermarking. The same principle applies as with the black images: **minimal utterances maximize the watermark-to-content ratio**.

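**Reference (187):** 8 voices × 10 texts × up to 3 repeated generations (for differential extraction). The full grid would be 240 files, so some voice/text combinations have fewer than 3 generations.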

- Voices: Kore, Puck, Charon, Fenrir, Aoede, Leda, Orus, Zephyr
- Texts: "A.", "One.", "Ah.", "Hmm.", "Okay.", "Yes.", "No.", "Hello.", "The.", "Aaaaaaaah."
- Format: WAV, 24 kHz, 16-bit mono, 0.5–2.9s
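A plausible first step for comparing repeated generations is to time-align two clips by cross-correlation before differencing or averaging them. The sketch below assumes 16-bit mono WAV input as listed above. `read_wav_mono16` and `best_lag` are hypothetical helper names, and whether any watermark component is actually shared between generations is not established here.

```python
import wave

import numpy as np


def read_wav_mono16(path):
    """Load a 16-bit mono WAV file as a float64 array of samples."""
    with wave.open(str(path), "rb") as wav:
        frames = wav.readframes(wav.getnframes())
    return np.frombuffer(frames, dtype="<i2").astype(np.float64)


def best_lag(a, b):
    """Lag in samples at which b best matches a (FFT cross-correlation).

    A positive result means b is delayed relative to a.
    """
    n = len(a) + len(b) - 1
    size = 1 << (n - 1).bit_length()  # next power of two, avoids wrap-around
    fa = np.fft.rfft(a, size)
    fb = np.fft.rfft(b, size)
    xcorr = np.fft.irfft(fb * np.conj(fa), size)
    idx = int(np.argmax(xcorr))
    # Indices 0..len(b)-1 are non-negative lags; the rest wrap to negative.
    return idx if idx < len(b) else idx - size


# Synthetic demo: b is a copy of a delayed by 37 samples.
rng = np.random.default_rng(1)
demo_a = rng.normal(size=1000)
demo_b = np.concatenate([np.zeros(37), demo_a])[:1000]
print(best_lag(demo_a, demo_b))  # 37
```

After alignment, the overlapping segments can be subtracted or averaged. Independent TTS generations of the same text may differ in timing throughout the clip, so a single global lag is only a rough alignment.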

**Diverse (30):** Natural sentences, 4.5–7.6s.

---

### Video — 57 files

#### Veo 3.1 SynthID (28 files)

**Reference (18):** 6 solid-color screens × 3 each, 4s, 1280×720.

**Diverse (10):** Natural scenes, 8s.

#### Veo 3.0 Fast SynthID (6 files)

**Reference (6):** Black/white, 4s.

#### Sora 2 C2PA Baseline (23 files)

**Reference (13):** Solid colors (some blocked by moderation).

**Diverse (10):** 8s.

---

### Directory Structure

```
dataset/
├── image/
│   ├── nano-banana-pro/
│   │   ├── reference/    # 390 PNGs — {color}_{WxH}_{idx}.png
│   │   └── diverse/      # 100 PNGs — diverse_{idx}.png
│   ├── gemini/
│   │   ├── reference/    # 489 PNGs — {color}_{WxH}_{idx}.png
│   │   └── diverse/      # 100 PNGs — diverse_{idx}.png
│   └── dall-e-3/
│       ├── reference/    # 180 PNGs — {color}_{size}_{idx}.png
│       └── diverse/      # 100 PNGs — diverse_{size}_{idx}.png
├── audio/
│   └── gemini-tts/
│       ├── reference/    # 187 WAVs — ref{NN}_{voice}_{idx}.wav
│       └── diverse/      #  30 WAVs — diverse_{voice}_{idx}.wav
└── video/
    ├── veo-3.1/
    │   ├── reference/    #  18 MP4s — {color}_{idx}.mp4
    │   └── diverse/      #  10 MP4s — diverse_{idx}.mp4
    ├── veo-3.0-fast/
    │   └── reference/    #   6 MP4s
    └── sora-2/
        ├── reference/    #  13 MP4s
        └── diverse/      #  10 MP4s
```
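The naming scheme makes it straightforward to group reference images by color and resolution. The following sketch parses `{color}_{WxH}_{idx}.png` names and counts files per resolution. The exact spelling of the color field and the ASCII `x` separator are assumptions, as is whether `WxH` in the filename refers to the input or the output size. The function names are hypothetical.

```python
import re
from collections import Counter
from pathlib import Path

# Assumed pattern for names like "black_1024x1024_003.png".
REFERENCE_NAME = re.compile(
    r"^(?P<color>[A-Za-z]+)_(?P<width>\d+)x(?P<height>\d+)_(?P<idx>\d+)\.png$"
)


def parse_reference_name(name):
    """Return (color, width, height, idx), or None if the name does not match."""
    match = REFERENCE_NAME.match(name)
    if match is None:
        return None
    return (
        match["color"],
        int(match["width"]),
        int(match["height"]),
        int(match["idx"]),
    )


def count_by_resolution(directory):
    """Count reference PNGs in a directory, keyed by (width, height)."""
    counts = Counter()
    for path in Path(directory).glob("*.png"):
        parsed = parse_reference_name(path.name)
        if parsed is not None:
            _, width, height, _ = parsed
            counts[(width, height)] += 1
    return counts
```

For example, `count_by_resolution("dataset/image/nano-banana-pro/reference")` would give per-resolution counts that can be compared with the resolution table.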

### Watermark Evidence

| Model | System | Evidence |
|-------|--------|----------|
| Nano Banana Pro / Gemini Flash / Veo / Gemini TTS | SynthID (invisible, frequency-domain) | [deepmind.google/technologies/synthid](https://deepmind.google/technologies/synthid/) |
| DALL-E 3 / Sora 2 | C2PA metadata (XMP, easily stripped) | [c2pa.org](https://c2pa.org) |

### Value

1. **Nano Banana Pro data** — the specific model the project requests for contributions
2. **Two Gemini model variants** (NBP + Flash) — compare SynthID patterns across model versions
3. **7 new resolution profiles** — existing dataset only has 1024×1024 and 2816×1536
4. **First audio watermark dataset** — enables SynthID audio detection/removal research
5. **First video watermark dataset** — enables SynthID multi-frame spectral analysis
6. **Per-channel R/G/B reference** — direct per-channel carrier isolation
7. **C2PA baseline** — control group for invisible vs. metadata watermarking
8. **Differential audio pairs** — same text × voice × 3 gens for cross-correlation extraction
