Significant discrepancy in gFID between on-the-fly evaluation and offline sampling (SigLIP 2 Stage 2)

# [Bug/Question] Significant discrepancy in gFID between on-the-fly evaluation and offline sampling (SigLIP 2 Stage 2)

### Description
Hi, I am reproducing **SigLIP 2 Stage 2** and see a consistent discrepancy in **gFID** between the on-the-fly evaluation (run during training) and the offline evaluation (running `sample_ddp.py` on the saved checkpoint).

At **Epoch 30**, the two evaluation methods differ by approximately **4.0 gFID**. Is this known behavior, or does it point to an inconsistency in my environment or configuration?

---

### Comparison of Configurations
My training and sampling configs differ in one `transport` parameter, `time_dist_type`; every other setting in the table below matches:

| Component | Training (On-the-fly) | Offline (`sample_ddp.py`) |
| :--- | :--- | :--- |
| **Model** | DiTwDDTHead | DiTwDDTHead (from `ep-0000030.pt`) |
| **Transport Type** | `Linear` / `velocity` | `Linear` / `velocity` |
| **Time Dist Type** | **`logit-normal_0_1`** | **`uniform`** |
| **Sampler** | ODE (Euler, 50 steps) | ODE (Euler, 50 steps) |
| **CFG Scale** | 1.0 | 1.0 |
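
To double-check that `time_dist_type` is the only differing setting, the two configs can be compared mechanically. The following is a minimal sketch, not code from the repository: `flatten` and `diff_configs` are hypothetical helpers, and the dicts are hand-copied from the `transport`, `sampler`, and `guidance` sections of the two configs in this issue (the `stage_2` params are identical and omitted).

```python
def flatten(cfg, prefix=""):
    """Flatten a nested dict into {"a.b.c": value} form."""
    flat = {}
    for key, value in cfg.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def diff_configs(a, b):
    """Return {path: (value_in_a, value_in_b)} for every key whose value differs.

    A key that is missing on one side is treated as None, so an explicit
    `loss_weight: null` and an absent `loss_weight` compare as equal.
    """
    fa, fb = flatten(a), flatten(b)
    return {
        key: (fa.get(key), fb.get(key))
        for key in sorted(set(fa) | set(fb))
        if fa.get(key) != fb.get(key)
    }


# Values copied by hand from the training config (on-the-fly eval).
train_cfg = {
    "transport": {"params": {"path_type": "Linear", "prediction": "velocity",
                             "loss_weight": None,
                             "time_dist_type": "logit-normal_0_1"}},
    "sampler": {"mode": "ODE",
                "params": {"sampling_method": "euler", "num_steps": 50,
                           "atol": 1.0e-06, "rtol": 0.001, "reverse": False}},
    "guidance": {"method": "cfg", "scale": 1.0},
}

# Values copied by hand from the offline eval config.
eval_cfg = {
    "transport": {"params": {"path_type": "Linear", "prediction": "velocity",
                             "time_dist_type": "uniform"}},
    "sampler": {"mode": "ODE",
                "params": {"sampling_method": "euler", "num_steps": 50,
                           "atol": 1e-6, "rtol": 1e-3, "reverse": False}},
    "guidance": {"method": "cfg", "scale": 1.0},
}

diff = diff_configs(train_cfg, eval_cfg)
for path, (train_value, eval_value) in diff.items():
    print(f"{path}: train={train_value!r} eval={eval_value!r}")
```

The values above are entered as Python floats. When loading the real YAML files, it may be worth checking the parsed types too, since some loaders (PyYAML, for example) read `1e-6` without a decimal point as a string rather than a float.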

---

### Detailed Config Snippets

#### 1. Training Config (On-the-fly Eval)
```yaml
stage_2:
  target: stage2.models.DDT.DiTwDDTHead
  params:
    input_size: 16
    patch_size: 1
    in_channels: 768
    hidden_size: [1152, 2048]
    depth: [28, 2]
    num_heads: [16, 16]
    mlp_ratio: 4.0
    class_dropout_prob: 0.1
    num_classes: 1000
    use_qknorm: false
    use_swiglu: true
    use_rope: true
    use_rmsnorm: true
    wo_shift: false
    use_pos_embed: true

transport:
  params:
    path_type: Linear
    prediction: velocity
    loss_weight: null
    time_dist_type: logit-normal_0_1 # <--- NOTE: logit-normal

sampler:
  mode: ODE
  params:
    sampling_method: euler
    num_steps: 50
    atol: 1.0e-06
    rtol: 0.001
    reverse: false

guidance:
  method: cfg
  scale: 1.0
```

#### 2. Eval Config (Offline, `sample_ddp.py`)
```yaml
stage_2:
  target: stage2.models.DDT.DiTwDDTHead
  params:
    input_size: 16
    patch_size: 1
    in_channels: 768
    hidden_size: [1152, 2048]
    depth: [28, 2]
    num_heads: [16, 16]
    mlp_ratio: 4.0
    class_dropout_prob: 0.1
    num_classes: 1000
    use_qknorm: False
    use_swiglu: True
    use_rope: True
    use_rmsnorm: True
    wo_shift: False
    use_pos_embed: True
  ckpt: 'ckpts/stage-dit/siglip-B-rae-fixbug/checkpoints/ep-0000030.pt'

transport:
  params:
    path_type: 'Linear'
    prediction: 'velocity'
    time_dist_type: 'uniform' # <--- NOTE: uniform

sampler:
  mode: ODE
  params:
    sampling_method: 'euler'
    num_steps: 50
    atol: 1e-6
    rtol: 1e-3
    reverse: False

guidance:
  method: 'cfg'
  scale: 1.0
```
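
For reference, the two `time_dist_type` values can be contrasted with a small sketch. It assumes that `logit-normal_0_1` means t = sigmoid(z) with z ~ N(0, 1) and that `uniform` means t ~ U(0, 1); the function names are hypothetical and not taken from the repository. I have not verified whether the sampler in `sample_ddp.py` reads this setting at all.

```python
import math
import random


def sample_t_uniform(rng):
    """Draw a timestep t ~ U(0, 1)."""
    return rng.random()


def sample_t_logit_normal(rng, mean=0.0, std=1.0):
    """Draw t = sigmoid(z) with z ~ N(mean, std^2) (assumed meaning of logit-normal_0_1)."""
    z = rng.gauss(mean, std)
    return 1.0 / (1.0 + math.exp(-z))


def fraction_in_middle(sampler, n=100_000, seed=0):
    """Estimate the probability that a sampled t falls in [0.25, 0.75]."""
    rng = random.Random(seed)
    hits = sum(0.25 <= sampler(rng) <= 0.75 for _ in range(n))
    return hits / n


uniform_mid = fraction_in_middle(sample_t_uniform)
logit_normal_mid = fraction_in_middle(sample_t_logit_normal)

# Uniform puts about half of its mass in [0.25, 0.75];
# logit-normal(0, 1) puts roughly 73% there.
print(f"uniform:      {uniform_mid:.3f}")
print(f"logit-normal: {logit_normal_mid:.3f}")
```

Under these assumptions, logit-normal sampling concentrates timesteps around t = 0.5, while uniform sampling spreads them evenly. If the Euler sampler only integrates over its own fixed 50-step grid, this setting would not be expected to affect sampling, which is part of what I would like to confirm.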
