[Bug/Question] Significant discrepancy in gFID between on-the-fly evaluation and offline sampling (SigLIP 2 Stage 2)
Description
Hi, I am currently reproducing SigLIP 2 Stage 2. I've encountered a consistent discrepancy in gFID scores between the on-the-fly evaluation (performed during training) and the offline evaluation (using sample_ddp.py with the saved checkpoint).
At Epoch 30, I observed a gap of approximately 4.0 gFID between the two evaluation methods. Is this a known behavior, or does it suggest an inconsistency in my environment or configuration?
Comparison of Configurations
I noticed that my training and sampling configs differ in the transport parameters, specifically `time_dist_type`:
| Component | Training (On-the-fly) | Offline (`sample_ddp.py`) |
| --- | --- | --- |
| Model | DiTwDDTHead | DiTwDDTHead (from `ep-0000030.pt`) |
| Transport Type | `Linear` / `velocity` | `Linear` / `velocity` |
| Time Dist Type | `logit-normal_0_1` | `uniform` |
| Sampler | ODE (Euler, 50 steps) | ODE (Euler, 50 steps) |
| CFG Scale | 1.0 | 1.0 |
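For reference, here is a minimal sketch of how I understand the two `time_dist_type` settings. It assumes `logit-normal_0_1` means a logit-normal with location 0 and scale 1, which is my reading of the name and not something I have confirmed against the transport code. The function names are hypothetical and not taken from the repo.

```python
import math
import random


def sample_uniform(rng):
    """Draw a timestep t uniformly from [0, 1)."""
    return rng.random()


def sample_logit_normal(rng, mu=0.0, sigma=1.0):
    """Draw t = sigmoid(n) with n ~ N(mu, sigma^2).

    Assumption: mu=0, sigma=1 is what 'logit-normal_0_1' encodes.
    """
    return 1.0 / (1.0 + math.exp(-rng.gauss(mu, sigma)))


def middle_fraction(samples, lo=0.25, hi=0.75):
    """Fraction of samples that fall in the middle of the time interval."""
    return sum(lo <= t <= hi for t in samples) / len(samples)


rng = random.Random(0)
uniform_t = [sample_uniform(rng) for _ in range(20000)]
logit_normal_t = [sample_logit_normal(rng) for _ in range(20000)]

print("uniform, share in [0.25, 0.75]:     ", middle_fraction(uniform_t))
print("logit-normal, share in [0.25, 0.75]:", middle_fraction(logit_normal_t))
```

The logit-normal draw concentrates timesteps around t = 0.5, whereas the uniform draw spreads them evenly over [0, 1].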
Detailed Config Snippets
1. Training Config (On-the-fly Eval)
stage_2:
  target: stage2.models.DDT.DiTwDDTHead
  params:
    input_size: 16
    patch_size: 1
    in_channels: 768
    hidden_size: [1152, 2048]
    depth: [28, 2]
    num_heads: [16, 16]
    mlp_ratio: 4.0
    class_dropout_prob: 0.1
    num_classes: 1000
    use_qknorm: false
    use_swiglu: true
    use_rope: true
    use_rmsnorm: true
    wo_shift: false
    use_pos_embed: true
transport:
  params:
    path_type: Linear
    prediction: velocity
    loss_weight: null
    time_dist_type: logit-normal_0_1 # <--- NOTE: logit-normal
sampler:
  mode: ODE
  params:
    sampling_method: euler
    num_steps: 50
    atol: 1.0e-06
    rtol: 0.001
    reverse: false
guidance:
  method: cfg
  scale: 1.0
2. Eval Config (Offline, sample_ddp.py)
stage_2:
  target: stage2.models.DDT.DiTwDDTHead
  params:
    input_size: 16
    patch_size: 1
    in_channels: 768
    hidden_size: [1152, 2048]
    depth: [28, 2]
    num_heads: [16, 16]
    mlp_ratio: 4.0
    class_dropout_prob: 0.1
    num_classes: 1000
    use_qknorm: False
    use_swiglu: True
    use_rope: True
    use_rmsnorm: True
    wo_shift: False
    use_pos_embed: True
  ckpt: 'ckpts/stage-dit/siglip-B-rae-fixbug/checkpoints/ep-0000030.pt'
transport:
  params:
    path_type: 'Linear'
    prediction: 'velocity'
    time_dist_type: 'uniform' # <--- NOTE: uniform
sampler:
  mode: ODE
  params:
    sampling_method: 'euler'
    num_steps: 50
    atol: 1e-6
    rtol: 1e-3
    reverse: False
guidance:
  method: 'cfg'
  scale: 1.0
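To make the comparison between the two configs above less error-prone than reading them side by side, here is a minimal sketch of a key-by-key diff. It is not part of the repo: `flatten` and `diff_configs` are hypothetical helpers, and the two dicts are hand-copied from the `transport`, `sampler`, and `guidance` sections only (the `stage_2` params are identical apart from `ckpt`).

```python
MISSING = "<missing>"  # placeholder for keys present in only one config


def flatten(cfg, prefix=""):
    """Flatten a nested dict into {'a.b.c': value} form."""
    flat = {}
    for key, value in cfg.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def diff_configs(a, b):
    """Return {key: (value_in_a, value_in_b)} for every key that differs."""
    fa, fb = flatten(a), flatten(b)
    return {
        key: (fa.get(key, MISSING), fb.get(key, MISSING))
        for key in sorted(set(fa) | set(fb))
        if fa.get(key, MISSING) != fb.get(key, MISSING)
    }


# Hand-copied from the training config (on-the-fly eval).
train_cfg = {
    "transport": {"params": {"path_type": "Linear", "prediction": "velocity",
                             "loss_weight": None,
                             "time_dist_type": "logit-normal_0_1"}},
    "sampler": {"mode": "ODE",
                "params": {"sampling_method": "euler", "num_steps": 50,
                           "atol": 1.0e-06, "rtol": 0.001, "reverse": False}},
    "guidance": {"method": "cfg", "scale": 1.0},
}

# Hand-copied from the offline eval config.
eval_cfg = {
    "transport": {"params": {"path_type": "Linear", "prediction": "velocity",
                             "time_dist_type": "uniform"}},
    "sampler": {"mode": "ODE",
                "params": {"sampling_method": "euler", "num_steps": 50,
                           "atol": 1e-6, "rtol": 1e-3, "reverse": False}},
    "guidance": {"method": "cfg", "scale": 1.0},
}

for key, (train_value, eval_value) in diff_configs(train_cfg, eval_cfg).items():
    print(f"{key}: train={train_value!r} eval={eval_value!r}")
```

On these sections the diff reports only `transport.params.time_dist_type` and `transport.params.loss_weight`, the latter being `null` in training and absent from the eval config.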