Hi authors,
Thank you for releasing the code and detailed appendices for “Score-Entropy Discrete Diffusion (SEDD)”, v3 (2024-10-06). The paper has been very helpful for my own work. While reading Algorithm 1 (“Training with DWDSE”) in the supplementary material, I noticed a small mismatch between the theoretical loss in the main text and the pseudocode in Appendix B, and I wanted to check whether I’m misunderstanding something.
What I believe the correct weight should be
In Eq. (10) of the paper (and again in Theorem 3.6), the inner sum for the DWDSE term is weighted by the transition rate $Q_t(x_t, y)$ for $y \neq x_t$.
For the two noise processes this specialises to:
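| Noise kernel | Non-diagonal weight |
|---|---|
| Uniform | $\sigma(t)\,(1 - \delta_{x_t}(y))$ |
| Absorb | $\sigma(t)\,\delta_{\text{[MASK]}}(x_t)\,(1 - \delta_{x_t}(y))$ |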
What the pseudocode currently does
In Appendix B, Algorithm 1, the weight is hard-coded as:
\sigma(t) (1 - \delta_{x_t}(y)) // omit some sum operators
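That equals the Uniform weight, but it does not match the Absorb weight, which should be nonzero only when $x_t$ is [MASK].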
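To make the comparison concrete, here is a minimal NumPy sketch of how I am reading the two weights. It rests on my own assumptions rather than anything taken from the released code: `Q[i, j]` is the rate of jumping from state `j` to state `i` (columns sum to zero), the uniform matrix is left unnormalised, and the helper names (`uniform_rate`, `absorb_rate`, `theory_weight`, `pseudocode_weight`) are hypothetical.

```python
import numpy as np

def uniform_rate(n):
    # Assumed uniform rate matrix: 1 off the diagonal, 1 - n on it,
    # so every column sums to zero.
    return np.ones((n, n)) - n * np.eye(n)

def absorb_rate(n, mask):
    # Assumed absorbing rate matrix: every non-mask state jumps to
    # `mask` at rate 1; the mask state itself never leaves.
    q = np.zeros((n, n))
    np.fill_diagonal(q, -1.0)
    q[mask, :] = 1.0
    q[mask, mask] = 0.0
    return q

def theory_weight(q, sigma, x_t):
    # Weight sigma(t) * Q(x_t, y) over all y, with the y == x_t entry
    # zeroed because the loss sums over y != x_t only.
    w = sigma * q[x_t, :]
    w[x_t] = 0.0
    return w

def pseudocode_weight(n, sigma, x_t):
    # Weight as written in the pseudocode: sigma(t) * (1 - delta_{x_t}(y)).
    w = np.full(n, sigma)
    w[x_t] = 0.0
    return w

if __name__ == "__main__":
    n, mask, sigma = 4, 3, 0.5
    for x_t in range(n):
        print("x_t =", x_t)
        print("  uniform   :", theory_weight(uniform_rate(n), sigma, x_t))
        print("  absorb    :", theory_weight(absorb_rate(n, mask), sigma, x_t))
        print("  pseudocode:", pseudocode_weight(n, sigma, x_t))
```

Under these assumptions the pseudocode weight agrees with the uniform case for every `x_t`, and with the absorbing case only when `x_t` is the mask index; for an unmasked `x_t` the absorbing weight is zero everywhere.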
Request for confirmation
Could you please confirm whether the pseudocode line is indeed a small typo?
I might have misunderstood the intended interpretation of
Thanks again for the excellent work and for open-sourcing everything.
Best regards,
Zhenwei