
Possible typo in Appendix B pseudocode: DWDSE weight for Q_absorb #18

@ParadoxZW

Hi authors,

Thank you for releasing the code and detailed appendices for “Score-Entropy Discrete Diffusion (SEDD)” (v3, 2024-10-06). The paper has been very helpful for my own work. While reading Algorithm 1 (“Training with DWDSE”) in the supplementary Appendix B, I noticed a small mismatch between the theoretical loss in the main text and the pseudocode, and I wanted to check whether I’m misunderstanding something.


What I believe the correct weight should be

In Eq. (10) of the paper (and again in Theorem 3.6), the inner sum for the DWDSE term is weighted by the transition rate

$$Q_t(x_t, y)$$
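To make the role of this weight concrete, here is a minimal Python sketch of the weighted inner sum for a single $x_t$. It assumes the denoising score-entropy summand $s_\theta(x_t,t)_y - r_y \log s_\theta(x_t,t)_y + K(r_y)$ with $r_y = p_{t|0}(y \mid x_0)/p_{t|0}(x_t \mid x_0)$ and $K(a) = a(\log a - 1)$; please check that form against Eq. (10). The names `dwdse_inner_sum`, `score_entropy_K` and `uniform_weight` are hypothetical and not taken from the SEDD codebase.

```python
import math
from typing import Callable, Sequence

# Hypothetical helper names; the summand form is assumed from Eq. (10).


def score_entropy_K(a: float) -> float:
    """K(a) = a * (log a - 1), using the limit K(0) = 0."""
    return 0.0 if a == 0.0 else a * (math.log(a) - 1.0)


def dwdse_inner_sum(
    x_t: int,
    scores: Sequence[float],  # s_theta(x_t, t)_y for every y, assumed > 0
    ratios: Sequence[float],  # r_y = p_{t|0}(y | x_0) / p_{t|0}(x_t | x_0), >= 0
    weight: Callable[[int, int], float],  # returns Q_t(x_t, y)
) -> float:
    """Sum over y != x_t of Q_t(x_t, y) * (s_y - r_y * log s_y + K(r_y))."""
    total = 0.0
    for y, (s, r) in enumerate(zip(scores, ratios)):
        if y == x_t:
            continue  # the diagonal entry is excluded from the inner sum
        w = weight(x_t, y)
        if w == 0.0:
            continue  # zero rate: this y contributes nothing
        total += w * (s - r * math.log(s) + score_entropy_K(r))
    return total


def uniform_weight(x_t: int, y: int) -> float:
    """Uniform-style weight sigma(t) * 1[y != x_t], with sigma(t) = 0.5 for illustration."""
    return 0.5 if y != x_t else 0.0


print(dwdse_inner_sum(0, [1.0, 2.0, 0.5], [1.0, 2.0, 0.5], uniform_weight))
```

When the scores equal the ratios, every summand vanishes, so the printed value is zero up to floating-point rounding. Passing the weight as a callback makes it easy to plug in either kernel's $Q_t(x_t, y)$.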

For the two noise processes this specialises to:

| Noise kernel | Off-diagonal weight $Q_t(x_t, y)$ |
|---|---|
| Uniform | $\sigma(t) \cdot \mathbb{1}_{\{y \neq x_t\}}$ |
| Absorb | $\sigma(t) \cdot \mathbb{1}_{\{y = \text{[MASK]}\}}$ |

What the pseudocode currently does

In Appendix B, Algorithm 1, the weight is hard-coded as:

$$\sigma(t)\,\bigl(1 - \delta_{x_t}(y)\bigr)$$

(some summation operators omitted)

That matches $Q_t(x_t, y)$ for the Uniform kernel, but for the Absorb kernel it assigns non-zero weight to every $y \neq x_t$, not only to $y = $ [MASK].
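The mismatch can be checked mechanically on a toy vocabulary. The sketch below assumes a vocabulary of 4 tokens with [MASK] at index 3 (both hypothetical choices), encodes the two kernel weights from the table and the Algorithm 1 weight, and lists the targets $y \neq x_t$ where they differ. All function names are illustrative only.

```python
from typing import Callable, List

# Toy setup (hypothetical): vocabulary of 4 tokens, [MASK] at index 3.
VOCAB_SIZE = 4
MASK = 3


def q_uniform(sigma: float, x_t: int, y: int) -> float:
    """Uniform kernel, off-diagonal: sigma(t) * 1[y != x_t]."""
    return sigma if y != x_t else 0.0


def q_absorb(sigma: float, x_t: int, y: int) -> float:
    """Absorb kernel, off-diagonal: sigma(t) * 1[y == [MASK]]."""
    return sigma if (y == MASK and y != x_t) else 0.0


def algorithm1_weight(sigma: float, x_t: int, y: int) -> float:
    """Weight hard-coded in Appendix B, Algorithm 1: sigma(t) * (1 - delta_{x_t}(y))."""
    return sigma * (0.0 if y == x_t else 1.0)


def mismatched_ys(
    kernel: Callable[[float, int, int], float], sigma: float, x_t: int
) -> List[int]:
    """Targets y != x_t where the kernel's Q_t(x_t, y) differs from the Algorithm 1 weight."""
    return [
        y
        for y in range(VOCAB_SIZE)
        if y != x_t and kernel(sigma, x_t, y) != algorithm1_weight(sigma, x_t, y)
    ]


print(mismatched_ys(q_uniform, 1.0, 0))  # []
print(mismatched_ys(q_absorb, 1.0, 0))   # [1, 2]
```

For $x_t = 0$ and $\sigma(t) = 1$, the Uniform kernel agrees with the pseudocode everywhere, while for Absorb every non-[MASK] target ($y = 1, 2$) receives weight from Algorithm 1 but none from $Q_t(x_t, y)$.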

Request for confirmation

Could you please confirm whether the pseudocode line is indeed a small typo?
I might have misunderstood the intended interpretation of $Q_t$, so any clarification would be greatly appreciated. I hope this issue will also help others interested in SEDD.

Thanks again for the excellent work and for open-sourcing everything.

Best regards,

Zhenwei
