Question about a recent commit (and reproducibility questions) #9

@da03

Hi Rose,

The recent commit changed x_tp1 = x_tp to x_tp = x_tp1 (cb3d345?diff=split?). Are the results in your paper based on this new commit or on the old one? I ask because I also got different numbers on Wikisection compared to your paper (using the version before that commit), as another person brought up in the latest post of #7. (Just to make sure: the results in the paper are based on GPT2-small, which is called gpt2 on Hugging Face, and not GPT2-large/xl, right?)

Besides, I have a question about simulate_brownian_bridge. In this function, x_tp1 = x_t * (1 - dt/(1-t)) + (dt/(1-t)) * B_T + noise, but why is there a fixed dt = 0.05? According to the Brownian bridge process, shouldn't this be either x_tp1 = x_0 * (1 - t) + t * B_T + noise (for the older version, which always interpolates between x_0 and x_T), or x_tp1 = x_t * (1 - 1/(T - num_samples)) + 1/(T - num_samples) * B_T + noise (for the newer version, which interpolates between x_t and x_T)? And why is the noise term fixed rather than depending on t and T, as in Equation 1 in the paper?
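
To make the comparison concrete, here is a minimal NumPy sketch of the two updates I have in mind. It is not the repository's simulate_brownian_bridge: the function names, the noise_scale argument, and the choice to pin the last point to x_T are my own assumptions, and the noise terms are the textbook ones (variance t(T-t)/T for the direct marginal, variance dt for one Euler-Maruyama step), which may differ from what the code or the paper intends.

```python
import numpy as np


def bridge_marginal(x_0, x_T, t, T, rng, noise_scale=1.0):
    """Sample x_t directly from a bridge pinned at x_0 (time 0) and x_T (time T)."""
    alpha = t / T
    mean = (1.0 - alpha) * x_0 + alpha * x_T
    std = np.sqrt(t * (T - t) / T)  # zero at both endpoints
    return mean + noise_scale * std * rng.standard_normal(np.shape(x_0))


def bridge_step(x_t, x_T, t, T, dt, rng, noise_scale=1.0):
    """One Euler-Maruyama step of dX = (x_T - X) / (T - t) dt + dW, from t to t + dt."""
    frac = dt / (T - t)  # share of the remaining gap to x_T closed in this step
    mean = (1.0 - frac) * x_t + frac * x_T
    return mean + noise_scale * np.sqrt(dt) * rng.standard_normal(np.shape(x_t))


def simulate_bridge(x_0, x_T, num_steps, rng, noise_scale=1.0):
    """Walk from x_0 to x_T on [0, 1] in num_steps steps (hypothetical helper)."""
    T, dt = 1.0, 1.0 / num_steps
    x_0 = np.asarray(x_0, dtype=float)
    x_T = np.asarray(x_T, dtype=float)
    traj = [x_0]
    for k in range(num_steps - 1):
        traj.append(bridge_step(traj[-1], x_T, k * dt, T, dt, rng, noise_scale))
    traj.append(x_T)  # assumption: the endpoint is pinned exactly
    return np.stack(traj)


rng = np.random.default_rng(0)
path = simulate_bridge(np.zeros(4), np.ones(4), num_steps=20, rng=rng)
```

In this sketch, dt = 0.05 on the unit interval corresponds to 20 steps, and dt/(1-t) with dt = 1/N and t = n/N equals 1/(N - n), so the step-wise form matches the 1/(T - num_samples) weight only when dt is tied to the number of steps. With noise_scale=0 the step-wise path reduces to plain linear interpolation between x_0 and x_T.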

Lastly, would it be possible for you to share a checkpoint for one setting of your model (Wikisection, TC-32)? That would address the issue in #7 as well. I understand that it's hard to share big files, but I think Google Drive allows uploading large files, and you can remove optimizer.pt to make the checkpoint smaller.

Thanks,
Yuntian
