In `graph_lib.py`'s `Uniform` graph you have:
def rate(self, i):
edge = torch.ones(*i.shape, self.dim, device=i.device) / self.dim
edge = edge.scatter(-1, i[..., None], - (self.dim - 1) / self.dim)
return edge
where you normalize $Q^{tok}$ by `self.dim` to avoid blow-up.
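For reference, here is what those rows look like on a toy vocabulary; this is a standalone sketch (not the repo's code), with `dim = 5` chosen just for illustration:

```python
import torch

# Standalone sketch: off-diagonal entries are 1/dim and the diagonal is
# -(dim-1)/dim, so each returned row is a row of Q^{tok}/dim and sums to
# (numerically) zero.
dim = 5
i = torch.tensor([2])                                    # current token index
edge = torch.ones(*i.shape, dim) / dim                   # every entry 1/dim
edge = edge.scatter(-1, i[..., None], -(dim - 1) / dim)  # set diagonal entry
print(edge)          # [[0.2, 0.2, -0.8, 0.2, 0.2]]
print(edge.sum(-1))  # ~[0.]
```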
Doesn't this affect the reverse sampling probabilities given in the paper,
$\delta_{x_t^i}(x^i_{t-\Delta t}) + \Delta t\, Q_t^{tok}(x_t^i, x_{t-\Delta t}^i)\, s_\theta(x_t, t)$?
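To make the question concrete, this is how I read that Euler step for a single position; the function name, signature, and the final renormalization are my own (hypothetical, not the repo's API), only the middle line mirrors the formula above:

```python
import torch
import torch.nn.functional as F

def euler_reverse_probs(rate_row, score, x_t_i, dt):
    """Sketch of the reverse transition over y for one position i:
        delta_{x_t^i}(y) + dt * Q_t^{tok}(x_t^i, y) * s_theta(x_t, t)_y
    rate_row: the row Q_t^{tok}(x_t^i, .), shape [dim]
    score:    s_theta(x_t, t) for this position, shape [dim]"""
    dim = rate_row.shape[-1]
    delta = F.one_hot(torch.tensor(x_t_i), dim).to(rate_row.dtype)
    probs = delta + dt * rate_row * score
    return probs / probs.sum(-1, keepdim=True)  # renormalize before sampling
```

The question is whether `rate_row` here should be the unnormalized $Q^{tok}$ from the paper or the `edge` returned above, which already carries the extra $1/\text{dim}$ factor.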