When embedding my text for conditioning, the trick for classifier-free guidance is to drop the embedding sometimes (usually 10% of the time).
My question is: what does "drop" mean exactly? I seem to have come across two variants: substituting a random tensor, or substituting a zero tensor.
GLIDE mentions in section 2.3 that "we sometimes replace text captions with an empty sequence" - would that be a third option, i.e. using the embedding of the empty string?
I haven't been able to find any explanation on this, does someone know?
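For concreteness, here is a minimal PyTorch sketch of the three variants I mean, applied per sample with some drop probability during training (all names here are hypothetical, not from any specific repo):

```python
import torch

def drop_text_embedding(text_emb, null_emb, p_drop=0.1, mode="zero"):
    """text_emb: (B, L, D) caption embeddings.
    null_emb: (L, D) embedding of the empty caption (only used for mode='empty')."""
    batch = text_emb.shape[0]
    # Per-sample Bernoulli mask: True means "drop the conditioning".
    drop_mask = (torch.rand(batch, device=text_emb.device) < p_drop).view(batch, 1, 1)

    if mode == "zero":
        # Variant 1: replace the embedding with an all-zero tensor.
        substitute = torch.zeros_like(text_emb)
    elif mode == "random":
        # Variant 2: replace it with a random tensor.
        substitute = torch.randn_like(text_emb)
    elif mode == "empty":
        # Variant 3 (GLIDE, Sec. 2.3): use the embedding of the empty string.
        substitute = null_emb.expand_as(text_emb)
    else:
        raise ValueError(f"unknown mode: {mode}")

    return torch.where(drop_mask, substitute, text_emb)
```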
@lucala I don't have experiments on this myself, but I think it does not matter which method you use to null out the conditioning, as long as it is consistent between training and sampling.
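To illustrate the consistency point: whatever `null_emb` is used as the substitute during training (zeros, a fixed random tensor, or the empty-string embedding) is the same tensor you pass as the unconditional input at sampling time. A rough sketch, assuming a model that takes `(x_t, t, cond)` and predicts noise:

```python
import torch

@torch.no_grad()
def guided_eps(model, x_t, t, text_emb, null_emb, guidance_scale=3.0):
    eps_cond = model(x_t, t, text_emb)    # conditional prediction
    eps_uncond = model(x_t, t, null_emb)  # unconditional prediction, same null_emb as in training
    # Classifier-free guidance: push the prediction away from the unconditional one.
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)
```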