Difference between self-attention and cross-attention in diffusion model unet #8555
                  
                    
Ahmad-Omar-Ahsan started this conversation in General
            Replies: 1 comment 1 reply
-
Hi, I haven't worked with this exact implementation, but generally: if you only have a few discrete labels, self-attention is usually fine; the model will learn to condition on those. Cross-attention is really useful when your conditioning input has more structure (clinical features, text, etc.), since it lets the network attend to that input dynamically instead of treating the label as a simple embedding. Hope this helps :)
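To make the distinction concrete, here is a minimal sketch of the two conditioning routes, assuming MONAI's `DiffusionModelUNet` (parameter names such as `channels` vs. `num_channels` differ between MONAI core and MONAI Generative releases, and the channel sizes, label count, and `cross_attention_dim` below are purely illustrative):

```python
# Minimal sketch, assuming a recent MONAI with DiffusionModelUNet in core.
# Channel sizes, label count, and cross_attention_dim are illustrative values.
import torch
from monai.networks.nets import DiffusionModelUNet

# Route A: class-label conditioning. The label becomes a learned embedding added to
# the timestep embedding; the attention blocks remain self-attention.
unet_labels = DiffusionModelUNet(
    spatial_dims=2,
    in_channels=1,
    out_channels=1,
    channels=(64, 128, 256),
    attention_levels=(False, True, True),
    num_res_blocks=1,
    num_head_channels=32,
    num_class_embeds=4,              # number of discrete labels
)

# Route B: cross-attention conditioning. The attention blocks attend to an external
# context sequence whose last dimension must equal cross_attention_dim.
unet_context = DiffusionModelUNet(
    spatial_dims=2,
    in_channels=1,
    out_channels=1,
    channels=(64, 128, 256),
    attention_levels=(False, True, True),
    num_res_blocks=1,
    num_head_channels=32,
    with_conditioning=True,
    cross_attention_dim=64,          # size of each context vector
)

x = torch.randn(2, 1, 64, 64)        # noisy images
t = torch.randint(0, 1000, (2,))     # timesteps

# Route A forward pass: pass the integer labels directly.
out_a = unet_labels(x, timesteps=t, class_labels=torch.tensor([0, 3]))

# Route B forward pass: context has shape (batch, sequence_length, cross_attention_dim);
# one conditioning vector per image means sequence_length == 1.
context = torch.randn(2, 1, 64)
out_b = unet_context(x, timesteps=t, context=context)
```

In that setup, the "size" of the context embedding is whatever you set `cross_attention_dim` to; the context passed at each forward call then needs shape (batch, sequence_length, cross_attention_dim).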
                  
  
-
Hello everyone,
I am training a 2D conditional diffusion model on different labels. At the moment, I am only changing the number-of-classes parameter in the U-Net. I noticed that there is a context-embed argument, which goes along with the with_conditioning argument. Going through the code, it looks like if with_conditioning is set to True, the model uses cross-attention; otherwise, it uses self-attention.
Which would be better, cross-attention or self-attention? Secondly, if I decide to use cross-attention, what should the size of my context embedding be?
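On the second question: with cross-attention, the conditioning is supplied as a context tensor whose last dimension matches `cross_attention_dim`. A hypothetical sketch of producing such a tensor (the `nn.Embedding`/`nn.Linear` projections here are illustrative helpers, not part of the U-Net):

```python
# Hypothetical helpers: map conditioning inputs to the context tensor shape
# (batch, sequence_length, cross_attention_dim) expected by cross-attention.
import torch
import torch.nn as nn

cross_attention_dim = 64  # must match the value given to the U-Net constructor

# A few discrete labels -> one learned context token per sample.
label_embedding = nn.Embedding(num_embeddings=4, embedding_dim=cross_attention_dim)
labels = torch.tensor([0, 3])
context_from_labels = label_embedding(labels).unsqueeze(1)          # (2, 1, 64)

# Continuous conditioning (e.g. 10 clinical features) -> project to the same width.
feature_projection = nn.Linear(in_features=10, out_features=cross_attention_dim)
features = torch.randn(2, 10)
context_from_features = feature_projection(features).unsqueeze(1)   # (2, 1, 64)
```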