Description
Hello, @ruifanxu.
I would like to ask you about how sentences shorter than the maximum length are filled (padded).
```python
def mask(self, seqs):
    device = next(self.parameters()).device
    batch_size, seq_len = seqs.shape
    mask = torch.triu(torch.ones((seq_len, seq_len), dtype=torch.long), diagonal=1).to(device)  # [seq_len, seq_len]
    pad = torch.eq(seqs, self.padding_idx)  # [n, seq_len]
    mask = torch.where(pad[:, None, None, :], 1, mask[None, None, :, :]).to(device)  # [n, 1, seq_len, seq_len]
    return mask > 0  # [n, 1, seq_len, seq_len]
```
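For reference, here is a small standalone sketch of how I read this function, using a toy batch and assuming `padding_idx = 0` (the token ids are made up):

```python
import torch

padding_idx = 0                               # assumed pad token id
seqs = torch.tensor([[5, 7, 9, 0, 0],         # 3 real tokens + 2 padding tokens
                     [4, 6, 2, 8, 3]])        # 5 real tokens, no padding
batch_size, seq_len = seqs.shape

# Future (look-ahead) mask: 1 strictly above the diagonal.
future = torch.triu(torch.ones((seq_len, seq_len), dtype=torch.long), diagonal=1)
# Padding mask: True wherever the token equals padding_idx.
pad = torch.eq(seqs, padding_idx)             # [n, seq_len]
# Combine: padded key positions are always masked, others follow the future mask.
mask = torch.where(pad[:, None, None, :], 1, future[None, None, :, :]) > 0
print(mask.shape)   # torch.Size([2, 1, 5, 5])
print(mask[0, 0])   # columns 3 and 4 (the padded positions) are entirely True
```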
Suppose the longest sentence is 200 words.
My understanding is that sentences with fewer than 200 words should be zero-padded, so that the padding does not affect the attention calculation or the generation of the target sequence.
Therefore, if a sentence has only three words, the remaining 197 positions (step) should be filled with zeros (zero padding); otherwise the output (context = [n, step, model_dim]) would contain model_dim values at steps that do not correspond to real words. A small sketch of what I mean follows.
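This is roughly how I imagine the padding step for a 3-word sentence (the token ids and `padding_idx = 0` are just assumptions for illustration):

```python
import torch

max_len = 200
padding_idx = 0                        # assumed pad token id
sentence = [12, 45, 7]                 # hypothetical token ids of a 3-word sentence

# Pad the sentence to the maximum length so the batch tensor has a fixed shape.
padded = torch.full((max_len,), padding_idx, dtype=torch.long)
padded[:len(sentence)] = torch.tensor(sentence)
print(padded[:6])   # tensor([12, 45,  7,  0,  0,  0]) -- 197 trailing zeros follow
```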
However, the function above does not seem to do any zero filling; it only builds the future (look-ahead) mask. May I ask whether I have misunderstood something?
Looking forward to your reply! Thank you.