
Conversation

@Andrei-Aksionov
Contributor

Instead of indexing positional embeddings we can slice them. This has a couple of benefits:

  1. It looks cleaner.
  2. Indexing returns a new tensor (and an extra index tensor is created each time by the torch.arange call). Slicing, in contrast, returns a view of the tensor (the same underlying data), as the snippet below demonstrates.
import torch

# Helper: address of the underlying data storage of a tensor.
ptr = lambda x: x.storage().data_ptr()

x = torch.nn.Embedding(128, 256)
out_slicing = x.weight[:3]                # view of the weight matrix, no copy
out_indexing = x(torch.tensor(range(3)))  # forward pass copies the selected rows

print(x, "", ptr(x.weight))
print(out_slicing.shape, ptr(out_slicing))
print(out_indexing.shape, ptr(out_indexing))
print(f"out_slicing equals to out_indexing: {torch.equal(out_slicing, out_indexing)}")
--------------------------------------------------------------------------------------
(example output):
>> Embedding(128, 256)  140351087181824
>> torch.Size([3, 256]) 140351087181824
>> torch.Size([3, 256]) 140350970082304
>> out_slicing equals to out_indexing: True

As you can see, the tensor returned by indexing has a different underlying data storage, whereas the tensor returned by slicing shares the same storage as the embedding weight.
"Slicing creates a view of the tensor, which shares the underlying data but contains information about the memory offsets used for the visible data. This avoids having to copy the data frequently, which makes a lot of operations much more efficient"[1]

