A list of things in PyTorch that took me a while to learn when I first came across them.
Admittedly, a lot of these are beginner-level, but if that describes you, I hope this is of some use.
Definitions
In-place
When something is changed in-place, no new tensor is created for the operation; instead, the result is written into the operand itself. Hence no new memory is allocated.
>>> x
0.2362
[torch.FloatTensor of size 1]
>>> x + y # Addition of two tensors creates a new tensor.
0.9392
[torch.FloatTensor of size 1]
>>> x
0.2362 # The value of x is unchanged.
[torch.FloatTensor of size 1]
>>> x.add_(y) # An in-place addition modifies one of the tensors itself, here the value of x.
0.9392
[torch.FloatTensor of size 1]
>>> x
0.9392
[torch.FloatTensor of size 1]
resources
https://discuss.pytorch.org/t/what-is-in-place-operation/16244/3
Operations
torch.no_grad()
Turns off Torch’s autograd engine inside the with block: operations performed there are not tracked, so their results are not added to the computational graph.
Useful for:
- Performing inference
- Not leaking test data into the model training
>>> x = torch.randn(3, requires_grad=True)
>>> print(x.requires_grad)
True
>>> print((x ** 2).requires_grad)
True
>>> with torch.no_grad():
... print((x ** 2).requires_grad)
False
resources
- intro to autograd
- https://datascience.stackexchange.com/a/71598
- https://stackoverflow.com/questions/72504734/what-is-the-purpose-of-with-torch-no-grad/72504818#72504818
Tensor.unsqueeze()
“Stretches” an existing tensor into a new dimension. This new dimension of size 1 is also called a singleton dimension.
Basically, it’ll wrap the items at the specified dimension inside an extra level of brackets.
The opposite of this is Tensor.squeeze(), which “removes” dimensions of size 1.
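A quick sketch of both (assuming from torch import tensor, as in the other examples here):
>>> t = tensor([1, 2, 3])
>>> t.unsqueeze(0)  # add a singleton dimension in front
tensor([[1, 2, 3]])
>>> t.unsqueeze(1)  # add a singleton dimension after dim 0
tensor([[1],
        [2],
        [3]])
>>> t.unsqueeze(0).squeeze()  # squeeze removes all size-1 dimensions
tensor([1, 2, 3])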
Tensor.max(dim)
Returns the largest value of the tensor. If dim is given, returns the largest values (and their indices) along dimension dim.
>>> t = tensor([[1, 3], [4,8]])
# return largest value
>>> t.max()
tensor(8)
# reduce over the 1st dimension (dim 0): largest value in each column
>>> t.max(0)
torch.return_types.max(
values=tensor([4, 8]),
indices=tensor([1, 1]))
# reduce over the 2nd dimension (dim 1): largest value in each row
>>> t.max(1)
torch.return_types.max(
values=tensor([3, 8]),
indices=tensor([1, 1]))
An intuition for this is that it’ll squish dimension dim down to size 1 (and, unless keepdim=True is passed, drop it from the result).
Tensor.unfold(dim, size, step)
Along dimension dim, extract slices of length size, starting a new slice every step elements. If size > step, the slices overlap; trailing elements that don’t fill a complete slice are dropped.
>>> x = torch.arange(1, 8) # returns tensor([1, 2, 3, 4, 5, 6, 7])
>>> x.unfold(0, 2, 1)
tensor([[1, 2],
        [2, 3],
        [3, 4],
        [4, 5],
        [5, 6],
        [6, 7]])
torch.cat(list)
- stack: Concatenates a sequence of tensors along a new dimension.
- cat: Concatenates a sequence of tensors along an existing (given) dimension.
So if A and B are of shape (3, 4):
torch.cat([A, B], dim=0) will be of shape (6, 4)
torch.stack([A, B], dim=0) will be of shape (2, 3, 4)
>>> torch.cat([tensor([1, 2, 3]), tensor([4, 5, 6])])
tensor([1, 2, 3, 4, 5, 6])
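To make the shape difference above concrete, a small sketch (using zero tensors just for their shapes):
>>> A = torch.zeros(3, 4)
>>> B = torch.zeros(3, 4)
>>> torch.cat([A, B], dim=0).shape
torch.Size([6, 4])
>>> torch.stack([A, B], dim=0).shape
torch.Size([2, 3, 4])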
torch.gather(src, dim, idx)
There are 2 requirements that might go unnoticed:
- src and idx have to be of the same rank (number of dimensions).
- For every dimension other than dim, the size of idx must not exceed the size of src.
This means that, even if idx only has 1 row of index values, it needs to have its own dimension for that row (see the example below).
>>> states = tensor([[1, 2, 3], [4, 5, 6]])
>>> actions = tensor([1, 2, 1])
>>> states.shape
torch.Size([2, 3])
>>> actions.shape
torch.Size([3])
# they are not of the same rank; we have to unsqueeze `actions` first
>>> actions = actions.unsqueeze(0)
>>> actions.shape
torch.Size([1, 3])
>>> states.gather(1, actions)
tensor([[2, 3, 2]])
Tensor.view(dim1, dim2, ...)
Returns a view of the tensor with the specified dimensions. A dimension given as -1 has its size inferred from the remaining ones.
>>> torch.zeros(3, 5, 6).shape
torch.Size([3, 5, 6])
>>> torch.zeros(3, 5, 6).view(-1).shape
torch.Size([90])
>>> torch.zeros(3, 5, 6).view(-1, 3).shape
torch.Size([30, 3])
>>> torch.zeros(3, 5, 6).view(-1, 3, 3).shape
torch.Size([10, 3, 3])
One way to visualize this: it takes each value of the source tensor by iterating over the indices, starting at the lowest position (0) and advancing the innermost (last) dimension first. In our example, [0][0][0], then [0][0][1], and so on.
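A small sketch of that ordering: the values are read out in that order and refilled into the new shape.
>>> x = tensor([[1, 2, 3], [4, 5, 6]])
>>> x.view(3, 2)
tensor([[1, 2],
        [3, 4],
        [5, 6]])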
torch.nn.Softmax(dim)
Applies the softmax function along dimension dim. It’s useful because it maps arbitrary values to probabilities (non-negative values that sum to 1). I was mostly wondering why it’s called “softmax”. Apparently, it’s in opposition to “hardmax”, which sets the maximum value to 1 and everything else to 0. Softmax, however, is “soft”: it keeps the relative magnitudes of all the values instead of throwing away everything but the maximum.
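A quick sketch (output rounded the way the REPL prints it):
>>> m = torch.nn.Softmax(dim=0)
>>> m(tensor([1.0, 2.0, 3.0]))
tensor([0.0900, 0.2447, 0.6652])
# a “hardmax” of the same input would be tensor([0., 0., 1.])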
Tips
Working with batches
Remember that a torch.nn.Module (such as the dqn below, built from nn.Linear layers) will output a tensor with the same number of dimensions as its input: extra leading batch dimensions are carried through.
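For reference, dqn here could be any small network with 1 input feature and 4 outputs. A hypothetical definition (the exact numbers printed below depend on the random weights):
>>> dqn = torch.nn.Sequential(
...     torch.nn.Linear(1, 32),
...     torch.nn.ReLU(),
...     torch.nn.Linear(32, 4),
... )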
>>> dqn(tensor([.5]))
tensor([-0.1096, 0.1605, -0.1288, -0.1131], grad_fn=<AddBackward0>)
>>> dqn(tensor([[.5]]))
tensor([[-0.1096, 0.1605, -0.1288, -0.1131]], grad_fn=<AddmmBackward0>)
>>> dqn(tensor([[[.5]]]))
tensor([[[-0.1096, 0.1605, -0.1288, -0.1131]]], grad_fn=<ViewBackward0>)
Dimensions intuition
One easy way to visualize dimensions, or to tell which dimension some data lives in, is to notice that a dimension’s index corresponds to its “depth” in the bracket notation.
# 2nd dimension (size 2)
# v
tensor([[[.1, .2, 3.], [.3, .4, 5.]]])
# ^ ^
# 1st dimension (size 1) 3rd dimension (size 3)
(of course, in actual code, the dimension indexes would start at 0.)
Also, the indices in torch.Size correspond directly to their dimension.
>>> tensor([[[.1, .2, 3.], [.3, .4, 5.]]]).shape
torch.Size([1, 2, 3])
Masking
If you have two tensors (or a tensor and an array) A and mask of the same shape, where mask has a Boolean data type, you can use the notation A[mask] to select (or assign to) only the elements of A where mask is True.
NOTE
PyTorch will skip the operation entirely for positions where mask is False. So, for example, if you are assigning some tensor B to A[mask], be aware that the length of B should equal the number of Trues in mask (for False positions, the pointer into B is not advanced).
>>> a = tensor([5, 6, 7, 8])
>>> b = tensor([1, 2, 4])
>>> mask = tensor([ True, True, False, True])
>>> a[mask] = b
>>> a
tensor([1, 2, 7, 4])
Reshaping tensors
By using tensor.view(-1), you basically “reset” (flatten) the tensor’s shape. You can then reshape it into any other compatible shape.
>>> x = tensor([[[1, 2, 3], [4, 5, 6]]])
>>> x.view(-1)
tensor([1, 2, 3, 4, 5, 6])
>>> x.view(-1, 2, 3)
tensor([[[1, 2, 3],
[4, 5, 6]]])
Operations between tensors w/ differing shapes
One little quirk is that PyTorch expects tensors of the same shape for elementwise operations, but it is okay with singleton dimensions, which get broadcast. Hence it is the case that:
# doesn't work
>>> tensor([1, 2, 3]) * tensor([[4, 5], [4, 2], [3, 4]])
''' Traceback (most recent call last):
File "<stdin>", line 1, in <module>
RuntimeError: The size of tensor a (3) must match the size of tensor b (2) at non-singleton dimension 1'''
# tensor([[1], [2], [3]]) works!
>>> tensor([1, 2, 3]).unsqueeze(-1) * tensor([[4, 5], [4, 2], [3, 4]])
tensor([[ 4, 5],
[ 8, 4],
[ 9, 12]])
Variable-size tensors
Say you have multiple input tensors of variable size, and you want to lay them all out in the same tensor, e.g. as a batch. Here’s what you can do:
>>> x = [tensor([1, 2, 3]), tensor([3, 4])]
>>> padded_x = torch.nn.utils.rnn.pad_sequence(x, batch_first=True) # similar to torch.stack, but allows padding
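For reference, the shorter sequence gets padded (with the default padding_value=0) so everything stacks into one tensor:
>>> padded_x
tensor([[1, 2, 3],
        [3, 4, 0]])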