A list of things in PyTorch that took me a while to learn when I first came across them.
Admittedly, a lot of these are beginner-level, but if that describes you, I hope you find them useful.
Definitions
In-place
When something is changed in-place, no new tensor is created for the operation; instead, the result is written directly into the operand. Hence it does not allocate new memory.
>>> x
0.2362
[torch.FloatTensor of size 1]
>>> x + y # Addition of two tensors creates a new tensor.
0.9392
[torch.FloatTensor of size 1]
>>> x
0.2362 # The value of x is unchanged.
[torch.FloatTensor of size 1]
>>> x.add_(y) # An in-place addition modifies one of the tensors itself, here the value of x.
0.9392
[torch.FloatTensor of size 1]
>>> x
0.9392
[torch.FloatTensor of size 1]
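One way to check the "no new memory" claim concretely is to compare storage addresses via data_ptr() (a quick sketch):

```python
import torch

x = torch.rand(3)
y = torch.rand(3)
ptr = x.data_ptr()   # address of x's underlying storage
z = x + y            # out-of-place: allocates a new tensor
x.add_(y)            # in-place: writes into x's existing storage
```

After this, x.data_ptr() is still ptr, while z lives in its own freshly allocated storage.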
resources
https://discuss.pytorch.org/t/what-is-in-place-operation/16244/3
Operations
torch.no_grad()
Turns off the autograd engine inside the block. Operations are not tracked, so the resulting tensors are not added to the computational graph.
Useful for:
- Performing inference
- Evaluating on test data without building a graph (saves memory, and no gradients can flow from it)
>>> x = torch.randn(3, requires_grad=True)
>>> print(x.requires_grad)
True
>>> print((x ** 2).requires_grad)
True
>>> with torch.no_grad():
...     print((x ** 2).requires_grad)
False
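A minimal sketch of the inference use case, using a stand-in torch.nn.Linear as the model (any Module works the same way):

```python
import torch

model = torch.nn.Linear(3, 1)   # hypothetical tiny model, just for illustration
x = torch.randn(2, 3)

with torch.no_grad():
    y = model(x)                # no graph is built; y.requires_grad is False

y_grad = model(x)               # outside the block, tracking resumes
```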
resources
- intro to autograd
- https://datascience.stackexchange.com/a/71598
- https://stackoverflow.com/questions/72504734/what-is-the-purpose-of-with-torch-no-grad/72504818#72504818
Tensor.unsqueeze()
“Stretches” an existing tensor into a new dimension. This new dimension of size 1 is also called a singleton dimension.

Basically, it’ll wrap the items at the specified dimension into a new array.
The opposite of this is Tensor.squeeze(), which then “removes” dimensions with length 1.
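A quick sketch of both directions:

```python
import torch

x = torch.tensor([1, 2, 3])   # shape (3,)
row = x.unsqueeze(0)          # shape (1, 3): new singleton dim in front
col = x.unsqueeze(1)          # shape (3, 1): each item wrapped in its own list
back = col.squeeze()          # shape (3,): all singleton dims removed
```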
Tensor.max(dim)
Returns the largest value in the tensor. If dim is given, returns the largest values (and their indices) along dimension dim.
>>> t = tensor([[1, 3], [4, 8]])
# return largest value
>>> t.max()
tensor(8)
# largest value along dimension 0 (one result per column)
>>> t.max(0)
torch.return_types.max(
values=tensor([4, 8]),
indices=tensor([1, 1]))
# largest value along dimension 1 (one result per row)
>>> t.max(1)
torch.return_types.max(
values=tensor([3, 8]),
indices=tensor([1, 1]))
An intuition for this is that it "squishes" dimension dim: with keepdim=True that dimension is kept with size 1, and by default it is removed entirely.
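That intuition becomes literal with keepdim=True, which keeps the reduced dimension as size 1 instead of dropping it:

```python
import torch

t = torch.tensor([[1, 3], [4, 8]])
values, indices = t.max(dim=1)             # dim 1 removed: shapes are (2,)
values_k, _ = t.max(dim=1, keepdim=True)   # dim 1 kept as size 1: shape (2, 1)
```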
Tensor.unfold(dim, size, step)
For dimension dim, returns all windows of length size, moving step elements between window starts. If a final window would run past the end of the dimension, it is dropped.
>>> x = torch.arange(1, 8) # returns tensor([1, 2, 3, 4, 5, 6, 7])
>>> x.unfold(0, 2, 1)
tensor([[1, 2],
        [2, 3],
        [3, 4],
        [4, 5],
        [5, 6],
        [6, 7]])
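To see the truncation, pick a step so the last window would run past the end:

```python
import torch

x = torch.arange(1, 8)   # tensor([1, 2, 3, 4, 5, 6, 7])
w = x.unfold(0, 2, 3)    # windows start at 0, 3, 6; the one at 6 is incomplete and dropped
```

The result is tensor([[1, 2], [4, 5]]); the partial window starting at element 7 never appears.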
torch.cat(tensors) vs. torch.stack(tensors)
- stack: Concatenates sequence of tensors along a new dimension.
- cat: Concatenates the given sequence of seq tensors in the given dimension.
So if A and B are of shape (3, 4):
- torch.cat([A, B], dim=0) will be of shape (6, 4)
- torch.stack([A, B], dim=0) will be of shape (2, 3, 4)
>>> torch.cat([tensor([1, 2, 3]), tensor([4, 5, 6])])
tensor([1, 2, 3, 4, 5, 6])
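A shape-focused sketch of the difference:

```python
import torch

A = torch.zeros(3, 4)
B = torch.zeros(3, 4)
catted = torch.cat([A, B], dim=0)      # grows an existing dimension
stacked = torch.stack([A, B], dim=0)   # creates a brand-new dimension
```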
torch.gather(src, dim, idx)

There are 2 requirements that might go unnoticed:
- src and idx have to be of the same rank (number of dimensions).
- For every dimension other than dim, the size of idx must not exceed the size of src.
This means that, even if idx only has 1 row of index values, it needs to have its own dimension (see the visualization).
>>> states = tensor([[1, 2, 3], [4, 5, 6]])
>>> actions = tensor([1, 2, 1])
>>> states.shape
torch.Size([2, 3])
>>> actions.shape
torch.Size([3])
# they are not of the same rank; we have to unsqueeze `actions` first
>>> actions = actions.unsqueeze(0)
>>> actions.shape
torch.Size([1, 3])
>>> states.gather(1, actions)
tensor([[2, 3, 2]])
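A common application (e.g. in DQN code) is picking one value per row; here's a small sketch of that:

```python
import torch

q = torch.tensor([[1, 2, 3],
                  [4, 5, 6]])
actions = torch.tensor([[2], [0]])   # same rank as q; one column index per row
picked = q.gather(1, actions)        # row 0 -> column 2, row 1 -> column 0
```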
Tensor.view(dim1, dim2, ...)
Returns the tensor's data viewed in the specified shape. A dimension given as -1 is inferred from the remaining sizes.
>>> torch.zeros(3, 5, 6).shape
torch.Size([3, 5, 6])
>>> torch.zeros(3, 5, 6).view(-1).shape
torch.Size([90])
>>> torch.zeros(3, 5, 6).view(-1, 3).shape
torch.Size([30, 3])
>>> torch.zeros(3, 5, 6).view(-1, 3, 3).shape
torch.Size([10, 3, 3])
One way to visualize this: it takes each value of the src tensor in order, iterating through the indices starting at the lowest position (0) and incrementing the last dimension first. In our example, [0][0][0], then [0][0][1], and so on.
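A sketch of that ordering, using arange so each value equals its flattened position:

```python
import torch

x = torch.arange(6).view(2, 3)   # [[0, 1, 2], [3, 4, 5]]
flat = x.view(-1)                # values read out in row-major order
re = flat.view(3, 2)             # same values, refilled into a new shape
```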
torch.nn.Softmax(dim)
Applies a softmax function along the nth dimension. It's useful because it maps a vector of raw scores to probabilities: values in (0, 1) that sum to 1. I was mostly wondering why it's called "softmax". Apparently, it's in opposition to "hardmax", which sets the maximum value to 1 and everything else to 0; softmax instead preserves the relative information in all the values.
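A sketch comparing the two (the "hardmax" here is built by hand; it's not a PyTorch function):

```python
import torch

logits = torch.tensor([1.0, 2.0, 3.0])
probs = torch.softmax(logits, dim=0)   # all values in (0, 1), summing to 1

# "hardmax" for comparison: 1 at the max position, 0 everywhere else
hard = torch.zeros_like(logits)
hard[logits.argmax()] = 1.0
```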
Tips
Working with batches
Remember that a torch.nn.Module will output a tensor with the same rank (and the same leading batch dimensions) as its input. Here, dqn is a small network mapping 1 input feature to 4 outputs.
>>> dqn(tensor([.5]))
tensor([-0.1096, 0.1605, -0.1288, -0.1131], grad_fn=<AddBackward0>)
>>> dqn(tensor([[.5]]))
tensor([[-0.1096, 0.1605, -0.1288, -0.1131]], grad_fn=<AddmmBackward0>)
>>> dqn(tensor([[[.5]]]))
tensor([[[-0.1096, 0.1605, -0.1288, -0.1131]]], grad_fn=<ViewBackward0>)
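Since dqn isn't defined above, here's a minimal stand-in (a single Linear layer with 1 input and 4 outputs) showing the same shape behavior:

```python
import torch

dqn = torch.nn.Linear(1, 4)          # hypothetical stand-in for the dqn network
out1 = dqn(torch.tensor([0.5]))      # input shape (1,)      -> output shape (4,)
out2 = dqn(torch.tensor([[0.5]]))    # input shape (1, 1)    -> output shape (1, 4)
out3 = dqn(torch.tensor([[[0.5]]]))  # input shape (1, 1, 1) -> output shape (1, 1, 4)
```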
Dimensions intuition
One easy way to visualize dimensions, or to tell which dimension some data lives in, is to notice that a dimension's index corresponds to its "depth" in the bracket notation.
tensor([[[.1, .2, 3.], [.3, .4, 5.]]])
#      ^^^
#      |||
#      ||`- 3rd dimension (size 3)
#      |`-- 2nd dimension (size 2)
#      `--- 1st dimension (size 1)
(of course, in actual code, the dimension indexes would start at 0.)
Also, the indices into torch.Size correspond directly to these dimensions.
>>> tensor([[[.1, .2, 3.], [.3, .4, 5.]]]).shape
torch.Size([1, 2, 3])
Masking
If you have a tensor A and a tensor (or array) mask of the same shape, where mask has a Boolean data type, you can use the notation A[mask] to select, or assign to, only the elements of A where mask is True.
NOTE
PyTorch will skip the operation entirely for positions where mask is False. So, for example, if you are assigning some tensor B to A[mask], be aware that the length of B should equal the number of Trues in mask (for False positions, the pointer into B is not incremented).
>>> a = tensor([5, 6, 7, 8])
>>> b = tensor([1, 2, 4])
>>> mask = tensor([ True, True, False, True])
>>> a[mask] = b
>>> a
tensor([1, 2, 7, 4])
Reshaping tensors
By using tensor.view(-1), you basically "reset" or flatten the tensor's shape. You can then view it as any other shape with the same total number of elements.
>>> x = tensor([[[1, 2, 3], [4, 5, 6]]])
>>> x.view(-1)
tensor([1, 2, 3, 4, 5, 6])
>>> x.view(-1, 2, 3)
tensor([[[1, 2, 3],
[4, 5, 6]]])
Operation between tensors w/ differing shapes
One little quirk is that PyTorch expects tensors of the same shape, but broadcasts dimensions of size 1 (singleton dimensions). Hence it is the case that:
# doesn't work
>>> tensor([1, 2, 3]) * tensor([[4, 5], [4, 2], [3, 4]])
''' Traceback (most recent call last):
File "<stdin>", line 1, in <module>
RuntimeError: The size of tensor a (3) must match the size of tensor b (2) at non-singleton dimension 1'''
# tensor([[1], [2], [3]]) works!
>>> tensor([1, 2, 3]).unsqueeze(-1) * tensor([[4, 5], [4, 2], [3, 4]])
tensor([[ 4, 5],
[ 8, 4],
[ 9, 12]])
Variable-size tensors
Say you have multiple input tensors of variable size, and you want to lay them all in the same vector, e.g. a batch. Here’s what you can do:
>>> x = [tensor([1, 2, 3]), tensor([3, 4])]
>>> padded_x = torch.nn.utils.rnn.pad_sequence(x, batch_first=True) # similar to torch.stack, but pads shorter tensors (with 0 by default)
>>> padded_x
tensor([[1, 2, 3],
        [3, 4, 0]])