Linear algebra is a core mathematical tool in machine learning, especially in deep learning, a sub-field of ML. There are a number of instances where linear algebra comes in handy when implementing neural networks. One of them is dealing with unstructured data like images: an image consists of pixels, which are commonly represented as a tensor or a matrix. In this blog post, I will briefly cover the basic linear algebra operations in PyTorch that are used in deep learning.
torch.dot() allows us to perform the dot product, also known as the inner product, between two vectors of the same size. The first element of t1 is multiplied with the first element of t2, the second element of t1 with the second element of t2, and so on; these products are then summed together. Note that a dot product between two vectors always returns a scalar value.
This operation is also commutative: t1 . t2 = t2 . t1. If we pass in t2 as the first argument and t1 as the second argument, we will get the same answer. One thing to keep in mind is that the dot product only works if the vectors are of the same size; if not, PyTorch will raise an error complaining about an inconsistent number of elements.
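A minimal sketch of the behavior described above; the values of t1 and t2 are illustrative, since the original example is not shown here:

```python
import torch

# Two vectors of the same size (the names t1 and t2 follow the post's convention)
t1 = torch.tensor([1., 2., 3.])
t2 = torch.tensor([4., 5., 6.])

# Dot product: 1*4 + 2*5 + 3*6 = 32
print(torch.dot(t1, t2))  # tensor(32.)

# Commutative: swapping the arguments gives the same scalar
print(torch.dot(t2, t1))  # tensor(32.)
```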
torch.mm() performs multiplication between two matrices. Like vector multiplication, matrix multiplication builds on the dot product and requires the matrices to have compatible sizes: the number of columns of the first matrix must equal the number of rows of the second matrix. Each row of the first matrix is multiplied against each column of the second matrix. This is essentially a vector dot product, where transposing each row of the first matrix makes sure it has the same dimension as each column of the second matrix.
For example, the multiplication is valid if the first matrix has a dimension of (3, 2) and the second matrix has a dimension of (2, 2), but not the other way around. Words alone might not help much, so let's look at a couple of examples.
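A sketch of the two examples discussed below, using illustrative 2x2 matrices (the original values are not shown here) so that both multiplication orders are valid:

```python
import torch

# Example 1: two 2x2 matrices
t1 = torch.tensor([[1., 2.],
                   [3., 4.]])
t2 = torch.tensor([[5., 6.],
                   [7., 8.]])
print(torch.mm(t1, t2))
# tensor([[19., 22.],
#         [43., 50.]])

# Example 2: same arguments, order swapped -- a different result
print(torch.mm(t2, t1))
# tensor([[23., 34.],
#         [31., 46.]])
```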
Example 2 uses the same arguments as example 1, except that the order of the arguments is swapped. We can see that swapping the order produces a completely different result. So, unlike the dot product between two vectors, matrix multiplication is not commutative: t1 x t2 != t2 x t1.
The example below shows why it is important to make sure the rows of the first matrix have the same number of entries as the columns of the second matrix.
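A sketch of the failure case, with assumed shapes of (2, 2) and (3, 2); the exact wording of the error may vary across PyTorch versions:

```python
import torch

t1 = torch.randn(2, 2)
t2 = torch.randn(3, 2)  # t1 has 2 columns, but t2 has 3 rows

try:
    torch.mm(t1, t2)
except RuntimeError as e:
    # PyTorch raises a RuntimeError because the shapes cannot be multiplied
    print(e)
```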
This function also performs multiplication, but it is not limited to certain shapes of tensors: torch.matmul() allows us to multiply tensors of different ranks. According to PyTorch's official documentation, this function behaves according to the dimensionality of the input tensors. For instance, if both arguments are vectors of the same size, it behaves exactly like torch.dot(). If both arguments are matrices, it performs matrix multiplication just like torch.mm(). It also supports multiplication between a vector and a matrix, by promoting the vector to a rank-2 tensor so that the two tensors become compatible. In other words, it supports broadcasting. Check out this blog post for a detailed explanation of broadcasting.
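A sketch matching the walkthrough that follows, where t1 is assumed to be a 1-D tensor of size 2 and t2 a (2, 3) matrix (the original values are not shown here):

```python
import torch

# With two vectors, matmul behaves like torch.dot
v1 = torch.tensor([1., 2.])
v2 = torch.tensor([3., 4.])
print(torch.matmul(v1, v2))  # tensor(11.)

# With a 1-D tensor and a matrix, matmul broadcasts:
# t1's shape (2,) is treated as (1, 2), multiplied with (2, 3),
# and the prepended dimension is dropped afterwards, giving shape (3,)
t1 = torch.tensor([1., 2.])
t2 = torch.tensor([[1., 2., 3.],
                   [4., 5., 6.]])
print(torch.matmul(t1, t2))        # tensor([ 9., 12., 15.])
print(torch.matmul(t1, t2).shape)  # torch.Size([3])
```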
In the example above, t1 is a vector and t2 is a matrix. The way matmul handles this is by prepending a 1 to the dimension of t1, so its new dimension becomes (1, 2). It is now compatible with t2, which has a dimension of (2, 3). The prepended 1 is removed after the multiplication is performed.
Sometimes the tensor that we have is not in the shape or dimension that we desire, and this happens a lot. This is where transposing an array or a matrix comes in handy. One application is matrix multiplication, which I mentioned earlier in the torch.mm() section. For a matrix, transposing can be thought of as flipping the elements over the diagonal axis.
torch.transpose() accepts three arguments: the first is the tensor, and the second and third are the dimensions that we want to swap.
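A minimal sketch with an assumed (2, 3) input, since the original example is not shown here:

```python
import torch

t = torch.tensor([[1., 2., 3.],
                  [4., 5., 6.]])  # shape (2, 3)

# Swap dimension 0 and dimension 1: rows become columns
print(torch.transpose(t, 0, 1))
# tensor([[1., 4.],
#         [2., 5.],
#         [3., 6.]])
print(torch.transpose(t, 0, 1).shape)  # torch.Size([3, 2])
```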
The above example swaps dimension 0 and dimension 1, so the rows of the output are the columns of the input and vice versa.
torch.transpose() also allows us to "swap" a dimension with itself, but this is the same as not swapping at all, so the output is identical to the input. The example below demonstrates this behavior:
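A sketch with an assumed 2x2 input:

```python
import torch

t = torch.tensor([[1., 2.],
                  [3., 4.]])

# "Swapping" dimension 0 with itself leaves the tensor unchanged
print(torch.transpose(t, 0, 0))
# tensor([[1., 2.],
#         [3., 4.]])
print(torch.equal(torch.transpose(t, 0, 0), t))  # True
```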
So far, we haven't covered element-wise operations between tensors. There are a number of functions in PyTorch that allow us to do that, and one of them is torch.add(). In order to sum two matrices together, they must have the same size.
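A sketch of the same-size case discussed below, with illustrative (2, 2) values since the original example is not shown here:

```python
import torch

t1 = torch.tensor([[1., 2.],
                   [3., 4.]])
t2 = torch.tensor([[5., 6.],
                   [7., 8.]])

# Element-wise addition of two (2, 2) matrices
print(torch.add(t1, t2))
# tensor([[ 6.,  8.],
#         [10., 12.]])
```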
t1 and t2 are both (2, 2) matrices, allowing the values at the same position in the two matrices to be added together, resulting in another (2, 2) matrix.
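A sketch of the broadcasting case described below, with an assumed 1-D t1 and a (2, 2) t2:

```python
import torch

t1 = torch.tensor([1., 2.])          # shape (2,)
t2 = torch.tensor([[10., 20.],
                   [30., 40.]])      # shape (2, 2)

# t1 is broadcast to shape (2, 2), then added element-wise
print(torch.add(t1, t2))
# tensor([[11., 22.],
#         [31., 42.]])
```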
The example above shows that the function also supports broadcasting, in which it modifies the dimension of t1 so that it becomes compatible with the second argument, t2.
It is certainly important to have a good understanding of these operations and to know when to apply them if we want to delve into the world of deep learning. That being said, the list of functions above is far from exhaustive. Fret not: as we dive deeper, we are likely to discover more functions, and the list will only grow from here on out!