Analyze the results of the CPU vs GPU experiment.

#aws #cloud #jupyter #beginners

Introduction

Now, I will complete the previous post to analyze the results of the CPU versus GPU.
Because the experiment is based on matrix multiplication computations, a fundamental operation in many machine learning algorithms, you can expect similar trends when using a GPU instead of a CPU when implementing machine learning algorithms.

Focus on the two charts at the bottom of the code cell that look similar to the following image and see what conclusions you can make:

The upper graph shows the time it takes for each device to compute matrix products for different sizes of matrices, with the CPU in orange.
The lower graph shows the GPU speedup, which is the ratio of the CPU compute time to the GPU compute time.

The CPU time grows proportional to the size of the matrix squared or cubed. That is, doubling the matrix size results in a 4X to 8X increase in compute time.
This follows the growth in the number of computations required to calculate matrix products of different sizes
The GPU time grows almost linearly with the size of the matrix for the sizes used in the experiment. That is doubling the size doubles the time. This indicates that the GPU hasn't reached capacity.
It is able to add more compute cores to complete the computation in much shorter times than a CPU.
The compute time for matrices of size less than 1000 is similar for GPU and CPU.
Sometimes the CPU performs better than GPU for these small sizes.
Transfering results from GPU to CPU and initializing GPU programs create some overhead in using a GPU.
In general, GPUs excel for large-scale problems.
The speedup increases with the matrix size. The maximum reached is around 20X. A 20X speedup can mean a compute job finishing in 1 day on CPU, or in 1 hour on GPU.
For larger problems, GPUs can offer speedups in the hundreds.

In this post, you analyzed the results of the experiment and gained an understanding of when GPUs can offer substantial improvements and when they are overkill.
For a complete comparison, you need to consider the cost of running GPU instances to running CPU-only instances.
Usually, if you have problems large enough to be compute-bound, the price increase of using GPUs is significantly less than the speedup achieved by using GPUs.
This results in a reduced overall price.
For very large problems, you can utilize multiple GPUs in a cluster.
AWS provides a machine learning CloudFormation template to help with setting up an auto-scaled cluster of GPUs for machine learning tasks.

If you have time, try to modify the code to see what size of matrix causes the GPU to become compute-bound. Observe how high the temperature reaches while running in a compute-bound state for an extended period.
Also, see what the maximum speedup you can achieve by using the GPU is.