عبدالله عياد | Abdullah Ayad for AWS Community Builders

Set up the CPU versus GPU experiment


  • The EC2 instance for this Lab is a p2.xlarge instance, which has 4 vCPUs and 1 GPU.
  • The GPU is an NVIDIA Tesla K80 with 2,496 processing cores.
  • In this post, you will create a Jupyter notebook with provided Python code that compares the performance of matrix multiplication on the CPU and on the GPU.
  • Matrix multiplication is a core operation in many machine learning algorithms.
  • The provided code uses the TensorFlow machine learning library to run the matrix multiplications on each device.
  • If you have not started a Jupyter notebook server yet, see this post and this post first.
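
To get a feel for why matrix multiplication makes a demanding benchmark, here is a minimal NumPy sketch. It is not part of the provided lab code, and the helper names naive_matmul and matmul_flops are made up for illustration. It checks a textbook triple-loop product against np.matmul and estimates how the amount of work grows with the matrix size.

```python
import numpy as np


def naive_matmul(a, b):
    """Textbook triple-loop matrix product, for illustration only."""
    rows, inner = a.shape
    inner_b, cols = b.shape
    assert inner == inner_b, "inner dimensions must match"
    out = np.zeros((rows, cols), dtype=a.dtype)
    for i in range(rows):
        for j in range(cols):
            for k in range(inner):
                out[i, j] += a[i, k] * b[k, j]
    return out


def matmul_flops(size):
    """Approximate floating-point operations for a size x size product.

    Each of the size*size outputs needs `size` multiplies and `size - 1`
    adds, which is commonly rounded to 2 * size**3.
    """
    return 2 * size ** 3


rng = np.random.RandomState(0)
a = rng.rand(8, 8)
b = rng.rand(8, 8)

# The loop version and the optimized library call agree on the result.
print(np.allclose(naive_matmul(a, b), np.matmul(a, b)))

# Work grows with the cube of the matrix size (6600 is the largest
# size produced by range(100, 7000, 500) in the provided benchmark).
for size in (100, 1000, 6600):
    print(size, matmul_flops(size))
```

Because the work grows cubically, the larger matrices in the benchmark are where you would expect a device with thousands of cores to pull ahead.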

Set up the experiment

Step 1

Return to the SSH shell where you started the Jupyter notebook server, and copy the notebook server URL.
Note: You can retrieve the URL at any time by running the jupyter notebook list command.

Step 2

Open a new browser tab, and paste the URL you copied earlier into the address bar.

Step 3

Replace the port number (8888) with 8000 and localhost with your EC2 instance's public IP address, then navigate to the URL.

  • Recall that the SSH tunnel links port 8888 on the remote server to port 8000, which is why the port number in the URL changes.
  • The landing page shows all the files in the working directory.
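
If you would rather not edit the URL by hand, the substitution in this step can be sketched with Python's standard urllib.parse module. The function name rewrite_notebook_url, the token, and the IP address below are placeholders made up for illustration; use the URL you copied and your own public IP.

```python
from urllib.parse import urlsplit, urlunsplit


def rewrite_notebook_url(url, host, port):
    """Return `url` with its host and port replaced, keeping path and token."""
    parts = urlsplit(url)
    return urlunsplit(parts._replace(netloc=f"{host}:{port}"))


# Hypothetical values: substitute the URL you copied and your own public IP.
copied_url = "http://localhost:8888/?token=abc123"
print(rewrite_notebook_url(copied_url, "203.0.113.10", 8000))
```

Only the host and port change. The path and the token query string are carried over untouched, so the notebook server still accepts the login.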

Step 4

Click on the New button above the file listing table, and select your environment (here I selected conda_tensorflow_p36).

  • The Amazon Deep Learning AMI includes several virtual environments for Python to avoid dependency conflicts between packages.
  • The provided code needs an environment with Python 3.6 and TensorFlow available.
  • The new notebook opens in a new browser tab.
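
Before pasting the benchmark code, you may want to confirm that the kernel you selected really has what the code needs. The following is a small sketch you could run in the first cell. The function name check_environment is made up for illustration, and it treats "Python 3.6" as a minimum version, which is an assumption based on the f-strings used in the provided code.

```python
import importlib.util
import sys


def check_environment(required_python=(3, 6), module_name="tensorflow"):
    """Report whether the kernel meets the Python version and has the module."""
    return {
        "python_version": "{}.{}".format(*sys.version_info[:2]),
        "python_ok": sys.version_info[:2] >= required_python,
        "module_found": importlib.util.find_spec(module_name) is not None,
    }


print(check_environment())
```

If module_found is False, the notebook was most likely created with a different environment than the one you intended to select.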

Step 5

Paste the following Python code into the cell:

from __future__ import print_function
import time
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as pyplot

def benchmark(devices):
  '''Benchmark each device by computing matrix products'''
  times = {device: [] for device in devices}
  sizes = range(100, 7000, 500)
  for size in sizes:
    print(f"Calculating {size}x{size} matrix product")
    for device in devices:
      shape = (size, size)
      data_type = tf.float32
      # Build two random matrices and their product on the chosen device
      with tf.device(device):
        mat1 = tf.random_uniform(shape=shape, minval=0, maxval=1, dtype=data_type)
        mat2 = tf.random_uniform(shape=shape, minval=0, maxval=1, dtype=data_type)
        matmul = tf.matmul(mat1, mat2)
      with tf.Session(config=tf.ConfigProto(log_device_placement=True)) as session:
        start_time = time.time()
        result = session.run(matmul)  # Run the graph to compute the product
        time_taken = time.time() - start_time
        print(f"{device} took {round(time_taken,2)}s")
        times[device].append(time_taken)  # Record the timing for plot_results
  return times, sizes

def plot_results(devices, sizes, times):
  '''Plot the benchmark results'''
  fig, (ax1, ax2) = pyplot.subplots(2, 1, sharex=True)

  # Top plot: compute time per device
  for device in devices:
    ax1.plot(sizes, times[device], 'o-', label=device)
  ax1.set_ylabel('Compute Time')
  ax1.set_title('Device Compute Time vs. Matrix size')
  ax1.legend(devices, loc=2)

  # Bottom plot: CPU time divided by GPU time
  ax2.plot(sizes, np.divide(times[devices[1]], times[devices[0]]), 'o-', label=device)
  ax2.set_ylabel('GPU Speedup')
  ax2.set_xlabel('Matrix size')
  ax2.set_title('GPU Speedup vs. Matrix size')

def experiment():
  '''Run an experiment that compares CPU and GPU device performance'''
  devices = ["/gpu:0", "/cpu:0"]
  times, sizes = benchmark(devices)
  plot_results(devices, sizes, times)
  • Don't worry about understanding the details of the code. It defines three functions:

  • benchmark: Runs the matrix multiplication benchmark for both the CPU and GPU devices
  • plot_results: Generates visualizations to easily interpret the benchmark results
  • experiment: Runs the overall experiment
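
To make the "GPU Speedup" plot easier to read later, here is a short sketch of the arithmetic behind it. The timings are hypothetical numbers invented purely to show the calculation; they are not measurements from this instance. The division is the same np.divide expression that plot_results uses.

```python
import numpy as np

# Hypothetical timings in seconds, invented purely to show the arithmetic.
sizes = [100, 600, 1100]
times = {
    "/gpu:0": [0.5, 0.5, 0.5],
    "/cpu:0": [0.25, 1.0, 4.0],
}
devices = ["/gpu:0", "/cpu:0"]

# Same expression plot_results uses: CPU time divided by GPU time.
speedup = np.divide(times[devices[1]], times[devices[0]])

for size, ratio in zip(sizes, speedup):
    faster = "GPU" if ratio > 1 else "CPU"
    print(f"{size}x{size}: speedup {ratio:.1f} ({faster} faster)")
```

A value above 1 means the GPU finished sooner than the CPU for that matrix size, and a value below 1 means the CPU was faster.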

Step 6

Return to your initial SSH shell, and enter the following command to monitor the status of the Python process associated with your notebook:

top -p `pgrep "python"`

The top command is one way to track the CPU (%CPU) and memory (%MEM) usage of the python process that runs the CPU device benchmark.
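
As a rough complement to top, you can also estimate CPU utilization from inside Python by comparing CPU time with wall-clock time. This is only a sketch using the standard time module. The names cpu_utilization and busy_loop are made up for illustration, and the result is an approximation rather than the exact figure top reports.

```python
import time


def cpu_utilization(work):
    """Run `work()` and return (wall_seconds, cpu_seconds, percent_cpu)."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    work()
    cpu_seconds = time.process_time() - cpu_start
    wall_seconds = time.perf_counter() - wall_start
    percent = 100.0 * cpu_seconds / wall_seconds if wall_seconds > 0 else 0.0
    return wall_seconds, cpu_seconds, percent


def busy_loop():
    """Stand-in workload; replace with the computation you want to measure."""
    total = 0
    for i in range(200_000):
        total += i * i
    return total


wall, cpu, percent = cpu_utilization(busy_loop)
print(f"wall={wall:.3f}s cpu={cpu:.3f}s utilization={percent:.0f}%")
```

Because time.process_time() adds up CPU time across all threads of the process, a multithreaded workload can report more than 100%, just as %CPU in top can on a multi-core instance.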

Step 7

Press s, type 1, and press Enter to tell top to update the statistics every second.

Step 8

Switch to the SSH shell you used for creating a tunnel, and enter the following command to monitor the status of the GPU:

watch -n 1 nvidia-smi

  • nvidia-smi is the NVIDIA System Management Interface tool.
  • You are using it to display information about the GPU, including the temperature (Temp), power usage (Pwr), memory usage (Memory-Usage), and GPU utilization (GPU-Util).
  • The watch -n 1 command reruns nvidia-smi every second so that the output stays up to date.
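
If you want the same four readings inside Python, for example to log them next to the benchmark timings, nvidia-smi can print them as CSV through its --query-gpu option. The sketch below is one possible approach and not part of the provided lab code. The helper names are made up for illustration, and the sample line is a hypothetical value included only so that the parser can be tried on a machine without a GPU.

```python
import subprocess

QUERY_FIELDS = ["temperature.gpu", "power.draw", "memory.used", "utilization.gpu"]


def parse_gpu_stats(csv_line):
    """Turn one CSV line from nvidia-smi into a dict keyed by query field."""
    values = [value.strip() for value in csv_line.split(",")]
    return dict(zip(QUERY_FIELDS, values))


def read_gpu_stats():
    """Query the first GPU once; requires nvidia-smi on the PATH."""
    output = subprocess.check_output(
        [
            "nvidia-smi",
            "--query-gpu=" + ",".join(QUERY_FIELDS),
            "--format=csv,noheader,nounits",
        ],
        universal_newlines=True,
    )
    return parse_gpu_stats(output.splitlines()[0])


# Hypothetical sample line, shown so the parser can be tried without a GPU.
print(parse_gpu_stats("45, 60.12, 310, 98"))
```

On the EC2 instance, you would call read_gpu_stats() instead of passing a sample line. The values are returned as strings, so convert them with float() if you want to plot them.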

Step 9

Position your Jupyter notebook browser tab and the two SSH shells so that you can see all three at once.

This way, you will be able to monitor the CPU and GPU as the experiment runs.

To perform the experiment, read the next post from here.

