In this post, I benchmark different types of AWS EC2 instances in detail to answer the following questions:
- How much does performance improve from c4/r4/m4 to c5/r5/m5? (c3/r3/m3 definitely need to be replaced.)
- How do AMD and Intel instances compare, e.g. m5 vs. m5a?
- Multiple small instances or one big instance, e.g. 4 * (4 CPU, 8GB Memory) or 1 * (16 CPU, 32GB Memory)?
To keep things short, I use the following terms:
- C-Series or C: Compute Optimized Instances
- M-Series or M: General Purpose Instances
- R-Series or R: Memory Optimized Instances
- 5th-Gen/4th-Gen/3rd-Gen: 5th/4th/3rd generation instances
- Use 5th-Gen as much as possible
- 5th-Gen has better CPU and memory performance than 4th-Gen: ~25% faster CPU and up to 10 times faster memory (sequential read/write).
- 5th-Gen is ~15% cheaper than 4th-Gen: $0.231 per hour for c4.xlarge vs. $0.196 per hour for c5.xlarge in Singapore.
- Use C-Series for CPU-bound applications. C-Series CPU performance is ~20% better than M/R-Series.
- AMD instances are ~20% faster in CPU but ~25% slower in memory than their Intel counterparts (m5a vs. m5).
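The price and speed figures above can be combined into a rough price/performance estimate. The sketch below is only arithmetic on the numbers quoted in this post (Singapore on-demand prices and the ~25% CPU speedup). The function names are my own, and it assumes a purely CPU-bound job whose cost scales inversely with CPU speed.

```python
def percent_cheaper(old_cost: float, new_cost: float) -> float:
    """Percentage saved when moving from old_cost to new_cost."""
    return (old_cost - new_cost) / old_cost * 100


def cost_per_unit_of_work(price_per_hour: float, relative_speed: float) -> float:
    """Hourly price divided by relative CPU speed (assumes a CPU-bound job)."""
    return price_per_hour / relative_speed


C4_XLARGE = 0.231  # USD per hour, Singapore, as quoted above
C5_XLARGE = 0.196  # USD per hour, Singapore, as quoted above

hourly_saving = percent_cheaper(C4_XLARGE, C5_XLARGE)
c4_cost = cost_per_unit_of_work(C4_XLARGE, 1.00)  # 4th-Gen as the baseline
c5_cost = cost_per_unit_of_work(C5_XLARGE, 1.25)  # ~25% faster CPU
work_saving = percent_cheaper(c4_cost, c5_cost)

print(f"c5.xlarge is {hourly_saving:.1f}% cheaper per hour")
print(f"c5.xlarge costs {work_saving:.1f}% less per unit of CPU work")
```

Under these assumptions the hourly saving is about 15%, and the saving per unit of CPU work is about 32%, because the lower price and the higher speed compound.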
Before listing the long benchmark descriptions and charts, I think it is better to write down the conclusions first.
After several benchmarks, the charts and data lead to these conclusions:
- Single-core CPU performance is almost the same across instance sizes within one series.
- The differences among C/M/R-Series are not only the ratio between CPUs and memory (1:2 for C-Series, 1:4 for M-Series, 1:8 for R-Series): C-Series single-core performance is also ~20% better than M/R-Series.
- C-Series (Compute Optimized) CPU performance is ~20% better than M-Series and R-Series.
- In multi-core benchmarks, C/M/R-Series per-core performance decays significantly once more than half of the cores are in use: running on all cores is ~20% slower per core than running on half of the cores or fewer. This may be caused by overselling of virtual machines.
- Amazon Linux 2 CPU performance is 8% better than CentOS 7, Ubuntu Server 18.04 LTS and Ubuntu Server 16.04 LTS.
- 5th-Gen memory performance is 10 times faster in sequential read/write than 4th-Gen and 3rd-Gen.
- 5th-Gen memory performance is 2-3 times faster in random read/write than 4th-Gen and 3rd-Gen.
- C5-Series/R5-Series CPU is ~25% faster than C4-Series/R4-Series.
- M5-Series CPU is ~20% faster than M4-Series.
- C-Series memory performance is ~8% faster than R-Series/M-Series.
- CPU performance of M-Series with AMD is ~20% faster than M-Series with Intel.
- Memory performance of M-Series with AMD is ~25% slower than M-Series with Intel.
I have read a lot of benchmarks, and they are sometimes hard to understand because authors usually put their emphasis on the results, not the procedures. I think the procedures are as important as the results, so in this section I list the commands that generated the benchmark results. Please feel free to skip it.
Sysbench is a scriptable multi-threaded benchmark tool based on LuaJIT. It is most frequently used for database benchmarks, but can also be used to create arbitrarily complex workloads that do not involve a database server.
We will test on these aspects:
- cpu: a simple CPU benchmark
- memory: a memory access benchmark
sysbench cpu --threads=1 run
--threads: the total number of worker threads to create. Each thread will run on one CPU core.
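To repeat the CPU test for several thread counts, the command can be wrapped in a small script. This is a minimal sketch, not the exact harness used for this post: the function names are hypothetical, and it assumes sysbench 1.0 or later, whose CPU test prints an "events per second:" line.

```python
import re
import subprocess


def parse_events_per_second(output: str) -> float:
    """Extract the 'events per second' figure from sysbench cpu output."""
    match = re.search(r"events per second:\s*([\d.]+)", output)
    if match is None:
        raise ValueError("no 'events per second' line found in sysbench output")
    return float(match.group(1))


def run_cpu_benchmark(threads: int) -> float:
    """Run `sysbench cpu` with the given thread count and return events/sec."""
    result = subprocess.run(
        ["sysbench", "cpu", f"--threads={threads}", "run"],
        capture_output=True,
        text=True,
        check=True,  # raise if sysbench exits with an error
    )
    return parse_events_per_second(result.stdout)
```

On an instance with sysbench installed, `run_cpu_benchmark(1)` gives the single-core figure, and calling it for every thread count from 1 up to the number of vCPUs gives the multi-core curve.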
Memory tests will run for read performance and write performance in sequential (seq) and random (rnd) access modes:
sysbench memory --memory-oper=read --memory-access-mode=seq run
sysbench memory --memory-oper=write --memory-access-mode=seq run
sysbench memory --memory-oper=read --memory-access-mode=rnd run
sysbench memory --memory-oper=write --memory-access-mode=rnd run
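The four memory commands are the product of two operations and two access modes, so they can be generated rather than typed out. The sketch below builds the same command lines and parses the throughput figure. The function names are hypothetical, and the parser assumes the sysbench 1.0 output format, where the memory test reports a line such as "... MiB transferred (... MiB/sec)".

```python
import itertools
import re


def memory_commands() -> list:
    """Build one sysbench memory command per access-mode/operation pair."""
    access_modes = ("seq", "rnd")
    operations = ("read", "write")
    return [
        ["sysbench", "memory", f"--memory-oper={oper}",
         f"--memory-access-mode={mode}", "run"]
        for mode, oper in itertools.product(access_modes, operations)
    ]


def parse_throughput_mib(output: str) -> float:
    """Extract the MiB/sec figure from sysbench memory output."""
    match = re.search(r"\(([\d.]+) MiB/sec\)", output)
    if match is None:
        raise ValueError("no 'MiB/sec' figure found in sysbench output")
    return float(match.group(1))


for command in memory_commands():
    print(" ".join(command))
```

Each generated list can be passed directly to `subprocess.run`, and its standard output fed to `parse_throughput_mib`.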
Geekbench is a cross-platform benchmark that measures your system's performance. It runs several built-in tests in single-core and multi-core modes. The results are reported as one Single-Core Score and one Multi-Core Score. To run it:
cd Geekbench-5.1.0-Linux/
./geekbench5
According to AWS, “C5 and C5d instances feature either the 1st or 2nd generation Intel Xeon Platinum 8000 series processor (Skylake-SP or Cascade Lake) with a sustained all core Turbo CPU clock speed of up to 3.6 GHz.” Since instances in the same family may run on either processor, we need to know which instance type has the best single-core performance.
We will test on:
- c5 families:
- m5 families:
- r5 families
EC2 instances typically have multiple cores, and pricing usually grows linearly with the number of cores. For example, c5.large instances have 2 CPUs and cost $0.098 per hour; c5.xlarge instances have 4 CPUs and cost $0.196 per hour.
In this test, we want to make sure that performance also grows linearly with the number of cores, like pricing does. If multi-core performance is linear in the number of cores, we only need to test single-core performance across instance generations, which saves a lot of time and makes the whole benchmark easier to understand.
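Both halves of that comparison are simple ratios. The sketch below checks that the quoted prices are linear in vCPUs and defines a scaling-efficiency measure for benchmark results. The function names are my own, and the events-per-second values in the last lines are made-up placeholders, not measurements from this benchmark.

```python
def price_per_vcpu(price_per_hour: float, vcpus: int) -> float:
    """Hourly price divided by the number of vCPUs."""
    return price_per_hour / vcpus


def scaling_efficiency(single_thread_eps: float, total_eps: float, threads: int) -> float:
    """Measured throughput as a fraction of perfect linear scaling (1.0 = linear)."""
    return total_eps / (single_thread_eps * threads)


# Prices quoted above, in USD per hour.
c5_large = price_per_vcpu(0.098, 2)
c5_xlarge = price_per_vcpu(0.196, 4)
print(f"c5.large: ${c5_large:.3f}/vCPU, c5.xlarge: ${c5_xlarge:.3f}/vCPU")

# Placeholder events-per-second values, for illustration only.
efficiency = scaling_efficiency(single_thread_eps=1000.0, total_eps=3200.0, threads=4)
print(f"scaling efficiency on 4 threads: {efficiency:.2f}")
```

An efficiency close to 1.0 for every thread count would justify testing single-core performance only; a value well below 1.0 means per-core performance drops as more cores are loaded.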
We will test on:
We will test single-core performance on different operating systems:
- Amazon Linux 2
- CentOS 7 (x86_64) - with Updates HVM
- Ubuntu Server 18.04 LTS
- Ubuntu Server 16.04 LTS
This test is the main target of this benchmark: we want to know the performance improvements across c3/c4/c5, r3/r4/r5, and m3/m4/m5.
We will test on:
- c5.large, c4.large, c3.large
- r5.large, r4.large, r3.large
- m5.large, m4.large, m3.large
The “a” in m5a means an AMD CPU; these instances are cheaper than their Intel counterparts. In this test we want to know the performance difference between AMD and Intel CPUs.
We will test on:
- m5.2xlarge - tested in previous test