Verify「CPU memory reordering」
Question
According to Memory Reordering Caught in the Act open_in_new, Intel CPU may sometimes reorder memory instructions. Memory reordering has a great effect on lock-free
programming. The goal of this article is to verify memory reordering.
Given two integers X
and Y
, somewhere in memory, they are initialized to 0
. Now we have two processors which run parallelly and execute machine codes.
Intel CPU may execute "Memory instruction reordering" and the actual execution order has become:
There are four situations: r1 = 1, r2 = 0
or r1 = 0, r2 = 1
or r1 = 1, r2 = 1
or r1 = 0, r2 = 0
. Among them, r1 = 0, r2 = 0
is caused by Intel CPU "Memory instruction reordering" which is also the goal of our verification.
CPU Info
Now list my CPU Info:
Name | Processors | Cores | Cache |
---|---|---|---|
Intel(R) Core(TM) i5-7200U CPU @ 2.50GHz | 4 | 2 | 3 MB Intel® Smart Cache |
$ cat /proc/cpuinfo # Watch cpu info
$ lscpu # View summary
Deduction
To verify this process, we use Nim language. Providing that we have four variables, run two worker threads in multi-core processor and disable compiler's Memory reordering optimization to watch the process of cpu's Memory reordering optimization.
First we initialize x
and y
to zero. Then we run x = 1; r1 = y;
in worker thread 1 and run y = 1; r2 = x;
in worker thread 2. Finally we print the number of which r1 = 0, r2 = 0
occurs (Number of occurrences, number of iterations).
Thread 1:
x = 1
r1 = y
Thread 2:
y = 1
r2 = x
Results:
r1 == 0 and r2 == 0
Programming
First, choose a "wake <-> wait" mechanism for the threads to keep the worker threads in order. For simplicity, we manually write a simple semaphore implementation.
This semaphore implementation uses Linux's eventfd
and eventfd
is an efficient thread communication mechanism which is very suitable as a semaphore implementation(reference eventfd(2) — Linux manual page).
import std/posix
import std/os
# Provide semaphore-like semantics for reads from the new file descriptor.
const EFD_SEMAPHORE = 1
proc eventfd(initval: cuint, flags: cint): cint {.
importc: "eventfd",
header: "sys/eventfd.h"
.}
type
Semaphore = object
fd: cint
template checkOSError(syscall: untyped) =
let n = syscall
if n < 0:
raiseOSError(osLastError())
proc initSemaphore(): Semaphore =
# create eventfd in semaphore mode
result.fd = eventfd(0, EFD_SEMAPHORE)
proc wait(s: var Semaphore) =
var buf: uint64
checkOSError s.fd.read(buf.addr, 8)
proc signal(s: var Semaphore) =
var buf: uint64 = 1
checkOSError s.fd.write(buf.addr, 8)
Implement thread functions:
import std/random
var
beginSem1 = initSemaphore() # for thread 1 synchronization
endSem1 = initSemaphore() # for thread 1 synchronization
beginSem2 = initSemaphore() # for thread 2 synchronization
endSem2 = initSemaphore() # for thread 2 synchronization
x: int
y: int
r1: int
r2: int
const Format{.strdefine.} = "logging" # Specify the format of the console print log
proc threadFunc1() {.thread.} =
randomize() # Initialize "Random Number Generator"
while true:
beginSem1.wait() # Wait for the main thread to initiate a signal
while rand(100) mod 8 == 0: break # Calculate the random number
# deliberately make the two threads run staggered
# ----------------------------- transaction! -------------------------------
x = 1
{.emit: """asm volatile("" ::: "memory");""".} # Disable compiler reordering
r1 = y
# --------------------------------------------------------------------------
endSem1.signal() # Notify the main thread that an iteration is complete
proc threadFunc2() {.thread.} =
randomize() # Initialize "Random Number Generator"
while true:
beginSem2.wait() # Wait for the main thread to initiate a signal
while rand(100) mod 8 == 0: break # Calculate the random number
# deliberately make the two threads run staggered
# ----------------------------- transaction! -------------------------------
y = 1
{.emit: """asm volatile("" ::: "memory");""".} # Disable compiler reordering
r2 = x
# --------------------------------------------------------------------------
endSem2.signal() # Notify the main thread that an iteration is complete
proc main() =
var
thread1: Thread[void]
thread2: Thread[void]
createThread(thread1, threadFunc1)
createThread(thread2, threadFunc2)
# Repeat experiment
var n = 0 # Number of reorders
var counter = 0 # The number of worker thread iterations
while true:
x = 0
y = 0
beginSem1.signal()
beginSem2.signal()
endSem1.wait()
endSem2.wait()
counter.inc()
if r1 == 0 and r2 == 0: # CPU memory reordering occurs
n.inc()
when Format == "csv": # Output csv format: -d:Format=csv
echo $n, ", ", $counter
else: # Output log format
echo "Reorders detected (n=" & $n & ", counter=" & $counter & ")"
joinThread(thread1)
joinThread(thread2)
main()
Experimental results
Open the console, run the program, and output the log:
$ nim c -r -d:release reordering.nim
Reorders detected (n=1, counter=1)
Reorders detected (n=2, counter=509)
Reorders detected (n=3, counter=512)
Reorders detected (n=4, counter=21185)
Reorders detected (n=5, counter=21400)
Reorders detected (n=6, counter=21486)
Reorders detected (n=7, counter=21490)
Reorders detected (n=8, counter=21553)
Reorders detected (n=9, counter=31530)
Reorders detected (n=10, counter=31612)
Reorders detected (n=11, counter=32948)
Reorders detected (n=12, counter=36575)
Reorders detected (n=13, counter=39927)
Reorders detected (n=14, counter=59147)
Reorders detected (n=15, counter=59165)
Reorders detected (n=16, counter=59168)
Reorders detected (n=17, counter=60922)
Reorders detected (n=18, counter=62117)
Reorders detected (n=19, counter=62121)
Reorders detected (n=20, counter=62130)
Reorders detected (n=21, counter=62407)
Reorders detected (n=22, counter=65811)
Reorders detected (n=23, counter=65829)
Reorders detected (n=24, counter=69376)
Reorders detected (n=25, counter=72983)
Reorders detected (n=26, counter=76600)
Reorders detected (n=27, counter=78803)
Reorders detected (n=28, counter=79768)
Reorders detected (n=29, counter=83267)
Reorders detected (n=30, counter=83304)
...
Further, let's make a csv file and a chart to view the trend of CPU memory reordering.
We have defined the Format
symbol in the program to help control the format of the printed log. -d:Format=cs
prints csv data format (number of reordering occurrences\t worker thread iterations
).
Write a python script (reordering.py), read the resulting csv file (reordering.csv) and generate a graph (reordering.png).
# reordering.py
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import csv
import seaborn as sns
sns.set(style="darkgrid")
xdata = [] # Number of worker thread iterations
ydata = [] # Occurrence of reorders
with open("reordering.csv") as f:
fcsv = csv.reader(f)
headers = next(fcsv)
for row in fcsv:
xdata.append(float(row[1]))
ydata.append(float(row[0]))
df = pd.DataFrame(dict(counter=xdata, n=ydata))
g = sns.relplot(
x="counter",
y="n",
height=4,
linewidth=2,
aspect=1.3,
kind="line",
data=df
)
g.fig.autofmt_xdate()
plt.savefig('reordering.png')
Run our program:
$ nim c -r -d:release reordering.nim > reordering.csv
$ python3 reordering.py
Generate csv file (reordering.csv):
1, 1524
2, 6342
3, 6396
4, 6404
5, 6612
6, 6665
7, 6706
8, 6787
9, 6789
10, 6793
11, 6821
12, 6833
13, 6845
14, 6860
15, 6869
16, 6872
...
991, 4272196
992, 4272227
993, 4272232
994, 4272236
995, 4272242
996, 4272277
997, 4272280
998, 4272293
999, 4272305
1000, 4275173
...
Generate a graph (reordering.png) --- Basically present a linear trend:
Top comments (0)