
Franck Pachot for AWS Heroes


Pressure Stall Information on CentOS7

I run databases, and knowing whether the pressure is on CPU, RAM, or I/O is crucial. That is not easy to infer from the metrics provided by CloudWatch or the usual OS monitoring tools. Recent Linux kernels (4.20 and later) provide PSI (Pressure Stall Information) for that, so let's enable it.

I have EC2 instances provisioned from aws-marketplace/CentOS Linux 7 images, but CentOS does not move fast and ships an old kernel:

[yugabyte]$ cat /etc/system-release

CentOS Linux release 7.4.1708 (Core)

[yugabyte]$ uname --kernel-release

3.10.0-693.5.2.el7.x86_64
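PSI was merged in Linux 4.20, which is why the stock 3.10 kernel cannot expose it. A minimal sketch of that version check follows. `kernel_supports_psi` is a hypothetical helper name, not a standard tool, and it only compares version numbers: the kernel must also have been built with PSI support.

```shell
# Hypothetical helper: succeeds when the kernel release is 4.20 or newer.
# With no argument it checks the running kernel.
kernel_supports_psi() {
  release="${1:-$(uname -r)}"
  major="${release%%.*}"        # digits before the first dot
  rest="${release#*.}"          # everything after the first dot
  minor="${rest%%[!0-9]*}"      # leading digits of what remains
  [ "$major" -gt 4 ] || { [ "$major" -eq 4 ] && [ "$minor" -ge 20 ]; }
}

# The two kernel releases that appear in this post.
for k in 3.10.0-693.5.2.el7.x86_64 5.13.12-1.el7.elrepo.x86_64; do
  if kernel_supports_psi "$k"; then
    echo "$k: recent enough for PSI"
  else
    echo "$k: too old for PSI"
  fi
done
```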

I need a more recent one, which I'll install from ELRepo:

[yugabyte]$ sudo yum update -y

No packages marked for update

[yugabyte]$ sudo yum install -y https://www.elrepo.org/elrepo-release-7.el7.elrepo.noarch.rpm

Now listing the mainline kernel ('ml', as opposed to the long-term 'lt'):

[yugabyte]$ yum list available --disablerepo='*' --enablerepo=elrepo-kernel kernel-ml.x86_64

Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
 * elrepo-kernel: mirrors.coreix.net
Available Packages
kernel-ml.x86_64                                                5.13.12-1.el7.elrepo                                                 elrepo-kernel

And installing it:

[yugabyte]$ sudo yum --enablerepo=elrepo-kernel install -y kernel-ml


Here it is as the first menu entry for grub:

[centos]$ sudo grep ^menuentry /boot/grub2/grub.cfg

menuentry 'CentOS Linux (5.13.12-1.el7.elrepo.x86_64) 7 (Core)' --class centos --class gnu-linux --class gnu --class os --unrestricted $menuentry_id_option 'gnulinux-3.10.0-693.5.2.el7.x86_64-advanced-6f15c206-f516-4ee8-a4b7-89ad880647db' {
menuentry 'CentOS Linux (3.10.0-1160.36.2.el7.x86_64) 7 (Core)' --class centos --class gnu-linux --class gnu --class os --unrestricted $menuentry_id_option 'gnulinux-3.10.0-693.5.2.el7.x86_64-advanced-6f15c206-f516-4ee8-a4b7-89ad880647db' {
menuentry 'CentOS Linux (3.10.0-693.5.2.el7.x86_64) 7 (Core)' --class centos --class gnu-linux --class gnu --class os --unrestricted $menuentry_id_option 'gnulinux-3.10.0-693.5.2.el7.x86_64-advanced-6f15c206-f516-4ee8-a4b7-89ad880647db' {
menuentry 'CentOS Linux (0-rescue-f073c429a7456b53ec3e2c53460c5c8f) 7 (Core)' --class centos --class gnu-linux --class gnu --class os --unrestricted $menuentry_id_option 'gnulinux-0-rescue-f073c429a7456b53ec3e2c53460c5c8f-advanced-6f15c206-f516-4ee8-a4b7-89ad880647db' {

I make it the default one (menuentry 0):

[yugabyte]$ sudo sed -e '/^GRUB_DEFAULT=saved/s/=.*/=0/' -i /etc/default/grub
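Setting the default to 0 relies on the new kernel being the first menu entry. As a hedged sketch, the zero-based index can be looked up instead of assumed. `grub_entry_index` is a hypothetical helper name, and it only counts top-level `menuentry` lines, as in the listing above.

```shell
# Hypothetical helper: print the zero-based index of the first menuentry
# whose title contains the given text.
# Usage: grub_entry_index <grub.cfg path> <text to look for>
grub_entry_index() {
  awk -F"'" -v pat="$2" '
    BEGIN { n = 0 }
    /^menuentry / {
      # $2 is the title between the first pair of single quotes
      if (index($2, pat) > 0) { print n; exit }
      n++
    }' "$1"
}

# grub.cfg is usually readable by root only, so run this with sudo.
if [ -r /boot/grub2/grub.cfg ]; then
  grub_entry_index /boot/grub2/grub.cfg elrepo
fi
```

An empty result means no entry matched, in which case the default should not be changed.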


I need to add psi=1 to the kernel command line, which I do through a tuned profile:

[yugabyte]$ sudo mkdir -p /etc/tuned/psi && sudo tee /etc/tuned/psi/tuned.conf <<'TAC'

[main]
  summary=Enable Pressure Stall Information as in https://dev.to/aws-heroes/pressure-stall-information-on-ec2-centos-7-2nbb-temp-slug-5559720
[bootloader]
cmdline=psi=1

TAC

Checking current profile:

[yugabyte]$ tuned-adm profile
Current active profile: virtual-guest

Adding the new one:

[yugabyte]$ sudo tuned-adm profile virtual-guest psi


Enabling all these GRUB changes:

[yugabyte]$ sudo grub2-mkconfig -o /boot/grub2/grub.cfg


The node is now ready to reboot. This is where I appreciate running a distributed database (YugabyteDB 🚀), as I can do a rolling restart of the nodes without application interruption.
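Once the node is back, a quick sanity check is to confirm that the running kernel was really booted with psi=1. The sketch below assumes nothing beyond `/proc/cmdline`. `has_psi_enabled` is a hypothetical helper name.

```shell
# Hypothetical helper: succeeds when the given kernel command line
# contains psi=1 as a separate word.
has_psi_enabled() {
  case " $1 " in
    *" psi=1 "*) return 0 ;;
    *) return 1 ;;
  esac
}

if [ -r /proc/cmdline ]; then
  if has_psi_enabled "$(cat /proc/cmdline)"; then
    echo "psi=1 is on the kernel command line"
  else
    echo "psi=1 not found in /proc/cmdline"
  fi
fi
```

If psi=1 is missing, the tuned profile or the regenerated grub.cfg is the first place to look.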

Checking it after the reboot:

[yugabyte]$ tail /proc/pressure/*

==> /proc/pressure/cpu <==
some avg10=27.80 avg60=25.88 avg300=16.13 total=77572758
full avg10=0.98 avg60=0.94 avg300=0.55 total=4422080

==> /proc/pressure/io <==
some avg10=12.03 avg60=13.02 avg300=7.36 total=32530366
full avg10=4.73 avg60=5.33 avg300=3.08 total=15034660

==> /proc/pressure/memory <==
some avg10=0.12 avg60=0.02 avg300=0.00 total=309168
full avg10=0.12 avg60=0.02 avg300=0.00 total=307455

Now it remains to interpret it. I explained a bit in a past blog post, and the full description is on www.kernel.org. Basically, the "some" line shows the percentage of time during which at least one task is stalled on the resource, and the "full" line the percentage of time during which all non-idle tasks are stalled simultaneously, averaged over the last 10 seconds, 1 minute, and 5 minutes. The "total" field is the cumulative stall time in microseconds. So if you feel something is slow and should be faster, don't scale blindly: you now know which resource is responsible for the response time.
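To use these numbers in a script rather than read them by eye, the files are easy to parse. The following is a sketch, not a monitoring tool: `psi_value` and `psi_stall_percent` are hypothetical helper names. The second one relies on the "total" counter being cumulative stall time in microseconds, as described in the kernel documentation, to compute a percentage over any interval of your choice.

```shell
# Hypothetical helper: print one field from a PSI file.
# Usage: psi_value <file> <some|full> <avg10|avg60|avg300|total>
psi_value() {
  awk -v kind="$2" -v field="$3" '
    $1 == kind {
      for (i = 2; i <= NF; i++) {
        split($i, kv, "=")
        if (kv[1] == field) print kv[2]
      }
    }' "$1"
}

# Hypothetical helper: stall percentage between two "total" samples
# (microseconds) taken <seconds> apart.
psi_stall_percent() {
  awk -v a="$1" -v b="$2" -v s="$3" \
    'BEGIN { printf "%.2f\n", (b - a) / (s * 1000000) * 100 }'
}

# One line per resource with the 10-second "some" average.
for res in cpu io memory; do
  f="/proc/pressure/$res"
  if [ -r "$f" ]; then
    printf '%-6s some avg10=%s%%\n' "$res" "$(psi_value "$f" some avg10)"
  fi
done
```

For example, two samples of `psi_value /proc/pressure/io some total` taken 10 seconds apart can be passed to `psi_stall_percent` to get the share of that window in which at least one task was stalled on I/O.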
