Daniele Polencic

Posted on Mar 6, 2023

CPU requests and limits in Kubernetes

#kubernetes #devops

In Kubernetes, what should I use as CPU requests and limits?

Popular answers include:

Always use limits!
NEVER use limits, only requests!
I don't use either; is it OK?

Let's dive into it.

In Kubernetes, you have two ways to specify how much CPU a pod can use:

Requests usually describe the average consumption you expect.
Limits set the maximum amount of CPU allowed.

The Kubernetes scheduler uses requests to determine where the pod should be allocated in the cluster.

Since the scheduler doesn't know the consumption (the pod hasn't started yet), it needs a hint.

But it doesn't end there.

CPU requests are also used to divide the CPU among your containers.

Let's have a look at an example:

A node has a single CPU.
Container A has requests equal to 0.1 vCPU.
Container B has requests equal to 0.2 vCPU.

What happens when both containers try to use 100% of the available CPU?

Since the CPU request doesn't limit consumption, both containers will use all available CPUs.

However, since container B's request is double container A's, the final CPU distribution is: container A uses about 0.33 vCPU and container B about 0.67 vCPU (double the amount).
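
Here is a minimal sketch of that proportional split. It assumes every container is fully CPU-bound and ignores system overhead; the `share_cpu` helper is made up for illustration and is not a Kubernetes API.

```python
def share_cpu(node_cpus: float, requests: dict[str, float]) -> dict[str, float]:
    """Split the node's CPU among fully busy containers in proportion to their requests."""
    total = sum(requests.values())
    if total <= 0:
        raise ValueError("at least one container needs a non-zero request")
    return {name: node_cpus * req / total for name, req in requests.items()}


# One CPU, container A requests 0.1 vCPU and container B requests 0.2 vCPU.
shares = share_cpu(1.0, {"A": 0.1, "B": 0.2})
print({name: round(cpu, 2) for name, cpu in shares.items()})
# {'A': 0.33, 'B': 0.67}
```

The requests act as weights: only the ratio between them matters once the CPU is contended.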

Requests are suitable for:

Setting a baseline (give me at least X amount of CPU).
Setting relationships between pods (pod A uses twice as much CPU as pod B).

But they do not help set hard limits.

For that, you need CPU limits.

When you set a CPU limit, you define a period and quota.

Example:

period: 100000 microseconds (0.1s).
quota: 10000 microseconds (0.01s).

I can only use the CPU for 0.01 seconds every 0.1 seconds.

In Kubernetes, that's written as a CPU limit of "100m" (100 millicores, or 0.1 CPU).
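
The arithmetic between quota, period and millicores can be sketched like this. The function names are hypothetical, and the 100000 microsecond default period is simply the value from the example above.

```python
def limit_in_millicores(quota_us: int, period_us: int) -> int:
    """Convert a quota/period pair (in microseconds) to millicores."""
    return quota_us * 1000 // period_us


def quota_for_limit(millicores: int, period_us: int = 100_000) -> int:
    """Convert a CPU limit in millicores to a quota for the given period."""
    return millicores * period_us // 1000


print(limit_in_millicores(10_000, 100_000))  # 100
print(quota_for_limit(100))                  # 10000
```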

If your container has a hard limit and wants more CPU, it has to wait for the next period.

Your process is throttled.
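
To get a feel for what throttling costs, here is a simplified model. It assumes a single-threaded, CPU-bound task that starts at the beginning of a period; `wall_clock_us` is a hypothetical helper, not how the kernel accounts for time.

```python
def wall_clock_us(cpu_needed_us: int, quota_us: int, period_us: int = 100_000) -> int:
    """Estimate the wall-clock time to burn cpu_needed_us of CPU under a quota."""
    if cpu_needed_us < 0 or quota_us <= 0 or period_us <= 0:
        raise ValueError("cpu_needed_us must be >= 0; quota and period must be > 0")
    # A single thread cannot use more CPU time than the period itself.
    quota_us = min(quota_us, period_us)
    full_periods, remainder = divmod(cpu_needed_us, quota_us)
    if remainder == 0:
        if full_periods == 0:
            return 0
        # The task finishes as soon as the last quota is used up.
        return (full_periods - 1) * period_us + quota_us
    # Every exhausted quota costs a whole period, then the leftover work runs.
    return full_periods * period_us + remainder


# 30ms of CPU work with a 100m limit: run 10ms, wait 90ms, twice, then run 10ms.
print(wall_clock_us(30_000, 10_000))  # 210000
```

In this model, 30ms of work takes 210ms of wall-clock time, which is why throttled apps often show up as latency spikes.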

So what should you use as CPU requests and limits in your Pods?

A simple (but not accurate) way is to calculate the smallest CPU unit as:

REQUEST = NODE_CORES * 1000 / MAX_NUM_PODS_PER_NODE

For a 1 vCPU node and a limit of 10 Pods, that's a 1 * 1000 / 10 = 100m request.
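
The same formula as a small sketch. The helper name is made up, and rounding down to a whole millicore is an assumption of this example.

```python
def smallest_cpu_unit(node_cores: int, max_pods_per_node: int) -> str:
    """Return the smallest CPU request unit, formatted in millicores."""
    if node_cores <= 0 or max_pods_per_node <= 0:
        raise ValueError("node_cores and max_pods_per_node must be positive")
    millicores = node_cores * 1000 // max_pods_per_node
    return f"{millicores}m"


print(smallest_cpu_unit(1, 10))   # 100m
print(smallest_cpu_unit(4, 110))  # 36m
```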

Assign the smallest unit or a multiple of it to your containers.

For example, if you don't know how much CPU Pod A needs, but you know Pod B needs twice as much, you could set:

Request A: 1 unit
Request B: 2 units

If the containers use 100% CPU, they share the CPU according to their weights (1:2).

A better approach is to monitor the app and derive the average CPU utilization.

You can do this with your existing monitoring infrastructure or use the Vertical Pod Autoscaler to monitor and report the average request value.
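
If you export usage samples from your monitoring, deriving a request from the average could look like this sketch. The sample values are invented for illustration, and `suggest_request` is a hypothetical helper, not what the Vertical Pod Autoscaler runs.

```python
from statistics import mean


def suggest_request(samples_millicores: list[int]) -> str:
    """Suggest a CPU request equal to the average observed usage."""
    if not samples_millicores:
        raise ValueError("need at least one usage sample")
    return f"{round(mean(samples_millicores))}m"


# Made-up usage samples in millicores, for example one per minute.
usage = [80, 120, 100, 140, 60]
print(suggest_request(usage))  # 100m
```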

How should you set the limits?

Your app might already have "hard" limits. (Node.js is single-threaded and uses up to 1 core even if you assign 2).
You could have: limit = 99th percentile + 30-50%.
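
The percentile rule can be sketched as follows. It uses a nearest-rank percentile and made-up usage samples; both helper names are hypothetical, and the headroom is passed as a whole percentage to keep the arithmetic in integers.

```python
def percentile(samples: list[int], p: int) -> int:
    """Nearest-rank percentile: the smallest sample covering p% of the data."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = max(1, (p * len(ordered) + 99) // 100)  # integer ceiling of p * n / 100
    return ordered[rank - 1]


def suggest_limit(samples_millicores: list[int], headroom_pct: int = 30) -> str:
    """Suggest a CPU limit: 99th percentile plus headroom, rounded up."""
    p99 = percentile(samples_millicores, 99)
    limit = (p99 * (100 + headroom_pct) + 99) // 100
    return f"{limit}m"


# Made-up samples: mostly 120m, with two spikes.
usage = [120] * 98 + [200, 450]
print(percentile(usage, 99))    # 200
print(suggest_limit(usage))     # 260m
print(suggest_limit(usage, 50)) # 300m
```

Note how the single 450m spike is ignored by the 99th percentile, so the limit would still throttle that burst.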

You should profile the app (or use the VPA) for a more detailed answer.

Should you always set the CPU request?

Absolutely, yes.

This is a standard good practice in Kubernetes and helps the scheduler allocate pods more efficiently.

Should you always set the CPU limit?

This is a bit more controversial, but, in general, I think so.

You can find a deeper dive here: https://dnastacio.medium.com/why-you-should-keep-using-cpu-limits-on-kubernetes-60c4e50dfc61

And finally, if you've enjoyed this thread, you might also like:

The Kubernetes workshops that we run at Learnk8s https://learnk8s.io/training
This collection of past threads https://twitter.com/danielepolencic/status/1298543151901155330
The Kubernetes newsletter I publish every week https://learnk8s.io/learn-kubernetes-weekly

Top comments (6)

Maciej Wakuła • Mar 13 '23

That is a good article but we should focus on the use cases (I'll try to prepare a response for this nice article).
You can have an environment that must remain under control and stable (with limits), or you might have a cloud (likely your own) with few resources and very little load, but peak load on a very few containers. This needs to be analyzed, though, with different containerization methods and their ability to free allocated resources.

Daniele Polencic • Mar 13 '23 • Edited

I agree, there's more to it as well. This article doesn't cover:

Overhead from container runtime (modelled with the RuntimeClass).
QoS for Pods.
CPU limits have broad implications in the actual app deployed. The JVM for example has specific settings and needs limits. I remember reading something similar for Go apps too.
Quotas.

So I expect plenty of criticism 😅

Maciej Wakuła • Mar 13 '23

Still - this is a very good and solid article :)

Daniele Polencic • Mar 13 '23

Thank you!

Alexandru Lazarev • Jan 15 '25

Hi @danielepolencic
Great article explained in simple terms, but still - a question :)
What is the difference between the "Normal" vs "Observed" in (for example) p99 CPU Usage?

And small remark for use-case is Node CPU Usage = 60% - I guess it is not mandatory that container will respect requests ratio, since there is available CPU and container with less requests need more - it will take more while container with high request don't need also much more.
P.S. There also seems to be a typo in the picture: pod #1 is labeled with the lower CPU usage % but is drawn with the higher one, no?

Daniele Polencic • Mar 21 '25

What do you mean by "normal"?

I guess it is not mandatory that container will respect requests ratio, since there is available CPU and container with less requests need more

This is not correct. Requests are guaranteed. If you don't use them, they are lost. No other process can get them, even if they are free.