This blog post is about Linux memory management, and specifically about a question that has probably been asked of every operating system throughout history: how much free memory do I need for the system to be healthy?
To start off, a paraphrase of a Star Wars quote: 'This is not the free memory you're looking for.'
What this quote means is that while the free memory statistic obviously shows free memory, what you are actually looking for is the amount of memory that can be used for memory allocations on the system. On Linux, these are not the same thing.
Free memory is directly available, truly unused memory, which can be handed straight to a process that needs a free page. A free page is produced by the kernel page daemon (commonly named 'swapper'), or by a process that explicitly frees memory.
The Linux operating system keeps only a small amount of memory free, in order to use memory as optimally as it can. One of the many optimizations in Linux is to keep memory in use for a purpose, such as holding a page previously read from disk. It doesn't make sense to free every used page right after use; in fact, radically cleaning all used pages after use would eliminate the (disk) page cache.
In Linux, there are no settings for dedicating a memory area as disk cache. Instead, it essentially takes all otherwise unused memory and keeps the data in it available for as long as there isn't another, better purpose for the page.
However, there must be some sort of indicator that tells you how much memory the operating system can actually use, and I just said that free memory is not that indicator. So what is? Since this commit, the kernel has had the notion of 'available memory' (unsurprisingly, 'MemAvailable' in /proc/meminfo). This is the amount of memory that could be used if memory is needed by any process. So, if you want to know how much memory can be used on a Linux system, available memory is the statistic to look at, not free memory.
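To make this concrete, here is a minimal sketch of reading both statistics from /proc/meminfo. The field names (MemFree, MemAvailable) are the real ones; the sample text and its values below are illustrative, and the parse_meminfo helper is my own name, not a standard API.

```python
# Illustrative /proc/meminfo excerpt; the values are made up.
SAMPLE = """\
MemTotal:        8029768 kB
MemFree:          164204 kB
MemAvailable:    5342184 kB
Buffers:          213212 kB
Cached:          4892320 kB
"""

def parse_meminfo(text):
    """Return a dict of field name -> value in kB."""
    info = {}
    for line in text.splitlines():
        name, rest = line.split(":", 1)
        info[name] = int(rest.split()[0])  # first token after ':' is the kB value
    return info

info = parse_meminfo(SAMPLE)
print(info["MemFree"], info["MemAvailable"])
# On a real Linux system: parse_meminfo(open("/proc/meminfo").read())
```

Note how MemAvailable is far larger than MemFree in the sample: the difference is mostly reclaimable page cache.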
An obvious question at this point: if available memory is what you should actually look at when assessing memory health, which is traditionally done by looking at free memory, why does free memory exist at all, rather than just available memory?
Free memory is really just a minimal pool of pages freed up front to quickly provide the kernel or a process with pages, up to the amount set by the kernel parameter vm.min_free_kbytes. The main reason for doing this is performance: if a process requires a page and has to find an available one itself, it must stop processing and scan memory for pages that can be reused, which takes time and CPU. And once such a page is found, it still needs to be freed, that is, removed from the lists through which it could be found for its original contents. A free page has all that work done up front and can simply be taken from the list.
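A small sketch of how you might relate MemFree to this floor. Reading /proc/sys/vm/min_free_kbytes is the real interface; the fallback value and the helper names are illustrative assumptions of mine.

```python
import os

def read_min_free_kbytes(path="/proc/sys/vm/min_free_kbytes"):
    """Read vm.min_free_kbytes on Linux; fall back to a made-up value elsewhere."""
    if os.path.exists(path):
        with open(path) as f:
            return int(f.read())
    return 67584  # illustrative fallback, in kB

def free_pages_headroom(memfree_kb, min_free_kb):
    """How far MemFree currently sits above the kernel's free-page floor, in kB."""
    return memfree_kb - min_free_kb

print(free_pages_headroom(164204, read_min_free_kbytes()))
```

On a healthy, busy system this headroom tends to be small, because the kernel deliberately keeps free memory close to the floor.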
The next obvious question: okay, so I need to look at available memory; should I monitor for a certain amount, such as a percentage of total memory? Despite being an obvious question, it cannot be answered without understanding the system and, especially, the application running on it. The amount of memory that needs to be available depends on how much allocating and freeing the running processes collectively perform, which is specific to each machine and the application it is serving.
The way free memory works can be seen on a Linux system by monitoring the free memory statistic (MemFree in /proc/meminfo): after a system has started up, a certain (large) amount of memory is typically shown as free, and that amount declines at a rate set by the memory allocation eagerness of the processes running on it. Once it's down to close to vm.min_free_kbytes, you will see it fluctuate a bit, but stay there. If free memory remains consistently higher than that, and not because of ongoing allocating and freeing, it simply means that memory is never used. In general, free memory mostly hovers around vm.min_free_kbytes, which also means that monitoring free memory doesn't really tell you anything.
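The hovering behaviour can be expressed as a simple check over a series of MemFree samples. The function name, the tolerance band, and the sample numbers below are all illustrative assumptions, not kernel-defined values.

```python
def hovering_near_floor(samples_kb, min_free_kb, band=0.25):
    """True when every MemFree sample stays within `band` (a fraction)
    of vm.min_free_kbytes, i.e. free memory has settled at the floor."""
    return all(abs(s - min_free_kb) <= band * min_free_kb for s in samples_kb)

settled = [68000, 71000, 66500, 69800]    # fluctuating around the floor
fresh_boot = [5200000, 4900000, 4600000]  # still draining after startup
print(hovering_near_floor(settled, 67584))     # True
print(hovering_near_floor(fresh_boot, 67584))  # False
```

In practice you would feed this with MemFree values sampled from /proc/meminfo at regular intervals.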
This is different if processes free large amounts of memory: those pages are returned to free memory, pushing the statistic well above vm.min_free_kbytes. This is application dependent, and not at all uncommon: process-based databases such as PostgreSQL or Oracle can allocate (huge) private heaps per individual process for data processing, and free them when they are no longer needed.
Closely monitoring the free and available memory statistics in the case of PostgreSQL (and therefore Yugabyte YSQL) shows that memory allocations tend to be bursty. That means that most of the time there is no gradual increase in any memory statistic against which warning and alarm thresholds could be set to flag a potential low-memory scenario. Instead, the general pattern I witness is that memory usage moves up and down, but at certain times peaks very rapidly, causing the swapper to free memory, and sometimes peaks so sharply that processes are forced to stall and perform direct memory reclaim themselves.
If the peak is short enough, this passes largely unnoticed from a userland perspective: direct reclaim together with the swapper freeing memory quickly returns the system to a normal memory state and normal performance. Depending entirely on the specific situation, when lots of memory is in active use this is actually not a rare event.
If the memory pressure peak lasts longer and gets a system close to exhaustion, there are two scenarios that I see, which do not exclude each other, in no particular order:
If a process cannot find enough pages to satisfy its need after scanning all memory, it invokes the OOM (out of memory) killer. The OOM killer performs a calculation roughly based on memory usage and OOM priority, and terminates one process with the intent to restore memory availability, and thus normal functioning and performance.
The system gets into a state commonly called 'thrashing', which the Wikipedia article describes very well. The essence is that, because of memory over-allocation, memory management eats up all the time of every process performing memory tasks, bringing the system close to a standstill, while no process is actually unable to find available pages, so the OOM killer is not invoked.
Both obviously have a profound impact on global performance and should be prevented. The general recommendation is to keep memory allocations significantly lower than total memory, so there is memory left for the kernel, the page cache, and any peaks in memory usage by one or more processes.
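One way to act on that recommendation is to classify MemAvailable as a fraction of MemTotal. A minimal sketch follows; the 20%/5% thresholds and the function names are illustrative choices of mine, not kernel-defined values, and as argued above any real thresholds must be tuned per workload.

```python
def memory_pressure_level(mem_available_kb, mem_total_kb,
                          warn=0.20, critical=0.05):
    """Rough classification of memory health from MemAvailable/MemTotal.
    Thresholds are illustrative; tune them per machine and application."""
    frac = mem_available_kb / mem_total_kb
    if frac < critical:
        return "critical"
    if frac < warn:
        return "warning"
    return "ok"

# Using the sample /proc/meminfo values from earlier in the post:
print(memory_pressure_level(5342184, 8029768))  # "ok"
```

Because allocations are bursty, such a check is best evaluated on frequent samples rather than occasional snapshots, so that short peaks are not missed.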
The conclusion is that monitoring free memory with the intent of understanding how much memory is available is wrong. Another statistic provides that information: available memory. In order to make a Linux system perform optimally, it should have enough available memory so that processing is impacted as little as possible.