DEV Community

Roman Belshevitz

32 Kernel’s Teeth for “Chewing” the Network Stack on Linux

Tuning the network stack is a narrow topic and a complex one at the same time. Many of the tips circulating today either no longer match the current default settings or describe mechanisms that modern kernels already include. Below is my compilation of what seems to be relevant at the moment. It is mostly about timeouts and memory consumption, since memory is now plentiful and cheap.

In this article I provide a set of recommended network configuration settings for optimizing TCP connections on a server. The suggested settings include adjusting parameters related to orphaned TCP sockets, reducing the timeout for sockets in the FIN-WAIT-2 state, configuring TCP keepalive checks, managing memory allocation for TCP connections, disabling syncookies, selecting a congestion control algorithm, expanding the local port range, enabling protection against TIME_WAIT attacks, increasing the maximum number of open sockets, adjusting buffer sizes for connections, and optionally disabling local ICMP packet redirects.

The settings discussed here aim to improve performance, memory usage, and security on powerful and busy servers.

1

⚙️ Increase the value of tcp_max_orphans, which sets the maximum number of orphaned TCP sockets, i.e. sockets that are no longer associated with any process. Each orphan can consume up to about 64 KB of memory, so size this parameter against the memory available on the server.
net.ipv4.tcp_max_orphans = 65536
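
To match this limit against available memory, a rough worst-case estimate is enough. The sketch below is only an illustration: the helper name is hypothetical, and it treats the 64 KB per-socket figure quoted above as an upper bound rather than a measured value.

```python
def orphan_memory_bytes(max_orphans: int, per_socket_kib: int = 64) -> int:
    """Worst-case memory held by orphaned sockets, in bytes.

    Assumes every orphan costs the full per_socket_kib (an upper bound).
    """
    return max_orphans * per_socket_kib * 1024


if __name__ == "__main__":
    worst_case = orphan_memory_bytes(65536)
    # 65536 orphans at 64 KiB each add up to 4 GiB in the worst case
    print(f"{worst_case / 1024**3:.1f} GiB")
```

If that worst case is more than the server can spare, a lower limit is probably the safer choice.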

2

⚙️ Decrease tcp_fin_timeout (the default is 60 seconds). This parameter sets the maximum time an orphaned socket can remain in the FIN-WAIT-2 state. A socket enters this state after it has closed its own side of the connection and is waiting for the other party to close theirs. Each such socket occupies only about 1.5 KB of memory, but that adds up when there are many of them.
net.ipv4.tcp_fin_timeout = 10

3

⚙️ Tune the keepalive checks for TCP connections that have the SO_KEEPALIVE option set: tcp_keepalive_time is the idle time, in seconds, after the last activity on the connection before checks begin (default 7200), tcp_keepalive_intvl is the interval in seconds between checks (default 75), and tcp_keepalive_probes is the number of unanswered checks after which the connection is considered dead (default 9).
net.ipv4.tcp_keepalive_time = 1800
net.ipv4.tcp_keepalive_intvl = 15
net.ipv4.tcp_keepalive_probes = 5
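
It can help to see how long a dead peer goes unnoticed with a given combination. The following is a minimal sketch with a hypothetical helper name; it simply adds the idle time to the time spent on unanswered probes and compares the values above with the Linux defaults (7200, 75, 9).

```python
def dead_peer_detection_seconds(time_s: int, intvl_s: int, probes: int) -> int:
    """Idle time before the first probe plus the time spent on unanswered probes."""
    return time_s + intvl_s * probes


if __name__ == "__main__":
    tuned = dead_peer_detection_seconds(1800, 15, 5)    # values recommended above
    default = dead_peer_detection_seconds(7200, 75, 9)  # Linux defaults
    print(f"tuned: {tuned} s, default: {default} s")
```

With the recommended values a silent peer is dropped after roughly 31 minutes instead of more than two hours.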

4

⚙️ Pay attention to the parameters net.ipv4.tcp_mem, net.ipv4.tcp_rmem, and net.ipv4.tcp_wmem. Their defaults depend heavily on the memory available on the server and are calculated automatically when the system boots. Each holds three values: tcp_mem is measured in memory pages, while tcp_rmem and tcp_wmem are measured in bytes. In general there is no need to modify them, but you can raise them manually when necessary.
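
Before changing these values it is worth looking at what the kernel calculated. The sketch below is an illustration under a few assumptions: the helper names are hypothetical, the sample numbers are made up, and the page size is taken to be 4 KiB unless the system reports otherwise. It relies on the documented fact that tcp_mem is expressed in pages while tcp_rmem and tcp_wmem are in bytes.

```python
import os
from pathlib import Path
from typing import Tuple


def parse_triple(text: str) -> Tuple[int, int, int]:
    """Parse the three whitespace-separated integers of tcp_mem/tcp_rmem/tcp_wmem."""
    first, second, third = (int(part) for part in text.split())
    return first, second, third


def tcp_mem_bytes(text: str, page_size: int = 4096) -> Tuple[int, int, int]:
    """Convert the tcp_mem thresholds from pages to bytes."""
    low, pressure, high = parse_triple(text)
    return low * page_size, pressure * page_size, high * page_size


if __name__ == "__main__":
    path = Path("/proc/sys/net/ipv4/tcp_mem")
    # Fall back to made-up sample values when not running on Linux
    raw = path.read_text() if path.exists() else "190791 254389 381582"
    page = os.sysconf("SC_PAGE_SIZE") if hasattr(os, "sysconf") else 4096
    for name, value in zip(("low", "pressure", "high"), tcp_mem_bytes(raw, page)):
        print(f"tcp_mem {name}: {value / 1024**2:.0f} MiB")
```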

5

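⚙️ Disable syncookies (enabled by default), which the kernel sends to the remote host when the SYN packet queue of a specific socket overflows. With syncookies disabled, SYN packets that arrive while the queue is full are dropped instead.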
net.ipv4.tcp_syncookies = 0

6

⚙️ Pay special attention to the TCP congestion control algorithm. There are many algorithms (cubic, htcp, bic, westwood, etc.), and it is hard to say definitively which one is best, because they show different results in different load scenarios. The kernel parameter tcp_congestion_control selects the algorithm:
net.ipv4.tcp_congestion_control = cubic
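
An algorithm can only be selected if the running kernel offers it; the list is exposed in /proc/sys/net/ipv4/tcp_available_congestion_control. The sketch below, with a hypothetical helper name and a made-up preference order, picks the first preferred algorithm that is actually available.

```python
from pathlib import Path
from typing import Iterable


def choose_algorithm(available: str, preferred: Iterable[str]) -> str:
    """Return the first preferred algorithm that the kernel lists as available."""
    names = available.split()
    for name in preferred:
        if name in names:
            return name
    raise ValueError(f"none of the preferred algorithms found in: {names}")


if __name__ == "__main__":
    path = Path("/proc/sys/net/ipv4/tcp_available_congestion_control")
    # Fall back to a sample list when the file is not present
    available = path.read_text() if path.exists() else "reno cubic"
    print(choose_algorithm(available, ["htcp", "westwood", "cubic", "reno"]))
```

Algorithms that are built as modules may need to be loaded before they show up in that list.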

7

⚙️ When the server has a large number of outbound connections, there may not be enough local ports for them. By default, the range 32768-60999 is used. It can be expanded:
net.ipv4.ip_local_port_range = 10240 65535
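
To see what the wider range buys, count the ports on each side. This is plain arithmetic wrapped in a hypothetical helper; both bounds of the range are inclusive.

```python
def port_count(low: int, high: int) -> int:
    """Number of local ports in an inclusive range."""
    return high - low + 1


if __name__ == "__main__":
    default = port_count(32768, 60999)
    expanded = port_count(10240, 65535)
    print(f"default: {default}, expanded: {expanded}")
```

The expanded range offers almost twice as many local ports. Make sure no service on the server listens on a port inside the new range.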

8

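⚙️ Enable protection against TIME-WAIT assassination as described in RFC 1337. With this setting the kernel ignores RST packets received for sockets in the TIME_WAIT state instead of closing them immediately. By default, it is disabled.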
net.ipv4.tcp_rfc1337 = 1

9

⚙️ The maximum length of the queue of connections waiting to be accepted on a listening socket has a relatively low default value: 128 in kernels up to 5.3, raised to 4096 in kernel 5.4. It makes sense to increase it on busy and powerful servers:
net.core.somaxconn = 16384
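
Keep in mind that somaxconn is only a ceiling: the application still has to request a large backlog in its listen() call, and the kernel silently truncates anything above the limit. The sketch below illustrates this with hypothetical helper names and an arbitrary loopback listener.

```python
import socket


def effective_backlog(requested: int, somaxconn: int) -> int:
    """Backlog the kernel actually uses: the request capped by somaxconn."""
    return min(requested, somaxconn)


def open_listener(port: int, backlog: int) -> socket.socket:
    """Open a loopback TCP listener that asks for the given backlog."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind(("127.0.0.1", port))
    srv.listen(backlog)  # values above somaxconn are truncated by the kernel
    return srv


if __name__ == "__main__":
    print(effective_backlog(16384, 4096))   # request is capped at the limit
    print(effective_backlog(511, 16384))    # a small request stays small
    listener = open_listener(0, 16384)      # port 0 lets the kernel pick a port
    listener.close()
```

Many servers ship with their own backlog setting, so raising somaxconn alone may change nothing until that setting is raised too.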

10

⚙️ On powerful and busy servers, you can increase the default and maximum socket buffer sizes for both receiving and transmitting. These parameters are measured in bytes. The default is 212992 bytes, or 208 KB.
net.core.rmem_default = 851968
net.core.wmem_default = 851968
net.core.rmem_max = 12582912
net.core.wmem_max = 12582912
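
One common way to reason about buffer limits is the bandwidth-delay product: the amount of data in flight on a link at a given speed and round-trip time. The sketch below uses a hypothetical helper and illustrative link figures (1 Gbit/s, 100 ms) that are my assumptions, not measurements.

```python
def bdp_bytes(bandwidth_bits_per_s: int, rtt_ms: int) -> int:
    """Bandwidth-delay product in bytes (bits/s * ms / 8 bits / 1000 ms)."""
    return bandwidth_bits_per_s * rtt_ms // 8000


if __name__ == "__main__":
    rmem_max = 12582912  # maximum from the settings above, 12 MiB
    needed = bdp_bytes(1_000_000_000, 100)  # assumed link: 1 Gbit/s, 100 ms RTT
    print(f"BDP: {needed} bytes, fits in rmem_max: {needed <= rmem_max}")
```

Under these assumptions a single connection needs about 12.5 MB in flight, which the 12 MiB maximum just covers.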

11

⚙️ Disable accepting and sending ICMP redirects. This should only be done if your server does not act as a router, i.e., if you have a regular web server.
net.ipv4.conf.all.accept_redirects = 0
net.ipv4.conf.all.secure_redirects = 0
net.ipv4.conf.all.send_redirects = 0
Additionally, you can completely disable kernel-level responses to ICMP echo requests (ping) with net.ipv4.icmp_echo_ignore_all = 1. It is not a common practice.
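
Before applying any of the settings above, it can be useful to see which ones actually differ from what the kernel currently uses. The following is a rough sketch with hypothetical helper names. It relies on the fact that a sysctl key such as net.ipv4.tcp_fin_timeout maps to the file /proc/sys/net/ipv4/tcp_fin_timeout, and it does not handle interface names that contain dots.

```python
from pathlib import Path
from typing import Dict, Optional, Tuple


def parse_sysctl_conf(text: str) -> Dict[str, str]:
    """Parse 'key = value' lines, skipping blank lines and # or ; comments."""
    settings: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", ";")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            # Normalize whitespace so "10240   65535" equals "10240 65535"
            settings[key.strip()] = " ".join(value.split())
    return settings


def proc_path(key: str) -> Path:
    """Map a sysctl key to its file under /proc/sys."""
    return Path("/proc/sys") / key.replace(".", "/")


def diff_against_kernel(settings: Dict[str, str]) -> Dict[str, Tuple[Optional[str], str]]:
    """Return {key: (current, wanted)} for every setting that differs."""
    report: Dict[str, Tuple[Optional[str], str]] = {}
    for key, wanted in settings.items():
        try:
            current: Optional[str] = " ".join(proc_path(key).read_text().split())
        except OSError:
            current = None  # key missing or unreadable on this system
        if current != wanted:
            report[key] = (current, wanted)
    return report


if __name__ == "__main__":
    sample = """
    net.ipv4.tcp_fin_timeout = 10
    net.core.somaxconn = 16384
    net.ipv4.ip_local_port_range = 10240 65535
    """
    for key, (current, wanted) in diff_against_kernel(parse_sysctl_conf(sample)).items():
        print(f"{key}: current={current} wanted={wanted}")
```

The script only reads values; applying them is still done with sysctl or a file under /etc/sysctl.d.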

Sources:
a. https://man7.org/linux/man-pages/man7/tcp.7.html
b. https://cr.yp.to/syncookies.html
c. https://blog.cloudflare.com/when-tcp-sockets-refuse-to-die/
d. https://blog.cloudflare.com/unbounded-memory-usage-by-tcp-for-receive-buffers-and-how-we-fixed-it/
e. https://www.geeksforgeeks.org/tcp-connection-termination/

Thanks to Mike Freemon and Marek Majkowski (@majek04).

Top comments (1)

Chris F

Some good tips here. Would be nice if each tip explained whether it was a performance, memory, or security enhancement. Also, some of them are repeated.