Note: I've added a table of contents to previous installments so they'll hopefully be easier to navigate. Thanks to derlin for the nifty TOC generation tool!
- Security Note
- Basic Server
- Permissions Dropping
- Socket Server
- A Thread Story or GIL Steals Your Lunch Money
- Multiprocessing
- Blocking and Polling via Selector
- Conclusion
So far we've seen the basic communication pattern for networking and the three low-level protocols that make up the core of Internet communication. Now we're going to look at how servers operate. Servers range from the Bulletin Board Systems (BBS) of back in the day to modern web servers hosting millions of clients. This article will start with a simple Python server and slowly add more functionality to how it serves data.
Security Note
The code here does not guard against malicious attacks carried out by manipulating how client data is sent. It's only meant to show the basics of how each type of server works. If you're working with a public-facing service you should really have a reverse proxy, and even a firewall, in front of it to handle such attacks. I generally prefer doing it at that layer since it's easier to handle network hardening in easy-to-update software than to try to handle it across who knows how many codebases. So basically:
Don't use any of this in production
Basic Server
Most servers have a workflow of:
- Bind to a port
- Start listening for traffic
- Accept a connection
- Deal with the connection
- Close the connection
So we'll start with an echo server that simply replies back to the client with what it was sent. Here is some example code from the python documentation:
# Echo server program
import socket

HOST = ''      # Symbolic name meaning all available interfaces
PORT = 50007   # Arbitrary non-privileged port
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
    s.bind((HOST, PORT))
    s.listen(1)
    conn, addr = s.accept()
    with conn:
        print('Connected by', addr)
        while True:
            data = conn.recv(1024)
            if not data: break
            conn.sendall(data)
And the client:
# Echo client program
import socket

HOST = 'localhost'   # The remote host
PORT = 50007         # The same port as used by the server
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
    s.connect((HOST, PORT))
    s.sendall(b'Hello, world')
    data = s.recv(1024)
print('Received', repr(data))
The results are:
> python .\simple_server.py
Connected by ('127.0.0.1', 53811)
>
> python .\simple_client.py
Received b'Hello, world'
>
Before continuing with this I'd like to take a moment to discuss port bind permissions.
Permissions Dropping
One interesting thing to note: per the IANA well-known ports listing, port number 7 is actually designated for an echo service. If I try to bind this port in Windows as a non-privileged user:
TCP 0.0.0.0:7 0.0.0.0:0 LISTENING 24664
It happily complies with the request (though you may need a one-time Windows firewall exception). Linux, on the other hand:
s.bind((HOST, PORT))
PermissionError: [Errno 13] Permission denied
The port bind gets rejected. This will happen on most any *NIX-like system, since ports below 1024 are considered privileged and normally require root to bind. Now we could just run it as root to solve the problem:
# python3 SimpleServer/simple_server.py
Server bound to port 7
But in general running services as root is not desirable, since if someone manages to exploit the server they could potentially gain full control of the system. To get around this we can utilize os.setuid and os.setgid. The code then becomes something like this:
# Echo server program
import socket, os, pwd, grp

HOST = ''   # Symbolic name meaning all available interfaces
PORT = 7    # Well known echo port

# https://stackoverflow.com/a/2699996
def drop_privileges(uid_name='nobody', gid_name='nogroup'):
    if os.getuid() != 0:
        # We're not root so, like, whatever dude
        return

    # Get the uid/gid from the name
    running_uid = pwd.getpwnam(uid_name).pw_uid
    running_gid = grp.getgrnam(gid_name).gr_gid

    # Remove group privileges
    os.setgroups([])

    # Try setting the new uid/gid
    os.setgid(running_gid)
    os.setuid(running_uid)

    # owner/group r+w, everyone else blocked (note: octal, not hex)
    old_umask = os.umask(0o007)

with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
    s.bind((HOST, PORT))
    print(f'Server bound to port {PORT}')
    drop_privileges()
    s.listen(1)
    conn, addr = s.accept()
    with conn:
        print('Connected by', addr)
        while True:
            data = conn.recv(1024)
            if not data: break
            conn.sendall(data)
This will drop permissions to a specific user, with "nobody" and "nogroup" by default. The pwd.getpwnam call obtains the entry for the user in the UNIX password database, (most of the time will be /etc/passwd
) and grp.getgrnam does the same for the UNIX group database (most of the time will be /etc/group
). After running this we can see the port is bound, but the process is running as nobody
:
# python3 SimpleServer/simple_server_drop_priv.py
Server bound to port 7
$ pgrep -a -u nobody
285691 python3 SimpleServer/simple_server_drop_priv.py
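For reference, the pwd.getpwnam and grp.getgrnam lookups the script relies on can also be checked interactively. A quick sketch (assuming your distribution has a "nogroup" group, as Debian/Ubuntu do; the exact numeric ids vary by distribution):
import pwd, grp

# The same lookups drop_privileges() performs, shown on their own
print(pwd.getpwnam('nobody').pw_uid)    # e.g. 65534 on many distros
print(grp.getgrnam('nogroup').gr_gid)   # e.g. 65534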
umask is related to the permissions of files and directories created by the process. The 007 I have set allows user and group full access to the files, while all other users are blocked from access. This means I could change the process group to something like "serveradmin" and users in that group would be able to interact with the server's files. Alex Juarez has a good article on permissions in general. This Stack Overflow answer also has an interesting look at the nuances of how umask operates.
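To make the umask effect concrete, here's a minimal sketch that creates a throwaway file and inspects the resulting mode:
import os, stat

os.umask(0o007)   # mask out all permission bits for "other"
fd = os.open('demo.txt', os.O_CREAT | os.O_WRONLY, 0o666)
os.close(fd)
print(oct(stat.S_IMODE(os.stat('demo.txt').st_mode)))   # 0o660: rw for owner/group only
os.remove('demo.txt')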
Socket Server
Now the problem with the existing server is that it handles a single connection and then exits. That's not practical for something like a web server, which needs to constantly serve clients. We could make some modifications to have it continually serve connections:
# Echo server program
import socket

HOST = ''     # Symbolic name meaning all available interfaces
PORT = 9999   # Arbitrary non-privileged port
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
    s.bind((HOST, PORT))
    s.listen()
    while True:
        conn, addr = s.accept()
        print('Connected by', addr)
        with conn:
            data = conn.recv(1024).strip()
            print("{} wrote:".format(addr))
            print(data)
            conn.sendall(data)
But it's fairly low level and not very extensible as-is. Thankfully Python has the socketserver module to provide some abstraction when setting up a server. The Python docs also have an example for a socket server:
import socketserver

class MyTCPHandler(socketserver.BaseRequestHandler):
    """
    The request handler class for our server.

    It is instantiated once per connection to the server, and must
    override the handle() method to implement communication to the
    client.
    """

    def handle(self):
        # self.request is the TCP socket connected to the client
        self.data = self.request.recv(1024).strip()
        print("{} wrote:".format(self.client_address[0]))
        print(self.data)
        # just send back the same data, but upper-cased
        self.request.sendall(self.data.upper())

if __name__ == "__main__":
    HOST, PORT = "localhost", 9999

    # Create the server, binding to localhost on port 9999
    with socketserver.TCPServer((HOST, PORT), MyTCPHandler) as server:
        # Activate the server; this will keep running until you
        # interrupt the program with Ctrl-C
        server.serve_forever()
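Before benchmarking, here's a hypothetical quick test client to poke at the handler by hand, reusing the earlier echo client pattern (the host and port simply match the example above):
import socket

# Connect, send a line, and print the upper-cased echo that comes back
with socket.create_connection(("localhost", 9999)) as s:
    s.sendall(b"hello from a test client")
    print(s.recv(1024))   # b'HELLO FROM A TEST CLIENT'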
While the server creation is still somewhat imperative in nature, client connections are now handled via an object which inherits from socketserver.BaseRequestHandler. The implementing class must define a handle() method, and for TCP self.request holds the socket for the connection. Now to show multiple connections working I'll utilize the Apache HTTP server benchmarking tool. This is easily available in Ubuntu via sudo apt-get install apache2-utils:
$ ab -i -n 20 http://172.18.128.1:9999/
This makes several short HTTP requests to the server (I've adjusted the code to bind to the proper IP address); 20 requests in total will be executed against it. Being a benchmarking tool we also get some nice statistics, most notably:
Requests per second: 1992.99 [#/sec] (mean)
This is compared to the simple server infinite loop version:
Requests per second: 1745.88 [#/sec] (mean)
Now while the code layout has improved, there's still the issue of handling multiple clients at once. One interesting way to handle this is to separate the socket acceptance from the actual client handling.
A Thread Story or GIL Steals Your Lunch Money
Threads are one way of looking at this issue. Python's Global Interpreter Lock (GIL) complicates matters: the short story is that it makes threading less performant than in a language using native threads without such a lock. The long story is another full article. socketserver has a threading server wrapper to help with this:
import threading
import socketserver

class ThreadedTCPRequestHandler(socketserver.BaseRequestHandler):

    def handle(self):
        print("{} wrote:".format(self.client_address[0]))
        data = str(self.request.recv(1024), 'ascii')
        print(data)
        cur_thread = threading.current_thread()
        response = bytes("{}: {}".format(cur_thread.name, data), 'ascii')
        self.request.sendall(response)

if __name__ == "__main__":
    HOST, PORT = "localhost", 50007
    # ThreadingTCPServer spawns a new thread for each accepted connection
    server = socketserver.ThreadingTCPServer((HOST, PORT), ThreadedTCPRequestHandler)
    server.serve_forever()
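socketserver.ThreadingTCPServer is just TCPServer with socketserver.ThreadingMixIn mixed in; if you want more control you can build the same thing yourself, roughly like this:
import socketserver

class ThreadedTCPServer(socketserver.ThreadingMixIn, socketserver.TCPServer):
    daemon_threads = True        # worker threads exit when the main thread does
    allow_reuse_address = True   # avoid "Address already in use" on quick restarts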
While you do have to deal with the GIL, it's really not that bad for a modestly sized server.
Multiprocessing
When you run a server it gets identified by the system as a process. A process can in turn run another process (the top of the chain is the init process on most operating systems). These are often known as child processes, and the process that spawned them is the parent process. The multiprocessing module in the Python standard library is able to manage such child processes. Using this method a client is attached to a process for handling:
import multiprocessing as mp
import logging
import socket
import time

logger = mp.log_to_stderr(logging.DEBUG)

# https://stackoverflow.com/a/8545724
# With modifications for echo server
def worker(server_socket):
    # Each worker blocks on accept() against the shared listening socket
    while True:
        client, address = server_socket.accept()
        data = client.recv(1024)
        logger.debug("{u} connected".format(u=address))
        print(data)
        client.sendall(data)
        client.close()

if __name__ == '__main__':
    num_workers = 20

    serversocket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    serversocket.bind(('', 9999))
    serversocket.listen(20)

    workers = [mp.Process(target=worker, args=(serversocket,))
               for i in range(num_workers)]

    for p in workers:
        p.daemon = True
        p.start()

    while True:
        try:
            time.sleep(10)
        except:
            break
So this will create 20 worker processes, all blocking on accept() against the main server socket (yes, you can have multiple accept() calls on the same listening socket). Looking at the process list, we indeed see 21 Python processes: the main parent process and the 20 workers. Now the issue here is that while we've split the workload up, each worker process is still bound to finishing its client's communication before moving on to the next one. What if we could remove some of the barriers in waiting for client communication?
Blocking and Polling via Selector
It turns out that IO has a concept of blocking and non-blocking. Socket communication is blocking by default, meaning you have to wait for work like receiving data from a connection to finish before moving on. To get around this we can set socket communication to non-blocking via socket.setblocking, which makes the usual socket methods return right away (see the sketch after this list). Unfortunately this has two inherent issues with a standard setup:
- Not sending/receiving all the client data
- High CPU usage on accept() loops due to continually calling it
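For instance, on a non-blocking socket recv() raises immediately instead of waiting. A minimal sketch, assuming the earlier echo server is running on localhost:50007:
import socket

sock = socket.create_connection(('localhost', 50007))
sock.setblocking(False)
try:
    data = sock.recv(1024)   # returns right away, raising if nothing has arrived yet
except BlockingIOError:
    data = b''               # nothing to read yet; we'd have to retry later
print(data)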
To work around this there are several system calls that report when a connection is ready, and they're exposed by the selectors module. By using DefaultSelector the most optimal one for your operating system is chosen. As an example:
import socket
import selectors
import types
from io import BytesIO

host = "localhost"
port = 50007

def accept_wrapper(sock):
    conn, addr = sock.accept()
    print('accepted connection from', addr)
    conn.setblocking(False)
    data = types.SimpleNamespace(addr=addr, inb=b'', outb=BytesIO())
    sel.register(conn, selectors.EVENT_READ, data=data)

def service_connection(key, mask):
    sock = key.fileobj
    data = key.data
    if mask & selectors.EVENT_READ:
        try:
            recv_data = sock.recv(1024)
            if recv_data:
                data.outb.write(recv_data)
            else:
                # Client is done sending; switch to watching for writability
                sel.modify(sock, selectors.EVENT_WRITE, data=data)
        except:
            print('closing connection to', data.addr)
            sel.unregister(sock)
            sock.close()
    if mask & selectors.EVENT_WRITE:
        print('writing data to ', data.addr)
        sock.sendall(data.outb.getvalue())
        print('closing connection to', data.addr)
        sel.unregister(sock)
        sock.close()

sel = selectors.DefaultSelector()

lsock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
lsock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
lsock.bind((host, port))
lsock.listen()
print('listening on', (host, port))
lsock.setblocking(False)
sel.register(lsock, selectors.EVENT_READ, data=None)

while True:
    events = sel.select(timeout=None)
    for key, mask in events:
        if key.data is None:
            accept_wrapper(key.fileobj)
        else:
            service_connection(key, mask)
Now behind the scenes the selectors module has a few options available for how it does this. In the end, though, the process is:
- Set sockets to non-blocking
- Block until a client connects
- The accept will immediately return the client socket
- Add that socket to the list of sockets to check
- Block again, this time the client socket we just accepted is in the list of sockets to be notified about along with the server socket
- A socket is ready
- Go to 3 if it's the server socket
- If it's a client connection, run a handler against it to deal with the data
- Mostly loop back to 5, except we're not adding any new sockets
Which is pretty much how the loop goes. The main ways you'll generally deal with this are select(), poll(), and epoll(), and each of these has its own Selector implementation. Using DefaultSelector generally picks the most optimal one. In general, select() is not the best performer due to its limit of 1024 sockets it can check (though it does work on Windows). poll() is an enhanced version while still staying somewhat portable. Both select() and poll() essentially keep a list of sockets to look at and walk through it each time. epoll() on the other hand is more reactive, allowing it to handle a large number of sockets more efficiently than select() and poll(). That said, it's only available on Linux, which limits portability (not a huge issue given how easy it is to get a Linux server these days). Handling a large number of connections efficiently is often referred to as the C10k problem (or some variant of the k). Looking at the code now:
lsock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
lsock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
lsock.bind((host, port))
lsock.listen()
print('listening on', (host, port))
lsock.setblocking(False)
sel.register(lsock, selectors.EVENT_READ, data=None)
Here we have a normal socket bind and listen for the server. The server's socket is set to non-blocking and registered into the list of sockets we're interested in.
while True:
    events = sel.select(timeout=None)
    for key, mask in events:
        if key.data is None:
            accept_wrapper(key.fileobj)
        else:
            service_connection(key, mask)
Now we have a main event loop. For the server socket the data property is set to None. If that's the case we run the client socket accept handler; otherwise we're dealing with an existing connection that needs to be serviced.
def accept_wrapper(sock):
    conn, addr = sock.accept()
    print('accepted connection from', addr)
    conn.setblocking(False)
    data = types.SimpleNamespace(addr=addr, inb=b'', outb=BytesIO())
    sel.register(conn, selectors.EVENT_READ, data=data)
This will accept our connection and also set it to non-blocking. The next thing it does is set up a SimpleNamespace, which is nicely explained here. It will be attached to the socket as a way to keep state when dealing with it, allowing interaction between the read and write sides. outb is set to a BytesIO object, which is very performant for byte concatenation, something we'll be doing to keep track of the data read in.
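As a quick illustration of that accumulation pattern:
from io import BytesIO

buf = BytesIO()
for chunk in (b'Hello', b', ', b'world'):
    buf.write(chunk)          # appends in place instead of copying the whole buffer
print(buf.getvalue())         # b'Hello, world'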
def service_connection(key, mask):
    sock = key.fileobj
    data = key.data
    if mask & selectors.EVENT_READ:
        try:
            recv_data = sock.recv(1024)
            if recv_data:
                data.outb.write(recv_data)
            else:
                # Client is done sending; switch to watching for writability
                sel.modify(sock, selectors.EVENT_WRITE, data=data)
        except:
            print('closing connection to', data.addr)
            sel.unregister(sock)
            sock.close()
    if mask & selectors.EVENT_WRITE:
        print('writing data to ', data.addr)
        sock.sendall(data.outb.getvalue())
        print('closing connection to', data.addr)
        sel.unregister(sock)
        sock.close()
Now comes the interesting part. The code checks whether this is a read or a write event. By default the only thing being checked is whether a socket is ready for reading. When the client has finished sending we need to echo back, so we switch the registration to write mode. Then on the writing side we simply send back all the data we've gathered, close the socket, and remove it from the list of sockets we're interested in. The epoll() version gives a nice count on requests per second:
Requests per second: 6216.97 [#/sec] (mean)
You can force a specific selector by changing DefaultSelector to:
- SelectSelector ( select() )
- PollSelector ( poll() )
- EpollSelector ( epoll() )
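For example, to force the poll()-based implementation explicitly (keeping in mind PollSelector is not available on Windows):
import selectors

# Swap this in for selectors.DefaultSelector() in the example above
sel = selectors.PollSelector()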
Conclusion
I will say that this article, to me, is mostly about showing different server types. If you really need raw performance it might be better to consider a language built for it (such as Go, especially given its emphasis on networking) or dedicated software that deals with all the nuances of network communication. In fact, most of the time you won't need to deal with much of this in the modern cloud computing world: load balancers, containerized microservices, and many managed services handle much of it for you. If you just want to work with one to test things out, the blocking threaded socketserver is good enough in my opinion. Now that we've seen different types of servers, the next installment will look at a specialized type of server: HTTP.