Anton Gubarev

Posted on May 29, 2022

Distributed locks in go without changing the app

#go #distributedsystems #lock #linux

Intro

For processes that must only ever run as a single instance in a distributed system, we implement locks. There are many suitable tools for this, mainly key-value stores. Locking is usually added as the project grows, and most likely directly in the application code. As a result, the application starts to know something about the infrastructure around it. Should it? Of course not. This makes it harder for developers to run the application in a dev or test environment, and if the infrastructure changes, the service has to be reworked again.
I searched the web for ready-made solutions to this problem but found nothing suitable. In this article series, I’ll show how to implement distributed locks without changing the code of the application itself and without taking up developers’ time (especially valuable if you have dozens or hundreds of services).
This is the first part, where I introduce Linux processes and show how to run and control them programmatically.

Processes

To reach the main goal, you need to understand how processes work in Linux. In the kernel, a process is represented simply as a structure with attributes and a state. The structure has many attributes, but only a few matter to us now:

PID. Unique identifier of the process in the system.
PPID. PID of the parent process.
Process command. The command the process runs and the directory it runs in.
Return code. Available once the process completes.
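These attributes can be inspected from Go with the standard library. Below is a minimal sketch, not part of the util itself: the helper name returnCode and the `sh -c "exit 3"` command are my own choices for illustration, and I assume a POSIX shell is available.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"os/exec"
)

// returnCode runs a command, waits for it, and reports its return code.
// (Hypothetical helper, used only for this illustration.)
func returnCode(name string, args ...string) (int, error) {
	err := exec.Command(name, args...).Run()
	var exitErr *exec.ExitError
	if errors.As(err, &exitErr) {
		// The command ran but finished with a non-zero code.
		return exitErr.ExitCode(), nil
	}
	if err != nil {
		// The command could not be started or waited for.
		return 0, err
	}
	return 0, nil
}

func main() {
	wd, err := os.Getwd()
	if err != nil {
		fmt.Println("cannot read working directory:", err)
		return
	}
	// Attributes of the current process.
	fmt.Println("PID:", os.Getpid())
	fmt.Println("PPID:", os.Getppid())
	fmt.Println("command:", os.Args)
	fmt.Println("directory:", wd)

	// The return code exists only after the child has completed.
	code, err := returnCode("sh", "-c", "exit 3")
	fmt.Println("child return code:", code, "error:", err)
}
```

The return code is the only attribute in the list that the parent cannot read while the child is still running.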

In Linux, no process appears out of nowhere. Every process has a parent process that spawned it. The exception is the init process (PID 1), which is started by the kernel at boot (in many distributions systemd or another init system plays this role).
When a new process starts, the parent first forks. The result is a nearly identical process, but with a new PID and with a PPID equal to the parent’s PID. Memory, file descriptors, the current working directory and so on are copied to the child.
The new command is then started in the forked process with exec.
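The parent and child relationship is easy to check from Go. In this small sketch the child is a shell that prints its own PPID, which should be the PID of our program. The helper names and the use of `sh` with its `$PPID` variable are assumptions made for the illustration.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"strconv"
	"strings"
)

// parentSeenByChild starts a shell and returns the parent PID
// that the shell reports for itself. (Hypothetical helper.)
func parentSeenByChild() (int, error) {
	out, err := exec.Command("sh", "-c", `echo "$PPID"`).Output()
	if err != nil {
		return 0, err
	}
	return strconv.Atoi(strings.TrimSpace(string(out)))
}

// childIsOurs reports whether the child's PPID equals our own PID.
func childIsOurs() (bool, error) {
	ppid, err := parentSeenByChild()
	if err != nil {
		return false, err
	}
	return ppid == os.Getpid(), nil
}

func main() {
	ppid, err := parentSeenByChild()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("my PID:", os.Getpid())
	fmt.Println("PPID seen by the child:", ppid)
}
```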
All processes can be in several states of existence:

Birth
Ready. Ready to work and waiting for the Linux scheduler to start running it and allocate CPU resources.
Execution. The process is in progress.
Waiting. The process is blocked, for example while waiting for I/O or another event.
Completion (death). The process finishes its work and frees its resources.

The parent waits for the child to complete and reads the result, the return code. Based on it, the parent can decide to print something to stdout, try to restart the process, or write something to the logs. For example, the well-known nginx web server runs a master process that starts worker processes and logs information about their failures. By the way, the children can also write to the same logs, since they inherit file descriptors from the parent.

But what if the parent dies before the child? Its child processes will not die with it. They are handed over to another process, adopted, you could say. Usually this is PID 1, but depending on the system configuration it may differ.

This knowledge is enough to continue. We can write a util that runs our main application and implements the locks at the same time. Then the application does not have to know the nuances of the infrastructure. I will go into the details and add capabilities to the util step by step, for a better understanding. Go has the os/exec package that allows us to do this. Let’s try to start some process.

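package main

import (
    "bytes"
    "fmt"
    "os"
    "os/exec"
)

func main() {
    // os.Args[0] is this binary; the command to run starts at index 1.
    if len(os.Args) < 2 {
        fmt.Fprintln(os.Stderr, "usage: main <command> [args...]")
        os.Exit(2)
    }
    // os.Args[2:] is simply empty when no extra arguments were given.
    cmd := exec.Command(os.Args[1], os.Args[2:]...)

    fmt.Printf("I`am process %d \n", os.Getpid())
    println("Let`s start a new process")
    // Capture the child's stdout and stderr in memory.
    var outb, errb bytes.Buffer
    cmd.Stdout = &outb
    cmd.Stderr = &errb
    // Run starts the command and blocks until it finishes.
    if err := cmd.Run(); err != nil {
        panic(err)
    }
    fmt.Printf("New process finished. Pid: %d \n", cmd.Process.Pid)
    fmt.Println("out:", outb.String(), "err:", errb.String())
}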

Result:

go run main.go ls
I`am process 13405
Let`s start a new process
New process finished. Pid: 13406
out: go.mod
main.go
 err:

Let’s analyze this code in more detail:

if len(os.Args) < 2 {
    fmt.Fprintln(os.Stderr, "usage: main <command> [args...]")
    os.Exit(2)
}
cmd := exec.Command(os.Args[1], os.Args[2:]...)

I describe the command as a structure and take its data from the arguments. The zeroth argument is the compiled binary itself, so I read the arguments starting from index 1: first the name of the file to run, then its arguments, if there are any. If no command was given, the util prints a usage hint and exits instead of panicking on a missing index.

var outb, errb bytes.Buffer
cmd.Stdout = &outb
cmd.Stderr = &errb

I need to read the standard output of the running command to see what it printed. This creates two buffers, one for standard output and one for error output. I will come back to them once the command has finished.

if err := cmd.Run(); err != nil {
    panic(err)
}

This is where the command is launched. First the current process (the compiled binary of our application) is forked, and then the command we passed in the arguments is started in the fork. This is the child process, and the parent waits for it to complete. As soon as the child completes, the parent continues. At the end I print the child’s output.

fmt.Println("out:", outb.String(), "err:", errb.String())

Manage processes

At this point I could insert the lock logic before starting the process. But what happens if our util dies? The child process will keep running, nobody will release the lock, and no other instance will be able to start. So before the parent exits, all of its children must be terminated.
Linux provides signals for managing processes. A signal is an event delivered to a process. Signals may come from the kernel (for example, on a hardware fault) or from the user (for example, on a key press).
There are many signals and there is no need to cover them all. If you wish, you can look them up in the kernel. Here are the ones that matter for our topic.

SIGINT. Interrupt. Sent when the user interrupts the process from the terminal, usually with Ctrl-C.
SIGKILL. Terminate the process immediately. The program cannot handle or ignore this signal and is terminated by the kernel.
SIGTERM. Terminate. A polite request to finish: the program is expected to complete its work and exit cleanly.
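To see that a signal such as SIGTERM really can be intercepted, here is a minimal sketch in which the program subscribes to SIGTERM and then sends that signal to itself. The helper name catchOwnSIGTERM and the two-second timeout are my own assumptions for the illustration.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"os/signal"
	"syscall"
	"time"
)

// catchOwnSIGTERM subscribes to SIGTERM, sends that signal to the current
// process, and returns the signal once it arrives on the channel.
// (Hypothetical helper, used only for this illustration.)
func catchOwnSIGTERM() (os.Signal, error) {
	sigs := make(chan os.Signal, 1)
	signal.Notify(sigs, syscall.SIGTERM)
	defer signal.Stop(sigs)

	// Without the Notify call above, this would terminate the program.
	if err := syscall.Kill(os.Getpid(), syscall.SIGTERM); err != nil {
		return nil, err
	}

	select {
	case sig := <-sigs:
		return sig, nil
	case <-time.After(2 * time.Second):
		return nil, errors.New("timed out waiting for SIGTERM")
	}
}

func main() {
	sig, err := catchOwnSIGTERM()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// The program is still alive and decides for itself what to do next.
	fmt.Println("caught:", sig)
}
```

The same trick does not work for SIGKILL: the kernel never delivers it to the program, so subscribing to it has no effect.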

The parent must handle these signals and pass them on to the child so that it terminates. SIGKILL is the exception, but that is a separate conversation for much later. So, let’s see an example of how this can be implemented:

package main

import (
    "context"
    "fmt"
    "log"
    "os"
    "os/exec"
    "os/signal"
    "syscall"
)

func main() {
    // os.Args[0] is this binary; the command to run starts at index 1.
    if len(os.Args) < 2 {
        log.Fatal("usage: main <command> [args...]")
    }
    cmd := exec.Command(os.Args[1], os.Args[2:]...)

    fmt.Printf("I`am process %d \n", os.Getpid())
    println("Let`s start a new process")

    // Let the child write straight to our own stdout and stderr.
    cmd.Stdout = os.Stdout
    cmd.Stderr = os.Stderr
    // Start launches the child and returns without waiting for it.
    if err := cmd.Start(); err != nil {
        panic(err)
    }
    fmt.Printf("Process started. Pid: %d \n", cmd.Process.Pid)

    // Ask the runtime to deliver SIGINT and SIGTERM to this channel.
    sigs := make(chan os.Signal, 1)
    signal.Notify(sigs, syscall.SIGINT, syscall.SIGTERM)

    ctx, cancel := context.WithCancel(context.Background())

    go func() {
        select {
        case sig := <-sigs:
            // Pass the received signal on to the child.
            fmt.Printf("got signal: %s \n", sig.String())
            if err := cmd.Process.Signal(sig); err != nil {
                log.Fatalf("error sending signal to process: %v", err)
            }
            return
        case <-ctx.Done():
            println("goroutine finished")
            return
        }
    }()

    // Block until the child exits.
    if err := cmd.Wait(); err != nil {
        if _, ok := err.(*exec.ExitError); ok {
            log.Fatalf("Child process failed: %v", err)
        }
        log.Fatalf("Parent failed, wait command: %v", err)
    }

    println("process finished")
    cancel()
}

The function has changed a lot. Let’s look at it in detail.

cmd.Stdout = os.Stdout
cmd.Stderr = os.Stderr
if err := cmd.Start(); err != nil {
    panic(err)
}

First, it is better to connect the child’s output streams directly to the parent’s standard output and error. Then you do not have to wait for the process to finish to see its output, which matters especially if the process runs for a long time.
Second, I use Start() instead of Run(). It also starts a new process, but does not wait for it to complete, so the program continues right away.
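The difference between the two methods can be observed through the ProcessState field of exec.Cmd, which is filled in only after the child has been waited for. A small sketch follows; the helper name startThenWait and the `sleep 1` command are assumptions made for the illustration.

```go
package main

import (
	"fmt"
	"os/exec"
)

// startThenWait starts "sleep 1" and reports whether the child's final
// state was already known right after Start and after Wait.
// (Hypothetical helper, used only for this illustration.)
func startThenWait() (knownAfterStart, knownAfterWait bool, err error) {
	cmd := exec.Command("sleep", "1")
	if err = cmd.Start(); err != nil {
		return false, false, err
	}
	// Start has returned, but nobody has waited for the child yet.
	knownAfterStart = cmd.ProcessState != nil

	// Wait blocks until the child exits and fills in ProcessState.
	if err = cmd.Wait(); err != nil {
		return knownAfterStart, false, err
	}
	knownAfterWait = cmd.ProcessState != nil
	return knownAfterStart, knownAfterWait, nil
}

func main() {
	before, after, err := startThenWait()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("state known after Start:", before)
	fmt.Println("state known after Wait:", after)
}
```

Run() is effectively Start() followed by Wait(), so splitting them gives the parent a window in which it can do other work, such as listening for signals.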

sigs := make(chan os.Signal, 1)
signal.Notify(sigs, syscall.SIGINT, syscall.SIGTERM)

Here the parent process subscribes to the termination signals mentioned above. A buffered channel of size 1 is created for this purpose.

go func() {
    select {
    case sig := <-sigs:
        fmt.Printf("got signal: %s \n", sig.String())
        if err := cmd.Process.Signal(sig); err != nil {
            log.Fatalf("error sending signal to process: %v", err)
        }
        return
    case <-ctx.Done():
        println("goroutine finished")
        return
    }
}()

In this goroutine I listen for signals delivered to the parent. As soon as one arrives, I immediately forward it to the child process. The goroutine also stops if the context is cancelled, which happens when the child finishes before any signal is received. A signal may never arrive at all if the child does its work and exits on its own.
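The goroutine above handles a single signal and then returns. If the child ignores the first signal, a second one would not reach it. One possible variation, given here only as a sketch and not as the final design of the util, forwards signals in a loop. The names forward and demoForward and the `sleep 30` command are assumptions for the illustration, and the signal is pushed into the channel by hand instead of coming from signal.Notify.

```go
package main

import (
	"context"
	"fmt"
	"os"
	"os/exec"
	"syscall"
)

// forward relays every signal from sigs to p until ctx is cancelled.
func forward(ctx context.Context, sigs <-chan os.Signal, p *os.Process) {
	for {
		select {
		case sig := <-sigs:
			// An error here usually means the child has already exited.
			if err := p.Signal(sig); err != nil {
				fmt.Fprintln(os.Stderr, "forward:", err)
			}
		case <-ctx.Done():
			return
		}
	}
}

// demoForward starts "sleep 30", pushes SIGTERM through forward, and
// reports whether the child was terminated by that signal.
func demoForward() (bool, error) {
	cmd := exec.Command("sleep", "30")
	if err := cmd.Start(); err != nil {
		return false, err
	}
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	sigs := make(chan os.Signal, 1)
	go forward(ctx, sigs, cmd.Process)
	sigs <- syscall.SIGTERM // stands in for a signal delivered via signal.Notify

	// Wait returns an error here because the child was killed by a signal.
	_ = cmd.Wait()
	ws, ok := cmd.ProcessState.Sys().(syscall.WaitStatus)
	return ok && ws.Signaled() && ws.Signal() == syscall.SIGTERM, nil
}

func main() {
	terminated, err := demoForward()
	fmt.Println("child terminated by SIGTERM:", terminated, "error:", err)
}
```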

if err := cmd.Wait(); err != nil {
    if _, ok := err.(*exec.ExitError); ok {
        log.Fatalf("Child process failed: %v", err)
    }
    log.Fatalf("Parent failed, wait command: %v", err)
}

Now the parent waits for the child to complete. Execution continues once the child exits and its return code is available.

Let’s try the case where the process ends on SIGINT.

go run main.go ping 8.8.8.8
I`am process 18060
Let`s start a new process
Process started. Pid: 18061
PING 8.8.8.8 (8.8.8.8): 56 data bytes
64 bytes from 8.8.8.8: icmp_seq=0 ttl=56 time=37.743 ms
64 bytes from 8.8.8.8: icmp_seq=1 ttl=56 time=45.169 ms
64 bytes from 8.8.8.8: icmp_seq=2 ttl=56 time=44.006 ms
^C
--- 8.8.8.8 ping statistics ---
3 packets transmitted, 3 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 37.743/42.306/45.169/3.261 ms
got signal: interrupt
process finished

And the option where the process ended before any signal was received:

go run main.go ls
I`am process 18095
Let`s start a new process
Process started. Pid: 18096
go.mod  main.go
process finished

Conclusion

In the first part of this series, I explained how processes work, how to manage them, and showed an example implementation in Go. In the next part I will add a lock to it. Of course, many other cases were left out, such as how to release the lock if the process gets SIGKILL. I’ll write about that later, gradually going deeper into the topic.
