GNU timeout annoyed me so I replaced it; or an extremely simple introduction to fork() and signal handling in perl

#signals #fork #perl

Previously on this channel ... GNU tree annoyed me so I fixed it

I am one of the CPAN-testers. That means that I automatically download code from the internet and run it. I can protect against most of the evils that exposes me to by running it in a VM with limited resources and privileges and so on. There is, however, one attack that is hard to defend against like that. Often - due to bugs and not malice - some code that I'm testing will just hang, and sit there doing nothing. It's not using much CPU, or memory, or thrashing the disk, or using the network, it's just sitting there doing nothing. Usually waiting for something on the console. But because my testing is largely automated I'm not paying any attention to the console, so I don't placate it, and so my whole testing pipeline shudders to a halt as something sits there doing nothing for hours or even days.

The obvious solution is to impose a time limit (a wall-clock time limit, not a CPU time limit of course), and the obvious tool for that is GNU timeout. It's very good, and I recommend it. However, it doesn't quite fit my needs. In particular, its --foreground option, which allows the command it runs to read the terminal and get terminal signals, has a fatal flaw in that if there's a timeout only that command will be killed, leaving any child processes still running. In my use case it's almost always a child process that hangs.

There are a few other similar tools out there, but they all have some problem or other that makes them unsuitable for my use. Some of them don't preserve the command's exit status, and others aren't conveniently packaged for all the different OSes I use. So I wrote my own.

I'm leaving out some details here, but the core of the program is this:

    use Proc::Killfam;

    my $pid = fork();
    if(!defined($pid)) {
        die("Error forking\n")
    } elsif($pid) { # still in the ptimeout process
        $SIG{TERM} = sub { exit 124 };

        my $status = system @command_args;
        kill SIGTERM => $pid;
        exit $status >> 8;
    } else { # watchdog child process
        sleep $timeout;
        killfam SIGTERM => getppid;
    }

It forks off a child process which acts as a watchdog. If for any reason we can't fork it just stops immediately, as nothing will work anyway.

The watchdog immediately goes to sleep for $timeout seconds. If it ever reaches the end of that period it sends a SIGTERM to its parent process and all its children (the killfam function is what finds and kills all the children, its implemented in Proc::Killfam which you can find on the CPAN).

The original process, in the mean time, creates a signal handler for SIGTERM and then runs whatever command you told it to.

If that command finishes quickly, before the timeout, then the original process sends a SIGTERM to the watchdog, killing it as it is no longer needed, then exits with the same exit status code as the command.

Otherwise, if the timeout is reached, the watchdog sends a SIGTERM to the original command and all its children, including the command that you told it to run and any of its children that it can find. The original command's signal handler for SIGTERM catches the signal and turns it into an exit status of 124, just like GNU timeout does, so that it's reasonably easy to tell the difference between a normal exit-with-failure and a timeout.

Top comments (6)

Steven Bakker • Dec 23 '21

Nice one. Clever way to reverse the logic (because I would have run the command in the child and use waitpid() with a timeout + killfam in the parent).

Also, I wonder if it is possible to avoid the fork altogether by using alarm() and calling killfam() on $$ in the signal handler.

Steven Bakker • Dec 23 '21 • Edited

Looks like a alarm() and group kill do the same job. Only requires POSIX.

#!/usr/bin/perl

use 5.014;
use warnings;
use POSIX qw( setpgid );

my $USAGE = "usage: $0 nsecs command ...\n";

@ARGV >= 2 or die $USAGE;
my ($timeout, @command) = @ARGV;
$timeout =~ /^\d+$/ or die $USAGE;

# Create new process group with ourselves as the group leader.
setpgid($$, 0) or die "cannot setpgid(): $!\n";

# Catch an ALRM signal by sending the TERM signal to the group.
$SIG{ALRM} = sub {
    warn "TIMEOUT\n";
    kill 'TERM' => -$$;
};

# Any group kill will also hit us, so exit nicely.
$SIG{TERM} = sub { exit(128 + 14) };

alarm($timeout); # Arm the timer.
system { $command[0] } @command; # Run the command.
alarm(0); # Disarm the timer.
exit $?; # Exit with the child status.

Aristotle Pagaltzis • Dec 25 '21

@ARGV == 2 or die $USAGE;

This is needlessly restrictive. It should be >= instead of ==. That way you can avoid running the given command through the shell.

system @command;

It may be useful to use system { $command[0] } @command here instead, which will ensure the command is never run under shell. This loses a bit of convenience if the user did want to run the command under shell, but prepending sh -c ... to the arguments is no big deal and this keeps the program from containing a hidden “oops, you got shell when you didn’t mean to” trap.

But this is more subjective than the other change.