Previously on this channel ... GNU tree annoyed me so I fixed it
I am one of the CPAN-testers. That means that I automatically download code from the internet and run it. I can protect against most of the evils that exposes me to by running it in a VM with limited resources and privileges and so on. There is, however, one attack that is hard to defend against like that. Often - due to bugs and not malice - some code that I'm testing will just hang, and sit there doing nothing. It's not using much CPU, or memory, or thrashing the disk, or using the network, it's just sitting there doing nothing. Usually waiting for something on the console. But because my testing is largely automated I'm not paying any attention to the console, so I don't placate it, and so my whole testing pipeline shudders to a halt as something sits there doing nothing for hours or even days.
The obvious solution is to impose a time limit (a wall-clock time limit, not a CPU time limit of course), and the obvious tool for that is GNU timeout. It's very good, and I recommend it. However, it doesn't quite fit my needs. In particular, its --foreground option, which allows the command it runs to read the terminal and get terminal signals, has a fatal flaw in that if there's a timeout only that command will be killed, leaving any child processes still running. In my use case it's almost always a child process that hangs.
There are a few other similar tools out there, but they all have some problem or other that makes them unsuitable for my use. Some of them don't preserve the command's exit status, and others aren't conveniently packaged for all the different OSes I use. So I wrote my own.
I'm leaving out some details here, but the core of the program is this:
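use Proc::Killfam;

my $pid = fork();
if(!defined($pid)) {
    die("Error forking\n");
} elsif($pid) { # still in the ptimeout process
    $SIG{TERM} = sub { exit 124 };      # the watchdog fired: report a timeout
    my $status = system @command_args;  # blocks until the command finishes
    kill SIGTERM => $pid;               # finished in time: stop the watchdog
    exit $status >> 8;                  # pass on the command's exit code
} else { # watchdog child process
    sleep $timeout;
    killfam SIGTERM => getppid;         # signal ptimeout and all its descendants
}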
It forks off a child process which acts as a watchdog. If for any reason we can't fork it just stops immediately, as nothing will work anyway.
The watchdog immediately goes to sleep for $timeout seconds. If it ever reaches the end of that period it sends a SIGTERM to its parent process and all its children (the killfam function is what finds and kills all the children; it's implemented in Proc::Killfam, which you can find on the CPAN).
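To make the "finds all the children" step concrete, here is a minimal sketch of the idea in plain Perl. It is not Proc::Killfam's actual code: the `descendants_of` function and the hand-written `%parent_of` table are hypothetical, and a real implementation would read the pid/ppid pairs from the operating system's process table.

```perl
use strict;
use warnings;

# Hypothetical helper: given a map of pid => parent pid, return every
# descendant (children, grandchildren, ...) of $root, sorted numerically.
sub descendants_of {
    my ($root, %parent_of) = @_;

    # Invert the table: parent pid => list of direct children.
    my %children_of;
    for my $pid (keys %parent_of) {
        push @{ $children_of{ $parent_of{$pid} } }, $pid;
    }

    # Walk the tree breadth-first from $root.
    my @found;
    my @queue = ($root);
    while (defined(my $pid = shift @queue)) {
        my @kids = @{ $children_of{$pid} || [] };
        push @found, @kids;
        push @queue, @kids;
    }
    return sort { $a <=> $b } @found;
}

# Made-up process table: 100 is the timeout wrapper, 101 its watchdog,
# 102 the command under test, 103 a grandchild, 200 is unrelated.
my %parent_of = (100 => 1, 101 => 100, 102 => 100, 103 => 102, 200 => 1);
my @to_signal = (100, descendants_of(100, %parent_of));
print "would signal: @to_signal\n";
```

The unrelated process 200 is left alone because it is not reachable from 100 in the parent/child tree.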
The original process, in the meantime, creates a signal handler for SIGTERM and then runs whatever command you told it to.
If that command finishes quickly, before the timeout, then the original process sends a SIGTERM to the watchdog, killing it as it is no longer needed, then exits with the same exit status code as the command.
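The `>> 8` in the code is needed because Perl's system returns the raw wait status rather than the exit code. A minimal sketch of the decoding follows. The `exit_code_from_status` name is made up for illustration, and both the 127 for "could not start" and the 128-plus-signal-number mapping are assumptions borrowed from shell conventions, not something the program above does.

```perl
use strict;
use warnings;

# Hypothetical helper: turn the value returned by system() (the same value
# as $?) into a single exit code.
sub exit_code_from_status {
    my ($status) = @_;
    return 127 if $status == -1;        # command could not be started (assumed code)
    my $signal = $status & 127;         # low 7 bits: signal that killed the child
    return 128 + $signal if $signal;    # assumption: shell-style 128+N
    return $status >> 8;                # high byte: the child's real exit code
}

# Run this same perl interpreter ($^X) as a child that exits with status 3.
my $status = system $^X, '-e', 'exit 3';
print "exit code: ", exit_code_from_status($status), "\n";
```

A plain `$status >> 8` gives 0 for a command that was killed by a signal, which is why a fuller version might want to look at the low bits as well.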
Otherwise, if the timeout is reached, the watchdog sends a SIGTERM to the original process and all its children, including the command that you told it to run and any of its children that it can find. The original process's signal handler for SIGTERM catches the signal and turns it into an exit status of 124, just like GNU timeout does, so that it's reasonably easy to tell the difference between a normal exit-with-failure and a timeout.
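A caller can then branch on that value. Here is a minimal sketch, assuming the wrapped command never returns 124 for reasons of its own; `classify_exit` is a hypothetical name used only for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: interpret the exit code of a command that was run
# under a timeout wrapper using the 124 convention.
sub classify_exit {
    my ($code) = @_;
    return 'ok'      if $code == 0;
    return 'timeout' if $code == 124;   # same value GNU timeout uses
    return 'failure';
}

for my $code (0, 1, 124) {
    printf "%3d => %s\n", $code, classify_exit($code);
}
```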
Top comments (6)
Nice one. Clever way to reverse the logic (because I would have run the command in the child and used waitpid() with a timeout + killfam in the parent).
Also, I wonder if it is possible to avoid the fork altogether by using alarm() and calling killfam() on $$ in the signal handler.
Looks like an alarm() and groupkill do the same job. Only requires POSIX.
This is needlessly restrictive. It should be >= instead of ==. That way you can avoid running the given command through the shell.
It may be useful to use system { $command[0] } @command here instead, which will ensure the command is never run under shell. This loses a bit of convenience if the user did want to run the command under shell, but prepending sh -c ... to the arguments is no big deal and this keeps the program from containing a hidden “oops, you got shell when you didn’t mean to” trap.
But this is more subjective than the other change.
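A minimal sketch of the difference, assuming a POSIX-like system; `run_without_shell` is a hypothetical wrapper, not part of the program under discussion.

```perl
use strict;
use warnings;

# Hypothetical wrapper: run @command directly, never through a shell, and
# return its exit code (or -1 if it could not be started at all).
sub run_without_shell {
    my @command = @_;
    no warnings 'exec';    # a failed exec is reported via the return value instead
    my $status = system { $command[0] } @command;
    return -1 if $status == -1;
    return $status >> 8;
}

# A multi-element list never touches the shell in either form of system().
print run_without_shell($^X, '-e', 'exit 3'), "\n";

# A single string with shell syntax is where the forms differ: plain
# system() would hand it to the shell, but here it is treated as the literal
# name of a program, which does not exist, so nothing runs.
print run_without_shell('exit 3'), "\n";
```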
Good suggestions, thanks. I quickly threw this together to test the signal mechanism, didn't pay attention to the arguments that much.
FWIW, use strict is redundant given your use 5.014. (This applies to use 5.012 and up.)
Fixed, including the things you mention below. Thanks.