How to run a really long task from a Rails web request

#rails #linux #rake

Recently, our management needed a way to export invoices in bulk. After the manager selects the first and last invoice for the batch in a web form, an asynchronous process should start that generates PDF files for the invoices, packs them into a zip file and sends the manager an email with a link to download the export. Now, generating the PDFs is slow, very slow. For larger batches involving hundreds or thousands of invoices, this process can easily take 10 or 15 minutes or even more.

So how do we trigger such a long-running process from a Rails request? The first option that comes to mind is a background job run by one of the queuing back-ends such as Sidekiq, Resque or DelayedJob, possibly governed by Active Job. While this would surely work, these solutions usually have a limited number of workers available on the server, and we didn’t want to potentially block other important background tasks for so long.

What we wanted instead was to run a new, separate process from the Rails request. Something like running a Rake task but triggered by a web request. In fact, we even had the bulk export already implemented as a Rake task, so what we actually wanted was to make this task accessible from our admin web interface.

“Forking” the process

The standard way on Unix-like systems to spawn a new process is to fork it. In a Rails controller, forking a rake task could look like this:

class BulkInvoiceExportsController < ApplicationController
  def create
    child = fork do
      exec("bin/rails export_invoices FROM=20220001 TO=20220100 \\
            >> /tmp/bulk_invoices_export.log 2>&1")
    end
    Process.detach(child)
  end
end

Let’s note a few things about this code, which was inspired by a Stack Overflow answer:

The Kernel#fork method (also available as Process.fork) splits the current process into two copies; the child contains only a copy of the thread that called fork, and it runs the code in the block.
The child process is then replaced with the newly loaded program using Kernel#exec.
The final child process inherits all important settings from the parent process, such as environment variables, open file descriptors or current working directory. This is why we can simply run bin/rails without having to set up the correct Ruby first (even when using a Ruby version manager such as rvm, rbenv or chruby) and without specifying an absolute path to the Rails binary.
Because the code in the block uses shell redirection, the child Rails process is not executed directly but through a standard shell (usually /bin/sh). Redirection allows us to debug and monitor what is going on in the rake task.
By default, the operating system expects that the parent process is interested in the child process termination status. We are not: we want to run the rake task and forget about it, since the task handles everything else, such as sending the final email, by itself. That’s why we call Process.detach to let Ruby reap the child for us once it exits and to prevent zombie processes from accumulating.
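The points above can be tried outside of Rails with a small plain-Ruby sketch. The helper name run_detached is made up for this illustration, and the sketch assumes a Unix-like system where fork is available:

```ruby
require "shellwords"

# Hypothetical helper (not part of Ruby or Rails): run `command` through the
# shell in a detached child process and append its output to `log_path`.
def run_detached(command, log_path)
  child = fork do
    # exec replaces the forked copy of this Ruby process. Because the string
    # contains shell redirection, Ruby hands it over to the shell.
    exec("#{command} >> #{Shellwords.escape(log_path)} 2>&1")
  end
  # Process.detach starts a thread that reaps the child once it exits,
  # so no zombie is left behind. That thread is returned to the caller.
  Process.detach(child)
end
```

In the controller this would be called as run_detached("bin/rails export_invoices FROM=20220001 TO=20220100", "/tmp/bulk_invoices_export.log"). The returned thread can simply be ignored; calling value on it blocks until the child exits, which is mostly useful in tests.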

“Spawning” the process

If we wanted to make our code more portable (usable on Windows, for example), we would have to use Process.spawn (also available as Kernel#spawn) instead of fork, as suggested in the Ruby documentation. The spawn method also allows us to fine-tune the child process environment, file descriptors, limits or working directory.

An almost equivalent way of scheduling the rake task using spawn could be written this way:

class BulkInvoiceExportsController < ApplicationController
  def create
    child = spawn("bin/rails export_invoices FROM=20220001 TO=20220100",
                  %i[out err] => %w[/tmp/bulk_invoices_export.log a])
    Process.detach(child)
  end
end
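In a real form, the invoice range comes from user input rather than from constants. A minimal sketch of one way to pass it safely is shown below. It assumes the rake task reads its range from ENV["FROM"] and ENV["TO"] (which is where rake puts FROM=... arguments anyway); the helper name spawn_export and its keyword arguments are made up for this illustration. Because the command is given as separate arguments and the values travel in the environment hash, no shell is involved and nothing needs escaping:

```ruby
# Hypothetical helper: start the export in a detached process.
# `command` is an array of program name and arguments; it defaults to the
# rake task from the article.
def spawn_export(from:, to:, log_path:, command: ["bin/rails", "export_invoices"])
  pid = spawn({ "FROM" => from.to_s, "TO" => to.to_s }, # extra environment variables
              *command,                                 # several arguments: no shell is used
              %i[out err] => [log_path, "a"])           # append stdout and stderr to the log
  Process.detach(pid)
end
```

A controller action could then call something like spawn_export(from: params[:from], to: params[:to], log_path: "/tmp/bulk_invoices_export.log"); the parameter names here are assumptions as well. Validating that both values look like invoice numbers before spawning is still a good idea.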

Security caveats

Please keep in mind that triggering such a long-running process from the controller is not safe. In the previous examples, each request to the create action of the controller leads to spawning one external Rails process, consuming perhaps a substantial portion of the CPU and memory resources and opening more connections to your database servers. This is a setup very vulnerable to DoS attacks.

The technique is probably OK only in very controlled environments such as in an internal admin area accessible to a limited number of people who know what they are doing and when the function is used only sparingly. If we wanted to make this rake task publicly accessible (as in a “data take out” function, for example), we would definitely resort to a real queuing system such as those mentioned above or perhaps a queuing daemon on the system level (e.g. atd which can hold the tasks based on the server load).
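Even in an internal admin area, a cheap safeguard is to refuse a new export while a previous one is still running. The following is only a sketch under stated assumptions: the helper names and the pid file are made up for this illustration, and the check is not atomic (two simultaneous requests could both pass it), so it limits accidental double clicks rather than deliberate abuse:

```ruby
# Hypothetical guard: is the process recorded in the pid file still alive?
def export_running?(pid_path)
  pid = Integer(File.read(pid_path).strip)
  Process.kill(0, pid) # signal 0 sends nothing; it only checks that the pid exists
  true
rescue Errno::EPERM
  true  # the process exists but belongs to another user
rescue Errno::ENOENT, Errno::ESRCH, ArgumentError
  false # no pid file, stale pid, or unreadable content
end

# Hypothetical wrapper: spawn `command` unless an export is already running.
# Returns the detach thread, or nil when the request was refused.
def start_export_once(pid_path, *command)
  return nil if export_running?(pid_path)

  pid = spawn(*command)
  File.write(pid_path, pid.to_s)
  Process.detach(pid)
end
```

The controller could respond with an error message whenever start_export_once returns nil.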

Anyway, for our use case, directly forking the rake task from the controller was the most pragmatic way to go and we are happy about the result.

If you don’t want to miss future posts like this, follow me here or on Twitter. Cheers!
