DEV Community

Cover image for Optimizing Rails connections
Hopsoft
Hopsoft

Posted on • Edited on

Optimizing Rails connections

This is the 3rd post in a series that starts here..

Have you ever experienced a Rails app that suddenly and inexplicably delivers dismal response times or becomes completely unresponsive? It can be incredibly confusing, especially when nothing in the environment has changed recently.

Let's look at one of the most common reasons this might happen.

Datastore Connections

Correctly configuring Rails connections can be complicated. Here's a short video that explains how easy it is to get wrong.

Note that this problem isn't isolated to your primary database either. It's possible to misconfigure any datastore that has limited connections. Postgres, Redis, Memcached, etc...

There are a few aspects of this problem to consider.

Connection Limits

Datastores have an upper limit to how many active connections they can reasonably support. System resources typically govern these limits. For example, PostgreSQL defaults to 100 connections while Redis defaults to 10,000. It's possible to override default connection limits. Just be sure you know what you're doing.

Hosting providers impose further limits. Consider that Heroku's Standard 0 Postgres plan caps at 120 connections and their Premium 0 Redis plan caps at 40.

Connection Pools

Establishing a connection is an expensive and slow operation. Connection pooling reduces overall latency by limiting how frequently connections are set up by reusing pre-established connections.

The improved efficiency gained from connection pooling enables applications to serve more concurrent requests than otherwise possible based on datastore connection limits alone. Pooling is a force multiplier.

Deployment Topology

The deployment configuration plays a significant role. Consider a database with a limit of 40 connections. We'd be at capacity with 2 servers configured to use 20 connections each.

This problem compounds as the deployment becomes more complex. And the challenge isn't limited to the number of servers. We also need to account for the number of processes running per server and the number of threads active in each process.

The Variables

Typical Rails applications use the following environment variables.

  • WEB_CONCURRENCY - the number of processes
  • RAILS_MAX_THREADS - the number of threads

These are descriptive names, but I think we can do better. Given that Rails applications do more than serve web requests, let's consider some more suitable names before looking at formulas.

  • SERVERS - the number of servers/dynos
  • PROCESSES - the number of processes
  • THREADS - the number of threads

We also need to factor in external systems that have access to the database. This adds another variable to consider.

  • EXTERNAL_CONNECTIONS

That's all the variables, now onto the formula.

The Formula

The formula is quite simple.



(SERVERS * PROCESSES * THREADS) + EXTERNAL_CONNECTIONS


Enter fullscreen mode Exit fullscreen mode

Let's look at a standard Rails deployment.

  • 2 web servers, 2 processes, 8 threads
  • 3 worker servers, 4 processes, 16 threads
  • 10 external connections

Standard Rails Deployment

Here's the equation for the number of connections.



(2 * 2 * 8) + (3 * 4 * 16) + 10 = 234


Enter fullscreen mode Exit fullscreen mode

It's vital to ensure that you don't configure more connections than are available. You may need to increase the provisioned hardware or reduce the number of connections your application uses.

Configuration

Here's an example of configuring Rails for the deployment described above.



# config/database.yml
default: &default
  adapter: postgresql
  encoding: unicode
  pool: <%= ENV["THREADS"] %>
  # it's generally a good idea to configure timeouts
  connect_timeout: 1
  variables:
    statement_timeout: 5s
    lock_timeout: 2s


Enter fullscreen mode Exit fullscreen mode


# config/initializers/sidekiq.rb
settings = {
  url: ENV["REDIS_QUEUE_URL"], 
  size: ENV["THREADS"].to_i + 2, # pool size
  # it's generally a good idea to configure timeouts
  connect_timeout: 0.2,
  read_timeout: 0.5,
  write_timeout: 0.5
}

Sidekiq.configure_server do |config|
  config.redis = settings
end

Sidekiq.configure_client do |config|
  config.redis = settings
end


Enter fullscreen mode Exit fullscreen mode


# config/environments/production.rb
Rails.application.configure do
  config.cache_store = :redis_cache_store, {
    url: ENV["REDIS_CACHE_URL"],
    size: ENV["THREADS"] + 2, # pool size
    # it's generally a good idea to configure timeouts
    connect_timeout: 0.2,
    read_timeout: 0.5,
    write_timeout: 0.5
  }
end


Enter fullscreen mode Exit fullscreen mode


# config/puma.rb
workers ENV["PROCESSES"]
max_threads_count = ENV["THREADS"]
min_threads_count = ENV["THREADS"]
threads min_threads_count, max_threads_count


Enter fullscreen mode Exit fullscreen mode


# Procfile
web: PROCESSES=2 THREADS=8 bin/start-pgbouncer bundle exec puma -C config/puma.rb
worker: PROCESSES=4 THREADS=16 bin/start-pgbouncer bundle exec sidekiq -C config/sidekiq.yml


Enter fullscreen mode Exit fullscreen mode

The formula and configuration may appear simple, but they are deceptively complex. The complexity becomes more apparent as the deployment grows more sophisticated. For example, consider a deployment with multiple worker pools of varying sizes backed by Redis instances with different connection limits. Can you think of other deployment topologies? How does the math work out?

Advanced Options

Stacked Pooling

It's possible to stack connection pooling with tools like pg_bouncer, which can help further ensure your Rails application doesn't exhaust database connections. Heroku supports this with a buildpack.

Sidekiq Swarm

The number of processes and threads can be hard to determine when using tools like Sidekiq swarm. Just pay attention and ensure that you know what the number is.

Top comments (0)