It’s no secret among my colleagues that I’m a huge fan of Postgres. I think the features of modern relational databases are a source of untapped potential in the Rails community. The Rails community tends to make websites where we can easily swap from one database platform to another, but if you’re okay with your app being tightly coupled to Postgres, you can unlock some of Postgres’s amazing, lesser-known features.
This is the first post in a series that will explore some of Postgres’s hidden gems.
Shortcomings of ActiveRecord Tracking
One of our clients is a government agency. Their highest priority is that their data be accurate and auditable. This means keeping track of what was changed and when.
For a while we did this using the paper-trail gem. This was a very simple way to add a few lines of code to keep track of all of the changes made to an ActiveRecord model. But it came with one drawback. Every change to the data had to be done through ActiveRecord. There are often times when this makes an app vulnerable to a race condition. I’ll use a contrived example so as not to share any real code from our client’s app.
Imagine that we are working on a payment processing app built in Ruby on Rails. (This is an instructional example. NEVER build a payments app like this.)
class Account < ActiveRecord::Base
# includes a numeric field for "balance"
def add_money!(value)
update!(balance: balance + value)
end
end
Here we have a model, Account
, that will allow us to update the balance attribute based on a supplied value. We also have a basic controller for this model, with an update action that accepts payment params and updates the account via the add_money!
method.
class AccountsController < ActionController::Base
# This is REALLY insecure.
# Do not build an actual bank site like this
# Security tips are beyond the scope of this blog post
def update
payment = params[:payment].to_f
account = Account.find(params[:id])
account.add_money!(payment)
end
end
Now imagine that your account has a balance of $10. You receive two payments at the same time, each for $2. The following timeline of events takes place, resulting in the account holder losing $2.
- Thread A opens a transaction
- Thread B opens a transaction
- Thread A reads from the database, seeing that the balance is $10
- Thread B reads from the database, seeing that the balance is $10
- Thread A writes to the database, setting the balance to $12
- Thread A closes the transaction
- Thread B writes to the database, setting the balance to $12
- Thread B closes the transaction
Your new balance is $12 when it should be $14. You’ve just lost $2, thanks to a race condition.
A way to fix the race condition is to run a straight SQL query instead:
UPDATE accounts
SET balance = balance + 2
WHERE id = 31415
and build it into the model with an update_all
method:
class Account < ActiveRecord::Base
def add_money!(value)
sql = self.class.sanitize_sql_array("balance = balance + ?”, value)
self.class.where(id: id).update_all("#{sql}, updated_at = now()")
reload
end
end
Now we have good news and bad news.
Good news: it fixed the race condition but created a new problem.
Bad news: the method update_all
circumvents ActiveRecord, so the paper-trail gem is unaware of the update. An inaccurate audit trail is worse than no audit trail.
More good news: there’s a simple solution.
History Tracking Directly in Postgres
If you’ve never worked with Postgres triggers, the code I’m about to show you might seem daunting, but I’m going to break it down into digestible pieces. The code performs the following steps:
- Executes whenever a record is added, updated, or deleted on a table
- Records the following data into a table called “archives”:
- Model id
- Old record’s values
- New record’s values
- Table name
- ActiveRecord model name
- Timestamp
- Makes sure that for every unique record (
table_name
,record_type
,model_id
), there is only one “most_recent
” snapshot at a time - The old_values and new_values are stored as a JSON type in Postgres, meaning that we don’t need to know anything about what columns are in the table whose values we’re archiving.
Some of the variable names below might look like magic, so here is a quick explanation of some otherwise magic-looking variables.
TG_ARGV |
An array of positional arguments passed to the function |
TG_OP |
The operation being performed “INSERT”, “UPDATE”, or “DELETE” |
TG_TABLE_NAME |
The name of the table on which the trigger is being invoked |
OLD |
The state of the record before the operation (immutable) |
NEW |
The new record which will be inserted (can be modified) |
now() |
Function returning the current timestamp |
Here’s the code:
CREATE FUNCTION make_archive_of_changes() RETURNS TRIGGER AS $$
-- Expects one argument, the record_type
-- It's the stringified ActiveRecord class name
-- For example 'User', or 'Account'
BEGIN
-- Previous snapshots should be marked as stale
-- This little denormalization trick is so that ActiveRecord
-- can immediately pull up the most recent snapshot without
-- having to sort through all the records by their timestamps
UPDATE archives
SET most_recent = FALSE
WHERE
table_name = TG_TABLE_NAME
AND most_recent = TRUE
AND record_type = record_type
AND record_id = (
CASE WHEN TG_OP = 'DELETE'
THEN OLD.id
ELSE NEW.id
END
);
IF TG_OP = 'INSERT' THEN
INSERT INTO archives (
table_name, record_type, record_id, operation, new_values, most_recent, recorded_at
)
VALUES (
TG_TABLE_NAME, TG_ARGV[0], NEW.id, TG_OP, row_to_json(NEW), TRUE, now()
);
RETURN NEW;
ELSIF TG_OP = 'UPDATE' THEN
INSERT INTO archives (
table_name, record_type, record_id, operation, new_values, old_values, most_recent, recorded_at
)
VALUES (
TG_TABLE_NAME, TG_ARGV[0], NEW.id, TG_OP, row_to_json(NEW), row_to_json(OLD), TRUE, now()
);
RETURN NEW;
ELSIF TG_OP = 'DELETE' THEN
INSERT INTO archives (
table_name, record_type, record_id, operation, old_values, most_recent, recorded_at
)
VALUES (
TG_TABLE_NAME, TG_ARGV[0], OLD.id, TG_OP, row_to_json(OLD), TRUE, now()
);
RETURN OLD;
END IF;
END;
$$;
Once we’ve defined the function once, we can attach it to as many tables as we want:
CREATE TRIGGER trg_make_archive_of_changes_for_accounts
AFTER INSERT OR DELETE OR UPDATE ON accounts
FOR EACH ROW EXECUTE FUNCTION make_archive_of_changes('Account');
CREATE TRIGGER trg_make_archive_of_changes_for_users
AFTER INSERT OR DELETE OR UPDATE ON users
FOR EACH ROW EXECUTE FUNCTION make_archive_of_changes('User');
If we want to track changes for a new table, we just attach the trigger function to that table. No need to create new tables or write new functions. The next step is to make this easy to access inside of Ruby on Rails.
Polymorphic Associations in Rails
This next step is optional. We don’t necessarily need to be able to access the historical values from Rails, but it does make our debugging work a lot easier. There are often use cases where our business logic needs to know what was just changed.
Because we pass the name of the ActiveRecord class into the Postgres trigger, we’re able to set up a polymorphic association.
class Account < ActiveRecord::Base
# We can add these two lines to every Rails model whose history we track in Postgres
has_many :archives, -> { order(recorded_at: :desc) }, as: :record
has_one :most_recent_change, -> { where(most_recent: true) }, class_name: 'Archive', as: :record
end
class Archive < ActiveRecord::Base
belongs_to :record, polymorphic: true
def readonly?
true
end
def changed_fields
changed_values.keys
end
def changed_values
my_old_values = old_values || {}
my_new_values = new_values || {}
keys = my_old_values.keys.to_set | my_new_values.keys.to_set
changes = {}
keys.each do |key|
if my_old_values[key] != my_new_values[key]
changes[key] = [
my_old_values[key],
my_new_values[key]
]
end
end
changes
end
end
Now if I want to see the changes made to the Account
model, I can just type in:
puts account.most_recent_change.changed_values
and I’ll get back a hash with the changes:
{
'balance' => [ 13, 12 ],
'updated_at' => [
Tue, 19 Apr 2022 19:53:29.341834000 EDT -04:00,
Sat, 13 Aug 2022 08:40:29.341891000 EDT -04:00
]
}
Integrating Triggers with Rails Migrations
Behold: the power of “execute”
For this one you just have to roll up your sleeves and write some executable code. Rails migrations allow you to use the execute
command to run raw SQL commands during a migration.
You’ll also want to make sure that the migrations are reversible. Note that the order of operations in the down
method is the opposite of the order in the up
method.
class AttachTriggerToTables < ActiveRecord::Migration
def up
execute <<~SQL
CREATE TRIGGER trg_make_archive_of_changes_for_accounts
AFTER INSERT OR DELETE OR UPDATE ON accounts
FOR EACH ROW EXECUTE FUNCTION make_archive_of_changes('Account');
CREATE TRIGGER trg_make_archive_of_changes_for_users
AFTER INSERT OR DELETE OR UPDATE ON users
FOR EACH ROW EXECUTE FUNCTION make_archive_of_changes('User');
SQL
end
def down
execute <<~SQL
DROP TRIGGER trg_make_archive_of_changes_for_users ON users;
DROP TRIGGER trg_make_archive_of_changes_for_accounts ON accounts;
DROP FUNCTION make_archive_of_changes;
SQL
end
end
Switching from db/schema.rb to db/structure.sql
Once you start to execute pure SQL in Rails migrations, the db/schema.rb
file will not be able to keep track of all the changes. When the schema is stored as a Ruby file, it only works for operations that work on all major database platforms. When we execute custom Postgres-only commands, Rails doesn’t know what to do. Instead we can use db/structure.sql
which is an exact copy of your database in Postgres. It requires a quick change to your configuration file, most likely found in config/application.rb
.
module YourApp
class Application < Rails::Application
# Leave the other configuration intact
# Add this line:
config.active_record.schema_format = :sql
end
end
One tip is that you will often get merge conflicts with the db/structure.sql
file, so a quick way to avoid that is to run the following command.
RAILS_ENV=test rails db:drop db:create db:migrate
This will rebuild your db/structure.sql
file. Never try to resolve merge conflicts manually in db/structure.sql
. This should always be an auto-generated file.
Testing Event Tracking
There’s a myth that it’s hard to test your database. In fact it’s quite easy. You can just write regular tests in Ruby. I’m going to use RSpec. The main thing that we want to test is that change tracking happens even when we circumvent ActiveRecord.
⚠ Make sure you’re comfortable writing tests before doing anything fancy with the database. I don’t normally follow a strict red-green-refactor pattern in my day-to-day work, but I certainly do when building custom logic in the database.
Basic test pattern:
describe Account, type: :model do
it 'tracks changes in Postgres' do
valid_parameters = { … }
model = Account.create!(**valid_parameters)
# Change one field directly via a SQL statement
ActiveRecord::Base.connection.execute <<~SQL
UPDATE "#{model.table_name}"
SET updated_at = now()
WHERE id = #{model.id}
SQL
# check that 2 archives exist, one for the model creation
# one for the model that was just updated
expect(model.archives.count).to eq(2)
changed_keys = model.most_recent_change.changed_values.keys
expect(changed_keys).to match_array(["updated_at"])
end
end
DRY it up with a custom matcher
One of the benefits of this pattern is that you can use it for a lot of tables over and over, so it would be nice if there were a one line custom matcher that would test for our most common use cases.
RSpec.describe Account, type: :model do
it "should track changes in postgres" do
valid_params = { … }
model = Account.create!(**valid_params)
expect(model).to track_changes_in_postgres
end
end
require 'rspec/expectations'
RSpec::Matchers.define :track_changes_in_postgres do
custom_matcher = PostgresChangeTrackingMatcher.new
match { |model| custom_matcher.matches?(model) }
failure_message { custom_matcher.failure_message }
failure_message_when_negated { custom_matcher.failure_message_when_negated }
end
class PostgresChangeTrackingMatcher
def matches?(model)
rails_tracking = track_changes_done_in_rails?(model)
postgres_tracking = track_changes_done_in_postgres?(model)
if rails_tracking && postgres_tracking
true
elsif !rails_tracking && !postgres_tracking
false
elsif rails_tracking && !postgres_tracking
raise RuntimeError.new <<~TXT
changes were tracked via rails but not via postgres
TXT
elsif !rails_tracking && postgres_tracking
raise RuntimeError.new <<~TXT
changes were tracked via postgres but not via rails
TXT
end
end
def track_changes_done_in_rails?(model)
model.created_at = 2.minutes.ago
model.updated_at = 1.minute.ago
model.save!
most_recent_change = Archive.find_by({
record_type: model.class.name,
record_id: model.id,
most_recent: true
})
keys_changed = most_recent_change && most_recent_change.changed_values.keys
if keys_changed == ['created_at', 'updated_at']
true
elsif keys_changed.nil?
false
else
raise RuntimeError.new <<~TXT
a change was recorded but an unexpected set of fields was changed: #{keys_changed}
TXT
end
end
def track_changes_done_in_postgres?(model)
ActiveRecord::Base.connection.execute <<~SQL
UPDATE "#{model.class.table_name}"
SET updated_at = now()
WHERE id = #{model.id}
SQL
most_recent_change = Archive.find_by({
record_type: model.class.name,
record_id: model.id,
most_recent: true
})
keys_changed = most_recent_change && most_recent_change.changed_values.keys
if keys_changed == ['updated_at']
true
elsif keys_changed.nil?
false
else
raise RuntimeError.new <<~TXT
a change was recorded but an unexpected set of fields was changed: #{keys_changed}
TXT
end
end
def failure_message
"failed to track changes in postgres"
end
def failure_message_when_negated
"model is not supposed to track changes in postgres but it's being tracked anyway"
end
end
One rule when you define a matcher: you should define expected behavior for both the matcher and its negation. For another model without tracking, we should be able to pass in the following matcher and have it pass.
RSpec.describe SomeOtherModelWithoutTracking, type: :model do
it "should NOT track changes in postgres" do
model = FactoryBot.create(:some_other_model_without_tracking)
expect(model).not_to track_changes_in_postgres
end
end
When we define the custom matcher, the match
block accepts one argument: the object being tested. It either returns true
meaning it matches, false
meaning it doesn’t match, or it throws an error. This is an important pattern. When we’re testing that a matcher matches, we often want multiple criteria all to come back true. When we’re testing that a matcher does NOT match, we expect multiple criteria all to come back false. If for any reason the criteria are inconsistent, i.e. some come back some true and some false, the test should fail.
I apply the same principle when I’m running smaller sub-tests inside of the matcher. For example I’m testing whether changes made through ActiveRecord are tracked. That’s either going to return true, return false, or raise an error. The same principle applies when I check that changes made through a raw SQL query are tracked.
The Payoff?
If you’ve made it this far in the tutorial, congratulations.
The bank example might seem contrived, but scammers have taken advantage of this vulnerability to steal money by making simultaneous withdrawals. Our client runs third-party background checks on people, and the person’s application can only be processed when all those third-party responses come back. If the final two responses come back at the same time, that person’s application might get stuck in pending forever.
Adding business logic to Postgres is a different mental model than most Rails developers are used to. It’s a lot to take in the first time, but once you get used to this pattern, it becomes easier to implement in the future.
For the next post in this series, we'll look at how to work with enum types in Postgres.
Learn more about how The Gnar builds Ruby on Rails applications.
Top comments (2)
Is there a schema for the archive model? It doesn't seem to be included in the article unless you infer it from the function.
Here you go: