First of all, this article is not about how much I love Rails, and secondly, it is not about how much I hate it. You can treat Rails in very different ways, and it only gets better if we change something in it; but it can just as easily get worse if our corrections are careless. Anyway, consider yourself warned, and I hope you get my point.
One of the basic assumptions of ActiveRecord is that the database is purely utilitarian and can easily be swapped out. So there you are, writing your models for MySQL, and suddenly it turns out to be just that simple to take MySQL and replace it with MongoDB. Well, maybe not that drastic, but say you have reasons to replace MySQL with PostgreSQL. Or vice versa, I have nothing against MySQL.
And this is where ActiveRecord claims to shine, since supposedly the switch is a piece of cake for it. After all, scopes, before/after callbacks and associations are abstract enough that you don't have to worry about how the queries are generated and can focus on the application logic alone. Instead of writing WHERE is_archived = TRUE, you simply write where(is_archived: true) and ActiveRecord does the rest for you.
But it's not nearly that simple! In practice, this abstraction layer turns out to be full of holes, like the broken trough from the tale of the Golden Fish. Many basic features simply cannot be expressed through it, such as comparing dates or working with arrays, and we end up with scopes forced to fall back on raw SQL like where("#{quoted_table_name}.finished_at >= ?", Date.current) or where("#{quoted_table_name}.other_ids <@ ARRAY[?]", ids). To which ActiveRecord gives a perfectly conscious and expected answer: don't do this. Use a habtm association instead of an array column, and if you need to compare dates, just live with it. And God forbid you forget quoted_table_name in such a scope: the first includes or joins will put everything in its place. It's easier to add it everywhere, all the time, so you don't lose the habit.
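For illustration, a minimal sketch of what such scopes might look like on a hypothetical Task model (the model and its columns are assumptions of this example):

class Task < ApplicationRecord
  # A plain hash condition: ActiveRecord can build this on its own.
  scope :archived, -> { where(is_archived: true) }

  # Date comparison: there is no hash syntax for >=, so we fall back to SQL;
  # quoted_table_name keeps the column unambiguous once includes/joins appear.
  scope :finished_since, ->(date) { where("#{quoted_table_name}.finished_at >= ?", date) }

  # PostgreSQL array operator <@: also impossible to express without raw SQL.
  scope :with_other_ids_among, ->(ids) { where("#{quoted_table_name}.other_ids <@ ARRAY[?]", ids) }
end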
And, of course, once you decide on this kind of interference in ActiveRecord's work, there is no going back. Not just the chance, but even the vague hope of a painless migration to another database is gone; you might as well print that code out and burn it. Which means there is no longer any reason not to use database-specific capabilities in your application. You're welcome to use them and to encourage everyone else to do the same!
And when it turns out that database-specific features account for more than half of the scopes in your models folder, it becomes quite obvious that ActiveRecord is just a convenient wrapper for gluing one labeled piece of SQL to another. Scopes like where(is_archived: true).joins(:sprint).merge(Sprint.archived) will keep working fine, and combining them is not much harder than boiling an egg, is it?
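As a sketch of how such a combination might be composed (Task, Sprint and their is_archived columns are assumptions of this example):

class Sprint < ApplicationRecord
  has_many :tasks
  scope :archived, -> { where(is_archived: true) }
end

class Task < ApplicationRecord
  belongs_to :sprint

  # Both conditions end up in one query; merge reuses Sprint.archived,
  # so the sprint-side condition is not duplicated by hand.
  scope :fully_archived, -> { where(is_archived: true).joins(:sprint).merge(Sprint.archived) }
end

# Task.fully_archived produces a single SELECT with an INNER JOIN on sprints
# and both is_archived conditions in the WHERE clause.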
The next stage is denormalization. Of course, denormalization has always been around and isn't going anywhere, but the job of taking care of it has been placed on the mighty shoulders of Rails and ActiveRecord, and, as you know, these two guys are not famous for being lightweight or ascetic in their resource requirements. Let's say counter_cache: true is the first step towards denormalization; after all, ActiveRecord won't just let you write COUNT(*) AS sprints_count that easily (you're not going to use the select method, are you?). And counter_cache is far from ideal: in some cases the cached count can drift away from the real one. Not critical, of course, but unpleasant. So it is the first candidate to move into the database and stop loading the already burdened head of the Ruby machine. Just a couple of triggers and it's done! When a row is deleted from or inserted into the child B-table, you recalculate the number of B-records stored on the corresponding row of the parent A-table, and that's it, right? And, of course, you do the same on update if the foreign_key has changed, because a query like UPDATE B SET a_id = $1 WHERE id = $2 breaks the counter_cache for both the old and the new parent record. The insert case looks like this:
CREATE OR REPLACE FUNCTION update_#{parent_table}_#{child_table}_counter_on_insert() RETURNS TRIGGER AS $$
BEGIN
  UPDATE #{parent_table}
    SET #{counter_column} = COALESCE((SELECT COUNT(id) FROM #{child_table} GROUP BY #{foreign_column} HAVING #{foreign_column} = NEW.#{foreign_column}), 0)
    WHERE (#{parent_table}.id = NEW.#{foreign_column});
  RETURN NULL;
END;
$$ LANGUAGE plpgsql;
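The function by itself does nothing until a trigger on the child table calls it. A minimal sketch of such an attachment inside a migration, assuming a hypothetical projects/sprints pair where projects.sprints_count is the counter (the delete and update cases need their own, analogous functions and triggers):

class AttachSprintsCounterTrigger < ActiveRecord::Migration[5.2]
  def up
    # Recalculate projects.sprints_count after every insert into sprints.
    execute <<~SQL
      CREATE TRIGGER update_projects_sprints_counter_on_insert
      AFTER INSERT ON sprints
      FOR EACH ROW EXECUTE PROCEDURE update_projects_sprints_counter_on_insert();
    SQL
  end

  def down
    execute 'DROP TRIGGER update_projects_sprints_counter_on_insert ON sprints;'
  end
end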
The next piece of database work is related to date and time. First, let's have the created_at and updated_at fields maintained by the database itself; fortunately, that part is much simpler. To begin with, let's set the default values:
change_column_default :table_name, :created_at, -> { 'CURRENT_TIMESTAMP' }
change_column_default :table_name, :updated_at, -> { 'CURRENT_TIMESTAMP' }
And to do this for every table at once, we can loop over all the tables that have these fields, excluding, of course, the schema_migrations and ar_internal_metadata tables:
(tables - %w(schema_migrations ar_internal_metadata)).each { ... }
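Spelled out, such a migration might look roughly like this (the column check is an assumption of this sketch; adapt it to your schema):

class MoveTimestampDefaultsToDatabase < ActiveRecord::Migration[5.2]
  def up
    (tables - %w(schema_migrations ar_internal_metadata)).each do |table|
      column_names = connection.columns(table).map(&:name)
      %w(created_at updated_at).each do |column|
        next unless column_names.include?(column)
        # Let PostgreSQL fill the timestamp in when a row is inserted.
        change_column_default table, column, -> { 'CURRENT_TIMESTAMP' }
      end
    end
  end
end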
That's it: the default values for these columns are now exactly what we need. Now it's time to make sure Rails no longer touches these fields. There is a framework configuration option responsible for that:
Rails.application.config.active_record.record_timestamps = false
So, the next step is to change the updated_at field whenever a record is updated. That's simple:
CREATE OR REPLACE FUNCTION touch_for_#{table_name}_on_update() RETURNS TRIGGER AS $$
BEGIN
  SELECT CURRENT_TIMESTAMP INTO NEW.updated_at;
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;
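Because this function modifies NEW, it has to be attached as a BEFORE UPDATE trigger. A sketch of the attachment, run inside a migration, for a hypothetical tasks table:

execute <<~SQL
  CREATE TRIGGER touch_for_tasks_on_update
  BEFORE UPDATE ON tasks
  FOR EACH ROW EXECUTE PROCEDURE touch_for_tasks_on_update();
SQL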
Now we need to get rid of touch: true in the models entirely. That thing is a lot like the target in a shooting gallery: it is also full of holes, and I won't even explain why, because you already know all those cases. Doing it ourselves is not much harder; we just need to update updated_at on the related table where necessary:
CREATE OR REPLACE FUNCTION touch_for_#{table_name}_on_update() RETURNS TRIGGER AS $$
BEGIN
  UPDATE foreign_table_name SET updated_at = CURRENT_TIMESTAMP WHERE id = NEW.foreign_column_name;
  SELECT CURRENT_TIMESTAMP INTO NEW.updated_at;
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;
Of course, a chain of such triggers firing one after another causes some extra work, but Postgres doesn't offer any sane mechanism for firing a trigger without actually changing the record. You could try writing SET title = title, but that is hardly any better than SET updated_at = CURRENT_TIMESTAMP.
Exactly the same kind of trigger serves for insertion, except that setting updated_at on the new row is not necessary there:
CREATE OR REPLACE FUNCTION touch_for_#{table_name}_on_insert() RETURNS TRIGGER AS $$
BEGIN
  UPDATE foreign_table_name SET updated_at = CURRENT_TIMESTAMP WHERE id = NEW.foreign_column_name;
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;
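Since this insert variant does not modify NEW, it can safely be attached as an AFTER INSERT trigger; a sketch for the same hypothetical tasks table, again run inside a migration:

execute <<~SQL
  CREATE TRIGGER touch_for_tasks_on_insert
  AFTER INSERT ON tasks
  FOR EACH ROW EXECUTE PROCEDURE touch_for_tasks_on_insert();
SQL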
Certainly, you could try to make do with a single function by checking the current event right inside the trigger with something like IF TG_OP = 'UPDATE' THEN, but it is preferable to keep every trigger as simple as possible in order to reduce the probability of error.
You might want to automate the generation of such triggers, and then you will most likely need to find all the foreign-key relations between the current table and the rest. That is easy to do with this query:
SELECT ccu.table_name AS foreign_table_name, kcu.column_name AS column_name
FROM information_schema.table_constraints AS tc
JOIN information_schema.key_column_usage AS kcu
ON tc.constraint_name = kcu.constraint_name
JOIN information_schema.constraint_column_usage AS ccu
ON ccu.constraint_name = tc.constraint_name
WHERE constraint_type = 'FOREIGN KEY' AND tc.table_name = '#{table_name}'
ORDER BY ccu.table_name;
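If you go down the automation route, the wiring inside a migration might look roughly like this; create_touch_function_and_trigger here is a hypothetical helper that renders the SQL from the templates shown above:

foreign_keys = select_all(<<~SQL)
  SELECT ccu.table_name AS foreign_table_name, kcu.column_name AS column_name
  FROM information_schema.table_constraints AS tc
  JOIN information_schema.key_column_usage AS kcu
    ON tc.constraint_name = kcu.constraint_name
  JOIN information_schema.constraint_column_usage AS ccu
    ON ccu.constraint_name = tc.constraint_name
  WHERE constraint_type = 'FOREIGN KEY' AND tc.table_name = '#{table_name}'
  ORDER BY ccu.table_name;
SQL

foreign_keys.each do |row|
  create_touch_function_and_trigger(table_name,
                                    foreign_table: row['foreign_table_name'],
                                    foreign_column: row['column_name'])
end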
One more very useful tip: name all triggers and functions according to a template, so that you can verify the presence or absence of a required one with a single query. For example, this query finds all the touch-on-insert functions:
SELECT routine_name AS name
FROM information_schema.routines
WHERE
routine_name LIKE 'touch_for_%_on_insert' AND
routine_type = 'FUNCTION' AND
specific_schema = 'public';
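With that naming convention in place you can, for example, add a sanity check that every table expected to touch its parent actually has its function; a rough sketch (expected_tables is an assumption, the list you maintain yourself):

touch_insert_functions = ActiveRecord::Base.connection.select_values(<<~SQL)
  SELECT routine_name AS name
  FROM information_schema.routines
  WHERE
    routine_name LIKE 'touch_for_%_on_insert' AND
    routine_type = 'FUNCTION' AND
    specific_schema = 'public';
SQL

missing = expected_tables.map { |t| "touch_for_#{t}_on_insert" } - touch_insert_functions
raise "Missing touch functions: #{missing.join(', ')}" if missing.any?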
And the last, most terrible part remains. The thing is, Rails is not designed for even a marginally smart database; it simply does not expect anything to change on the database side except the ID field, and even that only when inserting a row. Therefore, there is no sane way to add RETURNING id, updated_at to update queries, and you will have to dive headlong into the Rails thicket.
The monkey patch did not come out particularly neat, but the main goal was to disturb the framework's normal operation as little as possible. Here is how it looks in the end:
module ActiveRecord
  module Persistence
    # https://github.com/rails/rails/blob/v5.2.0/activerecord/lib/active_record/persistence.rb#L729-L741
    def _create_record(attribute_names = self.attribute_names)
      attribute_names &= self.class.column_names
      attributes_values = attributes_with_values_for_create(attribute_names)
      an_id, *affected_rows = self.class._insert_record(attributes_values).dup
      self.id ||= an_id if self.class.primary_key
      Hash[ApplicationRecord.custom_returning_columns(self.class.quoted_table_name, :create).take(affected_rows.size).zip(affected_rows)].each do |column_name, value|
        public_send("#{column_name}=", self.class.attribute_types[column_name.to_s].deserialize(value)) if value
      end
      @new_record = false
      yield(self) if block_given?
      id
    end
    private :_create_record

    # https://github.com/rails/rails/blob/v5.2.0/activerecord/lib/active_record/persistence.rb#L710-L725
    def _update_record(attribute_names = self.attribute_names)
      attribute_names &= self.class.column_names
      attribute_names = attributes_for_update(attribute_names)

      if attribute_names.empty?
        affected_rows = []
        @_trigger_update_callback = true
      else
        affected_rows = _update_row(attribute_names)
        @_trigger_update_callback = affected_rows.any?
      end

      Hash[ApplicationRecord.custom_returning_columns(self.class.quoted_table_name, :update).take(affected_rows.size).zip(affected_rows)].each do |column_name, value|
        public_send("#{column_name}=", self.class.attribute_types[column_name.to_s].deserialize(value))
      end

      yield(self) if block_given?

      affected_rows.none? ? 0 : 1
    end
    private :_update_record
  end

  module ConnectionAdapters
    module PostgreSQL
      module DatabaseStatements
        # https://github.com/rails/rails/blob/v5.2.0/activerecord/lib/active_record/connection_adapters/postgresql/database_statements.rb#L93-L96
        def exec_update(sql, name = nil, binds = [])
          execute_and_clear(sql_with_returning(sql), name, binds) { |result| Array.wrap(result.values.first) }
        end

        # https://github.com/rails/rails/blob/v5.2.0/activerecord/lib/active_record/connection_adapters/abstract/database_statements.rb#L147-L152
        def insert(arel, name = nil, pk = nil, _id_value = nil, sequence_name = nil, binds = [])
          sql, binds = to_sql_and_binds(arel, binds)
          exec_insert(sql, name, binds, pk, sequence_name).rows.first
        end
        alias create insert

        # https://github.com/rails/rails/blob/v5.2.0/activerecord/lib/active_record/connection_adapters/postgresql/database_statements.rb#L98-L111
        def sql_for_insert(sql, pk, id_value, sequence_name, binds) # :nodoc:
          table_ref = extract_table_ref_from_insert_sql(sql)

          if pk.nil?
            # Extract the table from the insert sql. Yuck.
            pk = primary_key(table_ref) if table_ref
          end

          returning_columns = quote_returning_column_names(table_ref, pk, :create)

          if returning_columns.any?
            sql = "#{sql} RETURNING #{returning_columns.join(', ')}"
          end

          super
        end

        # No source in original repo
        def quote_returning_column_names(table_ref, pk, action)
          returning_columns = []
          returning_columns << pk if suppress_composite_primary_key(pk)
          returning_columns += ApplicationRecord.custom_returning_columns(table_ref, action)
          returning_columns.map { |column| quote_column_name(column) }
        end

        # No source in original repo
        def sql_with_returning(sql)
          table_ref = extract_table_ref_from_update_sql(sql)
          returning_columns = quote_returning_column_names(table_ref, nil, :update)

          return sql if returning_columns.blank?
          "#{sql} RETURNING #{returning_columns.join(', ')}"
        end

        # No source in original repo
        def extract_table_ref_from_update_sql(sql)
          sql[/update\s("[A-Za-z0-9_."\[\]\s]+"|[A-Za-z0-9_."\[\]]+)\s*set/im]
          Regexp.last_match(1)&.strip
        end
      end
    end
  end
end
The most important thing here is the call to ApplicationRecord.custom_returning_columns, which tells the adapter which columns, besides id, we are interested in. That method looks something like this:
class << self
  def custom_returning_columns(table_ref, action)
    return [] if ['"schema_migrations"', '"ar_internal_metadata"'].include?(table_ref)

    res = []
    res << :created_at if action == :create
    res << :updated_at

    res += case table_ref
           when '"user_applications"'
             [:api_token]
           when '"users"'
             [:session_salt, :password_changed_at]
           # ...
           else
             []
           end

    res
  end
end
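The payoff is that columns maintained by the database come straight back into the in-memory object, with no extra SELECT or reload; a rough illustration, assuming a users table with the triggers above in place:

user = User.find(1)
user.update(name: 'New name')

# updated_at was set by the database trigger, and RETURNING brought
# the fresh value back into the object along with the other listed columns.
user.updated_at # the database-side timestamp, no user.reload required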
All the examples here are for PostgreSQL, so MySQL followers will have to reinvent this particular wheel on their own.
Instead of conclusions, let's just say that the aching head of Rails should now hurt a little less: routine chores like counter_cache and touch are sinking into oblivion. The next article will focus on something more global, such as stripping trailing spaces, validating data, cascading deletion, paranoid (soft) deletion, and RSpec for SQL. If this article proves interesting, of course.