If you're familiar with Rails, you might be reading this and wondering, "Why would I ever need to remove a database column from console? Shouldn't I be doing this through a proper Rails migration?" Well, you're right! This should probably be done safely through a Rails migration; however, we here at DEV found ourselves in a funny situation.
We had just merged a pull request that would add a few indices to our notifications table. Since that table is pretty large, we generally have to increase our database statement timeout to handle that migration. Sometimes it takes a while for a migration to complete, but when things go smoothly that's usually the only concern. This time around we decided to use PostgreSQL's concurrent index creation.
Normally when building an index, Postgres has to lock the table that's being indexed, preventing new writes. This probably means that no one would receive any new notifications for a few minutes. With concurrent index building, Postgres allows writes to come into the database while building the index. A consequence of using concurrency though is that building the index takes significantly longer.
Well, it took a while, that’s for sure. In fact, it took so long that it timed out, even with our preemptive increase to the database’s statement timeout. We increased the statement timeout, ran the migrations again, but this time, we got an unexpected error message:
ArgumentError: Index name 'index_notifications_on_user_notifiable_and_action_not_null' on table 'notifications' already exists
Hmm wait what? That's odd; that index shouldn't exist already because the migrations failed. Now, generally this is not a huge cause for concern, but the migrations still needs to complete. I offered one solution that might fix it:
heroku run rails db:migrate VERSION=20190531094609
# and then
heroku run rails db:migrate VERSION=20190531094926
That however went totally wrong though, and rolled back all migrations up until that point. Yikes. There was a missing keyword there: up
. It should have been this:
heroku run rails db:migrate:up VERSION=20190531094609
# and then
heroku run rails db:migrate:up VERSION=20190531094926
Thankfully the migrations up until the oldest failed version (20190531094609) were pretty much indices and one column change. Any other time though it probably would have been very bad. So, we scrambled to properly migrate the ones that were up in the first place, and then proceeded to resolve the two failing migrations.
After a few minutes of figuring out how to remove an index, @ben figured out how to do it via Rails console. After testing in development, the commands needed were:
ActiveRecord::Migration.remove_index :notifications, %i[user_id notifiable_id notifiable_type action]
ActiveRecord::Migration.remove_index :notifications, %i[user_id notifiable_id notifiable_type]
Both commands completed with a supposedly successful response, so we ran the migrations again. This time, we had an even larger statement timeout. And... it worked! A few performance hiccups and several hundred errors aside, all was pretty much back in place. While this was a bit of an emergency for us, it probably didn't need to be anything too overblown.
The key thing was figuring out how to remove the index properly (and not running incorrect commands 😅). This problem was specific, but it's a Rails-Postgres-specific problem and not one directly related to how we handle our business logic. Hopefully this post is helpful for anyone else with this sort of specific problem.
Top comments (3)
Oh hmm that's interesting! I'm more intrigued as to why you were hitting the statement timeouts, as I often (enough) add indexes to hot tables using the concurrent feature in postgres, and haven't ran into this before I don't think.
Do you guys use the 'disable_ddl_transaction' when doing the concurrent migrations, my understanding was that was necessary but maybe not!
Also not sure if this would have caught it or not, but Ive enjoyed using Strong Migrations to try to catch dangerous migrations early! Though you may have already been following that advice here lol
github.com/ankane/strong_migrations
Why not just remove it using SQL directly? That seems more safe to me.
We knew ActiveRecord syntax at the time and didn't see any problems with running it, so we went with that.