
Come with me on a journey through this website's source code by way of a bug

rhymes ・17 min read

This article started because I thought that researching a particular bug could be useful to understand dev.to's source code a little better. So I literally wrote this while doing the research, and I ended up writing about other things as well. Here we go.

The bug I'm referring to is called Timeouts when deleting a post on GitHub. As the title implies, removing a post results in a server timeout, which surfaces as an error to the user. @peter in his bug report added a couple of details we need to keep in mind for our "investigation": this bug doesn't always happen (a deterministic bug would be easier in the context of finding a solution) and it more likely presents itself with articles that have many reactions and comments.

First clues: it happens sometimes and usually with articles with a lot of data attached.

Let's see if we can dig up more information before diving into the code.

A note: I'm writing this post to explain (and expand on) the way I researched this while it was happening (well, over the course of multiple days, but still in real time :-D), so all the discoveries were as new to me when I wrote about them as they are to you reading this.

Another note: I'm going to use the terms "async" and "out of process" interchangeably here. Async in this context means "the user doesn't wait for the call to be executed", not "async" as in JavaScript. A better term is probably "out of process", because these asynchronous calls are executed by an external process through a queue on the database, with a library/gem called delayed_job.
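To make those semantics concrete, here's a toy sketch of the idea (this is NOT delayed_job's real implementation; FakeQueue, DelayProxy and the User model below are all invented for illustration):

```ruby
# Toy sketch of "out of process" execution: `delay` enqueues the call
# instead of running it, and a separate worker would execute it later.
class FakeQueue
  def initialize
    @jobs = []
  end

  def enqueue(target, name, args)
    @jobs << [target, name, args]
  end

  # in real life a worker process polls the jobs table and runs these
  def work_off!
    @jobs.each { |target, name, args| target.public_send(name, *args) }
    @jobs.clear
  end
end

QUEUE = FakeQueue.new

class DelayProxy
  def initialize(target)
    @target = target
  end

  def method_missing(name, *args)
    QUEUE.enqueue(@target, name, args) # record the call, return immediately
  end
end

class User
  attr_reader :resaved

  def initialize
    @resaved = false
  end

  def resave_articles
    @resaved = true
  end

  def delay
    DelayProxy.new(self) # mimics delayed_job's `user.delay.resave_articles`
  end
end

user = User.new
user.delay.resave_articles # returns at once; nothing has run yet
QUEUE.work_off!            # the "worker" executes the queued call
user.resaved               # => true
```

The key point is that the user-facing request only pays for the enqueue, not for the work itself.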

Referential integrity

ActiveRecord (Rails's ORM), like many other object relational mappers, is an object layer that sits on top of a relational database system. Let's take a little detour and talk a little about a fundamental feature to preserve data meaningfulness in database systems: referential integrity. Why not, the bug can wait!

Referential integrity, to simplify a lot, is a defense against developers with weird ideas on how to structure their relational data. It forbids insertion of rows that have no correspondence in the primary table of the relationship. In layman's terms it guarantees that there is a corresponding row in the relationship: if you have a table with a list of 10 cities, you shouldn't have a customer whose address belongs to an unknown city. Funnily enough it took more than a decade for MySQL to enforce referential integrity by default, while PostgreSQL had already had it for 10 years at the time. Sometimes I think that MySQL in its early incarnations was a giant collection of CSV files with SQL on top. I'm joking, maybe.

With referential integrity in place you can rest (mostly) assured that the database won't let you add zombie rows, will keep the relationship updated and will clean up after you if you tell it to.

How do you instruct the database to do all of these things? It's quite simple. I'll use an example from the PostgreSQL 10 documentation:

CREATE TABLE products (
    product_no integer PRIMARY KEY,
    name text,
    price numeric
);

CREATE TABLE orders (
    order_id integer PRIMARY KEY,
    shipping_address text
);

CREATE TABLE order_items (
    product_no integer REFERENCES products ON DELETE RESTRICT,
    order_id integer REFERENCES orders ON DELETE CASCADE,
    quantity integer,
    PRIMARY KEY (product_no, order_id)
);

The table order_items has two foreign keys, one towards orders and another that points to products (a classic example of many-to-many in case you're wondering).

When you design tables like this you should ask yourself the following questions (in addition to the obvious ones like "what am I really doing with this data?"):

  • what happens if a row in the primary table is deleted?

  • do I want to delete all the related rows?

  • do I want to set the referencing column to NULL? in that case what does it mean for my business logic? does NULL even make sense for my data?

  • do I want to set the column to its default value? what does it mean for my business logic? does this column even have a default value?

If you look back at the example, we're telling the database two things:

  • products cannot be removed while they still appear in any order

  • orders can be removed at all times, and they take their items with them to the grave 😀
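For what it's worth, Rails can express the same choices in a migration through `add_foreign_key`'s `on_delete` option. A hypothetical migration for the example tables above (these names come from the PostgreSQL docs example, not from dev.to's schema):

```ruby
# The same ON DELETE choices expressed as a Rails migration
# (table names from the example above, not from dev.to's schema):
class AddOrderForeignKeys < ActiveRecord::Migration[5.2]
  def change
    add_foreign_key :order_items, :products, on_delete: :restrict # forbid removal
    add_foreign_key :order_items, :orders,   on_delete: :cascade  # remove items too
    # another supported option: on_delete: :nullify (SET NULL)
  end
end
```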

Keep in mind that removal in this context is still a fast operation. Even in a context like dev.to's, if an article had tables linked with a cascade directive, removing it should still be fast. DBs tend to become slow when a single DELETE triggers millions (or tens of millions) of other removals. I assume this is not the case here (yet), but since the point of this whole section is to expand our knowledge about referential integrity, not to actually investigate the bug, let's keep on digging.

Next we open the console and check if the tables are linked to each other, using psql :

$ rails dbconsole
psql (10.5)
Type "help" for help.

PracticalDeveloper_development=# \d+ articles
...
Indexes:
    "articles_pkey" PRIMARY KEY, btree (id)
    "index_articles_on_boost_states" gin (boost_states)
    "index_articles_on_featured_number" btree (featured_number)
    "index_articles_on_hotness_score" btree (hotness_score)
    "index_articles_on_published_at" btree (published_at)
    "index_articles_on_slug" btree (slug)
    "index_articles_on_user_id" btree (user_id)

This table has a primary key and a few indexes, but apparently no foreign key constraints (the markers of referential integrity). Compare it with a table that has both:

PracticalDeveloper_development=# \d users
...
Indexes:
    "users_pkey" PRIMARY KEY, btree (id)
    "index_users_on_confirmation_token" UNIQUE, btree (confirmation_token)
    "index_users_on_reset_password_token" UNIQUE, btree (reset_password_token)
    "index_users_on_username" UNIQUE, btree (username)
    "index_users_on_language_settings" gin (language_settings)
    "index_users_on_organization_id" btree (organization_id)
Referenced by:
    TABLE "messages" CONSTRAINT "fk_rails_273a25a7a6" FOREIGN KEY (user_id) REFERENCES users(id)
    TABLE "badge_achievements" CONSTRAINT "fk_rails_4a2e48ca67" FOREIGN KEY (user_id) REFERENCES users(id)
    TABLE "chat_channel_memberships" CONSTRAINT "fk_rails_4ba367990a" FOREIGN KEY (user_id) REFERENCES users(id)
    TABLE "push_notification_subscriptions" CONSTRAINT "fk_rails_c0b1e39717" FOREIGN KEY (user_id) REFERENCES users(id)

This is what we learned so far: referential integrity can come into play when you remove rows from a DB, and the articles table has no apparent relationship with any other table at the database level. But is this true in the web app? Let's move up one layer, diving into the Ruby code.

ps. in the case of Rails (I don't remember since which version) you can also see which foreign keys are defined by looking at the schema.rb file.

ActiveRecord, associations and callbacks

Now that we know what referential integrity is, how to spot it, and that it's not at play in this bug, we can move up a layer and check how the Article object is defined (I'll skip stuff that I think is not related to this article and the bug itself, although I might be wrong because I don't know the code base well):

class Article < ApplicationRecord
  # ...

  has_many :comments,       as: :commentable
  has_many :buffer_updates
  has_many :reactions,      as: :reactable, dependent: :destroy
  has_many :notifications, as: :notifiable

  # ...

  before_destroy    :before_destroy_actions

  # ...

  def before_destroy_actions
    bust_cache
    remove_algolia_index
    reactions.destroy_all
    user.delay.resave_articles
    organization&.delay&.resave_articles
  end
end

A bunch of new information from that piece of code:

  • Rails (but not the DB) knows that an article can have many comments, buffer updates, reactions and notifications (these are called "associations" in Rails lingo)

  • Reactions are explicitly dependent on the article and they will be destroyed if the article is removed

  • There's a callback that does a bunch of stuff (we'll explore it later) before the object and its row in the database are destroyed

  • Three out of four associations are of the "-able" type. Rails calls these polymorphic associations because they allow the programmer to associate multiple types of objects with the same row, using two different columns (a string with the name of the model the object belongs to, and an id). They are very handy, though I've always felt they make the database very dependent on the domain model (set by Rails). They can also require a composite index on the associated table to speed up queries
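Concretely, a polymorphic pair looks like this in the models (a simplified sketch; reactable is shown, but commentable and notifiable work the same way):

```ruby
# Simplified sketch of a polymorphic association pair:
class Reaction < ApplicationRecord
  belongs_to :reactable, polymorphic: true # stored as reactable_type + reactable_id
end

class Article < ApplicationRecord
  has_many :reactions, as: :reactable
end
```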

Similarly to what the underlying database system can do, ActiveRecord allows the developer to specify what happens to the related objects when the primary one is destroyed. According to the documentation Rails supports: destroying all related objects, deleting all related objects, setting the foreign key to NULL, or restricting removal with an error. The difference between destroy and delete is that in the former case all related callbacks are executed prior to removal, while in the latter the callbacks are skipped and only the row in the DB is removed.
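A plain-Ruby toy (NOT ActiveRecord) makes the destroy/delete split easy to see: destroy runs the callbacks and then deletes, delete only removes the row.

```ruby
# Toy illustration of destroy vs delete semantics (not ActiveRecord):
class ToyRecord
  attr_reader :callbacks_run

  def initialize
    @callbacks_run = 0
    @removed = false
  end

  def destroy
    @callbacks_run += 1 # before_destroy hooks would fire here
    delete
  end

  def delete
    @removed = true # only the DB row goes away
  end

  def removed?
    @removed
  end
end

destroyed = ToyRecord.new.tap(&:destroy)
deleted   = ToyRecord.new.tap(&:delete)
destroyed.callbacks_run # => 1
deleted.callbacks_run   # => 0
```

Both records end up removed; only one paid the callback tax.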

The default strategy for relationships without a dependent option is to do nothing, which means leaving the referenced rows in place. If it were up to me, the default would be that the app doesn't start until you've decided what to do with the linked models, but I'm not the person who designed ActiveRecord.

Keep in mind that the database trumps the code: if you define nothing at the Rails level but the database is configured to automatically destroy all related rows, then the rows will be destroyed. This is one of the many reasons why it's worth taking the time to learn how the DB works :-)

The last bit we haven't talked about in the model layer is the callback, which is probably where the bug manifests itself.

The infamous callback

This before_destroy callback executes before the DELETE statement is issued to the DB:

def before_destroy_actions
  bust_cache
  remove_algolia_index
  reactions.destroy_all
  user.delay.resave_articles
  organization&.delay&.resave_articles
end

Cache busting

The first thing the callback does is call the method bust_cache, which in turn calls the Fastly API six times sequentially to purge the article's cache (each call to bust is two HTTP calls). It also makes a conspicuous number of out-of-process calls to the same API (around 20-50, depending on the status of the article and the number of tags), but these don't matter because the user doesn't wait for them.

One thing to note: six HTTP calls always go out after you press the button to delete an article.

Index removal

dev.to uses Algolia for search, the call remove_algolia_index does the following:

  • calls algolia_remove_from_index!, which in turn calls the "async" version of the Algolia HTTP API, which in reality makes a (fast) synchronous call to Algolia without waiting for the index to be cleared on their side. It's still a synchronous call adding to the user's latency

  • calls Algolia's HTTP API two more times for other indexes

So, adding the previous 6 HTTP calls for Fastly, we're at 9 API calls made in process.

Reactions destruction

The third step is reactions.destroy_all, which, as the name implies, destroys all the reactions to the article. In Rails, destroy_all simply iterates over all the objects and calls destroy on each of them, which in turn fires all the "destroy" callbacks for proper cleanup. The Reaction model has two before_destroy callbacks:

class Reaction < ApplicationRecord
  # ...

  before_destroy :update_reactable_without_delay
  before_destroy :clean_up_before_destroy

  # ...
end

I had to dig a little to find out what the first one does (one of the things I dislike about the Rails way of doing things is the magical methods popping up everywhere; they make refactoring harder and they encourage coupling between the model and all the various gems). update_reactable_without_delay calls update_reactable (which is declared as an async function by default), bypassing the queue. The result is a standard inline call the user waits for.

  • update_reactable recalculates (this time out of process) the scores of the Article if the article has been published (a thing that should probably be avoided since the Article is up for removal). Then (back inline) it reindexes the article (twice) calling Algolia, removes the reactions from Fastly's cache (each cache bust is two Fastly calls), busts another cache (two more HTTP calls) and possibly updates a column on the Article (also probably not needed, since it's going to be removed). The total is 6 HTTP calls: one async (the first one to Algolia), one more to Algolia and four to Fastly. Let's note down the 5 the user has to wait for.

  • clean_up_before_destroy reindexes the article on Algolia (a third time).

Let's sum up: removing one reaction amounts to 6 HTTP calls. If the article has 100 reactions... well, you can do the math.

Let's say the article had 1 reaction; adding the calls tallied before, we're at around 15 HTTP calls:

  • 6 to bust the cache of the article

  • 3 to remove the article from the index

  • 6 for the reaction attached to the article

There's an additional bonus HTTP call that I identified by chance using a gist to debug net/http calls: it calls the Stream.io API to delete the reaction from the user's feed. A total of 16 HTTP calls.
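A back-of-the-envelope tally, assuming the per-step counts above hold (6 cache busts, 3 index removals, 6 per reaction, 1 Stream.io call):

```ruby
# In-process HTTP calls for deleting an article, per the tallies above:
# 6 (Fastly cache busts) + 3 (index removals) + 6 per reaction + 1 (Stream.io)
def http_calls_for_delete(reaction_count)
  6 + 3 + 6 * reaction_count + 1
end

http_calls_for_delete(1)   # => 16
http_calls_for_delete(100) # => 610
```

The per-reaction term dominates, which is exactly why popular articles hurt.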

This is what happens when a reaction is destroyed (I added the awesome gem httplog to my local installation):

[httplog] Sending: PUT https://REDACTED.algolia.net/1/indexes/Article_development/25
[httplog] Data: {"title":" The Curious Incident of the Dog in the Night-Time"}
[httplog] Connecting: REDACTED.algolia.net:443
[httplog] Status: 200
[httplog] Benchmark: 0.357128 seconds
[httplog] Response:
{"updatedAt":"2018-10-09T17:23:44.151Z","taskID":945887592,"objectID":"25"}

[httplog] Sending: PUT https://REDACTED.algolia.net/1/indexes/searchables_development/articles-25
[httplog] Data: {"title":" The Curious Incident of the Dog in the Night-Time","tag_list":["discuss","security","python","beginners"],"main_image":"https://pigment.github.io/fake-logos/logos/medium/color/8.png","id":25,"featured":true,"published":true,"published_at":"2018-09-30T07:44:48.530Z","featured_number":1538293488,"comments_count":1,"reactions_count":0,"positive_reactions_count":0,"path":"/willricki/-the-curious-incident-of-the-dog-in-the-night-time-3e4b","class_name":"Article","user_name":"Ricki Will","user_username":"willricki","comments_blob":"Waistcoat craft beer pickled vice seitan kombucha drinking. 90's green juice hoodie.","body_text":"\n\nMeggings tattooed normcore kitsch chia. Fixie migas etsy hashtag jean shorts neutra pork belly. Vice salvia biodiesel portland actually slow-carb loko chia. Freegan biodiesel flexitarian tattooed.\n\n\nNeque. \n\n\nBefore they sold out diy xoxo aesthetic biodiesel pbr\u0026amp;b. Tumblr lo-fi craft beer listicle. Lo-fi church-key cold-pressed.\n\n\n","tag_keywords_for_search":"","search_score":153832,"readable_publish_date":"Sep 30","flare_tag":{"name":"discuss","bg_color_hex":null,"text_color_hex":null},"user":{"username":"willricki","name":"Ricki Will","profile_image_90":"/uploads/user/profile_image/6/22018b1a-7afa-47c1-bbae-b829977828e4.png"},"_tags":["discuss","security","python","beginners","user_6","username_willricki","lang_en"]}
[httplog] Status: 200
[httplog] Benchmark: 0.031995 seconds
[httplog] Response:
{"updatedAt":"2018-10-09T17:23:44.426Z","taskID":945887612,"objectID":"articles-25"}

[httplog] Sending: PUT https://REDACTED.algolia.net/1/indexes/ordered_articles_development/articles-25
[httplog] Data: {"title":" The Curious Incident of the Dog in the Night-Time","path":"/willricki/-the-curious-incident-of-the-dog-in-the-night-time-3e4b","class_name":"Article","comments_count":1,"tag_list":["discuss","security","python","beginners"],"positive_reactions_count":0,"id":25,"hotness_score":153829,"readable_publish_date":"Sep 30","flare_tag":{"name":"discuss","bg_color_hex":null,"text_color_hex":null},"published_at_int":1538293488,"user":{"username":"willricki","name":"Ricki Will","profile_image_90":"/uploads/user/profile_image/6/22018b1a-7afa-47c1-bbae-b829977828e4.png"},"_tags":["discuss","security","python","beginners","user_6","username_willricki","lang_en"]}
[httplog] Status: 200
[httplog] Benchmark: 0.047077 seconds
[httplog] Response:
{"updatedAt":"2018-10-09T17:23:44.494Z","taskID":945887622,"objectID":"articles-25"}

[httplog] Sending: PUT https://REDACTED.algolia.net/1/indexes/Article_development/25
[httplog] Data: {"title":" The Curious Incident of the Dog in the Night-Time"}
[httplog] Status: 200
[httplog] Benchmark: 0.029352 seconds
[httplog] Response:
{"updatedAt":"2018-10-09T17:23:44.541Z","taskID":945887632,"objectID":"25"}

[httplog] Sending: PUT https://REDACTED.algolia.net/1/indexes/searchables_development/articles-25
[httplog] Data: {"title":" The Curious Incident of the Dog in the Night-Time","tag_list":["discuss","security","python","beginners"],"main_image":"https://pigment.github.io/fake-logos/logos/medium/color/8.png","id":25,"featured":true,"published":true,"published_at":"2018-09-30T07:44:48.530Z","featured_number":1538293488,"comments_count":1,"reactions_count":0,"positive_reactions_count":1,"path":"/willricki/-the-curious-incident-of-the-dog-in-the-night-time-3e4b","class_name":"Article","user_name":"Ricki Will","user_username":"willricki","comments_blob":"Waistcoat craft beer pickled vice seitan kombucha drinking. 90's green juice hoodie.","body_text":"\n\nMeggings tattooed normcore kitsch chia. Fixie migas etsy hashtag jean shorts neutra pork belly. Vice salvia biodiesel portland actually slow-carb loko chia. Freegan biodiesel flexitarian tattooed.\n\n\nNeque. \n\n\nBefore they sold out diy xoxo aesthetic biodiesel pbr\u0026amp;b. Tumblr lo-fi craft beer listicle. Lo-fi church-key cold-pressed.\n\n\n","tag_keywords_for_search":"","search_score":154132,"readable_publish_date":"Sep 30","flare_tag":{"name":"discuss","bg_color_hex":null,"text_color_hex":null},"user":{"username":"willricki","name":"Ricki Will","profile_image_90":"/uploads/user/profile_image/6/22018b1a-7afa-47c1-bbae-b829977828e4.png"},"_tags":["discuss","security","python","beginners","user_6","username_willricki","lang_en"]}
[httplog] Status: 200
[httplog] Benchmark: 0.028819 seconds
[httplog] Response:
{"updatedAt":"2018-10-09T17:23:44.612Z","taskID":945887642,"objectID":"articles-25"}

[httplog] Sending: PUT https://REDACTED.algolia.net/1/indexes/ordered_articles_development/articles-25
[httplog] Data: {"title":" The Curious Incident of the Dog in the Night-Time","path":"/willricki/-the-curious-incident-of-the-dog-in-the-night-time-3e4b","class_name":"Article","comments_count":1,"tag_list":["discuss","security","python","beginners"],"positive_reactions_count":1,"id":25,"hotness_score":153829,"readable_publish_date":"Sep 30","flare_tag":{"name":"discuss","bg_color_hex":null,"text_color_hex":null},"published_at_int":1538293488,"user":{"username":"willricki","name":"Ricki Will","profile_image_90":"/uploads/user/profile_image/6/22018b1a-7afa-47c1-bbae-b829977828e4.png"},"_tags":["discuss","security","python","beginners","user_6","username_willricki","lang_en"]}
[httplog] Status: 200
[httplog] Benchmark: 0.02821 seconds
[httplog] Response:
{"updatedAt":"2018-10-09T17:23:44.652Z","taskID":945887652,"objectID":"articles-25"}

[httplog] Connecting: us-east-api.stream-io-api.com:443
[httplog] Sending: DELETE http://us-east-api.stream-io-api.com:443/api/v1.0/feed/user/10/Reaction:7/?api_key=REDACTED&foreign_id=1
[httplog] Data:
[httplog] Status: 200
[httplog] Benchmark: 0.336152 seconds
[httplog] Response:
{"removed":"Reaction:7","duration":"17.84ms"}

If you count them they are 7, not 16. That's because the calls to Fastly are only executed in production.

Resaving articles

User.resave_articles refreshes the user's other articles; it's called out of process, so it's not interesting to us right now. The same happens for the organization, if the article belongs to one, so again we don't care.

Let's recap what we know so far. Each article removal triggers a callback that does a lot of things, touching the third party services that help this website be as fast as it is, and it also updates various counters I didn't really investigate :-D.

What happens when the article is removed

After the callback has been dealt with, all the various caches are up to date and the reactions are gone from the database, we still need to check what happens to the other associations of the article we're removing. As you recall, every article can possibly have comments, reactions (gone by now), buffer updates (not sure what they are) and notifications.

Let's see what happens when we destroy an article to see if we can get other clues. I replaced a long log with my summaries:

> art = Article.last # article id 25
> art.destroy!
# its tags are destroyed, this is handled by "acts_as_taggable_on :tags"...
# a bunch of other tag related stuff happens, 17 select calls...
# the aforementioned HTTP calls for each reaction are here too...
# there's a SQL DELETE for each reaction...
# the user object is updated...
# a couple of other UPDATEs I didn't investigate but which seem really plausible...
# the HTTP calls to remove the article itself from search...
# the article is finally deleted from the db...
# the article count for the user is updated

Aside from the fact that in my initial overview I totally forgot about the destruction of the tags (they each amount to a DELETE and an UPDATE on the database), I'd say there's a lot going on when an article is removed.

What happens to the rest of the objects we didn't find in the console?

If you remember from earlier, in Rails relationships everything not explicitly marked as "dependent" survives the destruction of the primary object, so they're all still in the DB:

PracticalDeveloper_development=# select count(*) from comments where commentable_id = 25;
 count
-------
     3
PracticalDeveloper_development=# select count(*) from notifications where notifiable_id = 25;
 count
-------
     2
PracticalDeveloper_development=# select count(*) from buffer_updates where article_id = 25;
 count
-------
     1

I think we can be reasonably confident that the issue that sparked this article is likely to manifest when the article was really popular before being removed, with many reactions, comments and notifications.

Timeout

Another factor, which I mentioned in a comment on the issue, is Heroku's default timeout setting. dev.to, IIRC, runs on Heroku, which has a 30-second timeout for HTTP calls once the router has processed them (so it's a timer on your app code). If the app doesn't respond in 30 seconds, the request times out and an error is sent.

dev.to, savvily, cuts this timeout in half using rack-timeout, whose default service timeout is 15 seconds.

In brief: if, after hitting the "remove article" button, the server doesn't finish in 15 seconds, a timeout error is raised. Having seen that a popular article can trigger dozens of HTTP calls, you can understand why in some cases the 15-second wall is hit.
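For reference, this is roughly how rack-timeout's service timeout gets configured (a sketch; the exact mechanism depends on the gem version, and dev.to's actual initializer may differ):

```ruby
# Sketch of configuring rack-timeout's service timeout.
# Newer versions read an environment variable:
#   RACK_TIMEOUT_SERVICE_TIMEOUT=15
# or the middleware can be inserted explicitly with options:
Rails.application.config.middleware.insert_before(
  Rack::Runtime, Rack::Timeout, service_timeout: 15 # seconds of app time
)
```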

Recap

Let's recap what we learned so far about what happens when an article is removed:

  • referential integrity can be a factor if the article has millions of related rows (unlikely in this scenario)

  • Rails removing associated objects sequentially is a factor (considering that it also has to load those objects from the DB into the ORM before removing them, because it must if it wants to trigger the various callbacks)

  • Inline callbacks and HTTP calls are another factor

  • Rails is not smart at all here: it could decrease the number of calls to the DB (for example by batching the DELETE statements for all reactions into one using an IN clause)

  • Rails magic is sometimes annoying 😛
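The IN-clause point can be sketched with plain string building (no DB involved here):

```ruby
# N per-row DELETEs versus one batched DELETE with an IN clause:
ids = [101, 102, 103]

one_by_one = ids.map { |id| "DELETE FROM reactions WHERE id = #{id}" }
batched    = "DELETE FROM reactions WHERE id IN (#{ids.join(', ')})"

one_by_one.size # => 3 round trips to the DB
batched         # => "DELETE FROM reactions WHERE id IN (101, 102, 103)"
```

In ActiveRecord terms, `reactions.delete_all` already issues a single batched DELETE, but at the price of skipping the destroy callbacks, which this code base relies on.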

Possible solutions

This is where I stop for now, because I'm not familiar with the code base (well, after this, definitely more familiar :D) and because I think it could be an interesting "collective" exercise, since it's not a critical bug that needs to be fixed "yesterday".

At first, the simplest solution that pops into one's mind is to move everything that happens inline when an article is removed into an out-of-process call, delegating everything to a job picked up by the queue manager. The user just needs the article gone from their view, after all; the proper removal can happen in a worker process. Aside from the fact that I'm not sure I've considered everything that's going on (I found out about the tags by chance, as you saw) and all the implications, I think this is just a quick win. It would fix the user's problem by sweeping the reported issue under the rug.

Another possible solution is to split the removal in its two main parts: the caches need to be updated or emptied and the rows need to be removed from the DB. The caches can all be destroyed out of process so the user doesn't have to wait for Fastly or Algolia (maybe only Stream.io? I don't know). This requires a bit of refactoring because some of the code I talked about is also used by other parts of the app.
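In pseudocode, that split might look something like this (every name below is invented for illustration, not dev.to's actual API, and it assumes the HTTP-calling callbacks have been moved out of the model):

```ruby
# Pseudocode sketch of the two-phase removal (all names invented):
class ArticleDestroyer
  def self.call(article)
    payload = { id: article.id, path: article.path } # what the worker needs later
    article.destroy!                                 # DB work only: fast, user waits
    CacheCleanupJob.enqueue(payload)                 # Fastly/Algolia/Stream.io, in a worker
  end
end
```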

A more complete solution goes a step further than the second one and also cleans up all the leftovers (comments, notifications and buffer updates), though there might be a reason they are left there in the first place. All three entities can be removed in a separate job, because two out of three have before_destroy callbacks which trigger other stuff I haven't looked at.

This should definitely be enough for the user to never again encounter the pesky timeout error. To go the extra mile, we could also look into the fact that ActiveRecord issues a single DELETE for each object it removes from the database, but that's definitely too much for now. I'd note this down somewhere and come back to it after the refactoring, if needed.

Conclusions

If you are still with me, thank you. It took me quite a while to write this :-D

I don't have any mighty conclusions. I hope this deep dive into dev.to's source code served a purpose. For me it has been a great way to learn a bit more and write about something that non-Rails developers here might not know, but more importantly, to help potential contributors.

I'm definitely hoping for some feedback ;-)

Posted by rhymes (@rhymes), software developer @ Forem

Discussion


O/RM methods like destroy_all really need to be renamed destroy_slowly_and_painfully_use_sql_if_you_can, I swear. This isn't the first time I've seen something like this. Batching ids to delete is one option; if you can just run a script or prepared statement to delete reactions by the article/post id that's even better since the engine can use an index on the foreign key (although the difference may be negligible at the scales we're talking about currently). If the reaction deletion from users' feeds is in-band it probably doesn't need to be, and managing the cache of reactions to a dead article is obviously superfluous. No idea why it's saving other articles since there isn't any metrics aggregation but that's already out of band like you mentioned.

 

O/RM methods like destroy_all really need to be renamed destroy_slowly_and_painfully_use_sql_if_you_can, I swear.

Ahahah, yeah pretty much.

This isn't the first time I've seen something like this

Nope, definitely not. A lot of ORMs like AR encourage their users to surrender to all possible callbacks, and many of the gems used by Rails applications hook into those callbacks without the developer knowing, so after a while it's not uncommon to be hit by this issue.

Batching ids to delete is one option; if you can just run a script or prepared statement to delete reactions by the article/post id that's even better since the engine can use an index on the foreign key (although the difference may be negligible at the scales we're talking about currently)

Yeah, a post is never going to have millions of reactions, at least not for a while. By the way there's no foreign key between the two tables (articles and reactions) so no index to take advantage of.

 

You could add an index without the proper foreign key constraint, or vice versa, but without looking at the data model my guess is that having both is probably best in this scenario.

You can't easily add a foreign key between reactions and articles because of the "polymorphic" nature of this relation. A reaction can be attached to a comment, to an article, and other stuff I guess. So what Rails does is set up a pair of fields called reactable_id and reactable_type, like this:

PracticalDeveloper_development=# \d reactions
                                          Table "public.reactions"
     Column     |            Type             | Collation | Nullable |                Default
----------------+-----------------------------+-----------+----------+---------------------------------------
 id             | integer                     |           | not null | nextval('reactions_id_seq'::regclass)
 category       | character varying           |           |          |
 created_at     | timestamp without time zone |           | not null |
 points         | double precision            |           |          | 1.0
 reactable_id   | integer                     |           |          |
 reactable_type | character varying           |           |          |
 updated_at     | timestamp without time zone |           | not null |
 user_id        | integer                     |           |          |
Indexes:
    "reactions_pkey" PRIMARY KEY, btree (id)
    "index_reactions_on_category" btree (category)
    "index_reactions_on_reactable_id" btree (reactable_id)
    "index_reactions_on_reactable_type" btree (reactable_type)
    "index_reactions_on_user_id" btree (user_id)

On top of those it mounts what it calls polymorphic associations. The gist is that every select triggered by something like article.reactions becomes a SELECT * FROM reactions WHERE reactable_type = 'Article' AND reactable_id = 1234, and the same goes for the other queries

Gross. The site runs on Postgres though iirc, so you could establish a foreign key relationship between reactions and reactables as a parent table extended by articles, posts, and so on. But I don't know if ActiveRecord would play nicely with that kind of specialized structure.

No, you can't use PostgreSQL inheritance in Rails, not easily.

There's a concept of single table inheritance but it's all virtual. The various entities share the same table and are distinguished by a type column. You can find an explanation here: medium.com/@dcordz/single-table-in... - I've used it once, to map entities which differed only in name (sensors with a uniform API)

Another option is a complex junction table with reaction_id, article_id, post_id, and a CHECK constraint ensuring that only one reactable foreign key can have a value. But that does add enough complexity to at least give me pause.

The indexing in the current model could still be improved since reactable_id is unreliable on its own and reactable_type is low-cardinality. A single index on (reactable_type, reactable_id) would be much more useful (including for bulk delete!).
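In migration form, that composite index would be something along these lines (class name hypothetical):

```ruby
# Hypothetical migration adding the composite index discussed above:
class AddReactableIndexToReactions < ActiveRecord::Migration[5.2]
  def change
    add_index :reactions, %i[reactable_type reactable_id]
  end
end
```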

Yeah, I think the index on both keys is a good compromise

It might be nice longterm to merge reactables if they're similar enough. Articles are almost exactly posts without parents, after all. Ancillary information can go into a specialization table with an optional 1:1 relationship.


This is fabulous, I’ll have some thoughts to add color in the morning.

There are a couple main themes here for me:

  • what makes sense in one context does not in another, which is part of why ActiveRecord callbacks are not well-loved in general.
  • Certain “secondary” actions, like deleting articles, remain under-cared for amidst the focus on developing actions taken more frequently, and perhaps with more on the line.

This is exactly why we open sourced. Dives like this help make the code awesome.


Thanks Ben! Take your time, I know it's a lot :-)

what makes sense in one context does not in another, which is part of why ActiveRecord callbacks are not well-loved in general

Yeah, count me in. I tend to move the callback logic into explicit "service objects" or something like that, like your "labor" folder. It makes testing a little bit easier as well, I guess. These kinds of refactorings definitely require lots of digging, testing and code breaking, because you have to go counter to the Rails way of trusting implicit side effects and make them explicit. The gain is that the code usually becomes as side-effect free as possible, and more understandable for newcomers because it tends to be narrated in the controller:

def destroy
  CacheBuster.bust(article)
  SearchSeeker.byebye(article)
  RelationshipCounselor.drop_or_ghost_everyone_and_everything(article)
  article.destroy!
end

something like that :D


We currently do use this sort of pattern, and even give it its own folder app/services, but we only use it sometimes.

Right now I'd say we're sort of caught in between "hacked to get things to work" and "decent architecture" state (most code could probably be described that way).

The problem is probably that we've sort of gone half-way with some of these things.

The fact that we have the labor folder and the services folder (and the liquid_tags) and the models folders is indicative of some indecision, combined with not having taken the time for dedicated large-scale refactoring.

There are different schools of thought in terms of folder structure here (should it all go in models, with subfolders?)

If you have any opinions here, I'm all ears.

PS, we actually define our own timeout at about 9 seconds. RACK_TIMEOUT accepts a Rails-magicky ENV variable called RACK_TIMEOUT_SERVICE_TIMEOUT. We could extend this, but everything we do really should finish sooner, so it would be better to fix the code IMO.

The fact that we have the labor folder and the services folder (and the liquid_tags) and the models folders is indicative of some indecision, combined with not having taken the time for dedicated large-scale refactoring.

I think it's normal :) A way to refactor could be choosing one style and slowly converting the rest to it

There are different schools of thought in terms of folder structure here (should it all go in models, with subfolders?)

Dunno. I've never liked the "fat model" approach because it leads to giant files with a lot of different business logic. For example, where do you put a method that operates on a user and a comment? In user? In comment? 42?

One thing that I like is having "third party services" clients in the same folder, especially if they are used in multiple parts of the business logic.

I've also used presenters in the past to group the display logic and (some) query objects to hide complex queries and be able to test them in isolation.

These are a few articles on the topic of how to structure or refactor Rails code:

Another thing I would consider is making more explicit what is an asynchronous job. The code base right now uses delayed_job directly, with its various .delay, _without_delay and handle_asynchronously. Refactoring all of this into Active Job would make the jobs explicit and produce a nice app/jobs folder as a central repository for all the async stuff. This way you can test them in isolation if needed, and if in the future you outgrow delayed job, you know where to go to make the changes.

PS, we actually define our own timeout at about 9 seconds. RACK_TIMEOUT accepts a Rails-magicky ENV variable called RACK_TIMEOUT_SERVICE_TIMEOUT.

Ah! Good to know :) Well, it's entirely possible that the wall is hit at 9 seconds then.

We could extend this longer, but everything we do really should finish sooner, so it would be better to fix the code IMO.

Agreed

I know it's a lot, but the great thing about refactoring is that it can be (mostly) done one step a time :-)