Jeremy Friesen

Posted on Jul 19, 2023 • Originally published at takeonrules.com on Jul 15, 2023

Refactoring to Understand a Code-Base

#procedures #programming #todayilearned

In 2022 I started writing this. At the time, I was working at Forem.

I want to walk through my sometimes scatter shot approach for understanding a code-base. I’m going to focus on Rails as that’s where I spend most of my professional coding energy. This supplements reading any documentation and talking with team mates. I worked in a globally distributed team; it required considerable calendar coordinating effort to secure synchronous time for conversations.

Strategies

As part of orienting to a code-base, either by fixing bugs or adding features, I like to explore the code-base. And in order for me to fix a bug or add a feature, I want to get an overview of the entire code-base and then dive into specifics.

These strategies are about exploring.

Generate the Inline Documentation

I favor writing documentation in YARD. One reason is that Solargraph, a Language Server, can leverage that documentation to improve your LSP experience.

I like to run yard on the project and open the generated doc/index.html in my browser. If I’m working on a model, I’ll navigate to that model and see what documentation we have.

Comments on Comments

Writing code is about communicating to two audiences: humans and computers. All of this advice is to help humans approach your code.

I was once of the camp that thought code should not have comments; which is not very nuanced.

Let’s break that down a bit:

Code should have inline documentation.
Code should have expressive names for symbols.

What is inline documentation? It is a type of comment that can be picked up by documentation generation tools. It is typically a comment block found before a method, class, or module definition.

What is expressive names for symbols? The names of the methods, variables, constants, modules, and classes should provide insight into their purpose. So help me if I see single letter variable names. Take time to write more robust names.
More robust names are easier to search and replace across the entire code-base. Perhaps not as vital if you have a compiler that can perform this refactoring. But still, searching your code for words is a better experience than searching for single letters.

What about other comments? I don’t think comments within methods should say what the code does. The code should say that. Instead they should say why it does this.

If you find yourself writing a comment explaining what a chunk of code does, consider extracting a method; naming the method to give it meaning and documenting it’s purpose.

Looking at the Focal Model

A former team member of mine commented that in their experience a Rails application has two models that grow faster than any other models:

The User model.
The focus of the application.

In the case of DEV Community and the Forem codebase, that’s the Article model.

With the two important models in hand, I start exploring places where I find Article. or User. declarations (e.g., code that’s likely “finding” things from the database).

The following command gives me a quick and scannable insight into how we’re using Article: rg "Article\." -A 2 app/controllers.

Consolidating Scopes

I also find that in app/controllers I want to see how we’re using ActiveRecord::Base.where; refactoring those where calls into scopes on the User model might provide further insight into the User domain.

Scopes are a great way-station on a possible pathway to extracting query objects.

Parameterizing Magic Variables

When I look at methods and see something like something > 5.days.ago, I’m curious what that 5 represents. I try to name it. And if this is part of a method call

In a completely arbitrary example let’s say I have the following:

def accept_comment_from(user:)
  return true if user.created_at > 5.days.ago

  false
end

As I asked, what’s the significance of 5? In this case, I’m seeing it as number_of_days_we_consider_a_user_to_be_new.

def accept_comment_from(user:, number_of_days_we_consider_a_user_to_be_new: 5)
  return true if user.created_at > number_of_days_we_consider_a_user_to_be_new.days.ago

  false
end

That’s a bit more descriptive and hints at a possible property on the user (e.g. user#is_newly_registered?). I’ll check the User model for the method or other places. I might also look for 5.days.ago in the code-base. It’s rare in a code base that we would only ever ask this kind of question once.
Ideally we’d ask the question once, after all we want to follow DRY principles. But doing so requires considerable vigilance.

Also, in the above method, by providing a named parameter we can more easily write tests. Yet we shouldn’t stop there.

With that refactor, I’d look to promote that named parameter to a constant and even move the method to the User.

class User
  # Warning, don’t set this as 5.days.ago as that will freeze that
  # point in time relative to when we boot the application.
  NUMBER_OF_DAYS_WE_CONSIDER_A_USER_TO_BE_NEW = 5

  def consider_as_new_user?
    created_at > NUMBER_OF_DAYS_WE_CONSIDER_A_USER_TO_BE_NEW.days.ago
  end
end

def accept_comment_from(user:)
  user.consider_as_new_user?
end

Code Analysis

Leverage tools likeRubocop and Flay. Rubocop can help find complex code. And Flay can help find structural duplication.

These areas may warrant review to see if there’s common logic to extract into a class or service. And organizing things is hard. Where do you put these things? Documentation and directory structure can help. As can having a conversation with your team.

As you work on these things you may be on the pathway of teasing out a new concept. In my experience taking time to play with and name that concept pays tremendous dividends as it is often a pathway to a new set of features or flexibility.

Naming Things is Hard

Recently I’ve been working on the Derivative Rodeo. We spent time naming the DerivativeRodeo::StorageLocations, and documenting that concept in the BaseLocation. We also spent time naming the DerivativeRodeo::Generators and documenting that concept in the BaseGenerator. We help orient to those high-level concepts in the Daerivative Rodeo repository’s README.

In other words, we shared the discoveries we made while of delving into the problem space and naming and organizing things.

Profile Examples

I often use RSpec for testing. One feature that I like to turn on is profile_examples. I set that value to 5 or 10. As part of running the tests, RSpec will report the 5 (or 10) slowest tests.

The slow tests are often feature tests. However, when I run a subset of tests, I’ll see the slowest of that subset. I’ve used this information to get curious about implementations.

For example, why is the slowest test ten times slower than the second slowest test? Look to that test and there’s likely a useful refactoring to consider.

Perhaps the setup is arduous; if so how could you break it apart? Perhaps it’s one test for a single method; how could you refactor that method to break apart the slow test?

Predicate Methods on User

Oh those pernicious little User#admin? and User#editor? methods. They invariably indicate leaky authorization logic. Look in the views for if current_user.editor? declarations; . There is an implicit authorization layer that’s being encoded and entwined in the view logic.

A quick rg "user\.\w+\?" app/views --only-matching --no-filename | wc tells me that there are about 40 instances in the Forem codebase where I need to look and see what those questions of the user are about. That’s a place for refactoring and review.

Knowing that these predicate methods exist can give insight into how cautious you should be when working through authorization issues.

Look to History

The code-base is assuredly in version control. Run a meta-analysis on the repository. The git-extras repository provides the git effort command:

Displays “effort” statistics, currently just the number of commits per file, showing highlighting where the most activity is. The “active days” column is the total number of days which contributed modifications to this file.

This output can show you hot-spots in code. Code that sees a lot of change is likely a place from which concepts are emerging. Or bugs are occurring. Or disagreements are happening.

They are interesting because people are spending time interacting with them. Look to those interactions. Read the commit summaries of those files. See what changed with those commits.

Write Documentation

All of this orienteering will give you insights into the code. Take time to write that up as documentation or inline comments. And share those documentation-only changes. Have team members review that documentation; hopefully you’ll both learn something and begin to have more meaningful conversations about the code-base.

Consider if you put something forward and it’s missing a core concept. Your team member can help share that context.

Conclusion

These are a few of my approaches; I don’t always use them. However each strategy provides an actionable path to start moving on a task. For myself, moving within the code is the best way for me to orient.

DEV Community