Jump ahead to the refactors
When I get too proud of the industry I've stumbled into, I like to stop and reflect on the fact that software is likely at the heart of a massive displacement crisis in this part of the country; our chief achievements in the past decade haven't been all that remarkable in hindsight; much of software is a ridiculously fraught house of cards we're building around ourselves; and our big drive now is to create money out of thin air (and, you know, vast, unconscionable amounts of the planet's energy supply).
After that grounding exercise, I reflect on what attracted me to programming in the first place: The puzzle-solving it asks of you, the freeform nature and critical thinking involved in finding a solution, the myriad ways it overlaps with rhetoric and writing, and how if you make it your profession, as John put it, you should
[a]bandon the idea that you are ever going to finish.
— Steinbeck: A Life in Letters
One can almost always find a better way to write a paragraph, just as one can almost always find a better way to write a block of code. Part of this comes from the fact that "better" is a subjective term, but the other half of the equation is that putting thoughts into words is a rough art, and our language isn't necessarily as precise – certainly nowhere near as acute – as the feeling we try to convey with it. Such is also the way of processes and workflows; if you've ever felt the joy of programming in Turtle (I haven't), you will understand the paradox of trying to succinctly yet clearly convey your intentions to a well-meaning but all-too-straightforward creature. Conveying intentions to another human being through the spoken word is far more perilous, outside of the scope of this post, and best regarded by the reader as impossible.
But edits to both prose and program are, indeed, achievable, and to borrow another quote,
To write is human, to edit is divine.
— Stephen King, On Writing: A Memoir of the Craft
Much is said about technical debt, and "the business value" it provides. Conventional wisdom says that "if it isn't a feature that the user sees, it provides no benefit to them, and is not worth the investment." By this rationale, we as developers ought to abandon all private
methods, 95-98% of our server logic, 100% of our test cases – who's afraid of feature regressions, anyway? – support and QA teams… essentially everything that isn't your business's marketing team, sales team, and anything that isn't eventually rendered within the client interface (also, keep the metrics. Executives. Like. Metrics. Charts are their gods.)
But the truth is that technical debt inevitably manifests in effects that the user will see, and chasing after the business value ignores this fact in pursuit of delivering features, which makes actually delivering features grow in complexity on a nonlinear scale. The more time your team spends figuring out just what your code is trying to do, the less time your team has to build new features on top of that code, and worse, the more likely they are to make mistakes as they build those new features. Those mistakes may cause new features to function improperly, existing features to stop functioning properly, or/and features yet-to-be developed that much more difficult to build. Engaging with technical debt, and editing your code to be more clear, concise, and understandable, is vital to maintaining consistent feature turnaround. It might sound weird, but keeping your code organized will also make working in your codebase a much more enjoyable experience.
So let's practice being divine, and refactor some code together.
When should I refactor?
My rule of thumb for deciding when to refactor boils down to about four key questions:
- Does this block have more than two levels of indentation than the rest of the code in the file?
- Is there more than three branching conditions, either as a chain of
if…else if… else if… else
s orswitch/case/cond
expressions? - Is this block of code working to modify more than one object or model?
- Do I need to reread the block, even once, in order to understand what it's doing?
If I answer "yes" to any of these questions, I make either a todo or a ticket in my project management tool, to remind myself come back to the code and clean it up when I've finished the feature I'm working on. If I answer "yes" to more than one, I refactor the code as I'm working on the feature. Some of you might balk at that, but trust me, you don't want to get caught up in a feature-drive, and realize after a month or more that you haven't got a clue what the thing you wrote is supposed to actually be doing.
Let's break these down with some examples.
1. More than two levels of internal indentation
In nearly every language these days, your indentation level denotes that work is being done within some kind of subprocess, whether that's a class declaration, a method definition, an if-else
case, a switch
-like, a loop, or a block, lambda, or internal anonymous function. For instance, here's a sample block of code modeled after something I came across while I was working at an enterprise company, on a project written in Python:
items_with_classes = []
for item in items:
for tag in item.tags:
if tag.type in current_page.selected_types:
items_with_classes.push({tag, tag.type})
This isn't the exact code, of course. Partly because I'm pretty sure my NDA forbids me from "taking" any code I've seen at that job and using it elsewhere, and partly because it was over a year and a half ago that I encountered this. But this is the same basic structure of that block, and the actual behavior of it was nested three levels deep within the function I found it in. In essentially any language you work with, there will be constructs for handling this situation more elegantly. In most cases, you can extract the behavior of each loop to its own separate function, but Python has an even more elegant way of creating a new list out of extracted data from lists: A comprehension.
items_with_classes = [{tag, tag.type} for item in items for tag in item.tags if tag.type in current_page.selected_types]
Three levels of nested logic just became a single line of code. And, you are still free to break the comprehension out across more than one line, if you don't find this very readable.
items_with_classes = [{tag, tag.type}
for item in items
for tag in item.tags
if tag.type in current_page.selected_types]
In either situation, the intent of the items_with_classes
list is now front-loaded with the information about what each item is going to use as its class, and it was done by leveraging one of the more interesting and expressive features of the language. (Plus, it's still one less line of code 😛)
2. More than three cases of branching logic
Applications become increasingly complex as they grow to handle more variations of data, and this becomes a point of developer overhead once the data variations have to be codified and handled in their own specific ways, and the rate at which that overhead grows will depend on how capable your language is at abstractions and dynamic retrieval and operation.
When you're building a front-end using React and the Flux paradigm, it's typical to have a state
object at the root of your application, which is passed down through the component tree to the rendered nodes, and which handles storing all the data your app will use to determine what to render, how it's rendered, and when to render it. This example is based on an internal tool I worked on at the same company as the previous example. I strongly suspect the engineer who wrote it came from a Java background… not to say anything negative about Java, it just felt like this pattern adhered to its strict constraints around getters and setters, as opposed to… well, any other language, besides maybe C.
function updateSearchStateAttribute(state, attrName, newValue) {
var searchAttributes = state.searchAttributes;
if (attrName === "inputField") {
searchAttributes.inputField = newValue;
} else if (attrName === "searchResults") {
searchAttributes.searchResults = newValue;
} else if (attrName === "mediaType") {
searchAttributes.mediaType = newValue;
} else if (/*…*/) {
/* four cases later... */
} else {
throw new Error("attribute " + attrName + " does not exist");
}
state.searchAttributes = searchAttributes;
return state;
}
I had no words. And the application had Babel and Webpack. And I still have no words, but I wound up refactoring about two dozen functions that mimicked this pattern, with varying numbers of else if
cases, down to this:
function updateStateAttribute(state, sectionName, attrName, value) {
const section = state[sectionName];
if (!section) {
throw new Error(`${sectionName} is not part of the state object.`);
} else if (!section.hasOwnProperty(attrName)) {
throw new Error(`${attrName} is not part of ${sectionName} state.`);
}
section[attrName] = value;
return {
...state,
section
}
}
Again, the language constructs offered an elegant means of removing mental overhead, and this time, removing such a substantial amount of conditional logic not only meant the amount of mental overhead for reading the code improved, but that writing new code required less mental overhead, which substantially improved feature turnaround on this application.
3. More than one kind of entity being handled in one block
There can sometimes be exceptions to this point, since models will eventually have to interact with each other to meaningfully relate in the context of what the user is doing, but as a general rule of thumb, if you find one method taking on more than one responsibility, you should consider if there's some way you could refactor it.
This example is one of my own, and although it has no state-machine components itself, it's inspired by a pattern I refactored within one company's custom-built state machine. Let's suppose I'm building a class to codify my own personal workflows, and I'm implementing the method for how I make_pets_happy
…
class Me < Person
# def initialize ... end
def make_pets_happy(pets = self.pets) # mebbe I'm takin' care of other ppls pets idk
pets.each do |pet|
water = Water.new # if only creating water were this easy in real life
case pet.class.to_s
when "Dog"
food = DogFood.new # always dogfood your code, if you can
bed = pet.bed.good_condition? ? pet.bed : DogBed.new
toy = BALLBALLBALLBALLBALL.new # OMG A BALL
rawhide = Rawhide.new
pet.feed(food)
pet.play(toy)
pet.fill_water(water)
pet.reward(rawhide) if pet.goodgirl? # always returns `true` for `Dog`
# aliased to `#goodboy?` on `Pet` and `#gooddog?` on `Dog`
pet.sleep(bed) if pet.tired?
when "Cat"
food = FancyFeast.new
bed = pet.caretaker.bed
self.bed = CatBed.new # I... I guess I can sleep here...
toy = Catnip.new
world = $world # grab global `world`
pet.feed(food)
pet.play(toy)
pet.fill_water(water)
pet.reward(world) if pet.goodgirl? # returns `true` while cat thinks it's actually *your* owner.
pet.sleep(bed) if pet.tired?
when "Bird"
food = BirdSeed.new
bed = nil # birds sleep standing up, nerd.
toy = WhateverBirdsPlayWith.new # I dunno
pet.feed(food)
pet.play(toy)
pet.fill_water(water)
pet.sleep(bed) if pet.tired?
end
end
end
end
Now, it probably just looks like a really big method to the untrained eye, but in reality, this is an unmitigated disaster of maintainability. Suppose there needed to be an update to this behavior tree, and I suddenly had to worry about my budget
, which were defined as budget = self.wallet.funds
. The code is all imperative, hard-coded line by line, so in order to take the new budget
variable into account, I would need to make sure I found every line initializing a new object, and either add another line beneath it to take budget -= whatever_object_was_initialized.price
, or pass budget
as an argument to ::new
. If I missed an invocation, I would either be miscounting the amount of money I'm spending each time I initialized that object, or getting charged with a misdemeanor for taking something from the store without paying for it (aliased to #shoplifting
).
Moreover, what if I start taking care of Hamster
s, Lemur
s, or Cuttlefish
? I would have to add another block of code to this case expression, making the entire coditional that much more brittle in the process. The way to rein in this chaos is something most computer science courses don't teach until near the end of the semester: Polymorphism.
def make_pets_happy(pets = self.pets)
pets.each { |pet| pet.fulfill(self.wallet.funds) }
end
That's all this method needs to be. One line where I pass my budget
constraint in to a method, defined on each class of pet that I might take care of, where the individual branches of the case
up above is contained and abstracted away from Me
. Me
need not worry about any of that. Me
should only worry about Me
. … … … Not … not "me" me, I should worry about the world around me… You get what I mean, don't pretend that you don't.1
4. "Wait, what did I just read?"
I'll make this quick, because this post is starting to get pretty long… The last common refactor I worry about is how easy it is to parse what an expression or statement is actually doing. I don't know about you, but long Boolean expressions and inverted Boolean logic are two of the most painful banes of my existence as a software developer.
Let's imagine a #valid?
method on some ActiveRecord model:
def valid?
state.in?(['active', 'pending', 'finished', 'drafted', 'published', 'deferred', 'unrequited'])
&& user_id.present? && user_id < 5000
|| state.in?(['finished', 'drafted', 'published'])
&& data['status'].between?(200, 299)
end
If I stared at this long enough, I could probably surmise that it was some kind of HTTP request, and #valid?
was trying to make sure it was either being performed by a specific user (likely an admin), and in a recognized state, or that it was a request being made by a normal user and in a complete-like state, and that the request status came back within the HTTP success range of statuses. That's all stuff I could codify as discrete methods that the resolved Boolean of parts of this expression. I should also codify the literals in here as constants on their respective classes
COMPLETED_STATES = %w[finished drafted published]
RECOGNIZED_STATES = COMPLETED_STATES + %w[active pending deferred unrequited]
def valid?
admin_request_in_recognized_state?
|| successful_complete_request?
end
def admin_request_in_recognized_state?
@state.in?(RECOGNIZED_STATES)
&& @user_id.in?(User::ADMIN_IDS)
end
def successful_complete_request?
@state.in?(COMPLETED_STATES) && JSON.parse(@data)['status'].success?
end
Conclusion, finally.
Refactoring is more than just about code clarity and readability. At WWDC 2018, Dave Abrahams gave a talk about how abstracting away from loops and toward algorithms improved not only code readability, but code performance as well (if the video doesn't play, it might only be playable in Safari... let me know if that's the case, and I'll do everything I can to get it into a more universal codec). You can't work toward faster coding implementation patterns, until you begin thinking in terms of abstracted coding patterns, and in order to abstract most of the code in your codebase, you have to begin thinking in terms of how to refactor your code.
To code is human; to refactor? Well... you know the rest. 😉
Thanks for taking the time to read all of this!
-
but if you don't get what I mean then please ask in the comments and I'll be more than happy to describe the logic behind
Me
. ↩
Top comments (0)