You are in front of a long-lived codebase and you understand… Well, not so much.
You want the code to tell you its story so that you can add your own part to it. However, you may find a codebase, or some piece of it, that is a bit of a pandemonium, where information is hard to come by. You need to work out how certain concepts are represented and how certain processes are reflected. Maybe you try to read the documentation, but it may be outdated, redundant, useless, or non-existent.
You need to refactor for better comprehension before you start implementing a new feature or fixing that bug.
Refactoring for comprehension is the process of evolving the current code structure into one that is easier to understand: one that better reflects the state of business knowledge and is easier to work with.
In fact, we should consider code as an executable representation of the business or domain knowledge. Tests, on the other hand, are another representation of the same knowledge, built around the outcomes of the production code.
When we need to work with a messy piece of code, we will almost certainly need to introduce changes that improve our ability to understand what’s going on. There are three areas where we can intervene:
Production code itself
Tests covering that production code
Comments and documentation
Before we start talking about how to make code easier to understand, we should first learn about the “two hats metaphor”.
Kent Beck’s two hats metaphor helps us understand something very important about refactoring.
The basic idea is that you must not mix refactoring with changing the behavior of a piece of code.
First, wearing the refactor hat, you do the preparatory refactoring, making structural changes only. Then you commit that set of modifications as a whole. Once it is committed, you put on the change behavior hat, implement the behavioral change, and commit it. After that, you may need to put the refactor hat back on to tidy things up.
Beck describes preparatory refactoring as reorganizing the code so that the behavioral change becomes easy and safe to apply. In his words: “for each desired change, make the change easy (warning: this may be hard), then make the easy change.”
You can’t wear two hats at the same time… Well, you shouldn’t. So, we ask you to put on the refactor hat right now, because we are going to introduce some nice ideas about how to improve the storytelling abilities of the code.
The first tool to make code explain itself is to choose names wisely.
Naming is reputed to be one of the hardest things in computer science, along with cache invalidation, but you are not obliged to get it right on the first try. There are very good reasons to rename things, among others:
A concept represented in code by a name that never described it accurately.
A concept that has evolved, leaving its name obsolete.
A name that is too general or too narrow for the concept it describes.
We can apply a rename refactor in these situations. Let’s see some examples.
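Consider a variable whose name spells out the hour range that delimits a period of time, such as a hypothetical `$isCancelledBetween22And6`.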
Leaving aside other questions about the code, what is the problem with this name? As we can see, the name mentions the current specific time limits decided by the business.
So, if those limits change someday, we would end up with a name that no longer matches the values it checks, and we would have to rename the variable to keep it consistent with the business requirements.
We would need to do the same every time those limits change. However, it’s easy to forget to update the name at some point, and that will cause some anxious moments for future developers.
We can see that a more abstract concept is emerging: the idea that there is a period of time (overnight) that requires special treatment and is defined by certain time limits. Probably, when this feature was discussed with business people or users, someone asked something like “What if the appointment is cancelled overnight?”. As programmers, we need to pin that down with concrete time boundaries, but the concept itself isn’t tied to a precise hour range.
We could express it this way:
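A minimal PHP sketch of the idea, assuming a hypothetical `$cancellationHour` (0 to 23) and illustrative overnight limits of 22:00 to 06:00. Both names are shown side by side only to compare them:

```php
<?php
declare(strict_types=1);

// Hypothetical input: the hour (0-23) at which an appointment was cancelled.
// The 22:00-06:00 limits are illustrative business values.
$cancellationHour = 23;

// Before: the name spells out today's limits. If the business moves them
// to, say, 21:00-07:00, this name silently starts lying.
$isCancelledBetween22And6 = $cancellationHour >= 22 || $cancellationHour < 6;

// After: the name states the business concept; the hours are a detail
// that can change without forcing a rename.
$isCancelledOvernight = $cancellationHour >= 22 || $cancellationHour < 6;
```

The condition is identical in both lines. Only the name changes, and only the second one survives a change in the business limits.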
Now we have a more expressive name that doesn’t depend on implementation details. The concept of “overnight” is easier to understand than an arbitrary time interval, so its name doesn’t need to change often.
This very same idea can be applied to function and method names.
You can use the tools provided by your IDE, taking into account some safety measures.
You will have no problems when:
changing a variable name inside a method or function.
changing a private method name inside a class.
Most IDEs offer the rename refactor. In the end, it’s a find and replace, but IDEs work at the syntax-tree level, so they are usually more precise than we are at completing this refactor.
For example, in the JetBrains IDEs you select the variable or function you want to rename, choose Refactor, then Rename (Shift+F6), type the new name, and you’re done.
If you change a public function name, you may find that there are a lot of places affected, so we will use a more conservative approach.
Let’s see an example. Imagine that you have a method in a class that you want to rename because its current name doesn’t help much. Sure, the name is funny, but it is a bit misleading. However, the method is called from many places, so a bulk rename involves risks. Instead, proceed in small steps:
First of all, duplicate the function and rename the copy. All calls to the original name are preserved and now you have a method with the better name that has the very same behavior.
Now, delete the body of the old function and write a call to the new one instead. This way, you avoid the code duplication without hurting existing uses of the old name.
Progressively replace calls to the old method with calls to the new one whenever you come across them and they are related to your current task.
When there are no calls to the old method remaining, delete it.
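The steps above can be sketched in PHP as follows. The `ShoppingCart` class and both method names are hypothetical; the old method is kept as a thin delegating wrapper, marked with the phpDocumentor `@deprecated` tag, until no callers remain:

```php
<?php
declare(strict_types=1);

// Hypothetical class: the names are illustrative, not from a real codebase.
final class ShoppingCart
{
    /** @var float[] */
    private array $prices;

    /** @param float[] $prices */
    public function __construct(array $prices)
    {
        $this->prices = $prices;
    }

    // Step 1: a copy of the old method under the better name.
    public function calculateTotal(): float
    {
        return array_sum($this->prices);
    }

    // Step 2: the old method now only delegates, so existing callers keep
    // working. Step 4: delete it once no callers remain.
    /** @deprecated Use calculateTotal() instead. */
    public function crunchNumbers(): float
    {
        return $this->calculateTotal();
    }
}

$cart = new ShoppingCart([10.0, 5.5]);

// Step 3: callers migrate to the new name progressively.
$total = $cart->calculateTotal();
```

Because both names share one implementation, there is no duplicated logic while the migration is in progress.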
Alternatively, you can use the *extract method* refactor provided by the IDE: select the body of the method you want to rename and extract it under the new name.
There is a well-known code smell called magic numbers. It refers to literal values that appear in the code without any clue about what they mean.
Going back to the previous example about the overnight period, the limits of this period are represented by numbers.
Those numbers were decided by business people. From the code’s point of view, they are arbitrary values, and over time their meaning can be forgotten. So, it is a good idea to give them names.
The simplest way to do that is to convert these values into constants with a name:
There are other potential improvements for this line, but we will leave them for future articles about refactoring for better design.
This refactor can be applied to any arbitrary value in a code fragment that needs explanation. And there are further advantages:
If those values change, you don’t need to touch the code that uses them, which lowers the chance of breaking something by accident.
If they are used in several places, you have a single place to change them, which guarantees consistency.
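A minimal PHP sketch of the named-constants approach, assuming a hypothetical `CancellationPolicy` class and illustrative limits of 22:00 and 06:00:

```php
<?php
declare(strict_types=1);

// Hypothetical class and values: the overnight limits are business decisions,
// so they get names instead of appearing as bare numbers in the condition.
final class CancellationPolicy
{
    private const OVERNIGHT_STARTS_AT_HOUR = 22;
    private const OVERNIGHT_ENDS_AT_HOUR = 6;

    public function isOvernight(int $hour): bool
    {
        return $hour >= self::OVERNIGHT_STARTS_AT_HOUR
            || $hour < self::OVERNIGHT_ENDS_AT_HOUR;
    }
}

$policy = new CancellationPolicy();
```

If the business moves the limits, only the two constants change; every caller of `isOvernight()` stays untouched.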
Complex expressions should be broken into parts, not only to improve readability but also to avoid potential errors. Tangled expressions with many elements are a good place for bugs to hide, especially when we need to modify them.
We can use the *extract variable* refactor, which consists of replacing part of an expression with a variable. A good rule of thumb is to apply this refactor to parts wrapped in parentheses. Let’s see a typical example:
Ok. This looks like a very simple expression, but it should help us to understand the intent of the refactor and how to proceed.
The `$this->amount * vat` subexpression represents the concept of *tax* or *VAT amount*. That concept could be expressed as a variable:
Now, the expression is easier to read. Imagine the same applied to more complex calculations.
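A minimal PHP sketch of extract variable, assuming a hypothetical `Invoice` class with `$amount` and `$vatRate` properties. The before and after versions are kept side by side only for comparison:

```php
<?php
declare(strict_types=1);

// Hypothetical invoice: the property names and the 21% rate are assumptions.
final class Invoice
{
    private float $amount;
    private float $vatRate;

    public function __construct(float $amount, float $vatRate)
    {
        $this->amount = $amount;
        $this->vatRate = $vatRate;
    }

    // Before: the VAT concept is hidden inside the expression.
    public function totalBefore(): float
    {
        return $this->amount + ($this->amount * $this->vatRate);
    }

    // After: the bracketed subexpression becomes a named variable.
    public function total(): float
    {
        $vatAmount = $this->amount * $this->vatRate;

        return $this->amount + $vatAmount;
    }
}

$invoice = new Invoice(100.0, 0.21);
```

Both methods compute the same value; the second one simply names the concept.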
Sometimes, this kind of extraction reveals the need for a behavior, even a public one:
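Continuing the same hypothetical `Invoice`, here is a sketch where the VAT amount is promoted from a local variable to a public method, on the assumption that other code (a receipt or a tax report, say) needs it too:

```php
<?php
declare(strict_types=1);

// Hypothetical invoice: once extracted, the VAT amount is a behavior that
// other code may need, so it becomes a public method.
final class Invoice
{
    private float $amount;
    private float $vatRate;

    public function __construct(float $amount, float $vatRate)
    {
        $this->amount = $amount;
        $this->vatRate = $vatRate;
    }

    public function vatAmount(): float
    {
        return $this->amount * $this->vatRate;
    }

    public function total(): float
    {
        return $this->amount + $this->vatAmount();
    }
}

$invoice = new Invoice(200.0, 0.25);
```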
Extract variable helps us abstract concepts within the scope of a method or function. Extract method, on the other hand, is useful when we identify public or reusable behaviors.
Every conditional expression that is a combination of two or more single conditions is a good candidate to be encapsulated in a method with a name that provides a meaning.
Imagine that we have a pricing schema that allows us to offer different prices for different age ranges.
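A simplified, hypothetical stand-in for such a method in PHP. The `TicketPricing` class, the age limits (12 and 65), and the prices are all invented for this sketch:

```php
<?php
declare(strict_types=1);

// Hypothetical pricing rules: the age limits and prices are invented.
final class TicketPricing
{
    public function priceFor(int $age): float
    {
        if ($age >= 65) {
            return 8.0;
        }

        // The reader has to decode this range to realize it means "adult".
        if ($age >= 12 && $age < 65) {
            return 12.0;
        }

        return 6.0;
    }
}

$pricing = new TicketPricing();
```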
The conditions in this method pick the price for each age range. Let’s take a look at line 12, whose condition expresses the age interval for adults. We could extract it to a method that states that idea explicitly:
Even single conditions may need to be extracted if they are not expressive enough or are hard to understand. We can do the same for line 17:
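Applied to a hypothetical `TicketPricing` class like the one sketched for the pricing schema, both extractions could look like this in PHP. The compound range becomes `isAdult()` and the single comparison becomes `isSenior()`; the names and limits are assumptions:

```php
<?php
declare(strict_types=1);

// Hypothetical pricing rules: the age limits and prices are invented.
final class TicketPricing
{
    public function priceFor(int $age): float
    {
        if ($this->isSenior($age)) {
            return 8.0;
        }

        if ($this->isAdult($age)) {
            return 12.0;
        }

        return 6.0;
    }

    // A single condition, extracted because "senior" says more than ">= 65".
    private function isSenior(int $age): bool
    {
        return $age >= 65;
    }

    // The compound condition, extracted under a name that states the intent.
    private function isAdult(int $age): bool
    {
        return $age >= 12 && $age < 65;
    }
}

$pricing = new TicketPricing();
```

The behavior is unchanged; the age limits could additionally become named constants, as discussed for magic numbers.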
This is especially true when the condition is negated, because this kind of expression is more difficult to process when reading.
Not to mention double negations. In that case, introduce a new method in the class whose state is being checked, or a function that doesn’t need to be negated.
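To illustrate, here is a minimal PHP sketch with a hypothetical `Customer` class: a negatively named predicate such as `isNotSubscribed()` forces a double negation at the call site, while a positive `isSubscribed()` reads naturally:

```php
<?php
declare(strict_types=1);

// Hypothetical customer class; both predicates are shown for comparison.
final class Customer
{
    private bool $hasActiveSubscription;

    public function __construct(bool $hasActiveSubscription)
    {
        $this->hasActiveSubscription = $hasActiveSubscription;
    }

    // Before: a negative name; callers end up writing !isNotSubscribed().
    public function isNotSubscribed(): bool
    {
        return !$this->hasActiveSubscription;
    }

    // After: a positive name that reads well, and still reads well negated.
    public function isSubscribed(): bool
    {
        return $this->hasActiveSubscription;
    }
}

$customer = new Customer(true);
$canDownloadBefore = !$customer->isNotSubscribed(); // double negation
$canDownloadAfter = $customer->isSubscribed();       // reads as intended
```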
Introducing new methods is cheap: nothing else uses them yet and you are not removing existing code, so you can adopt the new code progressively.
By the way, boolean properties of this kind are usually best represented by positive names, which remain easy to read even when negated. Nevertheless, pay attention to the business meaning of the property when choosing the best name.
We will come back to conditional expressions in future articles about moving knowledge to the right place.
Conditionals can be tricky. And sometimes, in subtle ways. Take a look at this code, for example. It works perfectly, but can you spot the problem?
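As a hypothetical illustration of this kind of code, consider a `Notifier` whose `notify()` method does its real work only inside an `if`. The class, the empty-email check, and the public `$sent` array (public only to keep the sketch short) are assumptions:

```php
<?php
declare(strict_types=1);

// Hypothetical notifier: the main job (recording a notification) is nested
// inside the condition, so the body hides what the method name promises.
final class Notifier
{
    /** @var string[] */
    public array $sent = [];

    public function notify(string $email, string $message): void
    {
        if ($email !== '') {
            $line = sprintf('%s: %s', $email, $message);
            $this->sent[] = $line;
        }
    }
}

$notifier = new Notifier();
$notifier->notify('ana@example.com', 'Your appointment was cancelled');
$notifier->notify('', 'Ignored');
```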
The problem is that the main responsibility of this method is under a condition. That makes the method name and body contradict each other in a way.
As a general rule, it is better to check preconditions first and fail fast if they are not met. Or, as in this example, to return early without doing anything.
This kind of conditional is called a guard clause, and its purpose is to prevent invalid data from reaching the main processing. Guard clauses ensure that preconditions are met before the method proceeds with its job. If we prefer to fail, we can use assertions or throw an exception.
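A minimal PHP sketch of both options, using a hypothetical `Notifier` class: `notify()` returns early, while `notifyOrFail()` fails fast with PHP's built-in `InvalidArgumentException`. The class and method names are assumptions:

```php
<?php
declare(strict_types=1);

// Hypothetical notifier ($sent is public only to keep the sketch short).
final class Notifier
{
    /** @var string[] */
    public array $sent = [];

    public function notify(string $email, string $message): void
    {
        // Guard clause: return early, doing nothing, if the precondition fails.
        if ($email === '') {
            return;
        }

        $this->sent[] = sprintf('%s: %s', $email, $message);
    }

    public function notifyOrFail(string $email, string $message): void
    {
        // Fail-fast guard: reject invalid input instead of ignoring it.
        if ($email === '') {
            throw new InvalidArgumentException('An email address is required.');
        }

        $this->sent[] = sprintf('%s: %s', $email, $message);
    }
}

$notifier = new Notifier();
$notifier->notify('ana@example.com', 'Your appointment was cancelled');
$notifier->notify('', 'Ignored');
```

The main responsibility now sits at the top indentation level, matching what the method name promises.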
Magic values are not the only things that need names. Many parts of the code benefit from good, intention-revealing names.
In general, every cohesive code block could be extracted to a private method with an expressive name. We will return to the extract method refactor in a future post about long methods and classes; for now, we will focus on some opportunities where extract method contributes to better understandability.
Loop bodies. It is good practice to separate iteration from action in loops. To do this, extract the body of the loop into its own method. You can see an example here:
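A minimal PHP sketch of a loop whose body has been extracted, assuming a hypothetical `ReminderSender` that records a reminder per patient instead of really sending it:

```php
<?php
declare(strict_types=1);

// Hypothetical sender: the loop only iterates; the per-item action lives in
// its own, well-named private method.
final class ReminderSender
{
    /** @var string[] */
    private array $outbox = [];

    /**
     * @param string[] $patients
     * @return string[]
     */
    public function sendReminders(array $patients): array
    {
        foreach ($patients as $patient) {
            $this->sendReminderTo($patient);
        }

        return $this->outbox;
    }

    private function sendReminderTo(string $patient): void
    {
        $this->outbox[] = sprintf('Reminder sent to %s', $patient);
    }
}

$sent = (new ReminderSender())->sendReminders(['Ana', 'Luis']);
```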
Of course, in this example you could go functional, but bear in mind that this approach can be much harder to understand at first sight:
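For comparison, a functional version of the same hypothetical reminder loop, using PHP's `array_map()` with an arrow function (PHP 7.4 or later). It is compact, but the reader has to decode the closure to see what happens to each patient:

```php
<?php
declare(strict_types=1);

// Hypothetical data; each patient name is mapped to a reminder line.
$patients = ['Ana', 'Luis'];

$sent = array_map(
    static fn (string $patient): string => sprintf('Reminder sent to %s', $patient),
    $patients
);
```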
Conditional legs. This extraction can also be applied to the legs of conditionals, making them easier to understand at a higher level. You simply have to extract the legs into private methods. Let’s consider this example:
We can extract the body of the true branch to its own method.
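A minimal PHP sketch of the result, assuming a hypothetical `RefundCalculator` whose overnight branch keeps a 50% fee (an invented rule). Each leg delegates to a private method, so the conditional can be scanned at a glance:

```php
<?php
declare(strict_types=1);

// Hypothetical refund calculation; the 50% rule is invented for the sketch.
final class RefundCalculator
{
    public function refundFor(float $price, bool $cancelledOvernight): float
    {
        if ($cancelledOvernight) {
            return $this->partialRefund($price);
        }

        return $this->fullRefund($price);
    }

    // The details of the true branch are hidden behind an expressive name.
    private function partialRefund(float $price): float
    {
        return $price * 0.5;
    }

    private function fullRefund(float $price): float
    {
        return $price;
    }
}

$calculator = new RefundCalculator();
```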
Now, we are hiding the details in the extracted method and can scan the conditional structure faster, digging deeper if we need.
This also paves the way for applying further refactors. If you look at the code you will see that this part is pretty strange and it will need more work to get a better organization.
Extracting the branches of conditionals to methods reduces the indentation level and makes it easier to tidy things.
In this post you have seen some techniques that can help your code become a better storyteller, for you and for future developers.
In new posts we will talk about other areas of improvement. For example, how to deal with long classes and methods. Those long code blocks usually mix responsibilities and are difficult to manage and test.
Also, we will address the problem of moving knowledge and responsibilities to their proper places, applying some well-known design principles.
Keep an eye on our blog for this series and other interesting posts.