Adam Nathaniel Davis

Posted on Feb 24, 2020 • Edited on Jul 29, 2020

Tearing Down a Function (and Building It Back Up) - Part 1

#refactoring #javascript #tutorial #codequality

The refactoring subject can get bogged down in a lot of generic, abstract concepts that are challenging to translate into real-life coding scenarios. So rather than just hit you with a bunch of slogans and bromides about "you should always do this" or "you should never do that", I think it's more effective to just walk you through examples of how I typically refactor my own code - or the code of projects on which I'm assigned.

Baseline Assumptions

Sometimes "refactoring" is used to refer to the wholesale rewriting of an entire codebase. The task can be daunting if the proposed refactoring applies to code that was written by someone else. Even worse, the refactoring could be targeted at code that, ostensibly, already works.

We've all seen it. Code that, technically, does what it's supposed to do. But man... does it look ugly. On one hand, there's a strong desire to get in there and clean that shit up. On the other hand, there's an equally-strong (and healthy) fear that your attempts to make the codebase "cleaner" will somehow introduce bugs into an application that may not have been pretty, but was otherwise working.

So first, let's set aside the idea of a wholesale, application-wide refactoring. Not only would that go well beyond the scope of any given blog post, but in my experience, massive refactoring projects are a lot like Bigfoot and the chupacabra. We've all heard rumors of them - but we've rarely (if ever) actually seen them.

Most companies don't want to pay you to rewrite an entire codebase that already "works". And most developers don't want to be bogged down in days/weeks/months of simply rethinking every portion of an existing application - even if everyone agrees that the code for that application is downright fugly.

The techniques I'd like to cover are more targeted to single functions. In some cases, they're targeted to single lines of code. They're the kind of refactorings that you can do anywhere, anytime, without any serious fear of disrupting the fragile beast that is Your Scary Legacy Codebase.

Second, let's put aside the idea of diving deeply into business logic. There are times when existing business logic must/should be rewritten. But there's a ton of "low-hanging fruit" that we can target to make almost any code snippet cleaner and easier to read. And we can do all of it with rock-solid certainty that we're not tinkering with any legacy logic.

So let's dive right into a practical example:

getThumbnail = post => {
   let thumbnail = <div style={{height: 108, width: 67}}> </div>;
   if (post.preview && post.preview.images) {
      const images = post.preview.images[0];
      if (images.resolutions) {
         const smallestThumbnail = images.resolutions[0];
         if (smallestThumbnail.width === 108) {
            const url = smallestThumbnail.url.replace(/&amp;/g, '&');
            thumbnail = (
               <div style={{height: 108, width: 67}}>
                  <img src={url} alt={'thumbnail'}/>
               </div>
            );
         }
      }
   }
   return thumbnail;
};

Here we have a not-terribly complex React function that's designed to return a <div> that contains a thumbnail image. I wrote it myself for a perfectly-functional application. And the function... works. There's nothing necessarily "wrong" with it.

But is it... optimal?? Not really. I'm in-no-way ashamed of it. But it could use some TLC. It could be easier to read. It could be more self-explanatory. So let's see what we can do to polish it up a bit.

The first thing that catches my eye is the use of some magic numbers. Specifically, we see the numbers 108 and 67 repeated several times throughout the function.

Why are those specific numbers important? The answer is that the posts that we're sorting through are from Reddit. Every post has the potential to have thumbnail images associated with it, and these thumbnails can come in a variety of sizes.

The smallest-possible thumbnails that Reddit offers have a height of 108 and a width of 67. So we want to inspect the post and see whether one of these smallest-possible thumbnails is associated to it. If it is, we will return a <div> that contains an <img> for the thumbnail. If one of these smallest-possible thumbnails is not available, then we just return an empty <div> that's set to the height-and-width of the nonexistent thumbnail.

The "problem" with those magic numbers is that you can't really expect another dev to understand their significance merely by reading through this code. In the example, the significance of 108 and 67 can only be gleaned through a reliance on the dev team's "tribal knowledge".

Magic numbers (and their ugly siblings: magic strings) are generally considered code smells. They may not be wrong, per se. But their presence makes the code harder to follow and can potentially mask bugs.

For example, in the above snippet, what if some dev decides that 100 and 50 are much "nicer" round numbers. They may think, "What's it hurt if we tweak the dimensions on this thumbnail?" Their folly would be encouraged because there's nothing in the function to highlight the numbers' significance. But if they change those values, the code will no longer properly match upon the expected results that are returned from the Reddit API.

Not all numbers should be shouted down as magic numbers. There is a time-and-place to just chunk those numbers right into your code. For example:

render = () => <div style={{padding: 20}}>foo</div>;

In this little snippet, there's nothing "magical" about {{padding: 20}} because it's fairly obvious that 20 has no context. In other words, there's nothing terribly special about the number 20 in this example. It's safe to assume that we've used 20 for the mere fact that we wanted 20 pixels of padding in this particular <div>. This should not be targeted for refactoring as a magic number.

But when you have hyper-specific numbers - like 108 or 67 - it begs the question, "What is so special about those specific numbers?" Luckily, we can easily address such a question with a little syntax-fu:

getThumbnail = post => {
   const smallestPossibleRedditThumbnail = {
      height: 108,
      width: 67,
   };
   let thumbnail = (
      <div style={{
         height: smallestPossibleRedditThumbnail.height,
         width: smallestPossibleRedditThumbnail.width,
      }}> </div>
   );
   if (post.preview && post.preview.images) {
      const images = post.preview.images[0];
      if (images.resolutions) {
         const smallestThumbnail = images.resolutions[0];
         if (smallestThumbnail.width === smallestPossibleRedditThumbnail.width) {
            const url = smallestThumbnail.url.replace(/&amp;/g, '&');
            thumbnail = (
               <div style={{
                  height: smallestPossibleRedditThumbnail.height,
                  width: smallestPossibleRedditThumbnail.width,
               }}>
                  <img src={url} alt={'thumbnail'}/>
               </div>
            );
         }
      }
   }
   return thumbnail;
};

So now, the code literally tells us exactly why 108 and 67 are important. Because they're the height-and-width dimensions that correspond to Reddit's smallest possible thumbnail. If you didn't know that already, you can tell simply by reading through the code.

OK... I can actually hear some of you out there starting to grumble. That's right. I can hear it, right now, in real-time, through the magic of the interwebs. A bunch of you are starting to whine about:

smallestPossibleRedditThumbnail is too-long and incredibly unwieldy as a variable name.

I will freely admit that my "coding style" tends to veer toward the verbose. Do you have to create variable names that are that long?? Of course not. And I know that most of you won't.

Coders are notorious for lazily choosing ridiculously-short variable names. They'll abbreviate "OK" if you let them. They act as though they're still coding in Notepad (not even, Notepad++), where they'd still have to painstakingly type out every letter of every variable.

Of course, in reality, once they're done bitching about my supposedly-too-long variable names, they'll then fire up their own, modern, IDE - which will swiftly code-complete even the longest of variable names with just a few keystrokes. Nowadays, even the jankiest (technical term) of IDEs will perform the needed code-completion on these supposedly-too-long variable names with only a few keystrokes.

Granted, it's not "wrong" if you choose to name this variable something like smPossRedditThumb. But I've been doing this life for waaayyyy too long. And I can tell you from (painful) experience that, when you're in the middle of coding something up, all of those clever little abbreviations seem to be obvious and self-evident. And then... a year later, when someone else is trying to go through your code... Hell, even when you are trying to go through your own code, those supposedly-obvious abbreviations can start to feel mighty obtuse.

This leads to the second whiny gripe that I can literally hear you muttering at your screen:

Dooood... your "refactoring" actually added lines to the function.

Before I address this, I need you to perform one simple task. I need you to pick up the heaviest object in your vicinity and bang it into your face. You don't need to go overboard. A few dozen times will do just fine. Just enough to get a nice, bloody patina going all over your oh-so-attractive features.

Go ahead. Get to banging. I'll wait...

Still waiting...

Alright. Have you completed your self-punishment? Good. Now repeat after me:

Refactoring is NOT a mindless exercise to see how many LoC you can eliminate.

Granted, a good, solid refactoring will often result in fewer LoC. And that's generally a good thing. But don't get cocky and go overboard with that shit. If your refactoring results in code that's harder to read, then you've FAILED.

Fewer LoC is a frequent side effect of skilled refactoring. It is not the goal of said refactoring.

OK, now that I've completed that rant, and now that you've wiped some of the blood off your face, let's get back to looking at that code.

Once I assigned descriptive, more-verbose names to our magic numbers, it did, in fact, cause some of my code lines to get rather long. To "combat" this, I started breaking up some values onto their own separate lines. And this did, indeed, end up increasing the LoC in the function.

But... it also highlights some redundancies in the function. With DRY in mind, we can now target those redundancies for consolidation.

Specifically, my new smallestPossibleRedditThumbnail object actually holds the exact same values that are used during both of the times when we build the containing thumbnail <div>. So we can pretty this up as so:

getThumbnail = post => {
   const smallestPossibleRedditThumbnail = {
      height: 108,
      width: 67,
   };
   let thumbnail = <div style={smallestPossibleRedditThumbnail}> </div>;
   if (post.preview && post.preview.images) {
      const images = post.preview.images[0];
      if (images.resolutions) {
         const smallestThumbnail = images.resolutions[0];
         if (smallestThumbnail.width === smallestPossibleRedditThumbnail.width) {
            const url = smallestThumbnail.url.replace(/&amp;/g, '&');
            thumbnail = (
               <div style={smallestPossibleRedditThumbnail}>
                  <img src={url} alt={'thumbnail'}/>
               </div>
            );
         }
      }
   }
   return thumbnail;
};

With this change, we've gotten our LoC back down to a manageable level and we've provided the much-needed context to the hyper-specific numbers of 108 and 67. Now let's turn our attention to that first if() condition.

We all know that our conditionals and our loops should be indented on their own level. Code with no indentation is objectively awful code. But indentation, when taken too far, can have its own cognitive cost. Consider the following example:

someFunction = () => {
   if (condition1) {
      const someVar1 = 'foo';
      if (condition2) {
         const someVar2 = 'bar';
         if (condition3) {
            const someVar3 = 'baz';
            if (condition4) {
               const someVar4 = 'fu';
               // do some kinda nested logic HERE
            }
         }
      }
   }
};

The above snippet isn't necessarily wrong. In fact, there will be some times when you have little choice but to write something very similar. But the "problem" that arises is when others have to cognitively trace through your code (often, long after you originally wrote it), and they have to think about all the conditions which have been met in order to reach the nested logic.

For this reason, it's generally good practice to "flatten" this logic whenever possible. When you're reading through code (and we spend far more time reading code than we do writing it), it's easier to follow the control logic if you don't have to keeps tabs in your head on a series of conditions that were met for you to reach this point in the code.

To be clear, you can't always avoid this kind of logic-nesting. But you can usually avoid it. Most of our lives are spent writing code that lives within a given function/method. The beauty of this is that the function has a built-in "short-circuit" that allows us to forgo the rest of the function's processing. That "short-circuit" is the return keyword. And we can often use it to "flatten" our code.

Once we've assigned a default value to thumbnail, we then check to ensure that the supplied post object has a preview property. And then we check to ensure that the post.preview object has an images property. If it fails either of these basic checks, the function "short-circuits" by simply returning the default thumbnail value.

But we can use the function's built-in return feature to flatten our logic like so:

getThumbnail = post => {
   const smallestPossibleRedditThumbnail = {
      height: 108,
      width: 67,
   };
   let thumbnail = <div style={smallestPossibleRedditThumbnail}> </div>;
   if (!post.preview || !post.preview.images) 
      return thumbnail;
   const images = post.preview.images[0];
   if (images.resolutions) {
      const smallestThumbnail = images.resolutions[0];
      if (smallestThumbnail.width === smallestPossibleRedditThumbnail.width) {
         const url = smallestThumbnail.url.replace(/&amp;/g, '&');
         thumbnail = (
            <div style={smallestPossibleRedditThumbnail}>
               <img src={url} alt={'thumbnail'}/>
            </div>
         );
      }
   }
   return thumbnail;
};

We basically inverted the logic in the first if() condition. Rather than saying, "We will only execute the following code if these conditions are true," we reversed it to say "If the conditions are not true, then just return the default thumbnail."

What's the difference?? Logically, there is no difference. But by inverting the logic, it allowed us to eliminate one layer of nesting in our function.

But there's no need to stop there. A few lines later, we have another of those pesky if() conditions that's foisting its own layer of indentation on us. So we can invert that condition as well and eliminate yet another layer like this:

getThumbnail = post => {
   const smallestPossibleRedditThumbnail = {
      height: 108,
      width: 67,
   };
   let thumbnail = <div style={smallestPossibleRedditThumbnail}> </div>;
   if (!post.preview || !post.preview.images) 
      return thumbnail;
   const images = post.preview.images[0];
   if (!images.resolutions) 
      return thumbnail;
   const smallestThumbnail = images.resolutions[0];
   if (smallestThumbnail.width === smallestPossibleRedditThumbnail.width) {
      const url = smallestThumbnail.url.replace(/&amp;/g, '&');
      thumbnail = (
         <div style={smallestPossibleRedditThumbnail}>
            <img src={url} alt={'thumbnail'}/>
         </div>
      );
   }
   return thumbnail;
};

Awww, yeah...! Now we're really starting to get nice-and-flat. But by now, you can probably see that there's still more opportunity to flatten this sucker by, yet again, inverting the next if() conditional like so:

getThumbnail = post => {
   const smallestPossibleRedditThumbnail = {
      height: 108,
      width: 67,
   };
   let thumbnail = <div style={smallestPossibleRedditThumbnail}> </div>;
   if (!post.preview || !post.preview.images) 
      return thumbnail;
   const images = post.preview.images[0];
   if (!images.resolutions) 
      return thumbnail;
   const smallestThumbnail = images.resolutions[0];
   if (smallestThumbnail.width !== smallestPossibleRedditThumbnail.width) 
      return thumbnail;
   const url = smallestThumbnail.url.replace(/&amp;/g, '&');
   thumbnail = (
      <div style={smallestPossibleRedditThumbnail}>
         <img src={url} alt={'thumbnail'}/>
      </div>
   );
   return thumbnail;
};

So we've flattened the hell outta this puppy. Are we done?? Naww, man. Let's press on.

If you look at the last two commands in this function, they contain an inefficiency that I spot in other people's code all the time. The inefficiency is very simple and it always looks something like this:

myFunction = () => {
   // some other processing up here
   const myReturnValue = 'foo';
   return myReturnValue;
}

In this scenario, the temporary variable myReturnValue is completely superfluous. There's really no utility in setting some value, and then immediately, on the very next line, returning that same value. If that's all you're going to do, then just bypass the temporary variable altogether and directly return the computed value like so:

myFunction = () => {
   // some other processing up here
   return 'foo';
}

When we apply that to our newly-flattened function it will look like this:

getThumbnail = post => {
   const smallestPossibleRedditThumbnail = {
      height: 108,
      width: 67,
   };
   let thumbnail = <div style={smallestPossibleRedditThumbnail}> </div>;
   if (!post.preview || !post.preview.images) 
      return thumbnail;
   const images = post.preview.images[0];
   if (!images.resolutions) 
      return thumbnail;
   const smallestThumbnail = images.resolutions[0];
   if (smallestThumbnail.width !== smallestPossibleRedditThumbnail.width) 
      return thumbnail;
   const url = smallestThumbnail.url.replace(/&amp;/g, '&');
   return (
      <div style={smallestPossibleRedditThumbnail}>
         <img src={url} alt={'thumbnail'}/>
      </div>
   );
};

Now I'm going to address readability. Whether your function contains 3 lines or 33 lines, the "base" language constructs can make it challenging to understand just why the code is doing what it's doing - even for seasoned, senior devs like myself.

Let me be very clear and explicit about this point.

Exceptionally "clean" code can help any dev, of any relative skill level, grasp exactly what your code is doing. But it still may not be evident exactly why the code is doing what it's doing.

There have been plenty of times when I'm reading through a legacy codebase and I'm thinking:

OK, I understand that you're replacing a given value out of this string. And then you're comparing it to some other string. And then you're checking to see if that string already exists in some store. Yeah... I get that. But what exactly is this accomplishing in our business logic??

So the challenge of solid refactoring goes beyond the process of making your code "clean". In the best-possible scenario, it enables any other devs to quickly-and-easily transfer the underlying logic of that code into their brain.

This is why it's extremely useful to eschew esoteric abbreviations in your variable names. Whenever possible, name the variables in such a way that they represent the explicit state of the information being presented.

Furthermore, any chunks of code (even if it's only two-or-three lines), can be made far easier to understand if they're encapsulated in their own, well-named functions. If several lines of code are all designed to, in aggregate, perform a single function, then don't be afraid to actually put them into their own, descriptively-named function).

I'm going to tweak a few more variable names, to hopefully make it painfully evident what this function is doing.

thumbnail was originally declared with let because I was thinking that it would deliver either the "default" empty thumbnail <div>, or the populated thumbnail <div>. But now, thumbnail holds only a single, default value that should really be immutable. I'd also like to make it explicitly clear that this is the default, "empty" thumbnail. So I'll name it accordingly.

Also, I'm going address the naming of smallestThumbnail in these lines:

const smallestThumbnail = images.resolutions[0];
if (smallestThumbnail.width !== smallestPossibleRedditThumbnail.width) 
   return thumbnail;
const url = smallestThumbnail.url.replace(/&amp;/g, '&');

Naming is always a tricky beast. But if you want your code to be as clear as possible, naming is especially critical whenever you're trying to compare two values that may-or-may-not be similar and/or identical. In the example above, I could imagine a dev thinking:

Wait... What's the distinction between the smallestThumbnail and the smallestPossibleRedditThumbnail? They kinda feel like the same thing.

So these changes will be incorporated as such:

getThumbnail = post => {
   const smallestPossibleRedditThumbnail = {
      height: 108,
      width: 67,
   };
   const emptyThumbnailContainer = <div style={smallestPossibleRedditThumbnail}> </div>;
   if (!post.preview || !post.preview.images) 
      return emptyThumbnailContainer;
   const images = post.preview.images[0];
   if (!images.resolutions) 
      return emptyThumbnailContainer;
   const smallestProvidedImage = images.resolutions[0];
   if (smallestProvidedImage.width !== smallestPossibleRedditThumbnail.width)
      return emptyThumbnailContainer;
   const url = smallestProvidedImage.url.replace(/&amp;/g, '&');
   return (
      <div style={smallestPossibleRedditThumbnail}>
         <img src={url} alt={'thumbnail'}/>
      </div>
   );
};

That feels a bit more "explicit". (To me, anyway.) I truly believe that even some of most hardcore refactoring zealots would stop right here and call it a day. But we can still make this more... self-documenting. With helper functions.

But this post is getting long. So I'm gonna leave further improvements for Part 2...

DEV Community

Tearing Down a Function (and Building It Back Up) - Part 1

Baseline Assumptions

Top comments (0)