Kostas Kalafatis

Premature Optimization

We all enjoy writing code and building things. But developers are also expensive and in short supply, so one of our key challenges is spending our time where it delivers the most value. At the same time, we don't want to ship code that our users dislike or that simply doesn't work.

However, most debates about performance are a waste of time, not because performance is unimportant, but because they tend to run on passion and gut feeling rather than on measurements.

The Tradeoff Triangle

Imagine a world in which your software adapts effortlessly to changing demands while racing through complex calculations. On top of that, imagine every one of those changes shipping almost immediately. That sounds like a dream, doesn't it? The tradeoff triangle is here to ground us and remind us that the perfect balance is not always attainable, pushing us to make difficult decisions and sacrifices.

The tradeoff triangle is a visual representation of constant tension in software development. Its three edges—velocity, adaptability, and performance—illustrate the challenges of maximizing all three simultaneously.

(Diagram: the tradeoff triangle, with velocity, adaptability, and performance as its sides)

  • Velocity refers to the rate at which a software development team can introduce new features and updates.
  • Adaptability is the ease with which a software system can be modified in response to changing requirements or environments.
  • Performance is how efficiently a software system executes tasks and utilizes resources.

You might think that velocity and adaptability go hand in hand. After all, the more versatile the system, the easier it is to add new features, right? Well, yes and no. While adaptability can increase velocity, the two often exist in tension, and prioritizing one can come at the cost of the other.

When you only care about velocity, you often trade code quality for speed. Teams write code quickly, with poor structure and too little testing, and are more likely to take shortcuts to meet deadlines. This approach delivers a lot of features early on, but it builds up technical debt that makes the system harder to change, fix, or extend. Over time, that rushed development slows feature delivery, increases the frequency of bugs, and raises costs as the problems created by hasty work become harder to manage.


Too much focus on adaptability can lead to 'analysis paralysis,' where excessive time goes into planning for every conceivable future scenario. Making the system as flexible as possible usually means very general designs, complicated patterns, and a lot of documentation. Even though the goal is good, this approach can stretch the initial development time considerably. The risk is investing in flexibility for changes that may never happen, which delays the project's core functionality and the benefits it delivers to users.


Adaptability can also slow things down because the parts of a system that make it flexible often add extra work. Adaptable systems may use more levels of abstraction, dynamic configurations, or generalized code paths to deal with change. While these features make the system more flexible, they can also slow it down and use more resources than a system that was designed to do a single job very well.

So, we should always choose something in the middle, right? Well, again, yes and no. The decision depends heavily on the current state of your project.


For instance, consider Twitter's early days, which likely demanded high velocity. They needed to get features out rapidly to compete, gain users, and validate their ideas. At that stage, some sacrifices in code adaptability and raw performance might have been acceptable in exchange for rapid growth.


However, as Twitter matured, adaptability became crucial. They needed the ability to evolve, introduce new features, and support an ever-expanding user base. This likely prompted shifts away from the pure velocity approach to ensure the platform could handle change and scale.

The key takeaway is that there's no one-size-fits-all answer within the velocity-adaptability-performance triangle.

The Two Aspects of Performance

It helps to distinguish between two camps of performance issues: micro-performance and macro-performance.

In software engineering, macro-performance refers to a system's overall, high-level performance as experienced by the end user. This encompasses factors like the responsiveness of the interface, the time it takes for complex features or tasks to execute, and the system's ability to handle large loads or datasets without significant slowdown. Macro-performance is the cumulative result of individual software components, architectural design, and hardware interaction.

Micro-performance focuses on the efficiency of small, granular code segments within a system. This includes the speed of individual algorithms, the way data is processed, memory usage patterns, and low-level interactions with hardware instructions. Micro-optimisations aim to squeeze the most performance out of these specific pieces of code, often through techniques like careful algorithm selection, avoiding unnecessary operations, and taking advantage of the CPU's architecture.

Micro-performance and Premature Optimization

Premature optimization is frequently associated with micro-performance issues. This is typically where someone comments in a code review, "You should do X instead of Y, because X is faster than Y."

But computers are really fast, and at the end of the day, we are writing code to solve a real-world problem. And typically, it is better to solve a problem faster with slower code than to solve a problem slower with faster code.

Examples of Micro-performance and Premature Optimization

Let's talk about some examples of premature optimization at the micro performance level.

In the C family of languages, there are two operators that increment a variable by one: pre-increment (++i) and post-increment (i++). Some developers argue that pre-increment is faster than post-increment, so they always use pre-increment to optimize their code. The reasoning is that i++ first makes a copy of the value of i, then increments i, and finally returns the copy of the original value, whereas ++i simply increments i and returns it.
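
To be clear about what actually differs, here is a minimal snippet showing the semantic difference between the two operators (the values are arbitrary, and the snippet is just for illustration):

#include <stdio.h>

int main() {
    int i = 5;
    int a = i++;  // a receives the old value (5); i becomes 6

    int j = 5;
    int b = ++j;  // j becomes 6 first; b receives the new value (6)

    printf("a = %d, i = %d\n", a, i);  // prints: a = 5, i = 6
    printf("b = %d, j = %d\n", b, j);  // prints: b = 6, j = 6
    return 0;
}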

You might now think: OK, if ++i is faster than i++, I should always use ++i for better performance. But instead of relying on what Joe Random says on Stack Overflow, it's crucial to test and evaluate it yourself.

Let's write a C program that runs a for loop and prints the value of i using post-increment.

#include <stdio.h>

int main() {
    for (int i = 1; i <= 10; i++) {
        printf("%d ", i);
    }

    printf("\n");
    return 0;
}

Now let's view the assembly code for this operation:

(Assembly generated for the post-increment loop)

The code might not look straightforward, but it does the following: it loads the value stored 4 bytes below the base pointer (which is our i), prepares some arguments for printf, then increments that value by one and stores it back to memory.

Now, let's try the same with pre-increment:

#include <stdio.h>
int main() {
    for (int i = 1; i <= 10; ++i) {
        printf("%d ", i);
    }

    printf("\n");
    return 0;
}

And the assembly code:

(Assembly generated for the pre-increment loop)

It is exactly the same, so there is no difference there.

Now, let's explore a more complex example involving iterators. Below is a C++ program that creates a vector of the numbers 1 to 10 and then iterates through it using the iterator's overloaded prefix and postfix increment operators.

Let's first use the prefix increment operator to iterate through the vector and print each element to the console.

#include <iostream>
#include <numeric>
#include <vector>

int main() {
    std::vector<int> numbers(10);

    // Initialize the vector (iota is just a shortcut)
    std::iota(numbers.begin(), numbers.end(), 1);

    // Iterator loop using the overloaded prefix increment
    std::cout << "Prefix increment:\n";
    for (std::vector<int>::const_iterator it = numbers.begin(); it != numbers.end(); ++it) {
        std::cout << *it << " ";
    }
    std::cout << std::endl;

    return 0;
}

And the assembly code for that is the following:

(Assembly generated for the prefix-increment iterator loop)

Now let's try the postfix operator:

#include <iostream>
#include <numeric>
#include <vector>

int main() {
    std::vector<int> numbers(10);

    // Initialize the vector (iota is just a shortcut)
    std::iota(numbers.begin(), numbers.end(), 1);

    // Iterator loop using overloaded postfix increment
    std::cout << "Postfix increment:\n";

    for (std::vector<int>::const_iterator it = numbers.begin(); it != numbers.end(); it++) {
        std::cout << *it << " ";
    }

    std::cout << std::endl;
    return 0;
}

And the assembly code for that is:

(Assembly generated for the postfix-increment iterator loop)

As you can see, this version is longer than the one using the prefix operator, so there is a difference in how they perform! However, the two versions only differ until you turn on compiler optimizations. Once the compiler optimizes the code, the two solutions are identical.

So the clear answer to the question of whether prefix or postfix is faster is: MAYBE. The next sensible step is to measure everything.
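
As a rough sketch of how such a measurement might look, here is a minimal benchmark using std::chrono; the vector size is an arbitrary choice, and the printed sum exists only to keep the compiler from optimizing the loops away. A dedicated benchmarking library would give more trustworthy numbers than this hand-rolled timing.

#include <chrono>
#include <cstdio>
#include <numeric>
#include <vector>

int main() {
    std::vector<int> numbers(10000000);
    std::iota(numbers.begin(), numbers.end(), 1);

    long long sum = 0;

    // Time the loop that uses the prefix increment.
    auto start = std::chrono::steady_clock::now();
    for (std::vector<int>::const_iterator it = numbers.begin(); it != numbers.end(); ++it) {
        sum += *it;
    }
    long long prefix_ns = std::chrono::duration_cast<std::chrono::nanoseconds>(
        std::chrono::steady_clock::now() - start).count();

    // Time the loop that uses the postfix increment.
    start = std::chrono::steady_clock::now();
    for (std::vector<int>::const_iterator it = numbers.begin(); it != numbers.end(); it++) {
        sum += *it;
    }
    long long postfix_ns = std::chrono::duration_cast<std::chrono::nanoseconds>(
        std::chrono::steady_clock::now() - start).count();

    std::printf("prefix:  %.2f ns per increment\n", (double)prefix_ns / numbers.size());
    std::printf("postfix: %.2f ns per increment\n", (double)postfix_ns / numbers.size());
    std::printf("(sum = %lld, printed so the loops are not optimized away)\n", sum);
    return 0;
}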

After measuring the performance of these two methods on my work laptop, the slowest increment takes about 2.8 nanoseconds. To put this in perspective, 2.8 nanoseconds is 0.0000000028 seconds. In the time it took you to read this far, the increment could have run about 68,571,428,571 times, or about 4,500,000,000,000 times in the time it took me to investigate, set up, and measure everything. The real question is: will the increment actually run that many times?

And the question you should ask yourself is: "Is this conversation truly worth the effort?" If you find yourself suggesting this change in a code review, it may be worth reassessing the review's priorities.

The Measure-Optimize-Measure Principle

When optimizing code, it's easy to fall into the trap of depending on intuition or assumptions about where the bottlenecks are. However, the most successful way to make significant improvements is to adopt a data-driven approach: measure, try something, and measure again.

  1. Measure (Identify Bottlenecks): Before changing a single line of code, use profiling tools to identify the actual performance flaws in your system. This will offer you a clear baseline and keep you from wasting time optimizing code that isn't a serious issue.

  2. Try Something (Make Specific Changes): Use your measurements to guide a specific modification. This could include improving an algorithm, reorganizing the code flow, or experimenting with new data structures. Resist the impulse to make broad modifications to your codebase without a clear approach.

  3. Measure Again (Assess Impact): After making your changes, rerun your profiling tools. Did your optimization make a difference? Did it help the area you were focusing on? Did it unexpectedly reduce performance in another portion of the system?

How to Optimize your Code

Optimizing your code requires a systematic approach. It is about choosing the appropriate tools for the job at hand. The first, and perhaps most important, consideration is your choice of data structures. The way you organize information has a big impact on how quickly and efficiently your code can process it. Think of selecting the right data structure as selecting the right hammer for a construction job: it makes a huge difference.

Let's look at some popular data structures. If you need to access elements in a precise order or find them directly by position (for example, the tenth item in a list), arrays are your best option. Need to look up data by a unique identifier? Hash tables excel at these extremely rapid searches. Linked lists provide flexibility for data that grows and shrinks often (such as an order queue). There are even specialized structures, such as trees (for hierarchical data) and graphs (for representing complex relationships).
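
As a small illustration of why this choice matters, here is a sketch contrasting a linear scan of a vector with a hash-table lookup; the Order type, the by_id map, and the sample data are made up for the example.

#include <algorithm>
#include <cstdio>
#include <string>
#include <unordered_map>
#include <vector>

struct Order {
    std::string id;
    double total;
};

int main() {
    std::vector<Order> orders = {
        {"A-100", 19.99}, {"A-101", 5.49}, {"A-102", 42.00}
    };

    // Linear search: walks the vector element by element (O(n)).
    auto it = std::find_if(orders.begin(), orders.end(),
                           [](const Order& o) { return o.id == "A-102"; });
    if (it != orders.end()) {
        std::printf("found %s via linear scan: %.2f\n", it->id.c_str(), it->total);
    }

    // Hash table: looks the order up directly by its key (O(1) on average).
    std::unordered_map<std::string, Order> by_id;
    for (const Order& o : orders) {
        by_id[o.id] = o;
    }
    auto hit = by_id.find("A-102");
    if (hit != by_id.end()) {
        std::printf("found %s via hash lookup: %.2f\n", hit->first.c_str(), hit->second.total);
    }
    return 0;
}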

Even after carefully selecting data structures, your code may still have hidden performance bottlenecks. This is where profilers can become your best buddy. These tools function like detectives, analyzing your code as it runs and highlighting "hotspots" - portions that take disproportionately long to execute. Profilers allow you to focus your efforts where they will have the greatest impact.

Profiling allows you to make informed optimization selections. You can identify which specific processes or loops are slowing down your system. This targeted approach saves you time optimizing code areas that were never a problem in the first place. Remember that optimization is typically an iterative process: profile, make a change, re-profile, and assess the outcomes.
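
Real profilers (perf on Linux, gprof, or the profiler built into Visual Studio, for example) are the right tools for this job. When you just want a quick look at a suspected hotspot, though, a hand-rolled scoped timer like the sketch below can help; the ScopedTimer name and the dummy workload are illustrative, and this is no substitute for a proper profiler.

#include <chrono>
#include <cstdio>

// A tiny RAII timer: prints how long the enclosing scope took to run.
struct ScopedTimer {
    const char* label;
    std::chrono::steady_clock::time_point start;

    explicit ScopedTimer(const char* l)
        : label(l), start(std::chrono::steady_clock::now()) {}

    ~ScopedTimer() {
        long long elapsed = std::chrono::duration_cast<std::chrono::microseconds>(
            std::chrono::steady_clock::now() - start).count();
        std::printf("%s took %lld us\n", label, elapsed);
    }
};

int main() {
    {
        ScopedTimer timer("suspected hotspot");
        // ... the code you suspect is slow goes here ...
        long long sum = 0;
        for (int i = 0; i < 1000000; ++i) {
            sum = sum + i;
        }
        std::printf("sum = %lld\n", sum);  // use the result so the loop isn't removed
    }
    return 0;
}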

Do you recall the notorious loading times that plagued Grand Theft Auto Online? An enterprising player, armed with a profiler, discovered a major bottleneck within the game's code. It turned out to be an inefficient loop processing a JSON file. Once identified, this structural issue could be addressed, significantly improving loading times for everyone.

Once you've optimized your data structures and used a profiler to identify bottlenecks, it's time to examine the inner workings of your code: the efficiency of your algorithms. Evaluate the complexity of the algorithms you already have, explore better alternatives, and consider shifting to more effective ones to speed up specific operations.

Find parts of your code where operations are more complex than needed. Search for repetitive calculations that can be improved by saving results or refining your code's logic. Aim to reduce unnecessary computational overhead.
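
As a hedged sketch of what "saving results" can look like, the fragment below caches the output of an expensive function so that each distinct input is computed only once; expensive_score and the sample inputs are made-up stand-ins.

#include <cmath>
#include <cstdio>
#include <unordered_map>
#include <vector>

// Stand-in for a genuinely expensive computation.
double expensive_score(int input) {
    double score = 0.0;
    for (int i = 1; i <= 10000; ++i) {
        score += std::sin(input * i) / i;
    }
    return score;
}

int main() {
    std::vector<int> inputs = {3, 7, 3, 7, 3, 9, 7, 3};

    // Cache results so repeated inputs reuse earlier work instead of recomputing it.
    std::unordered_map<int, double> cache;
    double total = 0.0;
    for (int in : inputs) {
        auto found = cache.find(in);
        if (found == cache.end()) {
            found = cache.emplace(in, expensive_score(in)).first;
        }
        total += found->second;
    }
    std::printf("total = %f\n", total);
    return 0;
}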

Assess potential compromises involved in enhancing performance. Recognize that improving performance may sometimes require sacrificing a minor degree of code readability. Evaluate strategies such as loop unrolling or the use of lookup tables, weighing their benefits against any impact on code maintenance.
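
For instance, a lookup table trades a little memory and some extra setup code for speed in a hot path. Here is a sketch that precomputes sine values for whole degrees; the table size and the whole-degree granularity are arbitrary choices for the example.

#include <array>
#include <cmath>
#include <cstdio>

int main() {
    // Precompute sin() for whole degrees once, up front.
    const double pi = 3.14159265358979323846;
    std::array<double, 360> sine_table{};
    for (int d = 0; d < 360; ++d) {
        sine_table[d] = std::sin(d * pi / 180.0);
    }

    // Later, in performance-critical code, an array index replaces a trig call.
    int angle = 45;
    std::printf("sin(%d degrees) is roughly %f\n", angle, sine_table[angle]);
    return 0;
}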

Finally, consider the interaction between your code and memory utilization carefully. Allocating memory, though it might seem harmless, incurs a cost. Recognizing and comprehending these expenses is essential for optimizing critical parts of your code.

Allocating memory, like adding an item to a list or creating a new object, comes with some extra work. The system has to perform tasks to locate an appropriate spot in memory for these operations. This overhead, especially in code areas where performance is critical, can significantly impact overall efficiency.

To mitigate potential slowdowns, it's advisable to pre-allocate memory whenever feasible. If you have a good estimate of the eventual size of a data structure, allocating space for it in advance can prevent expensive resizing operations down the line. Additionally, for scenarios where your code frequently creates and discards many short-lived objects, employing an object pool can be beneficial. By reusing objects rather than continuously generating and disposing of them, you can lessen the strain on the memory management system.
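
As a small sketch of what pre-allocation looks like with a std::vector, the snippet below reserves capacity up front so the container does not have to reallocate and copy its buffer as it grows; the element count is an assumption for the example, and the same idea of reusing already-allocated storage underlies object pools.

#include <cstdio>
#include <vector>

int main() {
    const std::size_t expected_count = 1000000;

    // Without reserve(), push_back() may reallocate and copy the buffer
    // several times as the vector grows.
    std::vector<int> grown;
    for (std::size_t i = 0; i < expected_count; ++i) {
        grown.push_back(static_cast<int>(i));
    }

    // With reserve(), the buffer is allocated once up front, so no
    // reallocations happen inside the loop.
    std::vector<int> reserved;
    reserved.reserve(expected_count);
    for (std::size_t i = 0; i < expected_count; ++i) {
        reserved.push_back(static_cast<int>(i));
    }

    std::printf("both vectors hold %zu elements\n", reserved.size());
    return 0;
}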

Conclusion

In the process of creating software, it's all about striking the right balance between making it run efficiently, ensuring it can adapt over time, and ensuring it's effective at what it's designed to do. It’s like walking a tightrope, where each decision can sway you in different directions. Throughout this article, you've delved into various strategies to enhance software performance, from optimizing resource utilization to selecting the most effective tools and approaches for your tasks.

Optimization goes beyond just speeding up software; it involves thoughtful decision-making, such as selecting the best ways to structure data or pinpointing when and how to refine certain parts of the program for better efficiency. We've looked at real-life examples, like Twitter's growth and a gamer improving a game's loading times, to see how these decisions play out in the real world.

Enhancing software demands a blend of technical expertise, a profound understanding of its objectives, and a readiness to tackle intricate problems. It's about making software that not only performs well but is also understandable and manageable for those who use it and work on it.

In conclusion, the journey to enhance software performance embodies a dedication to excellence, continuous improvement, and user-centric solutions. Let this commitment drive your pursuit of coding mastery and innovation in every line of code you craft. Although attaining the ideal blend of speed, adaptability, and functionality presents challenges, unwavering dedication to this process propels technological progress. Continue to craft your software with care, intelligence, and creativity. Above all, take pride in the code you create.
