DEV Community

The DEV Team (staff)

How Have You Refactored or Optimized Code for Improved Performance?

Code optimization and refactoring are crucial for enhancing the efficiency and speed of software. Share your experience of a specific instance where you had to tackle performance issues in code. What steps did you follow to improve it? We're interested to know the outcome of your efforts and any lessons learned along the way.

Follow the DEVteam for more discussions and online camaraderie!


Top comments (12)

Shai Almog

I have 3 performance tips:

  • Caching
  • Caching
  • Caching

It's almost always one form or another of caching (assuming it isn't a bug). One of my earliest examples was in the 80s, when I precalculated the results of trig functions and stored them in an array. I could then do the trig calculations my game needed for the cost of an array lookup.
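A minimal sketch of that kind of precalculated trig table, written in Python for readability. The one-entry-per-degree resolution and the names `SIN_TABLE`, `fast_sin`, and `fast_cos` are assumptions for illustration, not details from the original game.

```python
import math

TABLE_SIZE = 360  # assumed resolution: one entry per whole degree

# Pay the cost of math.sin once, up front.
SIN_TABLE = [math.sin(math.radians(d)) for d in range(TABLE_SIZE)]


def fast_sin(degrees):
    """Approximate sine by table lookup (angle truncated to a whole degree)."""
    return SIN_TABLE[int(degrees) % TABLE_SIZE]


def fast_cos(degrees):
    """cos(x) == sin(x + 90 degrees), so the same table serves both."""
    return SIN_TABLE[(int(degrees) + 90) % TABLE_SIZE]
```

The trade-off is precision: anything between two table entries is truncated to the lower whole degree, which is often acceptable for game movement but not for scientific work.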

Thomas Werner

About 10 years ago, on earlier Android devices, I worked on image processing apps and a couple of games. Floating point calculations were super slow on Android devices back then, which hurt image processing and video game math in particular. So step one was to convert those floating point calculations to integer calculations, which already gave a pretty decent performance boost. Step two was to rewrite those routines in C and call them through the Java Native Interface.
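To illustrate step one, here is a sketch of a float-to-integer conversion for one common image operation, RGB to grayscale. The choice of operation, the weights, and the function names are assumptions; the comment does not say which calculations were converted. Python is used only to show the arithmetic, since the speedup itself came from hardware with slow floating point.

```python
def luma_float(r, g, b):
    """Grayscale value using floating point weights."""
    return int(0.299 * r + 0.587 * g + 0.114 * b)


def luma_fixed(r, g, b):
    """Same idea with integers only.

    The weights are pre-multiplied by 256 (77 + 150 + 29 == 256),
    so a right shift by 8 bits replaces the division.
    """
    return (77 * r + 150 * g + 29 * b) >> 8
```

The integer version can differ from the float version by a small rounding error, which is usually invisible in 8-bit image data.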

The next problem was garbage collector activity, which would stall game animations in particular and hurt overall game performance. The optimization trick was to recycle all objects, arrays, etc., so that nothing would ever be garbage-collected while the game was running. If the game had entities such as enemies or projectiles, I would use a so-called "pool" for each entity type, retrieve objects from it when needed, and put them back when done.
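A rough sketch of such a pool, assuming a hypothetical `Projectile` entity; the class names and fields are made up for illustration. The original was Java on Android, so this only shows the acquire/release pattern.

```python
class Projectile:
    """Hypothetical game entity; the fields are illustrative."""

    def __init__(self):
        self.reset()

    def reset(self):
        self.x = 0.0
        self.y = 0.0
        self.active = False


class Pool:
    """Hands out preallocated objects and takes them back for reuse."""

    def __init__(self, factory, size):
        self._factory = factory
        self._free = [factory() for _ in range(size)]  # allocate up front

    def acquire(self):
        # Reuse a free object if there is one; allocate only when exhausted.
        obj = self._free.pop() if self._free else self._factory()
        obj.active = True
        return obj

    def release(self, obj):
        obj.reset()  # wipe state so the next user gets a clean object
        self._free.append(obj)
```

Sizing the pool for the worst case at load time means the fallback allocation in `acquire` never runs during gameplay.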

Sijmen J. Mulder

Pools are great for this sort of thing! I hear games built with MonoGame use similar approaches, never allocating in the game loop. In C I just use static arrays for this kind of thing, which saves a bunch of work on memory management too. In C++ you can use custom allocators that automate this behaviour for you.

Tobias Nickel

A while back, working on a SaaS platform, I implemented something I named "request caching" at the time: caching the requests instead of the results. Much later I found out this practice is called "query batching", and that it worked just like the DataLoader library from Facebook, except that my solution used higher-order functions and could integrate seamlessly with transactions and other contexts such as pagination.

By adding this to the project's own ORM, the entire app got a performance boost.

By the way, this was at a time when Node apps were written with callbacks, not even promises.

Also, tcacher still has some advantages over DataLoader, though it could be brought up to date for the age of ES modules.
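A minimal sketch of the general query batching idea, using Python's asyncio. This is not the API of tcacher or DataLoader; `BatchLoader`, `fake_fetch_users`, and `demo` are hypothetical names. Individual `load(key)` calls made in the same event-loop tick are collected and answered by one batched fetch.

```python
import asyncio


class BatchLoader:
    """Coalesces load(key) calls from the same tick into one batch call."""

    def __init__(self, batch_fn):
        self._batch_fn = batch_fn  # async: list of keys -> list of values
        self._pending = {}         # key -> Future shared by all callers
        self._flush_task = None

    async def load(self, key):
        loop = asyncio.get_running_loop()
        if key not in self._pending:
            self._pending[key] = loop.create_future()
        future = self._pending[key]
        if self._flush_task is None:
            # Runs after the tasks that are already queued for this tick.
            self._flush_task = loop.create_task(self._flush())
        return await future

    async def _flush(self):
        pending, self._pending = self._pending, {}
        self._flush_task = None
        keys = list(pending)
        try:
            values = await self._batch_fn(keys)
        except Exception as exc:
            for future in pending.values():
                future.set_exception(exc)
            return
        for key, value in zip(keys, values):
            pending[key].set_result(value)


async def demo():
    calls = []  # records every batch that reaches the "database"

    async def fake_fetch_users(keys):
        calls.append(list(keys))
        return [f"user-{k}" for k in keys]

    loader = BatchLoader(fake_fetch_users)
    results = await asyncio.gather(
        loader.load(1), loader.load(2), loader.load(1)
    )
    return results, calls
```

In `demo`, three lookups (two of them for the same key) reach the fake backend as a single call with the keys `[1, 2]`.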

Ralph Hightower

I was part of a team developing Microsoft ISAPI extensions in C++ for a web application.

  • Move invariant code out of loops. A utility function that returned the holidays for a year from a database was being called repeatedly while processing rows from a different database query.
  • Minimize repeated database queries. An internally developed base class retrieved rows from SELECTs on tables. It first performed the query and then, while processing the rows, performed another SELECT for each row that was returned. The base class already had the data, so why SELECT the row again? I rewrote the base class for selecting rows and modified the classes that used it.
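The first point can be sketched as follows. `fetch_holidays` is a hypothetical stand-in for the database-backed utility function, and `lookup_log` only exists to make the number of lookups visible; the original code was C++.

```python
from datetime import date

lookup_log = []  # one entry per simulated database round trip


def fetch_holidays(year):
    """Hypothetical stand-in for the database-backed holiday lookup."""
    lookup_log.append(year)
    return {date(year, 1, 1), date(year, 12, 25)}


def count_holiday_rows_slow(row_dates, year):
    # The lookup does not depend on the row, yet it runs once per row.
    return sum(1 for d in row_dates if d in fetch_holidays(year))


def count_holiday_rows_fast(row_dates, year):
    holidays = fetch_holidays(year)  # hoisted: one lookup in total
    return sum(1 for d in row_dates if d in holidays)
```

Both functions return the same count; the second simply performs one lookup no matter how many rows are processed.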

Bernd Wechner

Ironically, I've not done much performance tuning/refactoring in the 30+ years I've been developing or maintaining software (except perhaps for caching, as Shai Almog says ... caching!).

But my most memorable and lauded work has often been quite the opposite.

That is, actively slowing things down, or at least discarding performance as a criterion, in order to pursue competing goals (generally maintainability: the cost, or even the ease, of maintenance).

I'm not the only person to have landed on a project that was a black box because no one had wanted to touch it since it was implemented. Anyone who looked at it saw a house of cards, a mystery of complexity, and fled. The risks of making changes and the costs of a complete replacement were both judged too high ... Just leave it be; if it ain't broke, don't fix it.

But then a rewrite is budgeted, mainly because new hardware is bought with new peripherals, firmware, OS, etc. And so this black box needs porting, which equates to a rewrite.

The job begins with a deep analysis of the thing to be rewritten: teasing it apart and building internal documentation of the old system, an internal spec, all the things lost to time in this legacy system. Then comes the rewrite, but the main goal this time is often not to land here again: to have software that can be maintained and that survives staff turnover. With that goal eclipsing performance, and with a new generation of hardware providing enormous performance gains, a lot of complicated and difficult-to-describe optimisations in the old software on the old hardware are tested against a simpler implementation on the new hardware, scrutinising performance, accuracy and precision (I have always worked in the engineering and science realms). If the new implementation is not significantly slower than the old one, or is still faster thanks to the new hardware, then with its clarity of code, internal documentation and interface specifications, it is the winner.

The result: performance optimisations removed in favour of simpler code that can be maintained into the future and evolve incrementally unlike the black box that just burst from its bubble.

Sijmen J. Mulder

As for actual examples, there are two that come to mind:

  • To temporarily replace Google reverse geolocation ("what city is this coordinate?") we exported data from OpenStreetMap to a file as binary records (e.g. 20 bytes for the name, two 8-byte floats for the coordinates, etc.) and memory-mapped it to C# structs. We then used a simple LINQ MinBy query that compared squared distances. The JIT generated really tight vectorized code for this and it was super fast.
  • Did a POC for a pension fund, rewriting some of the policy projection calculations in OpenCL to run on a graphics card. The original C# code wasn't bad, but GPU performance was on another level!
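The first example can be approximated with Python's `struct` module. This is only a sketch: the exact record layout (`<20sdd`), the helper names, and the sample places are assumptions, and memory-mapping the file is left out, so the records live in an in-memory bytes object instead.

```python
import struct

# Assumed layout: 20-byte name, then latitude and longitude as 8-byte floats.
RECORD = struct.Struct("<20sdd")


def pack_places(places):
    """Serialize (name, lat, lon) tuples into fixed-size binary records."""
    return b"".join(
        RECORD.pack(name.encode("utf-8"), lat, lon)
        for name, lat, lon in places
    )


def nearest_place(buffer, lat, lon):
    """Return the name of the record with the smallest squared distance."""
    best = min(
        RECORD.iter_unpack(buffer),
        key=lambda rec: (rec[1] - lat) ** 2 + (rec[2] - lon) ** 2,
    )
    return best[0].rstrip(b"\0").decode("utf-8")


# Illustrative sample data with approximate coordinates.
SAMPLE = pack_places([
    ("Amsterdam", 52.37, 4.90),
    ("Rotterdam", 51.92, 4.48),
    ("Utrecht", 52.09, 5.12),
])
```

Comparing squared distances avoids a square root per record, since the ordering of distances is the same either way.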

Jeremy French

I once managed a 1000x speedup in some code. The language had two similar types: lists and arrays.

Lists had some features not available to arrays, but were basically wrappers around strings. So any operation on them, like sorting, involved a lot of string manipulation.

One feature was taking over a minute to run, and it was due to this list processing. Everything inside it could be done with arrays, so I tried changing it, converting to and from lists on either side of the system. I was hoping it would run in less than thirty seconds, but it came in at fractions of a second.

I guess the moral is know your data types and how they are handled internally.
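The comment does not name the language, so the following is only a guess at the mechanism: a "list" stored as a delimited string has to be re-parsed on every access, while converting it to a real array once at the boundary avoids that. All names here are hypothetical.

```python
def list_get(string_list, index, delimiter=","):
    """Element access on a string-backed list: re-parses the whole string."""
    return string_list.split(delimiter)[index]


def sum_via_string_list(string_list):
    # Every iteration splits the full string again: quadratic work overall.
    count = len(string_list.split(","))
    return sum(int(list_get(string_list, i)) for i in range(count))


def sum_via_array(string_list):
    items = string_list.split(",")  # convert once at the boundary
    return sum(int(item) for item in items)
```

Both functions return the same result; the difference only shows up as the input grows.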

Ralph Hightower


  • Loop optimization: moving invariant code out of loops. One instance was retrieving holidays from a database table.
  • Eliminate retrieving data twice. There were internally developed base classes for retrieving a single row and for retrieving multiple rows from database tables. The class that retrieved multiple rows first fetched them from the table and then, for each row returned, retrieved it individually again to create the list. Why do that all over again? You've already got the data.

Sijmen J. Mulder

  • Use a profiler
  • Do less work
    • Simplify and cut out the cruft. Avoid unnecessary abstractions or steps.
    • Don't repeat work (e.g. duplicate database queries)
    • Don't ask more than needed (DB queries are again a good example)
    • Preprocess what you can, e.g. take work out of the hot loop and do it once in advance, create lookup tables, etc.
  • Make the work faster
    • Lay out your data in a way that's convenient for the processing (data-oriented programming). Array in, loop, array out tends to play well with how CPUs and memory work.
    • Avoid indirection. That's pointers, virtual methods, string matching, etc.
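For the first tip in the list above, a small sketch using Python's standard `cProfile` and `pstats` modules. The functions being profiled are made up for illustration; any callable would do.

```python
import cProfile
import io
import pstats


def square(x):
    return x * x


def slow_sum_of_squares(n):
    """Deliberately naive workload so the profiler has something to show."""
    total = 0
    for i in range(n):
        total += square(i)
    return total


def profile_call(func, *args):
    """Run func under cProfile; return its result and a text report."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(5)  # top 5 entries
    return result, stream.getvalue()
```

Calling `profile_call(slow_sum_of_squares, 100_000)` returns the sum together with a report that lists call counts and time spent per function, which is usually enough to see where the hot spot is before changing anything.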

Vishwas Singh

When I joined my most recent company, I was given the task of improving the performance of a Socket Server (+ Node.js).
At that time we could only handle 2K concurrent users; beyond that, our EC2 instance would go down due to high CPU usage.

We were making an API call whenever a user connected to the Socket Server. Making lots of parallel API calls resulted in high CPU usage, since setting up each new HTTP connection (starting with the TCP three-way handshake) is CPU-intensive.

So instead of making API calls, we decided to run the DB queries directly on the Socket Server, eliminating the REST call.

Then we started queuing requests in a local in-memory queue and processing multiple users' requests in a single batched DB query (one DB query per 500 requests).
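The queue-then-batch step could look roughly like this. It is a sketch using Python's built-in `sqlite3` with an in-memory database; the table, the `BatchedUserLookup` class, and the demo data are assumptions, and the original ran on Node.js against an unnamed database.

```python
import sqlite3

BATCH_SIZE = 500  # "one DB query per 500 requests"


def make_demo_db(user_count):
    """Create a throwaway in-memory users table for the example."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO users (id, name) VALUES (?, ?)",
        [(i, f"user-{i}") for i in range(user_count)],
    )
    return conn


class BatchedUserLookup:
    """Queues user ids in memory and resolves them in batched queries."""

    def __init__(self, conn, batch_size=BATCH_SIZE):
        self.conn = conn
        self.batch_size = batch_size
        self.queue = []       # pending user ids
        self.queries_run = 0  # how many SELECTs were actually issued

    def enqueue(self, user_id):
        self.queue.append(user_id)

    def flush(self):
        """Drain the queue, issuing one SELECT per batch_size ids."""
        results = {}
        while self.queue:
            batch = self.queue[:self.batch_size]
            del self.queue[:self.batch_size]
            placeholders = ",".join("?" * len(batch))
            rows = self.conn.execute(
                f"SELECT id, name FROM users WHERE id IN ({placeholders})",
                batch,
            )
            self.queries_run += 1
            results.update(dict(rows))
        return results
```

With 1,200 queued ids, `flush` issues three queries instead of 1,200, which is the effect described in the comment.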

Then we used Redis to horizontally scale these socket servers.

Now we are handling 35K concurrent users, and we have tested our service to handle 100K concurrent connections per EC2 instance (1 CPU, 2 GB RAM).

siddharthshyniben

This is oddly specific, but for some reason Array.from is faster than the spread operator on strings in JavaScript.