DEV Community


Performance Benchmarking: String and String Builder

Kaleem on August 12, 2022

In this article, we will benchmark the performance of the String and StringBuilder classes in Java and discuss how to modify strings efficiently. ...
 
Jean-Michel 🕵🏻‍♂️ Fayard • Edited

You are benchmarking 100k+ concatenations, and that's fine.

But for me, the more interesting question is: below what limit does the performance gap stop mattering, so that we should just use the cleaner API, String?

I've seen people use StringBuilder to avoid a few concatenations of small strings, and to me that's the pinnacle of premature optimization.
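To make the "few small strings" case concrete, here is a minimal sketch (the class and method names are hypothetical) of the same short message built both ways. Before Java 9, javac already compiled a single `+` expression like this into a StringBuilder append chain, and since Java 9 it compiles it to an invokedynamic call (JEP 280), so hand-writing the builder here mostly adds ceremony. Whether it is ever measurably faster at this size is something to measure, not assume.

```java
class SmallConcat {
    // A handful of small strings: one + expression is the readable choice.
    static String greetingWithPlus(String first, String last) {
        return "Hello, " + first + " " + last + "!";
    }

    // The same result built by hand with a StringBuilder: more code, same output.
    static String greetingWithBuilder(String first, String last) {
        return new StringBuilder()
                .append("Hello, ").append(first).append(' ')
                .append(last).append('!')
                .toString();
    }

    public static void main(String[] args) {
        System.out.println(greetingWithPlus("Ada", "Lovelace"));    // Hello, Ada Lovelace!
        System.out.println(greetingWithBuilder("Ada", "Lovelace")); // Hello, Ada Lovelace!
    }
}
```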

 
Vincent A. Cicirello • Edited

@kaleemniz I agree with Jean-Michel here. Under the hood, Java's StringBuilder is implemented as a partially filled array. Doing n appends to a partially filled array takes time linear in n (amortized, since the array only occasionally has to be grown). On the other hand, concatenating n equal-length strings with + takes time quadratic in n, since each concatenation fills a new, longer array (length 2, then 3, then 4, ..., the sum of which is quadratic in n).
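The linear-versus-quadratic argument can be checked by counting characters copied rather than timing anything. The sketch below is a simplified cost model, not the JDK's actual code: it assumes each `s = s + c` copies the whole accumulated string, and models the builder as a buffer that starts at 16 characters (StringBuilder's documented default capacity) and doubles when full. The real StringBuilder growth rule is an implementation detail that only roughly doubles.

```java
class CopyCostModel {
    // Characters copied when building an n-char string by repeated s = s + c:
    // the i-th concatenation allocates a new array and copies i characters.
    static long concatCopies(int n) {
        long total = 0;
        for (int i = 1; i <= n; i++) {
            total += i;
        }
        return total; // equals n(n+1)/2
    }

    // Simplified growable buffer that doubles its capacity when full:
    // each append writes one char, each resize copies the current contents.
    static long builderCopies(int n, int initialCapacity) {
        long total = 0;
        int capacity = initialCapacity;
        for (int len = 0; len < n; len++) {
            if (len == capacity) {
                total += len;   // copy existing contents into the larger array
                capacity *= 2;
            }
            total += 1;         // write the appended character
        }
        return total;
    }

    public static void main(String[] args) {
        for (int n : new int[] {10, 1_000, 100_000}) {
            System.out.println("n=" + n
                    + " concat=" + concatCopies(n)
                    + " builder=" + builderCopies(n, 16));
        }
    }
}
```

For n = 1000 this model gives 500,500 copied characters for repeated + versus 2,008 for the doubling buffer; for n = 10 both stay tiny, which is why small inputs are where the interesting comparison lies.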

So it is no surprise that, with n as huge as the values you are using, StringBuilder is faster. You don't need to time anything for that: linear time is asymptotically faster than quadratic time. Big-O, however, hides the effects of lower-order terms and constants, since it is focused on what happens for large inputs.

Microbenchmarking alternatives with asymptotically different runtimes is far more interesting for smaller input sizes, where you can discover the break-even point. If n is 2, for example, concatenating the 2 Strings with + is almost certainly faster than the overhead of creating a StringBuilder, and that is likely the case for the next few n as well.

But where is the break-even point? When does StringBuilder actually become faster? Your lowest n is 100000. For this task, where you are comparing a linear-time and a quadratic-time alternative, that may as well be infinity, since it provides no more information than an asymptotic analysis.
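One way to frame the break-even question is a toy cost model. Assume, hypothetically, that the + loop costs about $a$ per copied character, and that the StringBuilder costs a fixed setup overhead $b$ plus about $c$ per appended character; $a$, $b$, $c$ are placeholders that a microbenchmark would have to estimate, not measured values.

```latex
T_{+}(n) \approx a\sum_{i=1}^{n} i = a\,\frac{n(n+1)}{2},
\qquad
T_{\mathrm{SB}}(n) \approx b + c\,n

T_{+}(n^{*}) = T_{\mathrm{SB}}(n^{*})
\iff a\,{n^{*}}^{2} + (a - 2c)\,n^{*} - 2b = 0
\implies n^{*} = \frac{(2c - a) + \sqrt{(a - 2c)^{2} + 8ab}}{2a}
```

Under this model, + is cheaper for $n < n^{*}$ and StringBuilder wins for $n > n^{*}$; a larger setup cost $b$ pushes the break-even point up.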

I'd be interested to see what you find with small n using a microbenchmarking framework. When is String concatenation with + faster than StringBuilder, and when does StringBuilder become faster?

 
Pavel Trka • Edited

I think this is too general to make any kind of rule, as every situation is different. I use a simple rule: use your intuition, and micro-benchmark the particular situation when in doubt ;)

To be a little more specific: when I know I'm adding string concatenation to code that is guaranteed to be called often, hundreds of times per second, and I'm not too concerned about worse readability, I will optimize the hell out of it. A good example was when I was writing logging wrappers: logging classes process hundreds of thousands of strings from every part of the application, so every small piece matters.

But when I'm writing error message strings, or email bodies sent from code that is executed a few times a minute, I don't care, and readability and maintainability are in the driver's seat.

And with modern JDKs, the balance between + and StringBuilder has shifted heavily toward using the + sign almost all the time (depending on the type of application, obviously).

Those were somewhat extreme examples, but that's the general way I approach it.

 
Kaleem

This is such a noteworthy point: I did not measure the pivot point n = k at which StringBuilder becomes faster than String.

 
Pavel Trka

Not trying to be rude or a smart-ass or anything, but I may have some tips for better benchmarks ;)

This measurement has a few problems, so you may not be getting relevant results.

Also, I would argue that micro-benchmarks are of limited relevance if you don't measure exactly the thing you then use in real code. Isolated micro-benchmarks of course have a purpose, but they can mislead, because they may not tell you much about the real situation, where many more things are in play, and modern compilers don't make it simpler, as they introduce many tricks ;) In other words: are you concatenating that many such Strings in this kind of loop in your real code? ;)

I'm not saying that's the only way, but I always micro-benchmark with a very narrow focus on a particular situation or problem where I need to decide which way to go, and even then I'm very careful about interpreting the results.

Some tips (see the link at the end for deeper info and further links):

  • your benchmark has no warm-up phase; the JVM needs one to eliminate class-loading and JIT-compilation effects, etc.
  • since Java 9, String concatenation handling in the JDK is more complicated under the hood than it seems, and "the simple + sign is bad" no longer applies. See openjdk.org/jeps/280 and maybe an additional deep-dive at metebalci.com/blog/digging-into-je... (and many more discoverable with a Google search)
  • don't measure using System.nanoTime(); use a microbenchmark framework like JMH
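A small sketch of what JEP 280 does and does not change (class and method names are hypothetical). Since Java 9, javac compiles a single concatenation expression to an invokedynamic call bootstrapped by `java.lang.invoke.StringConcatFactory`, so the JVM chooses the concatenation strategy at run time. A loop of `+=` is still one separate concatenation per iteration, each producing a new String, so JEP 280 does not turn that pattern into linear-time work.

```java
class Jep280Illustration {
    // One concatenation expression: since Java 9, javac emits a single
    // invokedynamic call for the whole expression (JEP 280).
    static String singleExpression(String user, int id, boolean admin) {
        return "user=" + user + ", id=" + id + ", admin=" + admin;
    }

    // A loop is different: every += is its own concatenation that creates a
    // new String and copies the accumulated characters, so the total work
    // still grows quadratically with n.
    static String loopConcat(int n) {
        String s = "";
        for (int i = 0; i < n; i++) {
            s += i % 10;
        }
        return s;
    }

    public static void main(String[] args) {
        System.out.println(singleExpression("ada", 42, false)); // user=ada, id=42, admin=false
        System.out.println(loopConcat(12));                     // 012345678901
    }
}
```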

Good tips on Java microbenchmarking: stackoverflow.com/questions/504103...

 
Vincent A. Cicirello • Edited

@kaleemniz there are a couple of issues with your comparison. Check out Pavel's comment above on microbenchmarking frameworks; they handle the warm-up phase that your comparison overlooks.

Also, the StringBuilder version isn't entirely fair. If you use a StringBuilder, you'll eventually call toString, but yours does not.

I'd also rather see both versions take only n and the appended character as parameters and, instead of returning void, return a String. And then use a microbenchmarking framework.

Why this suggestion of returning a String and not passing the StringBuilder as a parameter? The version with repeated + updates a parameter variable, which is not observable outside the method due to pass-by-value, so even the final string is subject to garbage collection. In the StringBuilder version, by contrast, the calls to append change the state of the StringBuilder you passed in, so those changes are externally observable.
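A minimal sketch of the shape suggested above, assuming hypothetical method names: both variants take only `n` and the character, build everything locally, and return a String, so the StringBuilder version pays for `toString()` just as the + version pays for its final string. In a JMH benchmark these would be the bodies of `@Benchmark` methods; returning the result lets JMH consume it, which keeps the JIT from discarding the work as dead code.

```java
class ConcatVariants {
    // Builds a string of n copies of c with repeated +=.
    static String withPlus(int n, char c) {
        String s = "";
        for (int i = 0; i < n; i++) {
            s += c;
        }
        return s;
    }

    // Same result with a locally created StringBuilder; toString() is part
    // of the work, which keeps the comparison fair.
    static String withBuilder(int n, char c) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Sanity check that both variants compute the same thing.
        System.out.println(withPlus(5, 'x').equals(withBuilder(5, 'x'))); // true
    }
}
```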

 
Kaleem

These resources are super helpful, and these are great tips. Thanks for writing such a detailed response.

 
Hưng Võ Chánh

I can't see the use of StringBuilder in the 2nd code snippet.

 
Kaleem

Thank you so much for pointing that out. It was a copy-paste mistake; fixed now.

 
Hưng Võ Chánh

But after using a StringBuilder, we usually call the toString() method. Can you include it in the benchmark?

 
Arnaud Dagnelies

Out of curiosity, I tried it to see whether it makes a difference, but it does not really.

 
Kaleem

Reading the invaluable responses here has been the highlight; these takeaways will be helpful if there is a part two on this subject:

  • Use a microbenchmarking framework like JMH to get realistic results.

  • StringBuilder has a more complex API than String, so it's worth identifying the pivot point k at which StringBuilder becomes faster than String; that makes it easy to decide whether to use String or StringBuilder.

  • Do not pass the String and StringBuilder as method parameters; instead, create them inside the test functions and, at the end, use toString() to return the result.

 
Arnaud Dagnelies

I guess what this boils down to is memory allocation. With string concatenation, new memory is allocated each time, while with StringBuilder the buffer approximately doubles in size when needed.
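The buffer growth can be observed directly with `StringBuilder.capacity()`. This sketch (hypothetical class name) records each distinct capacity a fresh builder passes through while appending one character at a time. The starting value of 16 is the documented default; the exact later values are a JDK implementation detail, so the point to notice is only how few resizes occur compared with the number of appends.

```java
import java.util.ArrayList;
import java.util.List;

class CapacityGrowth {
    // Returns each distinct capacity seen while appending `appends` characters
    // to a StringBuilder created with the no-argument constructor.
    static List<Integer> capacitySteps(int appends) {
        StringBuilder sb = new StringBuilder(); // documented default capacity: 16
        List<Integer> steps = new ArrayList<>();
        steps.add(sb.capacity());
        for (int i = 0; i < appends; i++) {
            sb.append('a');
            int cap = sb.capacity();
            if (cap != steps.get(steps.size() - 1)) {
                steps.add(cap); // a resize happened on this append
            }
        }
        return steps;
    }

    public static void main(String[] args) {
        // Exact numbers depend on the JDK; expect a handful of entries, not 1000.
        System.out.println(capacitySteps(1000));
    }
}
```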

 
Kaleem

Right: String is immutable, so every concatenation creates a new String instance on the heap, which is what makes repeated String appends slow.
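A minimal illustration of that difference (the class and method names are hypothetical): concatenation leaves the original String untouched and returns a different object, whereas `StringBuilder.append` mutates the builder in place and returns that same builder.

```java
class ImmutabilityDemo {
    // Concatenation never changes s; it returns a new String object.
    static String appendToString(String s, String suffix) {
        return s + suffix;
    }

    // StringBuilder.append changes the builder in place and returns the same builder.
    static StringBuilder appendToBuilder(StringBuilder sb, String suffix) {
        return sb.append(suffix);
    }

    public static void main(String[] args) {
        String s = "abc";
        String t = appendToString(s, "d");
        System.out.println(s);      // abc
        System.out.println(t);      // abcd
        System.out.println(s == t); // false

        StringBuilder sb = new StringBuilder("abc");
        StringBuilder same = appendToBuilder(sb, "d");
        System.out.println(sb == same);    // true
        System.out.println(sb.toString()); // abcd
    }
}
```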