Note: a video version of this tutorial, "Python: How simple string concatenation can kill your code performance", is available.
In general, there is no problem with string concatenation
String concatenation is one of the first and most used features in programming. You learn the basics of variables, integers, booleans, and strings. It's even one of the first things taught in the official Python documentation.
So no, string concatenation is not bad in itself; it would even be impossible to program without concatenating strings for one reason or another.
However, like almost any useful tool, when used the wrong way it can lead to really bad effects, so it's important to be aware of how to use it correctly.
When can string concatenation be dangerous?
You should pay attention to string concatenation when these two conditions are met:
- The feature is performance-critical. (Example: a text data analysis feature on a running web application, where a user is waiting in the interface for the result, so your code should be as fast as possible.)
- You're doing a lot of concatenations (roughly 10,000 or more), typically in a loop, and/or the strings you're concatenating are very long.
This can lead to significant overhead in terms of memory allocation and garbage collection, slow down your code, and in the worst case crash it in production 😬.
Conversely, it's totally safe to use string concatenation when these conditions are not met and you're sure they never will be. For instance:
firstname = "John"
lastname = "Joe"
fullname = firstname + lastname
is totally okay, because there is no way someone's name can become long enough to create a performance issue. At least, not in this world 😅.
Why is string concatenation dangerous in those cases?
In Python, strings are immutable objects, which means that every time a string is concatenated with another string, a new string object is created in memory.
To illustrate that, let's run this simple code:
my_str = "Hello"
print(f"my_str: {my_str}")
print(f"id = {id(my_str)}")
my_str += " World"
print(f"\n my_str: {my_str}")
print(f"id = {id(my_str)}")
Note: Python's id() function returns the identity of an object. The identity of an object is an integer which is guaranteed to be unique and constant for this object during its lifetime.
The output will be:
my_str: Hello
id = 140575060484848
my_str: Hello World
id = 140575060532336
As you can see, the id of the string before and after the concatenation is not the same, so Python has created a new string.
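For contrast, a mutable object such as a list keeps the same identity when it is modified in place. A quick sketch (the variable names here are mine, not from the code above):

```python
# Lists are mutable: appending modifies the object in place,
# so no new object is created and the id stays the same.
my_list = ["Hello"]
id_before = id(my_list)

my_list.append("World")
id_after = id(my_list)

print(id_before == id_after)  # True: still the very same list object
```

This is exactly the property that makes the list-based approach below cheap: we can grow the list without allocating a new container each time.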
The solution to this issue, and a comparison with string concatenation
To address this performance issue, you can use alternative techniques, such as collecting the individual string components in a list and then joining them together with the join method.
Python doesn't create a new list each time you append an element, and the join method is optimized to avoid creating a new string object for each string in the list it has to join.
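One way to see why appending is cheap is that lists over-allocate storage, so most appends don't need a new allocation at all. Note that this is a CPython implementation detail, not a language guarantee; the snippet below just makes it visible:

```python
import sys

lst = []
sizes = []
for _ in range(20):
    sizes.append(sys.getsizeof(lst))  # allocated size in bytes, not length
    lst.append('a')

# The size stays flat for several appends in a row, then jumps in steps
# whenever the spare capacity runs out:
print(sizes)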
import timeit

# Concatenate strings
start_time = timeit.default_timer()
s = ''
for i in range(10000):
    s += 'a'
elapsed_time_concat = timeit.default_timer() - start_time
print('Time taken for string concatenation:', elapsed_time_concat)

# Join strings
start_time = timeit.default_timer()
lst = []
for i in range(10000):
    lst.append('a')
s = ''.join(lst)
elapsed_time_join = timeit.default_timer() - start_time
print('Time taken for string join: ', elapsed_time_join)

join_performance = (elapsed_time_concat - elapsed_time_join) / elapsed_time_concat
print('Performance gained with join', join_performance * 100, '%')
In this example, we generate a string of length 10,000 by concatenating the string 'a' 10,000 times, using both plain concatenation and the join method. The output shows that the join method is significantly faster than the concatenation method:
Time taken for string concatenation: 0.0019936899916501716
Time taken for string join: 0.0011655330017674714
Performance gained with join 41.53890491255549 %
As you can see, by using join we reduced the running time by almost 42%, which is wonderful.
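In real code the same pattern applies: accumulate the pieces in a list, then join once at the end. A minimal sketch (build_csv_row is a hypothetical helper I made up for illustration, not something from the benchmark above):

```python
def build_csv_row(values):
    """Build one CSV line by joining the pieces once,
    instead of doing += inside the loop."""
    parts = []
    for value in values:
        parts.append(str(value))
    return ','.join(parts)

print(build_csv_row([1, 2, 3]))  # 1,2,3
```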
Diving deep into the why: math and a Big O explanation
This behavior can also be explained with Big O notation, specifically time complexity.
Let’s focus on the concatenation part of our previous code.
We have a string s that we concatenate M times with a string c of length N.
s = ''
c = 'c'
for i in range(10000):
    s += c
So here N = 1 and M = 10000.
Keep this in mind: the complexity of creating a string of length N is O(N).
Because concatenation creates a new string of length len(s + c), the complexity of the concatenation at each iteration is the length of s at the previous iteration plus the length of the string c.
So the total, overall complexity is 1 + 2 + 3 + 4 + 5 + … + M, which mathematically sums to M(M + 1) / 2.
By neglecting the lower-order terms, we get a time complexity of O(M²).
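You can check this sum directly. The snippet below just verifies the closed form for our value of M (with N = 1, each iteration i copies i characters):

```python
M = 10000

# Total characters copied across all M concatenations (N = 1):
total_copied = sum(range(1, M + 1))

print(total_copied)      # 50005000
print(M * (M + 1) // 2)  # 50005000, the same closed form
```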
Let's now look at the time complexity of the join approach:
lst = []
c = 'c'
for i in range(10000):
    lst.append(c)
s = ''.join(lst)
In this case, we have two steps:
- The loop to create the list:
for i in range(10000):
    lst.append(c)
The complexity here is O(M) (in our case, M = 10000), because Python has to execute lst.append(c) M times.
- The join:
s = ''.join(lst)
Here the complexity is O(T), where T is the total length of the final string after the join; in our case, T = N * M.
So the final time complexity is NM + M = M(N + 1), which we can approximate as O(MN).
That is far better than the O(M²) of plain concatenation.
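To make the gap concrete, we can count the characters copied by each approach for our values of M and N. This is a back-of-the-envelope model that ignores constant factors and any interpreter-level optimizations:

```python
M, N = 10000, 1

concat_work = N * M * (M + 1) // 2  # quadratic in M: characters copied by +=
join_work = N * M + M               # linear in M: appends plus the final join

print(concat_work)               # 50005000
print(join_work)                 # 20000
print(concat_work // join_work)  # 2500: join does ~2500x less work here
```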
That's it for this tutorial. I hope you now understand when and why string concatenation can be dangerous, so you can use join instead when it's needed.
Thanks for reading! Don't forget to check out the video version on our YouTube channel here, and see you in the next one.
Take care.