DEV Community

Cover image for Optimise your String Algorithms in Java
Taosif Jamal
Taosif Jamal

Posted on

Optimise your String Algorithms in Java

While practicing DSA String problems, one thing I stumbled upon many times was trying to find alternatives to make my algorithm faster at some scales. Thus I explored different approaches, and during String questions I had to choose between one way or other so I used to check time & space complexities of operations on string. It took me few google searches to find a satisfactory explaination, but for each operation I gotta google.

Hence to relieve you from this pain, and to save your time: I am listing Time & Space complexities of various operations on string with straightforward explainations. Note that this is all with respect to Java SE 7 or v1.7

Instantiation — new String(“a”)

Time Complexity: O(n)

It creates a char array and fills each character of string in array.

Concatenation — “a” + ”b”

Time Complexity: O(mn)

Every time you concat a string a new buffer is created and the contents are copied over.

Strings are immutable in Java.

Example: Lets say you concatenate "abc + "def"
Under the hood, Java is performing these operations:
• Construct copy of "abc" with additional length; 
• Copy "d" to new array
• Construct copy of "abcd"
• Copy "e" to new array
• Construct copy of "abcde"
• Copy "f" to new array
So O(m) called n times, thus time complexity is O(mn)
Enter fullscreen mode Exit fullscreen mode

Space Complexity: O(mn)

Better Approach: StringBuilder

Use StringBuilder with .append() method, It creates internal buffer that expands only on demand. Its time complexity is almost O(m+n).
String.concat is another approach, but slower than StringBuilder.

.length()

Time Complexity: O(1)

Because string holds a counter variable.

.equals()

Time Complexity: O(n)

In most cases, its actually faster than O(n), because it checks for some things before hand like length of string is not equal then instantly return false. Also it linearly compares upto first non equal character only, but in the worst case its O(n)

Space Complexity: O(1)

.charAt()

Time Complexity: O(1)

It is because in Java, Strings are implemented using char array, and random access is in array’s nature. Hence character at index can be accessed randomly in O(1).

.substring()

Time Complexity: O(n)

As mentioned before, strings are immutable in Java. Hence for creating a substring of length n, a new string instance needs to be created. Under the hood, java is actually creating a char array of length n and copying elements from original string to new one.

Space Complexity: O(n)

It’ll need space to store contents of array.

.toCharArray()

Time Complexity: O(n)

It should be O(1) since string is nothing but a char array, right? Yes, but no. toCharArray method copies the contents of string to a new char array, hence O(n).

Space Complexity: O(n)

Obviously, it’ll need space to store contents of array.

.contains() and .indexOf()

Time Complexity: O(mn)

.contains() calls .indexOf() method and returns true if index > -1.
It is quite surprising that java uses a naive method (Loop inside loop) for finding a string in another. There are several algorithms out there that do the same job in O(n) like KMP, but for them there’s overhead space and time cost as well. So engineers at Sun/Oracle must have empirically tested various algorithms and decided that naive method works best on average for all kind of scenarios.

Space Complexity: O(1)

.replace(char, char)

Time Complexity: O(n)

It basically goes through each character in string, and replaces it with new character. Although with some optimisations

Space Complexity: O(n)

Since strings are immutable, new string needs to be created

.replace(regex, replacement) and .replaceAll()

Time Complexity: Oh(its complicated)

Whether your regex string is simple string like “red” or its a complex regular expression pattern, this function under the hood uses Patterns class to match the character and then replace it. Lets try to understand the nature of time complexity here.
Suppose your regex string is he(ll|ter|r)o
It can match hello, hetero, hero from your string.

The Regex Engine will go through the string, and if it encounters he it’ll intstantly match it, immediately after that it’ll try to match ll if found, it’ll next find o and if o is not found, it’ll go back and try to match ter and you get the point.

Well, Time complexity of this function is not in our control, but we can optimise our Regex to perform well at matching. From previous example, we can do (hello | hetero | hero) So it reduces back tracking by regex engine to optimise search function and ultimately replace function. Here’s a good resource if you want to learn more about optimising regex patterns.
Bonus: I found a tool to Visualise Regex

Space Complexity: O(n)

Remember we can’t mutate a string in Java?

.split(separator)

If Separator is single character and not in “.$|()[{^?*+\”:

Time Complexity: O(n)

Space Complexity: O(n)

If Separator string is more than one character, it is compiled into a Regex Pattern by Java for finding index to split, and as discussed in .replace() section, It’s time complexity depends on nature of pattern

Time Complexity: Oh(its complicated)

Space Complexity: O(n)— Because y’kno immutable

.toLowerCase() and .toUpperCase()

Time Complexity: O(n)

Space Complexity: O(n)

Straightforward, each character of string is checked one by one, and new string is stored separately


That’s all Folks. I hope this article will be helpful to you to optimise your string functions. Did I miss any essential function? let me know if any.
Thanks.

Oldest comments (3)

Collapse
 
naucode profile image
Al - Naucode

Thanks, it was a good read, bookmarked, and followed!

Collapse
 
cicirello profile image
Vincent A. Cicirello

Your time and space complexity for concatenation is not correct. For concatenating 2 strings of lengths n and m, it is just O(n + m). Your example of what happens during a concatenation is not correct. Concatenating "abc" + "def" is not implemented as 3 separate concats. That would be terribly inefficient. During a single concat, both lengths are known. A new string of length n+m is allocated and the characters of each copied into it.

Using a StringBuilder to concat 2 strings is likewise O(n + m). The behavior is a bit different though. StringBuilder is implemented as a partially-filled array. The purpose is to speed things up if you are doing many concats. But for doing a single concat of 2 strings of lengths n and m, the time complexity of both the concat operator + as well as using a StringBuilder is the same, although for that simple case the StringBuilder will be slightly slower due to some overhead.

A StringBuilder is only faster if you need to combine many strings. The amortized runtime of concatenating X strings with a StringBuilder is O(sum of the lengths of all the strings). The partially filled array approach doubles the size of the buffer whenever it is full to achieve this amortized runtime.

Using + repeatedly instead would be worse than that because every concat with + requires allocating a new string. The specific runtime depends on number of concats and lengths of strings. For simple example, concatenating n strings of length 1 with n-1 concats will run in O(n^2), whereas calling append on a StringBuilder n times to do the same will be O(n).

But the basic case of 1 concat of 2 strings lengths n and m, both + and StringBuilder are O(n+m) with StringBuilder having higher constant factors that O() hides.

Collapse
 
cicirello profile image
Vincent A. Cicirello • Edited

Your commentary on runtime of contains and indexOf is not entirely correct. Checking if a string of length n contains a string of length m with the naive approach has a worst case runtime of O(m(n-m+1)). Actually finding all matches has that same worst case runtime. You can decrease the likelihood of that worst case substantially with some other algorithms. But you can't actually do any better in worst case. So it isn't that surprising if they are using naive approach. The benefit of the alternatives would be minimal at best for the typical usage of those methods.