DEV Community

Cover image for Check if all characters in a string are unique - Java edition
Perry H
Perry H

Posted on • Edited on

Check if all characters in a string are unique - Java edition

Code challenges can be fun and I like doing them periodically to stay mentally sharp. The problem and solution are not unique; however, when I was learning, I appreciated posts like these. Seeing how others solved the problem helped me understand how to think when approaching problems.

The Problem:

Implement an algorithm to determine if all characters in a string are unique.

Solution 1:

As I started thinking about the problem, I remember reading about using sets to remove duplicates. Thinking along these lines, I came up with the following solution:

public static boolean isUnique(String str) {
    Set<Character> chars = new HashSet<>();
    // Load the Set
    for(int i = 0; i < str.length(); i++){
    // A set will not add a duplicate, in Java the add method
    // will simply return false if a duplicate is found
    chars.add(str.charAt(i));
    }
    // If the size of the Set is equal to the string length,
    // we know that all chars are unique. 
    return chars.size() == str.length();
}
Enter fullscreen mode Exit fullscreen mode

This solution has a time complexity of O(n) where n is the length of the string because we have to loop through the string to load the Set.
The space complexity is O(n) where n is the set size.

Another Attempt:

The above solution felt a little like cheating to me, so I thought I would try an approach you would more likely see in blogs or books.

The below approach assumes the string is extended ASCI. To understand how this problem works, it helps to understand ASCI and its variations.

ASCI characters can be represented in a number of ways, including a 7 bit or 8 bit binary number. Initially, it used only 7 bits in an 8 bit number with the left most bit being 9. Later, it was extended to use all 8 bits. There are tables (like this one) that show the mapping between the binary number, its decimal counterpart, and the character represented. You can see in the link above, the first table's decimals goes to 128, and the extended table goes to 256.

Why is this significant? A string with only distinct characters would not have a length beyond that of the ASCI set. We can leverage this to our advantage by creating a data structure with a size of 256. Using this data structure we will add characters to the structure. If we find that the character already exists in the structure, we know the string does not contain only unique characters. The structure used below is a Boolean array.

Additionally, we need to check that the length is not greater than the ASCI set size (in this case we use 256 for extended ASCI). This is unlikely but is an excellent example of checking for an edge case.

public static boolean isUnique(String str) {
    // string length must be less than ASCI 256
    if (str.length() > 256) {
         return false;
    }

    boolean[] chars = new boolean[256];
    for(int i = 0; i < str.length(); i++){
    // casting to an int to check against our array
    int val = (int) str.charAt(i);
    // if we find this in our array, than it is a duplicate
    if (chars[val]) {
        return false;
    }
    // if the number is not in the array, mark true
    chars[val] = true;
    }
    // no duplicates found
    return true;
}
Enter fullscreen mode Exit fullscreen mode

The time complexity for the above solution is O(n) where n is the length of the string. The space complexity will be O(1) because the array will always be the same size.

Another version:

The above attempt is solid and I'm sure many text books would recommend. However, I like declarative code and wanted to see if writing it in a declarative style would clean it up a bit.
The below solution uses Java Streams.

public static boolean isUnique(String str) {
    return str.chars().distinct().count() == str.length();
}
Enter fullscreen mode Exit fullscreen mode

The chars method creates an IntStream of the character's Unicode codepoint value. We then use the distinct method to create a new stream with only distinct values. Utilizing the count method will allow us to compare stream size and string length. This is similar to the first solution where we compared the set size with the string length.

The time complexity here should be O(n) where n is the string input length. None of the stream operations will grow longer than the string length.

The storage of a stream is backed by the collection of the Stream source. Pipeline streams do not store anything. Therefore the space complexity will be 0(n) where n is the size of the sequence of characters in our string.

There are my attempts at solving this challenge in the best way I can. I hope that this can help someone else improve their problem-solving skills and become better coders. Please let me know if there are any other ways to solve this problem or if there are any ways to improve on my solutions.

Photo by Zac Frith: https://www.pexels.com/photo/red-blue-and-orange-umbrella-lot-918781/

Edit

I was reviewing this code and some other solutions and stumbled upon another solution that does not use an extra data structure.

public static boolean isUnique(String str) {
    char[] chArray = str.toCharArray();

    // Arrays.sort() uses binarySort in the background
    // for non-primitives which is of O(nlogn) time complexity
    Arrays.sort(chArray);

    for (int i = 0; i < chArray.length - 1; i++) {
        // if the adjacent elements are not
        // equal, move to next element
        if (chArray[i] != chArray[i + 1])
            continue;
        // if at any time, 2 adjacent elements
        // become equal, return false
        else
            return false;
    }
    return true;
}
Enter fullscreen mode Exit fullscreen mode

The above solution's time complexity is O(nlogn) due to the sort and the auxiliary space is O(1).

Top comments (0)