DEV Community

loading...

String Compression

codewithml profile image CodeWithML Updated on ・2 min read

Problem statement : Write a function to Compress a given string. Only return the compressed string if it saves space.

Difficulty level: Easy


Test Cases

  • 'AAABAACCDDDD' ---> 'A3BA2C2D4'
  • 'BAAACCDDDD' ---> 'BA3C2D4'
  • 'AABBCC' --> 'AABBCC'
  • 'BBBAACCCCDD' --> 'B3A2C4D2'

Algorithm

  • Setup a pointer at first character of string, initialise an empty string and a counter.
  • Iterate through the string and check if character is equal to the first character of the string.
    • if both are equal increment counter.
  • Else,

    • add the first character and it's count to result string.
      • if count is greater than 1 then only add count to the result string.
      • Else, add an empty string to the result string.
    • now, the first character will be the current character.
    • reset the counter value to 1.
  • Add the last character and it's count to result string.

  • Return the resultant compressed string if it's length is less than the given string, else return the given string.

Time and Space Complexity

  • Time complexity: O(n)
  • Space complexity: O(n)

Code

class CompressString(object):
    def compress(self, input):
        if input is None or not input:
            return input
        result = ''
        count = 0
        prev_char = input[0]
        for char in input:
            if char == prev_char:
                count += 1
            else:
                result += prev_char + (str(count) if count > 1 else '')
                prev_char = char
                count = 1
        result += prev_char + (str(count) if count > 1 else '')
        return result if len(result) < len(input) else input

More precise compression

class CompressString(object):
    def compress(self, input):
        if input is None or not input:
            return input
        result = ''
        count = 0
        prev_char = input[0]
        for char in input:
            if char == prev_char:
                count += 1
            else:
               if count > 2:
                    result += prev_char + (str(count))
                    prev_char = char
                    count = 1
               elif count == 1:
                    result += prev_char
                    count = 1
                    prev_char = char
               else:
                    result += prev_char
                    result += prev_char
                    count = 1
                    prev_char = char
        result += prev_char + (str(count) if count > 2 else prev_char)
        return result if len(result) < len(input) else input

Test

import unittest
from compressString import CompressString


class TestCompress(unittest.TestCase):

    def testCompress(self, func):
        self.assertEqual(func(None), None)
        self.assertEqual(func(''), '')
        self.assertEqual(func('AABBCC'), 'AABBCC')
        self.assertEqual(func('AAABCCDDDDE'), 'A3BC2D4E')
        self.assertEqual(func('BAAACCDDDD'), 'BA3C2D4')
        self.assertEqual(func('AAABAACCDDDD'), 'A3BA2C2D4')
        print('Success: testCompress')


def main():
    test = TestCompress()
    compress_string = CompressString()
    test.testCompress(compress_string.compress)


if __name__ == '__main__':
    main()

Original article

Github solution

Happy Coding !

Discussion (5)

pic
Editor guide
Collapse
hanpari profile image
Pavel Morava

Here a little trick you may find useful

from itertools import groupby
"".join(f"{i}{len(list(j))}" for i, j in groupby("AAABBAAA"))
>>> 'A3B2A3'
Collapse
codewithml profile image
CodeWithML Author

I didn't know that, looks really neat. Thanks for the tip. 🌟

Collapse
hanpari profile image
Pavel Morava

You are welcome. Another little tip. I would get rid of the class completely being you.
Python can have independent functions, I would suggest to use them :)

Collapse
grodzickir profile image
Ryszard Grodzicki

Simple improvement proposal: you could remove the last check by replacing chars only if there are more than 2 in a row.
Replacing 'CC' with 'C2' never saves space. Replacing more than 2 characters always saves space. So you would either return the same input string or return compressed string that's shorter than original.

Collapse
codewithml profile image
CodeWithML Author

I could do something like if count is more than 2 than add the character with count, if count is 1, than only add the character , and lastly if count is exactly 2, then add the character 2 times, so it would remain same as original string.