DEV Community

Cover image for String Compression. Facebook interview question.

String Compression. Facebook interview question.

Akhil on April 25, 2020

Question: Design a system that will compress the string and then decode it. The input will consist only of alphabet characters. Eg: if the string...
Collapse
 
elmuerte profile image
Michiel Hendriks

For plain text a Lempel-Ziv (LZ) approach is better than a Run Length Encoding (RLE).

In normal text you rarely have concecutive characters, but really often repeating segments. This is where LZ shines.

RLE is easier and cheaper to implement, as you don't need to keep a dictionary around. So less memory needed.

Collapse
 
akhilpokle profile image
Akhil

True that ! this question is meant for testing string manipulation skills and not to test compression algorithm skills.

I shall look into LZ approach though ! thanks for the info!

Collapse
 
elasticrash profile image
Stefanos Kouroupis

Even if this is not a true compression, the word compression though implies less characters (or the same) so you could make an assumption that for characters repeated less than 3 times there is no need to compress them. so

  • asdf will stay asdf
  • aasdff will stay aasdff
  • aasssdfff will become aas3df3

and when compressing numbers you could use a token approach

Collapse
 
akhilpokle profile image
Akhil

Yep, we could compare if the "compressed" version is indeed shorter and if it's shorter then use that one.

Collapse
 
akhilpokle profile image
Akhil

Yep, the function can only take string with characters as a input for encoding, with integers it becomes quite complicated but doable by using some special characters like "$" or "#" and it must be guaranteed that the said special character won't be used.

Your method is actually a much smarter way of achieving it! Thanks for the tip :)

Collapse
 
devguyky profile image
devguyky

Good post although your decoder assumes that there will never be more than 9 sequential characters. It would throw an error with an input string of a3b2c11d1

Collapse
 
akhilpokle profile image
Akhil

Thanks for pointing out the bug!

Collapse
 
fejese profile image
fejese

How about decoding a10?

Collapse
 
akhilpokle profile image
Akhil

Nice catch !! just fixed it. Thanks a lot for pointing that out !

Collapse
 
slax0rr profile image
Tomaz Lovrec • Edited

Using this, encoding a string like 'asdf' will produce a string that is twice the size, since 'asdf' will become, 'a1s1d1f1'.

Collapse
 
akhilpokle profile image
Akhil

Yep, it's efficient when a string contains multiple repeated characters but still, it won't guarantee the shortest string. Eg: is the string is Mississippi, compressed will be: m1i1s2i1s2i1p2i1 which is actually longer, even though some of the characters are repeating.

This question serves as a general idea towards why we go for compression, I will write about complex topics like Huffman coding in the future.

Collapse
 
tareq12345 profile image
tareq12345 • Edited

What if i changed the while loop to
let char = chars[i]
If(i < chars.length - 1 && chars[i] == chars[i+1]){
Count++
}
else{
count = 1
res+=char + count
}