DEV Community

Cover image for String Compression. Facebook interview question.
Akhil
Akhil

Posted on • Updated on

String Compression. Facebook interview question.

Question: Design a system that will compress the string and then decode it. The input will consist only of alphabet characters.

Eg: if the string is: aaabbcccccd
1> it's compressed form ie encoded form will be a3b3c5d1.
2> then we have to take the compressed string and decode it.

note: this is a string manipulation question and is meant to test your string manipulation skills, don't confuse it with compression algorithms since there are much better compression algorithms.

Imagine a system similar to this :

String "Richard is great but you know" is compressed to RIGBY which both the client and server know what that string represents.

One can use it for their projects:
Alt Text

Enconding the string / compression

The algorithm is pretty straight forward, keep a pointer and move ahead. Count the number of times a character repeats and append the character and it's count to a result string.

var encode = function(string){
  let chars = string.split("");
  let res = "";
  for(let i=0;i<chars.length;i++){
      let count = 1;              //default count = 1 since at least 1 character
      let char = chars[i];        //the character at that pointer
      while(i<chars.length-1 && chars[i] == chars[i+1]){
        count++;                  // increment the count
        i++;                      // increment the pointer
      }
      res += char + count;        // append to resultant string.
  }
  return res;
}
Enter fullscreen mode Exit fullscreen mode

Decoding the string

Decoding the string is also similar, we keep a pointer, get the character, get its count, generate character and append it to the resultant string.

var decode = function(string){
  let res = "";
  for(let i=0;i<string.length;){
    let char = string[i];
    let count = 0;
    i++;
    while(i<string.length && parseInt(string[i])>=0 && parseInt(string[i])<=9){
      count = parseInt(count * 10) + parseInt(string[i]);
      i++;
    }
    while(count>0){
      res+= char;
      count--;
    }
  }
  return res;
} 
Enter fullscreen mode Exit fullscreen mode

Putting them together:

var encode = function(string){
  let chars = string.split("");
  let res = "";
  for(let i=0;i<chars.length;i++){
      let count = 1;
      let char = chars[i];
      while(i<chars.length-1 && chars[i] == chars[i+1]){
        count++;
        i++;
      }
      res += char + count;
  }
  return res;
}

var decode = function(string){
  let res = "";
  for(let i=0;i<string.length;){
    let char = string[i];
    let count = 0;
    i++;
    while(i<string.length && parseInt(string[i])>=1 && parseInt(string[i])<=9){
      count = parseInt(count * 10) + parseInt(string[i]);
      i++;
    }
    while(count>0){
      res+= char;
      count--;
    }
  }
  return res;
} 

let compress = encode("aaabbbccccccccccccd");
console.log(compress);
let res = decode(compress);
console.log(res);

Enter fullscreen mode Exit fullscreen mode

That's it, now you know how to solve a Facebook interview question and a bit of introduction towards compression.

github : https://github.com/AKHILP96/Data-Structures-and-Algorithms/blob/master/problems/stringCompression.js

Special thanks to the community for pointing out bugs and edge cases !! 🥰

Discussion (14)

Collapse
elmuerte profile image
Michiel Hendriks

For plain text a Lempel-Ziv (LZ) approach is better than a Run Length Encoding (RLE).

In normal text you rarely have concecutive characters, but really often repeating segments. This is where LZ shines.

RLE is easier and cheaper to implement, as you don't need to keep a dictionary around. So less memory needed.

Collapse
akhilpokle profile image
Akhil Author

True that ! this question is meant for testing string manipulation skills and not to test compression algorithm skills.

I shall look into LZ approach though ! thanks for the info!

Collapse
hanpari profile image
Pavel Morava • Edited on

The assumptions mentioned on the start are certainly incomplete.

string = "12555...aaaa"

Decoding this will break the encoding function.

Putting that aside, I would suggest rethinking your decode function as it is too complex and inefficient. This is my quick code with one iteration through the string; it builds the result from the final array.


let decode = (line) => {
  let xs = [];
  let last = undefined;
  for (const char of line) {
    if (last === char) {
      // I could not believe my eyes seeing this works!!!
      xs[xs.length - 1]++;
    } else {
      xs.push(char);
      xs.push(1);
      last = char;
    }
  }
  return xs.join('');
};

console.log(decode('aaabbcdddddddddddddddddddaa'));
//prints a3b2c1d19a2


Under normal circumstances and in a normal programming language, it should be the most effective approach. With Javascript I am not sure of anything. :)

Collapse
akhilpokle profile image
Akhil Author

Yep, the function can only take string with characters as a input for encoding, with integers it becomes quite complicated but doable by using some special characters like "$" or "#" and it must be guaranteed that the said special character won't be used.

Your method is actually a much smarter way of achieving it! Thanks for the tip :)

Collapse
hanpari profile image
Pavel Morava • Edited on

Thank you.

I have mixed feelings when being complimented over this code since I've abused loose type coercion of Javascript when joining characters and numbers, not to mention the nasty xs[xs.Length-1]++ thing; both should be used as an example of how dangerous the language is.

As for your first paragraph, there is no need for any guarantees. In such cases, some escape sequence is typically used. It is similar to "one line \n second line" vs "one line \\n no second line". You just have to double your special character.

Collapse
elasticrash profile image
Stefanos Kouroupis

Even if this is not a true compression, the word compression though implies less characters (or the same) so you could make an assumption that for characters repeated less than 3 times there is no need to compress them. so

  • asdf will stay asdf
  • aasdff will stay aasdff
  • aasssdfff will become aas3df3

and when compressing numbers you could use a token approach

Collapse
akhilpokle profile image
Akhil Author

Yep, we could compare if the "compressed" version is indeed shorter and if it's shorter then use that one.

Collapse
devguyky profile image
devguyky

Good post although your decoder assumes that there will never be more than 9 sequential characters. It would throw an error with an input string of a3b2c11d1

Collapse
akhilpokle profile image
Akhil Author

Thanks for pointing out the bug!

Collapse
slax0rr profile image
Tomaz Lovrec • Edited on

Using this, encoding a string like 'asdf' will produce a string that is twice the size, since 'asdf' will become, 'a1s1d1f1'.

Collapse
akhilpokle profile image
Akhil Author

Yep, it's efficient when a string contains multiple repeated characters but still, it won't guarantee the shortest string. Eg: is the string is Mississippi, compressed will be: m1i1s2i1s2i1p2i1 which is actually longer, even though some of the characters are repeating.

This question serves as a general idea towards why we go for compression, I will write about complex topics like Huffman coding in the future.

Collapse
fejese profile image
fejese

How about decoding a10?

Collapse
akhilpokle profile image
Akhil Author

Nice catch !! just fixed it. Thanks a lot for pointing that out !

Collapse
tareq12345 profile image
tareq12345 • Edited on

What if i changed the while loop to
let char = chars[i]
If(i < chars.length - 1 && chars[i] == chars[i+1]){
Count++
}
else{
count = 1
res+=char + count
}