DEV Community

Advanced splitting with Symbol.split

Amin (aminnairi) ・4 min read

Introduction

Splitting strings is a trivial operation in JavaScript with the help of the String.prototype.split method.

But when it comes to splitting a string while keeping the delimiter, the String.prototype.split method seems to be of no use.

Or is it? We will see how we can still use the String.prototype.split method by extending its capabilities.

Problem

Let's say we want to split a string given a delimiter.

const string = "/home/user/code/website/Dockerfile/";
const delimiter = "/";
const result = string.split(delimiter);

console.log(result);

// ["", "home", "user", "code", "website", "Dockerfile", ""]

As we can see, this works, and it was pretty easy.

There are some empty strings at the beginning and the end of the result, but nothing crazy.

But what if we wanted to keep the delimiters? Unfortunately, String.prototype.split has no dedicated option for that.

Solution

One solution would be to simply create a function that takes care of splitting a string while still keeping the delimiter in the result.

I'll use the Array.prototype.reduce method for that part.

const splitWithDelimiter = (delimiter, text) => {
  const characters = [...text];

  const toSplitWithDelimiter = (parts, character) => {
    // If we hit the delimiter in the characters
    if (character === delimiter) {
      return [...parts, delimiter];
    }

    // If the last part is the delimiter
    if (parts[parts.length - 1] === delimiter) {
      return [...parts, character];
    }

    // Every part except the last one
    const inits = parts.slice(0, -1);

    // The last part
    const tail = parts[parts.length - 1];

    return [...inits, (tail || "") + character];
  };

  return characters.reduce(toSplitWithDelimiter, []);
}

const string = "/home/user/code/website/Dockerfile/";
const delimiter = "/";
const result = splitWithDelimiter(delimiter, string);

console.log(result);

// ["/", "home", "/", "user", "/", "code", "/", "website", "/", "Dockerfile", "/"]

The inner logic itself is not very important. I'm sure there are numerous ways of achieving this result.

What is important is that our function takes a delimiter and a string as its parameters, and returns an array containing both the delimiters and the parts, in order. There we have it, the solution to our problem.

Not only does it address the problem, but it is also reusable and testable.

Idiomatic solution

But what if I told you that you can achieve a similar result, while still leveraging the String.prototype.split method?

I know, I said that the String.prototype.split is not capable of splitting a string while keeping the delimiter, but that is not entirely true.

In fact, there is a special well-known symbol in JavaScript called Symbol.split that can help us. It acts as a hook for the String.prototype.split method: when the argument we pass has a method stored under this symbol, split hands control over to it, which allows us to completely hijack the method and call our own logic instead.

If you did not understand the last part, it simply means that it will now allow us to do something like this.

const splitWithDelimiter = (delimiter, text) => {
  const characters = [...text];

  const toSplitWithDelimiter = (parts, character) => {
    // If we hit the delimiter in the characters
    if (character === delimiter) {
      return [...parts, delimiter];
    }

    // If the last part is the delimiter
    if (parts[parts.length - 1] === delimiter) {
      return [...parts, character];
    }

    // Every part except the last one
    const inits = parts.slice(0, -1);

    // The last part
    const tail = parts[parts.length - 1];

    return [...inits, (tail || "") + character];
  };

  return characters.reduce(toSplitWithDelimiter, []);
}

const withDelimiter = delimiter => {
  return {
    [Symbol.split](string) {
      return splitWithDelimiter(delimiter, string);
    }
  };
};

const string = "/home/user/code/website/Dockerfile/";
const delimiter = "/";
const result = string.split(withDelimiter(delimiter));

console.log(result);

// ["/", "home", "/", "user", "/", "code", "/", "website", "/", "Dockerfile", "/"]

Notice how we are now calling the String.prototype.split method while still getting the same result.

In this example, we defined a function that returns an object with a method stored under this special symbol. This is because, among all the kinds of arguments it accepts, the String.prototype.split method looks for a Symbol.split method on the separator it receives and, if there is one, calls it.

And that is exactly what we are returning! It will call our method with the string that should be split (followed by the optional limit argument). It is a way of saying: okay, now I'm done, just do whatever you want, I'm not responsible for the output anymore, you are. And we can return anything we like, in this example an array of all the parts along with the delimiters.
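
To make this hand-off concrete, here is a minimal sketch. The `inspector` object and the `calls` array are hypothetical names used only for illustration. It records what String.prototype.split passes to the Symbol.split method, and shows that whatever the method returns becomes the result of the call.

```javascript
// Hypothetical splitter that records what String.prototype.split passes along
const calls = [];

const inspector = {
  [Symbol.split](string, limit) {
    calls.push({ string, limit });

    // Whatever is returned here becomes the result of the split call
    return ["anything", "we", "like"];
  }
};

const inspected = "a/b/c".split(inspector, 2);

console.log(calls);

// [{ string: "a/b/c", limit: 2 }]

console.log(inspected);

// ["anything", "we", "like"]
```

Note that the limit is only forwarded: it is up to our method to honor it or not.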

This, of course, enables any logic to be performed, and now imagination is the only limit when it comes to splitting a string.

Aggressive optimization

I will show you another way, suggested in a comment (see down below) and slightly modified, that cleverly uses the String.prototype.match method.

const splitWithDelimiter = (delimiter, string) => string.match(new RegExp(`(${delimiter}|[^${delimiter}]+)`, "g"));
const withDelimiter = delimiter => ({[Symbol.split]: string => splitWithDelimiter(delimiter, string)});

const string = "/home/user/code/website/Dockerfile/";
const delimiter = "/";
const result = string.split(withDelimiter(delimiter));

console.log(result);

// ["/", "home", "/", "user", "/", "code", "/", "website", "/", "Dockerfile", "/"]

Note that this solution is much faster than what I wrote above (about 95% faster, at least in Google Chrome). It is also terser. The only drawback is that it requires some RegExp knowledge to read, since the delimiter is interpolated into a RegExp pattern that is then passed to the String.prototype.match method.
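
One caveat when building a RegExp from the delimiter: characters such as `.` or `|` have a special meaning in a pattern, and String.prototype.match returns null when nothing matches (for instance on an empty string). A possible hardening is sketched below. The names `escapeRegExp`, `splitWithEscapedDelimiter` and `withEscapedDelimiter` are hypothetical helpers, not part of the language or of the original suggestion, and the sketch assumes a non-empty delimiter.

```javascript
// Hypothetical helper: escape the characters that are special in a RegExp pattern
const escapeRegExp = text => text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");

const splitWithEscapedDelimiter = (delimiter, string) => {
  const escaped = escapeRegExp(delimiter);

  // Match either the delimiter itself, or a run of characters up to the next delimiter
  const pattern = new RegExp(`${escaped}|(?:(?!${escaped})[\\s\\S])+`, "g");

  // match returns null when there is no match, so fall back to an empty array
  return string.match(pattern) || [];
};

const withEscapedDelimiter = delimiter => ({
  [Symbol.split]: string => splitWithEscapedDelimiter(delimiter, string)
});

console.log("ab.cd".split(withEscapedDelimiter(".")));

// ["ab", ".", "cd"]

console.log("".split(withEscapedDelimiter("/")));

// []
```

Without the escaping step, a `.` delimiter would be read as "any character" and the first call would return every character separately.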

Conclusion

We saw what the String.prototype.split method is useful for.

We addressed the problem of splitting a string while still keeping the delimiters in the result, with the help of the Symbol.split symbol.

What comes next? This symbol is one among the many symbols that the language exposes. A symbol with a similar behavior is Symbol.replace, which works with the String.prototype.replace method.
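
As a quick taste, here is a minimal sketch of Symbol.replace following the same pattern. The `anyOf` function is a hypothetical name chosen for this illustration, and the sketch assumes the replacement is a plain string (String.prototype.replace also accepts a function, which is not handled here).

```javascript
// Hypothetical replacer: replaces every occurrence of any of the given characters
const anyOf = characters => ({
  // String.prototype.replace calls this method with the string and the replacement
  [Symbol.replace](string, replacement) {
    return [...string]
      .map(character => characters.includes(character) ? replacement : character)
      .join("");
  }
});

const replaced = "foo,bar;baz".replace(anyOf(",;"), " ");

console.log(replaced);

// "foo bar baz"
```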

I hope that you enjoyed learning new things with me. If you have any questions, don't hesitate to comment down below, and thanks for reading!

Bonus

This bonus serves as a way of validating what you just learned. If you want to be sure you understood how the Symbol.split symbol works, you can try this challenge now!

Write a function oneOf. It takes as its only parameter a string containing all the delimiters that should be used for splitting a string. It must return an object containing a Symbol.split method, and that method must return an array containing all the parts (without the delimiters).

const string = "foo,bar;baz.glurk";
const delimiters = ";.,";

/**
 * Split a string with one of the delimiters.
 *
 * @param {string} delimiters
 * @return {{ [Symbol.split]: (string: string) => string[] }}
 */
const oneOf = delimiters => {};

console.log(string.split(oneOf(delimiters)));

// ["foo", "bar", "baz", "glurk"]

Good luck and have fun!

Discussion

Matt Ellen

This is very interesting. I had no idea you could do overriding like that.

However, I think the easiest way to keep the single character delimiters is to use string.match:

function splitKeepingDelimiter(string, delim)
{
  let result = string.match(new RegExp(`([^${delim}]+|${delim}|[^${delim}]+)`, 'g'));
  return result;
}
Amin Author

Hi Matt and thanks for your answer.

I didn't know myself that we could do that with the String.prototype.match method. Not only is this easier, but it is also faster (at least in Google Chrome) than what I did with the Array.prototype.reduce method.

But I guess we could simplify the RegExp and the function to one line if we wanted to do aggressive optimizations.

const splitWithDelimiter = (delimiter, string) => string.match(new RegExp(`(${delimiter}|[^${delimiter}]+)`, "g"));

This works since the RegExp does not need to check for characters before the slash (in this case, since we are matching a UNIX absolute path).

Adam Coster

You can use String.prototype.split() for this purpose if you use a regex with capturing groups as the delimiter.
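
For illustration, that idea could look like the minimal sketch below (an example written for this purpose, not code posted in the thread; the variable names are hypothetical). When the separator is a RegExp with a capturing group, the captured text is included in the resulting array.

```javascript
const path = "/home/user/code/website/Dockerfile/";

// The parentheses form a capturing group, so each matched "/" is kept in the output
const parts = path.split(/(\/)/);

console.log(parts);

// ["", "/", "home", "/", "user", "/", "code", "/", "website", "/", "Dockerfile", "/", ""]

// Dropping the empty strings gives the same result as in the article
const withoutEmpty = parts.filter(part => part !== "");

console.log(withoutEmpty);

// ["/", "home", "/", "user", "/", "code", "/", "website", "/", "Dockerfile", "/"]
```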

Amin Author

Hi Adam and thanks for your reply.

You are absolutely right, it is possible to use a RegExp to achieve a similar result without having to use Symbol.split.

The article is more focused on what Symbol.split is, and on one of its use cases.

If you ask me, I would be more comfortable using something more explicit (maybe a third-party library full of splitters?) than using a RegExp (directly).

As a matter of fact, while writing this article, I didn't even know there was a RegExp for that, but the community is full of wonderful and clever people, including someone in the comment section who helped me enhance this article with a RegExp-based solution!

RegExps are still very obscure for most of us, and using something explicit and declarative is for sure an added argument for using a splitter and this symbol.