Introduction
Splitting strings is a trivial operation in JavaScript with the help of the String.prototype.split method.
However, when we need to split a string and keep the delimiter, the String.prototype.split method seems to be of no use.
Or is it? We will see how we can still use the String.prototype.split method by extending its capabilities.
Problem
Let's say we want to split a string given a delimiter.
const string = "/home/user/code/website/Dockerfile/";
const delimiter = "/";
const result = string.split(delimiter);
console.log(result);
// ["", "home", "user", "code", "website", "Dockerfile", ""]
As we can see, this works, and it was pretty easy.
There are some empty strings at the beginning and the end of the result, but nothing crazy.
But what if we wanted to keep the delimiters? Unfortunately, there is no option in String.prototype.split to do that.
Solution
One solution would be to simply write a function that takes care of splitting the string while still keeping the delimiter in the result.
I'll use the Array.prototype.reduce method for that part.
const splitWithDelimiter = (delimiter, text) => {
  const characters = [...text];
  const toSplitWithDelimiter = (parts, character) => {
    // If we hit the delimiter in the characters
    if (character === delimiter) {
      return [...parts, delimiter];
    }
    // If the last part is the delimiter
    if (parts[parts.length - 1] === delimiter) {
      return [...parts, character];
    }
    // Every part except the last one
    const inits = parts.slice(0, -1);
    // The last part
    const tail = parts[parts.length - 1];
    return [...inits, (tail || "") + character];
  };
  return characters.reduce(toSplitWithDelimiter, []);
};
const string = "/home/user/code/website/Dockerfile/";
const delimiter = "/";
const result = splitWithDelimiter(delimiter, string);
console.log(result);
// ["/", "home", "/", "user", "/", "code", "/", "website", "/", "Dockerfile", "/"]
The inner logic itself is not very important. I'm sure there are numerous ways of achieving this result.
What is important is that our function takes a delimiter and a string as its parameters, and returns an array containing both the parts and the delimiters, in their original order. There we have it, the solution to our problem.
Not only does it address the problem, but it is also reusable and testable.
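Since the function only depends on its two parameters, checking it takes a few lines. Here is a minimal sketch using console.assert. The test cases are hypothetical ones picked for illustration, and the function is repeated so that the snippet runs on its own.

```javascript
// Same function as above, repeated so that this snippet runs on its own.
const splitWithDelimiter = (delimiter, text) => {
  const toSplitWithDelimiter = (parts, character) => {
    if (character === delimiter) {
      return [...parts, delimiter];
    }
    if (parts[parts.length - 1] === delimiter) {
      return [...parts, character];
    }
    const inits = parts.slice(0, -1);
    const tail = parts[parts.length - 1];
    return [...inits, (tail || "") + character];
  };
  return [...text].reduce(toSplitWithDelimiter, []);
};

// Hypothetical test cases, picked for illustration only.
const cases = [
  { delimiter: "/", text: "", expected: [] },
  { delimiter: "/", text: "abc", expected: ["abc"] },
  { delimiter: "/", text: "a/b", expected: ["a", "/", "b"] },
  { delimiter: "/", text: "//", expected: ["/", "/"] },
];

for (const { delimiter, text, expected } of cases) {
  const actual = splitWithDelimiter(delimiter, text);
  // console.assert only logs a message when the condition is false.
  console.assert(
    JSON.stringify(actual) === JSON.stringify(expected),
    `Unexpected result for ${JSON.stringify(text)}: ${JSON.stringify(actual)}`
  );
}
```

If nothing is logged, every case passed.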
Idiomatic solution
But what if I told you that you can achieve a similar result while still leveraging the String.prototype.split method?
I know, I said that String.prototype.split is not capable of splitting a string while keeping the delimiter, but that is not entirely true.
In fact, there is a special well-known symbol in JavaScript called Symbol.split that can help us. It acts like a hook for the String.prototype.split method: when it is used, it allows us to completely hijack the method and call our own logic instead.
If you did not understand the last part, it simply means that it will now allow us to do something like this.
const splitWithDelimiter = (delimiter, text) => {
  const characters = [...text];
  const toSplitWithDelimiter = (parts, character) => {
    // If we hit the delimiter in the characters
    if (character === delimiter) {
      return [...parts, delimiter];
    }
    // If the last part is the delimiter
    if (parts[parts.length - 1] === delimiter) {
      return [...parts, character];
    }
    // Every part except the last one
    const inits = parts.slice(0, -1);
    // The last part
    const tail = parts[parts.length - 1];
    return [...inits, (tail || "") + character];
  };
  return characters.reduce(toSplitWithDelimiter, []);
};
const withDelimiter = delimiter => {
  return {
    [Symbol.split](string) {
      return splitWithDelimiter(delimiter, string);
    }
  };
};
const string = "/home/user/code/website/Dockerfile/";
const delimiter = "/";
const result = string.split(withDelimiter(delimiter));
console.log(result);
// ["/", "home", "/", "user", "/", "code", "/", "website", "/", "Dockerfile", "/"]
Notice how we are now calling the String.prototype.split method while still getting the same result.
In this example, we defined a function that returns an object containing this special symbol. This is because, when the String.prototype.split method receives an object that has a Symbol.split method as its separator, it calls that method instead of running its own logic.
And that is exactly what we are returning! It will call our method with the string that should be split (followed by the optional limit argument). It is a way of saying "okay, now I'm done, just do whatever you want; I'm not responsible for the output anymore, you are". And we can return anything we like: in this example, an array of all the parts with the delimiters.
This, of course, enables any logic to be performed, and now imagination is the only limit when it comes to splitting a string.
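To illustrate that any logic can live behind Symbol.split, here is a minimal sketch of a splitter that cuts a string into fixed-size chunks. The name chunksOf is made up for this example, and it assumes that size is a positive integer. It also reads a second argument, since String.prototype.split forwards its optional limit to the Symbol.split method.

```javascript
// Hypothetical helper: returns a splitter that cuts a string into chunks of `size` characters.
// Assumes `size` is a positive integer.
const chunksOf = size => ({
  // String.prototype.split calls this method with the string and the optional limit.
  [Symbol.split](string, limit) {
    const chunks = [];
    for (let index = 0; index < string.length; index += size) {
      chunks.push(string.slice(index, index + size));
    }
    // Keep at most `limit` chunks when a limit is provided.
    return limit === undefined ? chunks : chunks.slice(0, limit);
  }
});

console.log("abcdefgh".split(chunksOf(3)));
// ["abc", "def", "gh"]
console.log("abcdefgh".split(chunksOf(3), 2));
// ["abc", "def"]
```

Honoring the limit is a choice made in this sketch: the method is free to ignore it, since the output is entirely under its control.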
Aggressive optimization
I will show you another way, suggested in a comment (see down below) and slightly modified, that cleverly uses the String.prototype.match method.
const splitWithDelimiter = (delimiter, string) => string.match(new RegExp(`(${delimiter}|[^${delimiter}]+)`, "g"));
const withDelimiter = delimiter => ({[Symbol.split]: string => splitWithDelimiter(delimiter, string)});
const string = "/home/user/code/website/Dockerfile/";
const delimiter = "/";
const result = string.split(withDelimiter(delimiter));
console.log(result);
// ["/", "home", "/", "user", "/", "code", "/", "website", "/", "Dockerfile", "/"]
Note that this solution is much faster than what I wrote above (95% faster, at least in Google Chrome). It is also terser. The only drawback is that reading it requires some RegExp knowledge, since it relies on a pattern passed to the String.prototype.match method.
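One caveat with this version: the delimiter is interpolated into the RegExp as-is, so a delimiter that is also a RegExp metacharacter (such as . or ]) would change the meaning of the pattern. Also, String.prototype.match returns null when nothing matches, for instance with an empty string. The sketch below is one possible way to guard against both, assuming a single-character delimiter. The names escapeRegExp and splitWithDelimiterSafe are made up for this example.

```javascript
// Hypothetical helper: escapes the characters that have a special meaning in a RegExp.
const escapeRegExp = text => text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");

// Hypothetical variant of splitWithDelimiter, assuming a single-character delimiter.
const splitWithDelimiterSafe = (delimiter, string) => {
  const escaped = escapeRegExp(delimiter);
  // match returns null when there is no match, so fall back to an empty array.
  return string.match(new RegExp(`${escaped}|[^${escaped}]+`, "g")) || [];
};

console.log(splitWithDelimiterSafe(".", "a.b.c"));
// ["a", ".", "b", ".", "c"]
console.log(splitWithDelimiterSafe("/", ""));
// []
```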
Conclusion
We saw what the String.prototype.split method is useful for.
We addressed the problem of splitting a string while still keeping the delimiters in the result, with the help of the Symbol.split symbol.
What comes next? This symbol is one among the many symbols that the language exposes. We can find a symbol with a similar behavior in Symbol.replace, which works with the String.prototype.replace method.
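As a quick taste, here is a minimal sketch of an object implementing Symbol.replace. The name everyVowel is made up for this example, and it assumes that the replacement is a string. String.prototype.replace hands the string and the replacement over to the method, the same way String.prototype.split does with Symbol.split.

```javascript
// Hypothetical replacer: swaps every vowel of the string for the replacement.
// Assumes the replacement is a string (String.prototype.replace also accepts a function).
const everyVowel = {
  // String.prototype.replace calls this method with the string and the replacement.
  [Symbol.replace](string, replacement) {
    return [...string]
      .map(character => "aeiou".includes(character.toLowerCase()) ? replacement : character)
      .join("");
  }
};

console.log("Hello World".replace(everyVowel, "*"));
// "H*ll* W*rld"
```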
I hope that you enjoyed learning new things with me. If you have any questions, don't hesitate to comment down below, and thanks for reading!
Bonus
This bonus serves as a way of validating what you just learned. If you want to be sure you understood how the Symbol.split symbol works, you can try this challenge now!
Write a function oneOf. It will take as its only parameter a string containing all the delimiters that should be used for splitting a string. It should return an object containing a Symbol.split method, and your task is to make that method return an array containing all the parts (without the delimiters).
const string = "foo,bar;baz.glurk";
const delimiters = ";.,";
/**
 * Split a string with one of the delimiters.
 *
 * @param {string} delimiters
 * @return {{ [Symbol.split]: (string: string) => string[] }}
 */
const oneOf = delimiters => {};
console.log(string.split(oneOf(delimiters)));
// ["foo", "bar", "baz", "glurk"]
Good luck and have fun!
Top comments (4)
This is very interesting. I had no idea you could do overriding like that.
However, I think the easiest way to keep the single character delimiters is to use string.match:
Hi Matt and thanks for your answer.
I didn't know myself that we could do that with the String.prototype.match method. Not only is this easier, but it is also faster (at least in Google Chrome) than what I did with the Array.prototype.reduce method.
But I guess we could simplify the RegExp and the function to one line if we wanted to do aggressive optimizations.
Since it does not need to check for characters before the slash (in this case since we are matching a UNIX absolute path).
You can use String.prototype.split() for this purpose if you use a regex with capturing groups as the delimiter.
Hi Adam and thanks for your reply.
You are absolutely right, it is possible to use a RegExp to achieve a similar result without having to use Symbol.split.
The article is more focused on what Symbol.split is, and one of its use cases.
If you ask me, I would be more comfortable using something more explicit (maybe a third-party library full of splitters?) than using a RegExp (directly).
As a matter of fact, while writing this article, I didn't even know there was a RegExp for that, but the community is full of wonderful and clever people, including someone in the comment section who helped me enhance this article with a RegExp-based solution!
RegExps are still very obscure for most of us, and using something explicit and declarative is for sure an added argument for using a splitter and this symbol.