DEV Community

YCM Jason


String.prototype.replace asynchronously?

Original post: https://www.ycmjason.com/blog/2018/04/28.html

This article assumes basic knowledge of RegExp.

Background

I was working with vuepress last week and realised I wanted to break my very long markdown file into partials. So I raised an issue about it, and the legend, Evan You, suggested using <!-- include ./sectionA.md -->. I took his advice and started digging into the vuepress code.

String.prototype.replace

Before I explain how I solved the problem, I would like to make sure we are all on the same page. My solution is based on the String.prototype.replace function, so let me very briefly explain how it works. It takes two arguments:

  1. What to replace (RegExp | String)
  2. What to replace with (String | Function)

String.prototype.replace(String, String)

const str = 'I am very happy, happy, happy.';
str.replace('happy', 'sad'); // I am very sad, happy, happy.

The above example shows how to replace a word in a string. Notice that only the first occurrence of happy is replaced by sad. This is the same behaviour you get when passing a RegExp without the global flag.
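A quick sketch checking that claim (the variable names are mine): a string pattern and a non-global RegExp both replace only the first occurrence, while adding the g flag replaces every occurrence.

```javascript
const str = 'I am very happy, happy, happy.';

// A string pattern replaces only the first occurrence...
const withString = str.replace('happy', 'sad');
// ...exactly like a RegExp without the global flag.
const withRegExp = str.replace(/happy/, 'sad');
// Adding the g flag replaces every occurrence.
const withGlobal = str.replace(/happy/g, 'sad');

console.log(withString); // I am very sad, happy, happy.
console.log(withRegExp); // I am very sad, happy, happy.
console.log(withGlobal); // I am very sad, sad, sad.
```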

String.prototype.replace(String, Function)

const str = 'I am very happy, happy, happy.';
str.replace('happy', word => 'not ' + word);
// ^ I am very not happy, happy, happy.

You can retrieve the matched word by passing in a replacer function. The value returned from the replacer function is used to replace the match.

This use case is rare and probably not very useful, as you already know the target word. You could simply do str.replace('happy', 'not happy') to get the same effect.

String.prototype.replace(RegExp, String)

const str = 'I am very happyyyyy, happy, happy.';
str.replace(/happ(y+)/, 'sleep$1'); // I am very sleepyyyyy, happy, happy.
str.replace(/happ(y+)/g, 'sleep$1'); // I am very sleepyyyyy, sleepy, sleepy.

This should be fairly straightforward. Two things to note:

  1. /happ(y+)/ matches "happ" followed by one or more "y"s.
  2. $1 is replaced by whatever the first group () of the RegExp matched. You can have more than one group and use $2, $3, $4, and so on as their placeholders.
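For instance, with two groups you can reorder the captured parts. This is a small illustration with made-up input strings, not taken from the original post; it also uses $&, another standard replacement pattern that inserts the whole match.

```javascript
const fullName = 'John Smith';
// $1 refers to the first group (\w+), $2 to the second group (\w+).
const swapped = fullName.replace(/(\w+)\s(\w+)/, '$2, $1');
console.log(swapped); // Smith, John

// $& inserts the entire matched substring.
const emphasised = 'I am very happy.'.replace(/happ(y+)/, '*$&*');
console.log(emphasised); // I am very *happy*.
```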

String.prototype.replace(RegExp, Function)

const str = 'I am very happyyyyy, happy, happyy.';

str.replace(/happ(y+)/, (match, ys) => {
    // match: 'happyyyyy'; ys: 'yyyyy'
    return 'sleep' + ys;
}); // I am very sleepyyyyy, happy, happyy.

str.replace(/happ(y+)/g, (match, ys) => {
    // This function is called 3 times:
    //     1. match: 'happyyyyy'; ys: 'yyyyy'
    //     2. match: 'happy'; ys: 'y'
    //     3. match: 'happyy'; ys: 'yy'
    return 'sleep' + ys;
}); // I am very sleepyyyyy, sleepy, sleepyy.

The comments should be quite self-explanatory.
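The replacer also receives more than the match and its groups: after the groups comes the offset at which the match starts, followed by the whole original string. A small sketch that records the offsets (the calls array exists only for illustration):

```javascript
const str = 'I am very happyyyyy, happy, happyy.';
const calls = [];

// A fourth argument (the whole original string) is also passed, but unused here.
const result = str.replace(/happ(y+)/g, (match, ys, offset) => {
    // offset: the index in str where this match starts
    calls.push([match, ys, offset]);
    return 'sleep' + ys;
});

console.log(result); // I am very sleepyyyyy, sleepy, sleepyy.
// calls: [['happyyyyy', 'yyyyy', 10], ['happy', 'y', 21], ['happyy', 'yy', 28]]
```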

The synchronous way

Back to our problem: replacing <!-- include ./sectionA.md --> with the content of ./sectionA.md.

Any decent regex-er could come up with a regex to match that placeholder; we came up with something like:

const placeholderRe = /<!--\s*include\s+([^\s]+)\s*-->/g

Note: \s matches any whitespace character, such as a space, tab or newline.

This RegExp matches the whole placeholder and captures the filename after include in a group.
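To check what the pattern captures, here is a quick sketch with a few hypothetical placeholder lines. Thanks to \s*, the pattern also tolerates missing or extra whitespace inside the comment.

```javascript
const placeholderRe = /<!--\s*include\s+([^\s]+)\s*-->/g;
const text = [
    '# Title',
    '<!-- include ./sectionA.md -->',
    '<!--include   ./sectionB.md-->',
].join('\n');

// matchAll yields every match; m[1] is the captured filename.
const filenames = [...text.matchAll(placeholderRe)].map(m => m[1]);
console.log(filenames); // [ './sectionA.md', './sectionB.md' ]
```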

So I used String.prototype.replace to do the job:

const { readFileSync, existsSync } = require('fs');

const replaceIncludePlaceholdersWithFileContents = str => {
    const placeholderRe = /<!--\s*include\s+([^\s]+)\s*-->/g;
    return str.replace(placeholderRe, (placeholder, filename) => {
        if (!existsSync(filename)) return placeholder;
        return readFileSync(filename, 'utf8');
    });
};

This works, but we still need to handle one more case: the partial being included may itself contain <!-- include file.md -->. This makes it a recursive problem, and the way to deal with it is simply to take the leap of faith.

Simply applying replaceIncludePlaceholdersWithFileContents recursively to the content of each file included by the current file does the job!

So we have something like:

const { readFileSync, existsSync } = require('fs');

const replaceIncludePlaceholdersWithFileContents = str => {
    const placeholderRe = /<!--\s*include\s+([^\s]+)\s*-->/g;
    return str.replace(placeholderRe, (placeholder, filename) => {
        if (!existsSync(filename)) return placeholder;
        return replaceIncludePlaceholdersWithFileContents(
            readFileSync(filename, 'utf8')
        );
    });
};

This time, the base case is an included file that contains no placeholder: the replacer function is never called, so the recursion terminates.
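To see the recursion end to end, here is a self-contained sketch that writes two hypothetical partials into a temporary directory (outer.md includes inner.md) and expands them. It puts absolute paths in the placeholders because, as in the code above, filenames are resolved relative to the current working directory rather than the including file. Like the original, it would recurse forever if two files included each other.

```javascript
const { readFileSync, existsSync, mkdtempSync, writeFileSync } = require('fs');
const { join } = require('path');
const { tmpdir } = require('os');

const replaceIncludePlaceholdersWithFileContents = str => {
    const placeholderRe = /<!--\s*include\s+([^\s]+)\s*-->/g;
    return str.replace(placeholderRe, (placeholder, filename) => {
        // Leave placeholders for missing files untouched.
        if (!existsSync(filename)) return placeholder;
        // Recurse: the included file may contain placeholders of its own.
        return replaceIncludePlaceholdersWithFileContents(
            readFileSync(filename, 'utf8')
        );
    });
};

// Hypothetical partials, written to a fresh temporary directory.
const dir = mkdtempSync(join(tmpdir(), 'include-demo-'));
const inner = join(dir, 'inner.md');
const outer = join(dir, 'outer.md');
const missing = join(dir, 'missing.md'); // never created
writeFileSync(inner, 'inner content');
writeFileSync(outer, `outer start\n<!-- include ${inner} -->\nouter end`);

const expanded = replaceIncludePlaceholdersWithFileContents(
    `# Doc\n<!-- include ${outer} -->\n<!-- include ${missing} -->`
);
console.log(expanded);
```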

The asynchronous way

So I submitted a pull request, and the feedback suggested using fs.readFile, the asynchronous version of fs.readFileSync.

I immediately realised that if I had a function asyncStringReplace(str, search, replacer) that did what String.prototype.replace does, but allowed replacer to return a Promise, I could simply change my code to the following and it would work.

const { readFile, existsSync } = require('fs-extra');

const replaceIncludePlaceholdersWithFileContents = async str => {
    const placeholderRe = /<!--\s*include\s+([^\s]+)\s*-->/g;
    return await asyncStringReplace(str, placeholderRe, async (placeholder, filename) => {
        if (!existsSync(filename)) return placeholder;
        return await replaceIncludePlaceholdersWithFileContents(
            await readFile(filename, 'utf8')
        );
    });
};

Having spent so much time thinking about how to replace the placeholders, I wanted to keep the existing logic as much as possible.

So all I needed to write was the asyncStringReplace function.

asyncStringReplace

The asyncStringReplace function takes three arguments:

  1. str - the original string
  2. regex - a global RegExp (with the g flag) matching the substrings of str to be replaced
  3. aReplacer - an asynchronous function that is called with each match and returns a Promise of the replacement string

I basically copied the while loop from MDN that iterates over the matches using RegExp.prototype.exec. With RegExp.prototype.exec we can track regex.lastIndex and match.index for each match, which I couldn't think of a way to do with String.prototype.match.
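Here is roughly what that loop sees, as a small sketch on the example string used later in this post. It records match.index and regex.lastIndex for each match (the records array is only for illustration).

```javascript
const regex = /happ(y+)/g; // the g flag is what makes exec advance lastIndex
const str = 'i am happyy, happy === happyyy very!';
const records = [];

let match;
while ((match = regex.exec(str)) !== null) {
    // match.index: where the match starts; regex.lastIndex: just past its end
    records.push([match[0], match.index, regex.lastIndex]);
}

// records: [['happyy', 5, 11], ['happy', 13, 18], ['happyyy', 23, 30]]
console.log(records);
```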

const asyncStringReplace = async (str, regex, aReplacer) => {
    const substrs = [];
    let match;
    let i = 0;
    while ((match = regex.exec(str)) !== null) {
        // push the non-matching text before this match
        substrs.push(str.slice(i, match.index));
        // call the async replacer with the match array spread as arguments
        substrs.push(aReplacer(...match));
        i = regex.lastIndex;
    }
    // push the rest of str after the last match
    substrs.push(str.slice(i));
    // wait for aReplacer calls to finish and join them back into string
    return (await Promise.all(substrs)).join('');
};

My approach splits str into substrings around each match of regex and pushes them into substrs.

substrs therefore contains:

[
    /* first loop in while */
    NON_MATCHING_STRING,
    aReplacer(MATCHING_STRING),

    /* second loop in while */  
    NON_MATCHING_STRING,
    aReplacer(MATCHING_STRING),

    /* ... */,

    /* n-th loop in while */  
    NON_MATCHING_STRING,
    aReplacer(MATCHING_STRING),

    /* substrs.push(restStr) */
    REST_NON_MATCHING_STRING
]

For example, if we call the following:

asyncStringReplace('i am happyy, happy === happyyy very!', /happ(y+)/g, someAsyncReplacer);

The corresponding substrs would be:

[
    /* first loop in while */
    'i am ',
    someAsyncReplacer('happyy', 'yy'),

    /* second loop in while */
    ', ',
    someAsyncReplacer('happy', 'y'),

    /* third loop in while */
    ' === ',
    someAsyncReplacer('happyyy', 'yyy'),

    /* substrs.push(restStr) */
    ' very!'
]
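To see this run end to end, and to give a concrete idea of what an aReplacer can look like, here is a sketch that reuses asyncStringReplace as defined above with the same call. someAsyncReplacer is hypothetical: it waits 10 ms (standing in for real I/O such as reading a file) and then resolves to 'sleep' followed by the captured ys.

```javascript
const asyncStringReplace = async (str, regex, aReplacer) => {
    const substrs = [];
    let match;
    let i = 0;
    while ((match = regex.exec(str)) !== null) {
        substrs.push(str.slice(i, match.index)); // non-matching text
        substrs.push(aReplacer(...match));       // pending replacement
        i = regex.lastIndex;
    }
    substrs.push(str.slice(i)); // rest of str
    return (await Promise.all(substrs)).join('');
};

// Hypothetical async replacer: pretends to do I/O, then returns a string.
const someAsyncReplacer = (match, ys) =>
    new Promise(resolve => setTimeout(() => resolve('sleep' + ys), 10));

asyncStringReplace('i am happyy, happy === happyyy very!', /happ(y+)/g, someAsyncReplacer)
    .then(console.log); // i am sleepyy, sleepy === sleepyyy very!
```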

Since aReplacer is an asynchronous function, each aReplacer(MATCHING_STRING) call returns a Promise. Promise.all constructs a Promise that resolves once every Promise in this list has resolved, treating the plain strings in the list as already-resolved values.
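A tiny sketch of that behaviour with made-up values: plain strings and Promises can share one array, and the results keep the array's order even when a later Promise settles first. The later helper is hypothetical.

```javascript
// Hypothetical helper: resolves with `value` after `ms` milliseconds.
const later = (ms, value) => new Promise(resolve => setTimeout(resolve, ms, value));

// Plain strings mixed with Promises; the second Promise settles first.
const parts = ['i am ', later(30, 'sleepyy'), ', ', later(10, 'sleepy'), '!'];

// Results come back in array order, not in the order they settled.
const joined = Promise.all(parts).then(resolved => resolved.join(''));

joined.then(console.log); // i am sleepyy, sleepy!
```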

The last line

    return (await Promise.all(substrs)).join('')

await Promise.all(substrs) yields an array of strings, and .join('') joins them all back together.
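One caveat worth knowing: the exec loop relies on regex having the g flag. Without it, exec always searches from the start of the string and never updates lastIndex, so the loop finds the same first match forever. A global regex that matches an empty string (for example /x*/g on 'abc') also gets stuck, because lastIndex does not move past a zero-length match. On runtimes with String.prototype.matchAll (ES2020, Node.js 12+), a sketch of the same idea can avoid both problems: matchAll throws a TypeError for a non-global RegExp and steps past empty matches by itself. The name asyncStringReplaceAll is made up for this example, and like the original it passes only the match and its groups to aReplacer, not the offset and whole string that String.prototype.replace also supplies.

```javascript
const asyncStringReplaceAll = async (str, regex, aReplacer) => {
    const substrs = [];
    let i = 0;
    // matchAll throws a TypeError if regex lacks the g flag,
    // and advances past zero-length matches on its own.
    for (const match of str.matchAll(regex)) {
        // non-matching text before this match
        substrs.push(str.slice(i, match.index));
        // start the async replacer; its Promise is awaited below
        substrs.push(aReplacer(...match));
        i = match.index + match[0].length;
    }
    // the rest of str after the last match
    substrs.push(str.slice(i));
    return (await Promise.all(substrs)).join('');
};

// Usage with a made-up async replacer
asyncStringReplaceAll('a1b22c', /\d+/g, async digits => `[${digits}]`)
    .then(console.log); // a[1]b[22]c
```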

An example of how this could be applied:

const { readFile, existsSync } = require('fs-extra');

const replaceIncludePlaceholdersWithFileContents = async str => {
    const placeholderRe = /<!--\s*include\s+([^\s]+)\s*-->/g;
    return await asyncStringReplace(str, placeholderRe, async (placeholder, filename) => {
        if (!existsSync(filename)) return placeholder;
        return await replaceIncludePlaceholdersWithFileContents(
            await readFile(filename, 'utf8')
        );
    });
};
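If you would rather not depend on fs-extra, here is a self-contained sketch of the same include logic using Node's built-in fs.promises API for readFile and fs.existsSync for the check. The temporary files are hypothetical and, as before, their absolute paths are used in the placeholders.

```javascript
const { existsSync, mkdtempSync, writeFileSync, promises: { readFile } } = require('fs');
const { join } = require('path');
const { tmpdir } = require('os');

// asyncStringReplace as defined earlier in this post
const asyncStringReplace = async (str, regex, aReplacer) => {
    const substrs = [];
    let match;
    let i = 0;
    while ((match = regex.exec(str)) !== null) {
        substrs.push(str.slice(i, match.index));
        substrs.push(aReplacer(...match));
        i = regex.lastIndex;
    }
    substrs.push(str.slice(i));
    return (await Promise.all(substrs)).join('');
};

const replaceIncludePlaceholdersWithFileContents = async str => {
    const placeholderRe = /<!--\s*include\s+([^\s]+)\s*-->/g;
    return asyncStringReplace(str, placeholderRe, async (placeholder, filename) => {
        if (!existsSync(filename)) return placeholder;
        // Read the partial, then expand any placeholders it contains.
        return replaceIncludePlaceholdersWithFileContents(
            await readFile(filename, 'utf8')
        );
    });
};

// Hypothetical partials in a fresh temporary directory.
const dir = mkdtempSync(join(tmpdir(), 'async-include-demo-'));
const inner = join(dir, 'inner.md');
const outer = join(dir, 'outer.md');
writeFileSync(inner, 'inner content');
writeFileSync(outer, `outer start\n<!-- include ${inner} -->\nouter end`);

const expandedPromise = replaceIncludePlaceholdersWithFileContents(`# Doc\n<!-- include ${outer} -->`);
expandedPromise.then(console.log);
```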

Top comments (6)

Jack Giffin

Your asynchronous String.prototype.replace function is very nice, but I can see three major ways to massively improve its performance.

  1. It calls await unnecessarily. What it should do is be maybe-asynchronous where it first checks to see if any promises were returned. (See stackoverflow.com/questions/522217...)

  2. The V8 engine has recently upgraded its string concatenation operator, making prepackaging strings into lists slower than simple string appending.

  3. It should use callbacks where possible because as pretty as await/promise/async looks, the W3C really botched their performance. Permanently. Because, await/promise/async are all required to wait until the next tick before executing. What the W3C should have done is add in a delay keyword that can be added onto await and async for delay await and delay async, then an extra delay argument on the Promise and this delay would tell the browser whether or not it really needs to wait until the next tick.

Putting the top three together, let us witness my version. Notice how I do not depend on the promise's ability to be deferred to the next tick. This is intentional. If we could all program like this, then the W3C could change its errant standard toward not requiring browsers to delay the process to the next tick.

For more details on why I am making such a huge fuss over having to be delayed to the next tick, please see stackoverflow.com/questions/854777...

const asyncStringReplace = (str, regex, aReplacer, selfArg) => {
    const substrs = [];
    let accstr = "";
    let resetIndex = 0;
    let pendingPromises = 0;
    let accept = null;
    for(let match=regex.exec(str); match!==null; match=regex.exec(str)){
        if (resetIndex === regex.lastIndex) {
            regex.lastIndex += 1;
            continue;
        }
        // put non matching string
        accstr += str.slice(resetIndex, match.index);
        // call the async replacer function with the matched array
        const retValue = aReplacer.apply(selfArg, match);
        if (retValue instanceof Promise) {
            const index = substrs.push(accstr, "") - 1;
            accstr = "";
            pendingPromises += 1;
            const thenAfter = returnValue => {
                substrs[index] = returnValue += "";
                pendingPromises -= 1;
                if (pendingPromises === 0 && accept !== null)
                    accept(substrs.join(""));
            };
            retValue.then(thenAfter, thenAfter);
        } else {
            accstr += retValue;
        }
        resetIndex = regex.lastIndex;
    }
    accstr += str.substring(resetIndex);
    // wait for aReplacer calls to finish and join them back into string
    if (pendingPromises === 0) {
        return accstr;
    } else {
        // put the rest of str
        substrs.push( accstr );
        accstr = "";
        return new Promise(function(acceptFunc){
            accept = acceptFunc;
            if (pendingPromises === 0) accept(substrs.join(""));
        });
    }
};

YCM Jason

Thanks a lot for sharing your inspiring ideas. 🍻 I will keep those in mind in case I encounter any performance issues later. But for now, I will keep my version for readability and elegance. 😉

Craig Pinto

Hi!

Could you give a quick example of the aReplacer function? I can't seem to figure it out

Thanks

YCM Jason

It's a function that returns a promise of a string. It can be any asynchronous operation. What kind of example would you like? I can make something up if you wish.

Ladifire

// put non matching string
substrs.push(str.slice(i, match.index));

what's "i" ??????????

YCM Jason

I have edited accordingly. Not sure why this obvious error was never spotted. Thank you!