Vincent Thibault

Speed up your internationalization calls by 5 to 1000 times

Context

It all started two years ago. I was working on a new PWA, written from scratch, for a big social network, and it needed an i18n module to handle different languages. The module had to:

  • handle interpolation.
  • handle PLURAL and SELECT expressions.
  • be lightweight (it’s a PWA, must run with limited bandwidth).
  • run fast (some users had low-end devices).

And that's where things got tricky: the only library that fit was Google Closure MessageFormat. It was not fast enough on low-end devices and it weighed heavily on our bundle. So I decided to write my own with performance in mind.

Fast forward to today: the problem is still the same with i18n libraries, so I open-sourced 💋Frenchkiss.js, a 1kB i18n library that is 5 to 1000 times faster than others.
Stay with me for a journey through performance optimizations.

👉 Time to speed up your webapp for mobile devices!


🤷 How do i18n modules work?

Under the hood, it's not pretty: some i18n modules re-process the translation on each and every call, resulting in poor performance.

Here is an example of what can happen inside the translate function (a really simplified/naive version of Polyglot.js).

const applyParams = (text, params = {}) => {
  // Apply plural if exists
  const list = text.split('||||');
  const pluralIndex = getPluralIndex(params.count);
  const output = list[pluralIndex] || list[0];

  // Replace interpolation
  return output.replace(/%\{\s*(\w+)\s*\}/g, ($0, $1) =>  params[$1] || '');
}

applyParams('Hello %{name} !', {
  name: 'John'
});
// => Hello John !

In short, on each translation call we split the text, calculate the plural index, create a RegExp, replace every occurrence with the given parameter if it exists, and return the result.

It's not that big of a deal, but are you fine with doing it multiple times on each render/filter/directive call?

👉 It's one of the first things we learn when building apps in React, Angular, Vue.js or any other framework: avoid intensive operations inside render methods, filters and directives, or it will kill your app!

Some i18n libraries are doing better!

Others optimize things quite a bit: Angular, VueJs-i18n and Google Closure, for example.

How do they do it? They parse the string only once and cache a list of opcodes, which they process on the following calls.

If you aren't familiar with opcodes, an opcode list is basically a list of instructions to process, in this case just to build a translation. Here's a possible example of opcodes generated from a translation:

[{
  "type": "text",
  "value": "Hello "
}, {
  "type": "variable",
  "value": "name"
}, {
  "type": "text",
  "value": " !"
}]
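
As a rough illustration of where such a list could come from, here is a minimal parser sketch. The `parseToOpcodes` name is hypothetical, and it only understands simple `{variable}` placeholders (no SELECT or PLURAL), so it is not how any of the libraries above actually parse.

```javascript
// Hypothetical helper: turns 'Hello {name} !' into a list of opcodes.
// Only simple {variable} placeholders are handled here.
const parseToOpcodes = (text) => {
  const opcodes = [];
  const pattern = /\{\s*(\w+)\s*\}/g;
  let lastIndex = 0;
  let match;

  while ((match = pattern.exec(text)) !== null) {
    // Plain text located before the placeholder
    if (match.index > lastIndex) {
      opcodes.push({ type: 'text', value: text.slice(lastIndex, match.index) });
    }
    opcodes.push({ type: 'variable', value: match[1] });
    lastIndex = pattern.lastIndex;
  }

  // Trailing text after the last placeholder
  if (lastIndex < text.length) {
    opcodes.push({ type: 'text', value: text.slice(lastIndex) });
  }

  return opcodes;
};

console.log(parseToOpcodes('Hello {name} !'));
```

The regular expression runs once per translation string, at parse time, rather than on every call.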

And here is how we print the result:

// Branches (data/list) are assumed to be stored on the opcode itself
const printOpcode = (opcodes, params = {}) => opcodes.map(code => (
  (code.type === 'text') ? code.value :
  (code.type === 'variable') ? (params[code.value] || '') :
  (code.type === 'select') ? printOpcode( // recursive
    code.data[params[code.value]] || code.data.other, params
  ) :
  (code.type === 'plural') ? printOpcode( // recursive
    code.list[getPluralIndex(params[code.value])] || code.list[0], params
  ) :
  '' // unsupported opcode type
)).join('');

With this type of algorithm, more time is spent on the first call, which generates the opcodes, but we store them and reuse them for faster performance on the next calls:

  • It doesn't split the string.
  • It doesn't do intensive regex operations.
  • It just reads the opcodes and merges the results together.
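
The caching idea can be sketched as follows. This is only an illustration under assumptions: `parse`, `translate` and the `Map`-based cache are made-up names, and the stand-in parser handles nothing beyond `{variable}` placeholders.

```javascript
const cache = new Map();
let parseCount = 0; // only here to show how often we parse

// Stand-in parser (assumption): split() with a capture group returns
// text at even indexes and variable names at odd indexes.
const parse = (text) => {
  parseCount++;
  return text.split(/\{\s*(\w+)\s*\}/).map((value, index) => ({
    type: index % 2 ? 'variable' : 'text',
    value,
  }));
};

const translate = (text, params = {}) => {
  let opcodes = cache.get(text);

  if (!opcodes) {
    // First call only: parse and store the opcodes
    opcodes = parse(text);
    cache.set(text, opcodes);
  }

  // Every call: read the opcodes and merge the results together
  return opcodes
    .map((code) => (code.type === 'text' ? code.value : params[code.value] || ''))
    .join('');
};

console.log(translate('Hello {name} !', { name: 'John' })); // parses, then prints
console.log(translate('Hello {name} !', { name: 'Jane' })); // cache hit
console.log(parseCount); // 1
```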

Well, that rocks! But is it possible to go further?


🤔 How can we speed things up?

💋Frenchkiss.js goes one step further: it compiles the translation into a native function, one so light and pure that the JavaScript engine can easily JIT-compile it.

How does it work?

Quite simple: you can actually build a function from a string by doing the following:

const sum = new Function('a', 'b', 'return a + b');

sum(5, 3);
// => 8

For further information, take a look at the Function constructor (MDN).

The main logic is still to generate an opcode list, but instead of using it to generate a translation we use it to generate an optimized function that will return the translation without further processing.

It's actually possible because of the simple structure of interpolation and SELECT/PLURAL expressions. It's basically a return statement with some ternaries.

const opCodeToFunction = (opcodes) => {
  const output = opcodes.map(code => (
    (code.type === 'text') ? escapeText(code.value) :
    // Quote the key, otherwise it would be read as a variable name
    (code.type === 'variable') ? `params[${JSON.stringify(code.value)}]` :
    (code.type === 'select') ? ... :
    (code.type === 'plural') ? ... :
    '""' // TODO Something wrong happened (invalid opcode)
  ));

  // Fall back to an empty string literal if there is no data
  const result = output.join('+') || '""';

  // Generate the function
  return new Function(
    'arg0',
    'arg1',
    `
    var params = arg0 || {};
    return ${result};
  `);
};

⚠️ Note: when building a dynamic function, make sure to avoid code injection (XSS) by escaping user input!
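
To make the warning concrete, here is one possible way to escape text before it is pasted into generated source. Treat it as a sketch: defining `escapeText` with `JSON.stringify` is an assumption made for this example, not necessarily what the library does.

```javascript
// Assumption: JSON.stringify is used to turn arbitrary text into a
// quoted string literal that is safe to embed in generated source.
const escapeText = (text) => JSON.stringify(text);

// A translation that tries to break out of the string literal
const malicious = '"; globalThis.hacked = true; "';

const fn = new Function(`return ${escapeText('Hello ' + malicious)};`);

console.log(fn()); // the payload is printed as plain text
console.log(globalThis.hacked); // undefined, nothing was executed
```

Without the escaping step, the quote inside the translation would close the string literal and the rest would run as code.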

Without further ado, let's see the generated functions (note: the real generated functions are a little more complex, but you will get the idea).

Interpolation generated function

// "Hello {name} !"
function generated (params = {}) {
  return 'Hello ' + (params.name || '') + ' !';
}

By default, we still fall back to an empty string to avoid printing "undefined" as plain text.
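
For readers who want to run something end to end, here is a reduced sketch of the compile step limited to text and variable opcodes. The `opcodesToFunction` name and the opcode shape are assumptions carried over from the examples above; the real compiler also handles SELECT and PLURAL.

```javascript
// Reduced sketch: only 'text' and 'variable' opcodes are supported.
const opcodesToFunction = (opcodes) => {
  const body = opcodes
    .map((code) =>
      code.type === 'text'
        ? JSON.stringify(code.value) // quoted string literal
        : `(params[${JSON.stringify(code.value)}] || '')` // empty-string fallback
    )
    .join('+') || '""';

  return new Function('arg0', `var params = arg0 || {}; return ${body};`);
};

const hello = opcodesToFunction([
  { type: 'text', value: 'Hello ' },
  { type: 'variable', value: 'name' },
  { type: 'text', value: ' !' },
]);

console.log(hello({ name: 'John' })); // Hello John !
console.log(hello.toString()); // shows the generated source
```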

Select expression generated function

// "Check my {pet, select, cat{evil cat} dog{good boy} other{{pet}}} :D"
function generated (params = {}) {
  return 'Check my ' + (
    (params.pet == 'cat') ? 'evil cat' :
    (params.pet == 'dog') ? 'good boy' :
    (params.pet || '')
  ) + ' :D';
}

We don't use strict equality, to keep support for numbers.
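
A small made-up example of why loose equality helps here: translation keys are always strings, while callers often pass numbers.

```javascript
// "Level: {level, select, 1{beginner} 2{expert} other{{level}}}" (hypothetical translation)
const generated = (params = {}) =>
  'Level: ' + (
    (params.level == '1') ? 'beginner' :
    (params.level == '2') ? 'expert' :
    (params.level || '')
  );

console.log(generated({ level: 1 }));   // Level: beginner
console.log(generated({ level: '1' })); // Level: beginner
console.log(generated({ level: 7 }));   // Level: 7
```

With `===`, the first call would fall through to the `other` branch because `1 === '1'` is false.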

Plural expression generated function

// "Here {N, plural, =0{nothing} few{few} other{some}} things !"
function generated (params = {}, plural) {
  const safePlural = plural ? { N: plural(params.N) } : {};

  return 'Here ' + (
    (params.N == '0') ? 'nothing' :
    (safePlural.N == 'few') ? 'few' :
    'some'
  ) + ' things !';
}

We cache the plural category to avoid re-fetching it in case of multiple checks.
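
To try a plural function like this one, something has to supply the category. The sketch below uses the standard `Intl.PluralRules` API as a stand-in; that choice, and the translation itself, are assumptions made for the example rather than how the library resolves categories.

```javascript
// Stand-in category resolver for English ('one' or 'other')
const rules = new Intl.PluralRules('en');
const plural = (n) => rules.select(n);

// "You have {N, plural, =0{nothing} one{one thing} other{{N} things}}" (hypothetical)
function generated(params = {}, plural) {
  // Resolved once, then reused by every category check
  const category = plural ? plural(params.N) : undefined;

  return 'You have ' + (
    (params.N == '0') ? 'nothing' :
    (category == 'one') ? 'one thing' :
    params.N + ' things'
  );
}

console.log(generated({ N: 0 }, plural)); // You have nothing
console.log(generated({ N: 1 }, plural)); // You have one thing
console.log(generated({ N: 5 }, plural)); // You have 5 things
```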


🚀 Conclusion

With generated functions we were able to execute code 5 to 1000 times faster than other libraries, avoiding RegExp, split and map operations in the rendering critical path, and also avoiding garbage collector pauses.

Benchmark

Last piece of good news: it's only 1kB gzipped!

If you're looking for an i18n JavaScript library to speed up your PWA or your SSR, you should probably give 💋Frenchkiss.js a try!

Top comments (14)

Mihail Malo

I assume it would be more code on the wire if you server-rendered these functions?
Have you considered rendering them inside a Service Worker and returning the .js file?

Vincent Thibault

Interesting idea to do it on the server side, in a service worker or in a build tool!

Yet I'm not sure there is much to gain: the generated function weighs more than the string representation. It would also mean that for every update of the library you'd have to regenerate all your functions, in case of a signature mismatch (or fixes to the generated function).

But yeah, with this method it would also be possible to remove the compiler from the code and save some extra bytes :)

By the way, here is how to extract the function if needed:

import { locale, set, t, cache } from 'frenchkiss';

locale('en');
set('en', {
  test: 'Check my {pet, select, cat{evil cat} dog{good boy} other{{pet}}} :D',
});

t('test'); // generate it
console.log(cache.en.test.toString()); // extract it

// =>
// function anonymous(a,f
// /*``*/) {
// var p=a||{};return "Check my "+(p["pet"]=="cat"?"evil cat":p["pet"]=="dog"?"good boy":(p["pet"]||(p["pet"]=="0"?0:"")))+" :D"
// }
// */
Mihail Malo • Edited
var p=a||{};return "Check my "+(p["pet"]=="cat"?"evil cat":p["pet"]=="dog"?"good boy":(p["pet"]||(p["pet"]=="0"?0:"")))+" :D"

With a little more work in the compiler, there are things that could make the function shorter:

var p=a||{},a=p.pet;return "Check my "+(a=="cat"?"evil cat":a=="dog"?"good boy":a||(a=="0"?0:""))+" :D"

Can you clarify to me what the (a||(a=="0"?0:"")) is doing?

Vincent Thibault

Yeah, some optimizations could definitely clean up the generated function. I just wanted to avoid spending much time (and file size) just to prettify the output.

The var p=a||{}; for example could be removed for raw strings (which isn't done currently).

About the (a||(a=="0"?0:"")): it's there to avoid printing "undefined" or "null" in a translation while keeping "0" working:

// 'test' : 'Value: {value} !'

t('test'); // 'Value:  !'
t('test', { value: undefined }); // 'Value:  !'
t('test', { value: null }); // 'Value:  !'
t('test', { value: 'test' }); // 'Value: test !'
t('test', { value: 0 }); // 'Value: 0 !'
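
The expression can also be tried in isolation. In this sketch, `printValue` and `render` are names made up for the example; only the expression itself comes from the generated function.

```javascript
// Same expression as in the generated code, wrapped in a helper
const printValue = (a) => a || (a == '0' ? 0 : '');

const render = (value) => 'Value: ' + printValue(value) + ' !';

console.log(render(undefined)); // Value:  !
console.log(render(null));      // Value:  !
console.log(render('test'));    // Value: test !
console.log(render(0));         // Value: 0 !
```

The first `||` drops every falsy value, and the loose comparison with "0" then lets the number 0 back in.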
Mihail Malo

I'm not well-versed in i18n, is that the expected behavior of pet, select, other{{pet}}? Empty string for entirely missing key?

Mihail Malo • Edited

Can I suggest this way of inlining selects, or will it hurt performance?

var a=p&&p.pet;return "Check my "+({cat:"evil cat",dog:"good boy"}[a]||a||(a=="0"?0:""))+" :D"

If you do manage to get rid of new Function, you could do things like this though (n being a helper shared by all functions for all locales):

const n=x=>typeof x==="string"?x:"",
g=(c,x)=>c[x]||n(x)
// Elsewhere...
const c={cat:"evil cat",dog:"good boy"},
out=({pet:a}={},f)=>`Check my ${g(c,a)} :D`
Vincent Thibault

It depends on the i18n library: some print undefined, others keep "{variable}" in the translation.

As for me, I think an empty string is a better user experience than a variable name (otherwise the website looks broken).

Mihail Malo

It should probably report an error in prod (in addition to whatever behavior you suggest) and fail in dev?
Imo the key name is less bad than an empty string, I have been on broken sites/apps where I could complete my flow in part thanks to the variable names in the template. But that's a nitpick.

Vincent Thibault

OK, I've noted the suggestion. I'll try to implement it this weekend, something similar to how onMissingKey works:

frenchkiss.onMissingVariable((value, key, lang) => {
  // report it
  sendReport(`variable "${value}" in ${lang}->${key} is missing.`);

  // the value you want to use instead
  return `[${value}]`;
});
Vincent Thibault • Edited

Just saw your comment about SELECT optimization.

I already ran some tests using an object that maps keys to values, but it doesn't work well with nested expressions.

With nested expressions, you can't really pre-cache objects, and it will execute the code of every possible branch before doing the resolution, leading to performance issues.

Mihail Malo • Edited

With nested expressions you can have functions for some branches and strings for some. :v
Don't know about performance, but it is usually very compact, and you just need one helper:

const e = (table, x, p) => {
  const val = table[p]
  return typeof val === "function" ? val(x) : val
}
const a = { a: "A", b: "chungus" }
const b = { a: "B", b: x => `<${e(a, x, x.a)}>` }
const c = { a: "C", b: x => `[${e(b, x, x.b)}]` }
const rule = x => `Yo, ${e(c, x, x.c)}`

const prop = { a: "b", b: "b", c: "b" }
console.log(rule(prop))
prop.a = "a"
console.log(rule(prop))
prop.b = "a"
console.log(rule(prop))
prop.c = "a"
console.log(rule(prop))
Vincent Thibault

Not so compact once you transpile it to ES5.
Here is an example of a complete solution (unless I'm missing something).

The translation :

Updated: {minutes, plural,
  easteregg {never}
  =0 {just now}
  =1 {one minute ago}
  other {
    {minutes} minutes ago by {gender, select,
      male {male}
      female {female}
      other {other}
    }
  }
}

The global functions

// Global SELECT
function getBranchData(
  branch,
  prop,
  params,
  getPluralCategory,
  onMissingVariable,
  key,
  language
) {
  var data = branch.hasOwnProperty(prop) ? branch[prop] : branch.other; // fallback to 'other'

  return typeof data === "function"
    ? data(params, getPluralCategory, onMissingVariable, key, language)
    : data;
}

// Global PLURAL
function getBranchPluralData(
  branch,
  prop,
  params,
  getPluralCategory,
  onMissingVariable,
  key,
  language
) {
  // prop already holds the value (e.g. params.minutes), not a key of params
  var category = getPluralCategory && getPluralCategory(prop);
  var data;

  if (branch.hasOwnProperty(prop)) {
    // direct assignment
    data = branch[prop];
  } else if (category && branch.hasOwnProperty(category)) {
    // category check (easter egg)
    data = branch[category];
  } else {
    // default to other
    data = branch.other;
  }

  return typeof data === "function"
    ? data(params, getPluralCategory, onMissingVariable, key, language)
    : data;
}

// Global Interpolation
function handleInterpolation(params, prop, onMissingVariable, key, language) {
  return !params.hasOwnProperty(prop)
    ? onMissingVariable(prop, key, language)
    : typeof params[prop] === "number"
    ? params[prop]
    : params[prop] || "";
}

The generated function demo:

function functionGenerator() {
  // Closure to avoid re-defining branches at each call
  var branchA = {
    0: "just now",
    1: "one minute ago",
    other: function(params, getPluralCategory, onMissingVariable, key, language) {
      return (
        handleInterpolation(params, "minutes", onMissingVariable, key, language) +
        " minutes ago by " +
        getBranchData(
          branchB,
          params.gender,
          params,
          getPluralCategory,
          onMissingVariable,
          key,
          language
        )
      );
    }
  };

  var branchB = {
    male: "male",
    female: "female",
    other: "other"
  };

  return function(params, getPluralCategory, onMissingVariable, key, language) {
    return (
      "Updated: " +
      getBranchPluralData(
        branchA,
        params.minutes,
        params,
        getPluralCategory,
        onMissingVariable,
        key,
        language
      )
    );
  };
}

var fn = functionGenerator();
fn({
  minutes: 5,
  gender: "male"
});

I'll probably create a branch to see if it's a good candidate.

Mihail Malo
        params,
        getPluralCategory,
        onMissingVariable,
        key,
        language

should probably be an object, passed by reference instead of individually through arguments?
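
A sketch of what that could look like, assuming a single context object (called `ctx` here, a made-up name) that carries the params and whatever else the branches need. It reuses the branch tables from the example above in simplified form and leaves out plural categories and missing-variable handling.

```javascript
// One helper, one context object instead of five positional arguments
const getBranchData = (branch, prop, ctx) => {
  const data = Object.prototype.hasOwnProperty.call(branch, prop)
    ? branch[prop]
    : branch.other;

  return typeof data === 'function' ? data(ctx) : data;
};

const branchB = { male: 'male', female: 'female', other: 'other' };

const branchA = {
  0: 'just now',
  1: 'one minute ago',
  other: (ctx) =>
    ctx.params.minutes +
    ' minutes ago by ' +
    getBranchData(branchB, ctx.params.gender, ctx),
};

const fn = (ctx) =>
  'Updated: ' + getBranchData(branchA, ctx.params.minutes, ctx);

console.log(fn({ params: { minutes: 5, gender: 'male' }, key: 'updated', language: 'en' }));
// Updated: 5 minutes ago by male
```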

Mihail Malo

It's more about the fact that the browser can store the parsed and even potentially optimized function in cache, not just the string form, when you go through normal pathways.
new Function() is rather esoteric, and means it will definitely do a parse per instantiation, as well as cause some deoptimization around the instantiation.
Furthermore, using the library as is requires 'unsafe-eval' CSP directive on your entire page, which you otherwise might be able to avoid.
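
For comparison, here is a rough sketch of an approach that avoids new Function (and therefore 'unsafe-eval') by compiling opcodes into closures. The `compile` name and opcode shape are assumptions for the example, it only covers text and variables, and it has not been benchmarked against the generated-function approach.

```javascript
// Each opcode becomes a small closure, built once at compile time
const compile = (opcodes) => {
  const parts = opcodes.map((code) =>
    code.type === 'text'
      ? () => code.value
      : (params) => params[code.value] || ''
  );

  // The returned function only calls the prebuilt closures
  return (params = {}) => parts.map((part) => part(params)).join('');
};

const hello = compile([
  { type: 'text', value: 'Hello ' },
  { type: 'variable', value: 'name' },
  { type: 'text', value: ' !' },
]);

console.log(hello({ name: 'John' })); // Hello John !
```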