neoan

i18n: A lot to think about

When we think of internationalization, we usually think of translating content into various languages. But it is so much more than that. Let's look at the following topics and how to solve all of them with one library.

Number formatting

How hard can it be? Well, decimal notations alone are a point of contention, sometimes even within a country (looking at you, Canada).
[Image: decimal separators by country. Light blue: dot, green: comma, teal: both, red: Arabic decimal separator (U+066B)]
The Arabic notation is especially tricky, as it looks like an apostrophe but isn't. This means that screen readers might misinterpret numbers. But even the most common differences can be confusing: in a scientific paper, translations of something like

... ergab eine Fehlerquote von 13,935%.
[German]

and

... showed an error rate of 13,935%.
[English]

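would be a horrible mistranslation based on the decimal separator alone: the German sentence reports roughly 13.9%, while the unchanged digits read as almost fourteen thousand percent in English.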

If we add currency to the mix, things become even weirder:

$9.99 or 9,99 $?

You see where I am going with this; in order to be truly localized, we have to rely on trustworthy resources to even have a chance to get it right. Thankfully, PHP has an extension for that called Intl. But before you jump onto php.net to find out that you opened Pandora's box, worry not: there is an easier approach.
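For a taste of what Intl does on its own, here is a minimal sketch using its NumberFormatter class. The helper names formatDecimal and formatPrice are made up for this illustration, and the exact output (for example the kind of space before a trailing currency symbol) depends on the ICU version bundled with your PHP build.

```php
<?php
// Sketch only: requires PHP's intl extension. The function names are
// illustrative and not part of any library.

function formatDecimal(float $value, string $locale): string
{
    // DECIMAL style picks the locale's decimal and grouping separators
    $formatter = new NumberFormatter($locale, NumberFormatter::DECIMAL);
    return $formatter->format($value);
}

function formatPrice(float $amount, string $currency, string $locale): string
{
    // CURRENCY style also decides where the currency symbol goes
    $formatter = new NumberFormatter($locale, NumberFormatter::CURRENCY);
    return $formatter->formatCurrency($amount, $currency);
}

foreach (['en_US', 'de_DE'] as $locale) {
    echo $locale, ': ', formatDecimal(13.935, $locale), ' | ', formatPrice(9.99, 'USD', $locale), PHP_EOL;
}
```

The same two values come out with a dot and a leading dollar sign for en_US, and with a comma and a trailing dollar sign for de_DE, without a single hand-written rule.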
But first, let's look at another common issue:

Date, Time & localization

I am probably not the only one with the luxury of having meetings with people from various time zones. And applications have gotten good at it. Zoom & Co invite me for the correct localized time, so I don't have to think about the pain of Europe not switching to and from daylight saving time on the same day as North America, or India having a time zone that my brain refuses to calculate because the difference can't even be expressed in full hours. As a developer, I need both, though: I need to format the date or time correctly for the user AND decide whether I want the time to be relative to the user.

An example:

My body still wasn't used to the sun at 1 am, so I had difficulties sleeping at the small hotel in Reykjavík I naively booked with a view of the northern lights in mind.

So sure, you want the international reader to "experience" 1 am in the suitable format, but you don't want to "translate" the time to the time zone in which our reader resides. On the other hand, if you take this example:

The event will take place virtually on Wednesday, August 20th at 11 pm PDT.

You will most certainly want to make sure that this information is accurately converted into whatever that actually means for the reader. As you can see, full control over the behavior is necessary.
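The two cases can be sketched with Intl's IntlDateFormatter. This is only an illustration: describeMoment is a made-up helper, and the concrete dates (including the year 2025) are assumptions, since the example sentences don't name a year.

```php
<?php
// Sketch only: requires the intl extension. describeMoment() is an
// illustrative helper; the dates below are assumed for the example.

function describeMoment(DateTimeInterface $moment, string $locale, string $timezone): string
{
    $formatter = new IntlDateFormatter(
        $locale,
        IntlDateFormatter::FULL,   // date style: weekday, day, month, year
        IntlDateFormatter::SHORT,  // time style: hours and minutes
        $timezone                  // time zone the output is expressed in
    );
    return $formatter->format($moment);
}

// Case 1: narrative time. Keep the wall-clock time of the place where it
// happened; only the notation follows the reader's locale.
$hotelNight = new DateTimeImmutable('2025-06-21 01:00', new DateTimeZone('Atlantic/Reykjavik'));
echo describeMoment($hotelNight, 'de_DE', 'Atlantic/Reykjavik'), PHP_EOL;

// Case 2: an event. Convert the instant into the reader's own time zone.
$event = new DateTimeImmutable('2025-08-20 23:00', new DateTimeZone('America/Los_Angeles'));
echo describeMoment($event, 'de_DE', 'Europe/Berlin'), PHP_EOL;
```

In the first call the German reader still sees one o'clock at night. In the second, 11 pm PDT on Wednesday turns into Thursday morning, 8 o'clock, in Berlin. The only difference is which time zone is handed to the formatter.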

Good ol' text translation

So, we talked about everything but actual translations. As someone fluent in more than one language, I don't really believe in the word "translation" to begin with (it implies that meaning is 100% transferable between languages). On a linguistic meta level, this can be shown with the following examples:

German | English
0 Autos, 1 Auto, 2 Autos | 0 cars, 1 car, 2 cars
0 Bakterien, 1 Bakterium, 2 Bakterien | 0 bacteria, 1 bacterium, 2 bacteria
0 Informationen, 1 Information, 2 Informationen | information, information, information

Let's look at row 1:

In English, the singular is only used when we are talking about exactly ONE (and zero takes the plural?!). Comparing it to German, this seems to hold for other languages as well. So in pseudo code, our logic has to account for that.

In row 2,

we are reminded of the fact that irregular plurals are a thing in many languages, but you really start to get an insight into the nightmare in

row 3,

where we have to face the reality that things might be countable in some languages, but not in others. And don't forget, these languages are rather similar in comparison. Let's be a little mean:

Japanese | English
一枚 | one (e.g. when counting paper)
一本 | one (e.g. when counting pencils)
一頭 | one (e.g. when counting elephants)
一杯 | one (e.g. when counting cups)
... | ...
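For the regular plurals in the first table, Intl's MessageFormatter already knows each locale's plural categories. The following is a minimal sketch; countNoun and the $patterns array are illustrative names, and the patterns themselves are written by hand for this example.

```php
<?php
// Sketch only: requires the intl extension. countNoun() and $patterns are
// illustrative and not part of any library.

function countNoun(string $locale, string $pattern, int $count): string
{
    // "plural" selects a branch by the locale's plural rules; "#" prints the number
    return MessageFormatter::formatMessage($locale, $pattern, ['count' => $count]);
}

$patterns = [
    'en' => '{count, plural, one{# car} other{# cars}}',
    'de' => '{count, plural, one{# Auto} other{# Autos}}',
];

foreach ($patterns as $locale => $pattern) {
    foreach ([0, 1, 2] as $count) {
        echo countNoun($locale, $pattern, $count), PHP_EOL;
    }
}
```

Irregular forms such as "bacterium" simply go into the respective branch. Plural rules do not help with the Japanese counters, though: Japanese only has a single plural category, and the counter depends on the noun, so it has to be part of the translated message itself.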

Practicality

Lastly, we have to ask ourselves how much time we can invest in solving these problems. Many companies resort to AI-based translations, which are either rather unreliable (like Google Translate) or extremely expensive. To mitigate this issue, and since PHP's Intl offers the toolset to address these problems, I wrote an i18n library that lets you write server-side rendered HTML with extended markup (via a template engine).

GitHub: sroehrl / php-i18n-translate

Simple yet powerful i18n support for PHP

PHP i18n translate

Straight forward. Convenient. Fast.


Installation

composer require sroehrl/php-i18n-translate

require_once 'vendor/autoload.php';

$i18n = new I18nTranslate\Translate();

$i18n->setTranslations('de', [
    'hello' => 'hallo',
    'goose' => ['Gans', 'Gänse']
]);

Quick start:

1. In Code

echo "a: " . $i18n->t('hello') . "<br>"; 
echo "b: " . $i18n->t('goose') . "<br>";
echo "c: " . $i18n->t('goose.plural') . "<br>";

// detect plural by numeric value
foreach([0,1,2] as $number){
    echo $number . " " . $i18n->t('goose', $number) . ", ";
}

Outputs:

a: hallo <br>
b: Gans <br>
c: Gänse <br
…

It addresses our issues like this:

// A little pseudo code, but you get it:
// SomeORM and setTranslationsAndSomeSettings() stand in for your own code.
$productModelData = SomeORM::getProduct($_GET['productId']);

$t = new I18nTranslate\Translate($userLocale, $userTimezone);

$t = setTranslationsAndSomeSettings($t);

// I already apologize for the formatting:
// the template is rendered with the product data first, then translated
echo $t->translate(
   Neoan3\Apps\Template\Template::embraceFromFile(
      '/theHTMLbelowThisCode.html',
      ['product' => $productModelData]
   )
);

The HTML template used above:
<!-- A simple static translation using a t-tag -->
<h1><t>Welcome to our page!</t></h1>
<!-- B simple static translation using a template function -->
<h2>{{t('Welcome to our page')}}</h2>
<article>
   <!-- C There's a lot going on here, we'll break it down later  -->
   <p><t>Check out [%product-name%](%{{product.title}}%)</t></p>
   <!-- D show USD price, but show in user's format -->
   <div class="price" i18n-currency="USD">
     {{product.price}}
   </div>
   <div class="special offer">
     <p>
        <t>Convincing text to make you buy in the next:</t>
        <!-- E Yes, this works -->
        <span i18n-time="m">+2 min</span> <br>
        <!-- F Prints something like: Wednesday, 12.10.2022 10:30 Eastern Daylight Time -->
        <span i18n-date-local="EEEE, dd.MM.y HH:mm zzzz">
           {{product.release}}
        </span>
     </p>
   </div>
   <div class="very-special-offer">
      <p>
        <!-- G for  "gettin' very dynamic here" -->
        <t>Buy [%number%](% {{product.offerCount}} %) for only {{i18n-currency(product.discountedPrice * product.offerCount, 'USD')}} today!</t>
      </p>
   </div>
</article>


So I guess a little explanation wouldn't hurt?

As you noticed, I started each comment with a letter for our reference:

A - T-tag

In most cases, this will be enough. This tag runs after all other substitutions unless you additionally give the tag itself i18n-attributes. The outcome is a substitution of its content with the corresponding translation:

//...

$translations['de'] = [
   'Welcome to our page!' => 'Willkommen auf unserer Seite!',
   //...
];

//...
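To illustrate the principle (this is not the library's actual implementation), the substitution of a t-tag's content can be sketched in a few lines. The function name replaceTTags and the regular expression are assumptions made for this example, and tag attributes, placeholders and nested tags are deliberately ignored.

```php
<?php
// Sketch only: NOT the library's implementation, just the idea behind it.

function replaceTTags(string $html, array $dictionary): string
{
    $result = preg_replace_callback(
        '/<t>(.*?)<\/t>/s',
        static function (array $match) use ($dictionary): string {
            $key = trim($match[1]);
            // Fall back to the original text when no translation exists
            return '<t>' . ($dictionary[$key] ?? $key) . '</t>';
        },
        $html
    );
    // preg_replace_callback() returns null on a regex error
    return $result ?? $html;
}

$german = ['Welcome to our page!' => 'Willkommen auf unserer Seite!'];
echo replaceTTags('<h1><t>Welcome to our page!</t></h1>', $german), PHP_EOL;
```

The original text doubles as the lookup key, which is why an untranslated string degrades gracefully to the source language instead of breaking the page.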

B - The T-function

The template engine uses curly brackets by default to evaluate content (You see this whenever we use data from "product").
Sometimes you want translations to be run earlier in the process to control interactivity between the data-context and the translation context. This is especially useful if you have custom functions and/or attributes for the templating. As this would reach beyond the scope of this article, I will have to leave examples to your imagination.

C - Placeholders

Sometimes (or quite often, depending on your project), you need to insert dynamic content into your translations. We best explore this by looking into the cycle of what happens under the hood:

//...

$translations['de'] = [
   //...
   'Check out [%product-name%]' => 'Erfahre mehr über [%product-name%]',
   //...
];

//...

In our translations, we indicate dynamic values WITHIN the t-tag in order to account for placement differences or grammatical adjustments within languages. In our HTML-template, we then bind these placeholders to a value. As values could come from various sources, but we want to use the context of our product, we use curly brackets to indicate that we want to resolve the variable prior to substitution:

<t>Check out [%product-name%](% {{ product.title }} %)</t>

1. Find and interpret context-data:

<t>Check out [%product-name%](%Scrapbook #2%)</t>

2. Find and interpret functions
(not happening in this example)

3. Sanitize string & search for translation

// pseudo code to illustrate the principle
$memorize = "Scrapbook #2";

$lookFor = "Check out [%product-name%]";

$foundTranslation = 'Erfahre mehr über [%product-name%]';

4. Replace placeholder with value

// pseudo code to illustrate the principle
$final = str_replace('[%product-name%]', $memorize, $foundTranslation);
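Put together, steps 3 and 4 can be sketched as one runnable function. Again, this is not how the library is implemented internally: translateWithPlaceholders and its regular expression are assumptions for illustration, and the sketch expects the context data (step 1) to be resolved already.

```php
<?php
// Sketch only: NOT the library's implementation. Expects input in which the
// template engine has already resolved {{ ... }} expressions.

function translateWithPlaceholders(string $text, array $dictionary): string
{
    $values = [];

    // Memorize each bound value and strip the "(%value%)" part, so that only
    // "[%name%]" remains in the lookup key.
    $key = preg_replace_callback(
        '/(\[%[\w-]+%\])\(%(.*?)%\)/',
        static function (array $match) use (&$values): string {
            $values[$match[1]] = trim($match[2]);
            return $match[1];
        },
        $text
    ) ?? $text;

    // Search for the translation, falling back to the original wording
    $translation = $dictionary[$key] ?? $key;

    // Replace every placeholder with its memorized value
    return strtr($translation, $values);
}

$german = ['Check out [%product-name%]' => 'Erfahre mehr über [%product-name%]'];
echo translateWithPlaceholders('Check out [%product-name%](%Scrapbook #2%)', $german), PHP_EOL;
```

Because the value is put back only after the lookup, the translation is free to move the placeholder to wherever the target language needs it.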

D - Attributes

We have several attributes at our disposal. Attributes are an easy way to hook static or dynamic values into i18n. In this case, we simply format the price to the user's expectations.

Pro-tip: i18n-translate uses the template engine's addCustomAttribute-method to achieve this. This means you can create your own attributes according to the needs of your shop/community etc.

E - Another attribute

Some attributes need a value, others don't. In the case of i18n-time both the format (here minutes without leading zero) and the value are optional. If the value isn't a timestamp, the attribute uses PHP's strtotime, making this example (two minutes from now) possible.

Shows the current server time (but only minutes):

<!-- without content -->
<span i18n-time="m"></span>

Shows the current server time in the time format suited for the user:

<!-- without content or format -->
<span i18n-time></span>
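The behavior described for i18n-time (optional value, strtotime as a fallback, optional format) can be approximated as follows. This is a sketch of the idea rather than the library's code: resolveTimestamp and formatTime are hypothetical helpers, and passing the locale and time zone as arguments is an assumption of this example.

```php
<?php
// Sketch only: NOT the library's implementation; requires the intl extension.

function resolveTimestamp(string $value, int $now): int
{
    $value = trim($value);
    if ($value === '') {
        return $now;                    // no content: use the current time
    }
    if (ctype_digit($value)) {
        return (int) $value;            // already a Unix timestamp
    }
    $parsed = strtotime($value, $now);  // relative or textual, e.g. "+2 min"
    if ($parsed === false) {
        throw new InvalidArgumentException("Cannot interpret '$value' as a time");
    }
    return $parsed;
}

function formatTime(string $value, string $locale, string $timezone, ?string $pattern = null): string
{
    // Without a pattern, the locale's short time style is used
    $formatter = new IntlDateFormatter($locale, IntlDateFormatter::NONE, IntlDateFormatter::SHORT, $timezone);
    if ($pattern !== null) {
        $formatter->setPattern($pattern);
    }
    return $formatter->format(resolveTimestamp($value, time()));
}

echo formatTime('+2 min', 'en_US', 'UTC', 'm'), PHP_EOL;  // minute of "two minutes from now"
echo formatTime('', 'de_DE', 'Europe/Berlin'), PHP_EOL;   // current time, German notation
```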

F - Make it local

As mentioned, we sometimes want to translate a time to the time zone of the user. That's why i18n-time-local and i18n-date-local output the given value in the user's time zone and format.

G - Functions to prerender placeholders

All provided attributes are also available as functions. With the exception of the t-function, they translate themselves into placeholders:

  • i18n-currency -> [%currency-value%]
  • i18n-time, i18n-time-local -> [%time-value%]
  • i18n-date, i18n-date-local -> [%date-value%]
  • i18n-number -> [%number-value%]

This means that our example would have the following translation:

//...

$translations['de'] = [
   //...
   'Buy [%number%] for only [%currency-value%] today!' => 'Für nur [%currency-value%] erhältst du [%number%] Stück!',
   //...
];

//...
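What the prerendering in G boils down to can be sketched like this: the currency value is formatted for the user's locale first, and only then dropped into the translated sentence. renderOffer is a hypothetical helper for this example, not a function of the library.

```php
<?php
// Sketch only: NOT the library's implementation; requires the intl extension.

function renderOffer(string $locale, string $translation, int $count, float $unitPrice, string $currency): string
{
    $formatter = new NumberFormatter($locale, NumberFormatter::CURRENCY);

    // Each placeholder is swapped for its already localized value
    return strtr($translation, [
        '[%number%]'         => (string) $count,
        '[%currency-value%]' => $formatter->formatCurrency($count * $unitPrice, $currency),
    ]);
}

echo renderOffer('de_DE', 'Für nur [%currency-value%] erhältst du [%number%] Stück!', 3, 5.00, 'USD'), PHP_EOL;
echo renderOffer('en_US', 'Buy [%number%] for only [%currency-value%] today!', 3, 5.00, 'USD'), PHP_EOL;
```

Note how the German sentence swaps the order of the two placeholders. That is exactly why they are named instead of being filled in by position.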

Accessibility

Lastly, let's talk about screen-readers & co.
Using a server-side rendered solution is already a good start, but what else is there to consider?

The semantic side

In one of our examples we discussed using the t-function rather than the t-tag. This can also be useful if you don't want a semantic element to have a child (the t-tag). In other cases, you might want to leverage the existence of a tag to supply ARIA declarations to your content. Since it is a tag, you can supply all regular attributes as you wish:

<t class="when-visually-impaired:text-xxl" role="note" data-id="x">Translate me!</t>

Extendability

We can extend the comfort using the functionality bundled with this package.
Let's say we have a JavaScript text-to-speech program we want to work with called "js-read-me" which reads from attributes:

Neoan3\Apps\Template\Constants::addCustomAttribute('js-read-me', function($domAttr, $contextData) use ($t){
    $domAttr->nodeValue = $t->t($domAttr->nodeValue);
});

In the template:

<button js-read-me="Listen to my rant by clicking this button">😤</button>

We have now ensured that the content of "js-read-me" uses our translation!

Final thoughts

There is so much more we can achieve, but I mustn't overextend the time you spent on this already. I hope this serves as an inspiration and helps explain this complex yet important topic.

Thank you for making it this far & until next time!
