DEV Community

Yawar Amin

Translations without the tears

IF WE look at a typical tutorial for internationalizing a simple web application, there are quite a lot of concepts and commands to remember. For example, here is about the shortest tutorial you can find for a Python/Flask webapp: https://medium.com/i18n-and-l10n-resources-for-developers/localization-for-flask-applications-3bef6d6aaf52

And it introduces quite a lot of new things: install this package, configure the i18n tooling, tag all your strings, extract the messages into a .pot (Portable Object Template) file, generate individual locale files for each language, and compile the translations. This is before even actually translating the messages! And let's not even think about maintaining this app over time as we need to add, remove, and update messages (i.e. strings).

How much of this complexity is inherent, and how much is accidental? I've been grappling with this question on and off for a while now, and I think we can actually shave off a lot of manual, error-prone work with the help of some outside-the-box thinking, and SQLite, the developer's best friend.

Goal

What's the minimum amount of work we can do to internationalize a typical webapp? For the purposes of this post I will ignore the distinction between server-rendered and client-rendered webapps and treat the server-rendered case as typical. Besides, you probably don't want to load the translated strings for all your languages in a frontend app...well, unless you don't care about performance 😉

So let's see if we can get down to:

  1. Declare a list of the languages we want to support
  2. Set up some minimal i18n support on a request-response cycle
  3. Tag all our strings in the app
  4. Run the app and generate a list of all translatable strings as a result
  5. Translate the strings and rerun the app to use the translations

Enter SQLite

Typically, i18n/l10n is done with many different tools across different technologies. In the open source world, and especially in the Python ecosystem, the most popular is the GNU gettext suite of tools and its Python support. Unfortunately, these tools require you to correctly juggle several different file formats and CLI tools. My take is that these file formats and tools come from an earlier era, when tooling had to be custom-designed to handle structured content like translation strings. But it doesn't have to be that way today.

Instead of a mish-mash of .pot files, .po files, and multiple commands to manage them, what if we used the near-universal SQLite library? That's the thought process behind podb, a tiny proof-of-concept library that uses SQLite to automatically handle all the boring parts of i18n.

Podb

Here's how it works. You create and open a Podb database in your app. This initializes an SQLite database on disk. Then you create some language objects. These are callables which return translations of their input strings. Then throughout your app you use these language objects. The library notes the usage of all the languages and all the strings they translate, and records them in the SQLite database. Finally, when the app exits, you close the Podb database, which also closes the SQLite database and saves .po files for all the languages and strings which need to be translated. You fill in the translations and rerun the app. On the next run, podb notices that the translations are filled in, and records them in the SQLite database. So the SQLite database is the source of truth.
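The core idea is easy to demo with Python's built-in sqlite3 module. Here's a toy sketch of the pattern (this is not podb's actual implementation; the table and class names are made up for illustration):

```python
import sqlite3

class Lang:
    """A callable that translates strings, recording every string it sees."""
    def __init__(self, conn, lang):
        self.conn, self.lang = conn, lang
        conn.execute(
            'CREATE TABLE IF NOT EXISTS po '
            '(lang TEXT, msgid TEXT, msgstr TEXT, PRIMARY KEY (lang, msgid))')

    def __call__(self, msgid):
        # Record the string if we haven't seen it before
        self.conn.execute(
            'INSERT OR IGNORE INTO po VALUES (?, ?, ?)',
            (self.lang, msgid, ''))
        row = self.conn.execute(
            'SELECT msgstr FROM po WHERE lang = ? AND msgid = ?',
            (self.lang, msgid)).fetchone()
        # Fall back to the original string while untranslated
        return row[0] or msgid

conn = sqlite3.connect(':memory:')
fr = Lang(conn, 'fr')
print(fr('Hello'))  # 'Hello' -- no translation yet, but now recorded
conn.execute("UPDATE po SET msgstr = 'Bonjour' WHERE msgid = 'Hello'")
print(fr('Hello'))  # 'Bonjour'
```

The export-to-.po and import-from-.po steps then become straightforward queries against this one table, which is why the database can act as the single source of truth.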

No .pot files, no running different commands to extract messages, compile translations, and so on. All this is taken care of automatically. It's very comparable to the experience of writing snapshot tests, actually.

In exchange, you do have to:

  1. Commit the SQLite database to your repo. Or at least set up some pipeline so that it gets put alongside your application at deploy time. Committing it directly into the repo is way simpler though.
  2. Actually use the app in at least one of the languages you need to translate to (even an English variant like en-GB counts!). Only the code paths that are hit when used will trigger the translations to be recorded in the database.

App walkthrough

Let's walk through the bare minimum Python Flask app using podb. Here's the code interspersed with my commentary:

# app.py

from typing import Optional
from flask import Flask, g, render_template, request
from podb import Podb
import signal
import sys

pos = Podb().__enter__()

The Podb class is designed as a context manager so it can be used as a resource. But a Flask app doesn't really support that style. So we open it manually at the start of our script.

def _shutdown(signum, frame):
    pos._close()
    sys.exit(0)

signal.signal(signal.SIGINT, _shutdown)
signal.signal(signal.SIGTERM, _shutdown)

Again, since we manually opened the database, we have to manually arrange for it to be closed, and in a webapp, the only realistic way to do that is by handling a signal.
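This manual open/close dance is exactly what a with block would otherwise do for us. A toy context manager (hypothetical, standing in for Podb) shows the equivalence:

```python
class Resource:
    """Stand-in for a context-managed resource like Podb."""
    def __enter__(self):
        self.open = True   # acquire the resource
        return self
    def __exit__(self, exc_type, exc, tb):
        self.open = False  # release the resource

# The usual style, for scripts with a clear start and end:
with Resource() as r:
    assert r.open
assert not r.open

# The manual style used in the Flask app above, for long-running
# processes where "the end" is a signal handler:
r = Resource().__enter__()
assert r.open
r.__exit__(None, None, None)
assert not r.open
```

In the Flask app, podb's _close method plays the role that __exit__ plays here.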

app = Flask(__name__)

LANGUAGES = {'fr-CA', 'fr', 'it', 'en-GB', 'en'}

Next, we set up our languages. This set can of course grow or shrink over the lifecycle of the application.

@app.before_request
def accept_language():
    # Important: construct language objects only from statically-known set of
    # language names. The best_match method will return one of the languages in
    # the set.
    lang_name = request.accept_languages.best_match(LANGUAGES, default='en')
    g.lang_name = lang_name
    g.t = pos.lang(lang_name)

@app.after_request
def content_language(resp):
    resp.content_language.add(g.lang_name)
    return resp

Here we use Flask's before-request and after-request hooks to set up language content negotiation using HTTP headers. Side note: I find it somewhat unfortunate that pretty much everyone has veered away from this web standard and manually implements language dropdowns instead. We could shave off a lot of development time by using the standard language selectors built into our browsers, e.g. chrome://settings/languages
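For reference, here is a simplified sketch of what a best_match-style negotiation does under the hood (Werkzeug's real implementation handles more cases, such as wildcards and fuzzy matching of language tags):

```python
def best_match(header, supported, default='en'):
    """Pick the highest-weighted supported language from an
    Accept-Language header like 'fr-CA,fr;q=0.9,en;q=0.8'."""
    prefs = []
    for part in header.split(','):
        lang, _, q = part.strip().partition(';q=')
        # A missing quality weight defaults to 1.0
        prefs.append((float(q or 1.0), lang))
    for _, lang in sorted(prefs, reverse=True):
        if lang in supported:
            return lang
    return default

print(best_match('fr-CA,fr;q=0.9,en;q=0.8', {'fr', 'en'}))  # fr
```

Each entry in the Accept-Language header carries an optional quality weight; the highest-weighted language that the server supports wins, which is why declaring both 'fr-CA' and 'fr' in LANGUAGES lets regional and generic variants match cleanly.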

@app.route('/hello/')
@app.route('/hello/<name>')
def hello(name: Optional[str]=None):
    return render_template(
        'hello.html',
        lang=g.lang_name,
        hello_from=g.t('Hello from'),
        hello=g.t('Hello'),
        name=name)

Finally, the actual request handler, which renders a template. Note that we pass the translated strings into the template, using the g.t callable that was set up with the correct language translator for this request:

<!-- templates/hello.html -->

<!doctype html>
<html lang="{{ lang }}">
  <head>
    <title>Hello</title>
  </head>
  <body>
{% if name %}
    {{ hello }}, {{ name }}!
{% else %}
    {{ hello_from }} Flask!
{% endif %}
  </body>
</html>

Inside the template we are interpolating the variables which were passed in.

If you want to run this, I recommend the following setup:

$ mkdir -p flask_podb_test/po # po subdirectory used by podb
$ cd flask_podb_test
$ python3 -m venv env
$ source env/bin/activate
$ python -m pip install --upgrade pip
$ python -m pip install Flask polib

Then, copy the files from the repo I linked above into this directory. Run using:

python -m flask --app app run # app argument refers to app.py

Send a few requests to the server, e.g.:

curl -i -H 'Accept-Language: fr' 'http://127.0.0.1:5000/hello/'

Exit the server with Ctrl-C and notice that it populates the po subdirectory with the SQLite database and an fr.po file:

msgid ""
msgstr ""
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=UTF-8\n"
"Content-Transfer-Encoding: 8bit\n"
"X-Generator: https://github.com/yawaramin/podb\n"
"Project-Id-Version: podb\n"
"Language: fr\n"

#: podb
msgid "Hello from"
msgstr ""

#: podb
msgid "Hello"
msgstr ""

This autogenerated file is meant to be sent to translators (e.g. SaaS platforms, translators in your org, service providers, or even yourself for testing), filled in, and put back in the po subdirectory. On the next run, the app will ingest the new translations into the SQLite database. After it exits again, it will record any remaining missing translations in the .po files. Rinse and repeat.
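For example, after a translator has filled in the French strings, the entries in fr.po might look like this (the translations here are just illustrative):

```
#: podb
msgid "Hello from"
msgstr "Bonjour de"

#: podb
msgid "Hello"
msgstr "Bonjour"
```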

Notice that the process stays exactly the same no matter which stage of the app's lifecycle you are in. Once you've set up the basic podb infrastructure, whether you're adding new strings or even new languages to translate to, it works the same way: run the app, get the autogenerated files translated, run the app again. No need to remember different commands for different stages of the process.

I do want to emphasize that in exchange, you are giving up static extraction of translatable messages: you only get messages extracted if you actually exercise those code paths, e.g. with manual or automated tests. I think this is a reasonable tradeoff, though. After all, who ships new features to production without at least trying them out? 😅

Top comments (8)

sectasy

Hi, great material. For even easier operation with translations in python applications, you can use a library that embraces this for you, all you have to do is configure the files properly! I've provided a link to the repo below.

github.com/sectasy0/pyi18n

Yawar Amin

That's a nice project. What does the workflow look like for adding a new string to be translated? E.g. I add a new string like _(locale, 'messages.special_offer'), does PyI18n automatically extract and add this key into all my existing locale files e.g. locale/en.yml, locale/pl.yml?

sectasy • Edited

Nope, this package ain't a translation package, it's an internationalization (i18n) library: you have to define your translations yourself in YAML/JSON files. Those files are loaded at the start of your application, and with a function call like _('en', 'messages.special_offer') you get the translation under the given path for the specified locale; in this case, you get the string 'special offer text' from en.yml. I don't know why, but many people don't understand what the library actually does, maybe better descriptions are required? If you have an idea how to improve the descriptions to help people understand, I'd be happy if you created a pull request :)

en:
    messages:
        special_offer: 'special offer text'
Yawar Amin

I looked for the translation functionality but couldn't find it, so thought I'd double-check by asking. With podb I am actually more focused on the entire i18n/l10n workflow, not just the i18n part of it. (Leaving aside the actual sending/uploading files to translation platforms, downloading translated .po files, etc.) I think that's what differentiates it from PyI18n.

sectasy

So these translations in your case are translated automatically? And in that case, might these translations not be correct?

Yawar Amin • Edited

No, as I mentioned earlier, podb is focusing on the i18n/l10n workflow of tagging strings, exporting untranslated strings to .po files to send for translation, and importing translations back from .po files. It is not deciding what the actually translated texts will be, that is left to humans to decide.

sectasy

Ahh okay, now I understand. So it is similar to my library, but mine doesn't support date and time, currency format control, and other things, only interface translation.

Yawar Amin

Neither does podb 🙂 I was referring only to the translation into the user's preferred language as 'localization'. I didn't mean date/time/currency (although technically it would be possible).