

Database full of source texts and their translations – that's a translation memory!

Minna Nurmiluoto (minna_xd)

In my previous post, I talked about my work as a localisation engineer and how most of my work is preparing different file formats for translation in a translation tool with a translation memory attached.

Using a translation tool and storing all your translations in a translation memory can help you be more efficient and consistent. Updates to a text are simple to process because you get all the unchanged parts straight from the memory, without having to browse last time's finished translations and compare which bits you can reuse. Translation memories also help you keep terminology consistent because you have an easily searchable database of relevant material.

How it works

A translation memory (TM) stores all translations as translation units, each pairing a source-language text with its target-language text, often along with a lot of other information. A translation unit usually consists of one segment: a sentence ending in a full stop, a question mark or an exclamation mark, or another delimited unit of text, such as a heading, a spreadsheet name, a list item or a UI string.
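To make segmentation a bit more concrete, here's a minimal Python sketch. It assumes the simplest possible rules: every non-empty line (a heading, list item or UI string) is its own unit, and longer lines are split after a full stop, question mark or exclamation mark followed by whitespace. The function names are made up for illustration; real tools use far richer segmentation rules (abbreviations, decimals, quotes and so on).

```python
import re

# Toy rule (an assumption, not any tool's actual rule): split after ".", "?"
# or "!" when whitespace follows. Note it would wrongly split "e.g. Word".
SENTENCE_END = re.compile(r"(?<=[.?!])\s+")


def segment(paragraph: str) -> list[str]:
    """Split one paragraph into sentence-size segments."""
    return [s for s in SENTENCE_END.split(paragraph.strip()) if s]


def segment_document(text: str) -> list[str]:
    """Treat each non-empty line (heading, list item, UI string...) as its
    own unit, then split longer lines into sentences."""
    segments = []
    for line in text.splitlines():
        if line.strip():
            segments.extend(segment(line))
    return segments
```

For example, `segment_document("Welcome\nTables are on sale. Visit us today!")` gives three segments: the heading and the two sentences.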

When you translate, the translation tool searches the TM for exact or near matches (near matches are often called fuzzy matches) and presents them to the translator. The translator can then accept the best one as is, edit it as necessary, or write a new translation from scratch.
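Here's a rough Python sketch of that lookup, assuming the TM is just a dictionary from source segments to target segments. It uses the standard library's `difflib.SequenceMatcher` similarity ratio as a stand-in for a fuzzy match score; commercial tools compute their match percentages in their own ways, and the 0.7 threshold is an arbitrary assumption.

```python
from difflib import SequenceMatcher

# A toy translation memory: source segment -> target segment.
TM = {
    "Open the file.": "Avaa tiedosto.",
    "Close the file.": "Sulje tiedosto.",
    "Save your changes.": "Tallenna muutokset.",
}


def lookup(source: str, tm: dict[str, str], threshold: float = 0.7):
    """Return (score, tm_source, tm_target) tuples, best first.

    A score of 1.0 is an exact match; anything from `threshold` up to 1.0
    is a fuzzy match that the translator still needs to check and edit.
    """
    if source in tm:
        return [(1.0, source, tm[source])]
    matches = []
    for tm_source, tm_target in tm.items():
        score = SequenceMatcher(None, source, tm_source).ratio()
        if score >= threshold:
            matches.append((score, tm_source, tm_target))
    return sorted(matches, reverse=True)
```

Looking up `"Open the files."` returns `"Open the file."` as the top fuzzy match, so the translator only has to adjust the plural.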

Things to consider

Before you start using a translation memory, stop for a moment to think about how you should set it up and when you actually should use it.

What information to store and where

The basic bits of information that often get stored along with a source and target text are a timestamp (created/modified/used) and the creator/modifier of the translation unit.
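As an illustration, a translation unit with that basic metadata could be modelled like this in Python. The field and method names are hypothetical, chosen for this sketch only; real TM formats such as TMX have their own attribute names.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


def _now() -> datetime:
    return datetime.now(timezone.utc)


@dataclass
class TranslationUnit:
    source: str
    target: str
    created_by: str
    created_at: datetime = field(default_factory=_now)
    modified_by: Optional[str] = None
    modified_at: Optional[datetime] = None
    last_used_at: Optional[datetime] = None

    def update(self, new_target: str, user: str) -> None:
        """Record an edit: the new target text plus who changed it and when."""
        self.target = new_target
        self.modified_by = user
        self.modified_at = _now()

    def mark_used(self) -> None:
        """Call when the unit is inserted into a new translation."""
        self.last_used_at = _now()
```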

If you decide to set up a translation memory for yourself, think about whether you want to store all your work in one database or in several smaller ones (per customer or topic, for example). If you won't be doing a lot of translating, I would probably use just one main database, but in that case I recommend using whatever labelling your tool offers. Usually, the tool adds the selected labels automatically once you have entered the settings at the beginning of your work.

The labels may be text fields that you can attach to each translation unit, so use them to jot down the topic or context, for example:

Source: Tables
Target: Pöydät 🇫🇮
Customer: Furniture Emporium
Document: Grand opening leaflet

Source: Tables
Target: Taulukot 🇫🇮
Document: Spreadsheet tutorial
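A small Python sketch of how such labels could be put to work, using the two "Tables" units above. Rather than hiding units with other labels, it ranks the ones whose labels agree with the current job first, so the translator still sees every option. The `Unit` class and `candidates` function are hypothetical, for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class Unit:
    source: str
    target: str
    labels: dict[str, str] = field(default_factory=dict)


MEMORY = [
    Unit("Tables", "Pöydät",
         {"Customer": "Furniture Emporium", "Document": "Grand opening leaflet"}),
    Unit("Tables", "Taulukot", {"Document": "Spreadsheet tutorial"}),
]


def candidates(source: str, memory: list[Unit], **wanted: str) -> list[Unit]:
    """Return units with this source text, the ones whose labels agree
    with the most `wanted` labels first."""
    hits = [u for u in memory if u.source == source]

    def agreement(u: Unit) -> int:
        # Count how many of the requested labels this unit shares.
        return sum(u.labels.get(k) == v for k, v in wanted.items())

    return sorted(hits, key=agreement, reverse=True)
```

Calling `candidates("Tables", MEMORY, Document="Spreadsheet tutorial")` puts "Taulukot" first, while `Customer="Furniture Emporium"` puts "Pöydät" first.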

Labels can also help distinguish your work if you are translating for, say, two competing companies that have different terminologies. It would look bad if you used the wrong terminology!

In a more professional setting, you'll most likely get access to an existing TM that you are expected to use. Even in that case, you can often attach your own TMs if you have collected some useful content in them.

Text types

Translation memories are not suited to all kinds of texts. When you use one, the text is usually split into sentence-size segments, so if you want to change the sentence structure a lot in your translation, merging those segments is an extra step. Therefore, it is often said (and I believe it's true) that creative or marketing texts shouldn't be translated in such a restrictive way.

Translation memories work really well for technical, repetitive texts where it is important that the message is conveyed clearly and consistently, for example user manuals and other documentation, or legal documents. For software translations, I would also use a translation memory, but the strings are so short that I would avoid a lot of automatic processing. What I mean is that you shouldn't insert previous translations "blindly" without making sure the selected translation is correct for the context. For example, buttons and dialogue titles may follow different styles, sometimes even using different parts of speech, such as nouns for titles and verbs for buttons!
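One way to avoid inserting software strings blindly is to store the UI element type alongside the source text and only auto-insert when both match. Here's a hypothetical Python sketch of that idea. The `UI_TM` table and `suggest` function are made up for illustration, with "Tulosta" (a verb, for a button) and "Tulostus" (a noun, for a dialogue title) as the Finnish translations of "Print".

```python
from typing import Optional, Tuple

# Hypothetical TM keyed on (source text, UI element type).
UI_TM = {
    ("Print", "button"): "Tulosta",   # verb for a button
    ("Print", "title"): "Tulostus",   # noun for a dialogue title
}


def suggest(source: str, element: str) -> Tuple[Optional[str], bool]:
    """Return (suggestion, safe_to_auto_insert).

    Only an exact match on both text and element type is safe to insert
    automatically; the same text from another context is merely offered.
    """
    if (source, element) in UI_TM:
        return UI_TM[(source, element)], True
    for (tm_source, _other_element), target in UI_TM.items():
        if tm_source == source:
            return target, False  # needs a human check for this context
    return None, False
```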

Translation tools

In my work at a translation company, I mostly used SDL Trados Studio, which is a popular tool but quite pricey for hobby use. That's why I'm sure I'll be looking into some free, open source solutions, if only out of curiosity.

Have you tried out some translation tools? Any you would recommend?

SDL Trados Studio

SDL Trados Studio is like an IDE: it has everything integrated. You can attach translation memories, of course, as well as termbases created in another SDL tool, MultiTerm. You can process various file formats or even create your own "filters" (instructions that tell the tool how to process a text file). But it's not cheap, so even freelancers are sometimes hesitant to buy it unless a steady workload is guaranteed.
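To give an idea of what a filter does, here's a minimal Python sketch for a nested JSON resource file, the kind of file many software projects use for their UI strings. It is not how Trados filters work internally; it just shows the two halves of the job under simple assumptions: extract every string value with its key path as context, then merge translations back into the same structure. The function names and the dotted key-path format are hypothetical (keys that themselves contain dots would clash).

```python
import json


def extract(node, path=""):
    """Yield (key_path, text) for every string value in nested JSON."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from extract(value, f"{path}.{key}" if path else key)
    elif isinstance(node, list):
        for i, value in enumerate(node):
            yield from extract(value, f"{path}[{i}]")
    elif isinstance(node, str):
        yield path, node


def merge(node, translations, path=""):
    """Return a copy of `node` with strings replaced from `translations`,
    a dict keyed by the same paths that extract() produces."""
    if isinstance(node, dict):
        return {k: merge(v, translations, f"{path}.{k}" if path else k)
                for k, v in node.items()}
    if isinstance(node, list):
        return [merge(v, translations, f"{path}[{i}]") for i, v in enumerate(node)]
    if isinstance(node, str):
        return translations.get(path, node)  # untranslated strings stay as-is
    return node  # numbers, booleans and null are not translatable


def translate_file(text: str, translations: dict) -> str:
    """Parse a JSON document, apply translations and serialise it again."""
    return json.dumps(merge(json.loads(text), translations),
                      ensure_ascii=False, indent=2)
```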

Matecat

When I was starting my Hacktoberfest translat-a-thon, I wanted to use a translation tool. For my very first pull request, I started working on OmniaWrite, which stores translations in a JSON file. I had searched for free tools that people recommend and found Matecat, which claims to support JSON. I signed up and set up the environment, which looked nice and clear, but then Matecat was unable to process this specific JSON format. I was so eager to get started with the translation that I didn't have the patience to investigate what was wrong.

Since I've already signed up for Matecat, I'll be sure to try it out properly.

OmegaT

Another recommended tool that I looked into during Hacktoberfest was OmegaT. I remember playing with it at work in the past for some reason. The tool looks a bit rough, but paired with the Okapi Framework's ready-made filter plugins or other utilities, I believe it could be powerful enough to handle a lot of different file formats.

However, in my impatient Hacktoberfest state, I didn't want to spend time learning its quirks and setting up the environment (software files usually require a bit more setup than, e.g., Word files).

Wordfast Anywhere

Wordfast is an old product family that has a free online version, Wordfast Anywhere. I haven't found the catch yet, except that the list of supported file types is limited. However, if you can convert your source file into XLIFF (XML Localization Interchange File Format), which is a standard file format for localization, you can translate it with this tool.
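As a sketch of what converting into XLIFF might involve, here's a Python function that wraps a list of source segments in a minimal XLIFF 1.2 document using only the standard library. The function name, the default file name and the language codes are assumptions for illustration; a real conversion would also carry over formatting, notes and the original file's structure, and you should check which XLIFF version your tool expects.

```python
import xml.etree.ElementTree as ET

XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"


def to_xliff(segments, original="source.txt", src_lang="en", tgt_lang="fi") -> str:
    """Wrap source segments in a minimal XLIFF 1.2 document (no targets yet)."""
    ET.register_namespace("", XLIFF_NS)  # serialise as the default namespace
    root = ET.Element(f"{{{XLIFF_NS}}}xliff", version="1.2")
    file_el = ET.SubElement(root, f"{{{XLIFF_NS}}}file", {
        "original": original,          # required: name of the source file
        "source-language": src_lang,   # required
        "target-language": tgt_lang,
        "datatype": "plaintext",       # required: type of the original data
    })
    body = ET.SubElement(file_el, f"{{{XLIFF_NS}}}body")
    for i, text in enumerate(segments, start=1):
        unit = ET.SubElement(body, f"{{{XLIFF_NS}}}trans-unit", id=str(i))
        ET.SubElement(unit, f"{{{XLIFF_NS}}}source").text = text
    return ET.tostring(root, encoding="unicode")
```

The translation tool then adds a `<target>` element to each `<trans-unit>`, and a reverse step puts the translated text back into the original format.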

Termbases

As a good sidekick to a translation memory, you can have a termbase. The simplest termbase contains just a term and its translations (you don't even need translations if the termbase only helps with authoring texts), but you can also store context and other information, and sometimes info on forbidden terms that writers shouldn't use. For example, referring to the fairly recent change in Git terminology, you could store information that 'master' is a deprecated term and 'main' is preferred. When the termbase is connected to an editing tool (for translation or otherwise), a QA function could warn the writer about using an incorrect term by accident.

The term entries could be something like this:

Term 🇬🇧: main
Term 🇫🇮: pää-
Status: Preferred

Term 🇬🇧: master
Term 🇫🇮: isäntä-
Status: Deprecated
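Here's a hypothetical Python sketch of such a QA check, built on the main/master entries above. It assumes a concept-oriented structure (one concept, several terms, each with a status), matches English terms as whole words, and treats a trailing hyphen, as in the Finnish "isäntä-", as a compound prefix that can start a longer word. The data layout and function names are made up for illustration.

```python
import re

# One concept (the default Git branch) with its terms and their statuses.
TERMBASE = [
    {
        "concept": "default Git branch",
        "terms": [
            {"en": "main", "fi": "pää-", "status": "Preferred"},
            {"en": "master", "fi": "isäntä-", "status": "Deprecated"},
        ],
    },
]


def term_pattern(term: str) -> str:
    """Whole-word pattern, or a word-start pattern for compound prefixes."""
    if term.endswith("-"):  # e.g. "isäntä-" should match "isäntähaara"
        return rf"\b{re.escape(term[:-1])}"
    return rf"\b{re.escape(term)}\b"


def qa_check(text: str, termbase: list, lang: str = "en") -> list:
    """Return one warning per deprecated term found in `text`."""
    warnings = []
    for concept in termbase:
        preferred = [t[lang] for t in concept["terms"] if t["status"] == "Preferred"]
        preferred_text = " or ".join(f"'{p}'" for p in preferred)
        for term in concept["terms"]:
            if term["status"] != "Deprecated":
                continue
            if re.search(term_pattern(term[lang]), text, re.IGNORECASE):
                warnings.append(f"Deprecated term '{term[lang]}' "
                                f"({concept['concept']}); use {preferred_text} instead")
    return warnings
```

With this, `qa_check("Merge it into master.", TERMBASE)` returns one warning suggesting 'main', while "mastermind" is left alone because of the whole-word match.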

Languages are full of synonyms to choose from and termbases help with keeping track of those choices for consistent communication.

Not a jack of all trades but the master of some

A translation memory is not the solution to everything, of course, but when you want to keep consistency in repetitive or continuously updated text types, its memory is definitely better than yours!

Cover photo by Jan Antonin Kolar on Unsplash


Saija Saarenpää (matrixx)

Thanks for writing this post! It answered all the questions about translation memories that I was left wondering about after your previous post. I need to investigate whether one of the free tools would work with the tools I've used for translating open source projects. So far the projects I've translated have been quite small, but even those have had some repetitive cases, like the same text in a menu and on a button, so I guess they would already benefit from having a TM as a helper.

Minna Nurmiluoto (minna_xd), author

That's one reason why I didn't push myself to find a TM solution for Hacktoberfest: the projects I contributed to were manageably small. But even in small projects you easily have repetition so you'll have to take time to cross-check stuff. A TM could be of help, definitely!

In the past, Trados had a clipboard integration for translation. Something like that could be quite handy when you can't be bothered to fiddle with the file type settings. 😄 Who wants to code one with me? (I'll have to check if such a utility exists for modern systems. I bet it does.)

Saija Saarenpää (matrixx)

If there's a gap in the tools, I would be happy to help create one. My problem with finding a good hobby project has been that everything seems to have been invented already, so lately I've been creating a clone of an existing game. I would totally love to work on a fresh idea that fills a void.

Minna Nurmiluoto (minna_xd), author

I can find this type of clipboard translation functionality in proprietary tools (and I see "T-Window for Clipboard" is still part of SDL Trados, excellent! If only I had Trados for personal use 😄). For hobby use, I would think the fuzzy matching wouldn't need any complicated algorithms. Some sort of "edit distance" that sorts the suggestions from best to worst would suffice. 🤔
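For the curious, here's a minimal Python sketch of that idea: the classic Levenshtein edit distance, plus a hypothetical `rank_suggestions` helper that turns the distance into a rough 0 to 100 similarity score and sorts TM suggestions from best to worst. The scoring formula is an assumption for illustration; commercial tools calculate their fuzzy match percentages in their own ways.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions and substitutions that turn `a` into `b`."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def rank_suggestions(source, memory):
    """Sort (tm_source, tm_target) pairs from closest to furthest match,
    each with a rough 0-100 similarity score."""
    ranked = []
    for tm_source, tm_target in memory:
        distance = edit_distance(source, tm_source)
        longest = max(len(source), len(tm_source), 1)
        score = round(100 * (1 - distance / longest))
        ranked.append((score, tm_source, tm_target))
    ranked.sort(key=lambda r: r[0], reverse=True)
    return ranked
```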

Saija Saarenpää (matrixx)

Haha, I would find implementing fuzzy algorithms only as a nice challenge! 😀👌