In my previous post, I talked about my work as a localisation engineer and how most of it consists of preparing different file formats for translation in a translation tool with a translation memory attached.
Using a translation tool and storing all your translations in a translation memory can make you more efficient and consistent. Updates to a text are simple to process because you get all the unchanged parts straight from the memory, without having to browse last time's finished translations and work out which bits you can re-use. Translation memories also help you keep terminology consistent because you have an easily searchable database of relevant material.
A translation memory (TM) stores all translations as units with a source and a target language – often with a lot of other information. Each translation unit usually consists of one segment: a sentence ending in a full stop, a question mark or an exclamation mark, or an otherwise delimited unit of text, such as a heading, a spreadsheet name, a list item or a UI string.
When translating, the translation tool searches the TM for exact or near matches (often called fuzzy matches) and presents them to the translator. The translator can then select the best one as is, edit as necessary, or write a new translation from scratch.
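To make the idea of exact and fuzzy matches concrete, here is a minimal sketch in Python. It is not how any particular translation tool scores matches: the similarity measure (`difflib.SequenceMatcher` from the standard library), the 75% threshold, and the sample entries are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher

# A toy translation memory: source segment -> target segment (English -> Finnish).
# The entries are made up for illustration.
TM = {
    "Click the Save button.": "Napsauta Tallenna-painiketta.",
    "Open the file.": "Avaa tiedosto.",
}


def lookup(segment, memory, threshold=0.75):
    """Return (score, source, target) matches at or above the threshold, best first."""
    matches = []
    for source, target in memory.items():
        # ratio() is 1.0 for identical strings and lower for partial overlaps.
        score = SequenceMatcher(None, segment, source).ratio()
        if score >= threshold:
            matches.append((score, source, target))
    return sorted(matches, reverse=True)


# A new segment that is close to, but not the same as, a stored one.
for score, source, target in lookup("Click the Cancel button.", TM):
    print(f"{score:.0%}  {source}  ->  {target}")
```

A score of 100% would be an exact match that can be re-used as is; anything lower is a fuzzy match that the translator still needs to edit, here by replacing the button name.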
Before you start using a translation memory, stop for a moment to think about how you should set it up and when you actually should use it.
The basic bits of information that often get stored along with a source and target text are a timestamp (created/modified/used) and the creator/modifier of the translation unit.
If you decide to set up a translation memory for yourself, consider whether to store all your work in one database or in several smaller ones (per customer or topic, for example). If you won't be doing a lot of translating, I would probably use just one main database, but in that case I recommend using whatever labelling your tool offers. Usually, the tool adds the selected labels automatically once you have entered the settings at the beginning of your work.
The labels are often text fields that you can attach to each translation unit, so you can jot down the topic or context, for example:
Target: Pöydät 🇫🇮
Customer: Furniture Emporium
Document: Grand opening leaflet
Target: Taulukot 🇫🇮
Document: Spreadsheet tutorial
Labels can also help distinguish your work if you are translating for, say, two competing companies that have different terminologies. It would look bad if you used the wrong terminology!
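As a rough sketch of how labelled translation units could be stored and filtered, the Python snippet below models the two entries above. The class, field names and the `find` helper are hypothetical, not any tool's real data model, and the English source text "Tables" is my assumption, since only the Finnish targets are shown.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class TranslationUnit:
    source: str
    target: str
    labels: dict = field(default_factory=dict)
    # Basic metadata that tools often store with each unit.
    created: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    creator: str = "me"


# Assumption: both units share the English source "Tables".
memory = [
    TranslationUnit("Tables", "Pöydät",
                    {"Customer": "Furniture Emporium", "Document": "Grand opening leaflet"}),
    TranslationUnit("Tables", "Taulukot",
                    {"Document": "Spreadsheet tutorial"}),
]


def find(memory, source, **labels):
    """Return units with this source text whose labels match every given key/value."""
    return [
        unit for unit in memory
        if unit.source == source
        and all(unit.labels.get(key) == value for key, value in labels.items())
    ]


# Without labels both translations come back; with a label only the relevant one.
for unit in find(memory, "Tables", Customer="Furniture Emporium"):
    print(unit.target)  # prints "Pöydät"
```

The same source text has two valid translations here, and only the label tells them apart, which is the reason for labelling when everything lives in one database.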
In a more professional setting, you'll most likely get access to an existing TM that you are expected to use. Even in that case, you can often attach your own TMs if you have collected some useful content in them.
Translation memories are not suited to all kinds of texts. When you are using one, the text is usually split into sentence-size segments. It is an additional step to merge these segments if you feel like changing the sentence structure a lot in your translation. Therefore, it is often said – and I believe it's true – that more creative or marketing texts shouldn't be translated in such a restrictive way.
Translation memories work really well for technical, repetitive texts where it is important that the message is conveyed in a clear and consistent manner. Such text types include user manuals and other documentation as well as legal documents. For software translations, I would also use a translation memory, but the text type is so concise that I would refrain from a lot of automatic processing. What I mean is that you shouldn't go ahead and insert previous translations "blindly" without making sure the selected translation is correct for the context. For example, buttons and dialogue titles may have different styles, sometimes even different parts of speech, such as nouns for titles and verbs for buttons!
In my work at a translation company, I mostly used SDL Trados Studio, which is a popular tool but quite pricey for hobby use. Therefore, I'm sure I'll be looking into some free, open-source solutions just out of curiosity.
Have you tried out some translation tools? Any you would recommend?
SDL Trados Studio
SDL Trados Studio is like an IDE: it has everything integrated. You can attach translation memories, of course, as well as terminologies created in another SDL tool, MultiTerm. You can process various file formats or even create your own "filters" (instructions for the tool on how to process a text file). But it's not cheap, so even freelancers are sometimes hesitant to purchase it unless a proper workload is guaranteed.
When I was starting my Hacktoberfest translat-a-thon, I wanted to use a translation tool. For my very first pull request, I started working on OmniaWrite, which stores translations in a JSON file. I had searched for free tools that people recommend and found Matecat, which claims to support JSON. I signed up and set up the environment, which looked nice and clear – but then it was unable to process this specific JSON format. I was so eager to get started with the translation that I didn't have the patience to investigate what was wrong.
Since I have signed up for Matecat already, I'll be sure to try it out properly.
Another recommended tool that I looked into during Hacktoberfest was OmegaT. I remember playing with it in the past at work for some reason. The tool looks rugged, but paired with Okapi Framework's ready-made filter plugins or other utilities, I believe it could be powerful enough to handle a lot of different file formats.
However, in my impatient Hacktoberfest state, I didn't want to spend time learning the quirks and setting up the environment (software files usually require a bit more setup than, say, Word files).
Wordfast Anywhere
Wordfast is an old product family which has a free online version, Wordfast Anywhere. I haven't found the catch yet – except that the list of supported file types is limited. However, if you're able to convert your source file into XLIFF (XML Localization Interchange File Format), which is a standard file format for localization, you can translate it with Wordfast Anywhere.
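As an illustration of what converting a source file into XLIFF can involve, here is a small Python sketch that wraps a flat JSON file of UI strings in a minimal XLIFF 1.2 skeleton. The flat key/value JSON layout, the file name `strings.json`, the language codes and the function name are all assumptions; real projects often nest their JSON, and I haven't verified which XLIFF details a given tool requires.

```python
import json
import xml.etree.ElementTree as ET

# Namespace of XLIFF version 1.2.
XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"


def json_to_xliff(json_text, source_lang="en", target_lang="fi", original="strings.json"):
    """Build a minimal XLIFF 1.2 document from a flat {"key": "text"} JSON object."""
    strings = json.loads(json_text)
    # Serialize with a default namespace instead of an ns0: prefix.
    ET.register_namespace("", XLIFF_NS)
    xliff = ET.Element(f"{{{XLIFF_NS}}}xliff", {"version": "1.2"})
    file_el = ET.SubElement(xliff, f"{{{XLIFF_NS}}}file", {
        "original": original,
        "source-language": source_lang,
        "target-language": target_lang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(file_el, f"{{{XLIFF_NS}}}body")
    for key, text in strings.items():
        # One trans-unit per string; the JSON key becomes the unit id.
        unit = ET.SubElement(body, f"{{{XLIFF_NS}}}trans-unit", {"id": key})
        ET.SubElement(unit, f"{{{XLIFF_NS}}}source").text = text
    return ET.tostring(xliff, encoding="unicode")


print(json_to_xliff('{"menu.open": "Open", "menu.save": "Save"}'))
```

Keeping the JSON keys as unit ids means the translated targets could later be written back into a JSON file with the same keys, although that return trip is not shown here.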
As a good sidekick to a translation memory, you can have a termbase. The simplest termbase contains just a term and its translations – actually, you don't even need translations if the termbase helps in authoring texts – but you can also have context and other information, and sometimes info on forbidden terms that writers shouldn't use. For example, referring to the fairly recent change in Git terminology, you could store information that 'master' is a deprecated term and 'main' is preferred. When connected to an editing tool (for translation or other), QA functionality could warn if the writer has used the incorrect term by accident.
The term entries could be something like this:
Term 🇬🇧: main
Term 🇫🇮: pää-
Term 🇬🇧: master
Term 🇫🇮: isäntä-
Languages are full of synonyms to choose from and termbases help with keeping track of those choices for consistent communication.
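The kind of QA check described above could look roughly like this in Python. It is only a sketch: the `status` and `use_instead` fields and the `check_terms` function are hypothetical and don't reflect any specific termbase format or tool.

```python
import re

# Hypothetical termbase entries; "status" and "use_instead" are assumed fields.
TERMBASE = [
    {"term": "main", "status": "preferred"},
    {"term": "master", "status": "deprecated", "use_instead": "main"},
]


def check_terms(text, termbase):
    """Return a warning for each deprecated term found as a whole word in the text."""
    warnings = []
    for entry in termbase:
        if entry["status"] != "deprecated":
            continue
        # \b keeps the check to whole words, so "mastering" would not be flagged.
        if re.search(rf"\b{re.escape(entry['term'])}\b", text, flags=re.IGNORECASE):
            warnings.append(f"'{entry['term']}' is deprecated; use '{entry['use_instead']}'")
    return warnings


print(check_terms("Push your changes to the master branch.", TERMBASE))
# prints ["'master' is deprecated; use 'main'"]
```

A plain whole-word search like this misses inflected forms, which matters a lot in a language like Finnish, so a real tool would need something smarter than this.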
A translation memory is not the solution to everything, of course, but when you want to keep consistency in repetitive or continuously updated text types, its memory is definitely better than yours!