This article is part of a tutorial series on txtai, an AI-powered semantic search platform.
A Spanish-language translation of this article is available. Thank you to Chema Bescós for providing it!
This article covers machine translation backed by Hugging Face models. Machine translation via cloud services has come a long way and now produces high-quality results. This article shows how models from Hugging Face give developers a reasonable alternative for running machine translation locally.
Install dependencies
Install txtai and all dependencies. Since this article uses optional pipelines, we need to install the pipeline extras package.
pip install "txtai[pipeline]"
Create a Translation instance
The Translation instance is the main entrypoint for translating text between languages. The pipeline abstracts translating text into a one-line call!
The pipeline detects the input language, loads the model that translates from the source language to the target language and returns the results. It also has built-in logic to split large text blocks into smaller sections the models can handle.
from txtai.pipeline import Translation
# Create translation model
translate = Translation()
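The pipeline's handling of long inputs is internal to txtai. The sketch below shows the general idea with sentence-aware chunking, for illustration only. `split_text` and its `max_words` budget are hypothetical names. Real models limit input length by tokens rather than words, so the word count here is only a stand-in.

```python
import re


def split_text(text, max_words=100):
    """Split text into chunks of at most max_words words, breaking on sentence boundaries.

    Hypothetical helper for illustration; not txtai's implementation.
    """
    # Naive sentence split: break on whitespace that follows ., ! or ?
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

    chunks, current, count = [], [], 0
    for sentence in sentences:
        words = len(sentence.split())
        # Close the current chunk if this sentence would push it past the budget
        if current and count + words > max_words:
            chunks.append(" ".join(current))
            current, count = [], 0
        # A sentence longer than max_words still becomes its own (oversized) chunk
        current.append(sentence)
        count += words

    if current:
        chunks.append(" ".join(current))
    return chunks


text = "First sentence here. Second one! Third sentence is a bit longer?"
print(split_text(text, max_words=5))
# ['First sentence here. Second one!', 'Third sentence is a bit longer?']
```

Each chunk would be translated on its own, and the translated chunks would then be joined back together in order.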
Translate text
The example below shows how to translate text from English to Spanish. This text is then translated back to English.
translation = translate("This is a test translation into Spanish", "es")
translation
Esta es una traducción de prueba al español
translate(translation, "en")
This is a test translation into Spanish
Translating multiple languages in a single call
The section below translates a single English sentence into five different languages. The results are then passed to a single translation call that translates them back into English. The pipeline detects the language of each input and loads the relevant translation models.
languages = ["fr", "es", "de", "hi", "ja"]
translations = [translate("The sky is blue, the stars are far", language) for language in languages]
english = translate(translations, "en")
for x, text in enumerate(translations):
    print("Original Language: %s" % languages[x])
    print("Translation: %s" % text)
    print("Back to English: %s" % english[x])
    print()
Original Language: fr
Translation: Le ciel est bleu, les étoiles sont loin
Back to English: The sky is blue, the stars are far away
Original Language: es
Translation: El cielo es azul, las estrellas están lejos.
Back to English: The sky is blue, the stars are far away.
Original Language: de
Translation: Der Himmel ist blau, die Sterne sind weit
Back to English: The sky is blue, the stars are wide
Original Language: hi
Translation: आकाश नीला है, तारे दूर हैं
Back to English: Sky is blue, stars are away
Original Language: ja
Translation: 天は青く、星は遠い。
Back to English: The heavens are blue and the stars are far away.
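Translating a mixed-language list in one call implies a grouping step. Inputs are grouped by source language so each model loads once, and results are then put back in input order. The sketch below shows one way to do that.

`translate_all`, `detect` and `translate_batch` are hypothetical stand-ins, not txtai APIs. Many pairwise MarianMT models on the Hugging Face Hub are published under the `Helsinki-NLP/opus-mt-{source}-{target}` pattern. Whether txtai resolves models exactly this way is an assumption here.

```python
from collections import defaultdict


def model_name(source, target):
    # Assumed naming scheme for pairwise models on the Hugging Face Hub
    return f"Helsinki-NLP/opus-mt-{source}-{target}"


def translate_all(texts, target, detect, translate_batch):
    """Translate a mixed-language list into target, preserving input order.

    detect(text) -> language code and translate_batch(model, texts) -> list of str
    are caller-supplied stand-ins (hypothetical, not txtai APIs).
    """
    # Texts already in the target language pass through unchanged
    results = list(texts)

    # Group (position, text) pairs by the model that handles their language pair
    groups = defaultdict(list)
    for i, text in enumerate(texts):
        source = detect(text)
        if source != target:
            groups[model_name(source, target)].append((i, text))

    # One batch call per model, so each model only needs to be loaded once
    for model, items in groups.items():
        outputs = translate_batch(model, [text for _, text in items])
        for (i, _), output in zip(items, outputs):
            results[i] = output
    return results


# Usage with toy stand-ins: a dict lookup instead of a real language detector
langs = {"Hola": "es", "Bonjour": "fr", "Gracias": "es", "Hello": "en"}


def fake_batch(model, batch):
    return [f"[{model}] {text}" for text in batch]


print(translate_all(list(langs), "en", langs.get, fake_batch))
```

A real implementation would also need a fallback, such as a multilingual model, for language pairs that have no dedicated model.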
Overall translation quality is very high, with only minor slips such as the German round trip rendering "far" as "wide". Machine translation has made great strides in the last couple of years. These models give developers a solid alternative to cloud translation services when translating on local servers is preferred.
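One quick, admittedly crude way to compare the round trips above is to measure word overlap between the original sentence and each back-translation. This sketch uses Jaccard similarity over lowercase word sets; `word_overlap` is a hypothetical helper. The measure penalizes valid paraphrases, so the Japanese round trip's "heavens" counts as a miss. It is no substitute for established metrics such as BLEU or chrF.

```python
import re


def word_overlap(a, b):
    """Jaccard overlap of lowercase word sets: a crude lexical proxy, not a quality metric."""
    words_a = set(re.findall(r"[a-z]+", a.lower()))
    words_b = set(re.findall(r"[a-z]+", b.lower()))
    if not words_a and not words_b:
        return 1.0
    return len(words_a & words_b) / len(words_a | words_b)


original = "The sky is blue, the stars are far"

# Back-translations copied from the output above
back = {
    "fr": "The sky is blue, the stars are far away",
    "es": "The sky is blue, the stars are far away.",
    "de": "The sky is blue, the stars are wide",
    "hi": "Sky is blue, stars are away",
    "ja": "The heavens are blue and the stars are far away.",
}

for language, text in back.items():
    print(f"{language}: {word_overlap(original, text):.3f}")
# fr: 0.875, es: 0.875, de: 0.750, hi: 0.625, ja: 0.500 (one per line)
```

The lower German and Hindi scores line up with the slips visible in the output. The Japanese score is low mostly because of the paraphrase, not because the translation is wrong.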