
Auto Translating Search with Algolia & IBM Watson

Martyn Davies

Out of the box, Algolia supports search in multiple languages, even on the same index. This is fantastic if you already have all the translations in place, but what if you don't?

If you have users in other parts of the world, they will appreciate any effort you put into providing native language support.

I'm going to show you how you can use IBM Watson's language translation service to automatically translate your key search field and write the translations back to your index.

In this example, we'll be using NodeJS, but the concept applies no matter what language you're using on the server side. To highlight this, I've included an example written in Go in the GitHub repository.

Context

It's really up to you which text you want to translate but for this example, let's assume that we have an Algolia index full of holiday rental properties and the initial objects look like this:

{
  "apartment_name": "Maison Majestique",
  "city": "Toulouse",
  "country": "France",
  "description_en": "Three bedrooms and two bathrooms. Located 5 minutes walk to all major tourism areas.",
  "objectID": "60329230"
}

Right now, the descriptions are all in English, but there's an increasing amount of traffic coming to the website from Spain. We're seeing more and more attempts to search in Spanish in our Algolia dashboard.

So the time has come to automate translating the description from English to Spanish.

Prerequisites

For all of these examples, and the scripts you can take away and use, we're using the IBM Watson Language Translation service.

To use this yourself, you'll need to register with IBM Cloud, and then spin up a new instance of Language Translator.

Once it's ready, you'll need to grab the credentials and keep them handy. Unfortunately, the user experience in the IBM console isn't as easy as it could be, so to help you out, this is what you're looking for:

[Image: IBM Credentials]

Let's dig in, shall we?

How the translator works

IBM provides a series of full-featured SDKs for just about every language. However, previous experience with using their NodeJS SDK showed me that, in the case of the translation service, making a standard HTTP request to their API would be around the same amount of code, and likely faster to return results.

Here's an example of a translation request using Axios as our HTTP request module in NodeJS:

var axios = require('axios');

axios({
  method: 'post',
  url: 'https://gateway.watsonplatform.net/language-translator/api/v2/translate',
  data: {
    text: 'I am text, please translate me', // The words to be translated
    source: 'en', // The language they are in
    target: 'es' // The language you want them to be
  },
  headers: { Accept: 'application/json' },
  auth: { username: "ibm_username", password: "ibm_password" }
})
.then(function(response) {
  console.log(response);
})
.catch(function(err) {
  console.log(err);
});

The response body that comes back from this request contains a translations array, and the translated text you're looking for sits on the first object in it.

For whatever reason it's nested quite deeply, so to actually extract the string of text, you would be looking for this:

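...
.then(function(response) {
  // The translated string is on the first item of the translations array
  var translation = response.data.translations[0].translation;
  console.log(translation);
})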

😐
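
Because that path is easy to get wrong, a small guard can help. The following is a minimal sketch rather than part of the original scripts: extractTranslation is a hypothetical helper name, and it assumes only the response shape shown above.

```javascript
// Hypothetical helper: pulls the translated string out of an Axios response
// shaped like { data: { translations: [ { translation: '...' } ] } }.
function extractTranslation(response) {
  var translations = response && response.data && response.data.translations;

  // Fail loudly if the service returned nothing usable
  if (!Array.isArray(translations) || translations.length === 0) {
    throw new Error('No translations found in the response');
  }

  return translations[0].translation;
}

// Usage with a stubbed response, so no network call is needed
var example = { data: { translations: [{ translation: 'Hola mundo' }] } };
console.log(extractTranslation(example)); // prints "Hola mundo"
```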

Either way, it's not a huge amount of code. So very quickly we're up and running with our translations.

So, how do we get this information back into Algolia and start using it to impress our audience?

There are a number of different ways to do this and we're going to take a look at two of them.

The first is to update an object with a new translation immediately after it is indexed:

Example - Using waitTask

Algolia provides a method called waitTask that allows you to ensure an object has been indexed before performing the next line of code.

You can use this as a way of triggering an action on an object, like adding a new field with a translated string in it:
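
What follows is a minimal sketch of that flow, not a definitive implementation. It assumes version 3 of the algoliasearch client (whose methods return promises when no callback is passed) and a hypothetical translate(text, source, target) helper that wraps the IBM request shown earlier and resolves with the translated string. The function name addAndTranslate is also made up for illustration.

```javascript
// Sketch only: `index` is an Algolia index created with algoliasearch v3, and
// `translate` is a hypothetical helper wrapping the IBM translation request.
function addAndTranslate(index, translate, apartment) {
  var objectID;

  return index.addObject(apartment)
    .then(function(content) {
      objectID = content.objectID;
      // Wait until Algolia confirms the indexing task has finished
      return index.waitTask(content.taskID);
    })
    .then(function() {
      // Translate the English description into Spanish
      return translate(apartment.description_en, 'en', 'es');
    })
    .then(function(translation) {
      // Add the new field without touching the rest of the object
      return index.partialUpdateObject({
        objectID: objectID,
        description_es: translation
      });
    });
}
```

Passing index and translate in as arguments keeps the sketch independent of any particular credentials or setup.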

The code above certainly achieves what we need but there are some downsides to doing it this way.

  • Getting the translation is dependent on calling a 3rd party service that may not always respond in a timely manner.
  • We're assuming that the Spanish translation needs to be created immediately after the object is first indexed. Realistically, does it need to happen this quickly?
  • It's not very clean. You could abstract the translation into a function in an external module that returns a promise, and then just pass the objectID over to that.

If you're thinking about abstracting this code away into a module, then consider the next option: Not performing the translation immediately, and instead opting to have this added as part of a background task, CRON job or worker queue.

Example - Translate in the background with CRON, queues or magic

For this example, I've created a script very similar to what you see above, but that could be run on the command line instead, like this:

$ node translate.js 345645 es

Here we're asking Node to run translate.js, passing in the Algolia objectID 345645 and a target language of es (for Spanish).

The translate.js script looks like this:

This script is more suited to use with CRON, but you could also have the algoliaObjectID and targetLanguage variables set from anywhere, including the contents of a message that's being held in a queue. The choice of how you get those arguments in is really up to you.

Which way is better?

Both approaches have their merits.

The first example is low overhead - you're not messing around with queues or external files and things being triggered outside of the flow of the code.

The second example gives you greater control outside of your main application logic, and you're less likely to be affected by requests (in this case to IBM) that fail or take a long time to return a result.

If you're learning to use Algolia or working on projects that have a lower throughput of objects to be indexed, then the first example is perfectly fine to use.

If you're working on something that's more established, or you don't like the idea of having such a large amount of logic sitting in the callback of your index.addObject method then, by all means, use the second approach and tailor it however you like.

How to search across multiple languages

I won't go into too much depth on front-end search using Algolia because there's plenty to read on that in the documentation.

Let's say your index settings are set to have description_en as the main search field:

index.setSettings({
  searchableAttributes: ["description_en"]
});

All you would need to do to make your new description_es field searchable would be to make a small change:

index.setSettings({
  // Each attribute must be its own string in the array
  searchableAttributes: ["description_en", "description_es"]
});

Then you're all set! There's also a very handy Helper Widget that can be used to offer search choices (including languages) for users to select themselves.
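
If you'd rather limit a query to one language yourself, Algolia's restrictSearchableAttributes search parameter is one option. This is a minimal sketch, assuming both description fields are already listed in searchableAttributes; buildSearchParams is a hypothetical helper name.

```javascript
// Hypothetical helper: builds Algolia search parameters that limit matching
// to the description field for the chosen language.
function buildSearchParams(query, language) {
  return {
    query: query,
    restrictSearchableAttributes: ['description_' + language]
  };
}

// e.g. index.search(buildSearchParams('dos baños', 'es')) with the v3 client
console.log(buildSearchParams('dos baños', 'es'));
```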

You said something about Go?

Yes, given this example is very NodeJS heavy, I wanted to create the sample script in another language so you could see the similarities, but also to demonstrate that this can be achieved in whatever language you want.

Check out the Go example on GitHub.

All the scripts, including a version of what you see outlined here written using ES6 principles, can be found over on GitHub.
