DEV Community

Furkan Kalkan

Sorting Multilanguage Text Properly on OpenSearch

If you store multi-language or non-English content in OpenSearch, the default sort will not order it alphabetically. The default sort compares the Unicode values of characters, which works for English but fails for most other languages. The OpenSearch documentation does not address this problem. The Elasticsearch documentation mentions a plugin named analysis-icu that solves this issue [1]. OpenSearch supports this plugin too [2]. There is not much information about the OpenSearch-specific version of the plugin, but its usage is the same as the Elasticsearch one:
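To see the problem outside OpenSearch, here is a minimal Python sketch. The Turkish word list is an assumption chosen for illustration; Python's built-in `sorted` compares strings by code point, which is the same kind of ordering described above.

```python
# Illustrative word list (an assumption for this sketch, not data from a cluster).
words = ["zeytin", "çay", "elma", "şeker"]

# sorted() compares strings code point by code point.
by_code_point = sorted(words)

# Code points of the first letters show why "ç" and "ş" land after "z".
first_letter_code_points = [ord(w[0]) for w in by_code_point]

print(by_code_point)
print(first_letter_code_points)
```

In Turkish alphabetical order the list would start with "çay" and end with "zeytin", but the code point comparison pushes "çay" and "şeker" to the end.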

  1. Install the plugin on each node:

    /usr/share/opensearch/bin/opensearch-plugin install analysis-icu --batch
    


    If you use Kubernetes, you can do this with an init container. Don't forget to mount the plugin directory /usr/share/opensearch/plugins/ in both containers.

  2. After installing the plugin, restart your nodes.

  3. Add a sort sub-field to your fields [3]:

{
  "mappings": {
    "properties": {
      "title": {   
        "type": "text",
        "fields": {
          "sort": {  
            "type": "icu_collation_keyword",
            "index": false
          }
        }
      }
    }
  }
}

If your content is in a single language, you can add the language and country parameters after type. You can also add the numeric: true parameter so that numbers inside the text sort in numeric order.
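As a rough sketch of what that could look like, the snippet below builds the same mapping with the extra parameters and prints it as JSON. The values "tr" and "TR" (Turkish) are assumptions picked for illustration; substitute the locale of your own content.

```python
import json

# Mapping sketch: same shape as the mapping above, plus locale options.
# "tr" / "TR" are example values, not something the original setup prescribes.
mapping = {
    "mappings": {
        "properties": {
            "title": {
                "type": "text",
                "fields": {
                    "sort": {
                        "type": "icu_collation_keyword",
                        "index": False,
                        "language": "tr",
                        "country": "TR",
                        "numeric": True,  # sort digit runs by numeric value
                    }
                },
            }
        }
    }
}

# Serialize to the JSON body you would send when creating the index.
mapping_json = json.dumps(mapping, indent=2)
print(mapping_json)
```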

Since the sub-field is used only for sorting, set "index": false to turn off indexing for it.
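To actually use the sub-field, reference it by its full path in the sort clause of a search request. The sketch below builds such a request body; the `match_all` query is only a placeholder assumption, and the body would be sent to the `_search` endpoint of whichever index uses the mapping.

```python
import json

# Search body sketch: match everything, order results by the collation sub-field.
# "title.sort" follows the mapping above; match_all is a placeholder query.
search_body = {
    "query": {"match_all": {}},
    "sort": [{"title.sort": {"order": "asc"}}],
}

search_json = json.dumps(search_body, indent=2)
print(search_json)
```

Queries still go against the main title field; only the ordering comes from title.sort.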


  1. https://www.elastic.co/guide/en/elasticsearch/plugins/current/analysis-icu.html  

  2. https://opensearch.org/docs/1.3/install-and-configure/plugins/#bundled-plugins 

  3. https://www.elastic.co/guide/en/elasticsearch/plugins/current/analysis-icu-collation-keyword-field.html 
