DEV Community


Updating the mapping of an Elasticsearch index

Aneesh Makala
A contemplative software engineer with a bias for Python.

Okay, so you've set up Elasticsearch. You've indexed your data. Search is super fast. All's good. But suddenly you have a requirement that forces you to change the mapping of your index. Maybe you need to use a different analyser, or maybe it's as simple as adding a new field to your documents, which requires adding the corresponding static mapping.

If you find yourself in such a situation, here are a few approaches you can take:

  • Approach 1 - with downtime; index from external data source.
    • This assumes that you have an external data source such as a database from which you can index data all over again, as if you were doing it for the first time.
    • When to use?
      • This approach only makes sense for testing in a local or staging environment. It should not be used in production, because the index is unavailable from the moment it is deleted until re-indexing finishes.
    • Steps
      • Delete the index using the Delete Index API
      • Create the index again with the Create Index API and set the new mapping, either in the same request or afterwards using the PUT Mapping API
      • Index documents from the external data source. You could do this using the Bulk API
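The three steps of Approach 1 could be sketched as a plain list of HTTP requests. This is only an illustration under a few assumptions: the helper names `build_bulk_body` and `plan_full_reindex` are made up for this post, the rows are assumed to carry an `id` field, and the mapping is passed in the Create Index request body rather than through a separate PUT Mapping call. The code only builds the requests; sending them with your HTTP client of choice is left out.

```python
import json


def build_bulk_body(index_name, rows, id_field="id"):
    """Build the newline-delimited JSON payload expected by the Bulk API."""
    lines = []
    for row in rows:
        # Action line: index this document under the given _id.
        action = {"index": {"_index": index_name, "_id": str(row[id_field])}}
        lines.append(json.dumps(action))
        # Source line: the document itself.
        lines.append(json.dumps(row))
    # Every line, including the last one, must end with a newline.
    return "".join(line + "\n" for line in lines)


def plan_full_reindex(index_name, mappings, rows):
    """Return the requests of Approach 1 as (method, path, body) tuples."""
    return [
        # 1. Delete the existing index (downtime starts here).
        ("DELETE", f"/{index_name}", None),
        # 2. Re-create it with the new mapping.
        ("PUT", f"/{index_name}", json.dumps({"mappings": mappings})),
        # 3. Index the rows read from the external data source.
        ("POST", "/_bulk", build_bulk_body(index_name, rows)),
    ]


if __name__ == "__main__":
    rows = [{"id": 1, "title": "first"}, {"id": 2, "title": "second"}]
    new_mappings = {"properties": {"title": {"type": "text"}}}
    for method, path, body in plan_full_reindex("articles", new_mappings, rows):
        print(method, path)
        if body:
            print(body)
```

For a large data source you would split the rows into several smaller bulk requests instead of sending a single one.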
  • Approach 2 - without downtime; index from external data source
    • When to use?
      • You could use this approach in production, but if you have a large number of documents, indexing from an external data source like a DB can be a time-consuming process.
    • Steps
      • If not done already, create an alias index_alias for your existing index (old_index) and change your code to use the alias instead of old_index directly.
      • Create a new index new_index with the new mapping
      • Index documents from the external data source into new_index. You could do this using the Bulk API
      • Move the alias index_alias from old_index to new_index.
    • Caveats
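      • While the downtime is essentially zero, there could still be consistency issues. For example, documents written to old_index through the alias while new_index is being built may be missing from new_index when the alias is moved.
      • Indexing from an external data source like a DB can take a long time if you have a large number of documents.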
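The last step of Approach 2, moving the alias, can be done in a single call to the `_aliases` endpoint. Both actions in one request are applied atomically, so searches never see the alias pointing at nothing. Here is a minimal sketch of the request body; the helper name `build_alias_swap` is made up for illustration, and the index and alias names are the placeholders used in the steps above.

```python
import json


def build_alias_swap(alias, old_index, new_index):
    """Build the body for POST /_aliases that moves `alias` in one request."""
    return {
        "actions": [
            # Detach the alias from the index with the old mapping...
            {"remove": {"index": old_index, "alias": alias}},
            # ...and attach it to the freshly built index.
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


if __name__ == "__main__":
    body = build_alias_swap("index_alias", "old_index", "new_index")
    print("POST /_aliases")
    print(json.dumps(body, indent=2))
```

Once the alias points at new_index and you are happy with the result, old_index can be deleted.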
  • Approach 3 - without downtime; index from elasticsearch
    • When to use?
    • Steps
      • If not done already, create an alias index_alias for your existing index (old_index) and change your code to use the alias instead of old_index directly.
      • Create a new index new_index with the new mapping
      • Use the Elasticsearch Reindex API to copy docs from old_index to new_index.
      • Move the alias index_alias from old_index to new_index.
    • Caveats
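For Approach 3, the copy from old_index to new_index is a single `_reindex` request. The sketch below only builds that request; `build_reindex_request` is a hypothetical helper name, and the `run_in_background` flag is my own shorthand for adding `wait_for_completion=false` to the URL, which makes Elasticsearch run the copy as a background task.

```python
import json


def build_reindex_request(source_index, dest_index, run_in_background=False):
    """Return (method, path, body) for copying docs between two indices."""
    path = "/_reindex"
    if run_in_background:
        # Return immediately and let the copy run as a task.
        path += "?wait_for_completion=false"
    body = {
        "source": {"index": source_index},
        "dest": {"index": dest_index},
    }
    return "POST", path, json.dumps(body)


if __name__ == "__main__":
    method, path, body = build_reindex_request(
        "old_index", "new_index", run_in_background=True
    )
    print(method, path)
    print(body)
```

Note that new_index has to exist with the new mapping before the reindex starts. Otherwise Elasticsearch creates it on the fly with dynamically inferred mappings, which defeats the purpose.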
  • Approach 4 - without downtime; update existing index
    • When to use?
      • Can be used in production when you merely want to add a new field mapping.
    • Steps
      • Update the mapping of the index online using the PUT Mapping API.
      • Run the _update_by_query API against the index so that existing documents pick up the new mapping, with these params:
        • conflicts=proceed
          • In the context of just picking up an online mapping change, a document that was updated while the process was running, and therefore hits a version conflict, has already picked up the new mapping through that update. Hence, version conflicts can be ignored.
        • wait_for_completion=false, so that it runs as a background task
        • refresh, so that the affected shards are refreshed and the changes are visible to search when the request completes.
    • Caveats
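The two requests of Approach 4 could look roughly like this. It is a sketch rather than a definitive recipe: `plan_online_mapping_update` is a hypothetical helper, the field `tags` in the usage example is invented, and the query string simply spells out the three params listed above.

```python
import json
from urllib.parse import urlencode


def plan_online_mapping_update(index_name, new_properties):
    """Return the two requests of Approach 4 as (method, path, body) tuples."""
    # 1. Add the new field mapping(s) to the live index.
    put_mapping = (
        "PUT",
        f"/{index_name}/_mapping",
        json.dumps({"properties": new_properties}),
    )
    # 2. Re-index every document in place so it picks up the new mapping.
    params = urlencode(
        {
            "conflicts": "proceed",          # skip docs updated in the meantime
            "wait_for_completion": "false",  # run as a background task
            "refresh": "true",               # refresh affected shards afterwards
        }
    )
    update_by_query = ("POST", f"/{index_name}/_update_by_query?{params}", None)
    return [put_mapping, update_by_query]


if __name__ == "__main__":
    requests_to_send = plan_online_mapping_update(
        "old_index", {"tags": {"type": "keyword"}}
    )
    for method, path, body in requests_to_send:
        print(method, path)
        if body:
            print(body)
```

Because of wait_for_completion=false, the second request returns a task id instead of the final result, and the progress can then be checked through the Tasks API.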
