DEV Community

Hasanul Islam


Elasticsearch: Snapshot and Restore with AWS S3

Elasticsearch provides a straightforward way to back up and restore data. In this tutorial, we will store backups in AWS S3: we will register an S3 bucket as a snapshot repository, take snapshots, restore from a snapshot, and create a cron job that takes a snapshot daily.

Snapshot :

A snapshot is a backup taken from a running Elasticsearch cluster. We can take a snapshot of individual indices or of the entire cluster. Snapshots are incremental: each snapshot of an index stores only the data that is not already part of an earlier snapshot in the same repository.

Snapshot Repository :

A snapshot repository is the storage location that holds snapshots. Snapshots can be stored in either local (shared file system) or remote repositories. Remote repositories can reside on AWS S3, HDFS, Azure, Google Cloud Storage, and other platforms supported by a repository plugin.

To retrieve information about all registered snapshot repositories:

curl -X GET "localhost:9200/_snapshot/_all?pretty"
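Setup scripts often need to know whether one particular repository is already registered before registering it. A minimal sketch, assuming Bash and curl: `repository_registered` and `ES_URL` are hypothetical names introduced here, and the check relies on `GET _snapshot/NAME` answering HTTP 200 for a registered repository and 404 otherwise.

```shell
#!/bin/bash
# Hypothetical helper; ES_URL defaults to the local node used in this tutorial.
ES_URL="${ES_URL:-http://localhost:9200}"

# Return 0 if the repository is registered, 1 otherwise. If the cluster
# cannot be reached, curl reports HTTP code 000 and the check also fails.
repository_registered() {
  local repo="$1" code
  code=$(curl -s -o /dev/null -w '%{http_code}' --connect-timeout 5 \
    "$ES_URL/_snapshot/$repo")
  [ "$code" = "200" ]
}
```

For example, `repository_registered REPOSITORY_NAME || echo "REPOSITORY_NAME is not registered yet"`.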

Registering Snapshot Repository :

We must register a repository before we can take snapshots and restore from it. To register AWS S3 as a snapshot repository, we will go through the following steps:

AWS Setup :

  • S3 Bucket: In this guide, we will create an S3 bucket named S3-BUCKET-NAME.
  • Custom Policy: We will create a custom policy with the following policy document:
{
  "Statement": [
    {
      "Action": [
        "s3:ListBucket",
        "s3:GetBucketLocation",
        "s3:ListBucketMultipartUploads",
        "s3:ListBucketVersions"
      ],
      "Effect": "Allow",
      "Resource": [
        "arn:aws:s3:::S3-BUCKET-NAME"
      ]
    },
    {
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:DeleteObject",
        "s3:AbortMultipartUpload",
        "s3:ListMultipartUploadParts"
      ],
      "Effect": "Allow",
      "Resource": [
        "arn:aws:s3:::S3-BUCKET-NAME/*"
      ]
    }
  ],
  "Version": "2012-10-17"
}
  • IAM User: Then, we will create an IAM user, attach the custom policy to it, and create an access key for it. We need to note its ACCESS_KEY_ID and SECRET_ACCESS_KEY.
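The bucket name appears twice in the policy document, so it can help to generate the document rather than edit it by hand. A sketch, assuming Bash and an AWS CLI configured with IAM permissions: `s3_snapshot_policy`, `create_snapshot_user`, and the `<user>-s3-snapshots` policy name are hypothetical, while the `aws iam` subcommands are standard AWS CLI calls.

```shell
#!/bin/bash
# Hypothetical helper: print the custom policy above for the given bucket.
s3_snapshot_policy() {
  local bucket="$1"
  cat <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket", "s3:GetBucketLocation",
                 "s3:ListBucketMultipartUploads", "s3:ListBucketVersions"],
      "Resource": ["arn:aws:s3:::${bucket}"]
    },
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject",
                 "s3:AbortMultipartUpload", "s3:ListMultipartUploadParts"],
      "Resource": ["arn:aws:s3:::${bucket}/*"]
    }
  ]
}
EOF
}

# Hypothetical helper: create the policy and the IAM user, attach the policy,
# and print a new access key pair.
create_snapshot_user() {
  local bucket="$1" user="$2" policy_file policy_arn status
  policy_file=$(mktemp)
  s3_snapshot_policy "$bucket" > "$policy_file"
  policy_arn=$(aws iam create-policy --policy-name "${user}-s3-snapshots" \
    --policy-document "file://$policy_file" --query 'Policy.Arn' --output text)
  status=$?
  rm -f "$policy_file"
  [ "$status" -eq 0 ] || return 1
  aws iam create-user --user-name "$user" > /dev/null || return 1
  aws iam attach-user-policy --user-name "$user" --policy-arn "$policy_arn" || return 1
  # Prints ACCESS_KEY_ID and SECRET_ACCESS_KEY separated by a tab.
  aws iam create-access-key --user-name "$user" \
    --query 'AccessKey.[AccessKeyId,SecretAccessKey]' --output text
}
```

The secret key is shown only once by `create-access-key`, so it should be stored safely right away.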

S3 Elasticsearch Plugin Installation :

  • Install the S3 repository plugin on every node, then restart each node so the plugin is loaded:
    cd /usr/share/elasticsearch
    sudo bin/elasticsearch-plugin install --batch repository-s3
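  • For a quick setup, add -Des.allow_insecure_settings=true to /etc/elasticsearch/jvm.options (this also requires a restart); it allows the access and secret keys to be passed directly in the repository settings, as in the registration command below. For a more secure setup, store the keys in the elasticsearch-keystore instead.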
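The keystore route keeps the keys out of jvm.options and out of the repository settings. A minimal sketch, assuming a package install under /usr/share/elasticsearch and the keys exported as AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY (a convention chosen here, not something Elasticsearch reads itself); `store_s3_credentials` and `ES_HOME` are hypothetical, while `s3.client.default.access_key` and `s3.client.default.secret_key` are the S3 plugin's settings for its default client. It has to run on every node, typically as root.

```shell
#!/bin/bash
# Hypothetical helper: copy the IAM user's keys from the environment into the
# Elasticsearch keystore of the local node.
store_s3_credentials() {
  local home="${ES_HOME:-/usr/share/elasticsearch}"
  if [ -z "${AWS_ACCESS_KEY_ID:-}" ] || [ -z "${AWS_SECRET_ACCESS_KEY:-}" ]; then
    echo "AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY must be set" >&2
    return 1
  fi
  # --stdin reads each value from standard input, keeping the keys out of the
  # keystore command's arguments; --force overwrites an existing entry
  # without prompting.
  printf '%s' "$AWS_ACCESS_KEY_ID" |
    "$home/bin/elasticsearch-keystore" add --stdin --force s3.client.default.access_key || return 1
  printf '%s' "$AWS_SECRET_ACCESS_KEY" |
    "$home/bin/elasticsearch-keystore" add --stdin --force s3.client.default.secret_key
}
```

After updating the keystore on all nodes, the new keys can be loaded with `curl -X POST "localhost:9200/_nodes/reload_secure_settings?pretty"` (or a restart), and access_key and secret_key can be left out of the repository settings.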

Snapshot Repository Registration :

To store backups in this S3 bucket, we must register it as a snapshot repository. We can do this from the command line:

curl -X PUT "localhost:9200/_snapshot/REPOSITORY_NAME?pretty" -H 'Content-Type: application/json' -d'
{
 "type": "s3",
 "settings": {
   "bucket": "S3-BUCKET-NAME",
   "region": "AWS_REGION",
   "access_key": "ACCESS_KEY_ID",
   "secret_key": "SECRET_ACCESS_KEY"
 }
}
'
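Registration verifies the repository by default, but the verify API can be called again at any time (for example after rotating keys) to make every node write to and read from the bucket. A sketch assuming the keystore setup mentioned above: the body mirrors the registration command minus the keys and names the plugin's "default" client; `s3_repository_body`, `register_and_verify`, and `ES_URL` are hypothetical names.

```shell
#!/bin/bash
# Hypothetical helpers; ES_URL defaults to the local node.
ES_URL="${ES_URL:-http://localhost:9200}"

# Repository settings without credentials: the S3 plugin then uses the keys
# stored in the keystore for its "default" client.
s3_repository_body() {
  local bucket="$1" region="$2"
  printf '{"type": "s3", "settings": {"bucket": "%s", "region": "%s", "client": "default"}}\n' \
    "$bucket" "$region"
}

# Register the repository, then ask every node to verify that it can use it.
register_and_verify() {
  local repo="$1" bucket="$2" region="$3"
  curl -fsS -X PUT "$ES_URL/_snapshot/$repo?pretty" \
    -H 'Content-Type: application/json' \
    -d "$(s3_repository_body "$bucket" "$region")" || return 1
  curl -fsS -X POST "$ES_URL/_snapshot/$repo/_verify?pretty"
}
```

A successful verification lists the nodes that could access the repository; a permissions or credentials problem shows up as an error instead.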

Taking Snapshot :

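We can take a snapshot of the running Elasticsearch cluster with the following command. SNAPSHOT_NAME must be unique within REPOSITORY_NAME and must be lowercase. With wait_for_completion=true, the request returns only after the snapshot has finished; ignore_unavailable skips listed indices that are missing or closed instead of failing, and include_global_state: false leaves the global cluster state (such as persistent settings and index templates) out of the snapshot.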

curl -X PUT "localhost:9200/_snapshot/REPOSITORY_NAME/SNAPSHOT_NAME?wait_for_completion=true&pretty" -H 'Content-Type:application/json' -d'
{
  "indices": "index_1,index_2",
  "ignore_unavailable": true,
  "include_global_state": false
}
'
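Because a snapshot name can be used only once per repository and must be lowercase, a timestamped name avoids collisions when snapshots are taken more than once a day. A minimal sketch; `make_snapshot_name` is a hypothetical helper.

```shell
#!/bin/bash
# Hypothetical helper: lowercase the prefix and append a UTC timestamp down to
# the second, giving names like <prefix>-YYYY.MM.DD-HHMMSS.
make_snapshot_name() {
  local prefix="${1:-snapshot}"
  printf '%s-%s\n' \
    "$(printf '%s' "$prefix" | tr '[:upper:]' '[:lower:]')" \
    "$(date -u +'%Y.%m.%d-%H%M%S')"
}
```

For example, `SNAPSHOT_NAME=$(make_snapshot_name Nightly)` produces a name of the form nightly-YYYY.MM.DD-HHMMSS, which can be used in the PUT request above.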

All snapshots currently stored in the repository can be listed using the following command:

curl -X GET "localhost:9200/_snapshot/REPOSITORY_NAME/_all?pretty"
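For a quick health check, the compact `_cat/snapshots` output is easier to filter in a script than the full JSON listing. A sketch assuming Bash, curl, and awk; `snapshot_statuses`, `unsuccessful_snapshots`, and `ES_URL` are hypothetical names, and `id` and `status` are columns of the cat snapshots API.

```shell
#!/bin/bash
# Hypothetical helpers; ES_URL defaults to the local node.
ES_URL="${ES_URL:-http://localhost:9200}"

# Print one "id status" line per snapshot in the repository.
snapshot_statuses() {
  curl -fsS "$ES_URL/_cat/snapshots/$1?h=id,status"
}

# Keep only snapshots whose status is not SUCCESS (e.g. PARTIAL, FAILED, IN_PROGRESS).
unsuccessful_snapshots() {
  awk 'NF >= 2 && $2 != "SUCCESS" { print $1, $2 }'
}
```

For example, `snapshot_statuses REPOSITORY_NAME | unsuccessful_snapshots` prints nothing when every snapshot completed successfully.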

Restoring From Snapshot :

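To restore indices from a snapshot in S3, we can run the following. Elasticsearch cannot restore an index over an open index with the same name, so rename_pattern and rename_replacement restore index_1 as restored_index_1 (and so on) next to the existing indices; include_aliases: false skips restoring their aliases.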

curl -X POST "localhost:9200/_snapshot/REPOSITORY_NAME/SNAPSHOT_NAME/_restore?pretty" -H 'Content-Type: application/json' -d'
{
  "indices": "index_1,index_2",
  "ignore_unavailable": true,
  "include_global_state": false,              
  "rename_pattern": "index_(.+)",
  "rename_replacement": "restored_index_$1",
  "include_aliases": false
}
'
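rename_pattern is a regular expression applied to each restored index name, and $1 in rename_replacement refers to its first capture group, so it can be worth previewing the resulting names before restoring. A rough sketch using `sed -E`; it is only an approximation, since Elasticsearch evaluates the pattern as a Java regular expression, and sed writes the back-reference as \1 rather than $1. `preview_rename` is a hypothetical helper.

```shell
#!/bin/bash
# Hypothetical helper: read index names (one per line) and print "old -> new".
# Write the back-reference as \1 instead of $1, and avoid '/' in the pattern
# because it is the sed delimiter here. Names that do not match stay unchanged.
preview_rename() {
  local pattern="$1" replacement="$2" name
  while read -r name; do
    printf '%s -> %s\n' "$name" \
      "$(printf '%s\n' "$name" | sed -E "s/${pattern}/${replacement}/g")"
  done
}
```

For example, `printf '%s\n' index_1 index_2 | preview_rename 'index_(.+)' 'restored_index_\1'` prints `index_1 -> restored_index_1` and `index_2 -> restored_index_2`.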

Monitoring Snapshot and Restore Progress :

We can monitor the progress of a snapshot with the following command:

curl -X GET "localhost:9200/_snapshot/REPOSITORY_NAME/SNAPSHOT_NAME/_status?pretty"
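The _status output shows per-shard progress; a script that starts a snapshot without wait_for_completion usually only needs the overall state that the get snapshot API reports (IN_PROGRESS, SUCCESS, PARTIAL, FAILED). A sketch assuming Bash, curl, and jq; `snapshot_state`, `wait_for_snapshot`, and `ES_URL` are hypothetical names. For a restore, shard recovery progress can be followed in a similar way with `GET restored_index_1/_recovery?pretty`.

```shell
#!/bin/bash
# Hypothetical helpers; ES_URL defaults to the local node and jq must be installed.
ES_URL="${ES_URL:-http://localhost:9200}"

# Print the overall state of one snapshot, e.g. IN_PROGRESS or SUCCESS.
snapshot_state() {
  local json
  json=$(curl -fsS "$ES_URL/_snapshot/$1/$2") || return 1
  jq -r '.snapshots[0].state' <<< "$json"
}

# Poll every INTERVAL seconds while the snapshot is IN_PROGRESS.
# Returns 0 on SUCCESS, 1 on a failure state, 2 on an unexpected response.
wait_for_snapshot() {
  local repo="$1" snap="$2" interval="${3:-10}" state
  while true; do
    state=$(snapshot_state "$repo" "$snap") || return 2
    case "$state" in
      SUCCESS) return 0 ;;
      IN_PROGRESS) sleep "$interval" ;;
      FAILED|PARTIAL|INCOMPATIBLE)
        echo "snapshot $snap ended in state $state" >&2; return 1 ;;
      *) echo "unexpected state '$state' for $snap" >&2; return 2 ;;
    esac
  done
}
```

For example, `wait_for_snapshot REPOSITORY_NAME SNAPSHOT_NAME 30 && echo done` checks every 30 seconds.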

Daily Backup :

  • Creating a Bash Script: We will create a Bash script, e.g. daily_elastic_search_backup.sh, as follows:
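#!/bin/bash
# Stop on the first failing command, unset variable, or failed pipeline stage.
set -euo pipefail

TODAY=$(date +'%Y.%m.%d')
echo "Today ($TODAY) indices will be stored in S3."

ELASTIC_SEARCH_HOST="localhost"
ELASTIC_SEARCH_PORT="9200"
REPOSITORY_NAME="REPOSITORY_NAME"
# Snapshot names must be lowercase and unique within the repository,
# so this script can run only once per day.
SNAPSHOT_NAME="snapshot-$TODAY"

echo "Starting snapshot $SNAPSHOT_NAME"

# --fail makes curl exit non-zero on an HTTP error (for example, when a snapshot
# with this name already exists), so set -e aborts before the success message.
# It does not check the snapshot state reported in the response body.
curl --fail --silent --show-error -X PUT "$ELASTIC_SEARCH_HOST:$ELASTIC_SEARCH_PORT/_snapshot/$REPOSITORY_NAME/$SNAPSHOT_NAME?wait_for_completion=true" -H 'Content-Type: application/json' -d'
{
  "indices": "index_1,index_2",
  "ignore_unavailable": true,
  "include_global_state": false
}
'
echo
echo "Successfully completed storing $SNAPSHOT_NAME in S3"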
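  • Adding the Script to Crontab: Make the script executable (chmod +x /home/ubuntu/daily_elastic_search_backup.sh), then add the following line to the crontab (crontab -e) to take a backup every day at midnight. Cron uses the server's time zone, so this runs at 12 AM UTC only if the server clock is set to UTC: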
0 0 * * * /home/ubuntu/daily_elastic_search_backup.sh > /home/ubuntu/daily_elastic_search_backup.log 2>&1
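Daily snapshots accumulate in the bucket, so a retention step is a natural companion to the cron job. A sketch assuming Bash, curl, GNU date, and the snapshot-YYYY.MM.DD names produced by the script above; `snapshots_before`, `prune_snapshots`, `ES_URL`, and the retention period are hypothetical, while deletion uses the `DELETE _snapshot/REPOSITORY_NAME/SNAPSHOT_NAME` API. Because snapshots are incremental, deleting an old snapshot only removes files that no remaining snapshot still references.

```shell
#!/bin/bash
# Hypothetical retention helpers; ES_URL defaults to the local node.
ES_URL="${ES_URL:-http://localhost:9200}"

# Read snapshot names (one per line) and print those dated before CUTOFF
# (YYYY.MM.DD). Names that do not follow the daily naming scheme are skipped.
snapshots_before() {
  local cutoff="$1" name day
  while read -r name; do
    [[ "$name" == snapshot-* ]] || continue
    day="${name#snapshot-}"
    [[ "$day" =~ ^[0-9]{4}\.[0-9]{2}\.[0-9]{2}$ ]] || continue
    # Zero-padded dates compare correctly as strings.
    if [[ "$day" < "$cutoff" ]]; then
      printf '%s\n' "$name"
    fi
  done
}

# Delete snapshots older than KEEP_DAYS days (GNU date syntax).
# If the listing request fails, nothing is deleted.
prune_snapshots() {
  local repo="$1" keep_days="$2" cutoff name
  cutoff=$(date -d "-$keep_days days" +'%Y.%m.%d')
  curl -fsS "$ES_URL/_cat/snapshots/$repo?h=id" | snapshots_before "$cutoff" |
    while read -r name; do
      echo "Deleting snapshot $name"
      curl -fsS -X DELETE "$ES_URL/_snapshot/$repo/$name" > /dev/null
    done
}
```

For example, calling `prune_snapshots REPOSITORY_NAME 30` at the end of the daily script keeps roughly the last 30 daily snapshots.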

Top comments (6)

Sujit

thanks for the tutorial.
if you have time kindly update steps with the latest version of elasticsearch.

toughcoding

I did recently such article for elasticsearch 8.1

Mustafa Qamar-ud-Din

The SLM kind of takes care of the automation now; elastic.co/guide/en/elasticsearch/...

AATHITH RAJENDRAN

In "Snapshot Repository Registration" any way to hide access key n secrete key?? @hasanul

toughcoding

it should be stored in elasticsearch-keystore file which can be password protected as well. recently wrote an article about it.
in repository definition you only keep "bucket" "client" "region" and "endpoint"