Denis Anisimov
Automatic Firestore Backups

Cloud Firestore has export/import functionality, exposed via the CLI and REST APIs, that allows you to make simple backups to a Cloud Storage bucket. It basically looks like this:

gcloud beta firestore export gs://BUCKET_NAME --project PROJECT_ID

If you are as clumsy as me and prone to messing up your database, you'd probably want to have it backed up fairly often. Unfortunately, even with Firestore in GA, there is still no managed automatic backup option. Here is what my solution looked like for a while:

Backup reminders in Google Calendar

Not very DRY, is it? Here is a better way using only Google Cloud Platform services and a tiny bit of coding.

Overall idea

  1. Cloud Scheduler periodically publishes a message to a Cloud PubSub topic based on a cron schedule.
  2. A Cloud Function is triggered by the message and makes a call to the Firestore REST API.
  3. The Firestore export API starts a long-running operation that saves a backup to a specified bucket.
  4. Cloud Storage stores the backups, organized by timestamps.

Prerequisites

This post assumes that you have your Firebase project set up, have the gcloud command-line tool installed, and know your way around the Cloud Console.

Cloud Scheduler

Cloud Scheduler is a very simple GCP service, but nevertheless very useful and much awaited by the serverless crowd. With it, it's finally possible to define simple cron-like jobs that can trigger HTTP functions or emit a PubSub message. For use with Cloud Functions, the PubSub route is preferred, as HTTP jobs don't have any authentication.

The following command creates a PubSub job scheduled to run every day at midnight. The message body is not used in our case, so it can be arbitrary.

gcloud scheduler jobs create         \
    pubsub firestore-backup          \
    --schedule "0 0 * * *"           \
    --topic "firestore-backup-daily" \
    --message-body "scheduled"            

Once you execute this command, your job will run at the specified time.
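
You can verify that the job was created as expected (depending on your gcloud version, this command may still require the beta component):

gcloud scheduler jobs describe firestore-backup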

Cloud Function

The Cloud Function is triggered by the PubSub message and makes a request to the Firestore REST API. To make a request with proper authentication, we need an OAuth access token. Luckily for us, Cloud Functions have a default service account that can easily be used to generate a token.

import * as functions from 'firebase-functions';
import * as firebase from 'firebase-admin';
import * as request from 'request-promise';

firebase.initializeApp();

export const backupOnPubSub = functions.pubsub.topic('firestore-backup-daily').onPublish(async () => {
  // Get bucket name from cloud functions config
  // Should be in the format 'gs://BUCKET_NAME'
  //
  // Can be set with the following command:
  //   firebase functions:config:set backup.bucket="gs://BUCKET_NAME" 
  const bucket = functions.config().backup.bucket;

  // Firebase/GCP project ID is available as an env variable
  const projectId = process.env.GCLOUD_PROJECT;

  console.info(`Exporting firestore database in project ${projectId} to bucket ${bucket}`);

  // Use default service account to request OAuth access token to authenticate with REST API
  // Default service account must have an appropriate role assigned or
  // the request authentication will fail
  const { access_token: accessToken } = await firebase.credential.applicationDefault().getAccessToken();

  const uri = `https://firestore.googleapis.com/v1/projects/${projectId}/databases/(default):exportDocuments`;
  const result = await request({
    method: 'POST',
    uri,
    auth: { bearer: accessToken },
    body: {
      outputUriPrefix: bucket,
    },
    json: true,
  });

  // The returned operation name can be used to track the result of the long-running operation
  //   gcloud beta firestore operations describe "OPERATION_NAME"
  const { name } = result;
  console.info(`Export operation started ${name}`);
});

Don't forget to set the config and deploy:

firebase functions:config:set backup.bucket="gs://BUCKET_NAME"
firebase deploy --only functions:backupOnPubSub

Firestore REST API permissions

The import/export REST API requires appropriate permissions. You can grant them by adding the Cloud Datastore Import Export Admin role to the default service account of your project.

gcloud projects add-iam-policy-binding PROJECT_ID \
    --member serviceAccount:PROJECT_ID@appspot.gserviceaccount.com \
    --role roles/datastore.importExportAdmin

Cloud Storage

In case you don't have a bucket yet, you should create one. I've found that Firestore import/export requires at least a regional storage class; Coldline or Nearline won't work.

gsutil mb gs://BUCKET_NAME/

Testing everything together

Once everything is set up, you can trigger the whole process from the Cloud Scheduler console by clicking the "Run now" button.

Cloud Scheduler console
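
Alternatively, you can trigger the job from the command line (again, older gcloud versions may need the beta component):

gcloud scheduler jobs run firestore-backup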

Check the logs in the Firebase Console or Stackdriver:
Backup function logs

Check the export operation status:

gcloud beta firestore operations describe "projects/PROJECT_ID/databases/(default)/operations/ASA3NDEwOTg0NjExChp0bHVhZmVkBxJsYXJ0bmVjc3Utc2Jvai1uaW1kYRQKLRI"
done: true
metadata:
  '@type': type.googleapis.com/google.firestore.admin.v1.ExportDocumentsMetadata
  endTime: '2019-03-09T21:04:39.263534Z'
  operationState: SUCCESSFUL
  outputUriPrefix: gs://PROJECT_ID.appspot.com/2019-03-09T21:04:32_85602
  progressBytes:
    completedWork: '5360'
    estimatedWork: '4160'
  progressDocuments:
    completedWork: '40'
    estimatedWork: '40'
  startTime: '2019-03-09T21:04:32.862729Z'

And finally, check the backup in the bucket:
Backup folder in the bucket

Next steps

Automatic backup is only useful if it's working reliably. It is a good idea to have monitoring and alerting for your backup operations.

One way to do this is to save the name of the last export operation to Firestore and schedule a second job some time later to check the result of the long-running operation. If the operation did not succeed, the function can send an email or log an error that triggers an alert through Stackdriver Error Reporting.
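
Here is a minimal sketch of that idea, using the same imports as the function above. The backup_operations collection and the firestore-backup-check topic are assumptions for illustration, not something set up earlier in this post:

// In backupOnPubSub, after the export request, persist the operation name
// so it can be checked later ('backup_operations' is a hypothetical collection):
//
//   await firebase.firestore().collection('backup_operations').doc('latest').set({
//     name,
//     startedAt: firebase.firestore.FieldValue.serverTimestamp(),
//   });

// A second function, scheduled some time after the backup
// (the 'firestore-backup-check' topic is hypothetical)
export const checkBackupOnPubSub = functions.pubsub.topic('firestore-backup-check').onPublish(async () => {
  const snapshot = await firebase.firestore().collection('backup_operations').doc('latest').get();
  const data = snapshot.data();
  if (!data) return;

  const { access_token: accessToken } = await firebase.credential.applicationDefault().getAccessToken();

  // A long-running operation can be read back with a GET on its name
  const operation = await request({
    method: 'GET',
    uri: `https://firestore.googleapis.com/v1/${data.name}`,
    auth: { bearer: accessToken },
    json: true,
  });

  if (!operation.done || operation.error) {
    // Logging at error level surfaces the failure in Stackdriver Error Reporting
    console.error(`Backup operation ${data.name} did not succeed`, operation.error);
  }
});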

Another improvement is automatic deletion of old backups after a certain time. This can be achieved with Cloud Storage lifecycle management.
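
For example, a lifecycle rule that deletes objects older than 30 days could look like this (the retention period and the lifecycle.json file name are just placeholders):

cat > lifecycle.json <<EOF
{
  "rule": [
    { "action": { "type": "Delete" }, "condition": { "age": 30 } }
  ]
}
EOF
gsutil lifecycle set lifecycle.json gs://BUCKET_NAME

Note that a rule like this applies to every object in the bucket, so it's best used with a dedicated backup bucket.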

Closing remarks

I would love to hear your thoughts on this approach and how it could be improved.

Happy coding! Make backups :)
