DEV Community

Talha Munir 🇵🇸

Backing up and restoring in EDB BigAnimal


BigAnimal automatically backs up the data in your PostgreSQL clusters. The object storage solution it uses depends on the cloud provider your cluster is deployed on:

  • AWS: Amazon S3 (standard tier)
  • Azure: Azure Blob Storage
  • Google Cloud: Cloud Storage

The organization that purchased the subscription is responsible for all charges related to the object storage solution.

PostgreSQL clusters in BigAnimal are continuously backed up through a combination of base backups and transaction log (WAL) archiving. When a new cluster is created, an initial base backup is taken. After that, whenever a WAL file is closed (by default, every 5 minutes), it is uploaded to the cloud object storage solution. If your cluster has faraway replicas, BigAnimal copies the WAL files from your cloud object storage solution and asynchronously transfers them to the faraway replicas.
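The closing condition described above can be sketched in a few lines; this is a minimal Python model, not BigAnimal code, and the function name is hypothetical. The 16 MB figure is PostgreSQL's default WAL segment size, and the 5-minute timeout mirrors the default WAL-closing interval mentioned above:

```python
WAL_SEGMENT_BYTES = 16 * 1024 * 1024   # default PostgreSQL WAL segment size
ARCHIVE_TIMEOUT_SECONDS = 5 * 60       # default interval for closing a WAL file

def wal_segment_ready(bytes_written: int, seconds_open: float) -> bool:
    """A WAL file is closed and uploaded to object storage when it either
    fills to the segment size or the timeout elapses, whichever comes first."""
    return (bytes_written >= WAL_SEGMENT_BYTES
            or seconds_open >= ARCHIVE_TIMEOUT_SECONDS)
```

On a busy cluster the size condition triggers first; on a quiet one, the timeout bounds how long a record can sit in an unclosed WAL file.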

Replication lag with faraway replicas:

With faraway replicas, the primary server writes to the archive, which is moved to the object store, and the replica reads from the object store as files arrive. Before an archive file can be shipped to the replica, at least 16 MB of data must be written to the WAL to cut a new archive file. If the interval for closing a WAL file is too long, the replica may not receive the latest records for some time. The time it takes to fill the 16 MB log file and copy it to the archive is the replication lag, and it is also the window during which data can be lost. That time depends entirely on the amount of write activity in your database. Measure the replication lag to determine whether it is acceptable from a data-loss perspective. If it isn't, consider using a distributed high-availability cluster instead.
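As a rough illustration of that lag window, the worst-case wait for a record to reach the archive can be estimated from the WAL write rate. This is a simplified sketch under stated assumptions: real lag also includes upload, download, and replay time, and the 5-minute closing interval caps the wait on quiet systems. The function name is hypothetical:

```python
WAL_SEGMENT_BYTES = 16 * 1024 * 1024   # default PostgreSQL WAL segment size
ARCHIVE_TIMEOUT_SECONDS = 5 * 60       # default interval for closing a WAL file

def estimated_lag_seconds(wal_bytes_per_second: float) -> float:
    """Estimate the worst-case time for a WAL record to reach the archive:
    the time to fill the current 16 MB segment, capped by the closing interval."""
    if wal_bytes_per_second <= 0:
        # No write activity: the timeout alone closes the file.
        return float(ARCHIVE_TIMEOUT_SECONDS)
    fill_time = WAL_SEGMENT_BYTES / wal_bytes_per_second
    return min(fill_time, ARCHIVE_TIMEOUT_SECONDS)
```

For example, a cluster writing 16 MB of WAL per second fills a segment in about a second, while a near-idle cluster waits the full closing interval.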


