neetu-mallan for AWS Community Builders

Resetting RDS retention period using Step Functions & Lambda

What is the Retention Period?

The retention period on an RDS instance decides how long automated backups are stored.
When a DB instance is created from the console, the backup retention period defaults to 7 days, which means automated backups of the instance are kept for 7 days.
In our case, the people who created an instance often forgot to reset this value when backups weren't needed.

Setting the retention period to 0 disables automated backups for the DB instance. This suited our Dev environments, where we had no need to store database backups.
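
As a rough illustration, disabling automated backups through the SDK comes down to a single `modify_db_instance` call. This is a minimal sketch, not the code from the post: the helper name `disable_backups` is hypothetical, and `rds` is assumed to be a boto3 RDS client (for example `boto3.client("rds")`) supplied by the caller.

```python
def disable_backups(rds, db_instance_id):
    """Set the backup retention period of one RDS instance to 0.

    `rds` is assumed to be a boto3 RDS client, e.g. boto3.client("rds").
    """
    return rds.modify_db_instance(
        DBInstanceIdentifier=db_instance_id,
        BackupRetentionPeriod=0,  # 0 disables automated backups
        ApplyImmediately=True,    # apply now, not in the next maintenance window
    )
```

The instance has to be in the available state for the call to succeed, which is why stopped instances need the extra start and stop steps.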

When we did this clean-up manually every week, setting the retention period to 0 turned out to be tedious: if an instance was stopped, we had to start it and wait for it to become available, modify it, wait for the modification to complete, and then stop the instance again.

Repeat this for a lot of instances and you could easily spend half a day on it.

To automate this, I decided on a serverless solution comprising Step Functions, Lambda and EventBridge Scheduler.

Prerequisites:

  1. An IAM role for the Step Functions state machine that allows it to invoke the Lambda functions and to perform start, stop and modify actions on the RDS instances.

  2. An IAM role for the Lambda functions that allows them to send callback responses to Step Functions (SendTaskSuccess and SendTaskFailure).
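
The permissions described in the two prerequisites might look roughly like the policy documents below, written as Python dictionaries so they can be dumped to JSON. This is a sketch under assumptions: the actual policies are not reproduced in the text, the `"*"` resources are placeholders, and `rds:DescribeDBInstances` is included on the assumption that the workflow also needs to read the instance status.

```python
import json

# Assumed policy for the Step Functions execution role.
STATE_MACHINE_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["lambda:InvokeFunction"],
            "Resource": "*",  # placeholder; scope down where possible
        },
        {
            "Effect": "Allow",
            "Action": [
                "rds:DescribeDBInstances",  # assumption: needed to read status
                "rds:StartDBInstance",
                "rds:StopDBInstance",
                "rds:ModifyDBInstance",
            ],
            "Resource": "*",  # placeholder; scope down where possible
        },
    ],
}

# Assumed policy for the Lambda execution role (callback permissions only).
LAMBDA_CALLBACK_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["states:SendTaskSuccess", "states:SendTaskFailure"],
            "Resource": "*",
        }
    ],
}

if __name__ == "__main__":
    print(json.dumps(STATE_MACHINE_POLICY, indent=2))
    print(json.dumps(LAMBDA_CALLBACK_POLICY, indent=2))
```

If the Lambda functions call RDS themselves (for example to modify an instance or to run a waiter), their role would need the matching `rds:` actions as well.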

The state machine has the following states:

  1. Lambda Invocation
  2. Map state
  3. Choice state
  4. Workflow 1 when the DB is in stopped state
  5. Workflow 2 when the DB is in starting/rebooting/backing-up state.
  6. Workflow 3 when the DB is in available state.
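
The routing done by the Choice state can be modelled in a few lines of Python. This is only an illustration of the branching logic listed above, not the state machine definition itself; the workflow labels and the function name `choose_workflow` are hypothetical, and the status strings are the `DBInstanceStatus` values named in the list.

```python
# Hypothetical labels for the three branches listed above.
WORKFLOW_BY_STATUS = {
    "stopped": "Workflow1",
    "starting": "Workflow2",
    "rebooting": "Workflow2",
    "backing-up": "Workflow2",
    "available": "Workflow3",
}


def choose_workflow(db_instance_status):
    """Return the branch for a DBInstanceStatus value, or None if unhandled."""
    return WORKFLOW_BY_STATUS.get(db_instance_status)
```

Returning `None` for any other status mirrors the need for a default branch in a real Choice state, so that instances in an unexpected state do not fail the whole execution.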

Let's go over each state using a short demo:

1. Lambda Invocation: [AWS SDK Integrations]

Lambda Invocation State
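
The text does not show what this first Lambda function returns. One plausible shape, assuming it collects the instances that the Map state then iterates over, is sketched below. The function name `list_instances_with_backups` and the output fields are assumptions; `rds` is assumed to be a boto3 RDS client.

```python
def list_instances_with_backups(rds):
    """Return id and status of every instance whose retention period is > 0.

    `rds` is assumed to be a boto3 RDS client, e.g. boto3.client("rds").
    """
    instances = []
    paginator = rds.get_paginator("describe_db_instances")
    for page in paginator.paginate():
        for db in page["DBInstances"]:
            if db.get("BackupRetentionPeriod", 0) > 0:
                instances.append(
                    {
                        "DBInstanceIdentifier": db["DBInstanceIdentifier"],
                        "DBInstanceStatus": db["DBInstanceStatus"],
                    }
                )
    return instances
```

Each item carries the status, so a downstream Choice state could branch on it without another describe call.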

2. Map & Choice states:

Map & Choice States

3. DB is in Stopped state:

Workflow 1 Explanation

In the Lambda function discussed in the demo, the SDK waiter API is used to wait until the instance reaches the available state.
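
A minimal sketch of how a waiter and the callback can fit together is shown here. It is not the author's function: the name `wait_and_callback`, the error name and the output payload are hypothetical, and `rds` and `sfn` are assumed to be boto3 clients for `rds` and `stepfunctions`.

```python
import json


def wait_and_callback(rds, sfn, db_instance_id, task_token):
    """Wait for the instance to become available, then resume the workflow.

    `task_token` is the token the state machine passed to the Lambda function.
    """
    try:
        waiter = rds.get_waiter("db_instance_available")
        waiter.wait(DBInstanceIdentifier=db_instance_id)
    except Exception as exc:  # e.g. a waiter timeout
        sfn.send_task_failure(
            taskToken=task_token,
            error="WaitFailed",  # hypothetical error name
            cause=str(exc)[:256],
        )
        return False
    sfn.send_task_success(
        taskToken=task_token,
        output=json.dumps({"DBInstanceIdentifier": db_instance_id}),
    )
    return True
```

Because a Lambda invocation can run for at most 15 minutes, the wait has to finish within that limit or the task should be reported as failed.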

In the same function, I had to introduce a lag of a few seconds between the modify_db_instance SDK call and the waiter call, because the modification only starts a few seconds after the call is fired.

Note: Without this lag, the DB is still in the available state when the waiter is called, so the waiter returns immediately and the task token is sent back. The workflow then moves to the next state before the modification has been applied.
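
The ordering described above (modify, short lag, then wait) can be sketched as follows. The function name `modify_then_wait` is hypothetical and the default of 30 seconds is an assumed value, since the text only says "a few seconds"; `rds` is assumed to be a boto3 RDS client, and `sleep` is injectable so the lag can be replaced in tests.

```python
import time


def modify_then_wait(rds, db_instance_id, lag_seconds=30, sleep=time.sleep):
    """Set retention to 0, pause briefly, then wait for 'available' again."""
    rds.modify_db_instance(
        DBInstanceIdentifier=db_instance_id,
        BackupRetentionPeriod=0,
        ApplyImmediately=True,
    )
    # Assumed lag: gives the instance time to leave the 'available' state,
    # otherwise the waiter below would return immediately.
    sleep(lag_seconds)
    waiter = rds.get_waiter("db_instance_available")
    waiter.wait(DBInstanceIdentifier=db_instance_id)
```

A fixed lag is the simplest option; polling the instance status until it changes would be a more defensive variant of the same idea.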

Let's see how the execution flow works for a single DB instance in Stopped state.

Demo: Step Function execution for Instance in Stopped state

The entire execution for the single instance takes around 14 minutes.

4. DB is in starting/rebooting states:

Workflow 2 Explanation

Let's see how the execution flow works for a single DB instance in Rebooting state.

Demo for Instance in Rebooting State

The entire execution for the single instance takes almost 5 minutes.

5. DB is in Available state:

Workflow 3 Explanation

Let's see how the execution flow works for a single DB instance in Available state.

Demo: Step Function execution for Instance in Available state

The entire execution for the single instance takes almost 3 seconds.

We can introduce an EventBridge schedule to execute this state machine twice a week or once a month to automate the process further.
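
A schedule of this kind could be described with a request like the one below, intended for the `create_schedule` call of the boto3 `scheduler` client. It is a sketch under assumptions: the schedule name, the chosen days and the 06:00 run time are hypothetical, and the ARNs must be supplied by the caller.

```python
# Hypothetical cron expressions (evaluated in UTC unless a time zone is set).
TWICE_A_WEEK = "cron(0 6 ? * MON,THU *)"  # Mondays and Thursdays at 06:00
ONCE_A_MONTH = "cron(0 6 1 * ? *)"        # first day of each month at 06:00


def build_schedule_request(state_machine_arn, role_arn, expression):
    """Build keyword arguments for scheduler.create_schedule()."""
    return {
        "Name": "rds-retention-reset",  # hypothetical schedule name
        "ScheduleExpression": expression,
        "FlexibleTimeWindow": {"Mode": "OFF"},
        "Target": {
            "Arn": state_machine_arn,  # state machine to start
            "RoleArn": role_arn,       # role that may start the execution
            "Input": "{}",             # empty JSON input for the execution
        },
    }
```

With real ARNs, the request would be passed as `boto3.client("scheduler").create_schedule(**build_schedule_request(...))`.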

I have shown a single use case, setting the RDS retention period to 0 using Step Functions, but the same approach can be adapted to other use cases that need a bulk modify action on RDS instances.

What did I learn from this exercise?
The callback pattern is a powerful feature of Step Functions. I had spent some time adding Wait states after the start/modify SDK calls, but quickly realised that the wait time could not be predicted because it varies with the DB engine. That's when the callback pattern came to my rescue.
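
On the state machine side, the callback pattern is expressed as a Task state whose resource ends in `.waitForTaskToken` and whose payload passes `$$.Task.Token` to the Lambda function. The sketch below builds such a state as a Python dictionary; the helper name, the payload field names and the example function and state names are assumptions, not taken from the post's definition.

```python
import json


def callback_task_state(function_name, next_state):
    """Build a Task state that pauses until the Lambda sends the token back."""
    return {
        "Type": "Task",
        "Resource": "arn:aws:states:::lambda:invoke.waitForTaskToken",
        "Parameters": {
            "FunctionName": function_name,
            "Payload": {
                # Assumed input field holding the instance identifier.
                "DBInstanceIdentifier.$": "$.DBInstanceIdentifier",
                # Context object value that identifies this paused task.
                "taskToken.$": "$$.Task.Token",
            },
        },
        "Next": next_state,
    }


if __name__ == "__main__":
    print(json.dumps(callback_task_state("StartAndWait", "ModifyInstance"), indent=2))
```

The execution stays paused at this state until the function calls SendTaskSuccess or SendTaskFailure with the same token, so no fixed Wait duration has to be guessed.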

Please find the code in my github repo:

https://github.com/neetu-mallan/retentionperiodreset

What Resources did I use?

To learn and understand Step Functions, I went through the AWS documentation:

https://docs.aws.amazon.com/step-functions/latest/dg/welcome.html

https://www.youtube.com/watch?v=jXxKRd_9nC0 -- this Step Functions crash course by Manoj Fernando really helped me understand how Step Functions help in practical use cases.

The two links below helped me understand the callback pattern implementation:

https://docs.aws.amazon.com/step-functions/latest/dg/callback-task-sample-sqs.html

https://docs.aws.amazon.com/step-functions/latest/dg/connect-to-resource.html

Top comments (3)

Felipe Oliveira Gutierrez

Thanks for this demo. I am learning AWS and this use case sounds very necessary.
One note. I think it is not a good idea to have a sleep function to wait for the instance to start. It can have some flaws. I think if you add a loop with a check of the instance state and a sleep inside it the waiting operation would be less error prone.

neetu-mallan

Sure, Felipe! Thanks for the comment. I will change the wait into a loop and retest!

Avinash Dalvi

Detailed blog. Thanks for writing. A different way to manage the retention period. 👍🏻👍🏻👍🏻