Abstract
- AWS Fault Injection Simulator (FIS) now supports Spot Interruptions: you can use it to trigger the interruption of an Amazon EC2 Spot Instance.
- With FIS, you can test the resiliency of your workload and validate that your application reacts to the interruption notices that EC2 sends before terminating your instances.
- This post guides you step by step through creating FIS experiment templates with the AWS CDK.
Table Of Contents
- Overview of EC2 spot instance
- Simulate Spot Interruptions architecture
- Create Lambda function - send slack
- Create event rule of spot interruption
- Create FIS service role
- Create FIS Experiment Template
- Start experiment template
- Conclusion
🚀 Overview of EC2 spot instance
- Amazon EC2 Spot Instances can cut costs by up to 90% compared with On-Demand prices, but EC2 can interrupt or reclaim them at any time, with only a two-minute warning.
- We can use aws-node-termination-handler to ensure that the Kubernetes control plane responds appropriately to events that can make an EC2 instance unavailable.
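To make the two-minute warning concrete: once EC2 schedules an interruption, the instance metadata path latest/meta-data/spot/instance-action returns a small JSON document such as {"action": "terminate", "time": "2017-09-18T08:22:00Z"}, which node-level handlers like aws-node-termination-handler poll in IMDS mode. The following is a minimal Python sketch, not the handler's actual code, that parses such a notice and computes the seconds left; the function name is hypothetical and the HTTP call to IMDS is left out.

```python
import json
from datetime import datetime, timezone


def seconds_until_interruption(notice_body: str, now: datetime) -> float:
    """Return seconds left before EC2 acts on a Spot interruption notice.

    notice_body is the JSON served at IMDS latest/meta-data/spot/instance-action,
    e.g. {"action": "terminate", "time": "2017-09-18T08:22:00Z"}.
    """
    notice = json.loads(notice_body)
    # The timestamp is UTC with a trailing "Z"; make it timezone-aware.
    action_time = datetime.strptime(notice["time"], "%Y-%m-%dT%H:%M:%SZ").replace(
        tzinfo=timezone.utc
    )
    # Never report a negative value once the deadline has passed.
    return max(0.0, (action_time - now).total_seconds())


if __name__ == "__main__":
    body = '{"action": "terminate", "time": "2017-09-18T08:22:00Z"}'
    now = datetime(2017, 9, 18, 8, 20, 0, tzinfo=timezone.utc)
    print(seconds_until_interruption(body, now))  # 120.0
```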
🚀 Simulate Spot Interruptions architecture
- Start the FIS experiment, which runs the send-spot-instance-interruptions action against the target Spot Instance.
- EC2 then emits an EC2 Spot Instance Interruption Warning event; a CloudWatch Events (Amazon EventBridge) rule catches it and triggers a Lambda function that sends a Slack notification.
- The aws-node-termination-handler Kubernetes DaemonSet also reacts to the interruption notice by cordoning and draining the node.
Now let's create the CDK stacks.
🚀 Create Lambda function - send slack
- The Lambda handler parses the event and sends a Slack message containing the event detail-type, the instance ID, and the action.
app.py
# requests is not included in the Lambda Python runtime, so bundle it into app.zip
import json
from datetime import datetime

import requests


def send_slack(msg):
    """Send payload to slack"""
    webhook_url = "https://hooks.slack.com/services/******"
    footer_icon = 'https://cdkworkshop.com/images/new-cdk-logo.png'
    color = '#36C5F0'
    level = ':white_check_mark: INFO :white_check_mark:'
    curr_time = datetime.now().strftime('%Y-%m-%d %H:%M:%S')
    payload = {
        "username": "Test",
        "attachments": [{
            "pretext": level,
            "color": color,
            "text": f"{msg}",
            "footer": f"{curr_time}",
            "footer_icon": footer_icon,
        }],
    }
    # A timeout keeps the function from hanging if Slack is unreachable
    requests.post(webhook_url, data=json.dumps(payload),
                  headers={'Content-Type': 'application/json'}, timeout=10)


def handler(event, context):
    # Fields of the "EC2 Spot Instance Interruption Warning" event
    detail_type = event.get('detail-type', '')
    instance_id = event['detail']['instance-id']
    action = event['detail']['instance-action']
    message = f'{detail_type}\nresource: {instance_id}, action: *{action}*'
    send_slack(message)
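Because send_slack posts to a real webhook, it helps to check the handler's parsing logic offline first. The sketch below copies the message-building lines of handler into a hypothetical build_message function and feeds it a sample event shaped like the EC2 Spot Instance Interruption Warning event; the account, Region, and instance IDs are placeholders.

```python
def build_message(event: dict) -> str:
    """Same formatting as handler() in app.py, minus the Slack call."""
    detail_type = event.get("detail-type", "")
    instance_id = event["detail"]["instance-id"]
    action = event["detail"]["instance-action"]
    return f"{detail_type}\nresource: {instance_id}, action: *{action}*"


# Sample event with placeholder account, Region, and instance IDs.
SAMPLE_EVENT = {
    "version": "0",
    "id": "12345678-1234-1234-1234-123456789012",
    "detail-type": "EC2 Spot Instance Interruption Warning",
    "source": "aws.ec2",
    "account": "123456789012",
    "time": "2021-11-05T08:20:00Z",
    "region": "ap-northeast-1",
    "resources": ["arn:aws:ec2:ap-northeast-1:123456789012:instance/i-1234567890abcdef0"],
    "detail": {"instance-id": "i-1234567890abcdef0", "instance-action": "terminate"},
}

if __name__ == "__main__":
    print(build_message(SAMPLE_EVENT))
```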
- Lambda stack
lambda.ts
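import * as lambda from 'aws-cdk-lib/aws-lambda';  // CDK v2 module path

// Python function that forwards the event to Slack (app.py zipped with requests)
const send_slack = new lambda.Function(this, 'slackLambda', {
  description: 'Send Event message to slack',
  runtime: lambda.Runtime.PYTHON_3_8,
  code: lambda.Code.fromAsset('lambda-code/app.zip'),
  handler: 'app.handler',  // <module>.<function> inside the zip
  functionName: 'send-slack-spot-event'
});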
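lambda.Code.fromAsset('lambda-code/app.zip') expects a ready-made archive, and because requests is not part of the Lambda Python runtime, the zip must bundle it next to app.py. One way to build it, sketched in Python under the assumption that app.py and its dependencies were first staged into a folder (for example with pip install requests -t build/), is to zip that folder so app.py sits at the archive root, which is what handler: 'app.handler' relies on. The folder names and the helper name are hypothetical.

```python
import os
import zipfile


def zip_lambda_package(build_dir: str, zip_path: str) -> list:
    """Zip every file under build_dir with paths relative to build_dir.

    With app.py at the top of build_dir, it lands at the archive root,
    so the handler string 'app.handler' resolves. Returns the entry names.
    """
    os.makedirs(os.path.dirname(zip_path) or ".", exist_ok=True)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for root, _dirs, files in os.walk(build_dir):
            for name in files:
                full_path = os.path.join(root, name)
                zf.write(full_path, os.path.relpath(full_path, build_dir))
    with zipfile.ZipFile(zip_path) as zf:
        return zf.namelist()


if __name__ == "__main__":
    # Hypothetical layout: build/ holds app.py plus the pip-installed packages.
    if os.path.isdir("build"):
        print(zip_lambda_package("build", "lambda-code/app.zip"))
```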
🚀 Create event rule of spot interruption
- The rule listens for EC2 Spot Instance Interruption Warning events whose instance-action is terminate and triggers the Lambda function above.
event.ts
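import * as event from 'aws-cdk-lib/aws-events';  // CDK v2 module paths
import * as event_target from 'aws-cdk-lib/aws-events-targets';

// Match only Spot interruption warnings whose action is "terminate"
const spot_event = new event.Rule(this, 'SpotEventRule', {
  description: 'Spot termination event rule',
  ruleName: 'spot-event',
  eventPattern: {
    source: ['aws.ec2'],
    detailType: ['EC2 Spot Instance Interruption Warning'],
    detail: { 'instance-action': ['terminate'] }
  }
});
// Invoke the Slack Lambda for every matching event
spot_event.addTarget(new event_target.LambdaFunction(send_slack));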
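To see why this rule fires only for terminate notices, here is a deliberately simplified matcher in Python: every leaf in the pattern is a list of allowed values, nested objects are matched recursively, and event fields the pattern does not mention are ignored. Real EventBridge patterns support more operators (prefix, numeric, exists, and others), so treat this as an illustration of exact-value matching only. In the deployed pattern, CDK's detailType property becomes the detail-type key.

```python
def matches(pattern: dict, event: dict) -> bool:
    """Simplified EventBridge matching for exact scalar values only.

    A list in the pattern means "the event value must equal one of these";
    a dict means "recurse into the same key of the event".
    """
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True


# The rule's pattern as deployed (CDK's detailType becomes "detail-type").
SPOT_PATTERN = {
    "source": ["aws.ec2"],
    "detail-type": ["EC2 Spot Instance Interruption Warning"],
    "detail": {"instance-action": ["terminate"]},
}

if __name__ == "__main__":
    event = {
        "source": "aws.ec2",
        "detail-type": "EC2 Spot Instance Interruption Warning",
        "detail": {"instance-id": "i-1234567890abcdef0", "instance-action": "terminate"},
    }
    print(matches(SPOT_PATTERN, event))  # True
    event["detail"]["instance-action"] = "stop"
    print(matches(SPOT_PATTERN, event))  # False
```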
🚀 Create FIS service role
- An IAM role that grants AWS FIS permission to act on the target resources, here EC2 instances.
fis_role.ts
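import * as iam from 'aws-cdk-lib/aws-iam';  // CDK v2 module path

// Role that the FIS service assumes while running the experiment
const fis_role = new iam.Role(this, 'FisRole', {
  roleName: 'spot-fis-test',
  assumedBy: new iam.ServicePrincipal('fis.amazonaws.com')
});

// Instance-level actions, scoped to instances in this stack's account and Region
const ec2_policy_sts = new iam.PolicyStatement({
  sid: 'SpotFisTest',
  effect: iam.Effect.ALLOW,
  actions: ['ec2:StopInstances', 'ec2:SendSpotInstanceInterruptions'],
  resources: [`arn:aws:ec2:${this.region}:${this.account}:instance/*`]
});
fis_role.addToPolicy(ec2_policy_sts);

// ec2:DescribeInstances does not support resource-level permissions, so it needs '*'
fis_role.addToPolicy(new iam.PolicyStatement({
  sid: 'SpotFisDescribe',
  effect: iam.Effect.ALLOW,
  actions: ['ec2:DescribeInstances'],
  resources: ['*'],
  conditions: { StringEquals: { 'aws:RequestedRegion': this.region } }
}));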
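For reference, a role like this boils down to two JSON documents: a trust policy that lets fis.amazonaws.com assume the role, and a permissions policy. The Python sketch below builds equivalent documents by hand, for example to compare with cdk synth output or to create the role outside CDK. The function name and the two-statement split are our own assumptions; ec2:DescribeInstances is granted on "*" because that action cannot be scoped to instance ARNs.

```python
import json


def fis_role_documents(region: str, account: str) -> tuple:
    """Build trust and permissions policies for a Spot-interruption FIS role."""
    trust_policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "fis.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }
    permissions_policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "SpotFisTest",
                "Effect": "Allow",
                "Action": ["ec2:StopInstances", "ec2:SendSpotInstanceInterruptions"],
                "Resource": f"arn:aws:ec2:{region}:{account}:instance/*",
            },
            {
                # Describe actions cannot be restricted to specific instances.
                "Sid": "SpotFisDescribe",
                "Effect": "Allow",
                "Action": "ec2:DescribeInstances",
                "Resource": "*",
            },
        ],
    }
    return trust_policy, permissions_policy


if __name__ == "__main__":
    trust, perms = fis_role_documents("ap-northeast-1", "123456789012")
    print(json.dumps(trust, indent=2))
    print(json.dumps(perms, indent=2))
```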
🚀 Create FIS Experiment Template
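- The experiment template includes:
  - Action: aws:ec2:send-spot-instance-interruptions, with parameter durationBeforeInterruption = PT2M (the instance is interrupted two minutes after the notice)
  - Targets:
    - Resource type: aws:ec2:spot-instance
    - Resource tags: eks:nodegroup-name=eks-airflow-nodegroup-pet
    - Resource filters: State.Name=running
    - Selection mode: COUNT(1) (one matching instance, picked at random)
- Stack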
fis.ts
import * as fis from 'aws-cdk-lib/aws-fis';  // CDK v2 module path

// Target: one running Spot Instance from the EKS node group
const target: fis.CfnExperimentTemplate.ExperimentTemplateTargetProperty = {
  resourceType: 'aws:ec2:spot-instance',
  resourceTags: {'eks:nodegroup-name': 'eks-airflow-nodegroup-pet'},
  selectionMode: 'COUNT(1)',
  filters: [{ path: 'State.Name', values: ['running'] }]
};

// Action: interrupt the instance 2 minutes after the interruption notice
const action: fis.CfnExperimentTemplate.ExperimentTemplateActionProperty = {
  actionId: 'aws:ec2:send-spot-instance-interruptions',
  parameters: {'durationBeforeInterruption': 'PT2M'},
  targets: {'SpotInstances': 'spot-fis-target'}  // key of the target defined below
};

const fis_exp = new fis.CfnExperimentTemplate(this, 'FisExperiment', {
  description: 'Spot Interruption Simulate',
  roleArn: fis_role.roleArn,
  tags: { 'Name': 'spot-interrupt-test', 'cdk': 'fis-stack' },
  stopConditions: [ {source: 'none'} ],  // no alarm-based stop condition
  targets: {'spot-fis-target': target},
  actions: {'send-spot-instance-interruptions': action}
});
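The same template can also be written as the request body of the FIS CreateExperimentTemplate API, which should work with aws fis create-experiment-template --cli-input-json. The Python sketch below assembles that body from the values in fis.ts and validates durationBeforeInterruption with a small minutes-only ISO 8601 parser. The 2 to 15 minute bound is our reading of the FIS action reference, the helper names are hypothetical, and the role ARN is a placeholder.

```python
import json
import re


def duration_minutes(iso_duration: str) -> int:
    """Parse a minutes-only ISO 8601 duration such as 'PT2M'."""
    match = re.fullmatch(r"PT(\d+)M", iso_duration)
    if match is None:
        raise ValueError(f"expected a value like 'PT2M', got {iso_duration!r}")
    return int(match.group(1))


def spot_experiment_template(role_arn: str, nodegroup: str, duration: str = "PT2M") -> dict:
    """Request body mirroring the CfnExperimentTemplate in fis.ts."""
    minutes = duration_minutes(duration)
    if not 2 <= minutes <= 15:  # assumed allowed range for durationBeforeInterruption
        raise ValueError("durationBeforeInterruption must be between PT2M and PT15M")
    return {
        "description": "Spot Interruption Simulate",
        "roleArn": role_arn,
        "stopConditions": [{"source": "none"}],
        "targets": {
            "spot-fis-target": {
                "resourceType": "aws:ec2:spot-instance",
                "resourceTags": {"eks:nodegroup-name": nodegroup},
                "filters": [{"path": "State.Name", "values": ["running"]}],
                "selectionMode": "COUNT(1)",
            }
        },
        "actions": {
            "send-spot-instance-interruptions": {
                "actionId": "aws:ec2:send-spot-instance-interruptions",
                "parameters": {"durationBeforeInterruption": duration},
                "targets": {"SpotInstances": "spot-fis-target"},
            }
        },
        "tags": {"Name": "spot-interrupt-test", "cdk": "fis-stack"},
    }


if __name__ == "__main__":
    body = spot_experiment_template(
        "arn:aws:iam::123456789012:role/spot-fis-test", "eks-airflow-nodegroup-pet"
    )
    print(json.dumps(body, indent=2))
```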
🚀 Start experiment template
- Start the experiment.
- The experiment completes.
- Slack receives the interruption notification, and aws-node-termination-handler acts on the event as well.
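The experiment can also be started programmatically. The sketch below assumes a boto3 FIS client (boto3.client("fis")), whose start_experiment and get_experiment calls return an experiment object with a state.status field; the client is passed in so the polling loop can be exercised with a fake, and run_experiment is a hypothetical helper name.

```python
import time
import uuid

# Statuses after which the experiment no longer changes.
# "cancelled" is included defensively in case the API reports it.
TERMINAL_STATES = {"completed", "stopped", "failed", "cancelled"}


def run_experiment(client, template_id: str, poll_seconds: float = 10.0) -> str:
    """Start an FIS experiment and block until it reaches a terminal state.

    client is expected to behave like boto3.client("fis").
    Returns the final status string.
    """
    started = client.start_experiment(
        clientToken=str(uuid.uuid4()), experimentTemplateId=template_id
    )
    experiment_id = started["experiment"]["id"]
    while True:
        state = client.get_experiment(id=experiment_id)["experiment"]["state"]
        if state["status"] in TERMINAL_STATES:
            return state["status"]
        time.sleep(poll_seconds)
```

With AWS credentials configured, a call such as run_experiment(boto3.client("fis"), "<template-id>") would start the template created by the stack and wait for it to finish.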
🚀 Conclusion
- This kind of FIS experiment helps us test the Spot interruption scenario: it checks that aws-node-termination-handler reacts correctly and that the application tolerates losing a node.
- We should also keep FIS pricing in mind: AWS FIS costs $0.10 per action-minute.
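A quick cost estimate is the total action-minutes multiplied by the rate. The sketch below uses the $0.10 figure quoted above (check the FIS pricing page for your Region) and Decimal to avoid floating-point rounding; the durations in the example are hypothetical.

```python
from decimal import Decimal

RATE_PER_ACTION_MINUTE = Decimal("0.10")  # USD, the figure quoted in this post


def fis_cost(action_minutes) -> Decimal:
    """Cost in USD for a list of per-action durations in minutes."""
    total = sum((Decimal(str(m)) for m in action_minutes), Decimal("0"))
    return (total * RATE_PER_ACTION_MINUTE).quantize(Decimal("0.01"))


if __name__ == "__main__":
    # Hypothetical: ten runs of the single-action experiment, about 2 minutes each.
    print(fis_cost([2] * 10))  # 2.00
```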