When you face SSH issues over EC2 Instance👨💻☁️, What procedure do you follow ?
- Stop the Instance --> detach root volume --> attach volume to rescue Instance --> and then further steps to troubleshoot issue
- "This procedure will save a lot of time which we invest with manual process. AWS Systems Manager Automation Document - "AWSSupport-ExecuteEC2Rescue" does all the steps for you in an automated fashion."
Here I will discuss about most common SSH issue on Linux instance.
Also, I will guide you through Console way to get familiar with the workflow. If you're interested in using AWS CLI, please check out my Blog Post
- This Automation document executes EC2Rescue for Linux on an offline instance (which does not need to have the SSM agent installed or be user-accessible) by creating rescue resources, moving the root volume of the target instance to the rescue instance, and later reattaching the root volume to the original instance. - All will be done via an automated set of steps in the document, which are as listed here:
To know more on this document walkthrough, please check here.
Before you begin with next steps, you should have:
- Required: "Instance ID" of the unreachable instance. You will specify this ID in the procedure.
- In addition, there are some optional parameters, which you can refer here
- The IAM role for this execution. If no role is specified, AWS Systems Manager Automation will use your IAM permissions of the user logged in to execute this document. To know more on granting permissions by using IAM policies, please refer here
Now, I will show you how to use AWS Systems Manager Automation Document - "AWSSupport-ExecuteEC2Rescue" in real time use-case for a common Linux SSH issue.
Checking SSH verbose output:
# ssh -vvv -i "eu-west-2_key_pair.pem" firstname.lastname@example.org
Getting permission denied error😖😖
Now , let’s use "AWSSupport-ExecuteEC2Rescue" Automation Document to Fix this issue:
• Open the AWS Systems Manager via EC2 Console -- Type "Systems Manager"
Some Information on Input Parameters:💭💭
Mandatory / Required:
- UnreachableInstanceId : (Required) ID of your unreachable EC2 instance. IMPORTANT: AWS Systems Manager Automation stops this instance, and creates an AMI before attempting any operations. Data stored in instance store volumes will be lost. The public IP address will change if you are not using an Elastic IP.
- EC2RescueInstanceType : (Required) The EC2 instance type for the EC2Rescue instance. Recommended size: t2.small. (by default it is auto-selected )
Optional but could be really useful:
- LogDestination: (Optional) S3 bucket name in your account where you want to upload the troubleshooting logs. > Make sure the bucket policy does not grant unnecessary read/write > permissions to parties that do not need access to the collected logs.
• Then procedure runs EC2Rescue for Linux over helper Instance to Fix the Issue and you can also track the steps below:
Please Note –
“It may show a step failed for Windows since the Instance is Linux,
So don't worry about it”🙆♂️👍
• Now, checking the Instance state again -
You can see the Rescue / SSM enabled Helper Instance has been terminated and Automation have Started the Problematic Unreachable Instance again after fixing the issue:
I am able to SSH and Issue has been fixed now. ✅✅🏁🏁
Hence, I have also identified what fixes have been applied by EC2Rescue for Linux over instance to fix the issue in an automated way.
AWSSupport-ExecuteEC2Rescue is a new Automation document that automates all the steps required to fix common issues on your unreachable Windows & Linux instance using respective EC2Rescue for Linux & EC2Rescue for Windows tool tools, which is a framework for executing diagnostic and troubleshooting modules for analyzing and remediating issues.
Overall, It will save a lot of time which is being invested with manual process of detach, attach volume and further steps to troubleshoot issues. This document does all the steps for you in an automated way in just few minutes.
With the integration between CloudWatch Events and Systems Manager
Automation, you can run AWSSupport-ExecuteEC2Rescue automatically in
response to an event in your infrastructure.
Also, I would like to mention that above I've guided you through Console way to get familiar with the workflow. If you're interested in using AWS CLI, please check out my Blog Post
Thanks for reading.
Any feedback, please write it to me here in comments..
Also, 🤝🤗You can connect with me🤝🤗