Dinesh Rathee

👨‍💻👩‍💻How to use AWS Systems Manager Automation Document🧐to fix✅SSH issue on a Linux EC2 Instance☁️📢

When you face SSH issues on an EC2 instance👨‍💻☁️, what procedure do you follow?

  • Stop the instance --> detach the root volume --> attach the volume to a rescue instance --> and then follow further steps to troubleshoot the issue

Right?🤔💭
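For reference, the manual procedure above could be scripted roughly as follows. This is only a sketch with boto3: the function name, the `/dev/sdf` device name and the IDs in the usage comment are placeholders I made up, and you need an EC2 client with enough permissions. The client is passed in as an argument so the function can be tried with a stub.

```python
def move_root_volume(ec2, broken_id, rescue_id, volume_id, device="/dev/sdf"):
    """Stop the broken instance and move its root volume to a rescue instance."""
    # Stop the unreachable instance and wait until it is fully stopped.
    ec2.stop_instances(InstanceIds=[broken_id])
    ec2.get_waiter("instance_stopped").wait(InstanceIds=[broken_id])

    # Detach the root volume and wait until it can be attached elsewhere.
    ec2.detach_volume(VolumeId=volume_id)
    ec2.get_waiter("volume_available").wait(VolumeIds=[volume_id])

    # Attach it to the rescue instance as a secondary device (assumed name).
    ec2.attach_volume(VolumeId=volume_id, InstanceId=rescue_id, Device=device)
    ec2.get_waiter("volume_in_use").wait(VolumeIds=[volume_id])


# Hypothetical usage (all IDs are placeholders):
#   import boto3
#   move_root_volume(boto3.client("ec2"), "i-broken", "i-rescue", "vol-root")
```

After this you would still have to mount the volume, fix the problem by hand, and move the volume back, which is exactly the part that takes time.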

Here's something to automate this procedure⏩⏩

  • "This procedure saves much of the time we spend on the manual process. The AWS Systems Manager Automation document "AWSSupport-ExecuteEC2Rescue" performs all of these steps for you in an automated fashion."

How does it work?👨‍🏭

  • It uses the EC2Rescue for Linux and EC2Rescue for Windows tools. Each is a framework for executing diagnostic and troubleshooting modules that analyze and remediate issues on Linux and Windows instances.

Here I will discuss a common SSH issue on a Linux instance.


Also, I will guide you through the Console workflow so you can get familiar with it. If you're interested in using the AWS CLI, please check out my blog post.


How does "AWSSupport-ExecuteEC2Rescue" work in the background?💻

  • This Automation document runs EC2Rescue for Linux on an offline instance (which does not need to have the SSM agent installed or be user-accessible). It creates rescue resources, moves the root volume of the target instance to the rescue instance, and later reattaches the root volume to the original instance. All of this is done through an automated set of steps in the document, shown in the following diagram:

[Figure: How the AWS Systems Manager Automation document "AWSSupport-ExecuteEC2Rescue" works in the background]

To learn more, see the document walkthrough here.

Prerequisites🔎⚙️

Before you begin with the next steps, you should have:

  • Required: the "Instance ID" of the unreachable instance. You will specify this ID in the procedure.
  • In addition, there are some optional parameters, which you can find here
  • The IAM role for this execution. If no role is specified, AWS Systems Manager Automation uses the permissions of the IAM user who runs the document. To learn more about granting permissions with IAM policies, please refer here

I will explain this in more detail with the real-world example below.🎯🎯

Now, I will show you how to use the AWS Systems Manager Automation document "AWSSupport-ExecuteEC2Rescue" in a real-world use case for a common Linux SSH issue.

Issue😨😨:

(I changed the permissions of the /home directory to 777, and now I am not able to SSH in and I get "Permission denied")🚫😜

Checking SSH verbose output:

# ssh -vvv -i "eu-west-2_key_pair.pem" ec2-user@ec2-xxxxxxxxxxx.eu-west-2.compute.amazonaws.com

Getting permission denied error😖😖
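
Why would a 777 change break key-based login? A likely cause, assuming the change also reached the user's home directory or ~/.ssh, is the sshd StrictModes option: with it enabled, sshd ignores authorized_keys when the home directory, ~/.ssh or the key file is writable by group or others. Here is a minimal sketch of that kind of check in Python. The helper names are made up for illustration, and this is not the code that sshd or EC2Rescue actually runs.

```python
import os
import stat


def is_group_or_world_writable(path):
    """Return True if group or others have write permission on path."""
    mode = os.stat(path).st_mode
    return bool(mode & (stat.S_IWGRP | stat.S_IWOTH))


def find_loose_paths(paths):
    """Return the existing paths that are writable by group or others."""
    return [p for p in paths if os.path.exists(p) and is_group_or_world_writable(p)]


if __name__ == "__main__":
    # Check the current user's home directory and SSH files.
    home = os.path.expanduser("~")
    candidates = [
        home,
        os.path.join(home, ".ssh"),
        os.path.join(home, ".ssh", "authorized_keys"),
    ]
    for path in find_loose_paths(candidates):
        print("too permissive:", path)
```

Of course, on an instance you cannot log in to, you cannot run a check like this directly, which is why the offline approach below is useful.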

Now, let's use the "AWSSupport-ExecuteEC2Rescue" Automation document to fix this issue:

• Open AWS Systems Manager from the EC2 console by typing "Systems Manager".

• In the navigation pane, choose Automation, then choose Execute Automation.

• Under the Automation document categories, choose "Self service support workflows", select the document named "AWSSupport-ExecuteEC2Rescue", and click "Next".

• Provide the Instance ID of the unreachable instance (required) and click the Execute button to start the Automation workflow.

Some information on the input parameters:💭💭

Mandatory / Required:

  • UnreachableInstanceId: (Required) ID of your unreachable EC2 instance. IMPORTANT: AWS Systems Manager Automation stops this instance and creates an AMI before attempting any operations. Data stored in instance store volumes will be lost. The public IP address will change if you are not using an Elastic IP.
  • EC2RescueInstanceType: (Required) The EC2 instance type for the EC2Rescue instance. Recommended size: t2.small (selected by default).

Optional but could be really useful:

  • LogDestination: (Optional) S3 bucket name in your account where you want to upload the troubleshooting logs. Make sure the bucket policy does not grant unnecessary read/write permissions to parties that do not need access to the collected logs.
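
If you prefer code over clicking, the same parameters can be passed through the SSM `start_automation_execution` API. Below is a minimal sketch with boto3, not an official example: the helper names and the IDs in the usage comment are my own placeholders, and only the three parameters described above are covered. Automation expects each parameter value as a list of strings.

```python
DOCUMENT_NAME = "AWSSupport-ExecuteEC2Rescue"


def build_parameters(instance_id, instance_type=None, log_bucket=None):
    """Build the Parameters mapping for the Automation document."""
    # Every value is wrapped in a list, as the Automation API expects.
    params = {"UnreachableInstanceId": [instance_id]}
    if instance_type:
        params["EC2RescueInstanceType"] = [instance_type]
    if log_bucket:
        params["LogDestination"] = [log_bucket]
    return params


def start_rescue(ssm, instance_id, **options):
    """Start the Automation and return its execution ID."""
    response = ssm.start_automation_execution(
        DocumentName=DOCUMENT_NAME,
        Parameters=build_parameters(instance_id, **options),
    )
    return response["AutomationExecutionId"]


# Hypothetical usage (instance ID and bucket name are placeholders):
#   import boto3
#   execution_id = start_rescue(boto3.client("ssm"), "i-0123456789abcdef0",
#                               log_bucket="my-rescue-logs")
```

Omitted optional parameters fall back to the defaults defined in the document itself.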

Now, let's proceed further.

• Once you click the "Execute" button, the Automation starts and you can see the status "In Progress".

• To see more details, click the "Execution ID".

Checking what it does in the background:

• The procedure creates an SSM helper / recovery instance with the name tag AWS-Support-EC2Rescue-I-xxxx

• It also creates an AMI as a backup before it executes further steps.

• It stops the problematic unreachable instance, detaches its root volume, and attaches that volume to the SSM-enabled recovery / helper instance.

• The procedure then runs EC2Rescue for Linux on the helper instance to fix the issue. You can track the steps as they run.

• You can always check the details of each "Execution ID" and its associated steps using the "Step ID". For example, you can open the details of "Step-2".

Please Note

“It may show a failed step for Windows because the instance is Linux,
so don't worry about it”🙆‍♂️👍
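
The per-step details shown in the Console can also be read through the `describe_automation_step_executions` API. Here is a small sketch, assuming you already have the execution ID; the function name is my own. It returns each step's name and status, so a failed Windows-only step on a Linux run is easy to spot.

```python
def list_steps(ssm, execution_id):
    """Return (step name, step status) pairs for one Automation execution."""
    steps, token = [], None
    while True:
        kwargs = {"AutomationExecutionId": execution_id}
        if token:
            kwargs["NextToken"] = token
        page = ssm.describe_automation_step_executions(**kwargs)
        steps.extend((s["StepName"], s["StepStatus"]) for s in page["StepExecutions"])
        # Keep following NextToken until the last page is reached.
        token = page.get("NextToken")
        if not token:
            return steps


# Hypothetical usage (the execution ID is a placeholder):
#   import boto3
#   for name, status in list_steps(boto3.client("ssm"), "your-execution-id"):
#       print(name, status)
```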

• You can monitor the overall status of the procedure in the Execution Status tab under Automation Executions. Wait for it to reach "Success", which marks the run as complete.
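
Instead of refreshing the Console, you could poll the status with the `get_automation_execution` API. This is a rough sketch: the function name, the polling interval and the set of statuses I treat as final are my own choices, and the API can return other statuses that are not handled here.

```python
import time

# Statuses treated as final in this sketch (an assumption, not a full list).
TERMINAL_STATUSES = {"Success", "Failed", "Cancelled", "TimedOut"}


def wait_for_execution(ssm, execution_id, poll_seconds=15, max_polls=240,
                       sleep=time.sleep):
    """Poll the Automation execution until it reaches a final status."""
    for _ in range(max_polls):
        execution = ssm.get_automation_execution(
            AutomationExecutionId=execution_id
        )["AutomationExecution"]
        status = execution["AutomationExecutionStatus"]
        if status in TERMINAL_STATUSES:
            return status
        sleep(poll_seconds)
    raise TimeoutError("execution %s did not finish in time" % execution_id)


# Hypothetical usage (the execution ID is a placeholder):
#   import boto3
#   print(wait_for_execution(boto3.client("ssm"), "your-execution-id"))
```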

Now, checking the instance state again: you can see that the rescue / SSM-enabled helper instance has been terminated, and Automation has started the problematic unreachable instance again after fixing the issue.
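
The same state check can be done with the EC2 `describe_instances` API. A tiny sketch, assuming a single instance ID that exists in the current region; the function name is my own.

```python
def instance_state(ec2, instance_id):
    """Return the state name (for example "running") of one instance."""
    response = ec2.describe_instances(InstanceIds=[instance_id])
    return response["Reservations"][0]["Instances"][0]["State"]["Name"]


# Hypothetical usage (the instance ID is a placeholder):
#   import boto3
#   print(instance_state(boto3.client("ec2"), "i-0123456789abcdef0"))
```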

Now, let's try to connect to the instance:👩‍💻🤔

I am able to SSH in, and the issue has been fixed. ✅✅🏁🏁

Interested in the logs, to see what was fixed on the instance to restore access?❓🙋‍♂️

• Check the details of the Step ID "getEC2RescueForLinuxResult", which gives you the location of the logs on the instance.

• Make a note of the location displayed there. For example, in my case: The output logs are located in: /var/tmp/ec2rl/2020-06-25T01_44_08.560197
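
If you collect the step output as text, the log directory can be pulled out with a small regular expression. This sketch assumes the output contains a line in the same format as the one quoted above; the function name is made up for illustration.

```python
import re

# Matches the sentence quoted above and captures the path that follows it.
LOG_LINE = re.compile(r"The output logs are located in:\s*(\S+)")


def extract_log_dir(step_output):
    """Return the log directory mentioned in the step output, or None."""
    match = LOG_LINE.search(step_output)
    return match.group(1) if match else None


if __name__ == "__main__":
    sample = "The output logs are located in: /var/tmp/ec2rl/2020-06-25T01_44_08.560197"
    print(extract_log_dir(sample))
```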

• Now, I will check the log location on the instance to see the detailed information.

This way, I have also identified which fixes EC2Rescue for Linux applied to the instance, in an automated way.

Conclusion🎯🧐📚🤖

AWSSupport-ExecuteEC2Rescue is an Automation document that automates all the steps required to fix common issues on your unreachable Windows and Linux instances. It uses the EC2Rescue for Windows and EC2Rescue for Linux tools, which are frameworks for executing diagnostic and troubleshooting modules that analyze and remediate issues.

Overall, it saves much of the time spent on the manual process of detaching and attaching volumes and on the further troubleshooting steps. This document performs all of those steps for you, in an automated way, in just a few minutes.

Tip

With the integration between CloudWatch Events and Systems Manager
Automation, you can run AWSSupport-ExecuteEC2Rescue automatically in
response to an event in your infrastructure.


Also, note that this post walked through the Console workflow. If you're interested in using the AWS CLI, please check out my blog post.


Thanks for reading.
If you have any feedback, please write to me in the comments.
Also, 🤝🤗you can connect with me🤝🤗

Top comments (4)

Andrew Brown 🇨🇦

Loving the detail. Not many examples for SSM services, so keep it going.

Dinesh Rathee

Thanks Andrew for the encouragement :) I've started playing around with it and will write up more in the future.
You may also check out my recent post on using AWS Systems Manager Automation to install a LAMP web server and host a WordPress blog on Amazon Linux or Amazon Linux 2, which is basically an automated version of the documentation. Check the details in my blog post.

mazroof

Great article. Well explained and easy to follow. Great

Dinesh Rathee

Thanks for taking a look at this, @mazroof :)