DEV Community

Incident Response on AWS

AWS offers many services, and many security services in particular. When you build an incident response (IR) plan or strategy, you may be considering several of them without being sure which benefit each one provides, or how it helps you or your organization on your security journey.
Challenge/Objective:
As an AWS customer, I was looking for an overview or mapping of each AWS service and where it could be used within a people, process, technology, and partners framework, with the primary focus on IR. I am a visual learner who learns best by doing, not just by seeing or hearing others do and explain.
Solution(s):
After some reading and further research, I will walk you through the various AWS services, mapped against some of the functions of the NIST Cybersecurity Framework (CSF): identification, detection/analysis, response/containment/eradication, and recovery. I haven't included all of the NIST CSF functions; my focus here is on IR.
These are the steps, or functions, I will take you through, mapping each back to some AWS services. The example actions I give are not an exhaustive list, just some ideas to get you thinking about what you might do in your own environments or AWS accounts.
Preparing
People: Does the team have the proper training in AWS, or in cloud in general? Performing a skills gap analysis per engineer or team will show you the areas that may need additional training.
Process: Create and iterate on an IR plan and a strategy that looks out two or three years. Any good plan only gets better with continuous testing, reviewing, and updating. IR plans, like disaster recovery or business continuity plans, are not plans we create once and then store away for the eventual need.
Technology: From an AWS cloud security perspective, have you followed the AWS Well-Architected Framework, specifically the Security pillar? That includes enabling various AWS security services, using security findings and other security data to make informed, data-driven decisions, and acting swiftly on alerts or other trends. It also means following AWS security best practices, for example by setting up separate AWS accounts for logging and log archives, or an account dedicated to security operations.
Partners: I included this because, as those familiar with ITIL will recognize, many AWS services, including the security services, have integrations that can be leveraged. Think about which AWS or third-party services you already own and use today. Back to your IR plan and strategy: do you want to continue using a mix of services and partners, or are you aiming for more of an “ALL IN” state? Which security findings or security data are you sending to AWS native security services, and are you looking to send security findings or data out to third parties?
Identification
What systems, accounts, data, processes, and regions do we need to protect? If you aren't sure and are just starting to develop a plan and strategy, think in terms of the various processes and assets.
Detection
This is probably one of my favorite functions. There are a few AWS security services that I enable and use in just about every AWS account I set up and configure. Of course, AWS services cost money, so it's important to have a clear understanding of your own or your customer's requirements. You are looking for the right balance, and quantitative risk analysis can help here: with security data and security findings we have verifiable data, so we can analyze the effects of a risk. By ranking the severity of security findings, we can use qualitative risk analysis to estimate probability and prioritize the various risks in terms that people outside of security and technology can understand. Consider opportunities where automation can safely be used, for example Amazon CloudWatch with AWS Lambda, or AWS Security Hub findings with remediation actions.
Set up and configure AWS CloudTrail. Consider using CloudTrail Insights, which automatically analyzes management events from your CloudTrail trails. It establishes a baseline of normal behavior in your AWS account(s) and then generates more focused Insights events when it detects unusual patterns.
Enable Amazon GuardDuty and AWS Security Hub. Set up and configure a separate AWS security operations account and designate it as the delegated administrator for these services.
Monitor the AWS Security Hub and Amazon GuardDuty findings.
Set up and configure dashboards/visualizations for the security findings and other security data coming out of AWS Security Hub.
Set up and configure Amazon Inspector. Like Amazon GuardDuty it is automated, but it is a vulnerability management service that continually scans AWS workloads for software vulnerabilities. One key difference between the two is that Inspector doesn't have GuardDuty's machine learning aspects. I recommend having Inspector set up from the start as you deploy your applications, then enabling GuardDuty immediately after to help ensure you receive alerts on potential threats.
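To make the idea of ranking findings by severity concrete, here is a minimal Python sketch. It assumes findings shaped like Security Hub's AWS Security Finding Format, where `Severity.Label` is one of INFORMATIONAL, LOW, MEDIUM, HIGH, or CRITICAL. The `prioritize` function and the sample findings are made up for illustration; in practice the list would come from Security Hub itself.

```python
# Rank order for the Severity.Label values used in Security Hub findings.
SEVERITY_RANK = {"INFORMATIONAL": 0, "LOW": 1, "MEDIUM": 2, "HIGH": 3, "CRITICAL": 4}


def prioritize(findings, minimum="MEDIUM"):
    """Return findings at or above `minimum`, most severe first."""
    floor = SEVERITY_RANK[minimum]

    def rank(finding):
        # Unknown or missing labels are treated as INFORMATIONAL.
        label = finding.get("Severity", {}).get("Label", "INFORMATIONAL")
        return SEVERITY_RANK.get(label, 0)

    kept = [f for f in findings if rank(f) >= floor]
    return sorted(kept, key=rank, reverse=True)


# Hypothetical findings, trimmed to the fields this sketch reads.
sample_findings = [
    {"Title": "Security group allows unrestricted SSH", "Severity": {"Label": "HIGH"}},
    {"Title": "S3 bucket without access logging", "Severity": {"Label": "LOW"}},
    {"Title": "EC2 instance querying a crypto-mining domain", "Severity": {"Label": "CRITICAL"}},
    {"Title": "Root account without hardware MFA", "Severity": {"Label": "MEDIUM"}},
]

for finding in prioritize(sample_findings):
    print(finding["Severity"]["Label"], "-", finding["Title"])
```

Raising or lowering `minimum` is one simple way to balance alert volume against cost and team capacity.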
Analysis
Query the CloudTrail logs using Amazon Athena. If you already have a SIEM/MDR solution, you can send the logs there.
Use Amazon Detective in situations where you may not have the resources or specific talent in-house to do some of the investigation and triage of security findings.
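As a rough illustration of the Athena approach, the sketch below builds a query that lists everything a given access key did in a time window. It assumes an Athena table over CloudTrail logs with the column names from the AWS documentation (`eventtime`, `useridentity.accesskeyid`, and so on). The table name `cloudtrail_logs`, the function name, and the strict input patterns are my own assumptions; the key shown is the example key ID used in AWS documentation.

```python
import re

# Strict patterns so only expected values are interpolated into the SQL text.
ACCESS_KEY_ID = re.compile(r"[A-Za-z0-9]{16,128}")
TIMESTAMP = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z")
TABLE_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def build_key_activity_query(access_key_id, start, end, table="cloudtrail_logs"):
    """Build an Athena query listing API calls made with one access key."""
    if not ACCESS_KEY_ID.fullmatch(access_key_id):
        raise ValueError("unexpected access key ID format")
    if not (TIMESTAMP.fullmatch(start) and TIMESTAMP.fullmatch(end)):
        raise ValueError("timestamps must look like 2024-01-31T00:00:00Z")
    if not TABLE_NAME.fullmatch(table):
        raise ValueError("unexpected table name")
    return (
        "SELECT eventtime, eventsource, eventname, awsregion, sourceipaddress, errorcode\n"
        f"FROM {table}\n"
        f"WHERE useridentity.accesskeyid = '{access_key_id}'\n"
        f"  AND eventtime BETWEEN '{start}' AND '{end}'\n"
        "ORDER BY eventtime"
    )


print(build_key_activity_query(
    "AKIAIOSFODNN7EXAMPLE", "2024-01-01T00:00:00Z", "2024-01-02T00:00:00Z"
))
```

The resulting string can be pasted into the Athena console or submitted through the API. Validating the inputs first keeps a hastily copied value from changing the meaning of the query during an incident.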
Containment
Disabling or rotating IAM credentials.
EC2 isolation using security groups and/or NACLs. It's always a great idea to take notes, screenshots, etc., so that you can quickly and easily revert changes later as needed.
System/data-volume backups using snapshots.
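The EC2 isolation step, together with the advice to take notes, could look something like the sketch below. `isolate_instance` is a hypothetical helper that expects an object behaving like a boto3 EC2 client and uses its `describe_instances` and `modify_instance_attribute` calls. The stub class and all IDs are invented so the example runs without AWS credentials. It also assumes a quarantine security group with no rules already exists, and it only handles an instance's primary network interface.

```python
import json


def isolate_instance(ec2, instance_id, quarantine_sg_id):
    """Replace an instance's security groups with a quarantine group.

    `ec2` is anything that behaves like a boto3 EC2 client. The returned
    record is the "notes" needed to revert the change later.
    """
    reservations = ec2.describe_instances(InstanceIds=[instance_id])["Reservations"]
    instance = reservations[0]["Instances"][0]
    original = [group["GroupId"] for group in instance["SecurityGroups"]]
    ec2.modify_instance_attribute(InstanceId=instance_id, Groups=[quarantine_sg_id])
    return {
        "InstanceId": instance_id,
        "OriginalGroups": original,
        "QuarantineGroup": quarantine_sg_id,
    }


class StubEC2:
    """Minimal stand-in so the sketch runs without AWS credentials."""

    def __init__(self, group_ids):
        self.group_ids = list(group_ids)

    def describe_instances(self, InstanceIds):
        groups = [{"GroupId": gid} for gid in self.group_ids]
        return {"Reservations": [{"Instances": [
            {"InstanceId": InstanceIds[0], "SecurityGroups": groups}]}]}

    def modify_instance_attribute(self, InstanceId, Groups):
        self.group_ids = list(Groups)


# Hypothetical IDs; with real access you would pass boto3.client("ec2").
stub = StubEC2(["sg-0aaa1111", "sg-0bbb2222"])
record = isolate_instance(stub, "i-0123456789abcdef0", "sg-0quarantine")
print(json.dumps(record, indent=2))
```

Saving the returned record somewhere durable, such as the incident ticket, gives you a machine-readable version of the notes and screenshots mentioned above.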
Eradication
Leverage AWS Systems Manager (SSM) to patch operating systems and other software, and to securely run commands on the instances.
If you have an existing on-premises or hybrid patching solution, that's OK too, but of course it won't include the AWS SSM capabilities.
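For the Systems Manager route, one possible sketch is to build the arguments for Run Command's `send_command` call using the AWS-managed `AWS-RunPatchBaseline` document, whose `Operation` parameter accepts `Scan` or `Install`. The helper name, comment text, and instance ID below are assumptions for illustration, and the sketch only builds the request rather than sending it.

```python
def patch_command_request(instance_ids, operation="Scan"):
    """Build keyword arguments for an SSM send_command patching call."""
    if operation not in ("Scan", "Install"):
        raise ValueError("operation must be 'Scan' or 'Install'")
    if not instance_ids:
        raise ValueError("at least one instance ID is required")
    return {
        "DocumentName": "AWS-RunPatchBaseline",
        "InstanceIds": list(instance_ids),
        "Parameters": {"Operation": [operation]},
        "Comment": "IR eradication: patch affected instances",
    }


# Hypothetical instance ID. With real access, the request could be sent with
# boto3.client("ssm").send_command(**request).
request = patch_command_request(["i-0123456789abcdef0"], operation="Install")
print(request)
```

Defaulting to `Scan` is a deliberate choice here: it reports missing patches without changing anything, so installing requires an explicit decision.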
Recovery
The focus is on returning systems or environments to the operational state they were in before the incident or other such event.
If you are using IaC (Infrastructure as Code), perhaps you have AWS CloudFormation, Ansible, or Terraform scripts that you can use to redeploy the environments that were impacted.
If you modified NACLs or security groups during the containment process, you may need to restore them to their prior or original state. This may not apply, or may apply only partially, if you are going to redeploy using IaC.
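If security groups were swapped during containment, restoring them might look like the sketch below. It assumes you kept a JSON note of each instance's original security groups; the record format, the helper name, the stub client, and the IDs are all hypothetical. The calls it makes, `modify_instance_attribute` and `describe_instances`, are the boto3 EC2 client methods.

```python
import json


def restore_security_groups(ec2, records):
    """Reapply the security groups noted before containment.

    Returns the IDs of instances whose groups match the record afterwards.
    """
    restored = []
    for record in records:
        instance_id = record["InstanceId"]
        ec2.modify_instance_attribute(
            InstanceId=instance_id, Groups=record["OriginalGroups"]
        )
        # Read the state back rather than trusting that the call took effect.
        reservations = ec2.describe_instances(InstanceIds=[instance_id])["Reservations"]
        current = {g["GroupId"] for g in reservations[0]["Instances"][0]["SecurityGroups"]}
        if current == set(record["OriginalGroups"]):
            restored.append(instance_id)
    return restored


class StubEC2:
    """Minimal stand-in so the sketch runs without AWS credentials."""

    def __init__(self, groups_by_instance):
        self.groups_by_instance = {k: list(v) for k, v in groups_by_instance.items()}

    def describe_instances(self, InstanceIds):
        instance_id = InstanceIds[0]
        groups = [{"GroupId": gid} for gid in self.groups_by_instance[instance_id]]
        return {"Reservations": [{"Instances": [
            {"InstanceId": instance_id, "SecurityGroups": groups}]}]}

    def modify_instance_attribute(self, InstanceId, Groups):
        self.groups_by_instance[InstanceId] = list(Groups)


# Notes saved during containment (hypothetical IDs and format).
notes = json.loads(
    '[{"InstanceId": "i-0123456789abcdef0",'
    ' "OriginalGroups": ["sg-0aaa1111", "sg-0bbb2222"]}]'
)
stub = StubEC2({"i-0123456789abcdef0": ["sg-0quarantine"]})
print(restore_security_groups(stub, notes))
```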
Post-incident/event activity
Write-ups and documentation help capture the important aspects: who, what, why, when, and how.
Other lessons learned, corrective actions, and continuous improvement ideas for the future.
Playbooks
Playbooks give the team(s) central, up-to-date technical information to use when they respond to an incident.
By working through various what-if scenarios, coupled with qualitative risk analysis, teams can prepare for the higher-likelihood, higher-impact incidents or events, such as an AWS IAM user access key becoming compromised, or AWS instances becoming compromised and being used for crypto mining.
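One simple way to decide which playbooks to write first is a likelihood-by-impact score, sketched below. The three-level scale, the multiplication, and the ratings given to each scenario are illustrative assumptions, not measured values; your own team would supply the ratings.

```python
# A simple three-level qualitative scale.
LEVELS = {"low": 1, "medium": 2, "high": 3}


def risk_score(likelihood, impact):
    """Qualitative score: likelihood level multiplied by impact level."""
    return LEVELS[likelihood] * LEVELS[impact]


def rank_scenarios(scenarios):
    """Sort (name, likelihood, impact) tuples from highest to lowest score."""
    return sorted(scenarios, key=lambda s: risk_score(s[1], s[2]), reverse=True)


# Hypothetical ratings for illustration only.
scenarios = [
    ("Unused IAM role left in a sandbox account", "low", "low"),
    ("EC2 instance compromised and used for crypto mining", "medium", "high"),
    ("IAM user access key compromised", "high", "high"),
]

for name, likelihood, impact in rank_scenarios(scenarios):
    score = risk_score(likelihood, impact)
    print(f"{score}  {name}  ({likelihood} likelihood, {impact} impact)")
```

The scenarios at the top of the list are the ones worth turning into playbooks and rehearsing first.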
