Winnie Kiragu

Monitoring in AWS

Monitoring Containers in AWS

Monitoring is crucial for all deployments as it lets you debug and gain a deeper understanding of the application you are trying to develop/maintain. This article is meant to help you get comfortable with logging/tracing on AWS for serverless infrastructure.

For AWS serverless infrastructure like Lambda or App Runner, troubleshooting can be one of the most difficult tasks, especially for those coming from a more traditional VM-based setup, because you cannot log on to the underlying VM or containers.

There are several monitoring services that are easy to integrate and manage, depending on your use case or application needs. When working with container workloads on AWS, one of the easiest ways to get started is to use AWS's own monitoring tools. You can also use third-party services. However, make sure they do not increase the operational complexity of maintaining or deploying your application.

To begin, note that AWS CloudWatch collects monitoring and operational data in the form of logs, traces and metrics.


A) Logs

  • The AWS documentation on log concepts can't be beat. It captures everything in a very easy-to-understand way, and it is always best practice to refer to the official documentation.
  • Truthfully, the most difficult thing about navigating logs is finding the particular CloudWatch log group or log stream. To avoid falling into this trap, always name your log groups properly when creating them.
  • Logging to CloudWatch is largely set up for you if you are working with managed services such as ECS, EKS, Lambda or AWS App Runner. In many cases it is as simple as ticking a checkbox to enable logging.
  • Self-managed infrastructure requires a bit more configuration. Be sure to name the log groups so that you can tell your dev environments apart from your prod environments.
  • Tagging resources is also a great way to track down the correct log groups, so review the AWS documentation on tagging and evaluate whether you need to adapt anything in your current tagging process.
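
As a rough illustration of the naming and tagging advice above, here is a minimal Python sketch. The `/<environment>/<service>/<application>` convention, the allowed environments and the tag keys are my own assumptions, not an AWS standard. The only AWS call is boto3's `create_log_group`, imported lazily so the helpers work without boto3 installed.

```python
import re

# Hypothetical convention: /<environment>/<service>/<application>
ALLOWED_ENVIRONMENTS = {"dev", "staging", "prod"}


def log_group_name(environment, service, application):
    """Build a predictable log group name such as /prod/ecs/checkout-api."""
    if environment not in ALLOWED_ENVIRONMENTS:
        raise ValueError(f"unknown environment: {environment!r}")
    for part in (service, application):
        # Stick to a conservative character set so names stay easy to search.
        if not re.fullmatch(r"[A-Za-z0-9_.-]+", part):
            raise ValueError(f"invalid name part: {part!r}")
    return f"/{environment}/{service}/{application}"


def log_group_tags(environment, application, owner):
    """Tags that make it easy to filter log groups later (keys are illustrative)."""
    return {"Environment": environment, "Application": application, "Owner": owner}


def create_log_group(environment, service, application, owner):
    """Create the log group in CloudWatch Logs (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the helpers above run without it

    logs = boto3.client("logs")
    logs.create_log_group(
        logGroupName=log_group_name(environment, service, application),
        tags=log_group_tags(environment, application, owner),
    )
```

With a convention like this, searching the console for `/prod/` or `/dev/` narrows the list of log groups immediately.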

B) Traces

  • A trace collects all the segments generated by a single request. That request is typically an HTTP GET or POST request that travels through a load balancer, hits your application code, and generates downstream calls to other AWS services or external web APIs.
  • X-Ray traces can also be viewed from the AWS CloudWatch console, under the X-Ray traces section.
  • AWS X-Ray provides a visual map of successes and failures and lets you drill down into individual traces to see how long each leg of the execution took.
  • You can only view application traces if you have enabled your application to collect this data. Don't be wary of the cost, though: X-Ray has no upfront or commitment fees and includes a free tier. Beyond the free tier, recording and retrieving traces is billed, so check the AWS X-Ray pricing page.
  • The only downside is that, as of now, tracing cannot be enabled for quite a few AWS services. If you are setting up your infrastructure with AWS Lambda, though, this is something you may want to try out.
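
To make the trace and segment idea a bit more concrete, here is a small Python sketch that does not call AWS at all. It builds a trace ID in the documented X-Ray shape (version, 8 hex digits of epoch time, 24 random hex digits) and ranks the legs of one request by duration. The segment dictionaries only borrow the `trace_id`, `name`, `start_time` and `end_time` fields from X-Ray segment documents, and the sample services in the usage note are made up.

```python
import os
import time


def new_trace_id(now=None):
    """Return an X-Ray style trace ID, e.g. 1-6553f100-<24 hex digits>."""
    epoch = int(time.time() if now is None else now)
    return f"1-{epoch:08x}-{os.urandom(12).hex()}"


def leg_durations(segments, trace_id):
    """List (name, seconds) for every segment of one trace, slowest first."""
    legs = [s for s in segments if s["trace_id"] == trace_id]
    timed = [(s["name"], round(s["end_time"] - s["start_time"], 3)) for s in legs]
    return sorted(timed, key=lambda leg: leg[1], reverse=True)
```

Feeding it segments named, say, `load-balancer`, `checkout-api` and `DynamoDB` that share one trace ID gives you a text version of what the X-Ray console shows as a timeline.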

C) Metrics

  • A metric represents a time-ordered set of data points that are published to CloudWatch.
  • Think of a metric as a variable to monitor, and the data points as representing the values of that variable over time. For example, the CPU usage of a particular EC2 instance is one metric provided by Amazon EC2.
  • Metrics belong to namespaces, e.g. AWS/EC2 or AWS/Lambda. Collected metrics are available in CloudWatch automatic dashboards and are also viewable in the Metrics section of the CloudWatch console, where they are organized by namespace. Check it out... lolz (see me trying to get you to log into AWS there... haha)
  • I've mentioned something pretty cool here that I also want to talk more about: CloudWatch Container Insights.
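
If you would rather see a metric as "a variable with values over time" in code, here is a minimal Python sketch for the EC2 CPU example. It builds the parameters for boto3's `get_metric_statistics` call using the `AWS/EC2` namespace and the `CPUUtilization` metric. The function names, the default time window and the instance ID you pass in are placeholders of mine.

```python
from datetime import datetime, timedelta, timezone


def cpu_query(instance_id, hours=1, period_seconds=300):
    """Parameters for get_metric_statistics: average CPU of one EC2 instance."""
    end = datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "StartTime": end - timedelta(hours=hours),
        "EndTime": end,
        "Period": period_seconds,
        "Statistics": ["Average"],
    }


def sorted_points(datapoints):
    """Datapoints may come back unordered, so sort them by timestamp."""
    return sorted(datapoints, key=lambda point: point["Timestamp"])


def fetch_cpu(instance_id):
    """Fetch (timestamp, average) pairs (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the helpers above run without it

    cloudwatch = boto3.client("cloudwatch")
    response = cloudwatch.get_metric_statistics(**cpu_query(instance_id))
    return [(p["Timestamp"], p["Average"]) for p in sorted_points(response["Datapoints"])]
```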

CloudWatch Container Insights

  • CloudWatch Container Insights makes it easy to collect metrics like CPU, memory, disk, and network utilization, as well as log information, in one centralized location.
  • According to the AWS documentation, Container Insights is meant for containerized applications and microservices. It is available for Amazon ECS, Amazon EKS and Kubernetes platforms on EC2.
  • The metrics that Container Insights collects are available in CloudWatch automatic dashboards, and also viewable in the Metrics section of the CloudWatch console.
  • So what's different here, if I can view the metrics on the Metrics page too? I know, it seems redundant, but the difference is in the categorization. The Metrics section groups metrics by namespace, e.g. Lambda, EC2 or Lightsail, whereas the Container Insights section lets you view the metrics for all your applications running in containers, across your AWS container services, together.
  • What does that even mean, right? In simple terms, if I wanted to evaluate all the applications running in all my ECS clusters, Container Insights lets me pull up all of these stats in one place instead of viewing each application's metrics individually. There I can evaluate how the applications are faring with the resources allocated to them.
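
Here is a rough Python sketch of that "everything in one place" idea. The sample dictionaries borrow the `CpuUtilized`, `CpuReserved`, `MemoryUtilized` and `MemoryReserved` names that Container Insights publishes for ECS; the `utilization_report` helper and the report layout are my own invention. The second function shows, as far as I know, how Container Insights is switched on for an existing ECS cluster through boto3's `update_cluster_settings`.

```python
def utilization_report(samples):
    """Percent of reserved CPU and memory in use per service, busiest first."""
    rows = []
    for s in samples:
        rows.append({
            "cluster": s["ClusterName"],
            "service": s["ServiceName"],
            "cpu_pct": round(100 * s["CpuUtilized"] / s["CpuReserved"], 1),
            "memory_pct": round(100 * s["MemoryUtilized"] / s["MemoryReserved"], 1),
        })
    return sorted(rows, key=lambda row: row["cpu_pct"], reverse=True)


def enable_container_insights(cluster):
    """Turn Container Insights on for one ECS cluster (needs boto3 and credentials)."""
    import boto3  # imported lazily so the report helper runs without it

    ecs = boto3.client("ecs")
    ecs.update_cluster_settings(
        cluster=cluster,
        settings=[{"name": "containerInsights", "value": "enabled"}],
    )
```

A service sitting near 100% of its reservation is a candidate for more resources, while one idling at a few percent is probably over-provisioned.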

All this is great, but how will CloudWatch Container Insights affect my wallet?

  • The metrics that Container Insights collects are limited, i.e. it does not automatically create every possible metric from the log data. However, you can get additional metrics and additional levels of granularity by using CloudWatch Logs Insights to analyze the raw performance log events.
  • Be careful, though, as metrics collected by Container Insights are charged as custom metrics. As good practice, always review Amazon CloudWatch Pricing before committing to any additional service.
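
As a hedged sketch of analyzing the raw performance log events, the Python below builds the parameters for boto3's `start_query` call in CloudWatch Logs Insights. It assumes the ECS performance log group follows the `/aws/ecs/containerinsights/<cluster>/performance` pattern and that the events carry `CpuUtilized`, `MemoryUtilized`, `TaskId` and `ContainerName` fields, so check a few of your own log events before relying on it.

```python
import time

# Field names are assumptions based on ECS performance log events.
QUERY = (
    "stats avg(CpuUtilized) as CPU, avg(MemoryUtilized) as Mem "
    "by TaskId, ContainerName | sort Mem, CPU desc"
)


def performance_log_group(cluster):
    """Log group where Container Insights is assumed to write ECS performance events."""
    return f"/aws/ecs/containerinsights/{cluster}/performance"


def query_params(cluster, minutes=60, now=None):
    """Parameters for start_query over the last `minutes` minutes."""
    end = int(time.time() if now is None else now)
    return {
        "logGroupName": performance_log_group(cluster),
        "startTime": end - minutes * 60,
        "endTime": end,
        "queryString": QUERY,
    }


def run_query(cluster):
    """Run the query and wait for it to finish (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the helpers above run without it

    logs = boto3.client("logs")
    query_id = logs.start_query(**query_params(cluster))["queryId"]
    while True:
        result = logs.get_query_results(queryId=query_id)
        if result["status"] not in ("Scheduled", "Running"):
            return result
        time.sleep(1)
```

Logs Insights queries are billed by the amount of data scanned, so keeping the time window short is the cheap way to experiment.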

Please take a moment to try out CloudWatch Container Insights, but remember, as with any other resource on AWS, to decommission your resources after testing to avoid unnecessary costs.

Feel free to go through these reference materials to familiarize yourself better with AWS CloudWatch and its offerings:

These are a few of the tools I have learned to use within the AWS ecosystem to monitor the containerized applications I deploy. Thank you for getting to the end of this section of my blog.


Monitoring Databases in AWS

RDS is one of the most expensive resources you can use within the AWS ecosystem, especially if it is not managed right.

I want to recap a bit before I get into this second segment on monitoring, to help you understand why I felt it was necessary to add it here.

What are the three main AWS cost drivers? Compute, storage, and outbound data transfer. Go ahead, google it... go on (lolz)

So:

- Every database we create incurs costs by default.
- The more data we store in each of these databases (per GB), the more we have to cough up to AWS.
- Every production database probably has at least one snapshot taken, let's say once a month for this article, though we all know we take waaaaay more snapshots than that.
- These snapshots also incur costs, especially if a snapshot holds as much data as the prod DB.

What I am trying to point out here is that we need to understand how our database is performing and frequently re-evaluate the resources allocated to it, so that we avoid unnecessary costs.

So let's expound on this a bit more, shall we?
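
To put some arithmetic behind the cost drivers listed above, here is a small Python sketch. Every rate in the usage note is a made-up placeholder rather than an AWS price, and real RDS billing has more moving parts (backup allowances, I/O, data transfer), so treat it as a back-of-the-envelope model only.

```python
def monthly_database_cost(instance_hourly, storage_gb, storage_rate,
                          snapshot_gb, snapshot_rate, hours=730):
    """Rough monthly cost split into compute, storage and snapshots.

    All rates are placeholders you supply (per hour or per GB-month).
    """
    compute = round(instance_hourly * hours, 2)
    storage = round(storage_gb * storage_rate, 2)
    snapshots = round(snapshot_gb * snapshot_rate, 2)
    return {
        "compute": compute,
        "storage": storage,
        "snapshots": snapshots,
        "total": round(compute + storage + snapshots, 2),
    }
```

For example, with a placeholder rate of 0.10 per instance hour, 200 GB of storage at 0.10 per GB-month and four 200 GB snapshots (800 GB) at 0.05 per GB-month, the model gives 73.00 for compute, 20.00 for storage and 40.00 for snapshots, 133.00 in total. The snapshots alone cost twice as much as the live storage, which is exactly why they deserve a regular review.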

What to Monitor

There are three different types of monitoring that can be captured here:

  1. service monitoring via CloudWatch,
  2. database monitoring via Performance Insights (also submitted to CloudWatch),
  3. OS monitoring with Enhanced Monitoring.

As an AWS service, RDS publishes different types of metrics to CloudWatch. Broken down, these are:

  • Metrics for your DB instances, e.g., CPU utilization, the number of database connections, available memory, network throughput, and read latency.
  • Performance Insights metrics, e.g., the number of active sessions for the database engine.
  • Real-time data about the operating system on which the DB instances are run — more on this later.

Service Monitoring via CloudWatch

  • By default, RDS sends metrics to CloudWatch in one-minute intervals. These metrics are stored for 15 days, enabling you to run analytics for historical data to gain service performance insights.

  • To configure your RDS instance to log to CloudWatch, there are some things you need to do in the AWS Management Console. Kindly follow this AWS tutorial to set this up.

  • CloudWatch also receives usage metrics for the RDS service quotas in your AWS account, e.g., the total allocated storage of all your database instances or the number of instances itself.
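
As a sketch of pulling the DB instance metrics mentioned earlier out of CloudWatch, the Python below builds a batch of queries for boto3's `get_metric_data` call. The metric names are standard `AWS/RDS` metrics matching the examples in this article; the helper names, the query IDs and the three-hour window are my own choices.

```python
from datetime import datetime, timedelta, timezone

# CloudWatch names for the examples mentioned in this article.
RDS_METRICS = [
    "CPUUtilization",
    "DatabaseConnections",
    "FreeableMemory",
    "NetworkReceiveThroughput",
    "ReadLatency",
]


def rds_metric_queries(db_instance_id, period_seconds=60):
    """One MetricDataQuery per metric, averaged over each period."""
    return [
        {
            "Id": f"m{index}",
            "Label": name,
            "MetricStat": {
                "Metric": {
                    "Namespace": "AWS/RDS",
                    "MetricName": name,
                    "Dimensions": [
                        {"Name": "DBInstanceIdentifier", "Value": db_instance_id}
                    ],
                },
                "Period": period_seconds,
                "Stat": "Average",
            },
        }
        for index, name in enumerate(RDS_METRICS)
    ]


def fetch_rds_metrics(db_instance_id, hours=3):
    """Return {metric name: [(timestamp, value), ...]} (needs boto3 and credentials)."""
    import boto3  # imported lazily so the query builder runs without it

    end = datetime.now(timezone.utc)
    cloudwatch = boto3.client("cloudwatch")
    response = cloudwatch.get_metric_data(
        MetricDataQueries=rds_metric_queries(db_instance_id),
        StartTime=end - timedelta(hours=hours),
        EndTime=end,
    )
    return {
        result["Label"]: list(zip(result["Timestamps"], result["Values"]))
        for result in response["MetricDataResults"]
    }
```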

Performance Insights - Monitoring Database Load

  • RDS's default metrics only help you visualize and analyze the general load on the database. They do not give you detailed insight into what is causing the load for certain types of workloads.

  • With Performance Insights, you’re able to filter loads in a very fine-grained manner, for example, by using SQL statements. This will help you to determine major contributors to heavy loads or bottlenecks affecting your service’s performance.

  • Performance Insights needs to be enabled explicitly for your DB instance or Multi-AZ cluster. If you want to keep the data collected by Performance Insights for longer than seven days, you'll incur an additional charge.
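
For reference, here is a minimal Python sketch of enabling Performance Insights through boto3's `modify_db_instance`. The seven-day threshold comes from the point above; the helper names are mine, and AWS only accepts certain retention values, so check the RDS documentation before picking one.

```python
FREE_RETENTION_DAYS = 7  # retention beyond this is charged, as noted above


def performance_insights_settings(db_instance_id, retention_days=FREE_RETENTION_DAYS):
    """Parameters for modify_db_instance that switch Performance Insights on."""
    if retention_days < FREE_RETENTION_DAYS:
        raise ValueError("retention must be at least 7 days")
    return {
        "DBInstanceIdentifier": db_instance_id,
        "EnablePerformanceInsights": True,
        "PerformanceInsightsRetentionPeriod": retention_days,
        "ApplyImmediately": True,
    }


def incurs_retention_charge(retention_days):
    """True when the chosen retention goes beyond the seven-day window."""
    return retention_days > FREE_RETENTION_DAYS


def enable_performance_insights(db_instance_id, retention_days=FREE_RETENTION_DAYS):
    """Apply the settings (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the helpers above run without it

    rds = boto3.client("rds")
    rds.modify_db_instance(**performance_insights_settings(db_instance_id, retention_days))
```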

Enhanced Monitoring - Monitoring the Operating System

  • In addition to monitoring your database instances, you can also monitor the underlying operating system.
  • The major difference between the default CloudWatch monitoring and Enhanced Monitoring lies in the collection of metrics: Enhanced Monitoring directly collects statistics via an agent running on the DB instance instead of the hypervisor that creates and runs the virtual machines.

  • Enhanced Monitoring collects a lot of additional metrics from the OS in real time. This is useful if you’re interested in the different processes or threads that are using the CPU.

  • An important fact here is that Enhanced Monitoring delivers its metrics to CloudWatch Logs. This means the data transfer and storage in CloudWatch Logs will increase, and you'll incur an additional charge. Shorter monitoring intervals (meaning a higher frequency of monitoring) or a higher number of DB instances will increase the cost.

So now that we have an inkling of what to review for the RDS service, kindly make a point of checking these out and adapting your setup accordingly.
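
To see how quickly the interval and the number of instances add up, here is a small Python sketch. The interval values are the ones RDS accepts for `MonitoringInterval` as far as I know (0 switches Enhanced Monitoring off). The sample counts only show relative log volume, not a price, and the role ARN you pass in is whatever IAM role you have set up for Enhanced Monitoring.

```python
VALID_INTERVALS = (1, 5, 10, 15, 30, 60)  # seconds; 0 would disable monitoring


def samples_per_day(interval_seconds, instance_count=1):
    """How many OS metric samples land in CloudWatch Logs per day."""
    if interval_seconds not in VALID_INTERVALS:
        raise ValueError(f"unsupported interval: {interval_seconds}")
    return (86_400 // interval_seconds) * instance_count


def enhanced_monitoring_settings(db_instance_id, interval_seconds, role_arn):
    """Parameters for modify_db_instance that switch Enhanced Monitoring on."""
    samples_per_day(interval_seconds)  # reuse the interval validation
    return {
        "DBInstanceIdentifier": db_instance_id,
        "MonitoringInterval": interval_seconds,
        "MonitoringRoleArn": role_arn,
    }


def enable_enhanced_monitoring(db_instance_id, interval_seconds, role_arn):
    """Apply the settings (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the helpers above run without it

    rds = boto3.client("rds")
    rds.modify_db_instance(
        **enhanced_monitoring_settings(db_instance_id, interval_seconds, role_arn)
    )
```

One instance at a 60-second interval produces 1,440 samples a day, while three instances at a 1-second interval produce 259,200, which is 180 times the log volume.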

It's always a pleasure to learn and walk you through my thoughts. Kindly drop a comment if you have any other thoughts or questions.