This is a step-by-step guide that will walk you through the process of building an AWS Kinesis Data Streams application.
Set Up the AWS Kinesis Data Streams Service
When creating an AWS Kinesis data stream, there are two capacity modes to choose from:
- On-demand: should be used when your data stream's throughput requirements are unpredictable and variable. With on-demand mode, your data stream's capacity scales automatically.
- Provisioned: should be used when you can reliably estimate the throughput requirements of your data stream. With provisioned mode, your data stream's capacity is fixed.
For this example I am using a Kinesis data stream with the on-demand capacity mode, so the stream's capacity will scale automatically. At the time of creation I went with the default settings.
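I created the stream through the AWS console, but the same thing can be done programmatically. Here is a minimal sketch using the AWS SDK for .NET (the AWSSDK.Kinesis NuGet package; the stream name my-data-stream is just a placeholder):

```csharp
using System;
using System.Threading.Tasks;
using Amazon.Kinesis;
using Amazon.Kinesis.Model;

public static class CreateStreamExample
{
    public static async Task Main()
    {
        var kinesis = new AmazonKinesisClient();

        // Create a stream in on-demand capacity mode; no shard count is
        // required because capacity scales automatically.
        await kinesis.CreateStreamAsync(new CreateStreamRequest
        {
            StreamName = "my-data-stream", // placeholder stream name
            StreamModeDetails = new StreamModeDetails
            {
                StreamMode = StreamMode.ON_DEMAND
            }
        });

        Console.WriteLine("Stream creation initiated.");
    }
}
```

Note that CreateStream is asynchronous on the service side: the stream takes a few moments to reach the ACTIVE state before it can accept data.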
Build a Data Producer
We need to create an application that will simulate data production. To send the data we'll use the AWS Kinesis SDK so that we can interact with the Kinesis API.
In my example I am using the AWS SDK for .NET, and I have created a Lambda function as the producer, but this can be any type of application or script, and it can of course run outside of AWS.
The AWS Kinesis SDK offers two methods for sending data: PutRecord for sending a single record, and PutRecords for sending a batch of records. For the Lambda function to be able to send data to Kinesis Data Streams, it needs the kinesis:PutRecord and/or kinesis:PutRecords permission assigned, depending on which API method you'll be using. Here we'll use the PutRecords method, since sending a batch is more fun.
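For illustration, an IAM policy statement granting those permissions to the producer might look like this (the region, account ID, and stream name are placeholders):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["kinesis:PutRecord", "kinesis:PutRecords"],
      "Resource": "arn:aws:kinesis:eu-west-1:123456789012:stream/my-data-stream"
    }
  ]
}
```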
Each PutRecords request can support up to 500 records. Each record in the request can be as large as 1 MiB, up to a limit of 5 MiB for the entire request, including partition keys. To send a batch of records, we need to make sure that a single PutRecords request does not exceed the maximum number of records or the maximum size limit; see the code sample below.
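Here is a simplified sketch of that batching logic (the SendBatchesAsync helper, random partition keys, and JSON payloads are illustrative; the complete code is in the repository linked at the end):

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Text.Json;
using System.Threading.Tasks;
using Amazon.Kinesis;
using Amazon.Kinesis.Model;

public class Producer
{
    private const int MaxRecordsPerRequest = 500;             // PutRecords record-count limit
    private const int MaxRequestSizeBytes = 5 * 1024 * 1024;  // PutRecords request-size limit (5 MiB)

    private readonly IAmazonKinesis _kinesis = new AmazonKinesisClient();

    public async Task SendBatchesAsync(string streamName, IEnumerable<object> messages)
    {
        var batch = new List<PutRecordsRequestEntry>();
        var batchSizeBytes = 0;

        foreach (var message in messages)
        {
            var data = Encoding.UTF8.GetBytes(JsonSerializer.Serialize(message));
            var partitionKey = Guid.NewGuid().ToString();
            // The request-size limit includes the partition keys.
            var entrySizeBytes = data.Length + Encoding.UTF8.GetByteCount(partitionKey);

            // Flush the current batch before either limit would be exceeded.
            if (batch.Count == MaxRecordsPerRequest ||
                batchSizeBytes + entrySizeBytes > MaxRequestSizeBytes)
            {
                await FlushAsync(streamName, batch);
                batch.Clear();
                batchSizeBytes = 0;
            }

            batch.Add(new PutRecordsRequestEntry
            {
                Data = new MemoryStream(data),
                PartitionKey = partitionKey
            });
            batchSizeBytes += entrySizeBytes;
        }

        if (batch.Count > 0)
        {
            await FlushAsync(streamName, batch);
        }
    }

    private async Task FlushAsync(string streamName, List<PutRecordsRequestEntry> batch)
    {
        var response = await _kinesis.PutRecordsAsync(new PutRecordsRequest
        {
            StreamName = streamName,
            Records = batch
        });

        // PutRecords is not atomic: individual records can fail (e.g. from
        // throttling) while the request as a whole succeeds, so check
        // FailedRecordCount and retry the failed entries if needed.
        Console.WriteLine($"Sent {batch.Count} records, {response.FailedRecordCount} failed.");
    }
}
```

Keep in mind that individual records within a PutRecords call can fail independently, which is why the response's FailedRecordCount is worth checking in production code.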
Build a Data Consumer
Amazon Kinesis Data Streams integrates with many AWS and third-party services for consuming the data (more info in my previous post); alternatively, a custom application can be created to read the data using the Kinesis Data Streams API.
Here, as the data consumer, I have created an AWS Lambda function that integrates with the Kinesis data stream: the stream is added as a trigger for the function. The function also needs the proper permissions (kinesis:GetRecords, kinesis:ListStreams, kinesis:DescribeStream, etc.) assigned so that it can be triggered by the Kinesis Data Streams service.
In the code below, the messages are read from the stream.
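A simplified version of the consumer handler (using the Amazon.Lambda.Core, Amazon.Lambda.KinesisEvents, and Amazon.Lambda.Serialization.SystemTextJson packages; the full code is in the repository) looks roughly like this:

```csharp
using System.Text;
using Amazon.Lambda.Core;
using Amazon.Lambda.KinesisEvents;

[assembly: LambdaSerializer(typeof(Amazon.Lambda.Serialization.SystemTextJson.DefaultLambdaJsonSerializer))]

public class Consumer
{
    // Lambda invokes this handler with a batch of records read from the stream.
    public void Handler(KinesisEvent kinesisEvent, ILambdaContext context)
    {
        context.Logger.LogLine($"Received {kinesisEvent.Records.Count} records.");

        foreach (var record in kinesisEvent.Records)
        {
            // Each record carries the raw bytes that the producer sent.
            var payload = Encoding.UTF8.GetString(record.Kinesis.Data.ToArray());
            context.Logger.LogLine(
                $"Partition key: {record.Kinesis.PartitionKey}, Data: {payload}");
        }
    }
}
```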
Monitor and Troubleshoot
For monitoring and troubleshooting, the Kinesis Data Streams service offers a built-in monitoring dashboard.
The dashboard is based on metrics that are automatically sent to CloudWatch by the Data Streams service. Additionally, you can use those metrics to create your own dashboards or set up alarms.
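For example, a useful metric to alarm on is GetRecords.IteratorAgeMilliseconds, which shows how far consumers are lagging behind the stream. A rough sketch of creating such an alarm with the AWS SDK for .NET (the AWSSDK.CloudWatch package; the alarm name, threshold, and stream name are illustrative):

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;
using Amazon.CloudWatch;
using Amazon.CloudWatch.Model;

public static class AlarmExample
{
    public static async Task CreateIteratorAgeAlarmAsync()
    {
        var cloudWatch = new AmazonCloudWatchClient();

        // Alarm when the consumer falls more than one minute behind the stream.
        await cloudWatch.PutMetricAlarmAsync(new PutMetricAlarmRequest
        {
            AlarmName = "my-data-stream-iterator-age", // illustrative name
            Namespace = "AWS/Kinesis",
            MetricName = "GetRecords.IteratorAgeMilliseconds",
            Dimensions = new List<Dimension>
            {
                new Dimension { Name = "StreamName", Value = "my-data-stream" }
            },
            Statistic = Statistic.Maximum,
            Period = 60,                    // evaluation window, in seconds
            EvaluationPeriods = 1,
            Threshold = 60000,              // one minute, in milliseconds
            ComparisonOperator = ComparisonOperator.GreaterThanThreshold
        });
    }
}
```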
Besides this, I am using CloudWatch Logs for the custom logging from the producer and consumer Lambda functions.
Conclusion
In the preceding discussion, we've walked through the steps of creating a data streaming app, tackling the generation and transmission of streaming data and, finally, the consumption of that data.
You can find the complete code on my GitHub repository.