DEV Community

Cover image for Data Streaming Hands-On: High-scale data streaming with AWS Kinesis Data Streams
Anita Andonoska for AWS Community Builders

Posted on • Originally published at Medium

Data Streaming Hands-On: High-scale data streaming with AWS Kinesis Data Streams

In my previous post, we saw how to create a data streaming app on AWS, with Kinesis Data Streams and Lambda. Here we’ll see high-scale data streaming in action by exploring Kinesis Data Streams on-demand scaling.

Kinesis Data Streams Scaling

A shard serves as a base throughput unit of Kinesis Data Streams. A shard supports 1 MB/s and 1,000 records/s for writes and 2 MB/s for reads. By default, new data streams created with the on-demand capacity mode have 4 MB/s of write and 8 MB/s of read throughput. That means we have 4 shards, each one allowing 1MB/second and 1000 records per second for writes, and 2MB/seconds for reads.

By default, On-Demand capacity mode can automatically scale up to 200MB/s of write and 400MB/s of read throughput. That is the default limit per account but you can request a limit increase through AWS Support to enable your On-Demand stream to scale up to 1 GB/s write and 2 GB/s read throughput.

According to the information provided in this AWS blog, scaling in on-demand capacity mode occurs at the individual shard level. When the average utilization of an ingest shard reaches 50% (0.5 MB/s or 500 records/s) within a minute, the shard is split into two. If random values are used as partition keys, all stream shards will have event traffic distribution, and scaling will occur simultaneously across all shards. Depending on the number of shards being scaled, it can take up to 15 minutes to split the shards.

As the load increases, Kinesis Data Streams adjusts the number of shards in the stream by monitoring the ingest throughput at the shard level. Nevertheless, under certain circumstances, producers might encounter “WriteThroughputExceeded” and “Rate Exceeded” errors, even when operating in on-demand capacity mode. According to AWS documentation, you may encounter "ProvisionedThroughputExceeded" exceptions if your traffic increases by more than double the previous peak within a 15-minute window. Further information on various scaling scenarios can be found here.

Load Generator app

I aimed to replicate the scenario of multiple producers sending data to the stream, and to achieve this, I developed a data streaming application for load generation. This application is built with Lambda functions serving as producers, utilizing Kinesis Data Streams with on-demand capacity, and Lambda functions as consumers. For a deeper understanding of the setup, you can refer to the specifics outlined in my earlier publication.

To generate the load, an additional Lambda function called the Producer Orchestrator has been introduced. This Orchestrator function is configured to accept three parameters: the duration of the load in milliseconds, the number of producers, and the number of records each producer will push to the Data Streams service. Depending on the specified number of producers, multiple instances of Lambda producers will be created. To ensure the data to be evenly distributed across the shards, a unique partition key is assigned to each record.

The code for the app can be found here.


As previously mentioned, scaling occurs when the average ingest shard utilization reaches 50% (equivalent to 0.5 MB/s or 500 records/s) within a minute. To initiate the testing, I began with a small load to align with the initial capacity of the stream, set at 4MB/s of write throughput. Initially, I started with 4 producers, each generating 1K records and then reached 80 producers with 6K records each. As the load increased, the number of shards scaled rapidly, reaching 12, 28, 60, 96, and ultimately 100 shards. This scaling was triggered by a brief surge in load lasting only a minute or two. My objective was to observe the scaling process, hence the load increase was not gradual, resulting in some records being throttled.


The testing conducted on scaling scenarios in Kinesis Data Streams revealed that scaling occurs swiftly when the average ingest shard utilization reaches the predefined threshold. The stream's capacity quickly adapted, with the number of shards increasing significantly in response to increased load. Even a brief (around 1 minute) surge in load triggered scaling actions, demonstrating the auto scaling capability of Kinesis Data Streams.

Top comments (0)