DEV Community

Kinesis Producers

A producer for Amazon Kinesis Data Streams is an application that puts user data records into a Kinesis data stream (a process also called data ingestion). The Kinesis Producer Library (KPL) simplifies building producer applications and lets developers achieve high write throughput to a Kinesis data stream.

There are several ways to stream data into Amazon Kinesis Data Streams:

  • Kinesis SDK
  • Kinesis Producer Library (KPL)
  • Kinesis Agent

Other third-party libraries include:

Spark, Log4j Appenders, Flume, Kafka Connect, NiFi

Kinesis Producer SDK - PutRecord(s)

  • Uses the PutRecord (one record) and PutRecords (many records) APIs.
  • PutRecords batches many records into one request, which improves throughput and results in fewer HTTP calls.
  • AWS Mobile SDKs: Android, iOS, etc.
  • Managed Amazon Web Services sources for Kinesis Data Streams:

    • AWS IoT
    • CloudWatch Logs
    • Kinesis Data Analytics

Use cases:
low throughput, higher latency, simple API, AWS Lambda
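A producer that calls PutRecords directly has to split its records into batches that fit the API's limits. The sketch below assumes the limits documented at the time of writing (500 records per call, 1 MiB per record and 5 MiB per request, counting the data blob plus the partition key); check the current service quotas before relying on them. `chunk_put_records` is a hypothetical helper, and each batch it returns is shaped like the `Records` argument of boto3's `kinesis.put_records`.

```python
from typing import Iterable, List, Tuple

# Assumed PutRecords limits (verify against the current AWS service quotas).
MAX_RECORDS_PER_CALL = 500
MAX_RECORD_BYTES = 1024 * 1024        # data blob + partition key, per record
MAX_REQUEST_BYTES = 5 * 1024 * 1024   # whole request


def record_size(data: bytes, partition_key: str) -> int:
    """Bytes a record counts against the limits: blob plus UTF-8 partition key."""
    return len(data) + len(partition_key.encode("utf-8"))


def chunk_put_records(records: Iterable[Tuple[str, bytes]]) -> List[List[dict]]:
    """Group (partition_key, data) pairs into PutRecords-sized batches.

    Each batch is a list of {"Data": ..., "PartitionKey": ...} dicts.
    """
    batches: List[List[dict]] = []
    current: List[dict] = []
    current_bytes = 0
    for partition_key, data in records:
        size = record_size(data, partition_key)
        if size > MAX_RECORD_BYTES:
            raise ValueError(f"record for key {partition_key!r} is {size} bytes, over the per-record limit")
        # Close the current batch if adding this record would break either limit.
        if current and (len(current) == MAX_RECORDS_PER_CALL
                        or current_bytes + size > MAX_REQUEST_BYTES):
            batches.append(current)
            current, current_bytes = [], 0
        current.append({"Data": data, "PartitionKey": partition_key})
        current_bytes += size
    if current:
        batches.append(current)
    return batches


records = [(f"user-{i % 10}", b"event") for i in range(1200)]
print([len(batch) for batch in chunk_put_records(records)])  # [500, 500, 200]
```

Each batch would then be sent as one PutRecords call, so 1,200 small records cost three HTTP requests instead of 1,200 PutRecord calls.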

Kinesis Producer Library (KPL)

  • Easy to use and highly configurable C++/Java library
  • Used for building high-performance, long-running producers
  • Automated and configurable retry mechanism
  • Synchronous or asynchronous APIs (the asynchronous API gives better performance)
  • Submits metrics to CloudWatch for monitoring.
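  • Batching (both mechanisms below are turned on by default) increases throughput and decreases cost:
    • Collection: collect records and write to multiple shards in the same PutRecords API call.
    • Aggregation: combine multiple user records into a single Kinesis record, at the cost of increased latency.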
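Aggregation is easier to picture with a toy version. The real KPL packs user records into a protobuf-based aggregated record format; the sketch below swaps that for a hypothetical length-prefixed framing (a 4-byte big-endian length, then the payload) purely to show the idea: many small user records become a few larger Kinesis records, which the consumer must split apart again. `aggregate`, `deaggregate`, and the 51,200-byte cap are illustrative assumptions, not KPL APIs.

```python
import struct
from typing import List

AGGREGATED_MAX_BYTES = 51_200    # assumed cap per aggregated record (illustrative)
_HEADER = struct.Struct(">I")    # hypothetical framing: 4-byte big-endian length prefix


def aggregate(user_records: List[bytes], max_bytes: int = AGGREGATED_MAX_BYTES) -> List[bytes]:
    """Pack user records into as few length-prefixed blobs as the size cap allows."""
    blobs: List[bytes] = []
    current = bytearray()
    for rec in user_records:
        framed = _HEADER.pack(len(rec)) + rec
        if len(framed) > max_bytes:
            raise ValueError("single user record exceeds the aggregation cap")
        # Start a new blob when this record would push the current one over the cap.
        if current and len(current) + len(framed) > max_bytes:
            blobs.append(bytes(current))
            current = bytearray()
        current += framed
    if current:
        blobs.append(bytes(current))
    return blobs


def deaggregate(blob: bytes) -> List[bytes]:
    """Split one aggregated blob back into the original user records."""
    records: List[bytes] = []
    offset = 0
    while offset < len(blob):
        (length,) = _HEADER.unpack_from(blob, offset)
        offset += _HEADER.size
        records.append(blob[offset:offset + length])
        offset += length
    return records


user_records = [f"click-{i}".encode() for i in range(10_000)]
blobs = aggregate(user_records)
print(len(blobs))  # 3 (versus 10,000 user records)
assert [r for b in blobs for r in deaggregate(b)] == user_records
```

Collection then works one level up: the aggregated blobs, not the individual user records, are what get grouped into PutRecords calls.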

Kinesis Producer Library (KPL) Batching

Batching efficiency can be tuned by introducing some delay with RecordMaxBufferedTime (default 100 ms): records wait in a buffer so that more of them can be sent together.
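The trade-off behind RecordMaxBufferedTime can be shown with a small simulation: records wait in a buffer until either the oldest one has waited the maximum buffered time or the batch is full, and then the whole buffer is flushed as one request. `TimedBuffer`, `simulate`, and their parameters are hypothetical names; the clock is passed in explicitly so the run is deterministic, and this is a simplified model, not the KPL's internal algorithm.

```python
from typing import List, Optional


class TimedBuffer:
    """Simplified model: flush when the oldest record reaches max_buffered_ms or the batch is full."""

    def __init__(self, max_buffered_ms: int = 100, max_count: int = 500) -> None:
        self.max_buffered_ms = max_buffered_ms
        self.max_count = max_count
        self.items: List[str] = []
        self.oldest_ms: Optional[int] = None
        self.flushed: List[List[str]] = []  # each entry stands in for one PutRecords call

    def add(self, record: str, now_ms: int) -> None:
        self.tick(now_ms)  # flush first if the buffer has already waited long enough
        if not self.items:
            self.oldest_ms = now_ms
        self.items.append(record)
        if len(self.items) >= self.max_count:
            self.flush()

    def tick(self, now_ms: int) -> None:
        """Enforce the time limit; a real producer would also call this from a timer."""
        if self.items and now_ms - self.oldest_ms >= self.max_buffered_ms:
            self.flush()

    def flush(self) -> None:
        if self.items:
            self.flushed.append(self.items)
        self.items, self.oldest_ms = [], None


def simulate(max_buffered_ms: int, arrivals_ms: List[int]) -> int:
    """Return how many requests are sent for records arriving at the given times."""
    buf = TimedBuffer(max_buffered_ms=max_buffered_ms)
    for i, t in enumerate(arrivals_ms):
        buf.add(f"rec-{i}", t)
    buf.flush()  # drain whatever remains at shutdown
    return len(buf.flushed)


arrivals = list(range(0, 1000, 10))  # one record every 10 ms for one second
print(simulate(20, arrivals), simulate(100, arrivals), simulate(500, arrivals))  # 50 10 2
```

With one record every 10 ms, raising the buffered time from 20 ms to 500 ms cuts the number of requests from 50 to 2, but the first record in each batch now waits up to that long before it is sent.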

NOTE: When not to use the Kinesis Producer Library

  • The KPL can incur an additional processing delay of up to RecordMaxBufferedTime within the library (user-configurable)
  • Larger values of RecordMaxBufferedTime result in higher packing efficiencies and better performance
  • Applications that cannot tolerate this additional delay may need to use the AWS SDK directly

Kinesis Agent

  • Monitors log files and sends them to Kinesis Data Streams
  • Java-based agent, built on top of the KPL
  • Installed in Linux-based server environments

Features:

  • Read from multiple directories and write to multiple streams
  • Routing feature based on directory/log file
  • Pre-process data before sending to streams (single line, CSV to JSON, log to JSON)
  • The agent handles file rotation, checkpointing, and retry upon failures
  • Emits metrics to CloudWatch for monitoring
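The agent is driven by a JSON configuration file (by default `/etc/aws-kinesis/agent.json`), in which each entry in `flows` maps a file pattern to a destination stream. The sketch below builds such a file from Python to illustrate routing two directories to two streams, with CSV-to-JSON preprocessing on one of them. The key names are taken from the agent's documented configuration format; the paths, stream names, and field names are hypothetical, and the keys should be checked against the current Kinesis Agent documentation before deploying.

```python
import json


def build_agent_config() -> dict:
    """Return an agent.json-style config with one flow per log directory."""
    return {
        "cloudwatch.emitMetrics": True,  # publish agent metrics to CloudWatch
        "flows": [
            {
                # Hypothetical application logs: each line is sent as-is.
                "filePattern": "/var/log/app/*.log",
                "kinesisStream": "app-logs-stream",
                "partitionKeyOption": "RANDOM",
            },
            {
                # Hypothetical CSV exports: each line is converted to JSON before sending.
                "filePattern": "/var/log/orders/*.csv",
                "kinesisStream": "orders-stream",
                "partitionKeyOption": "RANDOM",
                "dataProcessingOptions": [
                    {
                        "optionName": "CSVTOJSON",
                        "customFieldNames": ["order_id", "amount", "currency"],
                        "delimiter": ",",
                    }
                ],
            },
        ],
    }


if __name__ == "__main__":
    # Save this output as /etc/aws-kinesis/agent.json on the host, then restart the agent.
    print(json.dumps(build_agent_config(), indent=2))
```

Routing comes from the flows themselves: files matching each `filePattern` go only to that flow's stream, and checkpointing, file rotation, and retries are handled by the agent.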

AWS Kinesis API - Exceptions

  • ProvisionedThroughputExceededException
    • Happens when sending more data than a shard accepts (exceeding the MB/s or TPS limit for any shard)
    • Make sure you don't have a hot shard (for example, a poor partition key sends too much data to one shard)
  • Solutions:

    • Retries with backoff
    • Increase the number of shards (scaling)
    • Ensure your partition key distributes data evenly across shards
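When a producer calls the SDK directly (the KPL retries automatically), a PutRecords call can succeed as a whole while still rejecting individual records: the response reports a `FailedRecordCount`, and each failed entry in `Records` carries an `ErrorCode` such as `ProvisionedThroughputExceededException`, in the same order as the request. A common pattern is to resend only the failed entries with exponential backoff and jitter. The sketch below assumes a `put_records` callable that takes the list of entries and returns that response shape (for example, a thin wrapper around boto3's `kinesis.put_records`); `put_with_retries`, `FlakyStream`, and the default delays are hypothetical, and the sleep function is injectable so the logic can run without AWS.

```python
import random
import time
from typing import Callable, List


def put_with_retries(
    entries: List[dict],
    put_records: Callable[[List[dict]], dict],  # assumed to return {"FailedRecordCount", "Records"}
    max_attempts: int = 5,
    base_delay_s: float = 0.1,
    max_delay_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[dict]:
    """Send entries, resending only the failed ones with exponential backoff.

    Returns the entries that still failed after max_attempts (empty on success).
    """
    pending = list(entries)
    for attempt in range(max_attempts):
        response = put_records(pending)
        if response.get("FailedRecordCount", 0) == 0:
            return []
        # Results are in request order, so pair them up to find the failures.
        pending = [
            entry
            for entry, result in zip(pending, response["Records"])
            if "ErrorCode" in result
        ]
        if attempt < max_attempts - 1:
            # "Full jitter": sleep a random time up to the capped exponential delay.
            cap = min(max_delay_s, base_delay_s * (2 ** attempt))
            sleep(random.uniform(0, cap))
    return pending


class FlakyStream:
    """Fake put_records for the demo: throttles every other record on the first call only."""

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self, entries: List[dict]) -> dict:
        self.calls += 1
        results = []
        for i, _ in enumerate(entries):
            if self.calls == 1 and i % 2 == 0:
                results.append({"ErrorCode": "ProvisionedThroughputExceededException",
                                "ErrorMessage": "Rate exceeded"})
            else:
                results.append({"SequenceNumber": "1", "ShardId": "shardId-000000000000"})
        failed = sum("ErrorCode" in r for r in results)
        return {"FailedRecordCount": failed, "Records": results}


entries = [{"Data": b"x", "PartitionKey": str(i)} for i in range(10)]
stream = FlakyStream()
leftover = put_with_retries(entries, stream, sleep=lambda s: None)  # skip real sleeping in the demo
print(leftover, stream.calls)  # [] 2
```

Backoff only smooths out short bursts; if throttling persists, the remaining fixes are more shards or a partition key that spreads load evenly.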
