hridyesh bisht for AWS Community Builders

Posted on Sep 5, 2022

Things to know before Streaming data

#aws #datascience #data #streaming

Think of a time when someone said something that left you speechless. It is the ideal moment for a witty comeback, but you have nothing to say. You think of the perfect response after walking away, but by then it is too late: the moment has passed. This is an example of how some data loses value over time.

Some data arrives as an unending stream of events and is best analysed while in flight. With streams, raw data is processed in real time, and only the useful information and insight is saved. A streaming data architecture lets developers analyse time-sensitive data while it is most valuable and build a real-time picture of the situation.

This blog covers an introduction to streaming data, the components of a streaming data architecture, integrating batch processing with stream processing, and an in-depth look at Amazon Kinesis services such as Kinesis Video Streams, Kinesis Data Streams, Kinesis Data Firehose, and Kinesis Data Analytics.

Q. What do you mean by stream processing?

Stream processing involves ingesting a continuous data stream and analysing, filtering, transforming, or improving the data in real time. This improves visibility into various areas of data activity, such as service consumption, server usage, and device geolocation.
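As a rough illustration of the ingest, filter, transform, and improve steps described above, here is a minimal sketch using plain Python generators, so that each event flows through the whole pipeline before the next one is read. The event fields, the `DEVICE_REGIONS` lookup table, and all function names are hypothetical and chosen only for this example; a real deployment would read from a streaming service rather than a list.

```python
from typing import Dict, Iterable, Iterator

# Hypothetical lookup table used to enrich events (an assumption for this sketch).
DEVICE_REGIONS = {"dev-1": "eu-west-1", "dev-2": "us-east-1"}


def ingest(raw_events: Iterable[Dict]) -> Iterator[Dict]:
    """Ingest: yield events one at a time, as they would arrive from a stream."""
    for event in raw_events:
        yield event


def keep_valid(events: Iterable[Dict]) -> Iterator[Dict]:
    """Filter: drop events that carry no CPU reading."""
    for event in events:
        if event.get("cpu") is not None:
            yield event


def to_percent(events: Iterable[Dict]) -> Iterator[Dict]:
    """Transform: convert the 0-1 CPU fraction into a percentage."""
    for event in events:
        yield {**event, "cpu": round(event["cpu"] * 100, 1)}


def add_region(events: Iterable[Dict]) -> Iterator[Dict]:
    """Improve (enrich): attach a region looked up from the device id."""
    for event in events:
        region = DEVICE_REGIONS.get(event["device"], "unknown")
        yield {**event, "region": region}


def process(raw_events: Iterable[Dict]) -> Iterator[Dict]:
    """Chain the stages; nothing runs until a consumer pulls events."""
    return add_region(to_percent(keep_valid(ingest(raw_events))))


# Hypothetical server-usage events standing in for a live stream.
sample = [
    {"device": "dev-1", "cpu": 0.42},
    {"device": "dev-2", "cpu": None},
    {"device": "dev-3", "cpu": 0.9},
]

for event in process(sample):
    print(event)
```

Because the stages are generators, the event without a CPU reading is discarded as soon as it is seen, and only the two processed events reach the consumer.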

Businesses, for example, can continuously analyse social media streams to watch changes in public attitude toward their brands and products and respond promptly.

Image credits: https://f.hubspotusercontent10.net/hubfs/4757017/stream_processing_3-01.jpg

Stream processing services and architectures are becoming increasingly popular because they let developers combine data feeds from multiple sources, and because not all data is created equal: its value changes over time.

Q. What is batch processing?

Before stream processing, vast amounts of data were typically stored in a database and processed all at once. This approach is called batch processing because, as the name implies, the work is performed in one “batch.”

Batch processing collects, stores, and analyses data in fixed-size pieces on a regular schedule. The schedule depends on how often data is gathered and on the value of the insight gained. This value lies at the heart of stream processing.

There are two issues related to batch processing that impact the value of data:

1. Batch processing systems divide data into consistent, evenly spaced time intervals. This results in a workload that is predictable but not intelligent. Sessions that begin in one batch may finish in another, which complicates the analysis of related transactions.
2. Batch architectures are optimised to handle enormous amounts of data at once. As a result, an analysis job may have to wait a long time, because the queue must be full before processing can begin. While the size of each batch job is consistent, the span of time each batch of data covers is not.
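The first issue, a session that starts in one batch and finishes in another, can be shown with a small sketch. The 60-second interval, the event timestamps, and the session ids below are all made-up values for illustration.

```python
from collections import defaultdict

BATCH_SECONDS = 60  # assumed batch interval for this sketch

# Hypothetical clickstream events: (timestamp in seconds, session id).
events = [(10, "a"), (50, "a"), (65, "a"), (70, "b"), (110, "b")]


def assign_batches(events, batch_seconds):
    """Group events into fixed, evenly spaced time intervals."""
    batches = defaultdict(list)
    for timestamp, session in events:
        batches[timestamp // batch_seconds].append((timestamp, session))
    return dict(batches)


def split_sessions(batches):
    """Return the sessions whose events landed in more than one batch."""
    batches_per_session = defaultdict(set)
    for batch_id, items in batches.items():
        for _, session in items:
            batches_per_session[session].add(batch_id)
    return sorted(s for s, ids in batches_per_session.items() if len(ids) > 1)


batches = assign_batches(events, BATCH_SECONDS)
print(batches)                  # events grouped by batch number
print(split_sessions(batches))  # sessions cut in two by a batch boundary
```

Here session "a" has events at 10, 50, and 65 seconds, so the batch boundary at 60 seconds cuts it in two, and a job that looks at one batch at a time sees only part of the session.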

Batch processing is built around a data-at-rest architecture. Before processing can begin, collection has to stop and the data must be stored. Each subsequent batch of collected data then creates the need to aggregate across multiple batches. In contrast, streaming architectures handle never-ending data flows naturally and gracefully. With streams, patterns can be detected, results inspected, and multiple streams examined simultaneously.
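To make the contrast concrete, the sketch below keeps a running mean that is updated on every event, so there is no separate step to combine results across batches, and it reads two streams at once by merging them in timestamp order. The stream contents and the function name `running_means` are assumptions for this example; `heapq.merge` from the standard library does the lazy merge and expects each input to be sorted already.

```python
import heapq
from collections import defaultdict


def running_means(*streams):
    """Merge time-ordered streams and yield the running mean per source.

    Each event is a (timestamp, source, value) tuple. State is a count and
    a total per source, so memory stays constant however long the stream runs.
    """
    counts = defaultdict(int)
    totals = defaultdict(float)
    for timestamp, source, value in heapq.merge(*streams):
        counts[source] += 1
        totals[source] += value
        yield timestamp, source, totals[source] / counts[source]


# Two hypothetical streams, each already ordered by timestamp.
temperatures = [(1, "temp", 20.0), (3, "temp", 22.0)]
loads = [(2, "load", 0.5), (4, "load", 1.5)]

for update in running_means(temperatures, loads):
    print(update)
```

Each update is available as soon as its event arrives, which is the property a batch job cannot offer until the whole batch has been collected.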

Image credits: https://www.researchgate.net/profile/Olawande-Daramola/publication/333653951/figure/tbl1/AS:767176877281282@1559920629763/Comparison-between-batch-processing-and-streaming-processing-82.png

I believe it is crucial to emphasise that batch processing is still required. Stream processing complements batch processing rather than replacing it. Some information requires real-time processing because the data has actionable value at the time it is collected, and that value diminishes rapidly. Stream processing was developed to address latency, session boundaries, and unpredictable load.

Q. What are the components of a stream application?

Generally speaking, streaming data frameworks are described as having five layers: the Source, Stream Ingestion, Stream Storage, Stream Processing, and the Destination.

1. Source: Data is generated by one or more sources, or producers, including mobile devices, smart-home meters, clickstreams, IoT sensors, and logs.
2. Stream Ingestion: Data is gathered from one or more producers, structured as Data Records, and placed in a data stream. The records are converted to a common message format and streamed continuously.
3. Stream Storage: The data is stored in the Data Stream. Before it can be evaluated with SQL-based analytics tools, data streams from one or more message brokers are gathered, converted, and formatted.
4. Stream Processing: This layer is managed by Consumers. Consumers access streams, read the data, and process the data contained in a stream. The outcome could be an API call, an action, a visualisation, an alert, or, in some situations, the creation of a new data stream.
5. Destination: The Consumers deliver Data Records to the fifth layer, the destination. This can be a data lake, a data warehouse, or durable storage such as Amazon S3 or Amazon Redshift.
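The five layers can be sketched end to end in a few lines of plain Python. This is only an in-memory illustration under stated assumptions: the sensor readings, the JSON record format, the 30-degree alert threshold, and every class and function name are hypothetical, and a list stands in for both the stream storage and the destination that a managed service would provide.

```python
import json


def source():
    """Layer 1, Source: a producer emitting raw readings (made-up sensor data)."""
    yield from [("sensor-1", 21.5), ("sensor-2", 35.0), ("sensor-1", 22.0)]


def ingest(readings):
    """Layer 2, Stream Ingestion: convert readings to a common record format."""
    for sensor_id, temperature in readings:
        yield json.dumps({"sensor": sensor_id, "temp_c": temperature})


class StreamStorage:
    """Layer 3, Stream Storage: an in-memory stand-in for an ordered data stream."""

    def __init__(self):
        self._records = []

    def put(self, record):
        self._records.append(record)

    def read(self, start=0):
        """Yield stored records in arrival order, beginning at offset `start`."""
        yield from self._records[start:]


def consume(storage, threshold_c=30.0):
    """Layer 4, Stream Processing: a consumer that flags hot readings."""
    for raw in storage.read():
        record = json.loads(raw)
        record["alert"] = record["temp_c"] > threshold_c
        yield record


def run_pipeline():
    storage = StreamStorage()
    for record in ingest(source()):
        storage.put(record)
    destination = []  # Layer 5, Destination: stand-in for a data lake or warehouse
    destination.extend(consume(storage))
    return destination


for row in run_pipeline():
    print(row)
```

Keeping the storage layer separate from the consumer mirrors the layering above: producers write without knowing who reads, and a second consumer could read the same stored records from its own offset.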
