DEV Community

Lucia Cerchie

What does 'batching' mean when we're talking about Apache KafkaⓇ?

Today I learned that when you hear the word 'batch' in the context of Apache Kafka, it can mean one of two things:

  1. A reference to batch-only data processing systems. Batch-only systems process data in a bounded way: each job works on a set of data with a defined start time and end time. Whether the batches are large or micro-sized, each one is processed all at once. That's in contrast to the continuous data streaming that Apache Kafka enables, in which data is processed in event-sized pieces as it arrives.

  2. Within the data streaming context, there's something called producer batching. The name is a bit misleading, because it has nothing to do with batch-only data processing systems. A Kafka producer, the client that publishes records to the Kafka cluster, groups records headed for the same partition into batches and sends each batch in a single request, which increases throughput. (Compression is a separate, optional setting that is applied to each batch.) This batching is just an optimization inside a system that still handles data continuously, in event-sized pieces, so it doesn't mean the same thing as batch-only data processing.
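
To make producer batching a little more concrete, here is a rough sketch in Python. The setting names `batch.size`, `linger.ms`, and `compression.type` are real Kafka producer configs, but the values are only illustrative. `ToyBatcher` is a hypothetical stand-in I made up for this post, not Kafka's actual implementation. It leaves out the time-based `linger.ms` trigger and uses an explicit `flush()` instead.

```python
from collections import defaultdict

# Real Kafka producer setting names; the values are illustrative only.
producer_config = {
    "batch.size": 16384,         # upper bound, in bytes, for one per-partition batch
    "linger.ms": 5,              # how long to wait for more records before sending
    "compression.type": "none",  # optional, and applied to a whole batch
}


class ToyBatcher:
    """Hypothetical model of producer batching (not Kafka's code)."""

    def __init__(self, batch_size_bytes):
        self.batch_size_bytes = batch_size_bytes
        self.pending = defaultdict(list)  # partition -> records not yet sent
        self.sent = []                    # one (partition, records) entry per simulated request

    def send(self, partition, record):
        batch = self.pending[partition]
        size = sum(len(r) for r in batch)
        # If this record would overflow the current batch, ship the batch first.
        if batch and size + len(record) > self.batch_size_bytes:
            self.flush(partition)
        self.pending[partition].append(record)

    def flush(self, partition=None):
        # Send whatever is waiting, for one partition or for all of them.
        partitions = [partition] if partition is not None else list(self.pending)
        for p in partitions:
            if self.pending[p]:
                self.sent.append((p, self.pending[p]))
                self.pending[p] = []


batcher = ToyBatcher(batch_size_bytes=10)
for _ in range(5):
    batcher.send(0, b"abcd")  # five 4-byte records, all for partition 0
batcher.flush()

# Five records went out in three requests instead of five.
print([len(records) for _, records in batcher.sent])  # prints [2, 2, 1]
```

Each record is still its own event. The batcher only changes how many network requests it takes to deliver them.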

In conclusion, 'batching' means, in a very general way, 'grouping stuff together'. But 'producer batching' and 'batch-only data processing systems' share the term only in that loose sense, because they refer to the completely different functions I described above.
