Apache Kafka is currently a very popular pub-sub system, leveraged both as a data-streaming platform and as a message broker. It is often used for real-time event processing and for high-throughput, low-latency data streams that scale easily. Kafka provides resilience against node failures, durability, scalability, and persistence, along with data-delivery guarantees. However, just like any other tool, Kafka needs tuning, because a small configuration mishap can lead to a big disaster. In this blog, we’ll focus on best practices to avoid mishaps in a Kafka environment.
1)Use the default settings at the Broker level: Kafka is powerful enough to process and handle very large amounts of data with just the default settings at the broker level. Custom configuration should instead be applied at the topic level, based on your needs.
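As a sketch of what this looks like in practice, topic-level overrides can be collected and passed at topic-creation time (for example via the `--config` flag of `kafka-topics.sh`) while the broker keeps its defaults. The values below are illustrative, not recommendations:

```python
# Topic-level overrides (illustrative values) applied to one topic
# instead of changing broker-wide defaults in server.properties.
topic_overrides = {
    "retention.ms": str(7 * 24 * 60 * 60 * 1000),  # keep this topic's data 7 days
    "min.insync.replicas": "2",                    # stricter durability for this topic
    "cleanup.policy": "delete",                    # stated explicitly for clarity
}

# Render the overrides as a comma-separated --config argument string.
config_arg = ",".join(f"{k}={v}" for k, v in sorted(topic_overrides.items()))
print(config_arg)
```

The broker-level defaults stay untouched; only topics with special needs carry extra configuration.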
2)Plan your server restarts: Kafka is a stateful service, which means it keeps track of the state of interactions. Restarting your Kafka brokers at random, rather than one at a time in a planned rolling fashion, can lead to data loss.
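A rolling restart can be sketched as the loop below. Note that `restart_broker` and `under_replicated_partitions` are hypothetical placeholders for whatever orchestration and monitoring tooling you use; the point is the ordering: restart one broker, wait for replicas to catch up, only then move on.

```python
import time

def rolling_restart(brokers, restart_broker, under_replicated_partitions):
    """Restart brokers one at a time, waiting for the cluster to report
    zero under-replicated partitions before touching the next broker."""
    for broker in brokers:
        restart_broker(broker)
        while under_replicated_partitions() > 0:
            time.sleep(5)  # let replicas catch up before the next restart

# Example with stub callbacks standing in for real tooling:
restarted = []
rolling_restart(["broker-1", "broker-2", "broker-3"], restarted.append, lambda: 0)
print(restarted)
```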
3)Plan for retention: Sizing retention space correctly by measuring the producer byte rate is another Kafka best practice. The data rate dictates how much disk space is needed to guarantee retention for a given amount of time.
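The back-of-the-envelope arithmetic is simple: required disk equals producer byte rate times the retention period times the replication factor. The numbers below are assumed for illustration:

```python
# Retention sizing sketch with assumed example numbers.
producer_mb_per_sec = 5   # measured average producer byte rate, MB/s (assumed)
retention_hours = 72      # how long the data must be kept
replication_factor = 3    # each byte is stored once per replica

retention_gb = producer_mb_per_sec * retention_hours * 3600 / 1024
total_gb = retention_gb * replication_factor
print(int(total_gb))  # total disk needed across the cluster, in GB
```

Leave headroom on top of this figure for bursts, index files, and segments awaiting deletion.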
4)Educate Application Developers: This is the most important but least implemented best practice in the Kafka world. If you educate developers about the Kafka API, then issues like high latency, low throughput, long recovery times, data loss, duplication, etc. can be addressed from the get-go.
5)Manage your Partition count: Kafka is designed for parallel processing and, like the act of parallelization itself, fully utilizing it requires a balancing act. Partition count is a topic-level setting: the more partitions, the greater the parallelism and throughput. However, more partitions also mean higher replication latency, longer rebalances, and more open file handles on the brokers. Also keep in mind that the partition count of a topic can only be increased, never decreased, so always start from the lowest number that meets your needs.
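A common rule of thumb for the starting point: take your target topic throughput and divide it by the per-partition throughput you have measured for your producers and consumers, then keep the larger of the two results. The throughput figures below are assumptions for illustration:

```python
import math

# Rough partition-count estimate: partitions >= max(T/p, T/c), where
# T is the target topic throughput, p the measured per-partition producer
# throughput, and c the measured per-partition consumer throughput.
target_mb_per_sec = 100      # desired topic throughput (assumed)
producer_per_partition = 10  # MB/s one producer can push to one partition (assumed)
consumer_per_partition = 20  # MB/s one consumer can read from one partition (assumed)

partitions = max(math.ceil(target_mb_per_sec / producer_per_partition),
                 math.ceil(target_mb_per_sec / consumer_per_partition))
print(partitions)  # start here; you can add partitions later, never remove them
```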
If you want to learn about Kafka in more depth, you can always buy my Udemy course, Kafka Broker Administration.
6)Configure Multiple Protocol Listeners: The goal here is to configure and isolate Kafka traffic with security and data segregation in mind. By defining multiple listeners you can isolate Kafka-client traffic, inter-broker communication, Broker-Connect, Broker-Schema Registry, Broker-REST, REST-client traffic, etc.
Security options and protocols with Kafka:
- SSL/SASL: Authentication of clients to brokers, between brokers, and of tools to brokers.
- SSL: Encryption of data between clients and brokers, between brokers, and between tools and brokers.
- SASL mechanisms: SASL/GSSAPI (Kerberos), SASL/PLAIN, SASL/SCRAM-SHA-256, SASL/SCRAM-SHA-512, SASL/OAUTHBEARER.
- ZooKeeper security: Authentication for clients (brokers, tools, producers, consumers), authorization with ACLs.
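A minimal sketch of what multiple listeners look like in `server.properties`, built here as a Python dict for readability. The listener names and hostnames (`INTERNAL`, `broker1.internal`, etc.) are assumptions; the config keys themselves are standard Kafka broker settings:

```python
# Illustrative broker settings separating internal, client, and external
# traffic onto distinct listeners with different security protocols.
broker_config = {
    "listeners": "INTERNAL://0.0.0.0:9092,CLIENT://0.0.0.0:9093,EXTERNAL://0.0.0.0:9094",
    "advertised.listeners": ("INTERNAL://broker1.internal:9092,"
                             "CLIENT://broker1.internal:9093,"
                             "EXTERNAL://broker1.example.com:9094"),  # hostnames assumed
    "listener.security.protocol.map": "INTERNAL:SSL,CLIENT:SASL_SSL,EXTERNAL:SASL_SSL",
    "inter.broker.listener.name": "INTERNAL",
    "sasl.enabled.mechanisms": "SCRAM-SHA-512",
}

# Render as server.properties-style lines.
lines = "\n".join(f"{k}={v}" for k, v in broker_config.items())
print(lines)
```

Inter-broker replication then stays on the SSL-only internal listener while clients authenticate over SASL_SSL.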
7)Monitor your brokers for network I/O: both transmit/receive rates and packet loss. Also watch disk I/O for skewed disks, CPU usage, disk utilization, etc., and plan to scale your clusters accordingly.
8)Configure Producer for Acknowledgement: If you really care about data loss, set acks=all so the leader waits for all in-sync replicas to acknowledge the write. If you can tolerate some data loss in exchange for the lowest latency, acks=1 (leader-only acknowledgement) or even acks=0 (fire-and-forget) is fine.
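The two ends of that durability spectrum look like this as producer settings (written as a plain dict mirroring the standard producer property names; values are illustrative):

```python
# Durable end: wait for every in-sync replica before considering a write done.
durable_producer = {
    "acks": "all",                # leader waits for all in-sync replicas
    "enable.idempotence": "true", # avoid duplicates when retries kick in
}

# Fast end: do not wait for any acknowledgement; lowest latency, may lose data.
fire_and_forget_producer = {
    "acks": "0",
}

print(durable_producer["acks"], fire_and_forget_producer["acks"])
```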
9)Configure retries on producers: In older clients the default was 0, which is often too low; newer clients default to Integer.MAX_VALUE. The right value depends on your application: for applications where data loss cannot be tolerated, use Integer.MAX_VALUE (effectively infinite) and bound the total delivery time with delivery.timeout.ms. Retries are effective against transient errors such as leader elections and brief network failures.
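A "no data loss" retry configuration can be sketched as follows (a dict mirroring the standard producer property names; the timeout value is illustrative):

```python
# Illustrative retry settings for a producer that must not lose data.
retry_config = {
    "retries": str(2**31 - 1),        # Integer.MAX_VALUE: retry transient errors
    "delivery.timeout.ms": "120000",  # upper bound on total delivery time, retries included
    "enable.idempotence": "true",     # retries must not duplicate or reorder records
}
print(retry_config["retries"])
```

With idempotence enabled, a retried send that already reached the broker is deduplicated rather than written twice.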
10)Monitor Kafka JMX Metrics: such as the number of produced messages, under-replicated partitions, disk skew, producer byte rate, fetch rate, etc.
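To make that concrete, here are a few standard broker MBean names worth scraping with your JMX tooling of choice (for instance the Prometheus JMX exporter; the choice of tool is an assumption, the MBean patterns are standard Kafka ones):

```python
# Well-known Kafka broker MBeans covering the metrics mentioned above.
jmx_metrics = [
    "kafka.server:type=ReplicaManager,name=UnderReplicatedPartitions",
    "kafka.server:type=BrokerTopicMetrics,name=MessagesInPerSec",
    "kafka.server:type=BrokerTopicMetrics,name=BytesInPerSec",
    "kafka.server:type=BrokerTopicMetrics,name=BytesOutPerSec",
    "kafka.network:type=RequestMetrics,name=TotalTimeMs,request=Produce",
]
for metric in jmx_metrics:
    print(metric)
```

UnderReplicatedPartitions in particular should normally sit at zero; a sustained non-zero value is an early warning sign.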