How to choose the no of partitions for a kafka topic

apache-kafkakafka-consumer-apikafka-producer-api

We have 3 zk nodes cluster and 7 brokers. Now we have to create a topic and have to create partitions for this topic.

But I did not find any formula to decide that how much partitions should I create for this topic.
Rate of producer is 5k messages/sec and size of each message is 130 Bytes.

Thanks In Advance

Best Answer

I can't give you a definitive answer, there are many patterns and constraints that can affect the answer, but here are some of the things you might want to take into account:

  • The unit of parallelism is the partition, so if you know the average processing time per message, then you should be able to calculate the number of partitions required to keep up. For example if each message takes 100ms to process and you receive 5k a second then you'll need at least 50 partitions. Add a percentage more that that to cope with peaks and variable infrastructure performance. Queuing Theory can give you the math to calculate your parallelism needs.

  • How bursty is your traffic and what latency constraints do you have? Considering the last point, if you also have latency requirements then you may need to scale out your partitions to cope with your peak rate of traffic.

  • If you use any data locality patterns or require ordering of messages then you need to consider future traffic growth. For example, you deal with customer data and use your customer id as a partition key, and depend on each customer always being routed to the same partition. Perhaps for event sourcing or simply to ensure each change is applied in the right order. Well, if you add new partitions later on to cope with a higher rate of messages, then each customer will likely be routed to a different partition now. This can introduce a few headaches regarding guaranteed message ordering as a customer exists on two partitions. So you want to create enough partitions for future growth. Just remember that is easy to scale out and in consumers, but partitions need some planning, so go on the safe side and be future proof.

  • Having thousands of partitions can increase overall latency.

Related Topic