Benchmarking Apache Kafka on OpenStack VMs

I have been wanting to test Apache Kafka for some time now and finally got around to it! In this blog post I give a very short introduction to what Kafka is, walk through the installation and configuration of a Kafka cluster, and finally benchmark a few near real-world scenarios on OpenStack VMs.

What is Apache Kafka?

Apache Kafka is a distributed publish-subscribe message broker. The core abstraction in Kafka is the topic. A topic is nothing but an append-only log to which producers publish their records and from which consumers read. A Kafka topic is divided into multiple partitions, and these partitions are distributed across the servers within a Kafka cluster, which makes it scalable. Each partition is replicated across a configurable number of servers for fault tolerance, and messages in the topic are retained for a configurable period of time for durability. The whole thing looks like below:

[Figure: Kafka architecture]

You can read more about Kafka in the official documentation.
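To make the abstraction concrete, here is a quick smoke test using the console producer and consumer that ship with Kafka. The hostnames are the ones from my test setup later in this post; with the broker default auto.create.topics.enable=true, the topic is created on first publish.

# publish a few messages typed on stdin
bin/kafka-console-producer.sh --broker-list habench102:9092 --topic test
# in another terminal, read them back from the beginning
bin/kafka-console-consumer.sh --zookeeper habench001:2181 --topic test --from-beginning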

Installation and Configuration

The installation is quite straightforward: you can have one broker per server (recommended) or multiple brokers per server. You also need ZooKeeper to coordinate the Kafka cluster; the good news is that ZooKeeper is bundled with the Kafka package along with handy startup and shutdown scripts. It is also recommended to run ZooKeeper on machines outside the Kafka cluster.

ZooKeeper Installation

# download and unpack the software

mkdir /software
cd /software
curl -O http://www-eu.apache.org/dist/kafka/0.10.0.1/kafka_2.11-0.10.0.1.tgz
tar xvfz kafka_2.11-0.10.0.1.tgz

# change essential configuration in config/zookeeper.properties

dataDir - the directory where ZooKeeper stores its snapshots and transaction log
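For reference, a minimal config/zookeeper.properties sketch; the data directory path is an assumption on my part, and clientPort=2181 is the bundled default:

# illustrative values, adjust dataDir to your environment
dataDir=/zkdata
clientPort=2181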

# start zookeeper

# KAFKA_HOME is the unpacked directory, e.g. /software/kafka_2.11-0.10.0.1
cd $KAFKA_HOME
bin/zookeeper-server-start.sh -daemon config/zookeeper.properties
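A quick sanity check that ZooKeeper is up, assuming nc (netcat) is installed on the VM; ZooKeeper answers the ruok four-letter-word command with imok:

# run on the ZooKeeper VM; expect the reply "imok"
echo ruok | nc localhost 2181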

Kafka Installation

# download and unpack the software

mkdir /software
cd /software
curl -O http://www-eu.apache.org/dist/kafka/0.10.0.1/kafka_2.11-0.10.0.1.tgz
tar xvfz kafka_2.11-0.10.0.1.tgz

# change essential configuration in config/server.properties

zookeeper.connect - points to ZooKeeper; e.g. host:2181
broker.id - must be unique across the Kafka cluster; e.g. 101
log.dirs - location where the topics are stored; e.g. /kdata/logs
log.retention.hours - topic retention time in hours
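Putting it together, a minimal config/server.properties sketch for one broker; the ZooKeeper hostname and the retention value are illustrative assumptions (168 hours is the shipped default):

# illustrative values for a single broker
broker.id=101
zookeeper.connect=habench001:2181
log.dirs=/kdata/logs
log.retention.hours=168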

# start kafka

cd $KAFKA_HOME
bin/kafka-server-start.sh -daemon config/server.properties
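Once all brokers are started, you can confirm they registered themselves in ZooKeeper; the ids listed should match the broker.id values you configured:

# list the broker ids currently registered in ZooKeeper
bin/zookeeper-shell.sh habench001:2181 ls /brokers/ids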

As you have seen, it's fairly easy to set up a Kafka cluster.

Benchmarking Kafka on OpenStack VMs

Now that all the infra work is out of the way, let’s try to run some tests to demonstrate (or more likely to understand) how Kafka works and behaves!

Test setup

The Kafka cluster runs on three VMs; each VM has 4 VCPUs, 7.3 GB RAM, a 100 GB volume with an ext4 filesystem, and 10 Gb Ethernet.

ZooKeeper runs on a separate VM with 2 VCPUs, 3.7 GB RAM, a 100 GB volume with an ext4 filesystem, and 10 Gb Ethernet.

I ran the following dd commands to determine the disk throughput; oflag=dsync makes dd use synchronized writes, so the figures reflect what actually reaches the disk rather than the page cache:

[root@habench103 kafka_2.11-0.10.0.1]# dd if=/dev/zero of=/kdata/test.dd bs=100M count=100 oflag=dsync
100+0 records in
100+0 records out
10485760000 bytes (10 GB) copied, 144.381 s, 72.6 MB/s

[root@habench103 kafka_2.11-0.10.0.1]# dd if=/dev/zero of=/kdata/test.dd bs=1G count=10 oflag=dsync
10+0 records in
10+0 records out
10737418240 bytes (11 GB) copied, 138.556 s, 77.5 MB/s

I also tested the network with iperf, which confirms that the VMs have 10 Gb Ethernet. The following tests were run during normal working hours and repeated several times to weed out outliers.
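For the record, the network check amounts to something like the following sketch; the exact invocation and hostname are assumptions on my part:

# on one VM, run the iperf server
iperf -s
# on another VM, measure bandwidth towards it
iperf -c habench101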

Single producer publishing 500 byte messages with no replication

The objective of this test is to understand the scalability of the Kafka cluster. I start with a one-node Kafka cluster and then add nodes one by one.

# create kafka topic
bin/kafka-topics.sh --zookeeper habench001:2181 --create --topic test --partitions 48 --replication-factor 1
# run the producer to publish events to Kafka topic
bin/kafka-run-class.sh org.apache.kafka.tools.ProducerPerformance --topic test --num-records 50000000 --record-size 500 --throughput -1 --producer-props acks=1 bootstrap.servers=habench102:9092 buffer.memory=104857600 batch.size=9000

[Chart: single producer, 500 byte messages, throughput by cluster size]

It is evident that Kafka throughput scales as nodes are added.

Single producer publishing 1000 byte messages with no replication

This test is a repetition of the above with a different message size, to understand the impact of the message size. Again I start with a one-node Kafka cluster and then add nodes one by one.

# create kafka topic
bin/kafka-topics.sh --zookeeper habench001:2181 --create --topic test --partitions 48 --replication-factor 1
# run the producer to publish events to Kafka topic
bin/kafka-run-class.sh org.apache.kafka.tools.ProducerPerformance --topic test --num-records 50000000 --record-size 1000 --throughput -1 --producer-props acks=1 bootstrap.servers=habench102:9092 buffer.memory=104857600 batch.size=9000

[Chart: single producer, 1000 byte messages, throughput by cluster size]

Again the throughput scales as nodes are added, and larger messages achieve higher throughput (in MB/sec) than smaller ones, which is understandable: for the same volume of data, smaller messages mean more messages, and each message carries per-message processing overhead.

Single producer publishing 500 byte messages with (3x) and without replication

The objective of this test is to understand the cost of replication.

# create kafka topic (with replication)
bin/kafka-topics.sh --zookeeper habench001:2181 --create --topic testr3 --partitions 48 --replication-factor 3
# publish messages to kafka topic with required settings
bin/kafka-run-class.sh org.apache.kafka.tools.ProducerPerformance --topic testr3 --num-records 30000000 --record-size 500 --throughput -1 --producer-props acks=1 bootstrap.servers=habench101:9092 buffer.memory=104857600 batch.size=6000

[Chart: single producer, 500 byte messages, 3x replication vs. no replication]

There is a cost involved in replicating the topic partitions, roughly 20-25%. Although that is not bad, it's worth noting that replication is not free!
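It helps to see where that cost goes: describing the topic shows the leader, replica list, and in-sync replicas (ISR) for each partition, and makes clear that every record is written to one leader and then fetched over the network by two followers.

# show leader, replicas and ISR for each of the 48 partitions
bin/kafka-topics.sh --zookeeper habench001:2181 --describe --topic testr3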

Three producers, 3x async replication with different message sizes

The objective of this test is to understand the effect of the message size on producer throughput.

# publish 200 byte messages to kafka topic
bin/kafka-run-class.sh org.apache.kafka.tools.ProducerPerformance --topic testr3 --num-records 30000000 --record-size 200 --throughput -1 --producer-props acks=1 bootstrap.servers=habench101:9092 buffer.memory=104857600 batch.size=6000
# publish 500 byte messages to kafka topic
bin/kafka-run-class.sh org.apache.kafka.tools.ProducerPerformance --topic testr3 --num-records 15000000 --record-size 500 --throughput -1 --producer-props acks=1 bootstrap.servers=habench101:9092 buffer.memory=104857600 batch.size=6000
# publish 1000 byte messages to kafka topic
bin/kafka-run-class.sh org.apache.kafka.tools.ProducerPerformance --topic testr3 --num-records 10000000 --record-size 1000 --throughput -1 --producer-props acks=1 bootstrap.servers=habench101:9092 buffer.memory=104857600 batch.size=6000

[Chart: three producers, 3x async replication, throughput by message size (200, 500, 1000 bytes)]

As can be seen from the above chart, the throughput dropped significantly at a message size of 1000 bytes. Although this is not what I expected, I got the same results even after repeating the tests a few times.

Throughput with single and multiple producers

The objective of this test is to see the benefit of multiple producers even when a single producer is not limited by network bandwidth.

500 byte message size and 3x async replication with single and multiple producers

[Chart: 500 byte messages, 3x async replication, single vs. three producers]

In our case, since we have 10 Gb Ethernet, a single producer is not bottlenecked by network throughput; however, it's worth noting that aggregate throughput can still be increased by adding more producers.
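For completeness, the multi-producer runs are simply the same benchmark started concurrently from several hosts; a rough sketch, where the client hostnames are assumptions:

# start the same benchmark on three client hosts in parallel
for host in client01 client02 client03; do
  ssh $host "cd /software/kafka_2.11-0.10.0.1 && \
    bin/kafka-run-class.sh org.apache.kafka.tools.ProducerPerformance \
      --topic testr3 --num-records 10000000 --record-size 500 --throughput -1 \
      --producer-props acks=1 bootstrap.servers=habench101:9092 \
      buffer.memory=104857600 batch.size=6000" &
done
wait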

Producers and Consumers in harmony

Finally, in real production deployments it is quite natural to have producers and consumers writing and reading data at the same time. The results of running a combination of producers and consumers are below:

                               One Producer and    Three Producers and
                               One Consumer        Three Consumers
Producer throughput (MB/sec)   124.47              155
Producer messages/sec          261038              330009
Consumer throughput (MB/sec)   83.7167             364
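The consumer side of such a run can be driven with the consumer perf tool bundled with Kafka; a representative invocation, where the message count is illustrative:

# consume from the topic and report throughput (old consumer, hence --zookeeper)
bin/kafka-consumer-perf-test.sh --zookeeper habench001:2181 --topic testr3 --messages 15000000 --threads 1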

Learnings & Conclusion

The number of topic partitions significantly impacts both producer and consumer throughput.

The message size and batch size should also be looked at; although not much can be done about the message size, the batch size should be tuned in line with the latency requirements to achieve maximum throughput, since larger batches improve throughput at the cost of added latency.
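As an illustration of that trade-off, the batch size (together with the related linger.ms producer setting) can be raised to favour throughput over latency; the values here are assumptions, not tuned recommendations:

# illustrative: a larger batch plus a small linger trades a little latency for throughput
bin/kafka-run-class.sh org.apache.kafka.tools.ProducerPerformance --topic test \
  --num-records 10000000 --record-size 500 --throughput -1 \
  --producer-props acks=1 bootstrap.servers=habench102:9092 batch.size=65536 linger.ms=10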

The producers need on the order of a couple of GB of memory, and the CPUs on the Kafka cluster stayed fairly quiet.

num.io.threads on the Kafka broker can be tuned to further optimize throughput; the right value again depends on the storage you have.
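For example, in config/server.properties (the values shown are the 0.10 defaults; raise num.io.threads if log.dirs spans several disks):

# broker thread pools, defaults from the shipped server.properties
num.io.threads=8
num.network.threads=3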

The hardware requirements for a Kafka cluster are not very demanding, and it seems to scale very well.

 

 
