Message Broker Comparison – RabbitMQ, Kafka, ActiveMQ, Kestrel
Message brokers are an important component of web technology today, but how well do they handle clustering, data protection and a backlog of messages?
Message Broker Comparison
When reviewing various message brokers, some of the important issues are:
- Their behaviour when there is a backlog of messages,
- Their ability to create a cluster, and
- Their ability to protect data without blocking publishers if there is a failure of a node in a cluster.
RabbitMQ is developed and maintained by Pivotal. This review centres on version 3.2.2, running on a CentOS 5.6 server. RabbitMQ’s web site contains lots of useful information, and plenty of literature is available. RabbitMQ is written in Erlang, which is not a common programming language but works well for this purpose. It is a well-known and popular message broker with many strong features.
One reviewer reported that the installation was straightforward. First, Erlang version R14B needs to be installed from EPEL, followed by the RabbitMQ RPM. The reviewer experienced a small issue, but it was easily resolved, and the management plugin was then installed.
There are many adjustable parameters in the rabbitmq.config file; in this case, the reviewer used the defaults. For the client Application Programming Interface (API), RabbitMQ supports many languages and standard protocols, such as STOMP (which is available via a plugin). You can use the client API or the web interface to create topics and queues. If there are additional nodes, you can cluster them, and queues and topics can be used for other services.
The reviewer started by creating 4 queues, then wrote a Ruby client and inserted messages. The publishing rate was about 20k/s. Multiple threads were used, but there were some stalls caused by vm_memory_high_watermark, the memory threshold above which RabbitMQ blocks publishers. Although there was enough space on the disc, the memory usage kept increasing. During the load the CPU usage was quite high, between 40% and 50% on an 8-core VM. The reviewer was able to set up a replicated queue on two nodes and insert objects, but his requirements were still not met. During the review, a mistake was made and a resync was necessary, and the resync was very slow. Although RabbitMQ has lots of features and its performance was reasonable, it didn’t suit the reviewer’s requirements.
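The stalls are easier to reason about with a little arithmetic. vm_memory_high_watermark is expressed as a fraction of the machine's RAM (0.4 by default in the 3.x series), and publishers are blocked once the broker's memory use passes it. The sketch below estimates how quickly a growing backlog could reach that threshold. The RAM size, the per-message footprint and the idea that every queued message stays in memory are illustrative assumptions, not figures from the review.

```python
def watermark_bytes(total_ram_bytes: int, fraction: float = 0.4) -> int:
    """Memory level at which publishers would start to be blocked."""
    return int(total_ram_bytes * fraction)


def seconds_until_blocked(
    total_ram_bytes: int,
    publish_rate: float,        # messages per second going in
    consume_rate: float,        # messages per second going out
    bytes_per_message: int,     # assumed in-memory footprint of one message
    fraction: float = 0.4,
) -> float:
    """Rough time before a growing backlog reaches the watermark.

    Simplification: assumes the broker starts empty and keeps every
    queued message in RAM, which a real broker does not strictly do.
    """
    growth = (publish_rate - consume_rate) * bytes_per_message
    if growth <= 0:
        return float("inf")  # consumers keep up, the backlog never grows
    return watermark_bytes(total_ram_bytes, fraction) / growth
```

With an assumed 8 GiB of RAM, 1 KiB per message and no consumers, `seconds_until_blocked(8 * 2**30, 20_000, 0, 1024)` comes out at roughly 168 seconds, which suggests why stalls can appear within minutes at 20k messages/s.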
Kafka is written in Java and, although it was originally designed by LinkedIn, it is now part of Apache. One reviewer looked at it and was very impressed, noting the usefulness of the architecture. Messages are stored in flat files and consumers ask for messages based on an offset. This is similar to MySQL replication, where the server (as the producer) saves events and the replicas ask for them based on a position. Because the server is fairly simple, it is really fast. Old messages can be retained on a time basis.
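To make the offset idea concrete, here is a minimal in-memory sketch of an append-only log with offset-based reads and time-based retention. It is only an illustration of the principle described above, not Kafka's implementation; the class and method names are invented for the example.

```python
import time
from typing import List, Optional, Tuple


class OffsetLog:
    """Toy append-only log: the server stores, consumers track offsets."""

    def __init__(self, retention_seconds: float):
        self.retention_seconds = retention_seconds
        self._records: List[Tuple[float, bytes]] = []  # (timestamp, payload)
        self._base_offset = 0  # offset of the first record still stored

    def append(self, payload: bytes, now: Optional[float] = None) -> int:
        """Store a message and return the offset assigned to it."""
        timestamp = time.time() if now is None else now
        self._records.append((timestamp, payload))
        return self._base_offset + len(self._records) - 1

    def fetch(self, offset: int, max_messages: int = 100) -> List[Tuple[int, bytes]]:
        """Return (offset, payload) pairs starting at the requested offset."""
        start = max(offset - self._base_offset, 0)
        chunk = self._records[start:start + max_messages]
        return [
            (self._base_offset + start + i, payload)
            for i, (_, payload) in enumerate(chunk)
        ]

    def expire(self, now: Optional[float] = None) -> int:
        """Drop messages older than the retention period; return the count."""
        cutoff = (time.time() if now is None else now) - self.retention_seconds
        dropped = 0
        while dropped < len(self._records) and self._records[dropped][0] < cutoff:
            dropped += 1
        del self._records[:dropped]
        self._base_offset += dropped
        return dropped
```

The server keeps no per-consumer state here: each consumer remembers the last offset it read and asks for the next one, which is a large part of why such a design can stay simple and fast.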
Kafka relies on Zookeeper, which provides cluster membership and routing; consumers can also use Zookeeper to synchronise. If you don’t know Zookeeper, it is similar to Corosync, and is a synchronous distributed storage system.
As far as features go, Kafka is a bit short on them. Although there is no built-in web frontend, some are available through the ecosystem. There is no routing, there are no rules, and stats are only available through JMX. On the other hand, publishing reached 165k messages/s through one thread. Consuming was basically disk bound on the server. The performance with 3M messages was great, and that was without the coordination of Zookeeper. CPU and memory usage were only modest.
To check Kafka’s ability to cluster, the reviewer created a replicated queue, added messages, stopped the replica, added more messages and restarted the replica. Kafka only took seconds to complete the resync.
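One plausible reason a log-based design resyncs quickly is that a restarted replica only has to fetch what was appended after its own last offset. The sketch below replays the reviewer's test on plain Python lists. It is a simplification that assumes the replica's log is an exact prefix of the leader's, and it is not Kafka's actual replication protocol.

```python
from typing import List


def resync(leader_log: List[bytes], replica_log: List[bytes]) -> int:
    """Copy to the replica only the messages it is missing.

    Assumes replica_log is a prefix of leader_log, so the replica's
    length is also the offset of the first message it lacks.
    Returns the number of messages copied.
    """
    missing = leader_log[len(replica_log):]
    replica_log.extend(missing)
    return len(missing)


# Replay of the test: replicate, stop the replica, publish more, restart.
leader = [b"m0", b"m1"]
replica = list(leader)            # replica is in sync, then stopped
leader += [b"m2", b"m3", b"m4"]   # messages published while it was down
copied = resync(leader, replica)  # restart: only 3 messages are transferred
```

The cost of the catch-up is proportional to the messages missed, not to the size of the whole queue.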
Kafka suited the requirements, had good performance and low usage of resources.
ActiveMQ is another popular message broker. It has some impressive features and, although it is written in Java like Kafka, it is more comparable to the standard set by RabbitMQ. The storage backend provides some HA: replication is supported with LevelDB, but the reviewer encountered some issues with it. The reviewer did not require full HA, so to ensure the publisher wasn’t blocked, he used a mesh of brokers rather than the replication provided by the storage backend.
With a mesh of brokers, you connect to any one of them to publish or consume a message. You don’t need to know which node holds the queue: the broker you are connected to knows, and it forwards your request. In addition, you can specify all the available brokers and the client library will select one for you. If the broker you are connected to fails, the library will reconnect you to another.
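The client-side behaviour described here (pick a broker from a list, reconnect to another on failure) can be sketched generically. This is not ActiveMQ's client library: the `connect` callable, the connection object's `send` method and the broker addresses are hypothetical stand-ins for whatever transport is really used.

```python
import random
from typing import Callable, Optional, Sequence


class FailoverClient:
    """Sends through one broker of a mesh and switches when it fails."""

    def __init__(self, broker_urls: Sequence[str], connect: Callable):
        if not broker_urls:
            raise ValueError("at least one broker is required")
        self._urls = list(broker_urls)
        self._connect = connect          # hypothetical: url -> connection
        self._connection = None
        self.current_url: Optional[str] = None

    def _reconnect(self) -> None:
        # Prefer brokers other than the one that just failed.
        candidates = [u for u in self._urls if u != self.current_url]
        candidates = candidates or list(self._urls)
        random.shuffle(candidates)
        for url in candidates:
            try:
                self._connection = self._connect(url)
                self.current_url = url
                return
            except ConnectionError:
                continue  # this broker is down too, try the next one
        self._connection = None
        self.current_url = None
        raise ConnectionError("no broker reachable")

    def send(self, message) -> None:
        if self._connection is None:
            self._reconnect()
        try:
            self._connection.send(message)
        except ConnectionError:
            # The current broker failed: move to another and retry once.
            self._reconnect()
            self._connection.send(message)
```

A publisher written this way only stalls when every broker in the list is unreachable, which matches the reviewer's goal of not blocking publishers.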
In this trial, the reviewer had an insert rate of 5,000 msg/s over 15 threads, and one consumer could read 2,000 msg/s. The reviewer let the program run and received 150M messages. Unfortunately, the reviewer then lost the web interface and the publishing rate slowed down considerably.
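These rates imply that a single consumer cannot keep up with the publishers, so a backlog is inevitable. The small calculation below shows how long it would take to build up 150M messages. Whether a consumer was running during the reviewer's run is not stated, so both cases are treated as assumptions.

```python
def backlog_growth_per_second(publish_rate: float, consume_rate: float,
                              consumers: int = 0) -> float:
    """Net messages added to the backlog each second (never negative)."""
    return max(publish_rate - consume_rate * consumers, 0.0)


def hours_to_backlog(target_messages: float, publish_rate: float,
                     consume_rate: float, consumers: int = 0) -> float:
    """Hours needed to accumulate target_messages at steady rates."""
    growth = backlog_growth_per_second(publish_rate, consume_rate, consumers)
    if growth == 0:
        return float("inf")  # consumers keep up, no backlog builds
    return target_messages / growth / 3600


# Rates from the trial: 5,000 msg/s published, 2,000 msg/s per consumer.
no_consumer = hours_to_backlog(150_000_000, 5000, 2000, consumers=0)
one_consumer = hours_to_backlog(150_000_000, 5000, 2000, consumers=1)
```

Under these assumptions the backlog takes a little over 8 hours to build with no consumer and nearly 14 hours with one, and it would take three consumers at 2,000 msg/s each to stop it growing at all.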
Although ActiveMQ had lots of features, and performed well, it didn’t really meet the reviewer’s requirements.
Kestrel is more like Kafka than the other two brokers reviewed. It is written in Scala and speaks the memcached protocol: the key is the queue name and the object is the message. Kestrel is particularly simple, and the queues are defined in a configuration file, but it allows you to specify storage limits per queue, expiration, and the behaviour when the limits are reached. One of the reviewer’s requirements was to never block publishers, and the setting “discardOldWhenFull = true” met this requirement.
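The effect of a discard-old-when-full policy can be illustrated with a bounded queue that drops its oldest message instead of making the publisher wait. This is a sketch of the behaviour only, not Kestrel's code, and the `max_items` limit is a hypothetical parameter counted in messages for simplicity.

```python
from collections import deque


class DiscardOldQueue:
    """Bounded queue that never blocks or rejects a publisher."""

    def __init__(self, max_items: int):
        # A deque with maxlen drops from the left when appending on the right.
        self._items = deque(maxlen=max_items)
        self.discarded = 0  # how many old messages were dropped

    def put(self, message) -> None:
        if len(self._items) == self._items.maxlen:
            self.discarded += 1  # the oldest message is about to be dropped
        self._items.append(message)

    def get(self):
        """Return the oldest remaining message, or None if the queue is empty."""
        return self._items.popleft() if self._items else None
```

The trade-off is explicit: publishing always succeeds immediately, at the price of losing the oldest unread messages once the limit is reached.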
Kestrel is somewhat limited in its ability to cluster. However, Kestrel servers can publish their availability to Zookeeper, so that publishers and consumers are informed when a server is missing and can adjust accordingly. This becomes more complicated if there are many Kestrel servers and they all have the same queue defined. Consumers need to query the brokers in order to get a message returned, so keeping messages in order is more difficult.
The reviewer wrote a few bash scripts that used nc to publish the messages, and reached 10k messages/s, which is particularly impressive. The rate levelled off over time, and it is probably limited by the need to reconnect for each message. If consumers are present the publishing rate decreases, but not by much. The main difficulty was that the server froze when a particularly large number of messages expired. The reviewer notes that this could have been because he neglected to set ‘maxExpireSweep’ to a suitable value.
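If reconnecting for each message really is the limit, a client that keeps one connection open should do better. The sketch below frames messages with the standard memcached text `set` command and sends them over a single socket. The host, port and queue name are placeholders, and the untested assumption is that the server answers each `set` with `STORED`, as a memcached server does.

```python
import socket
from typing import Iterable


def memcached_set(queue: str, message: bytes,
                  flags: int = 0, exptime: int = 0) -> bytes:
    """Frame one message as a memcached text-protocol 'set' command.

    The key is the queue name and the value is the message body.
    """
    header = f"set {queue} {flags} {exptime} {len(message)}\r\n"
    return header.encode("ascii") + message + b"\r\n"


def publish(host: str, port: int, queue: str,
            messages: Iterable[bytes]) -> int:
    """Publish all messages over one persistent connection.

    Returns the number of messages acknowledged by the server.
    """
    stored = 0
    with socket.create_connection((host, port)) as sock:
        reader = sock.makefile("rb")
        for message in messages:
            sock.sendall(memcached_set(queue, message))
            reply = reader.readline()
            if reply != b"STORED\r\n":
                raise RuntimeError(f"unexpected reply: {reply!r}")
            stored += 1
    return stored
```

Compared with one nc invocation per message, this pays the connection setup cost once for the whole batch.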
Overall, Kestrel is simple but performs well.
Overall, the reviewer found Kafka to be the best fit for the particular requirements of the review. It keeps the service available without blocking publishers. Messages are easily replicated, which makes for higher availability of data. Kafka’s performance is good and its use of resources is modest.
I hope this message broker comparison will help you choose the best message broker for your case.