Apache Kafka vs RabbitMQ | Performance, Requirements, and Design
RabbitMQ vs Kafka – Performance
People often decide on emotion, but experts who make complex decisions with a long-term impact need to base them on facts.
There are so many messaging technologies available today. How do you choose the best one, and should it be RabbitMQ or Apache Kafka?
Origins – RabbitMQ vs Kafka
RabbitMQ was one of the first open source message brokers to offer a solid set of features. It is a traditional message broker, originally developed to implement AMQP, a protocol with strong routing features. Java has its own messaging standard (JMS), but it lacks the cross-language flexibility of AMQP.
Apache Kafka, on the other hand, is written in Scala and Java and started at LinkedIn as a way of connecting different internal systems. At the time, LinkedIn needed to adapt its capabilities and move away from monolithic approaches. Kafka is now a project of the Apache Software Foundation, and it is well suited to event-driven architecture.
Architecture and Design – Kafka vs RabbitMQ
RabbitMQ is designed as a general purpose message broker. It employs variations of point-to-point, pub-sub and request/reply communication patterns. It is a mature system and performs well if configured correctly. Client libraries exist for many languages, and many plugins are available.
In RabbitMQ, communication can be either synchronous or asynchronous. Publishers send messages to exchanges, and consumers retrieve messages from queues. Because exchanges decouple producers from queues, producers are not burdened with hardcoded routing decisions.
RabbitMQ offers a number of distributed deployment scenarios. It can run as a multi-node cluster or as a federation of clusters, and it does not depend on external services.
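The decoupling described above can be sketched with a toy in-memory model. This is not the RabbitMQ client API; the `Broker` class and the exchange, queue and key names are hypothetical, and only direct and fanout exchanges are modelled. The point it illustrates is that the publisher names an exchange and a routing key, never a queue.

```python
from collections import defaultdict, deque


class Broker:
    """Toy model of exchanges, bindings and queues (not a RabbitMQ client)."""

    def __init__(self):
        self.queues = defaultdict(deque)    # queue name -> pending messages
        self.exchanges = {}                 # exchange name -> exchange type
        self.bindings = defaultdict(list)   # exchange name -> [(queue, binding key)]

    def declare_exchange(self, name, kind):
        if kind not in ("direct", "fanout"):
            raise ValueError(f"unsupported exchange type: {kind}")
        self.exchanges[name] = kind

    def bind(self, exchange, queue, binding_key=""):
        self.bindings[exchange].append((queue, binding_key))

    def publish(self, exchange, routing_key, body):
        # The publisher never names a queue; the bindings decide where copies go.
        kind = self.exchanges[exchange]
        for queue, binding_key in self.bindings[exchange]:
            if kind == "fanout" or binding_key == routing_key:
                self.queues[queue].append(body)

    def get(self, queue):
        # Consumers read from queues, not from exchanges.
        pending = self.queues[queue]
        return pending.popleft() if pending else None


broker = Broker()
broker.declare_exchange("orders", "direct")
broker.bind("orders", "billing", "order.paid")
broker.bind("orders", "shipping", "order.paid")
broker.bind("orders", "audit", "order.cancelled")
broker.publish("orders", "order.paid", "order 17 paid")
```

Adding a new consumer here means adding a binding; the publishing code does not change.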
Kafka, on the other hand, is designed to be durable, fast and scalable, allowing for a high volume of messages. Kafka is a message store, similar to a log, that runs as a cluster of servers and stores records in categories called topics.
In comparison to RabbitMQ, Kafka uses a dumb broker and relies on a smart consumer to read its buffer. Each message includes a key, a value and a timestamp. Kafka doesn't try to track which messages each consumer has read; it simply retains all messages for a set amount of time, and consumers are responsible for tracking their own position in the log. This means that Kafka can support many consumers and a large amount of data with very little overhead. To do this, however, Kafka needs external services, such as Apache ZooKeeper, in order to run effectively.
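The dumb broker and smart consumer split can be sketched in a few lines. This is a toy model under stated assumptions, not Kafka's implementation or client API: `TopicLog` and `Consumer` are hypothetical names, there is a single unpartitioned log, and retention is purely time based.

```python
import time


class TopicLog:
    """Toy append-only log: records are kept for a retention window, read or not."""

    def __init__(self, retention_seconds):
        self.retention_seconds = retention_seconds
        self.base_offset = 0    # offset of the oldest record still retained
        self.records = []       # (key, value, timestamp) tuples

    def append(self, key, value, timestamp=None):
        ts = time.time() if timestamp is None else timestamp
        self.records.append((key, value, ts))
        return self.base_offset + len(self.records) - 1   # offset of the new record

    def read(self, offset, max_records=10):
        # The log keeps no per-consumer state; the caller says where to read from.
        start = max(offset, self.base_offset) - self.base_offset
        return self.records[start:start + max_records]

    def expire(self, now):
        # Drop records older than the retention window, regardless of readers.
        while self.records and now - self.records[0][2] > self.retention_seconds:
            self.records.pop(0)
            self.base_offset += 1


class Consumer:
    """Each consumer tracks its own position in the log."""

    def __init__(self, log):
        self.log = log
        self.offset = 0

    def poll(self, max_records=10):
        # Skip ahead if the position points at records that have expired.
        self.offset = max(self.offset, self.log.base_offset)
        batch = self.log.read(self.offset, max_records)
        self.offset += len(batch)
        return [value for _key, value, _ts in batch]
```

Because the log stores nothing about its readers, adding another `Consumer` costs the broker side nothing beyond the reads themselves, and a consumer can rewind by resetting its own `offset`.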
Apache Kafka vs RabbitMQ – Requirements
Once a shared database becomes unfeasible, developers begin to explore messaging. The broker is the most popular part of Apache Kafka, and it has been designed for stream processing. Kafka Streams has recently been added to Apache Kafka. Good documentation is available, but one area that can cause confusion is messaging. Which messaging scenarios are the best fit for Kafka?
- Streaming from A to B, no complicated routing, maximal throughput, delivered at least once in partitioned order.
- If your application requires access to stream history.
- Stream processing storage.
- Event sourcing.
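The "partitioned order" in the first item can be illustrated with a small sketch: records with the same key always land in the same partition, so order is preserved per key but not across the whole topic. The helper names are hypothetical, and `zlib.crc32` is used only as a deterministic stand-in for whatever hash a real Kafka client's partitioner applies.

```python
import zlib


def partition_for(key: str, num_partitions: int) -> int:
    # Stand-in hash (an assumption): any deterministic hash shows the same idea.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def produce(partitions, key, value):
    """Append a record to the partition chosen by its key; return the partition index."""
    index = partition_for(key, len(partitions))
    partitions[index].append((key, value))
    return index


partitions = [[] for _ in range(3)]
events = [("user-1", "a"), ("user-2", "x"), ("user-1", "b"),
          ("user-2", "y"), ("user-1", "c")]
for key, value in events:
    produce(partitions, key, value)

# All records for one key sit in one partition, in the order they were produced.
user1 = [v for k, v in partitions[partition_for("user-1", 3)] if k == "user-1"]
```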
RabbitMQ is more of a general purpose messaging system. It is more often used where quick responses are needed than for resource-heavy processing. RabbitMQ is also useful for distributing messages to multiple recipients. If your requirements go beyond throughput, RabbitMQ has a lot to offer. The following scenarios are best for use with RabbitMQ:
- Application needs to work with existing protocols such as AMQP.
- If you need fine-grained control of consistency per message, but note that Kafka has improved in this area recently.
- If your application requires a variety of messaging patterns.
- Complex routing.
With additional software RabbitMQ can address some of the strong cases for Kafka, as listed above.
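As an illustration of the "complex routing" item in the RabbitMQ list, the sketch below implements the matching rule used by AMQP topic exchanges: routing keys are dot-separated words, `*` in a binding pattern matches exactly one word, and `#` matches zero or more words. The function name and the example keys are hypothetical, and this is a standalone sketch rather than broker code.

```python
def topic_matches(pattern: str, routing_key: str) -> bool:
    """Return True if a topic-exchange binding pattern matches a routing key."""

    def match(p_words, k_words):
        if not p_words:
            return not k_words              # pattern exhausted: key must be too
        head, rest = p_words[0], p_words[1:]
        if head == "#":
            # '#' may absorb zero or more words; try every possible split.
            return any(match(rest, k_words[i:]) for i in range(len(k_words) + 1))
        if not k_words:
            return False
        if head == "*" or head == k_words[0]:
            return match(rest, k_words[1:])
        return False

    return match(pattern.split("."), routing_key.split("."))


# A queue bound with "orders.*.paid" receives paid orders from any single region.
eu_paid = topic_matches("orders.*.paid", "orders.eu.paid")   # True
no_region = topic_matches("orders.*.paid", "orders.paid")    # False
```

With patterns like these, consumers choose which slice of the traffic they want by changing a binding, without any change on the publishing side.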
Apache Kafka vs RabbitMQ – Developers
RabbitMQ supports Java, Spring, .NET, Ruby and many others, with still more available via community plugins. The RabbitMQ client libraries are mature and well documented.
In comparison, Apache Kafka has made progress in this area. Although it ships only a Java client, there is a growing adapter SDK that allows you to build your own system integration. Many other software providers also ensure that both RabbitMQ and Apache Kafka work well with their technology.
Apache Kafka vs RabbitMQ – Security and Operations
Security and operations are a strength for RabbitMQ. The management plugin provides good management and monitoring. RabbitMQ supports authentication using x509 certificates as an alternative to username and password, and additional methods can be added via plugins.
These areas have been more of a problem for Apache Kafka. The most recent release, Kafka 0.9, added some security measures. This is an improvement on earlier versions, which did not work well for sharing and multi-tenancy.
Kafka is managed through a CLI made up of shell scripts, property files and JSON files. It emits metrics through Yammer/JMX but does not keep any history, which means you need another monitoring system. Kafka depends on Apache ZooKeeper. Many users view this requirement with scepticism, although it does bring clustering benefits. A 3-node Kafka cluster can still function after 2 failures, but to tolerate as many failures in ZooKeeper, which is quorum based, you need a 5-node ZooKeeper ensemble, bringing the total to ~8 servers.
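The "~8 servers" figure is consistent with ZooKeeper being a majority-quorum system: an ensemble stays available only while more than half of its nodes are up, so tolerating f failures takes 2f + 1 nodes. The short calculation below assumes a 3-broker Kafka cluster and a ZooKeeper ensemble sized to survive the same 2 failures; the function names are hypothetical.

```python
def ensemble_size_for(failures: int) -> int:
    """Smallest majority-quorum ensemble that survives the given number of failures."""
    return 2 * failures + 1


def failures_tolerated(ensemble_size: int) -> int:
    """Number of node failures a majority-quorum ensemble of this size can survive."""
    return (ensemble_size - 1) // 2


kafka_brokers = 3
zookeeper_nodes = ensemble_size_for(2)            # 5 nodes to survive 2 failures
total_servers = kafka_brokers + zookeeper_nodes   # 8 servers in total
```

A 3-node ZooKeeper ensemble, by the same arithmetic, survives only a single failure.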
Kafka vs RabbitMQ – Performance
Kafka is the leader here: throughput of 100k messages/sec is often the reason people choose it.
Messages per second is a difficult figure to pin down, because it depends on the environment, the hardware, the nature of the workload and which delivery guarantees are used.
With RabbitMQ, 20K messages per second is easy to reach when little is demanded in the way of guarantees. A single queue can never do more work than the CPU cycles available to it allow. In general, RabbitMQ users see excellent performance with clusters of three to seven nodes.
In the end, understanding the business use case is the most important factor in making the right choice for your own situation.