
Using Apache Kafka to implement event-driven microservices

When talking about microservices architecture, most people think of a network of stateless services which communicate through HTTP (one may call it RESTful or not, depending on how much of a nitpicker one is).

But there is another way, which may be more suitable depending on the use case at hand.

I am talking about event-driven microservices, where, in addition to the classic request-response pattern, services publish messages that represent events (facts) and subscribe to topics (or queues, depending on the terminology) to receive events/messages.

Fully understanding and embracing this new software design paradigm is not straightforward, but it is well worth it (or at least worth looking into).

Several interconnected concepts need to be explored in order to appreciate the advantages of event-driven design and the evolutionary path that led to it, for example:
  • Log (including log-structured storage engine and write-ahead log)
  • Materialized View
  • Event Sourcing
  • Command Query Responsibility Segregation (CQRS)
  • Stream processing
  • "Inside-out" databases (a.k.a. "un-bundled" databases)
I'd like to point you to the following books to get familiar with those topics:
I read those three books and then started building a simple PoC, since learning new design ideas is great, but the learning is not complete until you also put them into practice.
Also, I was not happy with the examples of event-driven applications/services available online; I found them too simplistic and not properly explained, so I decided to create my own.

The proof of concept

The source code is split into two GitHub repositories (as per the Clean Architecture):
The proof of concept service keeps track of the balance available in bank accounts (like a ledger).
It listens for Transfer messages on a Kafka topic and, when one is received, updates the balance of the related account by publishing a new AccountBalance message on another Kafka topic.

Please note that each entity type is represented by two different classes: 

  • one is generated by Apache Avro and is used for serialization and deserialization (so entities can be sent to and received from Kafka) → see the avro directory.
  • the other is a POJO which may contain some convenience constructors and does not depend on Avro → see the net.devaction.entity package.
The net.devaction.kafka.avro.util package holds converters to move back and forth from one data representation to the other.
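As an illustration of this dual representation, a converter could look roughly like the following sketch (the classes and field names here are hypothetical stand-ins, not the actual PoC classes; in the real code the "wire" class is generated by Avro):

public class TransferConverter {

    // Stand-in for the Avro-generated class used only for Kafka (de)serialization.
    public static class TransferAvro {
        public String accountId;
        public double amount;
    }

    // Plain POJO used by the business logic, with no Avro dependency.
    public static class TransferEntity {
        private final String accountId;
        private final double amount;

        public TransferEntity(String accountId, double amount) {
            this.accountId = accountId;
            this.amount = amount;
        }

        public String getAccountId() { return accountId; }
        public double getAmount() { return amount; }
    }

    public static TransferEntity toEntity(TransferAvro avro) {
        return new TransferEntity(avro.accountId, avro.amount);
    }

    public static TransferAvro toAvro(TransferEntity entity) {
        TransferAvro avro = new TransferAvro();
        avro.accountId = entity.getAccountId();
        avro.amount = entity.getAmount();
        return avro;
    }
}

Keeping the Avro classes at the edges this way means the rest of the service does not depend on the serialization technology.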
At first, Apache Kafka may seem overwhelming: even though it resembles a classic message broker such as ActiveMQ or RabbitMQ, it is much more than that and it works very differently internally.
Also, there are several Kafka client APIs, which adds to the confusion for the learner.
We are going to focus on the following three:
  • the Producer API
  • the Consumer API
  • the Streams API
The Producer and Consumer APIs are lower level, and the Streams API is built on top of them.
Both sets of APIs have advantages and disadvantages.
The Producer/Consumer APIs give the application developer finer control at the cost of higher complexity.
On the other hand, the Streams API is not as flexible, but it allows some standard operations to be implemented more easily and it requires much less code.
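To give a feel for the lower-level APIs, here is a minimal Producer example; the String/Double key and value types and the hard-coded values are simplifications of this sketch (the actual PoC publishes Avro-generated classes):

import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class ProducerApiSketch {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.DoubleSerializer");

        // With the Producer API, serialization, partitioning (via the record key),
        // batching and error handling are all explicit concerns of the application.
        try (KafkaProducer<String, Double> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("account-balances", "account-42", 100.0));
            producer.flush();
        }
    }
}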


The "transfers recording" example/PoC service can be started in one of the following two modes:

The two modes provide exactly the same functionality, which is quite convenient for comparison purposes.

The (explicit) polling mode

It has four main components (a simplified sketch of the whole flow follows the list): 
  • A consumer which listens on the "transfers" topic → see TransferConsumer.java
  • A ReadOnlyKeyValueStore (which is part of the Streams API) to materialize the "account-balances" topic data into a queryable view, so we can use the accountId value to retrieve the latest/current balance of a specific account → see AccountBalanceRetrieverImpl.java.
    Please note that the accountId value is extracted from the "transfer" data message received by the consumer.
  • The business logic which creates a new/updated AccountBalanceEntity object from the received TransferEntity object and the current AccountBalanceEntity present in Kafka → see NewAccountBalanceProvider.java
  • A producer which publishes the updated balance by sending a message to the "account-balances" topic, which in turn causes the local data store to be updated accordingly.
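To make the flow concrete, here is a simplified sketch of the polling loop; it uses String account ids and Double amounts instead of the PoC's Avro entities, and a hypothetical currentBalanceFor() method standing in for the ReadOnlyKeyValueStore lookup (AccountBalanceRetrieverImpl in the PoC):

import java.time.Duration;
import java.util.List;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class PollingModeSketch {

    public static void main(String[] args) {
        Properties consumerProps = new Properties();
        consumerProps.put("bootstrap.servers", "localhost:9092");
        consumerProps.put("group.id", "transfers-recording");
        consumerProps.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        consumerProps.put("value.deserializer", "org.apache.kafka.common.serialization.DoubleDeserializer");

        Properties producerProps = new Properties();
        producerProps.put("bootstrap.servers", "localhost:9092");
        producerProps.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        producerProps.put("value.serializer", "org.apache.kafka.common.serialization.DoubleSerializer");

        try (KafkaConsumer<String, Double> consumer = new KafkaConsumer<>(consumerProps);
             KafkaProducer<String, Double> producer = new KafkaProducer<>(producerProps)) {

            // 1. Listen on the "transfers" topic (keyed by account id).
            consumer.subscribe(List.of("transfers"));

            while (true) {
                ConsumerRecords<String, Double> transfers = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, Double> transfer : transfers) {
                    String accountId = transfer.key();

                    // 2. Retrieve the current balance (in the PoC this comes from the
                    //    ReadOnlyKeyValueStore that materializes "account-balances").
                    double currentBalance = currentBalanceFor(accountId);

                    // 3. Business logic: compute the new balance.
                    double newBalance = currentBalance + transfer.value();

                    // 4. Publish the updated balance to the "account-balances" topic.
                    producer.send(new ProducerRecord<>("account-balances", accountId, newBalance));
                }
            }
        }
    }

    // Hypothetical placeholder for the materialized-view lookup.
    private static double currentBalanceFor(String accountId) {
        return 0.0; // assume no previous balance in this sketch
    }
}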

The "join streams" mode

As we said before, this second operating mode uses only the Streams API/DSL, which lets us code at a higher level of abstraction.
We can see that the code is much more compact than in the previous mode.
We do not need to explicitly map the KStream key to the KTable key; that is exactly what the join does (see the join in the snippet below). Hence, we need to choose the Kafka keys accordingly.
In this case, both message keys represent the account id.
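A minimal sketch of such a topology could look like this; it uses String account ids and Double amounts instead of the Avro entities of the PoC, and a left join so that an account without a previous balance starts from zero (both are assumptions of this sketch, not necessarily what the actual code does):

import java.util.Properties;

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Produced;

public class JoinStreamsModeSketch {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "transfers-recording-streams");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        StreamsBuilder builder = new StreamsBuilder();

        // Stream of transfers, keyed by account id; the value is the transfer amount.
        KStream<String, Double> transfers =
                builder.stream("transfers", Consumed.with(Serdes.String(), Serdes.Double()));

        // Table holding the latest known balance per account id.
        KTable<String, Double> balances =
                builder.table("account-balances", Consumed.with(Serdes.String(), Serdes.Double()));

        // Because both the stream and the table are keyed by the account id,
        // the join pairs each incoming transfer with the current balance of that account.
        transfers.leftJoin(balances,
                        (amount, currentBalance) ->
                                (currentBalance == null ? 0.0 : currentBalance) + amount)
                 .to("account-balances", Produced.with(Serdes.String(), Serdes.Double()));

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}

Note how the join, the business logic and the write back to the "account-balances" topic fit in a handful of lines, compared with the explicit consumer/producer plumbing of the polling mode.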





Diagram

Running the code

To build and run the PoC application, in addition to Maven and Java, we also need a Kafka broker.
I decided to install the Confluent Platform, which includes a Kafka broker (or a cluster, depending on the chosen configuration) with some example topics and pre-configured integration with Elasticsearch and Kibana. More importantly, it also includes an admin Web UI called "Control Center", which comes in very handy.

I hit a few bumps when running the Confluent Platform for the first time on my Fedora 30 computer.
Namely, I had to manually install a couple of software packages ("jot" and "jq").
I also had to install the Confluent CLI separately.
Finally, I had to make several changes to some properties files and bash scripts to be able to run the Confluent Platform as a non-root user; here are the changes, please adjust the configuration values as per your environment.

Watch the following YouTube video to get all the details including starting the Confluent Platform and running the example Streams application:

 
