Skip to main content

Using Apache Kafka to implement event-driven microservices

When talking about microservices architecture, most people think of a network of stateless services which communicate through HTTP (one may call it RESTful or not, depending on how much of a nitpicker one is).

But there is another way, which may be more suitable depending on the use case at hand.

I am talking about event-driven microservices, where in addition to the classic request-response pattern, services publish messages which represent events (facts) and subscribe to topics (or queues depending on the terminology used) to receive events/messages.

To fully understand and embrace this new software design paradigm is not straight-forward but it is totally worth it (at least looking into it).

There are several interconnected concepts which need to be explored in order to discover the advantages of event-driven design and the evolutionary path which led to it, for example:
  • Log (including log-structured storage engine and write-ahead log)
  • Materialized View
  • Event Sourcing
  • Command Query Responsibility Segregation (CQRS)
  • Stream processing
  • "Inside-out" databases (a.k.a. "un-bundled" databases)
I'd like to point you to the following books to get familiar with those topics:
I read those three books and then I started building a simple PoC since learning new design ideas is great but it is not completed until you also put them in practice.
Also, I was not happy with the examples of event-driven applications/services available online,  I found them too simplistic and not properly explained, so I decided to create one.

The proof of concept

The source code is split in two GitHub repositories (as per the Clean Architecture):
The proof of concept service keeps track of the balance available in bank accounts (like a ledger).
It listens for Transfer messages on a Kafka topic and when one is received, it updates the balance of the related account by publishing a new AccountBalance message on another Kafka topic.

Please note that each entity type is represented by two different classes: 

  • one is generated by Apache Avro and it's used for serialization and deserialization (so they can be sent and received from Kafka) → see avro directory.
  • the other one is a POJO which may contain some convenience constructors and does not depend on Avro → see net.devaction.entity package.
The net.devaction.kafka.avro.util package holds converters to move back and forth from one data representation to the other.
In the beginning, Apache Kafka may seem overwhelming, even though it resembles a classic messaging broker such as ActiveMQ or RabbitMQ, it is much more than that and it works very differently internally.
Also, there are several Kafka client APIs, which adds more confusion to the learner.
We are going to focus on the following three:
The Producer and the Consumer APIs are lower level and the Streams API is built on top of them.
Both sets of APIs have advantages and disadvantages.
The Producer/Consumer API provides finer control to the application developer at the cost of higher complexity.
On the other hand, the Streams API is not as flexible but it allows the implementation of some standard operations more easily and it requires much less code.


The "transfers recording" example/PoC service can be started in one of the following two modes:

The two modes provide exactly the same functionality which is quite convenient for comparison purposes.

The (explicit) polling mode

It has four main components: 
  • A consumer which listens on the "transfers" topic → see TransferConsumer.java
  • ReadOnlyKeyValueStore (which is part of the Streams API) to materialized the "account-balances" topic data into a queryable view, so we can use the accountId value to retrieve the latest/current balance of a specific account → see AccountBalanceRetrieverImpl.java.
    Please note that the accountId value is extracted from the "transfer" data message received by the consumer.
  • The business logic which creates a new/updated AccountBalanceEntity object from the received TransferEntity object and the current AccountBalanceEntity present in Kafka → see NewAccountBalanceProvider.java
  • A producer which publishes the updated balance by sending a message to the "account-balances" topic → and the local data store will get updated accordingly.

The "join streams" mode

As we said before, this second operating mode only uses the Streams API/DSL and taking advantage of it, we can code at a higher level of abstraction:
We can see that the code is much more compact than in the previous mode.
We do not need to explicitly map the KStream key to the KTable key, that's exactly what the join does (see line 11 in the code snippet below). Hence, we need to choose the Kafka keys accordingly. 
In this case, both message keys represent the account id.





Diagram

Running the code

To build and run the PoC application, in addition to Maven and Java, we also need a Kafka broker.
I decided to install the Confluent Platform which includes a Kafka broker (or a cluster depending on the configuration chosen) with some example topics and pre-configured integration with ElasticSearch and Kibana. But more importantly, it also includes an admin Web UI called "Control Center" which comes in very handy. 

I hit a few bumps when running the Confluent Platform for the first time on my Fedora 30 computer.
Namely, I had to manually install a couple of software packages (i.e., "jot" and "jq").
And I had to separately install the Confluent CLI.
I also had to perform some several changes to some properties files and bash scripts to be able to run the Confluent Platform using a non-root user, here are the changes, please modify the configuration values as per your environment. 

Watch the following YouTube video to get all the details including starting the Confluent Platform and running the example Streams application:

 

Comments

Popular posts from this blog

Shared Ledger Simulator

I have been interested in the shared/distributed ledger technology (a.k.a. block chain, a.k.a. the magic behind cryptocurrencies) for more than a year already but recently I had finally put real time and effort into it.

And since I believe that the best way to understand something is to get your hands dirty, that is what I have done, after I got a grasp of the core principles (or that is what I thought back then), I decided to code my own shared ledger simulator.

Initially, I also considered to look into the main existing code bases (e.g., the source code of the main/official Bitcoin client/miner or Ethereum's) since they are open source, but seeing code like thisput me off... That file is 3229 lines big!!! Plus it is written in C++.
Do not get me wrong, I truly believe Satoshi Nakamoto (i.e., whoever hides behind that name) is a genius and also a great software developer, but he/she/they for sure did not have readability as their main priority.

I also noticed that some other people h…

Vert.x microservices: an (opinionated) application

First of all, sorry for the tautology in the title, a library can be either opinionated or un-opinionated (such as Vert.x is), but an application can only be opinionated.
However, I decided to include "opinionated" to help getting my point across even though is redundant.
The Motivation  I am a big fan of Vert.x and the official documentation is quite good, yet it is not straight-forward to understand how it works and how to use it.
There are a lot of blogs and articles describing Vert.x terminology and its concurrency model.
There are also tons of "Hello World" Vert.x applications on Github, and the rest seem to be just variations of the already typical "Web chat application using Vert.x".
On top of that, many of them are outdated (using AngularJS instead of Angular2+, for example).

The only two exceptions which I found are:
vertx-microservices-workshop: a demo application by the Vert.x development team.
ngrx-realtime-app: proof of concept application focu…