Skip to main content

Shared Ledger Simulator

I have been interested in the shared/distributed ledger technology (a.k.a. block chain, a.k.a. the magic behind cryptocurrencies) for more than a year already but recently I had finally put real time and effort into it.

And since I believe that the best way to understand something is to get your hands dirty, that is what I have done, after I got a grasp of the core principles (or that is what I thought back then), I decided to code my own shared ledger simulator.

Initially, I also considered to look into the main existing code bases (e.g., the source code of the main/official Bitcoin client/miner or Ethereum's) since they are open source, but seeing code like this put me off... That file is 3229 lines big!!! Plus it is written in C++.
Do not get me wrong, I truly believe Satoshi Nakamoto (i.e., whoever hides behind that name) is a genius and also a great software developer, but he/she/they for sure did not have readability as their main priority.

I also noticed that some other people had the same idea before but I did not really like how they were implemented, for example:
This one --> as pointed out in one of the comments, it does not include the hash of the previous block in the current block... trying to make things simple is not an excuse for this, if you do not have this core feature, then what you have is not a shared ledger implementation at all, period.

Or this other one is not exactly what I had in mind, I wanted my running code to demonstrate how the algorithm works by generating multiple blocks and transactions in an automated way (more on this later). 

So I chose Java to code my shared ledger simulator from scratch mainly because it is my favorite programming language and also because it has better support for cryptography than many other languages out there. The complete source is available on GitHub under the MIT license.

Assumptions and compromises taken 
I wanted my implementation to be compliant with the core shared ledger principles while keeping things simple (or at least not too complicated), so these are the main characteristics/assumptions/compromises I made (in no particular order):

1.- Bitcoin (and Ethereum too) use Elliptic Curve Digital Signature Algorithm but I used good old RSA instead. Conceptually there is not much difference, since both rely in the same common concepts such as private key, public key and signature

2.- The addresses are 
represented as arrays of bytes and each address is unequivocally associated to an RSA key pair. For simplicity, I took the address as the last 20 bytes (or 40 hexadecimal digits/characters) of the public key. 

3.- There are only two type of nodes: full nodes (i.e., miners) and thin clients (which submit transactions). Each node runs on a different thread, analogously to real life where each node runs on a different computer which may be geographically very distant from other nodes.

4.- To simulate the network, I used two simple abstract data types, and Every thread/node has access to both ADTs:
- Full Node (Miners) have "read and add" access to BlocksInTime and "read only" access to TransactionsInTime
- Thin clients have "read only" access to BlocksInTime (to check when is safe to consider a submitted transaction as confirmed, e.g., after five blocks, but please note that this functionality has not been implemented yet) and "add" access to TransactionsInTime

5.- Every transaction must be signed (by a thin client) using the private key associated with the source address of the transaction --> this is one of the core principles of shared ledger

6.- Thin clients do not explicitly set a transaction fee, the difference between total inputs and total outputs of the transaction is considered to be the transaction fee --> exactly how Bitcoin works

7.- The higher the fee is, the higher chance for the transaction to be included in the next block by the lucky miner, i.e., transactions with higher fee have higher priority.

8.- Signing a transaction is implemented with the help of the standard Java serialization, once we get the array of bytes which represent an unsigned transaction, the bytes are signed using the private key of the thin client and the resulting signature is added ("stamped") to the signed transaction object, you may check this method.

9.- The number of maximum transactions than can fit in a block is set through a property in the properties file (Bitcoin limits the size of the block in bytes but I just limited the number of transactions for the sake of simplicity), so if the transaction fee is too low, it may take longer (or even forever) to be included in a mined block
but please note that submitted transactions never really expire, more info here.

10.- Every block (except the genesis block) must contain the hash of its parent block --> this is another core principle of shared ledger

11.- Every block contains the address of the miner, and includes a transaction with no source address which sends the miner's reward amount to that address, this is how new coins are being generated, thus the origin of every single coin/amount in every transaction can be traced to a mining reward transaction.

12.- In Bitcoin, the number of coins is limited (to 21 million) and at some point in the future (the year 2140) no more bitcoins will be created from mining, this limitation was not implemented in the simulator for the sake of simplicity and because I did not considerate it essential.  

13.- In the current implementation, a very basic (but reliable) Proof of Work mechanism is used when generating new blocks, I may add other mechanisms (e.g., Proof of Stake or something else which I already have in mind) in the future. 
For the sake of simplicity, the number of leading zeros in the block hash required is just set through a property value, instead of being auto-adjusted by the network to maintain an approximate constant frequency of blocks generation.

14.- The most difficult part to come up with was the algorithm for a full node (miner) to decide which block is the latest valid one from the longest chain (yes, there are multiple chains of blocks, I actually think that the term block chain is somehow misleading, block tree would be a better one). I had to put into practice a little of Graph theory (more precisely Depth First Search/Traversal algorithm) to calculate the distance (a.k.a. depth in a tree node) of a given block from the Genesis block. 
This may actually deserve its own post.

15.- There is no single data structure to permanently store the balances (i.e., what amount is available on what address/account). Balances are calculated on the fly by checking every single transaction present in a block chain --> this is another core principle of share ledger which my implementation honors.

16- I added several "sleep statements" across the source code in order to slow down the execution for didactic purposes.

Running the code 
So let's see the code in action.
In the this video you can see the execution of the which showcases two full nodes (miners) competing to generate the next block and also two thin clients submitting transactions to the network which then are included into the blocks.
You may execute the code yourself in your computer and even modify the timings of the main events (e.g., when to start a miner to join the network and start mining) or change the value of the properties to see how that impacts the outcome of the execution (i.e., the blocks generated)
Also, please let me know in case I missed anything or any of my assumptions or compromises made are not accurate or properly explained.


Popular posts from this blog

Kafka + WebSockets + Angular: event-driven microservices all the way to the frontend

In the the initial post of the  Event-driven microservices with Kafka series (see here  or  here ), I talked about the advantages of using event-driven communication and Kafka to implement stateful microservices instead of the standard stateless RESTful ones. I also presented the architecture and the source code of a related proof of concept application. In this post, I would like to show how to extend the asynchronous event-driven communication all the way from Kafka to the Web frontend passing through the Java backend. Hence, in the first post of this series, we got rid of HTTP as the communication protocol among microservices in the backend, and now we are also replacing it (with WebSockets) as the communication protocol between the frontend and the backend. Ok, but why would you do that? Because it provides a better experience to the end user!. Using WebSockets you can build legit  real-time user interfaces, the updates are pushed immediately from the server to the client

A Java dev journey to full-stack: first chapter

The Motivation I am an experienced Java developer and (surprise!) I like Java. I know it is not perfect but it works just fine for me (I enjoy type-safety and I do not consider verbosity a disadvantage, quite the opposite). I also know that some people dislike Java, which is also fine. But recently I decided to step out of my confort zone as developer, my goal isn't to be one of the "cool kids" neither trying to monetize a new skill in the job market. I have a quite practical motivation: I want to be able to build more (different) stuff. That's exactly the same reason why I learnt Android development by myself a couple of years ago. Web applications are ubiquitous, even more than native mobile apps, and thanks to cloud computing, one can easily and inexpensively release their idea/app to the World Wide Web. I already did some Web development in the past, in the bad old days of JSP and JSF, but the process was slow and painful. Nowadays the Web landscape h

Using Apache Kafka to implement event-driven microservices

When talking about microservices architecture, most people think of a network of stateless services which communicate through HTTP (one may call it RESTful or not, depending on how much of a nitpicker one is). But there is another way, which may be more suitable depending on the use case at hand. I am talking about event-driven microservices, where in addition to the classic request-response pattern, services publish messages which represent events (facts) and subscribe to topics (or queues depending on the terminology used) to receive events/messages. To fully understand and embrace this new software design paradigm is not straight-forward but it is totally worth it (at least looking into it). There are several interconnected concepts which need to be explored in order to discover the advantages of event-driven design and the evolutionary path which led to it, for example: Log (including log-structured storage engine and write-ahead log) Materialized View Event Sourcing C o