CRDTs and Distributed Consistency - The complex distributed counter

9 min read

August 18, 2022

In part one of this series we built a simple distributed counter, specifically a G-Counter, which has some hard limitations, one of them being that it can only be incremented and the other one is that we need to know beforehand how many agents are going to be affecting it. In this part we will extend the counter to solve both problems.

CRDTs and Distributed Consistency - The complex distributed counter

This is the second part of a series of posts on CRDTs and Distributed Consistency. If you have not read the first part I recommend you start there. You can find the first part here.

Extending a G-Counter to allow increments and decrements

The GCounter class we built in the previous post only allowed increments to be performed, now we want to allow increments and decrements. We will need to think a little bit outside the box to achieve this because our implementation depends on the Math.max function to decide whether an update coming from another agent is newer than the state we currently have.

Let's see the problem with an example, we can assume we remove the increment only restriction by commenting out the code as follows:


This happened because when merging the states [3, 0] and [0, -1] for the client1 and client2 respectively, the max between 0 and -1 is 0 and there is no way to know which value is newer. If we have the restriction of only allowing increments the rule for deciding how to solve a conflict is trivial because we know that whatever value is bigger will be newer.

In order to solve this we need to think on the independent operations of increasing and decreasing the count. How did each client reached their independent count state?

In the case of client1 it reached its state by doing three increments of 1. For client2 the steps were one increment of 3 and two increments of -2. If we re-write the steps for client2 we can say it reached the state by doing one increment of 3 and two decrements of 2. Let's see this in a different way by placing each operation on different arrays:


You might have noticed that now, after we changed the wording of the operations, both the increments and decrements are now positive numbers and they can be both represented as independent counts! And guess what? We already know how to represent positive counts, we do it with a G-Counter.

So, in order to remove the increment only restriction we simply need to have two counts, one for the increments and one for the decrements and to get the final result we can just subtract them from each other. This is called a PN-Counter (Positive/Negative-Counter) and the implementation is really simple because we already did most of the work with the G-Counter:


We can see it working with an example, we will use the same we used to derive the idea for the counter:


Extending a G-Counter or PN-Counter to allow arbitrary clients

The second restriction that our original counter had is that it would only work if we know how many agents can be affecting the counter beforehand, that is a big restriction for any distributed system that could automatically scale up or down without notice. Let's try to remove that restriction.

If we look at the implementation of the G-Counter we will notice that we choose our data structure for the counts to be an array, that is a very efficient structure but it also is the one forcing us to use index-based IDs.


This is a problem, imagine you have two new agents "joining the counter" at the same time, if we want to keep them independent, each one of them would add one more place into the array and assign themselves the n+1 id and cause a collision, this breaks the counter.

It now becomes clear that we need to identify the agents with unique IDs and the IDs can't be index-based, we can use whatever we want but we also need to change the state structure, in this case we will use a Map to store the id-count pair.


If we continue adapting the code we will end up with a very similar implementation, but that will instead of assuming we have always values for all clients it will use defaults for unknown clients, see the full code below with comments:


We can test it with an example:


And that's it! We just solved both problems, our counter now supports increments and decrements and also can accommodate any number of clients as they join the count. There are some improvements that can be done to the solution in terms of performance, but for the purposes of this post the solution is enough.

Conclusions

We started with a very simple distributed counter and we have evolved it to be a more powerful counter that supports increments, decrements and an arbitrary number of clients. One problem that you might have noticed here is that there is no way of cleaning up old or dead clients. That can be achieved but it imposes many challenges and more advanced concepts as "tombstones" or "remove sets". You can read more details on the Wikipedia page for CRDTs.

Following this same approach you can build CRDTs that are increasingly more complex but at the same time they provide features that can allow you to build truly distributed apps.

In the next part of the series I will build an example that uses a peer-to-peer approach to share the state between agents.

Thanks to Martín Feldman, Martín Fitipaldo and Nicolás Wolman from our team that helped me go through the second part of this post.