So…who gets the cake, and what do we eat? Two stories of time, order, and consensus
Winner takes the cake
Imagine yourself sitting in your room furiously working away at your homework for the day, as the voice of your mother echoes in your head:
“I want you and your brother to go to each of your rooms. Whichever one of you finishes both of your homework assignments first gets dessert with dinner.”
You race through the math problems with one hand, while haphazardly writing your English essay with the other, marveling at your inexplicable ability to learn how to write ambidextrously provided the right motivation…
…and yet, as you run down to the dinner table to get your prize, you see your brother quickly finishing up the piece of cake that was left out for the winner.
“Sorry son, you finished second, better luck next time.”
As you begin walking back up to your room, you think to yourself, what did my mom say was the challenge? And then it hits you!
“Mom, you didn’t say whichever one of us finished our homework first and came down would get the desert, you said, whichever one of us finished the homework first would get it.”
“What’s the difference?! your brother finished before you, and came down to collect the desert.”
“No, he didn’t finish before me, he simply got here first. After all, my room is all the way at the end of the hallway, while his is right by the dinner table. You don’t know who really finished the homework first, do you?”
In that moment, believe it or not, you not only found yourself enjoying a second piece of desert while your mom groaned in frustration, you also came across a crucial problem that sets the stage for the complications of distributed systems.
One can only reason about the causal order of events in a distributed system (or in a race to get the cake) with the following conditions, which define the comes before relation between two events:
- Given events A and B which occur in a single process, we can say A comes before B if it occurred prior to B
- Given a message sent from one process to another, we can say the sending of the message A comes before the receiving of the message B
- The comes before relation of events is transitive, so if A comes before B, and B comes before C, then A comes before C
If A doesn’t come before B, and B doesn’t come before A, then A and B are said to be concurrent.
In this high-stakes cake-on-the-line scenario, we can describe each of the three rooms as a separate process, and we can define the following events:
- The completion of a piece of homework
- Leaving a room
- Arriving in a new room
Going back to the comes before relation definition, we can further describe arriving in a new room as receiving the message sent by leaving a room, and thus, if we go from room A to room B, leaving room A comes before arriving in room B.
We can then describe the following comes before ordering of several subsets of events:
- We know that you finished either your math homework or reading homework first, and thus the one finished first comes before the one finished second
- Likewise, we know the same comes before ordering above for your brothers completion of the two pieces of homework
- We know that each of you finished both pieces of homework prior to leaving your respective rooms, so each piece of homework finished for your comes before you leaving your room, and the same for your brother
- Your brother arriving at the dinner table comes before your arrival at the dinner table
What we cannot say is the following
- We do not know whether either of your pieces of homework being completed comes before your brother, or the other way around. In fact, we have no way to order these 4 events.
- Although we know that your brother reached the dinner table before you, this does not indicate anything about events prior, and thus we cannot say with certainty that your brother leaving his room comes before you leaving your room.
In short, what we have here is a tough question, if we can only order events which somehow causally affect each other (either one being the tail end of the other e.g. leaving one room, arriving in another, or because they are a string of events in one contained room/process), how can we have a total ordering of events? Will we ever know who truly deserves the piece of cake?!
The plot thickens
Now, imagine many years later sitting at the dinner table with your roommates, trying to decide where to go for dinner that night.
On one end of the table you have your friend droning on and on about the hole in the wall taco place he found that’s gonna be the next big thing. Sitting next to you, your other roommate is sprinting through Yelp suggestions like an olympic sprinter, only to suggest the same Thai place she always wants, and finally, your last roommate is reminding everyone that the grill next door has $1 beers during happy hour, so who cares how bad their dinner menu is?
You may eventually come together and agree on one of the three, or maybe you’ll give up on the pursuit and skip eating dinner together, or maybe, you’ll somehow end up getting dinner together, albeit at an entirely different place that nobody even wanted to go to.
Just like the case of the cake, this ridiculous case of indecision and argument is the crux of a crucial problem in distributed systems, known as consensus.
Consensus can be defined most simply as a group of individuals coming to a unanimous agreement on a given topic. To give the definition a bit more rigor, we can define it as the following three characteristics
- Agreement: All individuals decide on the same value
- Validity: The decision agreed upon was suggested by one of the individuals
- Termination: All individuals involved eventually come to a decision
Let’s get technical with time
At this point, we’re about 1200 words into an “article about distributed systems”, and you may have ingested more about how to prioritize cheap beer vs. good food, or the unfairness of only having 1 piece of cake for 2 hungry kids.
Let’s start with looking at time and order concretely, and first, let’s consider, why don’t we simply timestamp every operation taken and allow this to drive our decisions. In short, this is a problem of synchronization of physical clocks, which while likely sufficient for this cake example, is often a synchronization problem that breaks down at scale, or at least, requires significant steps to keep in sync. More on this in the paper referenced at the end.
We can instead define what we will call a logical clock. This clock, rather than being based on a notion of time, is simply a timer, with each event’s logical timestamp being based on the following:
- If the prior event is within the same process, the timestamp of the event is +1 of the prior event’s timestamp
- If the prior event is the sending of a message from another process (for which this event is the receiving of the message), the timestamp of the event is the +1 of MAX(previous event in the same process, sending of message from the other process)
Above is one example of events across 3 processes and the logical timestamp associated with each event, which can be likened very similarly to the events in each room, and the messages generated by moving from one room to another.
Now that we have a numerical clock of sorts, we need to revisit the comes before relation to define a comes before order of logical clocks.
- For events A and B in the same process, A comes before B if the timestamp of A is less than that of B
- For events A and B, where A is the sending of a message, and B is the receiving of a message, A comes before B
- Logical clock order is transitive, A comes before B implies A comes before C if B comes before C
This looks essentially the same to the comes before relation we described earlier, with the only nuance being that we now have a numerical means for ordering events.
Now, we have a “clock” of sorts, but unlike a clock that you or I would be familiar with, this clock doesn’t always tell me if one event comes before another, and thus, only provides a partial order of events.
For example, in the cake case, the completion of the first assignment by you and your brother both have the same timestamp of 1, but we really don’t know which came before the other. The completion of the second homework assignments by each of you also are not discernible, and to make matters even worse, the second assignment for each of you has a timestamp of 2, but that still does not mean that we can say that the completion of your first assignment comes before the completion his second assignment, or vice versa. What kind of clock is this!?
A total order (of sorts)
While we may be satisfied at this point, we may decide we need some sort of total ordering of events. Now, we have pretty clearly laid out we cannot have a “correct” notion of the total order of events given the constraints we have laid out for logical clocks thus far, but we could at least define an arbitrary partial ordering if we were to extend logical clocks with one more rule:
- Given events A and B in separate processes, where the logical timestamp of A = B, we can say A comes before B if the priority of A’s process is greater than the priority of B’s process
Let’s unpack this. We are essentially saying, based on process priority, we can now order events which did not previously have a comes before relation in our partial ordering. What does process priority even mean here? Maybe it could be ordering based on process ID or node ID for a distributed system, or maybe for your mom, it could be saying your brother finished his second assignment before your second assignment and thus is the winner, because he is younger! A resoundingly unfair way to come to a total ordering you may think to yourself!
That being said, allowing for an arbitrary total ordering of all events in a system can be crucial for solving problems where the priority that is defined for resolving these conflicts is a reasonable factor of the system, or it is irrelevant to the behavior of the system how we choose to disseminate conflicts.
More background about extending the partial ordering of logical clocks to an arbitrary total ordering, and how we can further extend this arbitrary total ordering to define a distributing locking algorithm among a set of processes is explained in the paper I refer to at the end.
Coming back to consensus
In essence, the goal of consensus primarily comes down to the first of the 3 conditions we previously described, while the latter two bolster the value of having consensus, by ruling out some faulty edge cases from being considered as “consensus”.
Consider a distributed algorithm running over a set of nodes, with a hardcoded value of 1 set to be agreed upon by all of the processes executing this algorithm. While we could agree that these processes are satisfying the agreement and termination conditions of consensus, we would not be able to say that they satisfied the validity condition of consensus, since no processes are actually suggesting any values for consideration in consensus, and rather, this value is predetermined.
Now, consider another algorithm running over a set of nodes. In the case of this algorithm, when it comes time to decide, we may end up sitting idly, with all processes never reconvening and in a way, agreeing as a group to “do nothing”, with each process making this suggestion themself. In this case, while agreement and validity are trivially satisfied, the lack of termination strikes this down as a genuine form of consensus.
Consensus in theory is a seemingly simplistic issue, but the complexity of this mechanism is created by the flakiness in nature of failures of processes in a system, versus network partitions or high latency in the system, and the difficulty in telling the difference between these two. Furthermore, it is a crucial problem because being able to come to a consensus reliably would allow for a distributed set of processes to unilaterally agree on decisions per the requests of a leader, which is the equivalent of being able to “reliably broadcast” decisions across a distributed system. (which is a powerful operation in a distributed system, and one that is extremely difficult to accomplish)
That wasn’t too bad, was it?
Though I’ve admittedly trivialized the importance of time, order and consensus in a distributed system, mostly to provide a slightly lighter read, that doesn’t take away from the universally spanning impact of these simple ideas in any web-scale system. Distributed databases of course are a canonical example for the importance of these building blocks, but the effects are everywhere, from instant messaging, to banking and payment, to multiplayer gaming, and even to everyone’s favorite flavor of the month technology, blockchain.
If we all live our lives based on establishing a sense of order and being able to agree on things, it’s no wonder the systems that power our world attempt to do so too.
To learn more about time and ordering of events, the canonical paper is here.