DEV Community

Sriram R

Posted on • Originally published at Medium

Vector Clocks

The main limitation of Lamport Clocks was that they were unable to assist us in determining the causal relationship between two events.

If you're not sure what a causal relationship is, read my article on Message Ordering first.

The reason for this is that we had a single timestamp that was being updated globally, which did not provide us with a complete picture of the sequence of events that occurred.

Vector Clocks

Instead of each node maintaining a single timestamp, we maintain a list of timestamps - one for each node in every node.

For example, if we have a three-node system, the vector timestamp would look like [ T1, T2, T3 ], where T1 is the logical time of Node 1, T2 is the logical time of Node 2, and T3 is the logical time of Node 3.

The entire list forms a single logical clock, known as vector clocks.

How do Vector Clocks work?

1. When a node boots up, it determines the number of nodes in the cluster (N) and fills an array of size N with 0's.
2. Before executing an event, the node increments the clock of that node's time in the list. `N[i] = N[i] + 1`
3. When a message is sent, the time for that node is incremented, and the vector clock is attached to the message.
4. When a message is received,
1. It performs the action
2. Update each list element by taking the maximum of that element's own list and the list received.

We say Event A caused Event B if A's vector timestamp is strictly less than B's vector timestamp.
We can also say that Event A and Event B are concurrent if their timestamps cannot be compared.

Pseudocode

``````on initialisation at node Ni do
T := [0,0,..0] //One element for each node
end on

on any event occuring at node Ni do
T[i] := T[i] + 1
end on

on request to send message m from node Ni do
T[i] := T[i] + 1
send_message(m, T)
end on

on receiving message (m, T') at node Ni do
T[j] := max(T[j], T'[j]) for j in (1..n)
T[i] := T[i] + 1;
process_message(m)
end on
``````

Issues with Vector Clocks

The main problem with vector clocks is that they require a large amount of storage because each node must store N timestamps based on the number of nodes.

Furthermore, as the number of nodes increases, we will be sending a significant amount of data because the entire vector clock with all node timestamps must be sent even if only two nodes are interacting.