Any computational system can be modelled as a sequence of executed actions. An action is defined by a change in the state of the system — or, more generally, in the world: reading from memory, writing to a file, switching on a lamp. In a distributed system, actions execute in multiple locations, and in this context they are often represented as events: sending or receiving messages, changing a record in a database.
The physical distribution of events can vary hugely: events may occur on different processes running on the same machine, or be geographically spread across the globe. The key property of a distributed system is that no process has immediate access to the state of another — information spreads only through communication.
Events are the observable units of a distributed computation. Understanding how they relate to one another — especially which events could have caused which — is essential to designing correct distributed algorithms.
Not all events are related, but some events can cause or influence how other events occur. For example, a reply to a received email is influenced by that message — and possibly by other prior messages received. Detecting potential cause-and-effect relations between events is fundamental to the design of distributed algorithms: maintaining consistency in replicated databases, performing garbage collection, building checkpoints, and analysing concurrency.
To make sense of cause-and-effect relations within a distributed system, we should limit their scope to what can be perceived inside the system itself — this is internal causality. Processes observe events such as message sends and receives, and from these observations they can deduce causal links.
However, a distributed system also interacts with the outside physical world, where other cause-and-effect relations exist at large — external causality. Consider Alice reserving a table for dinner for two and then telling Bob; Bob then reserves two movie tickets for after dinner. Even if the restaurant and cinema booking systems are part of the same distributed system, they know nothing of the external causal relation ("Alice told Bob"). The two reservation systems see concurrent events even though a human observer knows they are causally linked.
External causality cannot be detected by a distributed system without domain-specific knowledge. The only general approximation available is physical time — if event A happened at 7 PM and event B at 8 PM, A might have caused B — but this is heuristic at best.
In 1978, Leslie Lamport introduced the happens-before relation, denoted with the arrow operator a → b, meaning "a happens before b" — all processes agree that a occurs first, then b occurs. This relation is the foundation of all logical clock theory.
The relation is directly observable in two situations:
The relation is transitive: if a → b and b → c, then a → c. This gives us a partial order over events in the system.
When neither a → b nor b → a can be established, the events are said to be concurrent, written a ∥ b. Concurrency does not mean the events happened at the same physical time; it means we have no way to determine a causal ordering between them.
sequenceDiagram
participant P1 as Process P1 (Alice)
participant P2 as Process P2 (Bob)
participant P3 as Process P3 (Chris)
Note over P1: a1 → Check fridge
P1->>P2: a2: "Dinner?" (ts=C(a2))
Note over P2: b1 → Receive msg
P2->>P1: b2: "Yes, let's do it" (ts=C(b2))
Note over P3: c1 → "Bored..."
P3->>P1: c2: "Can I join?" (ts=C(c2))
Note over P1: a3 → Receive replies
In this diagram, the arrows represent message sends. The happens-before relation includes: a1 → a2 → b1 → b2 → a3, and also c1 → c2 → a3. But c1 is concurrent with a1 and a2 — no message connects Chris's boredom to Alice's initial check.
In a distributed computation, a simple way to check whether an event c could have caused another event e is to find at least one directed path from c to e following the happens-before relation. If such a path exists, then c → e and c is a possible cause for e.
Given a space-time diagram of a distributed execution, identify which events are causally related and which are concurrent. For any pair of events, determine whether there exists a causal path connecting them.
A causal history is a richer representation: each process keeps track of the set of all events it knows about. When a message is sent, the sender includes its causal history; the receiver merges the received history with its own. The causal history of an event is the set of all events that happened before it.
For the scenario above:
Causal histories grow without bound, which makes them impractical as a data structure. However, the idea of tracking what each process knows about every other process directly motivates vector clocks.
Physical time could serve as an approximation for causality: something happening before in time might be the cause of something happening after. However, this approach loses track of explicit connections between events across processes — a message takes an unknown amount of time, and clocks on different machines drift.
A logical notion of time can capture causality without requiring synchronised physical clocks. Logical clocks assign a time value to each event such that the ordering of time values respects the happens-before relation.
The core insight is that we do not need to know when an event happened in physical time — we only need to know whether it could have caused another event. A logical clock suffices for this purpose.
Let H be the domain of events and T the domain of (logical) time values. A logical clock is a function C: H → T assigning a time value to each event, such that all processes agree on the value assigned to each event.
The fundamental property that every logical clock must satisfy is the clock consistency condition (monotonicity):
∀ a, b ∈ H, a → b ⇒ C(a) < C(b)
This ensures that if a causally precedes b, the logical clock reflects this by giving a a smaller timestamp. However, the converse does not necessarily hold: C(a) < C(b) does not imply a → b, because unrelated events can still receive ordered timestamps.
If the system additionally satisfies:
∀ a, b ∈ H, a → b ⇔ C(a) < C(b)
...then the clock system is said to be strongly consistent. Scalar (Lamport) clocks are not strongly consistent; vector clocks are.
Implementing logical clocks requires addressing two issues:
Every process p_i maintains two conceptual views of time:
Different logical clock systems (scalar, vector, matrix) differ in their representation and their update protocol, but all implement the same two rules — one for updating the local clock, one for incorporating information from received messages — and all ensure the fundamental monotonicity property.
Lamport's logical clock algorithm is the simplest form of logical time. Each process p_i maintains a single integer lc_i (its local scalar clock) and a local increment δ_i (typically 1).
The algorithm's beauty is its minimalism: a single integer per process, with three straightforward rules. Despite its simplicity, it guarantees the clock consistency condition.
Because lc_i acts as both the local clock and the global timestamp, every event at p_i gets C(a) = lc_i(a). This merging of local and global time is what makes the system efficient but also what limits it — as we will see.
The max operation on receive is what propagates causal information. When P2 receives a message with timestamp 3 but its own clock is only at 2, the max operation pulls the clock forward to 3, acknowledging that the sender had seen more events.
Step through a three-process distributed execution and watch the Lamport clocks evolve. Click a process button to advance its next event. If a receive step tries to consume a message that hasn't been sent yet, it will block (highlighted in amber) — demonstrating that happens-before imposes a real ordering constraint.
Scalar clocks can be used to establish a total order among all events in the system, even concurrent ones, by introducing a tie-breaking mechanism. When two events a, b have equal timestamps (C(a) = C(b)), we break ties using the process identifiers (which are linearly ordered).
This is crucial for liveness properties. Many distributed algorithms rely on totally ordered event queues: they need to decide which event to process next even when events are concurrent. Since concurrent events are causally independent, no causality relation is violated by ordering them arbitrarily.
For example, in Figure 1 of the slide reference, the third event of p_1 has the same timestamp (3) as the second event of p_2. With tie-breaking by process ID (assuming p_1 < p_2), the total order places p_1's event first.
Although scalar clocks satisfy a → b ⇒ C(a) < C(b), the converse is false:
C(a) < C(b) ✘⇒ a → b
Scalar clocks are not strongly consistent. This is because each process squashes its local logical clock and its view of the global clock into a single integer. When process p_2 receives a message from p_1 with timestamp 2, it updates its clock to max(lc2, 2) = 2 — but it "forgets" that this value of 2 came from p_1's event, not from its own execution. Causal dependency information among events at different processes is lost.
Consider three processes. p_1's third event has a scalar timestamp of 3, while p_3's fourth event has a timestamp of 4. Despite 3 < 4, the events may be concurrent — no message path connects them. The scalar clock gives the illusion of ordering where none exists causally.
This limitation motivates the need for a richer structure: vector clocks.
In essence, with scalar clocks a → b implies C(a) < C(b), but C(a) < C(b) does not imply a → b. This means time values can be totally ordered when events are not — comparing scalar timestamps of concurrent events is meaningless.
What we need is a clock system that can answer two questions for any pair of events a, b:
A vector clock for a system of n processes is an array (vector) of n logical clocks, one for each process. Each process p_i maintains a vector vc_i where:
The hypothesis is that the number of processes n is known and fixed. We could use smaller vectors for subsets of processes, but all correctness properties are proven when every process maintains an n-element vector.
The protocol for managing vector clocks follows the same structure as scalar clocks but operates on entire vectors:
The critical difference from scalar clocks is on receive: the element-wise max operation merges the sender's entire knowledge into the receiver's vector. This preserves causal dependency information: after receiving, p_j knows not just the sender's latest event count, but also what the sender knew about all other processes.
sequenceDiagram
participant P1 as p₁
participant P2 as p₂
participant P3 as p₃
Note over P1: vc₁=[1,0,0]
Note over P1: vc₁=[2,0,0]
P1->>P2: msg with ts=[2,0,0]
Note over P2: vc₂=[2,1,0] after receive
Note over P2: vc₂=[2,2,0]
P2->>P3: msg with ts=[2,2,0]
Note over P3: vc₃=[2,2,1] after receive
Notice how the vector preserves the chain: p_3 ends up knowing that p_1 has had 2 events and p_2 has had 2 events. With a scalar clock, p_3 would have a single number (say, 5) and could not distinguish which process contributed which events.
Vector timestamps are compared component-wise. For two vector timestamps vh and vk:
vh < vk ⇔ (∀i, vh_i ≤ vk_i) ∧ (∃i', vh_{i'} < vk_{i'})
This ordering is antisymmetric and transitive. But most importantly, it captures causality exactly:
vc(m1) < vc(m2) ⇔ m1 → m2
Vector clocks provide strong consistency — the bidirectional implication that scalar clocks lack. If two vector timestamps are incomparable (neither < nor > nor =), the corresponding events are concurrent.
Given two vector timestamps, determine the causal relationship between their events. For example, [2,1,0] and [2,2,0]: is the first event earlier, later, or concurrent with the second?
| Property | Scalar (Lamport) | Vector |
|---|---|---|
| Data per process | 1 integer | n integers |
| Message overhead | 1 integer | n integers |
| Consistency | Weak: a→b ⇒ C(a)<C(b) | Strong: a→b ⇔ vc(a)<vc(b) |
| Concurrency detection | Not possible | Incomparable ⇒ concurrent |
Fire events across three processes and watch the vector clock matrix update in real time. Each process shows its current vector; the matrix below gives the full picture. The event log tracks every action and its causal effect.
Try this: fire P1 Internal (vc1 becomes [1,0,0]), then P1 Send to P2 (ts=[2,0,0]), then P2 Recv from P1 (vc2 becomes [2,1,0]). The vector preserves the knowledge that P2 knows about P1's two events. Now fire P3 Internal a few times (vc3 becomes [0,0,1], [0,0,2]) — P3's vector is incomparable with P1's. They are concurrent.
Even in vector clocks, some "squashing" of information still occurs: each process knows about the event counts of other processes but not about their knowledge of third processes. Matrix clocks extend the idea one more level: each process maintains an n × n matrix where entry M_i[j][k] represents p_i's knowledge of what p_j knows about p_k.
Matrix clocks provide strictly more information than vector clocks, but at the cost of O(n^2) storage and message overhead per process. For most practical purposes, vector clocks are sufficient. The slide deck mentions matrix clocks as a natural extension but declares "we stop here."
The happens-before relation (a → b) means all processes agree that event a occurs before event b. It is directly observed in two situations: (1) a and b are events in the same process, and a comes before b in that process's sequential execution; (2) a is the sending of a message and b is the reception of that same message, since messages take finite positive time to propagate. The relation is transitive: a → b and b → c implies a → c.
Two events a and b are concurrent (a ∥ b) when neither a → b nor b → a can be established. This does NOT mean they happened at the same physical time. It means that there is no causal path connecting them in either direction: no chain of same-process ordering and message propagation links one to the other. They are causally independent.
The clock consistency condition is: a → b ⇒ C(a) < C(b). It is called "weak" because the implication goes only one way. If C(a) < C(b), we cannot conclude that a → b — the timestamps may be ordered while the events are actually concurrent. This is the case with scalar (Lamport) clocks, which are weakly consistent but not strongly consistent.
(1) Before executing any local event (internal, send, or receive), process p_i increments its local clock lc_i by δ_i (typically 1). (2) When sending a message m, p_i piggybacks the current value of lc_i on the message as its timestamp. (3) Upon receiving a message m with timestamp ts, p_j adjusts its clock: lc_j ← max(lc_j, ts). This ensures the clock always moves forward to reflect observed causal information.
Scalar clocks merge the local clock and the view of global time into a single integer. When p₂ receives a message from p₁ with timestamp 3, it sets lc₂ = max(lc₂, 3) = 3, but "forgets" that this 3 represents p₁'s events, not p₂'s own. Later, p₁'s fourth event might get timestamp 4 and p₂'s third event also gets 4. Even if 4 = 4, the events might be concurrent — we cannot recover the causal relationship. More generally, C(a) < C(b) does not imply a → b because unrelated events can receive ordered timestamps.
A vector clock maintains one counter per process (n counters for n processes). Instead of squashing all information into a single integer, the receive operation applies element-wise max: ∀k, vc_j[k] ← max(vc_j[k], ts(m)[k]). This preserves what p_j knows about every process after incorporating the sender's knowledge. The ordering vc(a) < vc(b) is defined component-wise and holds if and only if a → b, giving strong consistency. If two vectors are incomparable, the events are concurrent.
By induction on the definition of a → b. Base cases: (1) Same process: a occurs before b at p_i. Each event increments vc_i[i], so vc(a)_i < vc(b)_i and all other components are ≥ (they may have been updated in between). Thus vc(a) < vc(b). (2) Message send/receive: a sends a message with timestamp ts = vc(a). Upon receive at b, p_j sets vc(b)[k] ← max(vc(b)[k], ts[k]) for all k, and also increments its own counter. So vc(b) ≥ ts = vc(a) element-wise, and at least one component (p_j's own) strictly increased. Thus vc(a) < vc(b). Transitive closure follows from transitivity of the < relation on vectors.
Scalar clocks use O(1) space per process and O(1) message overhead but cannot detect concurrency. Vector clocks use O(n) space and O(n) message overhead but can detect concurrency and provide strong consistency. In systems with many processes, the O(n) overhead of vector clocks can be significant, so some practical systems use scalar clocks with additional mechanisms (like version vectors in CRDTs) or hybrid approaches.
When the increment δ is always 1, the scalar timestamp h of an event a has the property that h − 1 represents the minimum logical duration in terms of number of events — the number of events that causally precede a along the longest causal path ending at a. This is called the height of the event. For example, if an event has timestamp 5, at least 4 other events must have occurred sequentially before it, regardless of which processes produced them.
Many distributed algorithms (e.g., distributed mutual exclusion, totally ordered multicast) rely on processing events in a consistent global order. Scalar timestamps with tie-breaking by process ID provide a total order. Since concurrent events are causally independent, ordering them arbitrarily does not violate causality. This total order ensures liveness: processes can always decide which event to handle next, even when events occur concurrently at different processes.