Distributed Systems — Prof. Omicini

Logical Clocks

DS-C5Andrea Omicini — DISI, Univ. Bologna A.Y. 2025/2026

In this lesson

1. Actions, Events, and Causes

Any computational system can be modelled as a sequence of executed actions. An action is defined by a change in the state of the system — or, more generally, in the world: reading from memory, writing to a file, switching on a lamp. In a distributed system, actions execute in multiple locations, and in this context they are often represented as events: sending or receiving messages, changing a record in a database.

The physical distribution of events can vary hugely: events may occur on different processes running on the same machine, or be geographically spread across the globe. The key property of a distributed system is that no process has immediate access to the state of another — information spreads only through communication.

Key idea

Events are the observable units of a distributed computation. Understanding how they relate to one another — especially which events could have caused which — is essential to designing correct distributed algorithms.

Not all events are related, but some events can cause or influence how other events occur. For example, a reply to a received email is influenced by that message — and possibly by other prior messages received. Detecting potential cause-and-effect relations between events is fundamental to the design of distributed algorithms: maintaining consistency in replicated databases, performing garbage collection, building checkpoints, and analysing concurrency.

2. Internal vs External Causality

To make sense of cause-and-effect relations within a distributed system, we should limit their scope to what can be perceived inside the system itself — this is internal causality. Processes observe events such as message sends and receives, and from these observations they can deduce causal links.

However, a distributed system also interacts with the outside physical world, where other cause-and-effect relations exist at large — external causality. Consider Alice reserving a table for dinner for two and then telling Bob; Bob then reserves two movie tickets for after dinner. Even if the restaurant and cinema booking systems are part of the same distributed system, they know nothing of the external causal relation ("Alice told Bob"). The two reservation systems see concurrent events even though a human observer knows they are causally linked.

Editor's note

External causality cannot be detected by a distributed system without domain-specific knowledge. The only general approximation available is physical time — if event A happened at 7 PM and event B at 8 PM, A might have caused B — but this is heuristic at best.

3. The Happens-Before Relation

In 1978, Leslie Lamport introduced the happens-before relation, denoted with the arrow operator a → b, meaning "a happens before b" — all processes agree that a occurs first, then b occurs. This relation is the foundation of all logical clock theory.

The relation is directly observable in two situations:

  1. Same process: If a and b are events of the same process, and a comes before b, then a → b. Local events are ordered by local sequential execution.
  2. Message propagation: If a message is sent by an event a (send event) and received by another event b (receive event), then a → b. A message always takes a finite, positive, non-zero amount of time to propagate.

The relation is transitive: if a → b and b → c, then a → c. This gives us a partial order over events in the system.

Important

When neither a → b nor b → a can be established, the events are said to be concurrent, written a ∥ b. Concurrency does not mean the events happened at the same physical time; it means we have no way to determine a causal ordering between them.

sequenceDiagram
    participant P1 as Process P1 (Alice)
    participant P2 as Process P2 (Bob)
    participant P3 as Process P3 (Chris)
    Note over P1: a1 → Check fridge
    P1->>P2: a2: "Dinner?" (ts=C(a2))
    Note over P2: b1 → Receive msg
    P2->>P1: b2: "Yes, let's do it" (ts=C(b2))
    Note over P3: c1 → "Bored..."
    P3->>P1: c2: "Can I join?" (ts=C(c2))
    Note over P1: a3 → Receive replies

In this diagram, the arrows represent message sends. The happens-before relation includes: a1 → a2 → b1 → b2 → a3, and also c1 → c2 → a3. But c1 is concurrent with a1 and a2 — no message connects Chris's boredom to Alice's initial check.

4. Causal Paths and Causal Histories

In a distributed computation, a simple way to check whether an event c could have caused another event e is to find at least one directed path from c to e following the happens-before relation. If such a path exists, then c → e and c is a possible cause for e.

The examiner will ask

Given a space-time diagram of a distributed execution, identify which events are causally related and which are concurrent. For any pair of events, determine whether there exists a causal path connecting them.

Causal Histories

A causal history is a richer representation: each process keeps track of the set of all events it knows about. When a message is sent, the sender includes its causal history; the receiver merges the received history with its own. The causal history of an event is the set of all events that happened before it.

For the scenario above:

Causal histories grow without bound, which makes them impractical as a data structure. However, the idea of tracking what each process knows about every other process directly motivates vector clocks.

5. Logical Time: From Causality to Clocks

Physical time could serve as an approximation for causality: something happening before in time might be the cause of something happening after. However, this approach loses track of explicit connections between events across processes — a message takes an unknown amount of time, and clocks on different machines drift.

Key idea

A logical notion of time can capture causality without requiring synchronised physical clocks. Logical clocks assign a time value to each event such that the ordering of time values respects the happens-before relation.

The core insight is that we do not need to know when an event happened in physical time — we only need to know whether it could have caused another event. A logical clock suffices for this purpose.

6. Logical Clocks: Definition and Consistency

Let H be the domain of events and T the domain of (logical) time values. A logical clock is a function C: H → T assigning a time value to each event, such that all processes agree on the value assigned to each event.

The fundamental property that every logical clock must satisfy is the clock consistency condition (monotonicity):

∀ a, b ∈ H, a → b ⇒ C(a) < C(b)

This ensures that if a causally precedes b, the logical clock reflects this by giving a a smaller timestamp. However, the converse does not necessarily hold: C(a) < C(b) does not imply a → b, because unrelated events can still receive ordered timestamps.

If the system additionally satisfies:

∀ a, b ∈ H, a → b ⇔ C(a) < C(b)

...then the clock system is said to be strongly consistent. Scalar (Lamport) clocks are not strongly consistent; vector clocks are.

7. Implementing Logical Clocks

Implementing logical clocks requires addressing two issues:

  1. Each process needs local data structures to represent logical time.
  2. A shared protocol defines how processes update these structures to ensure consistency.

Every process p_i maintains two conceptual views of time:

Different logical clock systems (scalar, vector, matrix) differ in their representation and their update protocol, but all implement the same two rules — one for updating the local clock, one for incorporating information from received messages — and all ensure the fundamental monotonicity property.

8. Lamport's Algorithm: Scalar Time

Lamport's logical clock algorithm is the simplest form of logical time. Each process p_i maintains a single integer lc_i (its local scalar clock) and a local increment δ_i (typically 1).

The algorithm's beauty is its minimalism: a single integer per process, with three straightforward rules. Despite its simplicity, it guarantees the clock consistency condition.

Because lc_i acts as both the local clock and the global timestamp, every event at p_i gets C(a) = lc_i(a). This merging of local and global time is what makes the system efficient but also what limits it — as we will see.

Key idea

The max operation on receive is what propagates causal information. When P2 receives a message with timestamp 3 but its own clock is only at 2, the max operation pulls the clock forward to 3, acknowledging that the sender had seen more events.

9. Lamport Clock Interactive Simulator

Step through a three-process distributed execution and watch the Lamport clocks evolve. Click a process button to advance its next event. If a receive step tries to consume a message that hasn't been sent yet, it will block (highlighted in amber) — demonstrating that happens-before imposes a real ordering constraint.

10. Total Ordering via Scalar Clocks

Scalar clocks can be used to establish a total order among all events in the system, even concurrent ones, by introducing a tie-breaking mechanism. When two events a, b have equal timestamps (C(a) = C(b)), we break ties using the process identifiers (which are linearly ordered).

This is crucial for liveness properties. Many distributed algorithms rely on totally ordered event queues: they need to decide which event to process next even when events are concurrent. Since concurrent events are causally independent, no causality relation is violated by ordering them arbitrarily.

For example, in Figure 1 of the slide reference, the third event of p_1 has the same timestamp (3) as the second event of p_2. With tie-breaking by process ID (assuming p_1 < p_2), the total order places p_1's event first.

11. The Limits of Scalar Time

Although scalar clocks satisfy a → b ⇒ C(a) < C(b), the converse is false:

C(a) < C(b)  ✘⇒  a → b

Scalar clocks are not strongly consistent. This is because each process squashes its local logical clock and its view of the global clock into a single integer. When process p_2 receives a message from p_1 with timestamp 2, it updates its clock to max(lc2, 2) = 2 — but it "forgets" that this value of 2 came from p_1's event, not from its own execution. Causal dependency information among events at different processes is lost.

Example

Consider three processes. p_1's third event has a scalar timestamp of 3, while p_3's fourth event has a timestamp of 4. Despite 3 < 4, the events may be concurrent — no message path connects them. The scalar clock gives the illusion of ordering where none exists causally.

This limitation motivates the need for a richer structure: vector clocks.

12. Vector Clocks: Motivation and Definition

In essence, with scalar clocks a → b implies C(a) < C(b), but C(a) < C(b) does not imply a → b. This means time values can be totally ordered when events are not — comparing scalar timestamps of concurrent events is meaningless.

What we need is a clock system that can answer two questions for any pair of events a, b:

  1. Does a causally precede b?
  2. Are a and b concurrent?

A vector clock for a system of n processes is an array (vector) of n logical clocks, one for each process. Each process p_i maintains a vector vc_i where:

Editor's note

The hypothesis is that the number of processes n is known and fixed. We could use smaller vectors for subsets of processes, but all correctness properties are proven when every process maintains an n-element vector.

13. Vector Clock Rules

The protocol for managing vector clocks follows the same structure as scalar clocks but operates on entire vectors:

The critical difference from scalar clocks is on receive: the element-wise max operation merges the sender's entire knowledge into the receiver's vector. This preserves causal dependency information: after receiving, p_j knows not just the sender's latest event count, but also what the sender knew about all other processes.

Evolution of Vector Time

sequenceDiagram
    participant P1 as p₁
    participant P2 as p₂
    participant P3 as p₃
    Note over P1: vc₁=[1,0,0]
    Note over P1: vc₁=[2,0,0]
    P1->>P2: msg with ts=[2,0,0]
    Note over P2: vc₂=[2,1,0] after receive
    Note over P2: vc₂=[2,2,0]
    P2->>P3: msg with ts=[2,2,0]
    Note over P3: vc₃=[2,2,1] after receive

Notice how the vector preserves the chain: p_3 ends up knowing that p_1 has had 2 events and p_2 has had 2 events. With a scalar clock, p_3 would have a single number (say, 5) and could not distinguish which process contributed which events.

14. Vector Clock Ordering and Strong Consistency

Vector timestamps are compared component-wise. For two vector timestamps vh and vk:

vh < vk ⇔ (∀i, vh_i ≤ vk_i) ∧ (∃i', vh_{i'} < vk_{i'})

This ordering is antisymmetric and transitive. But most importantly, it captures causality exactly:

vc(m1) < vc(m2) ⇔ m1 → m2

Vector clocks provide strong consistency — the bidirectional implication that scalar clocks lack. If two vector timestamps are incomparable (neither < nor > nor =), the corresponding events are concurrent.

The examiner will ask

Given two vector timestamps, determine the causal relationship between their events. For example, [2,1,0] and [2,2,0]: is the first event earlier, later, or concurrent with the second?

PropertyScalar (Lamport)Vector
Data per process1 integern integers
Message overhead1 integern integers
ConsistencyWeak: a→b ⇒ C(a)<C(b)Strong: a→b ⇔ vc(a)<vc(b)
Concurrency detectionNot possibleIncomparable ⇒ concurrent

15. Interactive Vector Clock Explorer

Fire events across three processes and watch the vector clock matrix update in real time. Each process shows its current vector; the matrix below gives the full picture. The event log tracks every action and its causal effect.

P1
P2
P3

Try this: fire P1 Internal (vc1 becomes [1,0,0]), then P1 Send to P2 (ts=[2,0,0]), then P2 Recv from P1 (vc2 becomes [2,1,0]). The vector preserves the knowledge that P2 knows about P1's two events. Now fire P3 Internal a few times (vc3 becomes [0,0,1], [0,0,2]) — P3's vector is incomparable with P1's. They are concurrent.

16. Matrix Time

Even in vector clocks, some "squashing" of information still occurs: each process knows about the event counts of other processes but not about their knowledge of third processes. Matrix clocks extend the idea one more level: each process maintains an n × n matrix where entry M_i[j][k] represents p_i's knowledge of what p_j knows about p_k.

Editor's note

Matrix clocks provide strictly more information than vector clocks, but at the cost of O(n^2) storage and message overhead per process. For most practical purposes, vector clocks are sufficient. The slide deck mentions matrix clocks as a natural extension but declares "we stop here."

17. Summary

  1. Actions and Events — Distributed computations are sequences of events across multiple processes.
  2. Happens-Before — Lamport's relation defines a partial order: same-process ordering plus message propagation, transitively closed.
  3. Causal Paths — A directed path through the happen-before graph identifies possible causality between events.
  4. Logical Clocks — Functions C: H → T satisfying a → b ⇒ C(a) < C(b).
  5. Scalar (Lamport) Clocks — Each process maintains one integer; increments before each event; piggybacks on sends; takes max on receives. Weakly consistent.
  6. Vector Clocks — Each process maintains an n-element vector; preserves causal dependencies; strongly consistent: vc(a) < vc(b) ⇔ a → b.
  7. Matrix Clocks — An n × n matrix per process capturing knowledge about other processes' knowledge.

Check Your Understanding

Explain the happens-before relation. What are the two ways to directly observe a → b?

The happens-before relation (a → b) means all processes agree that event a occurs before event b. It is directly observed in two situations: (1) a and b are events in the same process, and a comes before b in that process's sequential execution; (2) a is the sending of a message and b is the reception of that same message, since messages take finite positive time to propagate. The relation is transitive: a → b and b → c implies a → c.

What does it mean for two events to be concurrent (a ∥ b) in the context of happens-before?

Two events a and b are concurrent (a ∥ b) when neither a → b nor b → a can be established. This does NOT mean they happened at the same physical time. It means that there is no causal path connecting them in either direction: no chain of same-process ordering and message propagation links one to the other. They are causally independent.

State the clock consistency condition. Why is it called a "weak" condition?

The clock consistency condition is: a → b ⇒ C(a) < C(b). It is called "weak" because the implication goes only one way. If C(a) < C(b), we cannot conclude that a → b — the timestamps may be ordered while the events are actually concurrent. This is the case with scalar (Lamport) clocks, which are weakly consistent but not strongly consistent.

Describe the three rules of Lamport's logical clock algorithm.

(1) Before executing any local event (internal, send, or receive), process p_i increments its local clock lc_i by δ_i (typically 1). (2) When sending a message m, p_i piggybacks the current value of lc_i on the message as its timestamp. (3) Upon receiving a message m with timestamp ts, p_j adjusts its clock: lc_j ← max(lc_j, ts). This ensures the clock always moves forward to reflect observed causal information.

Why are scalar clocks NOT strongly consistent? Give a concrete example.

Scalar clocks merge the local clock and the view of global time into a single integer. When p₂ receives a message from p₁ with timestamp 3, it sets lc₂ = max(lc₂, 3) = 3, but "forgets" that this 3 represents p₁'s events, not p₂'s own. Later, p₁'s fourth event might get timestamp 4 and p₂'s third event also gets 4. Even if 4 = 4, the events might be concurrent — we cannot recover the causal relationship. More generally, C(a) < C(b) does not imply a → b because unrelated events can receive ordered timestamps.

How does a vector clock fix the strong consistency problem?

A vector clock maintains one counter per process (n counters for n processes). Instead of squashing all information into a single integer, the receive operation applies element-wise max: ∀k, vc_j[k] ← max(vc_j[k], ts(m)[k]). This preserves what p_j knows about every process after incorporating the sender's knowledge. The ordering vc(a) < vc(b) is defined component-wise and holds if and only if a → b, giving strong consistency. If two vectors are incomparable, the events are concurrent.

Prove that if a → b, then for vector clocks, vc(a) < vc(b).

By induction on the definition of a → b. Base cases: (1) Same process: a occurs before b at p_i. Each event increments vc_i[i], so vc(a)_i < vc(b)_i and all other components are ≥ (they may have been updated in between). Thus vc(a) < vc(b). (2) Message send/receive: a sends a message with timestamp ts = vc(a). Upon receive at b, p_j sets vc(b)[k] ← max(vc(b)[k], ts[k]) for all k, and also increments its own counter. So vc(b) ≥ ts = vc(a) element-wise, and at least one component (p_j's own) strictly increased. Thus vc(a) < vc(b). Transitive closure follows from transitivity of the < relation on vectors.

What is the practical implication of the trade-off between scalar and vector clocks?

Scalar clocks use O(1) space per process and O(1) message overhead but cannot detect concurrency. Vector clocks use O(n) space and O(n) message overhead but can detect concurrency and provide strong consistency. In systems with many processes, the O(n) overhead of vector clocks can be significant, so some practical systems use scalar clocks with additional mechanisms (like version vectors in CRDTs) or hybrid approaches.

Explain the concept of event height with scalar clocks when δ = 1.

When the increment δ is always 1, the scalar timestamp h of an event a has the property that h − 1 represents the minimum logical duration in terms of number of events — the number of events that causally precede a along the longest causal path ending at a. This is called the height of the event. For example, if an event has timestamp 5, at least 4 other events must have occurred sequentially before it, regardless of which processes produced them.

How could you use total ordering of scalar timestamps for liveness in a distributed algorithm?

Many distributed algorithms (e.g., distributed mutual exclusion, totally ordered multicast) rely on processing events in a consistent global order. Scalar timestamps with tie-breaking by process ID provide a total order. Since concurrent events are causally independent, ordering them arbitrarily does not violate causality. This total order ensures liveness: processes can always decide which event to handle next, even when events occur concurrently at different processes.