Distributed Systems — Prof. Omicini — University of Bologna, ISI LM

Computing with Time

DS-M6 • Academic Year 2025/2026 • Module 2

In this lesson

1. Prologue — Computing Needs Time

The relationship between computing and time is far more delicate than it first appears. Almost all computational devices today are not just general-purpose computers: cars, medical devices, industrial robots, toys, communication systems, and the exploding Internet of Things (IoT) embed computational power into physical contexts. These special-purpose computers are increasingly networked and intelligent, yet their growing complexity can come at the expense of dependability.

Conversely, general-purpose computers are ever more required to interact with physical processes: they integrate video and audio, sense physical dynamics through personal devices, and control physical actuators in pervasive systems. Computing is no longer confined to the abstract manipulation of symbols — it is embedded in the real world, and the real world runs on physical time.

Key idea

Time has always been abstracted away from computing. The Turing-Church-von Neumann foundations are about the transformation of data, not about the dynamics of physical processes. As computing merges with the physical world, this abstraction becomes a liability.

The central observation, due to Edward A. Lee [Lee, 2009], is that computing needs time as a first-class semantic property, not merely as a performance metric. The core abstractions of computing should be rethought to include time.

2. Cyber-Physical Systems

Definition and Scope

Cyber-Physical Systems (CPS) emerge from the deep integration of physical systems and processes with networked computing. CPS embed computations and communication into physical processes to add new capabilities to physical systems. The fundamental challenge is that the foundations of computing — Turing, Church, von Neumann — do not deal with the dynamics of physical processes.

Exam tip

CPS are about merging computing and networking with physical systems to create new science, techniques, and products. The passage of time is essentially absent in traditional computing — and this absence is the core problem CPS must address.

Remarkable Examples of CPS

High-confidence medical devices, assisted living systems, traffic control and safety, advanced automotive systems, process control, energy conservation, environmental control, avionics and instrumentation, critical infrastructure control (electric power, water, communications), distributed robotics (telepresence, telemedicine), defence systems, manufacturing, and smart structures.

The Passage of Time

The technological basis that engineers and computer scientists have chosen for general-purpose computing and networking does not support well the applications dealing with physical processes [Lee, 2009]. The passage of time is essentially absent in computing — time is abstracted away, modelled out, considered a non-functional concern. CPS force us to confront this absence directly.

Warning

A program that produces the correct result at any speed is functionally correct in the classical sense — but in a CPS, a result delivered too late is equivalent to a wrong result. Time is not a performance issue; it is a correctness issue.

3. Misconceptions about Time in Computing

Edward A. Lee [Lee, 2009] identifies four pervasive misconceptions about time in computing. Each of these hides a deeper truth that becomes critical in distributed and embedded systems.

MisconceptionReality
Computing takes time — efficiency has limits, but time is abstracted away. Time is always abstracted away from computing, so computing always omits time. It is not just that efficiency matters; time is removed from the semantic model entirely.
Time is a resource — like memory or CPU cycles, but a different kind. Time is practically unbounded and is expended regardless of whether we use it. While generic resource management works for memory or energy, time needs to be a semantic property of programs, not a resource to be managed.
Time is a nonfunctional property — functions are computed by abstracting from time. With a Turing Machine, a function can be computed by abstracting from time. But in a CPS, the function of a computation is defined through actions occurring in physical time with a duration. Time defines the function, not just its performance.
Real time is a QoS problem — about efficiency and meeting deadlines. Time in CPS is mostly not about efficiency. It is about predictability and repeatability. Precision and variability in timing are quality-of-service issues, but time itself is much more than that: time should be part of the semantics of programs.
Key idea

The subtitle says it all: core computing abstractions should be re-thought to include time. This lesson explores what that means for distributed systems.

4. Which Notion of Time?

Before we ask how time should enter computing, we must ask a more fundamental question: what notion of time are we talking about? The Britannica encyclopedia defines time as “a measured or measurable period, a continuum that lacks spatial dimensions” — but this definition is deeply intertwined with philosophical, mathematical, and scientific investigation.

Our common-sense notion of time — uniform, universal, ticking away independently of everything else — is precisely the notion that modern physics has shown to be incorrect. If computing must deal with time, it must deal with the real nature of time, not with a naive idealisation.

Editor's note

The slides ask “what about our common sense notion of time?” as a rhetorical question. The answer is that common sense is a poor guide: we think of time as a single, universal, flowing river, but physics tells us otherwise.

5. Time after Relativity

Einstein's relativity [Einstein, 1920] shattered the classical notion of a single universal time. Carlo Rovelli [Rovelli, 2017] articulates the consequences with striking clarity:

Exam tip

You may be asked: “Why is there no universal notion of time in a distributed system?” The foundational reason is not just engineering — it is physics. There is no universal time, period. Distributed systems inherit this fundamental property of the universe.

6. Parallel, Concurrent, Distributed — Spatial vs Temporal Contexts

A precise understanding of time in distributed systems requires that we recast the distinctions among parallel, concurrent, and distributed computing in terms of temporal and spatial contexts.

Parallel Computing

A computational system performs parallel computation when the temporal context is the same for all computational processes: all processes share the same notion of time T. This is the classical multi-core or SIMD model, where all processing elements are synchronised to a common clock.

graph LR
    subgraph "Same Temporal Context T"
        P1["Process 1"] --- P2["Process 2"]
        P2 --- PN["Process N"]
    end

Concurrent Computing

A system performs concurrent computation when at least two computational processes have a different temporal context: TT'. Each process has its own local notion of time, and these notions are not inherently synchronised. This is the natural model for multi-threading on separate cores, processes on the same machine with independent scheduling, or any scenario where processes advance at their own rate.

graph LR
    subgraph "Temporal Context T"
        P1["Process 1"]
    end
    subgraph "Temporal Context T'"
        P2["Process 2"]
    end

Distributed Computing

A system performs distributed computation when at least two computational processes have a different spatial context: SS'. The processes run on physically separate machines connected by a network. Distribution is defined by space, not time.

graph LR
    subgraph "Spatial Context S"
        P1["Process 1"]
    end
    subgraph "Spatial Context S'"
        P2["Process 2"]
    end
    P1 -.->|Network| P2

Combined Cases

System TypeSpatial ContextsTemporal Contexts
Distributed parallelSS'Same T
Distributed concurrentSS'TT'
Key idea

Distributed concurrent computing — where both spatial and temporal contexts differ — is the most natural and general model for distributed systems. It combines the challenges of independent clocks (concurrency) with physically separate nodes (distribution). Most real distributed systems operate in this regime.

7. The Issue of Time in Distributed Systems

In centralised, non-distributed systems, time is unambiguous: there is a single clock, a single temporal context. A process obtains the time through a system call to the kernel, and any subsequent call returns a strictly larger value. This gives a total ordering of events with no effort.

In a distributed system, there is no natural notion of time. Each machine has its own local clock, and these clocks drift apart. There is no shared memory, no shared clock, no global synchronisation primitive. This leads to the fundamental questions:

Exam tip

The dictionary defines “synchronous” as “existing or occurring at the same time.” In a distributed system, there is no way to determine whether two events on different machines occurred at exactly the same time — because there is no universal “same time” to measure them against. This is the root of the synchronisation problem.

8. Physical Clocks and Timers

How a Computer Keeps Time

A clock in a computer is actually a timer. Typically, it consists of an oscillating quartz crystal, a counter, and a holding register. The quartz oscillates at a precise frequency; the counter decrements with each oscillation; when the counter reaches zero, an interrupt is generated and the counter is reloaded from the holding register. Each interrupt is a clock tick.

No Problem in Centralised Systems

In a centralised system with a single clock, a process gets the time by issuing a system call; another process doing the same later will get a strictly larger value. This provides a total ordering of events reliably and without effort.

The Distributed Problem

In a distributed system, each process has its own internal clock and its own notion of time. Each clock ticks independently, and they drift apart — potentially by several seconds per day. Even if all clocks are synchronised at system start (set to the same value), differences in tick rates cause them to diverge over time.

Watch out

The naive solution — “just synchronise the clocks” — turns out to be deep and subtle. The course asks: do we actually need synchronisation? The answer, as we will see, is: it depends on what you need time for.

9. Clock Terminology: Offset, Skew, Drift

Given two machines a and b in a distributed system, with their clocks Ca and Cb, the following precise terminology [Kshemkalyani and Singhal, 2011] lets us reason about clock behaviour:

TermDefinitionFormulaIntuition
Time The value a clock returns at real time t Ca(t) For a perfect clock, Ca(t) = t
Frequency The rate at which a clock progresses C'a(t) First derivative of the clock function
Offset Difference between a clock and real time (or between two clocks) Ca(t) − t (absolute)
Ca(t) − Cb(t) (relative)
How far apart two clocks are at a given instant
Skew Frequency difference between a clock and a perfect clock (or between two clocks) C'a(t) − C'b(t) How fast the offset is growing or shrinking
Drift The second derivative of the clock function C''a(t) How the skew itself changes over time (e.g., due to temperature)
Key idea

Offset is a value (the difference in readings now), skew is a rate (how fast the offset grows), and drift is a change in rate (acceleration). Understanding these three levels is essential to designing synchronisation algorithms.

10. Clock Inaccuracies and the Maximum Skew Rate

Manufacturers typically specify the maximum skew rate ρ for a clock, also known as the maximum drift rate. A timer (clock) is said to be working within its specification if:

1 − ρ ≤ dC/dt ≤ 1 + ρ

This inequality says that a clock's frequency can deviate from perfect by at most ρ. For typical quartz clocks, ρ is about 10−6 (1 part per million), corresponding to a drift of roughly 86 milliseconds per day. Temperature, voltage, age, and manufacturing tolerances all contribute to ρ.

Clock Behaviour Visualised

Fast, Perfect, and Slow Clocks

Relative to UTC, a clock can be:

The clock offset grows linearly with time for a constant skew. After time Δt, the offset of a fast clock with skew ρ is approximately ρ × Δt. For example, a clock with ρ = 10−5 (10 ppm) will drift about 0.86 seconds per day.

Editor's note

Clock drift is a physical phenomenon, not a bug. Quartz crystals change their oscillation frequency with temperature, pressure, and age. A clock that is perfect at 25°C may be significantly slow at 0°C or fast at 50°C. In space or high-altitude applications, radiation can also affect timing.

11. Physical Clock Synchronisation: UTC and NTP

Why Synchronise?

Physical clock synchronisation is the process of ensuring that physically distributed processors have a common notion of time. It is needed for three main purposes: knowing the time of day at which an event happened on a specific machine, measuring the time interval between two events on different machines, and determining the relative ordering of events across machines.

Universal Coordinated Time (UTC)

UTC is the international standard for timekeeping, maintained by the BIPM (Bureau International des Poids et Mesures) in Sevres, France. BIPM combines and averages the atomic time standards of member nations to create a single official time. UTC is broadcast as short-wave radio pulses (WWV) by NIST every UTC second, and also via GPS and other satellites.

If one machine in a distributed system has access to a UTC service, an algorithm can synchronise all other machines based on this reference.

Network Time Protocol (NTP)

NTP (RFC 5905 for v.4) is the standard protocol for clock synchronisation over the Internet. A hierarchy of time servers, from Stratum 0 (atomic clocks) to Stratum 1 (directly connected to Stratum 0) to Stratum 2 and below, propagates accurate time. Key design constraint: clocks can only run forward — corrections cannot bring clocks backward, as this could break causality (e.g., file timestamps).

sequenceDiagram
    participant C as Client
    participant S as NTP Server
    Note over C: T1 = local time
    C->>S: NTP request (T1)
    Note over S: T2 = receive time (server)
    Note over S: T3 = transmit time (server)
    S->>C: NTP response (T1, T2, T3)
    Note over C: T4 = receive time (client)
    Note over C: delay = (T4-T1)-(T3-T2)
offset = ((T2-T1)+(T3-T4))/2

The key insight is that NTP estimates both the one-way network delay and the clock offset by measuring the round-trip time and assuming symmetric network delays. The client computes:

Clock Drift Visualiser

Clock Drift Visualiser

Simulate how two clocks drift apart. The UTC reference runs perfectly; the local clock can run fast or slow.

UTC Reference
0.00
Local Clock
0.00
Offset
+0.00
−200 ppm (slow)0 (perfect)+200 ppm (fast)
Clock offset: 0.00 s
Exam tip

Understand the NTP offset formula: offset = ((T2 − T1) + (T3 − T4)) / 2. This assumes symmetric network delays. Why is this assumption problematic? Because in real networks, the forward and reverse paths can have very different latencies (asymmetric routing, congestion).

12. Absolute Time vs Relative Time

Not all applications need UTC. The required temporal properties vary widely, and sometimes a UTC service is not available at all. This leads to a crucial distinction:

Global Absolute TimeGlobal Relative Time
ReferenceUTC (or another external standard)An internal agreement among processes
AccuracyBounded by NTP/Cesium clocksBounded by the synchronisation protocol
External meaningMatches real-world timeMeaningful only within the system
ExamplesNTP, GPS time syncBerkeley Algorithm, RBS

The Berkeley Algorithm

The Berkeley Algorithm [Gusella and Zatti, 1989] synchronises machines using only their mutual agreement, without requiring any machine to have UTC. A time daemon polls all participating machines, collects their local times, computes an average (discarding outliers if necessary), and sends each machine the required correction. Since no machine has the “true” time, the algorithm creates a relative global time that all machines share.

Reference Broadcast Synchronisation (RBS)

RBS [Elson et al., 2002] is designed for wireless networks. Instead of a client synchronising with a server, all receivers synchronise with each other based on a reference broadcast message. The key insight: the broadcast arrives at all receivers at approximately the same time (since radio propagation is near light-speed). Receivers record their local time upon reception and exchange these timestamps to compute relative offsets.

Key idea

Absolute time is needed when events must be correlated with the real world (e.g., logging for legal compliance, coordinating with external systems). Relative time suffices when only internal consistency matters (e.g., maintaining consistent ordering in a distributed database, coordinating distributed sensors in the same field).

13. Logical Time — Why It Is Needed

Physical time is not always necessary. In many cases, the only actual requirement is a shared notion of time among processes, with no need for it to correspond to real time. A step further: often the only need is a shared ordering of events, not a shared clock at all. This observation leads to logical time — a notion of time that is both possible and useful without any physical clock.

The Bank Database Example

A classic example [Tanenbaum and van Steen, 2017] motivates the need for logical time in the absence of synchronised physical clocks:

sequenceDiagram
    participant LA as LA Replica
    participant NY as NY Replica
    Note over LA: Account: $1000
    Note over NY: Account: $1000
    Note over LA: User adds $100
    Note over NY: Bank applies 1%
    LA->>NY: Update: +$100
    NY->>LA: Update: +1%
    Note over LA: LA result: $1110
(1000+100+1% of 1000) Note over NY: NY result: $1111
(1000+1%+100)

Two concurrent updates — a customer adds $100 to the account while a bank employee applies a 1% interest increment — produce inconsistent results at the two replicas. The LA replica ends up with $1110; the NY replica with $1111. The problem is that the updates are concurrent and their ordering is ambiguous without a shared notion of time.

Watch out

The bank example reveals that inconsistency arises not from clock imprecision but from clock absence. Even perfect physical clocks would not solve the problem if the updates arrive at different replicas in different orders — we need to order the updates, not just timestamp them.

The Core Insight [Lamport, 1978]

Synchronisation is possible with no need to be absolute. If two processes do not interact, there is no need for synchronisation — the lack would not be observable. What really matters is not the exact time when events occur, but the order in which events occur. This is precisely the insight behind make and Gradle, which use file modification timestamps to decide which targets to rebuild: they need relative ordering, not absolute time.

14. Lamport Logical Clocks

The Algorithm

Leslie Lamport's 1978 paper “Time, Clocks, and the Ordering of Events in a Distributed System” introduced logical clocks as a mechanism to capture causal order (the “happened-before” relation) without physical synchronisation. Each process p maintains an integer counter Lp, its logical clock, updated as follows:

Event TypeRule
Internal eventLpLp + 1
Send eventLpLp + 1; send message with timestamp Lp
Receive eventLp ← max(Lp, tsmsg) + 1

These three rules ensure the clock condition: if event a happens before event b (ab), then L(a) < L(b). Note that the converse does not hold: L(a) < L(b) does not imply ab; the clocks can produce false positives. This is the price of a lightweight, decentralised mechanism.

Interactive Lamport Clock Simulator

Step through the processes below to see how Lamport clocks maintain causal ordering. Each process can execute internal events, send messages, or receive them. The shared state shows the logical clock values.

Key idea

Lamport clocks give us a consistent ordering: if two events are causally related, their logical timestamps reflect that. But they do not give us a total ordering consistent with causality — concurrent events may be ordered arbitrarily. Vector clocks extend Lamport clocks to capture causality exactly, at the cost of larger timestamps.

15. Beyond Synchronisation: Mutual Exclusion and Election

Ordering Events Is Not Enough

Sometimes more articulated policies are required beyond ordering. For instance, ensuring that concurrent accesses to a shared resource do not corrupt its consistency requires mutual exclusion. Many distributed algorithms address this problem:

Editor's note

The slides do not review mutual exclusion algorithms in detail; the key point is that some are based on a coordinator and all of them are coordination algorithms. This is a conceptual bridge to the next topic.

Election Algorithms

Many distributed algorithms require a coordinator to be elected among the participating processes. Election algorithms (e.g., Bully algorithm, Ring algorithm) select one process as the leader. These are not reviewed in detail here either, but they are coordination algorithms used by other coordination algorithms — a recurring pattern in distributed systems.

Exam tip

Why distinguish synchronisation from coordination? Synchronisation is about when things happen; coordination is about what happens, based on both time and the nature of the actions. Synchronisation alone cannot solve problems where the type of action matters — such as ensuring mutual exclusion or electing a leader.

16. The Problem of Coordination

The final slide section frames the overarching problem that goes beyond time:

It is not merely a matter of time. Synchronisation is about when things happen. Actions are more than sending messages. Interaction does not merely translate into suitably-ordered distributed actions — undifferentiated actions. Actions have a nature, and meaningful interaction within a distributed system typically depends on such a nature.

Coordination [Omicini et al., 2001] is the problem of governing interaction based on both time and the nature of actions, aimed at the achievement of some global objective for the distributed system. It is the union of:

Key idea

Coordination = synchronisation + action semantics. The “nature” of actions matters: a read and a write to the same variable are not interchangeable, and a coordinator cannot be replaced by just any process. Time gives us ordering; coordination gives us meaningful ordering.

17. Conclusion

This lesson has traced the relationship between computing and time from philosophical foundations to practical distributed algorithms:

TopicKey Takeaway
Time & computationComputing abstracts time away, but CPS require time as a semantic property. The traditional notion of a single universal time is physically inaccurate.
Time in distributed systemsNo global clock, no shared memory, no total ordering. Each machine has its own clock with offset, skew, and drift.
Physical time / clockSynchronisation to UTC via NTP is possible but expensive and not always needed. The Berkeley algorithm provides relative time without UTC.
Logical time / clockLamport clocks capture causal ordering with simple integer counters, sufficient for many distributed algorithms at very low cost.
Toward coordinationEvents have a nature: synchronisation is not enough. Coordination governs interaction based on both time and action semantics.
Looking ahead

The next logical step is vector clocks, which extend Lamport clocks to capture causality exactly (not just partially), and causality-based algorithms for distributed debugging, checkpointing, and consistent snapshots.

Check Your Understanding

Explain why the traditional Turing-Church-von Neumann foundations of computing are insufficient for cyber-physical systems.

Those foundations are about the transformation of data in the abstract; they deliberately abstract away from time and physical dynamics. A CPS merges computation with physical processes, where actions occur in physical time with duration. A program that produces the correct result but too late is incorrect in a CPS. Time must be a first-class semantic property, not a non-functional concern.

What are the four misconceptions about time in computing identified by Lee? For each, explain the counter-argument.

(1) “Computing takes time” → No, time is abstracted away, computing omits it. (2) “Time is a resource” → It is unbounded and expended anyway; it needs to be a semantic property. (3) “Time is a nonfunctional property” → In CPS, time defines the function, not just its performance. (4) “Real time is a QoS problem” → It is about predictability and repeatability, not just efficiency.

Derive the NTP offset formula and explain the symmetric-delay assumption. What happens if the assumption is violated?

Given timestamps T1 (request sent), T2 (request received), T3 (response sent), T4 (response received): if forward and reverse delays are equal (= δ/2 where δ is the round-trip delay), then offset θ = T2 − (T1 + δ/2) = ((T2 − T1) + (T3 − T4))/2. If the assumption is violated (asymmetric delays), the computed offset includes a systematic error up to half the difference between forward and reverse delays.

What is the difference between clock offset, skew, and drift? Give a real-world analogy.

Offset is the instantaneous difference between two clock readings (like the distance between two runners at a specific moment). Skew is the rate of change of offset (the difference in their speeds). Drift is the change in skew over time (their acceleration or deceleration). A quartz clock might have offset of 5 ms (it reads 5 ms ahead), skew of 10 ppm (it gains 10 μs per second), and drift that changes with temperature.

Why does a Lamport logical clock guarantee that if a → b then L(a) < L(b), but not the converse?

The clock condition L(a) < L(b) whenever a → b is guaranteed by the increment-on-event rule combined with the max-plus-one rule on receive. But the converse can fail because concurrent events can be assigned arbitrary logical times that happen to satisfy L(a) < L(b) even though neither happened before the other. The logical clock adds enough ordering information to enforce the forward direction of causality but not enough to identify concurrency precisely.

Give a concrete example where physical clock synchronisation is essential, and one where logical clocks suffice.

Essential: high-frequency trading — regulators need to know the exact UTC time of each trade to detect fraud, and millisecond differences can determine which order was placed first. Logical clocks suffice: the bank database example — what matters is the order of updates (which came first), not their absolute time. A Lamport clock can ensure all replicas apply updates in the same order, even without knowing the actual wall-clock time.

Compare the Berkeley algorithm with NTP. When would you use each?

NTP provides global absolute time synchronised to UTC, using a hierarchical server architecture. Berkeley provides global relative time among a group of machines without requiring any UTC reference. Use NTP when external time correlation matters (logging, legal compliance, coordination with external systems). Use Berkeley when only internal consistency is needed (sensor networks, cluster-internal coordination) or when UTC access is unavailable.

Explain the distinction between synchronisation, mutual exclusion, and coordination.

Synchronisation ensures that events occur in a particular order (ordering messages by logical clock). Mutual exclusion ensures that conflicting actions (e.g., two writes to the same variable) do not overlap. Coordination is the general problem of governing interaction based on both time and the nature of actions, aimed at a global system objective. Mutual exclusion is a form of coordination; synchronisation is a tool for coordination.

How does the concept of physical time after relativity (Rovelli) relate to the practical problems of time in distributed systems?

Relativity tells us there is no universal “now” — different observers in different frames of reference experience time differently. Distributed systems face an analogous problem: processes on different machines have different clocks, no shared “present,” and no way to define simultaneity absolutely. The distributed systems problem is not just an engineering inconvenience; it reflects a fundamental property of the universe we inhabit.

What is the maximum skew rate ρ and why is it significant for clock synchronisation algorithms?

It bounds the clock frequency deviation from perfect: 1−ρ ≤ dC/dt ≤ 1+ρ. It gives a worst-case bound on how fast clocks can drift apart. Synchronisation algorithms use ρ to determine how frequently they must re-synchronise: if the required precision is ε and the worst-case relative skew between two clocks is 2ρ, then re-synchronisation must occur at intervals of at most ε/(2ρ).