Distributed Systems — Prof. Omicini

Why Distributed Systems?

M0 · Introductory Module · Andrea Omicini, DISI, Univ. Bologna · A.Y. 2025/2026

1. Pervasive Computing

Computational systems have become pervasive. They are everywhere — in our pockets, in our homes, in our cars, in hospitals, in airports, in schools, in public spaces — and they tend to affect every aspect of our everyday life and activity.

This ubiquity is not incidental: computational systems are now at the core of most (if not all) artificial systems. Any principled discipline for modelling or engineering computational systems therefore affects the modelling and engineering of almost every sort of artificial system. This is why distributed systems matter — because the world they model and operate in is inherently distributed.

Key idea

Pervasiveness means distribution is no longer optional. Every modern computational system must contend with the fact that its components exist across space and time.

2. Pervasive Computation

We live immersed in a sort of ever-expanding computational bubble. A huge number of computations are performed at every instant around us: at home (smart thermostats, voice assistants), in our cars (engine control units, navigation), in workplaces (servers, workstations), in hospitals (patient monitoring, imaging), in airports and train stations (scheduling, security), in schools and universities (learning platforms, research compute), and in public spaces (traffic lights, surveillance).

These computations are distributed and concurrent. Some are controlled or triggered by human action; others are autonomous, running without direct human intervention. The sheer number of simultaneous computations happening at any instant means that no single locus of control can oversee them all.

Note

The computational bubble grows every year with IoT, edge computing, and mobile devices. Each new device adds more concurrent, distributed computation to the ecosystem.

3. Pervasive Interaction

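Almost any computational system today comes equipped with information and communication technologies (ICT) for interacting with other computational systems. The result is a dense web of interaction, in which computational devices continuously interact with:

  1. humans (via screens, voice, touch);
  2. other computational systems (via networks, protocols, APIs);
  3. the physical environment and its resources (via sensors, actuators, cameras).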

This triple-layered interaction — human, machine, environment — is the defining characteristic of modern computing. It is also the source of the hardest problems in distributed systems: how do you coordinate, synchronize, and reach agreement when no participant has a complete view?

Interaction with humans introduces subjective timing, unpredictable input patterns, and semantic ambiguity. A human may pause for seconds or minutes between actions, and the system must remain responsive throughout.

Machine-to-machine interaction is governed by protocols, but networks introduce delays, reorder messages, and can drop packets. Two machines may have radically different views of the same interaction.
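
To make the effect of delays and losses concrete, here is a minimal sketch in Python. The function `deliver` and the trace of three messages are hypothetical, chosen only for illustration: each message carries a send time and a network delay, and a delay of `None` models a dropped packet.

```python
def deliver(messages):
    """Return payloads in the order the receiver sees them.

    Each message is (send_time, delay, payload); delay=None models a lost packet.
    """
    arrivals = [
        (send_time + delay, payload)
        for send_time, delay, payload in messages
        if delay is not None          # dropped packets never arrive
    ]
    return [payload for _, payload in sorted(arrivals)]


# Hypothetical trace: the sender emits m1, m2, m3 in that order.
sent = [(0, 50, "m1"), (1, 5, "m2"), (2, None, "m3")]
received = deliver(sent)
print(received)  # ['m2', 'm1']
```

The sender believes it sent m1, m2, m3. The receiver sees m2 followed by m1 and never sees m3, so the two machines hold different views of the same interaction.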

Sensors and actuators bridge the digital and physical worlds. Physical processes have their own timing (a temperature doesn't change instantly), and the environment is inherently unpredictable — noisy, lossy, and unbounded in its behaviour.

4. Physical vs. Computational

The physical nature of artificial systems adds complexity to computational components and systems. This complexity manifests in two fundamental ways:

Distribution

Physical components are spread across space and operate across time. A sensor on a bridge, a server in a data centre, and a phone in a user's pocket are spatially distributed. Their events are temporally distributed — they happen at different moments, with no single clock governing them all.

Unpredictability of the environment

Unlike a controlled laboratory setting, the real world is noisy, unbounded, and unreliable. A distributed system must function correctly despite network partitions, hardware failures, variable latencies, and malicious actors. The environment cannot be assumed friendly or predictable.

Challenge

The same physical distribution that gives a system its power (reach, scale, resilience) also introduces complexity that a centralized system never faces. Every advantage has a corresponding difficulty.

5. Spatial Distribution

What exactly is spatially distributed when we talk about distributed systems? The answer covers four dimensions:

| Dimension | Description |
|---|---|
| 1. Computational units | Processors, cores, nodes, and machines that execute code are located in different physical (or virtual) locations |
| 2. Communication channels | The links, networks, and buses that connect units have physical extent: distance means latency |
| 3. Data / information / knowledge | Data is stored across multiple nodes, often replicated or partitioned. Its representations may differ per node |
| 4. Sensors, actuators, and the system boundary | The boundary between the system and its environment is spatially sparse: sensing and actuation points are scattered |

When every dimension of a system is spread across physical space, the notion of a single "system location" loses meaning. There is no here.

6. Temporal Distribution

What is temporally distributed? Events.

In a centralized system, events happen in a clear, well-defined sequence. The single processor, the single clock, the single memory — they impose a total order: for any two events A and B, either A happened before B or B happened before A (or they are simultaneous by the same clock).

In a distributed system, things happen, yet no longer in a clearly ordered sequence. Events are scattered across nodes, and the trivial before/after temporal relation simply cannot be applied to many pairs of events. Two events on different nodes may be concurrent — neither caused the other, and their order is not meaningful.

This loss of total order is the defining technical challenge of distributed systems. It forces us to talk about partial orders, logical clocks, and causality instead of simple timestamps.
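
As a preview of what a logical clock looks like, the following is a minimal sketch of a Lamport-style clock in Python. The class and method names are assumptions made for this illustration, not an API from the course material. Each node keeps a counter, increments it on every local event, and on receiving a message jumps past the timestamp carried by that message.

```python
class LamportClock:
    """Minimal Lamport-style logical clock (illustrative sketch)."""

    def __init__(self):
        self.time = 0

    def tick(self):
        """Record a local event and return its timestamp."""
        self.time += 1
        return self.time

    def send(self):
        """Sending is an event; the returned timestamp travels with the message."""
        return self.tick()

    def receive(self, msg_time):
        """Move past both the local time and the sender's timestamp."""
        self.time = max(self.time, msg_time) + 1
        return self.time


a, b = LamportClock(), LamportClock()
e1 = a.tick()        # local event on node A
e2 = a.send()        # A sends a message to B
e3 = b.tick()        # unrelated local event on node B
e4 = b.receive(e2)   # B receives A's message
print(e1, e2, e3, e4)  # 1 2 1 3
```

The send (timestamp 2) is ordered before the receive (timestamp 3), as causality requires. Events e1 and e3 both carry timestamp 1 although neither caused the other: these timestamps respect causality, but on their own they do not tell us whether two events are concurrent.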

The examiner will ask

Why can't we use physical clocks to order events in a distributed system? Because clocks drift, synchronisation is imperfect, and network delays mean you can never know the "true" order of events on different nodes. This is the motivation for logical clocks (Lamport, vector) that we will study later.
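
A small numerical sketch shows how imperfect synchronisation can invert the apparent order of two causally related events. All numbers below are hypothetical: node B's clock is assumed to run 30 ms behind node A's, and the network delay is assumed to be 10 ms.

```python
def local_timestamp(true_time_ms, offset_ms):
    """Reading of a node's clock, modelled as true time plus a fixed offset."""
    return true_time_ms + offset_ms


# Assumed values, for illustration only.
OFFSET_A_MS = 0        # node A's clock is taken as the reference
OFFSET_B_MS = -30      # node B's clock lags 30 ms behind
NETWORK_DELAY_MS = 10

send_true = 1000                            # A sends at true time 1000 ms
recv_true = send_true + NETWORK_DELAY_MS    # B receives 10 ms later

send_stamp = local_timestamp(send_true, OFFSET_A_MS)   # stamped by A
recv_stamp = local_timestamp(recv_true, OFFSET_B_MS)   # stamped by B
print(send_stamp, recv_stamp)  # 1000 980
```

According to the physical timestamps, the message was received 20 ms before it was sent. Whenever the clock offset exceeds the network delay, sorting events by local timestamps can contradict causality.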

7. Broken Assumptions

When the spatio-temporal unity of a system is lost, a number of assumptions that held for centralized systems no longer apply:

  1. System events no longer constitute a totally ordered set. In general, only a partial order is meaningful. Two events on different nodes may be incomparable (concurrent).
  2. Admissible interactions no longer depend on compresence. Components do not need to be in the same place, at the same time, or even on the same network topology to interact. They communicate across space and time through messages, queues, and storage.
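
The first point can be made concrete with vector timestamps, one common way to represent a partial order of events. The sketch below is illustrative only; the function names and the two-node scenario are assumptions. Each event carries one counter per node, and an event precedes another only if its vector is smaller or equal in every component.

```python
def happened_before(u, v):
    """True if the event stamped u causally precedes the event stamped v."""
    return u != v and all(x <= y for x, y in zip(u, v))


def concurrent(u, v):
    """True if neither event precedes the other (they are incomparable)."""
    return u != v and not happened_before(u, v) and not happened_before(v, u)


def merge(local, received, node):
    """Timestamp of a receive event on `node`: component-wise max, then own tick."""
    merged = [max(x, y) for x, y in zip(local, received)]
    merged[node] += 1
    return tuple(merged)


# Hypothetical run with two nodes (index 0 and index 1).
a1 = (1, 0)              # local event on node 0
a2 = (2, 0)              # node 0 sends a message
b1 = (0, 1)              # local event on node 1
b2 = merge(b1, a2, 1)    # node 1 receives the message
print(b2)                      # (2, 2)
print(happened_before(a2, b2))  # True
print(concurrent(a1, b1))       # True
```

Events a1 and b1 are incomparable, so any attempt to say which one came first is arbitrary. Events a2 and b2 are ordered because a message links them.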

These broken assumptions cascade into every aspect of system design: agreement, consistency, replication, fault tolerance, security, and more. You cannot design a distributed system the way you design a single-threaded program.

8. Centralized vs. Distributed

Distributed systems and centralized systems represent fundamentally different design points. The trade-offs between them are summarised in the classic comparison from [Puder et al., 2005]:

| Criteria | Centralized System | Distributed System |
|---|---|---|
| Economics | Low cost | High cost |
| Availability | Low | High |
| Complexity | Low | High |
| Consistency | Simple | Difficult |
| Scalability | Poor | Good |
| Technology | Homogeneous | Heterogeneous |
| Security | High | Low |

There is no free lunch: every advantage of a distributed system (availability, scalability) comes at the cost of something else (complexity, consistency, security). Understanding when to centralise and when to distribute is a core engineering judgment.

Editor’s note

The tension between consistency and availability in distributed systems is famously captured by the CAP theorem (Brewer, 2000): when a network partition occurs, a system cannot guarantee both consistency and availability, and must give up one of them. This trade-off is a direct consequence of the distributed nature of the system.
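
The choice can be illustrated with a toy model in Python. This is a deliberately simplified sketch, not a real replication protocol: the class `TwoReplicaRegister` and its `mode` parameter are hypothetical. A single value is replicated on two nodes; while the nodes are partitioned, a "CP" register refuses writes, whereas an "AP" register accepts them on one side only.

```python
class TwoReplicaRegister:
    """Toy register replicated on two nodes (illustrative model only)."""

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.values = [None, None]   # one copy per node
        self.partitioned = False

    def write(self, node, value):
        """Return True if the write was accepted."""
        if not self.partitioned:
            self.values = [value, value]   # both copies updated together
            return True
        if self.mode == "CP":
            return False                   # stay consistent, give up availability
        self.values[node] = value          # stay available, copies may diverge
        return True

    def read(self, node):
        return self.values[node]


cp, ap = TwoReplicaRegister("CP"), TwoReplicaRegister("AP")
for register in (cp, ap):
    register.write(0, "x=1")
    register.partitioned = True      # the network splits the two nodes

cp_ok = cp.write(0, "x=2")
ap_ok = ap.write(0, "x=2")
print(cp_ok, cp.read(0), cp.read(1))  # False x=1 x=1
print(ap_ok, ap.read(0), ap.read(1))  # True x=2 x=1
```

The CP register rejects the request but both copies still agree. The AP register answers every request, at the price of two nodes returning different values.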

9. Why Distributed Systems

Despite the costs, we need distributed systems for four fundamental reasons [Ghosh, 2014]:

  1. Geographically distributed environments
    Data and users are spread across the planet. A centralized system would incur unacceptable latency and become a single point of failure. Distributed systems bring computation to the data.
  2. Computation speedup
    By harnessing many nodes in parallel, distributed systems can solve problems faster than any single machine. This includes everything from scientific computing to rendering, analytics, and machine learning.
  3. Resource sharing
    Storage, compute, data, and specialised hardware (GPUs, TPUs, sensors) can be shared across a network. This reduces cost and enables collaboration.
  4. Fault tolerance
    Redundancy across independent nodes means the system can survive partial failures. If one node crashes, another takes over. Centralized systems have a single point of failure.

Key insight

These four drivers are not isolated. A system built for geographical distribution often gets fault tolerance as a side effect. A system built for speedup often forces resource sharing. The reasons reinforce each other.
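
The fault-tolerance driver ("if one node crashes, another takes over") can be sketched in a few lines of Python. The names `make_replica`, `call_with_failover`, and `NodeDown`, as well as the node names and the request string, are hypothetical and exist only for this illustration. Replicas are modelled as plain functions rather than real network services.

```python
class NodeDown(Exception):
    """Raised when a replica cannot serve a request."""


def make_replica(name, alive):
    """Build a stand-in for a remote node that either answers or fails."""
    def handle(request):
        if not alive:
            raise NodeDown(name)
        return f"{name} served {request}"
    return handle


def call_with_failover(replicas, request):
    """Try each replica in turn and return the first successful reply."""
    for replica in replicas:
        try:
            return replica(request)
        except NodeDown:
            continue                 # this node failed, try the next one
    raise NodeDown("all replicas failed")


replicas = [
    make_replica("node-1", alive=False),   # crashed node
    make_replica("node-2", alive=True),    # healthy node
]
reply = call_with_failover(replicas, "GET /status")
print(reply)  # node-2 served GET /status
```

The client still gets an answer although node-1 is down. With a single node there would be nothing to fall back on, which is what "single point of failure" means in practice.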

10. Artificial Systems Are Distributed

Conceiving and constructing artificial systems nowadays means dealing with distributed systems. At their core are (distributed) computational systems, which have to be modelled and built. This is not a niche specialisation: it is the default condition of modern engineering.

Consider any large-scale artificial system: a smart grid, an e-commerce platform, a social network, a fleet of autonomous vehicles, a hospital management system. All of them are distributed. None of them can be understood or built using only the tools of centralised computing.

11. New Theoretical Problems

Modelling distributed (artificial) systems raises new theoretical problems that simply do not arise in centralised systems. These problems require new theoretical frameworks, models, abstractions, and techniques, mostly computational ones. This is one of the main objects of study of computer science.

12. New Practical Problems

Building distributed (artificial) systems raises new practical problems. These problems require new technologies, infrastructures, methods, and methodologies, mostly computational ones. This is one of the main objects of study of computer engineering.

13. Computer Science Meets Engineering

Distributed systems sit at the intersection of computer science (theoretical foundations: models, algorithms, impossibility results) and computer engineering (practical construction: protocols, frameworks, operational tools). As a result, this course will mix theoretical and methodological issues with technological and practical ones, starting from the very beginning.

This duality is not a weakness — it is the essence of the field. A distributed system without theoretical grounding is fragile; a theory without practical implementation is sterile. The course aims to give you both.

The examiner will ask

Give an example of a theoretical problem in distributed systems and the corresponding practical challenge it creates. For instance: the impossibility of total event ordering (theoretical) forces us to build eventual consistency mechanisms (practical).

Check Your Understanding

Explain what "pervasiveness" means in the context of distributed systems and why it implies distribution.

Pervasiveness means that computational systems are everywhere and affect every aspect of modern life. Since these systems are physically spread across different locations (homes, cars, hospitals, public spaces), they are inherently distributed. No single machine or location can contain all the computation happening around us at any instant.

What are the three types of interaction that computational devices engage in continuously?

1. Interaction with humans (via screens, voice, touch). 2. Interaction with other computational systems (via networks, protocols, APIs). 3. Interaction with the physical environment and its resources (via sensors, actuators, cameras).

Describe the difference between spatial and temporal distribution. Give an example of each.

Spatial distribution refers to computational units, communication channels, data, and sensors being located across different physical locations. Example: sensors on a bridge sending data to a server in another city. Temporal distribution refers to events happening at different times with no shared clock to order them. Example: two sensors on different continents detect an event — we cannot determine which happened first without additional synchronisation.

Why does the trivial "before/after" relation not apply to many pairs of events in a distributed system?

Because there is no global clock. Two events on different nodes may be concurrent: neither A caused B nor B caused A. Their ordering is arbitrary without additional coordination mechanisms like logical clocks. This is fundamentally different from a single-threaded program where all events are totally ordered by the processor clock.

List the criteria on which distributed systems outperform centralized systems, and those on which centralized systems have the advantage.

According to the comparison from [Puder et al., 2005], distributed systems have the advantage on two criteria: availability (high) and scalability (good). Centralized systems have the advantage on the other five: economics (low cost), complexity (low), consistency (simple), technology (homogeneous), and security (high).

Explain the four fundamental reasons we need distributed systems according to [Ghosh, 2014].

1. Geographically distributed environments: users and data are spread globally, so computation must be too. 2. Computation speedup: parallel execution across many nodes delivers performance beyond a single machine. 3. Resource sharing: hardware, software, and data can be pooled and accessed remotely. 4. Fault tolerance: redundancy across independent nodes prevents single points of failure.

How do the theoretical problems of distributed systems differ from the practical ones? Give one example of each.

Theoretical problems concern models, abstractions, and proofs: e.g., how to define causality without a global clock (leads to logical clock theory). Practical problems concern construction and operation: e.g., how to deploy updates across hundreds of nodes without downtime (leads to rolling update strategies and orchestration tools). Both are needed; theory without practice is sterile, practice without theory is fragile.

What does it mean that "the spatio-temporal unity of systems is lost"? Why is this significant?

It means there is no longer a single system location nor a single system time. System components are only partially correlated, both temporally and spatially. This is significant because it invalidates the core assumptions of centralised computing: total event ordering, compresence-based interaction, global state visibility, and predictable timing. Every aspect of system design must be rethought.

Explain why security is considered "high" for centralized systems but "low" for distributed systems in the comparison table.

A centralized system has a single, well-defined attack surface — one machine, one location, one network entry point. Defending a single point is simpler. A distributed system has many nodes, many communication channels, many entry points — the attack surface is multiplied. Additionally, data in transit across a network is exposed to interception and tampering, and nodes may be compromised without immediate detection.

What does it mean to say that distributed systems sit at the intersection of computer science and computer engineering?

Computer science contributes the theoretical foundations: models of computation, algorithms, impossibility results (FLP, CAP), correctness proofs, and complexity analysis. Computer engineering contributes the practical tools: protocols, middleware, frameworks, operational practices, and infrastructure. A successful distributed system requires both rigorous theory and robust engineering.