Computational systems have become pervasive. They are everywhere — in our pockets, in our homes, in our cars, in hospitals, in airports, in schools, in public spaces — and they tend to affect every aspect of our everyday life and activity.
This ubiquity is not incidental: computational systems are now at the core of most (if not all) artificial systems. Any principled discipline for modelling or engineering computational systems therefore affects the modelling and engineering of almost every sort of artificial system. This is why distributed systems matter — because the world they model and operate in is inherently distributed.
Pervasiveness means distribution is no longer optional. Every modern computational system must contend with the fact that its components exist across space and time.
We live immersed in a sort of ever-expanding computational bubble. A huge number of computations are performed at every instant around us: at home (smart thermostats, voice assistants), in our cars (engine control units, navigation), in workplaces (servers, workstations), in hospitals (patient monitoring, imaging), in airports and train stations (scheduling, security), in schools and universities (learning platforms, research compute), and in public spaces (traffic lights, surveillance).
These computations are distributed and concurrent. Some are controlled or triggered by human action; others are autonomous, running without direct human intervention. The sheer number of simultaneous computations happening at any instant means that no single locus of control can oversee them all.
The computational bubble grows every year with IoT, edge computing, and mobile devices. Each new device adds more concurrent, distributed computation to the ecosystem.
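Almost any computational system today comes equipped with ICT (Information and Communication Technology) for interacting with other computational systems. The result is a dense web of interaction:

1. Interaction with humans (via screens, voice, touch).
2. Interaction with other computational systems (via networks, protocols, APIs).
3. Interaction with the physical environment and its resources (via sensors, actuators, cameras).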
This triple-layered interaction — human, machine, environment — is the defining characteristic of modern computing. It is also the source of the hardest problems in distributed systems: how do you coordinate, synchronize, and reach agreement when no participant has a complete view?
Interaction with humans introduces subjective timing, unpredictable input patterns, and semantic ambiguity. A human may pause for seconds or minutes between actions, and the system must remain responsive throughout.
Machine-to-machine interaction is governed by protocols, but networks introduce delays, reorder messages, and can drop packets. Two machines may have radically different views of the same interaction.
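To make the reordering concrete, here is a toy model (an illustrative sketch, not a network simulator). Each message carries a send time and a per-message network delay, both hypothetical numbers. The receiver sees messages in order of arrival, not in order of sending.

```python
def arrival_order(messages):
    """Return payloads in the order the receiver sees them.

    Each message is (send_time_ms, delay_ms, payload); arrival = send + delay.
    """
    return [payload for _, payload in sorted((s + d, payload) for s, d, payload in messages)]


# Node A sends m1, m2, m3 in that order; the delays are made up for illustration.
sent = [(0, 50, "m1"), (10, 5, "m2"), (20, 40, "m3")]
print(arrival_order(sent))  # ['m2', 'm1', 'm3']
```

Real networks also lose and duplicate messages; this sketch models only variable delay.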
Sensors and actuators bridge the digital and physical worlds. Physical processes have their own timing (a temperature doesn't change instantly), and the environment is inherently unpredictable — noisy, lossy, and unbounded in its behaviour.
The physical nature of artificial systems adds complexity to computational components and systems. This complexity manifests in two fundamental ways:
Physical components are spread across space and operate across time. A sensor on a bridge, a server in a data centre, and a phone in a user's pocket are spatially distributed. Their events are temporally distributed — they happen at different moments, with no single clock governing them all.
Unlike a controlled laboratory setting, the real world is noisy, unbounded, and unreliable. A distributed system must function correctly despite network partitions, hardware failures, variable latencies, and malicious actors. The environment cannot be assumed friendly or predictable.
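One practical consequence of an unreliable environment can be sketched with a timeout-based failure detector. The class name `HeartbeatDetector` and its `timeout` parameter are hypothetical. The point is that a node which stops sending heartbeats may have crashed, or may simply be slow or cut off by a partition, and a timeout alone cannot tell these cases apart.

```python
from dataclasses import dataclass, field


@dataclass
class HeartbeatDetector:
    """Suspects a node if no heartbeat arrived within `timeout` time units.

    Suspicion can be wrong: a slow node or a delayed message looks like a crash.
    """
    timeout: float
    last_seen: dict = field(default_factory=dict)

    def heartbeat(self, node: str, now: float) -> None:
        self.last_seen[node] = now

    def suspected(self, now: float) -> set:
        return {n for n, t in self.last_seen.items() if now - t > self.timeout}


detector = HeartbeatDetector(timeout=3.0)
detector.heartbeat("a", now=0.0)
detector.heartbeat("b", now=0.0)
detector.heartbeat("a", now=4.0)    # "b" stays silent: crashed, or just slow?
print(detector.suspected(now=5.0))  # {'b'}
```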
The same physical distribution that gives a system its power (reach, scale, resilience) also introduces complexity that a centralized system never faces. Every advantage has a corresponding difficulty.
What exactly is spatially distributed when we talk about distributed systems? The answer covers four dimensions:
| Dimension | Description |
|---|---|
| 1. Computational units | Processors, cores, nodes, and machines that execute code are located in different physical (or virtual) locations |
| 2. Communication channels | The links, networks, and buses that connect units have physical extent — distance means latency |
| 3. Data / information / knowledge | Data is stored across multiple nodes, often replicated or partitioned. Its representations may differ per node |
| 4. Sensors, actuators, and the system boundary | The boundary between the system and its environment is spatially sparse — sensing and actuation points are scattered |
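The second row of the table ("distance means latency") can be quantified with a back-of-the-envelope calculation. This sketch assumes a signal speed of about 2e8 m/s in optical fibre (roughly two thirds of the speed of light in vacuum) and ignores routing, queuing, and processing delays, so it yields only a lower bound.

```python
SPEED_IN_FIBRE_M_PER_S = 2.0e8  # assumption: about two thirds of c in optical fibre


def min_rtt_ms(distance_km: float) -> float:
    """Lower bound on round-trip time, from propagation delay alone."""
    one_way_s = (distance_km * 1000) / SPEED_IN_FIBRE_M_PER_S
    return 2 * one_way_s * 1000


print(min_rtt_ms(10_000))  # about 100 ms
```

For two nodes 10,000 km apart, no protocol can complete a request and a reply in less than about 100 ms, however fast the machines themselves are.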
When every dimension of a system is spread across physical space, the notion of a single "system location" loses meaning. There is no here.
What is temporally distributed? Events.
In a centralized system, events happen in a clear, well-defined sequence. The single processor, the single clock, the single memory — they impose a total order: for any two events A and B, either A happened before B or B happened before A (or they are simultaneous by the same clock).
In a distributed system, things happen, yet no longer in a clearly ordered sequence. Events are scattered across nodes, and the trivial before/after temporal relation simply cannot be applied to many pairs of events. Two events on different nodes may be concurrent — neither caused the other, and their order is not meaningful.
This loss of total order is the defining technical challenge of distributed systems. It forces us to talk about partial orders, logical clocks, and causality instead of simple timestamps.
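The partial order can be made explicit with vector clocks, which are covered properly later. In this preview sketch, each event is assumed to already carry a vector timestamp (one counter per node, for a fixed set of three nodes); `happened_before` and `concurrent` are illustrative names.

```python
def happened_before(v1: tuple, v2: tuple) -> bool:
    """True if the event stamped v1 causally precedes the event stamped v2."""
    return all(x <= y for x, y in zip(v1, v2)) and v1 != v2


def concurrent(v1: tuple, v2: tuple) -> bool:
    """Neither event precedes the other, so their order is not meaningful."""
    return not happened_before(v1, v2) and not happened_before(v2, v1)


# Vector timestamps over three nodes (P0, P1, P2).
e1 = (1, 0, 0)  # local event on P0
e2 = (2, 0, 0)  # P0 sends a message to P1
e3 = (2, 1, 0)  # P1 receives it: element-wise max with the message, then own counter + 1
e4 = (0, 0, 1)  # unrelated local event on P2

print(happened_before(e1, e3))  # True
print(concurrent(e3, e4))       # True
```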
Why can't we use physical clocks to order events in a distributed system? Because clocks drift, synchronisation is imperfect, and network delays mean you can never know the "true" order of events on different nodes. This is the motivation for logical clocks (Lamport, vector) that we will study later.
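The sketch below contrasts the two approaches under hypothetical numbers: node B's physical clock is assumed to run 40 ms behind node A's, which makes a receive event appear to precede its own send. A Lamport clock (a single counter per process, advanced on every event and pushed past the sender's value on every receive) cannot make that mistake: whenever a send causally precedes a receive, the send's timestamp is smaller.

```python
class LamportClock:
    """Lamport logical clock: one integer counter per process."""

    def __init__(self) -> None:
        self.time = 0

    def tick(self) -> int:
        """Local or send event: advance the counter and return the new timestamp."""
        self.time += 1
        return self.time

    def receive(self, msg_time: int) -> int:
        """Receive event: jump past the sender's timestamp, then count this event."""
        self.time = max(self.time, msg_time) + 1
        return self.time


# Hypothetical skew: B's physical clock runs 40 ms behind A's.
A_OFFSET_MS, B_OFFSET_MS = 0, -40
true_send_ms, true_recv_ms = 1000, 1010  # real time: the receive happens 10 ms later
phys_send = true_send_ms + A_OFFSET_MS   # 1000 as read on A
phys_recv = true_recv_ms + B_OFFSET_MS   # 970 as read on B, so it *looks* earlier

a, b = LamportClock(), LamportClock()
for _ in range(4):
    a.tick()                             # four local events on A
b.tick()
b.tick()                                 # two local events on B
ts_send = a.tick()                       # A sends with timestamp 5
ts_recv = b.receive(ts_send)             # B: max(2, 5) + 1 = 6

print(phys_send, phys_recv)              # 1000 970
print(ts_send, ts_recv)                  # 5 6
```

The converse does not hold: a smaller Lamport timestamp does not imply a causal relation, which is one reason vector clocks are also needed.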
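When the spatio-temporal unity of a system is lost, there is no longer a single system location or a single system time, and components are only partially correlated in space and time. A number of assumptions that held for centralized systems then no longer apply:

1. Total event ordering
2. Compresence-based interaction
3. Global state visibility
4. Predictable timing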
These broken assumptions cascade into every aspect of system design: agreement, consistency, replication, fault tolerance, security, and more. You cannot design a distributed system the way you design a single-threaded program.
Distributed systems and centralized systems represent fundamentally different design points. The trade-offs between them are summarised in the classic comparison from [Puder et al., 2005]:
| Criteria | Centralized System | Distributed System |
|---|---|---|
| Economics | Low cost | High cost |
| Availability | Low | High |
| Complexity | Low | High |
| Consistency | Simple | Difficult |
| Scalability | Poor | Good |
| Technology | Homogeneous | Heterogeneous |
| Security | High | Low |
There is no free lunch: every advantage of a distributed system (availability, scalability) comes at the cost of something else (complexity, consistency, security). Understanding when to centralise and when to distribute is a core engineering judgment.
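The availability row can be illustrated with a simple probability calculation. The sketch assumes that replicas fail independently, that each is up with the same probability `p_up`, and that the service counts as available if at least one replica is up. Correlated failures (shared power, shared network) make real figures worse.

```python
def availability_any_of(p_up: float, replicas: int) -> float:
    """Probability that at least one of `replicas` independent nodes is up."""
    return 1 - (1 - p_up) ** replicas


print(availability_any_of(0.99, 1))  # about 0.99
print(availability_any_of(0.99, 3))  # about 0.999999
```

With `p_up = 0.99`, one node gives 99% availability while three give about 99.9999%. The price is three times the hardware plus the work of keeping the replicas consistent, which is exactly the cost side of the table.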
The tension between consistency and availability in distributed systems is famously captured by the CAP theorem (Brewer, 2000): when a network partition occurs, a replicated system cannot guarantee both consistency and availability, so it must sacrifice one of them. This trade-off is a direct consequence of the distributed nature of the system.
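A deliberately tiny model of the CAP trade-off: two replicas of one register, with a flag that simulates a network partition. In `"CP"` mode a replica refuses writes it cannot replicate; in `"AP"` mode it accepts them locally and the replicas diverge. All names are hypothetical, and real systems typically rely on majority quorums rather than this two-node, all-or-nothing scheme.

```python
class Replica:
    """One copy of a single-value register in a two-replica toy system."""

    def __init__(self, mode: str):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.value = None
        self.peer = None
        self.partitioned = False  # True while this replica cannot reach its peer

    def write(self, value) -> None:
        if not self.partitioned:
            self.value = value
            self.peer.value = value  # synchronous replication while connected
        elif self.mode == "CP":
            # Give up availability: refuse rather than let the replicas diverge.
            raise RuntimeError("write rejected: peer unreachable")
        else:
            # Give up consistency: accept locally; the peer keeps its old value.
            self.value = value

    def read(self):
        return self.value  # always answers from the local copy


def make_pair(mode: str):
    a, b = Replica(mode), Replica(mode)
    a.peer, b.peer = b, a
    return a, b


def partition(a: Replica, b: Replica) -> None:
    a.partitioned = b.partitioned = True


# AP: both sides stay available, but they now disagree.
ap_a, ap_b = make_pair("AP")
ap_a.write("x=1")
partition(ap_a, ap_b)
ap_a.write("x=2")
print(ap_a.read(), ap_b.read())  # x=2 x=1

# CP: the replicas stay identical, but the write is refused.
cp_a, cp_b = make_pair("CP")
cp_a.write("x=1")
partition(cp_a, cp_b)
try:
    cp_a.write("x=2")
except RuntimeError as err:
    print(err)                   # write rejected: peer unreachable
print(cp_a.read(), cp_b.read())  # x=1 x=1
```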
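Despite the costs, we need distributed systems for four fundamental reasons [Ghosh, 2014]:

1. Geographically distributed environments: users and data are spread across the globe, so computation must be too.
2. Computation speedup: parallel execution across many nodes delivers performance beyond that of a single machine.
3. Resource sharing: hardware, software, and data can be pooled and accessed remotely.
4. Fault tolerance: redundancy across independent nodes removes single points of failure.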
These four drivers are not isolated. A system built for geographical distribution often gets fault tolerance as a side effect. A system built for speedup often forces resource sharing. The reasons reinforce each other.
Conceiving and constructing artificial systems nowadays means dealing with distributed systems whose core is represented by (distributed) computational systems which are to be modelled and built. This is not a niche specialisation — it is the default condition of modern engineering.
Consider any large-scale artificial system: a smart grid, an e-commerce platform, a social network, a fleet of autonomous vehicles, a hospital management system. All of them are distributed. None of them can be understood or built using only the tools of centralised computing.
Modelling distributed (artificial) systems involves new theoretical problems that simply do not arise in centralised systems, such as how to define causality, and hence the order of events, without a global clock.
These questions require new theoretical frameworks, models, abstractions, and techniques — mostly computational ones. This is one of the main objects of study of computer science.
Building distributed (artificial) systems involves new practical problems, such as how to deploy updates across hundreds of nodes without downtime.
These problems require new technologies, infrastructures, methods, and methodologies — mostly computational ones. This is one of the main objects of study of computer engineering.
Distributed systems sit at the intersection of computer science (theoretical foundations: models, algorithms, impossibility results) and computer engineering (practical construction: protocols, frameworks, operational tools). As a result, this course will mix theoretical and methodological issues with technological and practical ones, starting from the very beginning.
This duality is not a weakness — it is the essence of the field. A distributed system without theoretical grounding is fragile; a theory without practical implementation is sterile. The course aims to give you both.
Give an example of a theoretical problem in distributed systems and the corresponding practical challenge it creates. For instance: the absence of a global clock, and hence of a natural total order of events (theoretical), forces us to build eventual consistency mechanisms (practical).
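As one concrete instance of an eventual consistency mechanism, here is a grow-only counter (G-Counter), one of the simplest conflict-free replicated data types (CRDTs). This is a minimal sketch and not necessarily the mechanism the course will adopt: each replica increments only its own slot, and merging takes the element-wise maximum, so replicas that have exchanged state agree on the value regardless of the order in which merges happen.

```python
class GCounter:
    """Grow-only counter CRDT: one slot per replica, merged by element-wise max."""

    def __init__(self, replica_id: str):
        self.replica_id = replica_id
        self.counts = {}  # replica id -> number of increments seen from that replica

    def increment(self, n: int = 1) -> None:
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + n

    def merge(self, other: "GCounter") -> None:
        for rid, c in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), c)

    def value(self) -> int:
        return sum(self.counts.values())


x, y = GCounter("x"), GCounter("y")
x.increment()
x.increment()                  # two increments on replica x
y.increment()                  # one concurrent increment on replica y
x.merge(y)
y.merge(x)                     # exchange state; merge order does not matter
print(x.value(), y.value())    # 3 3
```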
Pervasiveness means that computational systems are everywhere and affect every aspect of modern life. Since these systems are physically spread across different locations (homes, cars, hospitals, public spaces), they are inherently distributed. No single machine or location can contain all the computation happening around us at any instant.
1. Interaction with humans (via screens, voice, touch). 2. Interaction with other computational systems (via networks, protocols, APIs). 3. Interaction with the physical environment and its resources (via sensors, actuators, cameras).
Spatial distribution refers to computational units, communication channels, data, and sensors being located across different physical locations. Example: sensors on a bridge sending data to a server in another city. Temporal distribution refers to events happening at different times with no shared clock to order them. Example: two sensors on different continents detect an event — we cannot determine which happened first without additional synchronisation.
Because there is no global clock. Two events on different nodes may be concurrent: neither A caused B nor B caused A, so nothing in the system fixes their relative order. Logical clocks recover the causal (happened-before) order, and vector clocks can additionally detect that two events are concurrent. This is fundamentally different from a single-threaded program, where the processor totally orders all events.
Centralized systems are better on five criteria: economics (low cost), complexity (low), consistency (simple), technology (homogeneous), and security (high). Distributed systems are better on two: availability (high, thanks to redundancy) and scalability (good, since capacity grows by adding nodes). In the table from [Puder et al., 2005], economics therefore counts against distributed systems, whose cost is high; whether that cost is recovered at very large scale, for example through commodity hardware, is debatable.
1. Geographically distributed environments: users and data are spread globally, so computation must be too. 2. Computation speedup: parallel execution across many nodes delivers performance beyond a single machine. 3. Resource sharing: hardware, software, and data can be pooled and accessed remotely. 4. Fault tolerance: redundancy across independent nodes prevents single points of failure.
Theoretical problems concern models, abstractions, and proofs: e.g., how to define causality without a global clock (leads to logical clock theory). Practical problems concern construction and operation: e.g., how to deploy updates across hundreds of nodes without downtime (leads to rolling update strategies and orchestration tools). Both are needed; theory without practice is sterile, practice without theory is fragile.
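The rolling update strategy mentioned in this answer can be sketched as follows. The function names, the `max_unavailable` parameter, and the `upgrade` callback are hypothetical. A real orchestrator would drain traffic, run health checks, and roll back on failure, and would usually upgrade the nodes of a batch in parallel rather than one by one as here.

```python
def rolling_batches(nodes, max_unavailable):
    """Split nodes into consecutive batches of at most `max_unavailable` nodes."""
    if max_unavailable < 1:
        raise ValueError("max_unavailable must be at least 1")
    return [nodes[i:i + max_unavailable] for i in range(0, len(nodes), max_unavailable)]


def rolling_update(nodes, max_unavailable, upgrade):
    """Upgrade one batch at a time; nodes outside the current batch keep serving."""
    upgraded = []
    for batch in rolling_batches(nodes, max_unavailable):
        for node in batch:
            upgrade(node)  # hypothetical hook: drain, update, health-check, re-admit
            upgraded.append(node)
    return upgraded


fleet = [f"node-{i}" for i in range(10)]
print([len(b) for b in rolling_batches(fleet, 3)])  # [3, 3, 3, 1]
```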
It means there is no longer a single system location nor a single system time. System components are only partially correlated, both temporally and spatially. This is significant because it invalidates the core assumptions of centralised computing: total event ordering, compresence-based interaction, global state visibility, and predictable timing. Every aspect of system design must be rethought.
A centralized system has a single, well-defined attack surface — one machine, one location, one network entry point. Defending a single point is simpler. A distributed system has many nodes, many communication channels, many entry points — the attack surface is multiplied. Additionally, data in transit across a network is exposed to interception and tampering, and nodes may be compromised without immediate detection.
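The tampering risk can be mitigated with message authentication. This sketch uses Python's standard `hmac` module with SHA-256 and assumes the two nodes already share a secret key (the key value here is a placeholder). It detects modification in transit but does not hide the content; protection against interception also requires encryption, for example via TLS.

```python
import hashlib
import hmac

SHARED_KEY = b"example-shared-secret"  # placeholder; assumed distributed out of band


def sign(message: bytes) -> bytes:
    """Compute an HMAC-SHA256 tag over the message."""
    return hmac.new(SHARED_KEY, message, hashlib.sha256).digest()


def verify(message: bytes, tag: bytes) -> bool:
    # compare_digest avoids leaking information through timing differences
    return hmac.compare_digest(sign(message), tag)


msg = b"set temperature=21"
tag = sign(msg)
print(verify(msg, tag))                    # True
print(verify(b"set temperature=35", tag))  # False: tampered in transit
```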
Computer science contributes the theoretical foundations: models of computation, algorithms, impossibility results (FLP, CAP), correctness proofs, and complexity analysis. Computer engineering contributes the practical tools: protocols, middleware, frameworks, operational practices, and infrastructure. A successful distributed system requires both rigorous theory and robust engineering.