Before we can prove anything about distributed systems — impossibility results, consensus bounds, consistency trade-offs — we need a shared ontology: what is a computation? what is a machine? what is a system? This lesson builds the conceptual foundation from the ground up. We ask what computer science itself is, what computation means across different models, and arrive at a precise definition of parallel, concurrent, and distributed systems based on the temporal and spatial context of their processes.
The study of distributed systems is built upon impossibility results — theorems that tell us what cannot be done, no matter how clever our engineering. Brewer's CAP theorem, the FLP impossibility, the consensus hierarchy — these are the landmarks that define the field.
Impossibility results are theorems. They are essential because they delimit the space of admissible moves. But a theorem requires a well-defined ontology: a shared, unambiguous notation, a common definition of what a proof is, and a precise understanding of the objects the theorem talks about.
Do we have that in computer science? Brewer stated his CAP theorem in 2000 without a formal proof; a first proof came two years later from Gilbert and Lynch, under specific assumptions. Surveying the literature of impossibility results for distributed computing, we find heterogeneous notations, diverse ontologies, and different notions of proof — nothing like the uniform foundation of modern mathematics.
Why do impossibility results matter more to a field than possibility results? How do they define the boundaries of engineering?
This lesson therefore takes a step back. Before we can study distributed computation, we need to understand computation itself. And before that, we need to ask: what kind of science is computer science? What is its object of study? Only with a clear ontology can we give crisp definitions of parallel, concurrent, and distributed systems.
As computer scientists and engineers, we are supposed to ground our technical knowledge upon solid foundations. Yet the basic questions — what is science? what is a machine? what is a computer? what is a system? what is computation? — require deep thinking.
The Science Council defines science as "the pursuit and application of knowledge and understanding of the natural and social world following a systematic methodology based on evidence." This is a good starting point, but for our purposes we need a sharper philosophical distinction.
From Kant's philosophy: a phenomenon is any object, fact, or occurrence perceived or observed — the thing as it appears to an observer. A noumenon is the thing-in-itself, as opposed to the phenomenon.
Science works in both directions:
There may exist many different models of a given set of facts. They are not equivalent. We typically prefer models with fewer assumptions and more facts explained — more expressive scientific models.
| Concept | Definition | Role in science |
|---|---|---|
| Phenomenon | An observed fact or occurrence | The raw data science explains |
| Noumenon | The thing-in-itself, the underlying model | The explanatory theory |
| Explanation | Connecting phenomena to noumena | Making sense of observations |
| Prediction | Using noumena to anticipate phenomena | Validating the model |
A working scientific model does not just explain observations — it also tells us what happens next. It predicts not-yet-observed phenomena.
Classic examples:
The predictive power of a scientific theory is an essential part of what makes it a scientific one. A model that only post-hoc explains but never predicts is closer to a narrative than a science.
For computer science, this raises a fundamental challenge. Do our models of computation predict the behaviour of distributed systems? Can we explain phenomena like the CAP theorem's trade-off between consistency and availability as a necessary consequence of a deeper model — much like Einstein explained gravity as a consequence of spacetime curvature?
This is what a mature science of distributed systems should aim for.
The classic answer comes from Newell, Perlis, and Simon (1967):
"There are computers. Ergo, computer science is the study of computers." The phenomena surrounding computers are varied, complex, rich. Wherever there are phenomena, there can be a science to describe and explain them.
But what is a "computer"? The term is not well-defined, and its meaning changes with new developments. A science may bear a shifting object of study — astronomy did not originally include interstellar gases, physics did not include radioactivity, psychology did not include animal behaviour.
Whatever a computer is, what it does is compute. A computer is a machine that computes. Computer systems (or computational systems) are systems made of computers. Computers and computational systems produce the evidence, the facts, the phenomena studied by computer science.
What is the noumenon at the core of computer science? (Answer: computation is the core object of study — the essence of what computation is represents the noumenon at the heart of the discipline.)
The definitions of computation and computer science go hand in hand. Over time, the definition of computer science has been a moving target, reflecting increasingly sophisticated understandings of computation.
Denning (2008, 2010) traces the progression of definitions for computer science across decades:
This widening of scope is crucial for distributed systems. If computation can be natural, it can occur across space and time without a single clock or shared memory — precisely the conditions of distributed computing.
The question "What is computation?" has been called "considered harmful" (Freeman, 2011) — a rigid standard adopted too early impedes progress. Yet it is the basic question around which all computer science revolves, and any well-founded study of distributed systems must be grounded on it.
Conery (2010) defines computation as symbol manipulation: a computation is a sequence of simple, well-defined steps that lead to the solution of a problem. The problem and its solution must be encoded in symbols; each step transforms one set of symbols into a new set.
Frailey (2010) argues that computation can be found in any form of process. Every process is also a computation. A process is defined (Oxford) as "a series of actions or steps taken in order to achieve a particular end" or "a natural series of changes." In operating systems, it is "an instance of a program being executed in a multitasking operating system."
Defining computation as process liberates it from the machine. A computation is not what a computer does — it is what a process is. This abstraction is essential for distributed systems, where processes span multiple machines, networks, and time zones.
| Claim | Implication |
|---|---|
| The computational model matters | Different models capture different phenomena |
| Many important computations are natural | Computation is not limited to electronic devices |
| Many important computations are non-terminating | Servers, operating systems, protocols run indefinitely |
| Many important computations are continuous | Not all computation is discrete step-by-step |
| Computational thinking can be defined | The intellectual framework is teachable and transferable |
Two canonical models dominate our understanding:
These models are artificial — do they work when we include natural computations? They are discrete — do they work when we include continuous computations? They represent a single computing device — do they work when we deal with computational systems (multiple devices)?
Britannica defines a machine as a device, having a unique purpose, that augments or replaces human or animal effort for the accomplishment of physical tasks. All machines have:
A computing machine is a different sort of machine: its task is cognitive instead of physical; its input and output are basically information in some form. If we choose not to stick with one specific computational model, it is appropriate to abstract away from the machinery and focus instead on the computational process.
We now build the basic ontology for distributed systems. Our starting assumption: the elementary computational process is sequential. It represents the phenomenal expression of the dynamics of a computing machine, and has both input/output and context.
A computing machine is the place where a computational process occurs. Visually, we represent a sequential computation as happening inside a closed shape:
flowchart LR
subgraph Process["Computational Process"]
direction LR
P["• Sequential steps
• Symbol manipulation
• State transitions"]
end
The process receives information from outside (input) and sends information to the outside (output). This is the basic I/O model:
flowchart LR Input["Input"] --> Process["Process"] Process --> Output["Output"]
The process operates within a context that provides the environment for its execution:
flowchart LR
subgraph Context["Context"]
direction LR
C1["Resources, Time, Space, Machine"]
end
subgraph Process["Computational Process"]
P["Sequential computation"]
end
Context -.-> Process
Why is the assumption of sequential computation the starting point for our ontology? What happens when we relax it?
What is context when computation is concerned? Four dimensions matter:
| Dimension | Role | Example |
|---|---|---|
| Computing machine | The device that executes the process | CPU, virtual machine, container |
| Resources | Memory, bandwidth, energy, files | RAM, disk space, network throughput |
| Time | When computation occurs; ordering of events | Clock rate, scheduling, synchrony |
| Space | Where computation occurs; physical location | Data center, network node, core |
We need to represent context whenever the machine, resources, time, space — or all of them — are relevant to model, represent, or understand computation. In other words, whenever understanding the dynamics of the computational process depends on the environment it lives in.
For most of computing history, we have abstracted away from context. We assumed a single machine with abundant resources, a single clock, and no need to know where computation happens. Distributed computing is precisely the subfield where context cannot be abstracted away.
Depending on which contextual dimension is essential, we classify computations as:
| Type | Essential dimension | Definition |
|---|---|---|
| Timed computation | Time | The time of the computing machine is relevant or essential for the computing process |
| Spatial computation | Space | The spatial features of the computing machine are relevant or essential for the computing process |
| Situated computation (Suchman, 1987) | Environment | The environment of the computing machine is relevant or essential — any meaningful combination of temporal and spatial features with the resources required by the computation |
The notion of situated computation originates in human-computer interaction (Suchman's "Plans and Situated Actions"), but is profoundly relevant here. A distributed computation is the quintessential situated computation: its meaning and correctness depend on where and when each part executes.
Graphically, understanding a computational process requires the precise definition of its computational context. The process (blue area) depends on how we define the features of the context (grey area). This dependency is what we explore next.
A system is defined as a group of things, pieces of equipment, etc., that are connected or work together. A computational system (Goldin, Smolka, Wegner, 2006) is a system where two or more computational processes behave (by computing) and work together (by interacting).
flowchart LR
subgraph P1["Process 1"]
A["Sequential computation"]
end
subgraph P2["Process 2"]
B["Sequential computation"]
end
subgraph P3["Process 3"]
C["Sequential computation"]
end
P1 <--> P2
P2 <--> P3
P1 <--> P3
Interaction between processes is the key new phenomenon that emerges when we go from a single process to a computational system. Processes communicate, synchronize, compete for resources, and coordinate their behaviour.
When we have multiple processes, how should we represent the context? This is the critical design question that will lead us to the definitions of parallel, concurrent, and distributed systems.
Three possible approaches exist:
Each computational process has its own independent context. This means each process has its own machine, its own resources, its own sense of time, its own spatial location. This is the most general model — it encompasses all distributed systems where processes are truly independent.
flowchart LR
subgraph C1["Context 1 (T,S,R)"]
P1["Process 1"]
end
subgraph C2["Context 2 (T',S',R')"]
P2["Process 2"]
end
P1 <--> P2
A single shared context for the entire computational system. All processes share the same machine, the same clock, the same spatial location, the same resources. This is the model of a single computer running multiple threads — parallelism.
flowchart LR
subgraph C["Shared Context (T,S,R)"]
P1["Process 1"]
P2["Process 2"]
P3["Process 3"]
end
A hybrid model where each process has its own individual context, but there is also an overarching context shared by all processes. This captures systems where processes are on different machines (separate spatial contexts) but share a common time reference (e.g., synchronized clocks) or common resources.
flowchart LR
subgraph Shared["Shared Context (e.g., global time)"]
subgraph C1["Context 1 (S, R)"]
P1["Process 1"]
end
subgraph C2["Context 2 (S', R')"]
P2["Process 2"]
end
end
P1 <--> P2
The choice of the sort of context defines the sort of computational system. Temporal context determines parallelism vs. concurrency. Spatial context determines distribution.
In the literature, the terms parallel and concurrent are often used interchangeably, but they represent fundamentally different relationships between events.
Typically refers to non-sequential computing where more than one computation can be performed at the same time (Shonkwiler & Lefton, 2006). Requires multi-core architectures. Used for computationally-intensive scientific/mathematical problems.
Used with a twofold acceptation (Degano & Montanari, 1987):
| System type | Event ordering | Temporal context |
|---|---|---|
| Parallel | Totally ordered | Same for all processes |
| Concurrent | At most partially ordered | Different across processes |
In a parallel system, we know exactly when every instruction happens relative to every other instruction (total order). In a concurrent system, events from different processes have no guaranteed temporal order — we only know causal relationships (partial order). This difference is not a limitation of our measurement tools; it is a fundamental ontological property of the system.
In the literature, distributed computing refers to a number of asynchronous computational processes located on different devices and communicating via message passing (no shared memory) (Kshemkalyani & Singhal, 2011). It is an activity performed on a spatially distributed system (Lamport & Lynch, 1990).
A distributed system is a collection of independent computers that appears to its users as a single coherent system (Tanenbaum & van Steen, 2017).
Distribution is about space. The physical distribution of computational processes and computing devices is the main point:
Spatial context defines both.
We can now give precise definitions based on the context model:
These definitions give us a unified framework. Instead of memorizing different textbook characterizations, we derive the type of a system from the context structure of its processes. A parallel system is one where all processes share a temporal context; a concurrent system is one where at least two have different temporal contexts; a distributed system is one where at least two have different spatial contexts.
Explore the context combinations interactively below. Click on any cell to see the system type that results from that configuration of temporal and spatial contexts across processes.
The temporal and spatial dimensions are independent. A system can be distributed and parallel, or distributed and concurrent:
Processes have different spatial contexts (S ≠ S') but the same temporal context (T = T'). This applies to tightly-coupled cluster computing where nodes are geographically separate but share a synchronized clock (e.g., via NTP or hardware clock sync).
flowchart LR
subgraph T["Shared Time (T)"]
subgraph S1["Space S"]
P1["Process 1"]
end
subgraph S2["Space S'"]
P2["Process 2"]
end
end
P1 <-->|"Message passing"| P2
Processes have different spatial contexts (S ≠ S') and different temporal contexts (T ≠ T'). This is the most general and most common case for modern distributed systems — nodes in different data centres, with independent clocks, communicating asynchronously.
flowchart LR
subgraph C1["Context 1 (T,S)"]
P1["Process 1"]
end
subgraph C2["Context 2 (T',S')"]
P2["Process 2"]
end
P1 <-->|"Asynchronous messaging"| P2
Is a distributed system necessarily concurrent? Or can a distributed system be parallel? Explain using the temporal and spatial context framework.
This lesson established the foundational ontology for distributed systems:
With this foundation in place, we can proceed to study the fundamental results of distributed computing — impossibility results, consensus algorithms, consistency models, and the like — with a shared, precise vocabulary.
The next lessons will build on this ontology to explore distributed algorithms, logical clocks, consensus protocols, replication strategies, and the CAP theorem. Each of these topics depends on the precise definitions we have established here.
In Kant's philosophy, phenomena are observed facts or occurrences — the things as they appear to an observer. Noumena are the things-in-themselves, the underlying models that explain phenomena. In computer science, the phenomena are the observed behaviour of computers and computational systems: program outputs, network traffic, system crashes, performance metrics. The noumenon at the core of computer science is computation itself — the abstract essence of what it means to compute, which gives rise to all the observable phenomena. A mature science of distributed systems seeks to explain distributed phenomena (like consensus failures or consistency violations) in terms of fundamental models of distributed computation.
We start from the basic ontology: a computational process is sequential and operates within a context defined by machine, resources, time, and space. For a computational system of multiple processes, we must decide how to represent the context for each process. If we give each process its own spatial context (different location), and those processes interact via message passing (no shared memory), then the system is distributed. Spatial context is the defining dimension because the separation in space forces processes to communicate asynchronously through messages rather than through shared state — and this has profound consequences for consistency, availability, and fault tolerance.
In a parallel system, all processes share the same temporal context (same clock). Events across processes are totally ordered — we know exactly what happens when. In a concurrent system, at least two processes have different temporal contexts. Events across those processes are at most partially ordered — we cannot determine the relative timing of events from different contexts. This indeterminacy is not a measurement problem; it is an ontological property. Parallel computing is about doing multiple things simultaneously; concurrent computing is about dealing with multiple things whose relative timing is unknown. A parallel system is always concurrent in the interleaving sense, but a concurrent system is not necessarily parallel — the processes might execute on a single core with time-slicing.
An impossibility result is a theorem about what a system cannot do. For it to be provable, we need precise definitions of: what a system is (components, interactions), what the system model includes (synchrony assumptions, failure models, communication primitives), and what the claimed property means (consistency, availability, partition tolerance). Without a shared ontology, different researchers might mean different things by the same terms, and proofs cannot be compared or verified. The history of Brewer's CAP theorem illustrates this: it was originally stated without formal proof, and the first proof (Gilbert & Lynch, 2002) had to first define the system model precisely before proving the trade-off. A mature field requires the kind of ontological clarity that, say, modern mathematics has achieved with set theory as its common foundation.
A web server is a classic example of a non-terminating computation: it listens for requests indefinitely, processing each one in a loop. Turing machines, by contrast, are defined to compute a function and halt — the computation ends when the machine reaches a halting state. A web server never halts by design. This means the Turing model, while fundamental for computability theory, is insufficient for modeling reactive, interactive, or distributed systems that maintain ongoing behaviour. Models like I/O automata, process algebras (CSP, CCS, pi-calculus), and state machines with liveness conditions are needed to capture non-terminating computations. This connects to Denning's (2011) observation that "many important computations are non-terminating."
We have three choices for representing context in a multi-process computational system: (1) A different context for each process — this gives the most general model where each process has its own machine, time, space, and resources. This encompasses distributed concurrent systems. (2) A single context for all processes — all processes share the same machine, clock, and location. This is the model of a parallel system on a single computer. (3) Individual contexts plus one shared context — processes have their own spatial/resource contexts but share a temporal context (e.g., synchronized clocks). This yields distributed parallel systems. The granularity and structure of context directly determines the system classification. This framework shows that parallel, concurrent, and distributed are not arbitrary categories but emerge from a systematic analysis of context.