DS-M5

1. Prologue: Three Generations

Distributed systems are not new. Distributed computing has been studied for decades, even before we fully understood what computation itself is. Today, distributed systems are everywhere, and they are so diverse in nature, structure, organisation, and dynamics that classifying them all is neither easy nor particularly useful.

2. The Three Fundamental Classes

Nevertheless, over their first decades, changing needs and available resources drove an evolution of distributed systems that produced three recognisable fundamental classes. Each class emerged from a different combination of technological constraints, application requirements, and user expectations. Understanding these classes is essential because they provide a vocabulary for talking about what a distributed system does and how it does it.

Key idea

The three classes are not strict categories with rigid boundaries. They represent evolutionary phases. Modern distributed systems often borrow features from all three classes simultaneously.

Class	Primary Concern	Era	Example
Distributed Computing Systems	High-performance computation via aggregated resources	1990s	Beowulf clusters, computational grids
Distributed Information Systems	Integration of separate, heterogeneous applications	2000s	TP monitors, EAI middleware
Distributed Pervasive Systems	Systems embedded in unstable, mobile environments	2010s	Sensor networks, health care systems

3. Distributed Computing Systems: Clusters

The defining characteristic of distributed computing systems is that they use many distributed computers to perform high-performance tasks. The goal is aggregate power: combining many ordinary machines to do, at a fraction of the cost, what a single supercomputer would do.

Cluster Computing Systems

Cluster computing is built around a simple idea: take a collection of similar workstations or PCs, running the same OS, located in the same area, and interconnect them through a high-speed LAN. The result is a single, unified computing resource.

Key idea

The motivation for cluster computing is economic: the steadily improving price/performance ratio of commodity computers makes it cheaper to build a supercomputer from many simple computers than to buy a dedicated high-performance machine. Additionally, clusters offer better robustness, easier maintenance, and incremental addition of computing power.

The primary use of cluster computers is parallel computing: a single compute-intensive program is split into parts that run simultaneously on multiple machines. This works well only when the problem can be decomposed into independent or loosely coupled sub-problems. In the best case the sub-problems need no communication at all, and the problem is called embarrassingly parallel; in the worst case the parts depend on each other so heavily that the problem is difficult to parallelise.

graph LR
    subgraph "Master Node"
        M["Scheduler<br/>Job Queue<br/>Data Distributor"]
    end
    subgraph "Compute Nodes"
        C1["Node 1<br/>Worker"]
        C2["Node 2<br/>Worker"]
        C3["Node 3<br/>Worker"]
        CN["Node N<br/>Worker"]
    end
    M --> C1
    M --> C2
    M --> C3
    M --> CN
    C1 -.-> M
    C2 -.-> M
    C3 -.-> M
    CN -.-> M

The scheduler (or master node) is responsible for decomposing the computation, distributing work units to worker nodes, collecting results, and handling failures. This introduces a single point of failure and a potential bottleneck — a tension that grid computing will address by removing the assumption of centralised control.
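
The master's role can be illustrated with a minimal sketch. It assumes an embarrassingly parallel job (a sum of squares), and it uses threads on one machine as stand-ins for worker nodes, so it models the structure of the computation rather than real cluster speed-up. The names `split`, `work_unit`, and `run_job` are hypothetical, not part of any cluster toolkit.

```python
from concurrent.futures import ThreadPoolExecutor


def split(data, parts):
    """Master-side decomposition: cut the input into roughly equal chunks."""
    size = max(1, -(-len(data) // parts))  # ceiling division, at least 1
    return [data[i:i + size] for i in range(0, len(data), size)]


def work_unit(chunk):
    """Worker-side computation: an independent partial sum of squares."""
    return sum(x * x for x in chunk)


def run_job(data, workers=4):
    """Distribute work units, then collect and combine the partial results."""
    chunks = split(data, workers)
    # Threads stand in for worker nodes (an assumption of this sketch).
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(work_unit, chunks))
    return sum(partials)


if __name__ == "__main__":
    print(run_job(list(range(1000))))
```

Every chunk passes through `run_job`, which is where the bottleneck and the single point of failure described above would sit in a real cluster. Failure handling is deliberately left out of the sketch.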

4. Beowulf: A Cluster Case Study

The most famous example of cluster computing is the Beowulf architecture. Beowulf clusters are Linux-based: each cluster is a collection of computing nodes controlled and accessed by a single master node. The master node provides the user interface, storage, and job scheduling; the worker nodes are typically diskless and boot from the network, which reduces cost and simplifies management.

Beowulf demonstrated that commodity hardware combined with open-source software could compete with proprietary supercomputers. The key design decisions were:

Commodity hardware: standard PCs with off-the-shelf Ethernet networking
Open-source OS: Linux with custom kernel patches for cluster communication
Standard programming models: MPI (Message Passing Interface) and PVM (Parallel Virtual Machine) for writing parallel programs
Single-system image: the user sees one machine, not 64 nodes in a rack

Editor’s note

The name "Beowulf" comes from the Old English epic poem — a reference to the legendary strength of the hero. The name was chosen to evoke the idea of many ordinary components combining to achieve extraordinary power.

5. Cluster vs. Grid: Homogeneity and Heterogeneity

Property	Cluster Computing	Grid Computing
Hardware	Similar or identical machines	Diverse, different architectures
Operating System	Same OS across all nodes	Multiple OSes, versions
Network	Single high-speed LAN	Wide-area, heterogeneous links
Administration	Single administrative domain	Multiple domains, organisations
Failure Model	Nodes fail, network is reliable	Nodes and network both unreliable

6. Grid Computing and Virtual Organisations

Grid computing extends the cluster concept across organisational boundaries. The main idea is that resources from different organisations are brought together so that individuals, groups, or institutions can collaborate across those boundaries.

Collaboration takes the form of a virtual organisation: an entity made up of people from existing organisations, who access resources made available by the participating organisations, including servers, databases, hard disks, and other infrastructure.

Key idea

A virtual organisation is a dynamic, temporary alliance of participants who pool resources to achieve a common goal. It is not a legal entity; it is a computational one. By their very nature, grid computing systems must deal with different administrative domains — each with its own policies, security requirements, and management practices.

This introduces several difficult problems:

Cross-domain authentication: how does a user from organisation A prove their identity to a resource in organisation B?
Resource discovery: how do you find available resources across participating organisations?
Policy negotiation: what happens when organisation A’s security policy conflicts with organisation B’s?
Accounting and billing: how do you track and charge for resource usage across domains?

8. Distributed Information Systems: Transactions

The second major class — distributed information systems — originated from a different problem: many separate networked applications need to be integrated. The core difficulty is interoperability: how do you make independently-designed applications work together meaningfully?

The slides identify two sub-sorts:

Transaction Processing Systems: several non-interoperating servers shared by a number of clients, requiring distributed queries and distributed transactions
Enterprise Application Integration (EAI): sophisticated applications — not just databases but also processing components — that need to communicate directly with each other

The first sub-sort starts from the database world: when data is distributed across multiple servers, operations on that data must be distributed transactions that span servers while preserving correctness.

sequenceDiagram
    participant C as Client
    participant TM as Transaction Manager
    participant DB1 as Database A
    participant DB2 as Database B
    C->>TM: BEGIN TRANSACTION
    C->>TM: UPDATE account SET balance=balance-100 WHERE id=1
    TM->>DB1: Lock row, deduct 100
    DB1-->>TM: OK
    C->>TM: UPDATE account SET balance=balance+100 WHERE id=2
    TM->>DB2: Lock row, add 100
    DB2-->>TM: OK
    C->>TM: COMMIT
    TM->>DB1: Prepare
    DB1-->>TM: Ready
    TM->>DB2: Prepare
    DB2-->>TM: Ready
    TM->>DB1: Commit
    TM->>DB2: Commit
    Note over TM,DB2: Two-Phase Commit (2PC)
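
The two phases in the diagram can be sketched in a few lines of code. This is a minimal in-memory illustration, not a production protocol: the `Participant` class and the `two_phase_commit` function are hypothetical names, and the sketch leaves out logging to stable storage, timeouts, and coordinator recovery.

```python
class Participant:
    """A hypothetical resource manager that can vote on and apply a change."""

    def __init__(self, name, balance, can_commit=True):
        self.name = name
        self.balance = balance
        self.can_commit = can_commit  # set to False to simulate a "no" vote
        self.pending = None

    def prepare(self, delta):
        # Phase 1: vote. A real participant would also force a log record
        # to stable storage before answering "ready".
        if not self.can_commit or self.balance + delta < 0:
            return False
        self.pending = delta
        return True

    def commit(self):
        # Phase 2 (success): make the prepared change visible.
        self.balance += self.pending
        self.pending = None

    def abort(self):
        # Phase 2 (failure): drop the prepared change.
        self.pending = None


def two_phase_commit(changes):
    """changes is a list of (participant, delta). Returns True if committed."""
    prepared = []
    for participant, delta in changes:
        if participant.prepare(delta):
            prepared.append(participant)
        else:
            # One "no" vote aborts everyone who has already prepared.
            for p in prepared:
                p.abort()
            return False
    for participant, _ in changes:
        participant.commit()
    return True
```

For the transfer in the diagram, the call would be `two_phase_commit([(db_a, -100), (db_b, +100)])`. Either both balances change or neither does.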

9. ACID Properties in a Distributed Setting

Operations on databases are usually performed in terms of transactions. When databases are distributed, transactions should be distributed. This requires special primitives from the distributed system or the runtime system, and it relies on the ACID properties:

Property	Meaning	Distributed Challenge
Atomic	To the outside world, the transaction happens indivisibly: either all of its steps happen or none do	Ensuring atomicity across multiple servers requires a coordination protocol such as two-phase commit (2PC). Without one, a partial failure can leave some servers committed and others aborted.
Consistent	The transaction does not violate system invariants	Invariants may be distributed across servers. Checking consistency globally requires distributed snapshots or locking.
Isolated	Concurrent transactions do not interfere with each other	Isolation in a distributed system requires distributed locking or multi-version concurrency control across nodes.
Durable	Once a transaction commits, its effects are permanent	Durability requires writing to stable storage at multiple sites. If one site crashes after commit but before persisting, the guarantee is violated.

Key idea

The ACID properties were designed for single-node databases. Achieving them in a distributed setting adds enormous complexity: atomicity requires two-phase commit (which can block when the coordinator fails), isolation requires distributed locking (which risks deadlocks), and durability requires synchronous replication (which costs latency).

10. Nested Transactions

A nested transaction is a transaction made of a number of subtransactions. Nesting in transactions can be arbitrarily deep — each subtransaction may itself contain subtransactions. This structure is a natural way to distribute transactions: "leaf" subtransactions are ordinary transactions over single servers, and distributed transactions are nested transactions.

The Durability Problem with Nested Transactions

A whole nested transaction should exhibit ACID properties. However, this creates a tension: if a subtransaction commits but the top-level transaction later fails, the subtransaction’s effects should be undone even though they already committed. Durability at the subtransaction level would contradict atomicity at the top level.

The solution is the concept of a private copy of the world: all transactions are performed over a copy of the data. Subtransactions can maintain ACID properties within this "local world." The effects of a successful nested transaction are propagated to the real world only after the top-level transaction succeeds. If it fails, the private copy is simply discarded.
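
The private-copy idea can be sketched as follows. This is a minimal illustration under simplifying assumptions: the "world" is a plain dictionary, there is no concurrency control, and each transaction keeps only the keys it has written instead of copying everything. The `Transaction` class and its methods are hypothetical names.

```python
class Transaction:
    """A transaction that buffers its writes in a private workspace."""

    def __init__(self, store, parent=None):
        self.store = store    # the shared "real world" dictionary
        self.parent = parent  # None for the top-level transaction
        self.writes = {}      # private changes, invisible to outsiders

    def begin_sub(self):
        return Transaction(self.store, parent=self)

    def read(self, key):
        # Look in the private workspace first, then in the enclosing one.
        if key in self.writes:
            return self.writes[key]
        if self.parent is not None:
            return self.parent.read(key)
        return self.store[key]

    def write(self, key, value):
        self.writes[key] = value

    def commit(self):
        # A subtransaction commits into its parent's workspace only.
        # The top-level transaction commits into the real world.
        target = self.parent.writes if self.parent is not None else self.store
        target.update(self.writes)
        self.writes = {}

    def abort(self):
        # Nothing to undo: the private workspace is simply discarded.
        self.writes = {}
```

In this sketch, a committed subtransaction becomes visible to its parent but not to the store. Aborting the parent afterwards discards the subtransaction's effects without any explicit undo.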

Exam tip

Understand the trade-off: subtransactions appear durable within the nested transaction, but their effects are only provisional from the perspective of the outside world. Durability refers to the top-level transaction, not the subtransactions.

11. TP Monitors and Enterprise Application Integration

Transaction Processing Monitors

An early solution for managing distributed transactions was the Transaction Processing (TP) Monitor. A TP monitor sits between clients and multiple database servers, allowing applications to access multiple DB servers with transactional semantics. It handles:

Routing: directing client requests to the appropriate DB server
Coordination: managing distributed transactions across servers (typically using two-phase commit)
Load balancing: distributing requests among replicated servers
Recovery: handling server failures by aborting or retrying transactions

graph TB
    subgraph "Clients"
        C1["Client 1"]
        C2["Client 2"]
        C3["Client 3"]
    end
    subgraph "TP Monitor"
        TM["Transaction Manager<br/>Queue Manager<br/>Scheduler"]
    end
    subgraph "Databases"
        DB1["DB Server A"]
        DB2["DB Server B"]
        DB3["DB Server C"]
    end
    C1 --> TM
    C2 --> TM
    C3 --> TM
    TM --> DB1
    TM --> DB2
    TM --> DB3

Enterprise Application Integration

The second sub-sort of distributed information systems goes beyond database integration. It is not only a matter of accessing distributed databases — integration should happen at the application level too. Beyond data integration, what is needed is process integration: applications should interact and communicate meaningfully with each other as part of coordinated business processes.

The key architectural technique is using middleware as a communication facilitator. Middleware provides a common communication backbone that applications use to exchange messages, invoke services, and coordinate workflows — without each application needing to know the details of every other application’s implementation.
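
A minimal sketch of the idea is a topic-based message bus. It is an in-process stand-in for real middleware, assuming synchronous delivery and no persistence. The `MessageBus` class and the topic name used in the test are hypothetical.

```python
from collections import defaultdict


class MessageBus:
    """A hypothetical in-process stand-in for an EAI communication backbone."""

    def __init__(self):
        self._subscribers = defaultdict(list)  # topic -> list of handlers

    def subscribe(self, topic, handler):
        """Register a callable that receives every message on the topic."""
        self._subscribers[topic].append(handler)

    def publish(self, topic, message):
        """Deliver the message to all subscribers and return how many got it."""
        # The publisher does not know who, if anyone, receives the message.
        handlers = list(self._subscribers[topic])
        for handler in handlers:
            handler(message)
        return len(handlers)
```

An ordering application could publish on a topic such as `"order.created"` while billing and shipping applications subscribe to it. Neither side needs a reference to the other, which is the decoupling that middleware is meant to provide.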

Editor’s note

EAI middleware is the conceptual ancestor of modern message queues (RabbitMQ, Kafka), service meshes (Istio, Linkerd), and API gateways. The problems are the same; the technologies have evolved.

12. Distributed Pervasive Systems

The third class — distributed pervasive systems — comes from asking a different question: what happens when instability is the default condition? Mobile devices with batteries, sporadic network connections, and no permanent human administrator — these are the defining characteristics of pervasive systems.

Key idea

Unlike computing and information systems that assume stable infrastructure, pervasive systems accept instability as the normal mode of operation. Devices come and go, networks partition and merge, batteries drain — and the system must keep working.

Two main features distinguish pervasive systems:

Part of our surroundings: a distributed pervasive system is embedded in the physical environment, not confined to a datacentre or server room
Lack of human administrative control: there is no dedicated system administrator managing the system day-to-day. The system is self-managing — or must appear to be

13. Requirements for Pervasive Systems

Grimm et al. (2004) identify three fundamental requirements that pervasive systems must satisfy:

Embrace contextual changes: A device must be continually aware that its environment may change at any time. Network bandwidth fluctuates, services appear and disappear, user location changes. The system must adapt, not fail.
Encourage ad hoc composition: Many devices in a pervasive system will be used in different ways by different users. There is no pre-planned "correct" way to use them. The system should support spontaneous, opportunistic composition — devices should be able to discover and use each other’s capabilities on the fly.
Recognise sharing as the default: Devices generally join the system to access or provide information. Information should therefore be easy to read, store, manage, and share. The system should not require explicit permission for basic sharing — it should be the natural mode of operation.
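
One common way to support devices that come and go is a lease-based registry: devices announce their capabilities and must renew the announcement, otherwise they are assumed to have left. The sketch below is illustrative only. The `ServiceRegistry` class, the 30-second lease, and the explicit `now` parameter (used in place of a real clock) are all assumptions.

```python
class ServiceRegistry:
    """Hypothetical lease-based registry for ad hoc device composition."""

    def __init__(self, lease_seconds=30.0):
        self.lease_seconds = lease_seconds
        self._entries = {}  # device id -> (capabilities, expiry time)

    def announce(self, device_id, capabilities, now):
        # Devices re-announce periodically; each call renews the lease.
        expiry = now + self.lease_seconds
        self._entries[device_id] = (set(capabilities), expiry)

    def discover(self, capability, now):
        # Expired entries are dropped: a silent device is assumed to be gone.
        self._entries = {
            dev: entry for dev, entry in self._entries.items() if entry[1] > now
        }
        return sorted(
            dev for dev, (caps, _) in self._entries.items() if capability in caps
        )
```

No administrator has to remove a device that walks out of range, because its lease simply runs out. This matches the first two requirements: the registry adapts to change, and any device can look up another device's capabilities on the fly.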

Exam tip

These three requirements — contextual awareness, ad hoc composition, sharing as default — represent a fundamental shift from traditional systems design. Be prepared to explain how each differs from assumptions made by computing and information systems.

14. Home Systems and Health Care

The slides present two concrete application domains for pervasive systems:

Home Systems

Home systems are built around home networks. The defining constraint is that there is no way to ask people to act as competent network or system administrators — home systems must be self-configuring and self-maintaining.

Beyond network management, home systems must handle a huge amount of heterogeneous personal information coming from many sources, both inside and outside the home system: media files, documents, photos, sensor data, appliance status, energy usage, security camera feeds. The system must organise, protect, and make accessible all of this data without requiring users to understand its complexities.

Health Care Systems

Health care systems are personal systems built around a Body Area Network (BAN) — a collection of wearable sensors that monitor a patient’s vital signs (heart rate, blood pressure, glucose levels, etc.). The system must minimise impact on the person — preventing free motion would defeat the purpose of ambulatory monitoring.

Several critical questions must be addressed:

Storage: Where and how should monitored data be stored? On the device, on a local hub, in the cloud?
Reliability: How can loss of crucial data be prevented? If a sensor disconnects mid-transmission, what happens?
Alerting: What infrastructure is needed to generate and propagate alerts? If a patient’s heart rate exceeds a threshold, who gets notified, how quickly, and through what channel?
Feedback: How can physicians provide online feedback? Can a doctor adjust monitoring parameters remotely?
Robustness: How can extreme robustness of the monitoring system be realised? The system must work even when components fail — lives may depend on it.
Security: What security issues arise, and how can proper policies be enforced? Medical data is sensitive; unauthorised access, modification, or disclosure has serious legal and ethical implications.

graph TB
    subgraph "Body Area Network"
        HR["Heart Rate Sensor"]
        BP["Blood Pressure"]
        GLU["Glucose Monitor"]
        ACT["Activity Tracker"]
    end
    subgraph "Local Hub"
        PHONE["Smartphone Hub"]
    end
    subgraph "Cloud Infrastructure"
        STORE["Data Storage"]
        ALERT["Alert Service"]
        DASH["Physician Dashboard"]
    end
    HR --> PHONE
    BP --> PHONE
    GLU --> PHONE
    ACT --> PHONE
    PHONE --> STORE
    PHONE --> ALERT
    STORE --> DASH
    ALERT --> DASH
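
One possible answer to the storage, reliability, and alerting questions is a store-and-forward hub: readings are buffered locally and removed only once the uplink confirms them, and alerts are raised at the hub, so they do not depend on connectivity. The sketch below is an illustration, not a medical design. The `Hub` class, the `uplink` callable, and the heart-rate limit of 150 are assumptions.

```python
from collections import deque


class Hub:
    """Hypothetical smartphone hub that buffers readings until acknowledged."""

    def __init__(self, uplink, heart_rate_limit=150):
        self.uplink = uplink  # callable(reading) -> True once safely stored
        self.heart_rate_limit = heart_rate_limit  # illustrative threshold
        self.buffer = deque()
        self.alerts = []

    def on_reading(self, reading):
        # Alert locally first, so alerting does not depend on the uplink.
        is_heart_rate = reading["kind"] == "heart_rate"
        if is_heart_rate and reading["value"] > self.heart_rate_limit:
            self.alerts.append(reading)
        self.buffer.append(reading)
        self.flush()

    def flush(self):
        # Remove a reading only after the uplink confirms it, keeping order.
        while self.buffer and self.uplink(self.buffer[0]):
            self.buffer.popleft()
```

While the uplink is down, readings accumulate in `buffer`; calling `flush()` after reconnection delivers them in their original order. A real hub would also need a bounded buffer and persistent storage, which this sketch omits.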

15. Sensor Networks

Sensor networks are an enabling technology for pervasive systems. They consist of spatially distributed sensor nodes, from tens to thousands of them, each equipped with a sensing device, that acquire, process, and transmit environmental information.

A useful way to view sensor networks is as distributed databases: distributed sources of information that can be queried over time. This perspective raises a fundamental design question: where does the computation happen?

Two Extremes (Both Bad)

Extreme	Description	Problem
All data to base	Sensors send raw information to a central base station without cooperating	Excessive network consumption — thousands of sensors constantly transmitting data saturates the network
All computation at node	Sensors do all processing locally and return only results	Excessive node power consumption — sensors have limited battery life and processing complex queries drains them quickly

The Solution: In-Network Data Processing

The answer lies between the two extremes: in-network data processing. The key ideas are:

Set up a tree network among sensors
Pass queries through the sensor tree, from root to leaves
Aggregate results at the different levels of the tree as they flow back to the root

This approach balances network consumption and power consumption: intermediate nodes combine partial results, reducing both the total data transmitted and the computation required at any single node.

graph TB
    BS["Base Station<br/>(Root)"] --> R1["Region Router A"]
    BS --> R2["Region Router B"]
    R1 --> S1["Sensor 1"]
    R1 --> S2["Sensor 2"]
    R1 --> S3["Sensor 3"]
    R2 --> S4["Sensor 4"]
    R2 --> S5["Sensor 5"]
    S1 -.->|"temp=22.1"| R1
    S2 -.->|"temp=22.4"| R1
    S3 -.->|"temp=21.9"| R1
    S4 -.->|"temp=23.0"| R2
    S5 -.->|"temp=22.8"| R2
    R1 -.->|"avg=22.1"| BS
    R2 -.->|"avg=22.9"| BS
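
The diagram shows each router forwarding an average. To obtain a correct network-wide average, however, a router has to forward a (sum, count) pair, because the regions hold different numbers of sensors. The sketch below assumes the tree is given as nested lists, with numbers as sensor readings. The function names are hypothetical.

```python
def aggregate(node):
    """Return (sum, count) for the subtree rooted at node.

    A node is either a number (a sensor reading) or a list of child nodes
    (a router). Each router forwards one pair, not its children's raw data.
    """
    if isinstance(node, (int, float)):
        return float(node), 1
    total, count = 0.0, 0
    for child in node:
        child_total, child_count = aggregate(child)
        total += child_total
        count += child_count
    return total, count


def average(tree):
    """Network-wide average computed from the root's (sum, count) pair."""
    total, count = aggregate(tree)
    return total / count


# Readings from the diagram: Region Router A and Region Router B.
router_a = [22.1, 22.4, 21.9]
router_b = [23.0, 22.8]
tree = [router_a, router_b]

if __name__ == "__main__":
    print(round(average(tree), 2))
```

With these readings the true average is about 22.44, whereas averaging the two regional averages would give about 22.52. A maximum can be aggregated the same way with a single value per link. An exact median cannot be built from fixed-size partial results, which is one reason the choice of aggregate matters.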

Open questions that the research community continues to address:

How do we dynamically set up an efficient tree in a sensor network?
How does aggregation of results take place? Can it be controlled (e.g., average vs. max vs. median)?
What happens when network links fail? How does the tree repair itself?

16. Distributed Systems Nowadays: CNCF and Cloud Native

The increasing complexity of today’s distributed systems is only manageable by groups of engineers and practitioners. Community efforts are essential, particularly for critical infrastructure components. Open software has become the default — collections of components that provide a foundation for designing complex distributed systems are freely available.

It is fundamental to:

Keep an eye on the community projects available
Participate in the community process to know the state of each project and possibly contribute
Understand the best selection of components for your specific system

Key foundations include:

Foundation	Focus	URL
Apache Software Foundation (ASF)	General-purpose open-source projects	https://www.apache.org
Cloud Native Computing Foundation (CNCF)	Cloud native technologies	https://www.cncf.io/
Free Software Foundation (FSF)	Free software advocacy and licensing	https://www.fsf.org/
Python Software Foundation (PSF)	Python ecosystem	https://www.python.org/psf-landing/

The Cloud Native Computing Foundation

The CNCF, part of the Linux Foundation, is particularly significant for modern distributed systems. It collects and promotes projects that form the foundation of cloud native computing, organised by maturity stage:

Stage	Description	Examples
Graduated	Production-ready, widely adopted	Kubernetes, Prometheus, Envoy, etcd, Vitess, Cilium
Incubating	Growing adoption, active development	Chaos Mesh
Sandbox	Early-stage innovations	Various experimental projects
Archived	No longer actively maintained	Projects that did not gain traction

Key idea

The CNCF Charter defines cloud native technologies as those that "empower organisations to build and run scalable applications in modern, dynamic environments such as public, private, and hybrid clouds." The key techniques are containers, service meshes, microservices, immutable infrastructure, and declarative APIs. These enable loosely coupled systems that are resilient, manageable, and observable.

The full landscape of projects is visualised at landscape.cncf.io — a constantly-evolving map of the modern distributed systems ecosystem.

17. Conclusion: Convergence of the Three Classes

Diverse sorts of distributed systems exist, roughly telling the story of the evolution of distributed systems over the years:

Distributed computing systems — focused on aggregating computing power
Distributed information systems — focused on integrating heterogeneous applications
Distributed pervasive systems — focused on embedding computation in unstable environments

Each class depends on the environment where it was developed, the goals it must achieve, and the level of available technologies. Different models, methodologies, and technologies are used to design and develop different sorts of distributed systems.

Key idea

Today, the complexity of distributed systems means they typically borrow features from all three classes. A cloud-native application might use Kubernetes for cluster-like resource management (class 1), rely on a distributed database with ACID transactions (class 2), and run on mobile devices with intermittent connectivity (class 3). The boundaries are blurring.

Looking forward, the complexity of distributed systems engineering calls for community efforts. Hundreds of critical software components are available to be looked up, selected, and used. The modern engineer must be both a designer and a curator, choosing the right components for the task at hand and understanding how they interact.

Check Your Understanding

What are the three fundamental classes of distributed systems, and what drives the differences between them?

The three classes are distributed computing systems (high-performance computing via clusters and grids), distributed information systems (transaction processing and enterprise application integration), and distributed pervasive systems (systems embedded in unstable, mobile environments). The differences are driven by the environment, goals, and available technologies at the time each class emerged.

Compare and contrast cluster computing with grid computing. What is the key distinguishing property?

The key distinguishing property is homogeneity vs. heterogeneity. Clusters are homogeneous: similar hardware, same OS, single LAN, single administrative domain. Grids are heterogeneous: diverse hardware, multiple OSes, wide-area networks, multiple administrative domains forming virtual organisations.

Describe the five layers of the grid computing architecture. Which layers form the grid middleware?

Fabric (local resources), Resource (single resource management), Connectivity (communication and security protocols), Collective (multi-resource coordination — discovery, allocation), Application (user-facing programs). The connectivity, resource, and collective layers together form the grid middleware.

Explain the ACID properties. Why is achieving them harder in a distributed setting?

Atomicity (all-or-nothing), Consistency (invariant preservation), Isolation (non-interference of concurrent transactions), Durability (permanence after commit). In a distributed setting: atomicity requires two-phase commit across servers (blocking under failures), isolation requires distributed locking (deadlock risk), durability requires synchronous writes to multiple sites (latency cost), and consistency must hold across distributed invariants.

What is a nested transaction? Why does it create a tension with durability?

A nested transaction is a transaction composed of subtransactions, possibly to arbitrary depth. The tension: if a subtransaction commits but the top-level transaction later fails, the subtransaction’s effects must be undone. Durability at the subtransaction level contradicts atomicity at the top level. The solution is a private copy of the world — subtransactions run on a copy, and results propagate only on top-level success.

What is a TP monitor, and what problems does it solve?

A Transaction Processing Monitor is middleware that sits between clients and multiple DB servers, allowing applications to access them with transactional semantics. It handles routing, distributed transaction coordination (2PC), load balancing, and recovery after server failures.

List and explain the three requirements for pervasive systems according to Grimm et al. (2004).

(1) Embrace contextual changes: devices must be continually aware of changes in their environment. (2) Encourage ad hoc composition: devices should support spontaneous, opportunistic discovery and use of each other’s capabilities. (3) Recognise sharing as the default: information should be easy to read, store, manage, and share without requiring explicit permissions for basic operations.

Why are both extremes of sensor network design (all data to base vs. all computation at node) suboptimal?

All-data-to-base saturates the network with raw transmissions from thousands of sensors. All-computation-at-node drains sensor batteries because each sensor must process complex queries locally. The solution is in-network data processing: build a tree, pass queries through it, aggregate results at intermediate levels — balancing network and power consumption.

What is the CNCF, and what are its four project maturity stages?

The Cloud Native Computing Foundation, part of the Linux Foundation, collects and promotes cloud native computing projects. The four stages are: Graduated (production-ready), Incubating (growing adoption), Sandbox (early-stage), and Archived (no longer maintained). Examples: Kubernetes (graduated), Chaos Mesh (incubating).

Explain why modern distributed systems "borrow features from all three classes." Give an example.

Modern systems are so complex that no single class addresses all needs. Example: a cloud-native IoT platform uses Kubernetes for resource management (class 1, computing), a distributed SQL database with ACID transactions (class 2, information), and runs on edge devices with intermittent connectivity (class 3, pervasive). The boundaries between classes have blurred as technology has matured.

Sorts of Distributed Systems
