DS-M8

1. Prologue: Why Model Distributed Systems?

Distributed systems are inherently complex: many nodes, heterogeneous hardware, unpredictable networks, concurrent execution. To build them reliably we need ways to represent them that abstract away accidental detail while preserving what matters for analysis and design.

The core questions that drive architectural modelling are:

Design-time representation — how can a distributed system be represented before it is built, so that engineers can reason about its structure?
Run-time representation — how can a running system be described, so that operators understand its current configuration?
Evolution over time — how do we account for changes as the system grows, migrates, or recovers from failure?
Distribution in space — how do we capture the spatial arrangement of components across machines, networks, and geographic regions?

Key idea

The basic ontology of distributed systems (nodes, links, processes, messages) gives us a ground to stand on. Software and system architectures provide the language to describe organised distributed systems at a higher level of abstraction.

2. Software Architectures to Handle Complexity

Distributed systems are complex by nature. To manage this complexity, a system must be properly organised. The organisation of a distributed system is mostly expressed in terms of its software components and the way they interact.

A software architecture is an abstraction of the run-time elements of a software system during some phase of its operation — Fielding, 2000. A system may be composed of many levels of abstraction and many phases of operation, each with its own software architecture.

There are many ways to organise components of a distributed system, classified as software architectures. Furthermore, there are many possible instantiations of a software architecture, where components have their actual place in the distributed system — often called system architectures.

Editor's note

Software architecture is about logical organisation (the "what" and "how" of interaction). System architecture is about physical/spatial placement (the "where"). Both are essential, and confusing them is a common source of design errors.

3. Architectural Elements: Components, Connectors, Data

According to Fielding (2000), a software architecture is defined by a configuration of architectural elements constrained in their relationships to achieve a desired set of architectural properties. There are three kinds of architectural elements:

Components

A component is a modular unit with well-defined interfaces, which is replaceable within its environment. It is an abstract unit of software instructions and internal state that provides a transformation of data via its interface. Interfaces exist in both directions: components both provide and require interfaces.

Connectors

A connector is an abstract mechanism that mediates communication, coordination, or cooperation among components. Anything that provides a mechanism for interaction among components qualifies as a connector: sockets, pipes, RPC stubs, message queues, middleware buses, etc.

Data

A datum is an element of information that is transferred from a component, or received by a component, via a connector. Data is the third essential ingredient: without it, components and connectors have nothing to exchange.

graph LR
    subgraph Component
        C1["Component A"]
        C2["Component B"]
    end
    subgraph Connector
        K1["Socket / RPC / Message Bus"]
    end
    subgraph Data
        D1["Request / Response / Event"]
    end
    C1 -- "sends data via" --> K1
    K1 -- "delivers data to" --> C2
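
The three element kinds can be made concrete with a small sketch. It is only an illustration under our own assumptions: the names `Datum`, `QueueConnector`, and `Component` are hypothetical, and an in-process queue stands in for a real socket or message bus.

```python
from dataclasses import dataclass
from queue import Queue
from typing import Callable


@dataclass(frozen=True)
class Datum:
    """Data: an element of information carried over a connector."""
    kind: str       # e.g. "request", "response", "event"
    payload: str


class QueueConnector:
    """Connector: mediates communication so components never call each other."""

    def __init__(self) -> None:
        self._queue: Queue = Queue()

    def send(self, datum: Datum) -> None:
        self._queue.put(datum)

    def receive(self) -> Datum:
        # Raises queue.Empty if nothing has been sent yet.
        return self._queue.get_nowait()


class Component:
    """Component: internal behaviour plus an interface that transforms data."""

    def __init__(self, name: str, transform: Callable[[str], str]) -> None:
        self.name = name
        self._transform = transform

    def emit(self, connector: QueueConnector, payload: str) -> None:
        connector.send(Datum("request", payload))

    def handle(self, connector: QueueConnector) -> Datum:
        request = connector.receive()
        return Datum("response", self._transform(request.payload))


bus = QueueConnector()
component_a = Component("A", str.lower)
component_b = Component("B", str.upper)
component_a.emit(bus, "ping")
reply = component_b.handle(bus)   # Datum(kind="response", payload="PING")
```

Neither component holds a reference to the other; removing the `Datum` type would leave the connector with nothing meaningful to carry.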

Exam tip

Be prepared to explain why all three elements (components, connectors, data) are necessary to define a software architecture. A system with only components and connectors but no notion of data is incomplete — the data flowing through the connectors determines the semantics of interaction.

4. Architectural Properties & Constraints

Architectural properties are derived from the selection and arrangement of components, connectors, and data within a system. They include both functional properties (what the system does) and quality attributes such as ease of evolution, reusability of components, efficiency, dynamic extensibility, and so on.

Properties are induced by the set of constraints within an architecture. Constraints restrict the roles and features of architectural elements and the allowed relationships among them. They are often motivated by the application of a software engineering principle to an aspect of the architectural elements.

For example, imposing a constraint that "a component may only communicate with its immediate neighbour in a chain" induces properties of modularity and testability, but may reduce flexibility and increase latency. Every constraint is a trade-off.
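
One way to read the chain example is as a rule that a configuration can be checked against. The sketch below assumes a hypothetical representation (a list of component names in chain order and a list of communication links) and a hypothetical helper `violations`; it is not a standard tool.

```python
def violations(chain, links):
    """Return the links that break the 'immediate neighbour only' constraint.

    chain: component names in chain order.
    links: (sender, receiver) pairs describing who talks to whom.
    """
    position = {name: index for index, name in enumerate(chain)}
    return [
        (sender, receiver)
        for sender, receiver in links
        if abs(position[sender] - position[receiver]) != 1
    ]


chain = ["ui", "logic", "storage"]
conforming = [("ui", "logic"), ("logic", "storage")]
with_shortcut = conforming + [("ui", "storage")]
```

Here `violations(chain, conforming)` is empty, while the shortcut from `ui` to `storage` is reported. The shortcut might lower latency, but it gives up the property the constraint was meant to induce.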

Key idea

Constraints are not limitations — they are design decisions that induce properties. A good architecture chooses the right constraints to produce the desired properties for a given context.

5. The Notion of Architectural Style

An architectural style is a coordinated set of architectural constraints that restricts the roles and features of architectural elements and the allowed relationships among those elements within any architecture that conforms to that style — Fielding, 2000.

Architectural styles serve two fundamental purposes:

Classification — they are a mechanism for categorising architectures and defining their common characteristics, allowing different systems to be compared at a structural level.
Design guidance — they provide general patterns for the overall design of new systems, capturing the essence of a pattern of interaction by ignoring accidental details.

An architectural style is formulated in terms of components, the way components are connected to each other, the data exchanged between components, and how all of these are jointly configured into a system. Styles are distilled rather than invented: they emerge from observing what works in practice.

Editor's note

Architectural styles are to software architecture what design patterns are to software engineering: recurring solutions to recurring problems, codified as named, well-understood configurations. The difference is that styles operate at a higher level of abstraction — they define the overall shape of the system, not the internal structure of individual classes or modules.

6. Five Main Architectural Styles

In the distributed systems literature, four main architectural styles have been identified, plus a hybrid that combines two of them. The following summaries capture the essence of each:

Layered Architectures

Components are organised in a layered fashion; a component in a layer only calls the layer directly below, and is only called by the layer directly above. The request-response flow is top-down / bottom-up, and control flow follows the data flow.

Typical example: The OSI network stack, Web application frameworks (presentation, business logic, data access).

Key property: Each layer can be replaced independently as long as its interface to adjacent layers is preserved.

Object-Based Architectures

Components are objects connected through a remote procedure call (RPC) mechanism. Each object encapsulates state and exposes methods; clients invoke methods on remote objects as if they were local. Client-server architectures are built out of this style.

Typical example: CORBA, Java RMI, gRPC services, RESTful microservices (to some extent).

Key property: Location transparency — the caller does not need to know where the target object resides.

Data-Centered Architectures

Processes communicate through a shared repository. The repository may be passive (reactive — it only responds to queries) or proactive (it monitors, triggers, and pushes data). Everything depends on how the repository represents information, handles events, and responds to interaction.

Typical example: Web-based systems, shared file systems, databases as integration hub.

Key property: Producers and consumers are coupled through the repository but need not know each other directly.

Event-Based Architectures

Processes communicate through an event bus that propagates events, possibly carrying data. The canonical example is publish/subscribe: publishers publish events to the middleware; subscribers receive only the events to which they have subscribed.

Typical example: Kafka topics, MQTT brokers, DOM event dispatch in a browser.

Key property: Referential uncoupling (processes do not need to know each other's identity) and space uncoupling (they do not need to share the same address space).

Shared Data-Space Architectures

A hybrid of data-centered and event-based styles. The shared repository is simultaneously a persistent data-space and an event bus: data is stored and accessed, and related events are propagated. The canonical example is the blackboard system: processes put data on a blackboard; the blackboard aggregates knowledge, implements policies, and drives coordination.

Typical example: Linda tuple spaces, JavaSpaces, coordination middleware.

Key property: Time uncoupling — processes can communicate without co-presence; data persists beyond the lifetime of any single process.

Comparison summary

The five styles differ along two key axes: coupling dimension (referential, spatial, temporal) and flow pattern (top-down, request-response, shared-repository, event-propagation). Shared data-space architectures combine the strengths of both data-centered and event-based approaches, achieving the fullest decoupling.

7. Layered Architectures

In a layered architecture, components are arranged in horizontal strata. Each layer provides services to the layer above and consumes services from the layer below. The flow of control and data follows a strict top-down request, bottom-up response pattern.

graph TD
    L1["Presentation Layer"] --> L2["Business Logic Layer"]
    L2 --> L3["Data Access Layer"]
    L3 --> L4["Database Layer"]
    L4 --> L3
    L3 --> L2
    L2 --> L1

The strict dependency rule — "a layer only calls the layer directly beneath" — ensures that each layer is a replaceable unit. As long as interfaces are preserved, swapping out an entire layer (e.g., replacing a relational database with a document store) is possible without changing the layers above.
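
The replaceability claim can be sketched as follows. This is a minimal illustration, assuming three hypothetical layers and two interchangeable in-memory stores (`DictStore`, `LogStore`) that stand in for real databases.

```python
from typing import Optional, Protocol


class DataAccess(Protocol):
    """Interface the business layer requires from the layer below."""
    def load(self, key: str) -> Optional[str]: ...
    def save(self, key: str, value: str) -> None: ...


class DictStore:
    """Key-value store: one possible data layer."""
    def __init__(self) -> None:
        self._rows = {}

    def load(self, key: str) -> Optional[str]:
        return self._rows.get(key)

    def save(self, key: str, value: str) -> None:
        self._rows[key] = value


class LogStore:
    """Append-only log where the last write wins: a different data layer."""
    def __init__(self) -> None:
        self._log = []

    def load(self, key: str) -> Optional[str]:
        for logged_key, value in reversed(self._log):
            if logged_key == key:
                return value
        return None

    def save(self, key: str, value: str) -> None:
        self._log.append((key, value))


class BusinessLogic:
    """Middle layer: calls only the data layer directly below it."""
    def __init__(self, data: DataAccess) -> None:
        self._data = data

    def rename_user(self, user_id: str, name: str) -> str:
        cleaned = name.strip().title()
        self._data.save(user_id, cleaned)
        return cleaned

    def user_name(self, user_id: str) -> str:
        return self._data.load(user_id) or "unknown"


class Presentation:
    """Top layer: calls only the business layer, never the store."""
    def __init__(self, logic: BusinessLogic) -> None:
        self._logic = logic

    def render(self, user_id: str) -> str:
        return f"User: {self._logic.user_name(user_id)}"
```

Swapping `DictStore` for `LogStore` requires no change to `BusinessLogic` or `Presentation`, because both stores honour the same `DataAccess` interface.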

Layering is the dominant organisational pattern in enterprise applications, the OSI network model, and the TCP/IP stack. The web itself is layered: HTTP over TCP over IP over Ethernet.

Key idea

Layering is a constraint that induces modifiability and portability at the cost of performance (every interaction must traverse multiple layers, potentially adding latency). This is the classic architectural trade-off.

8. Object-Based Architectures

In object-based architectures, the components are objects that encapsulate state and behaviour. Objects are distributed across the network and communicate through some form of remote procedure call (RPC) or remote method invocation (RMI).

graph LR
    subgraph "Client Node"
        C["Client Object"]
    end
    subgraph "Server Node"
        O1["Object A"]
        O2["Object B"]
    end
    C -- "RPC call" --> O1
    O1 -- "invokes" --> O2
    O2 -- "reply" --> O1
    O1 -- "response" --> C

The defining characteristic of object-based architectures is that the distribution of objects is hidden behind the interface. A client invokes a method as if the object were local; the middleware takes care of marshalling arguments, sending them across the network, and unmarshalling the result. This is access transparency and location transparency in action.

The most prominent systems built out of this style are client-server architectures. The balance between client and server can vary: thin-client vs. thick-client, two-tier vs. three-tier, and the modern microservices decomposition are all variations.

Limitation

Object-based architectures assume that the network is reliable enough to make remote calls look like local calls. When the network fails (partition, high latency), the illusion breaks — what was a simple method call becomes a partial failure that the client must handle explicitly.
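
The sketch below illustrates both the illusion and how it breaks. Everything in it is hypothetical: `Transport` simulates a network in-process, `Proxy` plays the role of a client stub, and JSON stands in for whatever marshalling format a real middleware would use.

```python
import json


class NetworkError(Exception):
    """Raised when the simulated network cannot deliver a request."""


class Counter:
    """Server-side object with encapsulated state."""
    def __init__(self) -> None:
        self.value = 0

    def add(self, amount: int) -> int:
        self.value += amount
        return self.value


class Transport:
    """Simulated network link to one server object; set up=False to partition."""
    def __init__(self, target: object) -> None:
        self._target = target
        self.up = True

    def call(self, request: bytes) -> bytes:
        if not self.up:
            raise NetworkError("no route to server")
        message = json.loads(request)                       # unmarshal request
        method = getattr(self._target, message["method"])
        result = method(*message["args"])
        return json.dumps({"result": result}).encode()      # marshal reply


class Proxy:
    """Client stub: turns attribute access into marshalled remote calls."""
    def __init__(self, transport: Transport) -> None:
        self._transport = transport

    def __getattr__(self, method_name: str):
        def remote_call(*args):
            request = json.dumps({"method": method_name, "args": args}).encode()
            reply = self._transport.call(request)
            return json.loads(reply)["result"]
        return remote_call


server = Counter()
network = Transport(server)
counter = Proxy(network)
counter.add(2)            # reads like a local call, returns 2
network.up = False        # from here on, counter.add(...) raises NetworkError
```

While the link is up, `counter.add(2)` is indistinguishable from a local call. Once it is down, the caller has to catch `NetworkError` and decide what to do, which a purely local call would never require.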

9. Data-Centered Architectures

In data-centered architectures, communication among processes occurs through a shared repository. Processes do not talk to each other directly; they interact indirectly by reading from and writing to the central repository.

graph BT
    subgraph "Shared Repository"
        R["Data Store"]
    end
    P1["Process 1"] -- "read/write" --> R
    P2["Process 2"] -- "read/write" --> R
    P3["Process 3"] -- "read/write" --> R
    P4["Process 4"] -- "read/write" --> R

The behaviour of the architecture depends critically on the nature of the shared repository:

How information is represented — is it a file system, a relational database, a document store, a tuple space?
How events are handled — does the repository notify subscribers when data changes, or does each process poll for changes?
How the repository behaves in response to interaction — is it passive (only responds to queries) or proactive (triggers computation, runs policies)?
How processes interact through the repository — do they coordinate via locks and transactions, or via an optimistic concurrency model?
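
To make the last question concrete, here is a minimal sketch of a passive repository with optimistic concurrency. The `Repository` class, its version counter, and the `StaleWrite` exception are assumptions chosen for illustration, not a description of any particular database.

```python
class StaleWrite(Exception):
    """Raised when a writer's view of an item is out of date."""


class Repository:
    """Passive shared store: it only answers reads and writes, never notifies."""

    def __init__(self) -> None:
        self._items = {}   # key -> (version, value)

    def read(self, key):
        # Unknown keys are reported as version 0 with no value.
        return self._items.get(key, (0, None))

    def write(self, key, value, expected_version):
        current_version, _ = self.read(key)
        if current_version != expected_version:
            raise StaleWrite(f"{key}: expected v{expected_version}, found v{current_version}")
        self._items[key] = (current_version + 1, value)
        return current_version + 1
```

Two processes that both read version 0 of the same key cannot both succeed in writing: the second `write` raises `StaleWrite` and that process must re-read. The processes never address each other; all coordination goes through the repository.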

Data-centered architectures are everywhere: web-based systems are largely data-centric (the browser renders data from a server, which itself reads from a database); many distributed applications still work by sharing files around the network.

Exam tip

Compare data-centered with event-based architectures. In data-centered, the state is shared; in event-based, the events are shared. Shared data-space architectures combine both, sharing state and events in a single infrastructure element.

10. Event-Based Architectures

In event-based architectures, processes communicate by generating and consuming events through an event bus. An event represents something that happened: a sensor reading changed, a file was uploaded, a payment was processed. Events may carry data along with them.

graph LR
    Prod1["Publisher 1"] --> Bus["Event Bus"]
    Prod2["Publisher 2"] --> Bus
    Bus --> Sub1["Subscriber 1"]
    Bus --> Sub2["Subscriber 2"]
    Bus --> Sub3["Subscriber 3"]

The canonical instance is the publish/subscribe model:

Publishers publish events to the middleware without knowing who, if anyone, will receive them.
Subscribers express interest in certain types of events (by topic, content filter, or channel) and receive matching events asynchronously.

The defining property of event-based architectures is decoupling:

Referential uncoupling: publishers and subscribers hold no references to each other. A publisher does not need to know the identity or address of its subscribers.
Space uncoupling: participants do not need to share the same address space; they interact only through the event bus.
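
A minimal in-process sketch of topic-based publish/subscribe follows. The `EventBus` class is hypothetical, delivery is synchronous for simplicity, and the sketch assumes that events published to a topic with no subscribers are simply dropped.

```python
from collections import defaultdict


class EventBus:
    """Topic-based bus: publishers and subscribers know only the bus."""

    def __init__(self) -> None:
        self._subscribers = defaultdict(list)   # topic -> list of handlers

    def subscribe(self, topic, handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic, data) -> int:
        """Deliver data to every current subscriber; return how many received it."""
        handlers = self._subscribers.get(topic, [])
        for handler in handlers:
            handler(data)
        return len(handlers)


bus = EventBus()
received = []
bus.subscribe("payment", received.append)
bus.publish("payment", {"id": 1})   # delivered to one subscriber
bus.publish("upload", "report.pdf") # no subscriber: the event is dropped
```

The publisher only learns how many handlers were reached, never who they were. Real brokers add asynchronous delivery, content filters, and often buffering, none of which this sketch attempts.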

Key idea

Event-based architectures are the natural model for reactive systems: systems that must respond to changes in their environment in near real-time. They are also the foundation for loosely-coupled microservice architectures, where services communicate through message brokers.

11. Shared Data-Space Architectures

Shared data-space architectures are a hybrid that puts together data-centered and event-based architectures. The shared repository serves a dual role: it is a shared persistent data-space where data is stored and accessed, and also an event bus through which related events are propagated.

graph TB
    subgraph "Shared Data-Space"
        DS["Tuple Space / Blackboard"]
        E["Event Notifications"]
    end
    P1("Process 1") -- "write tuples" --> DS
    P2("Process 2") -- "read/consume" --> DS
    DS -- "triggers" --> E
    E -. "notifies" .-> P1
    E -. "notifies" .-> P2
    P3("Process 3") -- "write tuples" --> DS

The canonical example is the blackboard system. In a blackboard architecture:

Multiple processes (knowledge sources) put data on a shared blackboard.
The blackboard aggregates knowledge, implements policies, and drives coordination among the processes.
When relevant data appears, the blackboard can notify waiting processes.

The defining property of shared data-space architectures is time uncoupling: processes can communicate without needing to be co-present. A producer can write data to the space and terminate; a consumer can come online later and read it. This is in contrast to event-based systems, where an undelivered event may be lost if no subscriber is listening.

Editor's note

Linda (the tuple-space coordination language) is the archetypal example. Processes communicate through a shared tuple space by writing tuples with out(), reading with rd(), and consuming with in(). Pattern matching on tuple fields enables selective retrieval. JavaSpaces and TSpaces are later incarnations of the same idea.
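
The Linda operations can be sketched in a few lines. This is a simplified, single-process illustration with assumptions of our own: `in_` is spelled with a trailing underscore because `in` is a Python keyword, `None` acts as the wildcard in a pattern, and both `rd` and `in_` return `None` instead of blocking when nothing matches (in Linda proper they block).

```python
class TupleSpace:
    """Shared space of tuples with pattern-based retrieval."""

    def __init__(self) -> None:
        self._tuples = []

    def out(self, *fields) -> None:
        """Write a tuple into the space."""
        self._tuples.append(tuple(fields))

    @staticmethod
    def _matches(pattern, candidate) -> bool:
        # Same length, and every non-wildcard field must be equal.
        return len(pattern) == len(candidate) and all(
            wanted is None or wanted == actual
            for wanted, actual in zip(pattern, candidate)
        )

    def rd(self, *pattern):
        """Return the first matching tuple without removing it, or None."""
        for candidate in self._tuples:
            if self._matches(pattern, candidate):
                return candidate
        return None

    def in_(self, *pattern):
        """Return the first matching tuple and remove it, or None."""
        found = self.rd(*pattern)
        if found is not None:
            self._tuples.remove(found)
        return found
```

A producer can call `out("temp", "room1", 21)` and terminate; a consumer that starts later still finds the tuple with `rd("temp", "room1", None)`, which is the time uncoupling described above.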

13. Software vs. System Architectures

A crucial distinction runs throughout this lesson: software architecture versus system architecture.

Dimension	Software Architecture	System Architecture
Focus	Logical organisation of components	Physical placement of components
Primary concern	Interaction patterns, interfaces, data flow	Spatial distribution, replication, deployment
Time vs. space	Deals with behaviour over time	Concerned with configuration in space
Abstraction level	High: components are conceptual units	Lower: components are actual processes/nodes
Example question	"Which layers does the request traverse?"	"Which machines run which services?"

Software architectures are concerned with logical organisation, possibly over time: how components interact, what data flows, what interfaces are exposed. System architectures are concerned with component placement in a distributed setting: they deal with spatial distribution, with the actual mapping of logical components to physical nodes.

Exam tip

You may be asked to distinguish software architecture from system architecture with examples. The same software architecture (e.g., a three-tier layered style) can be instantiated as multiple different system architectures (e.g., all three tiers on one machine, or each tier on separate machines, or each tier replicated across multiple machines for fault tolerance).
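
One way to picture the distinction is to treat the software architecture as a fixed set of logical tiers and each system architecture as a mapping from tiers to machines. The tier names, machine names, and helper functions below are all hypothetical.

```python
# Software architecture: the logical tiers, independent of any machine.
TIERS = ("presentation", "business_logic", "data")


def is_valid_deployment(deployment) -> bool:
    """A deployment is valid if every logical tier is placed on at least one node."""
    return all(deployment.get(tier) for tier in TIERS)


def machines_used(deployment) -> set:
    """The set of distinct nodes a deployment occupies."""
    return {node for nodes in deployment.values() for node in nodes}


# System architecture 1: everything on one development machine.
DEVELOPMENT = {tier: ["laptop"] for tier in TIERS}

# System architecture 2: each tier replicated on its own pair of servers.
PRODUCTION = {
    "presentation": ["web-1", "web-2"],
    "business_logic": ["app-1", "app-2"],
    "data": ["db-1", "db-2"],
}
```

Both mappings instantiate the same three-tier software architecture; they differ only in where components are placed and how many replicas exist.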

14. Open Questions & the Role of Formalism

The notion of software and system architectures provides a solid start for the discipline of distributed systems engineering. However, several important questions remain open:

Is this enough for a well-grounded foundation for a science of computational distributed systems? Architectural styles are expressive and abstract, but they are also approximative and perhaps non-scientific. They serve engineering well, but do they serve science?
Can we prove theorems based on software and system architectures? To prove properties about a distributed system (e.g., "deadlock freedom", "consistency guarantee"), we need formal models. Architectural styles describe the shape of a system, not its dynamics.
What is the role of "math-like" formalism, such as process algebras? Process algebras (CCS, CSP, the pi-calculus) provide a formal language for describing and reasoning about concurrent and distributed processes. They offer compositionality (the meaning of a system is a function of the meaning of its parts) and algebraic laws for equational reasoning.
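
As a small illustration of what equational reasoning looks like, the following laws are standard for CCS-style parallel composition and choice. They hold up to strong bisimilarity, written $\sim$ here; $P$, $Q$, $R$ stand for arbitrary processes and $0$ for the inactive process.

```latex
\begin{align*}
P \mid Q &\sim Q \mid P \\
(P \mid Q) \mid R &\sim P \mid (Q \mid R) \\
P \mid 0 &\sim P \\
P + Q &\sim Q + P \\
P + P &\sim P
\end{align*}
```

Because strong bisimilarity is a congruence for these operators, a subterm can be rewritten using such a law without re-examining the rest of the system. Architectural styles offer no comparable calculus.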

Key idea

Architectural styles are a descriptive tool for engineering; formal methods (process algebras, temporal logics) are a prescriptive tool for verification. The two are complementary: styles give us a language to think about distributed systems, while formalisms give us the tools to prove things about them.

15. Conclusion

This lesson introduced the fundamental concepts for modelling distributed systems through software and system architectures:

Software architectures describe the logical organisation of components, connectors, and data, constrained to achieve desired architectural properties.
Architectural styles are coordinated sets of constraints that classify systems and provide design patterns. The five main styles (layered, object-based, data-centered, event-based, and shared data-space) each embody different coupling models and trade-offs.
System architectures instantiate software architectures with actual placement of components across a distributed system.
The distinction between logical organisation (software architecture) and physical distribution (system architecture) is foundational.

Looking ahead

In subsequent lessons, the course will explore how these architectural styles are implemented in practice: middleware, communication protocols, naming, synchronization, consistency, fault tolerance, and security all build on the architectural foundation laid here.

Check Your Understanding

Explain the difference between a software architecture and a system architecture, giving a concrete example.

Software architecture is the logical organization of components, connectors, and data — the abstract design describing how interaction happens. System architecture is the spatial instantiation of that design on real machines. Example: a three-tier layered software architecture (presentation / business logic / data) could be deployed as a system architecture where all three tiers run on one laptop during development, or on three separate server clusters in production with load balancers and replication.

Define an architectural style in your own words. What are its constituent parts?

An architectural style is a named, coordinated set of constraints on architectural elements (components, connectors, data) and their allowed relationships. It defines: (1) what kinds of components exist, (2) how they can be connected, (3) what data flows between them, and (4) how they are configured together to form a system. Styles serve both to classify existing systems and to guide the design of new ones.

Compare layered and event-based architectures along the dimension of coupling.

Layered architectures are tightly coupled in the vertical dimension: a layer knows the interface of the layer below and must follow the strict top-down/bottom-up flow. There is no referential uncoupling — each layer explicitly references its lower neighbor. Event-based architectures achieve referential and spatial uncoupling: publishers do not know subscribers, and neither needs to share address space. However, event-based systems introduce temporal uncertainty — you cannot be sure when or if an event has been processed. Layered systems give stronger guarantees about processing order but are harder to reconfigure dynamically.

What is the fundamental trade-off that constraints introduce in an architecture? Give an example.

Constraints induce properties but also restrict flexibility. Example: the strict layering constraint ("a layer only calls the layer below") induces modifiability (you can replace an entire layer) but imposes a performance cost (every request must traverse all layers, adding latency). If you violate the constraint to allow a shortcut (bypassing a layer for performance), you lose the modifiability guarantee. Every architectural constraint is a trade-off between competing quality attributes.

Explain the difference between referential uncoupling, space uncoupling, and time uncoupling. Which architectural styles provide which?

Referential uncoupling: components do not need references to each other (event-based and shared data-space). Space uncoupling: components do not share the same address space (event-based and shared data-space). Time uncoupling: components need not be active at the same time; data persists so that late-joining consumers can access it (shared data-space). Layered and object-based styles provide none of these, because components explicitly reference each other. Data-centered provides space uncoupling but not time uncoupling (the repository persists, but processes that read from it must be active). Shared data-space provides all three.

Derive: Given the description of a distributed system, identify which architectural style(s) it most likely conforms to. For instance, "A system where services communicate by writing to and reading from a shared database, and are notified of changes via triggers."

This describes a data-centered architecture (shared database as the repository) with event-based characteristics (triggers as event notifications), which together point toward a shared data-space architecture. The shared database acts as both persistent store and notification bus. This is a classic blackboard-like configuration.

What are the three kinds of architectural elements according to Fielding's definition, and why are all three necessary?

They are components (computational units with interfaces), connectors (mechanisms mediating interaction), and data (information transferred between components via connectors). All three are necessary because: components without connectors are isolated and cannot interact; connectors without components have nothing to connect; components and connectors without data have nothing to exchange. A complete architecture must specify all three and their constraints.

Modelling Distributed Systems. Software & System Architectures
