DS-CX

1. Motivations: Why Container Orchestration Matters

Most software systems deployed today are distributed. They span multiple processes, multiple machines, and often multiple data centres. Containerization frameworks such as Docker opened the door to packaging and shipping these systems in a portable, reproducible way — but they did not solve the operational problem: once you have dozens or hundreds of containers, how do you keep them running?

The gap containers created

Docker gave us a uniform packaging format. What it did not give us was a way to schedule containers across a fleet of machines, restart them when they fail, route traffic to healthy instances, scale them under load, or roll out updates without downtime. That is the job of a container orchestrator — and Kubernetes is now the de facto standard for that job.

Non-functional requirements — availability, scalability, reliability, recoverability, observability — acquire even more importance when a system is distributed. A single-container application that crashes is an annoyance. A hundred-container application where 5% of containers crash every hour is a disaster without automation. Kubernetes encodes these quality attributes directly into its architecture, providing declarative controls for all of them.

The examiner will ask

Why did the industry need a container orchestrator on top of container runtimes? Explain the specific operational gaps that Docker alone does not address and how Kubernetes fills them with concrete mechanisms.

2. Premises: Containerization as the Foundation

Before containers became the dominant deployment model, there were virtual machines. Containers offer a lightweight form of virtualization that shares the host operating system kernel while providing process and filesystem isolation through Linux kernel features such as namespaces and cgroups.

Background

The VM era taught us that full hardware-level virtualization carries heavy overhead: every VM runs its own kernel, consumes gigabytes of RAM, and boots slowly. Containers flip this model: they share the kernel but isolate user-space, achieving near-native performance with millisecond startup times. For a deeper treatment of Linux isolation primitives, see Giovanni Ciatto's lecture on Virtuale.

Docker became the most popular containerization platform, providing a user-friendly CLI, a layered image format, and a registry (Docker Hub) for distributing images. The key insight is that a container image is an immutable snapshot of an application and all its dependencies — the same image runs identically on a developer's laptop, a CI server, and a production cluster. This reproducibility is the precondition for everything Kubernetes builds on top.

Virtual Machines	Containers
Each VM has its own guest OS kernel	Containers share the host kernel
Heavyweight: GB of RAM, minutes to boot	Lightweight: MB of RAM, milliseconds to start
Strong isolation at the hypervisor level	Isolation via Linux namespaces and cgroups
Good for multi-tenant, mixed-OS environments	Ideal for microservices and dense packing

3. What Kubernetes Is (and What It Is Not)

Kubernetes describes itself as "a portable, extensible, open source platform for managing containerized workloads and services." The emphasis on platform is deliberate: Kubernetes is not a single tool but an ecosystem that provides primitives (scheduling, service discovery, scaling, self-healing) that other tools compose into complete solutions.

Equally important is understanding the boundaries. Kubernetes is not:

Kubernetes is not...	What fills that gap
A containerization platform	Docker, containerd, CRI-O (any CRI runtime)
A PaaS (Platform as a Service)	It provides lower-level primitives; PaaS layers like OpenShift build on top
A cloud provider	Kubernetes runs on AWS, GCP, Azure, on-premises — it is cloud-agnostic
A CI/CD deployment tool	CI/CD pipelines (GitHub Actions, Jenkins, ArgoCD) push to Kubernetes
A logging or monitoring tool	It offers integrations and APIs; tools like Prometheus, Grafana, and the EFK stack handle observability

The separation of concerns is important. Kubernetes manages the runtime plane — keeping your containers running, healthy, and reachable. It deliberately leaves the development pipeline, logging storage, and monitoring dashboards to other tools in the cloud-native ecosystem. This modularity is a strength, not a limitation.

Mental model

Think of Kubernetes as the distributed-system kernel of your production environment. It provides scheduling (like a process scheduler), networking (like a virtual network stack), and storage (like a volume manager) — but for containers spread across a cluster of machines rather than processes on a single host.

4. Kubernetes vs. Docker Swarm

Docker Swarm is the native clustering and orchestration tool bundled with Docker. It is simpler to set up and operates within the familiar Docker ecosystem — the docker-compose.yml file is its primary configuration interface. For small clusters and less complex applications, Swarm can be a pragmatic choice.

Kubernetes, by contrast, is a full-featured orchestration platform. The trade-off is real: more complexity at setup time, but dramatically more automation at runtime.

Docker Swarm	Kubernetes
Native clustering for Docker, simple setup	Open-source orchestration platform, richer but more complex
Suitable for smaller clusters and less complex applications	Ideal for larger clusters and more complex applications
Users manually handle resource allocation and scaling	Automatically handles scaling, load balancing, and failover
Configuration via docker-compose.yml	Configuration via declarative YAML (or JSON) manifests
Simple access control based on TLS	Advanced access control with Role-Based Access Control (RBAC)
Limited ecosystem and community momentum	Massive cloud-native ecosystem (Helm, Istio, Prometheus, ArgoCD, etc.)

The examiner will ask

Compare Kubernetes and Docker Swarm along the axes of complexity, automation, scaling, and access control. In what scenarios would Swarm still be a reasonable choice, and at what point does the complexity investment in Kubernetes pay off?

5. Key Features: Immutability and Declarative Configuration

Two philosophical pillars distinguish Kubernetes from imperative orchestration tools: immutability and declarative configuration.

Immutability

In Kubernetes, running Pods and containers are not meant to be mutated in place. If a change is needed, the affected Pod is deleted and a new one is created from the updated specification. Containers themselves are designed to be ephemeral and stateless: deleting and recreating containers is a standard part of their lifecycle, not an exceptional event. This principle ensures consistency and reliability: the cluster always converges toward the declared desired state rather than accumulating drift from ad-hoc patches.

Declarative Configuration

Kubernetes adheres to the principle "everything is an object." Many kinds of objects — Pods, Deployments, Services, ConfigMaps, Secrets, and dozens more — are available to shape the production environment. Configuration files are written in YAML (or JSON), and an external tool named kubectl manages the environment by applying these files:

kubectl apply -f configuration-file.yaml

The declarative model works as follows: you describe the desired state in a manifest file, and Kubernetes continuously reconciles the actual state to match it. If a Pod crashes, Kubernetes notices and creates a replacement — not because you told it to, but because the desired state says "there should be N replicas" and the actual state is "there are N-1." This control loop is the engine of Kubernetes' automation.
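The control loop described above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not Kubernetes code: the `reconcile` function, the `make_name` helper, and the Pod names are hypothetical, and a real controller observes and changes state through the API server rather than on in-memory lists.

```python
from itertools import count

def reconcile(desired_replicas, running_pods, new_name):
    """One pass of a toy control loop: return the Pod list after converging.

    desired_replicas: the declared state (the "N replicas" in the manifest).
    running_pods: names of Pods currently alive (the actual state).
    new_name: callable producing a fresh Pod name (hypothetical helper).
    """
    pods = list(running_pods)
    # Too few Pods: create replacements until the count matches.
    while len(pods) < desired_replicas:
        pods.append(new_name())
    # Too many Pods: remove the surplus.
    while len(pods) > desired_replicas:
        pods.pop()
    return pods

_ids = count(1)

def make_name():
    return f"web-{next(_ids)}"

pods = reconcile(3, [], make_name)      # initial rollout: three Pods created
pods.remove("web-2")                    # simulate a crash: actual state is N-1
pods = reconcile(3, pods, make_name)    # the loop restores the declared count
print(pods)  # ['web-1', 'web-3', 'web-4']
```

Note that nobody asked for `web-4` explicitly; it exists only because the declared count and the observed count disagreed.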

Why declarative wins

Imperative systems (like shell scripts or Ansible playbooks) describe how to reach a state. Declarative systems describe what state you want. The difference matters at scale: when a declarative system encounters an unexpected condition, it keeps trying to converge. An imperative script just stops where it was written to stop. Declarative configuration directly improves configurability and maintainability.

6. Key Features: Autoscaling, Self-Healing, and CRI

Autoscaling

Kubernetes supports automatic scaling based on resource usage or custom metrics, with two dimensions:

Type	Mechanism	What it scales
Horizontal Scaling	Horizontal Pod Autoscaler (HPA)	Number of Pod replicas (up and down)
Vertical Scaling	Vertical Pod Autoscaler (VPA)	CPU and memory resources available to a container

Autoscaling directly improves availability (more replicas handle more traffic) and scalability while optimizing resource usage — you do not provision for peak load 24/7.

Self-Healing

Kubernetes continuously monitors the health of containers and nodes, and takes corrective action automatically:

Failed containers are automatically replaced
If a node fails, Kubernetes reschedules its Pods onto a different node and can reattach their volumes there
If a container behind a Service fails, Kubernetes removes its route and redirects traffic to other healthy instances of the same Service

These behaviours directly improve recoverability and reliability without operator intervention.

Container Runtime Interface (CRI)

The CRI is a plugin interface that decouples Kubernetes from any specific container runtime. A CRI-compliant runtime must be running on each cluster node. Supported runtimes include containerd, CRI-O, and Mirantis Container Runtime, so Kubernetes is not tied to Docker. This decoupling was a key architectural decision: it allowed the ecosystem to move beyond Docker Engine once Kubernetes deprecated its built-in Docker adapter, dockershim (deprecated in v1.20, removed in v1.24).

The examiner will ask

Explain self-healing in Kubernetes with concrete examples. How does it differ from simply having a process supervisor like systemd at the node level? Why is the CRI abstraction important for the long-term health of the Kubernetes project?

7. Cluster Architecture: Control Plane and Worker Nodes

A Kubernetes cluster is composed of a control plane plus a set of worker machines called nodes. This is a classic master-worker architecture, but with an important twist: every component on the control plane is itself designed for high availability and can be replicated.

graph TB
  subgraph "Control Plane"
    API[kube-apiserver]
    ETCD[(etcd)]
    SCHED[kube-scheduler]
    CM[kube-controller-manager]
    CCM[cloud-controller-manager]
  end
  subgraph "Worker Node 1"
    K1[kubelet]
    KP1[kube-proxy]
    CR1[container runtime]
  end
  subgraph "Worker Node 2"
    K2[kubelet]
    KP2[kube-proxy]
    CR2[container runtime]
  end
  subgraph "Worker Node N"
    K3[kubelet]
    KP3[kube-proxy]
    CR3[container runtime]
  end
  API --- ETCD
  API --- SCHED
  API --- CM
  API --- CCM
  API --- K1
  API --- K2
  API --- K3

Control Plane Components

Component	Role
kube-apiserver	Exposes the Kubernetes HTTP API. Every interaction with the cluster — from kubectl commands to internal component communication — goes through the API server. It is the front door.
etcd	A distributed key-value database that stores all cluster metadata and state. It is the source of truth for the entire cluster.
kube-scheduler	Watches for newly created Pods with no assigned node and selects a node for them to run on, based on resource availability, affinity rules, and policies.
kube-controller-manager	Runs controller processes — each controller is a control loop that watches the shared state through the API server and makes changes to move the current state toward the desired state.
cloud-controller-manager	Embeds cloud-specific control logic, allowing the core Kubernetes code to remain cloud-agnostic.
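The scheduler's job of selecting a node can be pictured as a filter step followed by a scoring step. The sketch below is a deliberately simplified illustration: the `pick_node` function, the node names, and the "most free CPU wins" policy are assumptions made for this example, and the real kube-scheduler also weighs affinity rules and other policies.

```python
def pick_node(pod_request, nodes):
    """Return the name of a node that can fit the Pod, or None.

    pod_request: dict with 'cpu_m' (millicores) and 'mem_mi' (MiB) requested.
    nodes: dict mapping node name -> dict of free 'cpu_m' and 'mem_mi'.
    """
    # Filter: keep only nodes with enough free CPU and memory.
    feasible = {
        name: free for name, free in nodes.items()
        if free["cpu_m"] >= pod_request["cpu_m"]
        and free["mem_mi"] >= pod_request["mem_mi"]
    }
    if not feasible:
        return None  # no node fits: the Pod stays unscheduled
    # Score: prefer the node with the most free CPU (an arbitrary toy policy).
    return max(feasible, key=lambda name: feasible[name]["cpu_m"])

nodes = {
    "node-1": {"cpu_m": 200, "mem_mi": 512},
    "node-2": {"cpu_m": 1500, "mem_mi": 2048},
    "node-3": {"cpu_m": 900, "mem_mi": 64},
}
print(pick_node({"cpu_m": 500, "mem_mi": 256}, nodes))   # node-2
print(pick_node({"cpu_m": 4000, "mem_mi": 256}, nodes))  # None
```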

Worker Node Components

Component	Role
kubelet	The agent that runs on every node. It ensures that containers described in PodSpecs are running and healthy.
kube-proxy	A network proxy that maintains network rules on nodes, implementing the Service abstraction through iptables or IPVS rules.
container runtime	The software responsible for running containers (e.g., containerd, CRI-O). Must implement the Container Runtime Interface.

Architecture insight

The only component that talks directly to etcd is the API server. All other components read and write cluster state through the API server. This centralised data path is not a bottleneck — it is a design choice that enforces consistency, enables authentication and authorization at a single choke point, and makes the system auditable.

8. Objects: The Building Blocks of Kubernetes

Objects are persistent entities in the Kubernetes system. They represent the state of the cluster: what containerized applications are running and on which nodes, the resources available to those applications, and the policies that govern their behaviour (restarts, upgrades, fault tolerance).

Record of intent

A Kubernetes object is a "record of intent." Once you create an object, the Kubernetes system will constantly work to ensure that the object exists — matching the actual state to the desired state you declared. This is the core reconciliation loop.

Object Structure

Every Kubernetes object includes two nested object fields that govern its behaviour:

Spec: provides a description of the characteristics you want the resource to have — the desired state
Status: describes the current state of the object, supplied and updated by the Kubernetes system and its components

The control plane continually manages every object's actual state to match the desired state supplied in the spec. This control loop is at the heart of Kubernetes' automation.

Required Manifest Fields

When creating an object via a manifest file (conventionally YAML), four fields are required:

Field	Purpose
`apiVersion`	The version of the Kubernetes API to use (e.g., `v1`, `apps/v1`)
`kind`	The type of object being created (e.g., `Pod`, `Deployment`, `Service`)
`metadata`	Data that uniquely identifies the object: name, namespace, UID, labels, annotations
`spec`	The desired state of the object — the format varies per kind and contains nested fields specific to that object

Here is a minimal manifest that creates a Pod running nginx:

apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  containers:
  - name: nginx
    image: nginx:1.14.2
    ports:
    - containerPort: 80

Applied with: kubectl apply -f manifest.yaml
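The four required fields can be checked mechanically before a manifest is sent to the cluster. The following sketch assumes the four-field rule exactly as stated above; `missing_fields` is a hypothetical helper, not part of any Kubernetes client, and real validation is performed by the API server. The manifest is written in JSON (which Kubernetes also accepts) only so that the example needs nothing beyond the Python standard library.

```python
import json

REQUIRED_FIELDS = ("apiVersion", "kind", "metadata", "spec")

def missing_fields(manifest):
    """Return the required top-level fields absent from a manifest dict."""
    return [field for field in REQUIRED_FIELDS if field not in manifest]

# The nginx Pod manifest, expressed as JSON instead of YAML.
pod = json.loads("""
{
  "apiVersion": "v1",
  "kind": "Pod",
  "metadata": {"name": "nginx"},
  "spec": {"containers": [{"name": "nginx", "image": "nginx:1.14.2"}]}
}
""")
print(missing_fields(pod))              # []
print(missing_fields({"kind": "Pod"}))  # ['apiVersion', 'metadata', 'spec']
```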

9. Pods: The Smallest Deployable Unit

A Pod is the smallest deployable unit in Kubernetes. It represents a group of one or more containers with shared storage and network resources, and a specification for how to run them. The shared context of a Pod is a set of Linux namespaces, cgroups, and potentially other facets of isolation — exactly the same primitives that containers themselves use, but now shared among a co-located group of containers.

Shared Context Within a Pod

Containers within the same Pod share:

IP address and ports — they share the same network namespace
Hostname — the Pod's hostname
Storage volumes — any volumes mounted in the Pod are accessible to all its containers
Process identifiers (PIDs) — they may share the PID namespace

Two Usage Patterns

Single container per Pod (most common): the Pod is simply a wrapper around a single container. Kubernetes manages Pods, not containers directly, so this is the standard pattern.
Multiple containers per Pod (advanced): used to encapsulate an application composed of tightly coupled containers that need to share resources. Common patterns include Sidecar (helper container alongside the main application), Ambassador (proxy that abstracts external service access), and Adapter (transforms application output for external consumers).

Design decision

Why not manage containers directly? The Pod abstraction gives Kubernetes a uniform unit of scheduling and lifecycle management, regardless of whether the container runtime uses Docker, containerd, or CRI-O. It also enables advanced patterns (sidecars, init containers) without changing the scheduler.

apiVersion: v1
kind: Pod
metadata:
  name: node-app
  labels:
    app: node
spec:
  containers:
  - name: node-container
    image: node:18
    command: ["node", "-e", "console.log('Hello World!')"]
    resources:
      requests:
        memory: "128Mi"
        cpu: "100m"
      limits:
        memory: "256Mi"
        cpu: "500m"
    ports:
    - containerPort: 3000
      protocol: TCP

Key resource fields:

requests: the amount of CPU/memory that Kubernetes guarantees to the container — used by the scheduler for placement decisions
limits: the maximum amount of CPU/memory the container is allowed to use — exceeding CPU limits causes throttling; exceeding memory limits causes OOM kill
CPU is expressed in millicpu units: 100m = 0.1 CPU core = 10% of a core
Memory is expressed in bytes: 128Mi = 128 mebibytes
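To make the units concrete, here is a small conversion sketch. The function names are made up for this example, and only the suffixes used in these notes (`m` for CPU, and `Ki`, `Mi`, `Gi` for memory) are handled; real Kubernetes quantities accept more forms.

```python
def cpu_to_cores(quantity):
    """Convert a CPU quantity such as '100m' or '2' to a number of cores."""
    if quantity.endswith("m"):
        return int(quantity[:-1]) / 1000  # millicpu -> cores
    return float(quantity)

# Binary suffixes only; an assumption that keeps the sketch short.
_BINARY = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30}

def memory_to_bytes(quantity):
    """Convert a memory quantity such as '128Mi' to bytes."""
    for suffix, factor in _BINARY.items():
        if quantity.endswith(suffix):
            return int(quantity[:-len(suffix)]) * factor
    return int(quantity)  # plain number: already bytes

print(cpu_to_cores("100m"))      # 0.1
print(cpu_to_cores("500m"))      # 0.5
print(memory_to_bytes("128Mi"))  # 134217728
```

In the manifest above, the limit of 500m therefore allows bursts up to half a core, five times the 100m request that the scheduler uses for placement.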

10. ReplicaSets, Deployments, and Workload Controllers

ReplicaSet

A ReplicaSet's purpose is to maintain a stable set of replica Pods running at any given time. It guarantees the availability of a specified number of identical Pods. When a ReplicaSet needs to create new Pods, it uses its Pod template. However, ReplicaSets are usually not directly managed by the user — they are controlled by a higher-level object called a Deployment.

Deployment

A Deployment manages a set of Pods to run an application workload. It provides declarative updates for Pods and ReplicaSets, enabling:

Rollout: transitioning from the current state to the desired state (e.g., changing the container image version)
Rollback: reverting to a previous revision if something goes wrong
Zero-downtime updates: through rolling update strategies that gradually replace old Pods with new ones

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  selector:
    matchLabels:
      app: nginx
  replicas: 2
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.14.2
        ports:
        - containerPort: 80

The replicas field specifies the desired number of Pods. The selector tells the Deployment which Pods to manage (via label matching). The template contains the Pod specification used to create new Pods.

Other Workload Controllers

Deployments are not the only solution for managing workloads. Kubernetes provides specialised controllers for different use cases:

Job: for running short-lived, one-off tasks. A Job creates one or more Pods and ensures that a specified number of them successfully terminate. Ideal for batch processing, data migration, or any finite-duration computation.

CronJob: for regularly scheduled actions such as backups, report generation, log rotation, or periodic cleanup. A CronJob creates Jobs on a repeating schedule defined using cron syntax.

StatefulSet: manages Pods like a Deployment but maintains a sticky identity for each Pod. Pods are created from the same spec but are not interchangeable: each has a persistent identifier and stable network identity. Essential for stateful applications like databases (MySQL, PostgreSQL, MongoDB) where each instance has unique data.

DaemonSet: ensures that all (or some) Nodes run a copy of a Pod. As nodes are added to the cluster, Pods are added to them; as nodes are removed, those Pods are garbage collected. Common use cases: log collection daemons (Fluentd, Filebeat), monitoring agents (Node Exporter), and storage daemons.
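One visible consequence of the sticky identity that a StatefulSet maintains is Pod naming: StatefulSet Pods are named after the set plus a stable ordinal starting at 0, while Pods created for a Deployment carry a generated suffix that changes whenever a Pod is recreated. The sketch below only illustrates that contrast; `deployment_pod_name` is a toy stand-in with an assumed five-character suffix, not the actual naming algorithm.

```python
import random
import string

def statefulset_pod_names(name, replicas):
    """StatefulSet Pods get stable ordinal names: <name>-0, <name>-1, ..."""
    return [f"{name}-{ordinal}" for ordinal in range(replicas)]

def deployment_pod_name(name, rng):
    """Toy stand-in for a Deployment Pod name: a fresh random suffix each time."""
    alphabet = string.ascii_lowercase + string.digits
    suffix = "".join(rng.choice(alphabet) for _ in range(5))
    return f"{name}-{suffix}"

print(statefulset_pod_names("mysql", 3))  # ['mysql-0', 'mysql-1', 'mysql-2']

rng = random.Random(0)
print(deployment_pod_name("web", rng))    # e.g. web-<5 random characters>
```

If `mysql-1` is deleted, its replacement is again called `mysql-1`, which is what lets each database instance find its own data; a replaced `web` Pod simply gets a new name.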

The examiner will ask

When would you use a StatefulSet instead of a Deployment? Explain the concept of "sticky identity" and why stateless and stateful workloads require different controllers in Kubernetes.

11. Services: Stable Endpoints for Ephemeral Pods

Pods are ephemeral — they can be created, destroyed, and moved between nodes, which means their IP addresses are subject to change. A Service provides a stable endpoint (a fixed IP address and DNS name) to access a set of Pods, using label selectors to identify the target Pods.

A Service is the entry point of the application. It defines a logical set of Pods and a policy by which to access them, redirecting requests to the appropriate Pods regardless of where they are running in the cluster.

Service Type	Behaviour
ClusterIP (default)	Exposes the Service on a cluster-internal IP. Only reachable from within the cluster. Suitable for internal communication between microservices.
NodePort	Exposes the Service on each Node's IP at a static port (range 30000-32767). Accessible from outside the cluster via `<NodeIP>:<NodePort>`.
LoadBalancer	Exposes the Service externally using a cloud provider's load balancer (or MetalLB for on-premises deployments). Creates a NodePort and ClusterIP Service automatically, then provisions an external load balancer that forwards to the NodePort.

apiVersion: v1
kind: Service
metadata:
  name: node-service
  labels:
    app: node
spec:
  selector:
    app: node
  ports:
  - protocol: TCP
    port: 80
    targetPort: 3000
  type: ClusterIP

The port is the Service port (client-facing); the targetPort is the container port inside the Pod. The selector must match the labels on the target Pods — this is how the Service discovers its backends dynamically as Pods come and go.
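The selector-based discovery can be sketched as a filter over the Pods in the cluster. This is an illustrative model only: the `matches` and `endpoints` functions, the `ready` flag, and the IP addresses are assumptions made for this example, and in a real cluster the endpoint list is maintained by the control plane.

```python
def matches(selector, labels):
    """A Pod matches when every selector key/value pair appears in its labels."""
    return all(labels.get(key) == value for key, value in selector.items())

def endpoints(selector, pods):
    """Return the IPs of Pods that match the selector and are ready for traffic."""
    return [pod["ip"] for pod in pods
            if matches(selector, pod["labels"]) and pod["ready"]]

pods = [
    {"ip": "10.0.0.4", "labels": {"app": "node"}, "ready": True},
    {"ip": "10.0.0.5", "labels": {"app": "node"}, "ready": False},  # failing instance
    {"ip": "10.0.0.6", "labels": {"app": "redis"}, "ready": True},  # different app
]
print(endpoints({"app": "node"}, pods))  # ['10.0.0.4']
```

Because the list is recomputed from labels rather than stored as fixed addresses, a replacement Pod with a new IP becomes a backend as soon as it carries the `app: node` label and is ready.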

Separation of concerns

Services decouple who needs to be reached (the application) from where it currently runs (the Pod IPs). This is exactly the same pattern as DNS for physical machines: a hostname stays constant even when the underlying IP changes. In Kubernetes, a Service provides a virtual IP and DNS name that survive individual Pod restarts and rescheduling.

12. The Pod Lifecycle

Understanding the Pod lifecycle is essential for debugging and for designing applications that gracefully handle Kubernetes' operational model. A Pod traverses several phases from creation to termination, and the transitions are driven by the kubelet on the node where the Pod is scheduled.

Ephemeral by design

Pods are never "repaired" in place. If a container crashes, the kubelet restarts it (subject to the restart policy), but if a node fails entirely, Pods are rescheduled on a different node as new Pods with new IPs. Applications must be designed to tolerate this: store state externally (in a volume or database), use Services for discovery, and never assume a fixed IP or hostname survives a rescheduling event.

The lifecycle phases are:

Pending — The Pod has been accepted by the cluster but one or more containers are not yet running. This includes time spent waiting for the scheduler to assign a node, and time spent pulling container images.
Running — The Pod is bound to a node, all containers have been created, and at least one container is still running or is in the process of starting or restarting.
Succeeded — All containers in the Pod have terminated successfully and will not be restarted. Applicable to Jobs and one-off workloads.
Failed — All containers in the Pod have terminated, and at least one container has terminated in failure (non-zero exit or system kill).
Unknown — The state of the Pod could not be obtained, typically due to a communication error with the node where the Pod should be running.
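The phase definitions above can be read as a small decision procedure over the states of a Pod's containers. The sketch below is a simplification under stated assumptions: `pod_phase` is a hypothetical function, it treats "waiting" as "not yet started", and it ignores restart policies and the Unknown phase, which depends on node communication rather than container state.

```python
def pod_phase(scheduled, containers):
    """Derive a simplified Pod phase from per-container states.

    scheduled: whether the Pod is bound to a node.
    containers: list of (state, exit_code) tuples, where state is one of
    'waiting', 'running', 'terminated' and exit_code is None unless terminated.
    """
    if not scheduled or all(state == "waiting" for state, _ in containers):
        return "Pending"
    if all(state == "terminated" for state, _ in containers):
        # Every container has finished: success only if all exit codes are zero.
        if all(code == 0 for _, code in containers):
            return "Succeeded"
        return "Failed"
    return "Running"

print(pod_phase(False, [("waiting", None)]))                      # Pending
print(pod_phase(True, [("running", None), ("terminated", 0)]))    # Running
print(pod_phase(True, [("terminated", 0), ("terminated", 0)]))    # Succeeded
print(pod_phase(True, [("terminated", 0), ("terminated", 137)]))  # Failed
```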

13. Practical Example: From Docker Compose to Kubernetes

The lecture presents a concrete migration of a simple web application — a visit counter composed of a Flask backend and a Redis in-memory store. The full source is available at github.com/Mala1180/kubernetes-hpa-example.

Phase 1: Development with Docker Compose

The application is minimal: a GET / endpoint increments a hit counter in Redis and returns the count along with the serving Pod's hostname. A GET /busy-wait endpoint performs a CPU-intensive operation for stress testing. The Docker Compose file defines two services — backend and redis — with the backend depending on Redis:

services:
  backend:
    image: username/kubernetes-example-backend:latest
    build: .
    ports:
      - "3000:3000"
    environment:
      - REDIS_HOST=redis
      - REDIS_PORT=6379
    depends_on:
      - redis
  redis:
    image: redis:7-alpine

Development workflow

The application is developed locally using Docker Compose. The depends_on directive makes Compose start the Redis container before the backend (in this short form it does not wait for Redis to be ready to accept connections), and environment variables wire the components together. The backend connects to Redis at redis:6379, because Compose's internal DNS resolves the service name redis to the Redis container's IP.

Phase 2: Migration to Kubernetes

Each Compose service is mapped to two Kubernetes objects: a Deployment and a Service. An additional Horizontal Pod Autoscaler object is defined for the backend.

Backend Deployment — defines replicas, the container image (pushed to Docker Hub), environment variables for Redis connectivity, and resource requests/limits:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: backend
  labels:
    app: backend
spec:
  replicas: 1
  selector:
    matchLabels:
      app: backend
  template:
    metadata:
      labels:
        app: backend
    spec:
      containers:
      - name: backend
        image: mala1180/kubernetes-example-backend:latest
        imagePullPolicy: Always
        ports:
        - containerPort: 3000
        env:
        - name: REDIS_HOST
          value: "redis"
        - name: REDIS_PORT
          value: "6379"
        resources:
          requests:
            memory: "128Mi"
            cpu: "100m"
          limits:
            memory: "256Mi"
            cpu: "200m"

Backend Service — exposes the backend externally via NodePort:

apiVersion: v1
kind: Service
metadata:
  name: backend
  labels:
    app: backend
spec:
  type: NodePort
  ports:
  - port: 3000
    targetPort: 3000
    protocol: TCP
  selector:
    app: backend

The Redis service follows an analogous pattern (Deployment + Service with ClusterIP type, since Redis should not be exposed externally).

14. Horizontal Pod Autoscaling and Stress Testing

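The Horizontal Pod Autoscaler (HPA) automatically adjusts the number of Pod replicas in a Deployment based on observed resource utilisation or custom metrics. In the example, the HPA scales out when average CPU utilisation across the backend Pods exceeds 50% or average memory utilisation exceeds 70%, where utilisation is measured relative to each container's resource requests (100m CPU and 128Mi memory in the backend Deployment above). The replica count stays between 1 and 10: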

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: backend-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: backend
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 70
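How the HPA turns these targets into a replica count follows the scaling rule given in the Kubernetes documentation: desired = ceil(current replicas × current metric / target metric), with a default tolerance of 10% around the target, and with the largest proposal winning when several metrics are configured. The sketch below applies that rule to the bounds of this manifest; the function names and the sample utilisation figures are illustrative assumptions, and refinements such as stabilisation windows are left out.

```python
import math

def desired_replicas(current_replicas, current_utilization, target_utilization,
                     min_replicas=1, max_replicas=10, tolerance=0.1):
    """Replica count suggested by one metric, clamped to the HPA bounds."""
    ratio = current_utilization / target_utilization
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas  # close enough to the target: no scaling
    wanted = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, wanted))

def hpa_decision(current_replicas, metrics):
    """With several metrics, the largest proposed replica count wins."""
    return max(desired_replicas(current_replicas, cur, tgt) for cur, tgt in metrics)

# One replica at 180% average CPU (target 50%) and 40% memory (target 70%).
print(hpa_decision(1, [(180, 50), (40, 70)]))  # 4
# Four replicas after the load is gone: 10% CPU and 30% memory.
print(hpa_decision(4, [(10, 50), (30, 70)]))   # 2
```

Utilisation can exceed 100% because it is measured against the request (100m), while the container may use up to its limit (200m).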

Stress Testing Workflow

To validate the autoscaling, a stress-test script generates load on the backend:

Run the stress-test script — the web page becomes unresponsive due to the high load of requests
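Over its next evaluation cycles (by default the HPA re-evaluates metrics every 15 seconds), the HPA detects that CPU/memory thresholds are exceeded and raises the replica count, so new Pod replicas are started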
After a while, the web page becomes responsive again, now showing different Pod IDs on each request — confirming that multiple Pods are serving traffic

The practical workflow for a local deployment (using Minikube):

# Start a local Kubernetes cluster
minikube start

# Enable the metrics server (required for HPA)
minikube addons enable metrics-server

# Apply all manifests
kubectl apply -f k8s/

# Access the application
minikube service backend
# or: kubectl port-forward svc/backend 8080:3000

# Open the monitoring dashboard
minikube dashboard

The examiner will ask

Walk through what happens when a stress-test script hits a Kubernetes-deployed application behind an HPA. Describe the sequence of events from initial overload to recovery, naming each Kubernetes component involved and the data it observes or acts upon.

15. System Monitoring: Observability in Production

So far we have focused on scalability, availability, and reliability — but another critical quality attribute is observability. A production system will eventually go down for some reason; an observable system allows operators to understand what happened and what is happening.

Monitoring Stack

The standard monitoring stack for Kubernetes combines two open-source tools:

Tool	Role
Prometheus	Open-source systems monitoring and alerting toolkit. It scrapes metrics from instrumented applications and Kubernetes components, stores them as time-series data, and provides a powerful query language (PromQL). It can also fire alerts based on threshold conditions.
Grafana	Open-source web interface for analytics and monitoring. It connects to Prometheus (and many other data sources) to visualise time-series data through dashboards, allowing operators to see CPU usage, memory consumption, request rates, error counts, and HPA activity at a glance.

Observability triad

The cloud-native observability stack is often described through three pillars: metrics (quantitative data over time, e.g. Prometheus), logs (structured event records, e.g. Elasticsearch/Fluentd/Kibana or Loki), and traces (end-to-end request flows across services, e.g. Jaeger or OpenTelemetry). Kubernetes integrates with all three but does not mandate any specific tool; the metrics API and the containers' standard output and error streams provide the integration points.

During the stress test, the Grafana dashboard visualises metrics collected by Prometheus, showing the spike in CPU/memory that triggers the HPA, the creation of new Pods, and the eventual stabilisation as load is distributed across replicas.

16. Kompose and Migration Tooling

Kompose is a conversion tool that automates the translation of Docker Compose files into Kubernetes manifests, saving time during migration. It supports Kompose-specific labels within the compose.yml file to explicitly control how resources are generated:

services:
  backend:
    ...
    labels:
      kompose.service.type: nodeport
      kompose.controller.type: deployment
      kompose.image-pull-policy: always

The conversion is a single command:

kompose convert -f docker-compose.yml -o k8s/

Kompose generates Deployment, Service, and other object manifests in the output directory. While automated conversion is a useful starting point, production Kubernetes manifests typically require manual refinement: resource requests and limits, health checks (liveness/readiness probes), security contexts, ConfigMaps for configuration, and Secrets for sensitive data are all Kubernetes-native concepts with no direct Docker Compose equivalent.

Limitations

Kompose handles the mechanical translation of Compose syntax to Kubernetes syntax, but it does not add Kubernetes-specific best practices. A generated manifest lacks probes, resource tuning, pod disruption budgets, affinity rules, and network policies. Treat Kompose output as a starting template, not a production-ready configuration.

17. Putting It All Together: A Production-Ready Deployment

Let's synthesise the complete picture. Moving a distributed application from development to production on Kubernetes involves a deliberate pipeline that maps architectural concerns to Kubernetes abstractions:

Containerize each service with a Dockerfile — produce immutable images pushed to a registry
Define Deployments for each service — specify replicas, resource requests/limits, environment variables, and image pull policy
Define Services for each Deployment — choose the appropriate Service type (ClusterIP for internal, NodePort/LoadBalancer for external)
Configure autoscaling via HPAs — set minimum and maximum replicas, and define metric thresholds that trigger scaling
Apply manifests with kubectl — the control plane's reconciliation loop converges the cluster to the declared desired state
Observe via Prometheus and Grafana — collect metrics, build dashboards, and configure alerts

The result is a system that is:

Quality Attribute	Kubernetes Mechanism
Availability	ReplicaSets ensure multiple Pod instances; Services route around failed instances
Scalability	Horizontal Pod Autoscaler adjusts replica count based on real metrics
Reliability	Self-healing replaces failed containers and reschedules Pods from failed nodes
Recoverability	Declarative desired state + reconciliation loop = automatic recovery from failures
Observability	Prometheus metrics scraping + Grafana dashboards + alerting rules
Maintainability	Declarative YAML manifests under version control; rollout/rollback for zero-downtime updates

The distributed-system kernel

Kubernetes is best understood as a distributed-system kernel: it provides process scheduling (Pods on nodes), inter-process communication (Services + DNS), storage (PersistentVolumes), and isolation and access control (namespaces, cgroups, RBAC) across a cluster of machines. The difference from a traditional OS kernel is that the "processes" are containers, the "machines" are nodes, and failure is treated as a normal condition rather than an exceptional one.

Check Your Understanding

1. What does Kubernetes provide that Docker alone does not? Give at least three concrete mechanisms.

Docker alone provides container packaging, running, and basic networking. Kubernetes adds: (1) automated scheduling — deciding which node runs which Pod based on resource availability and constraints; (2) self-healing — continuously replacing failed containers and rescheduling Pods from failed nodes to healthy ones; (3) declarative autoscaling — automatically adjusting the number of replicas based on CPU, memory, or custom metrics via the HPA; (4) service discovery and load balancing — providing stable IPs/DNS names for ephemeral Pods through Services; (5) rolling updates and rollbacks — deploying new versions with zero downtime and the ability to revert.

2. Explain the difference between the Spec and Status fields of a Kubernetes object. Why is this distinction fundamental?

The Spec is the user-declared desired state. The Status is the current observed state, continuously updated by Kubernetes components. The distinction is fundamental because it enables the reconciliation loop: Kubernetes controllers continuously compare Status to Spec and take actions to close the gap. This is what makes Kubernetes self-healing and declarative — the user does not write imperative scripts; they declare intent, and the system works to make it true. Without this split, Kubernetes would be a static deployment tool rather than a continuously operating control system.
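The Spec/Status split can be illustrated with a toy reconciliation loop. This is a simplified sketch under stated assumptions, not how the real controllers are implemented: `ToyReplicaSet` and `reconcile` are hypothetical names, Pods are modelled as strings, and there is no API server or etcd involved.

```python
from dataclasses import dataclass, field


@dataclass
class ToyReplicaSet:
    """Toy stand-in for a ReplicaSet: spec is the target, status is what exists."""
    spec_replicas: int
    status_pods: list[str] = field(default_factory=list)
    next_id: int = 0


def reconcile(rs: ToyReplicaSet) -> list[str]:
    """One controller pass: act only on the gap between spec and status."""
    actions: list[str] = []
    # Too few Pods: create new ones until the observed count matches the spec.
    while len(rs.status_pods) < rs.spec_replicas:
        name = f"pod-{rs.next_id}"
        rs.next_id += 1
        rs.status_pods.append(name)
        actions.append(f"create {name}")
    # Too many Pods: delete the surplus.
    while len(rs.status_pods) > rs.spec_replicas:
        actions.append(f"delete {rs.status_pods.pop()}")
    return actions


rs = ToyReplicaSet(spec_replicas=3)
print(reconcile(rs))             # ['create pod-0', 'create pod-1', 'create pod-2']
rs.status_pods.remove("pod-1")   # simulate a crashed Pod
print(reconcile(rs))             # ['create pod-3']
print(reconcile(rs))             # [] (status already matches spec)
```

Note that the user never issues "create" or "delete" commands here: changing `spec_replicas` or losing a Pod is enough, and the next pass of the loop closes the gap.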

3. Why does Kubernetes use label selectors to connect Services to Pods rather than IP addresses?

Pods are ephemeral: their IP addresses change every time they are recreated or rescheduled. Hard-coding IPs would break every time a Pod is replaced. Label selectors provide an indirection layer: a Service targets "all Pods with label app=backend" rather than "Pod at 10.244.1.5." The control plane (the endpoints controller) dynamically updates the list of backend endpoints as Pods matching the selector are created, become ready, and are destroyed, so the Service routes only to current, ready Pods. This is the same pattern as DNS for physical hosts: a name stays constant while the underlying address changes.
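The indirection can be sketched in a few lines. This is an illustrative model only: the helper names `selects` and `endpoints` are hypothetical, Pods are plain dictionaries, and only equality-based selectors are handled.

```python
def selects(selector: dict[str, str], labels: dict[str, str]) -> bool:
    """Equality-based match: every selector pair must appear in the Pod's labels."""
    return all(labels.get(key) == value for key, value in selector.items())


def endpoints(selector: dict[str, str], pods: list[dict]) -> list[str]:
    """IPs of ready Pods whose labels satisfy the selector."""
    return [p["ip"] for p in pods if p["ready"] and selects(selector, p["labels"])]


selector = {"app": "backend"}
pods = [
    {"ip": "10.244.1.5", "labels": {"app": "backend"}, "ready": True},
    {"ip": "10.244.1.6", "labels": {"app": "frontend"}, "ready": True},
]
print(endpoints(selector, pods))   # ['10.244.1.5']

# The backend Pod is recreated elsewhere with a new IP; the selector is unchanged.
pods[0] = {"ip": "10.244.2.9", "labels": {"app": "backend"}, "ready": True}
print(endpoints(selector, pods))   # ['10.244.2.9']
```

The client-facing definition (the selector) never changed between the two calls; only the derived endpoint list did.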

4. Describe the step-by-step sequence from "user submits a Deployment manifest" to "Pods are running." Name every component involved.

1. The user runs kubectl apply -f deployment.yaml, which sends the manifest to kube-apiserver. 2. The API server validates the request and writes the Deployment object to etcd. 3. The Deployment controller (part of kube-controller-manager) observes the new Deployment and creates a ReplicaSet object through the API server. 4. The ReplicaSet controller observes the new ReplicaSet and creates Pod objects with no assigned node. 5. The kube-scheduler watches for unassigned Pods, selects a suitable worker node for each based on resource availability and constraints, and records the node assignment through the API server. 6. The kubelet on the assigned node observes that a Pod has been scheduled to it and instructs the container runtime (via CRI) to pull the image and start the containers. 7. The kubelet reports the Pod's status back to the API server, which persists it in the Status field in etcd. Throughout, only the API server talks to etcd directly; every other component reads and writes through it.

5. When is a StatefulSet the right choice over a Deployment? Explain with a concrete example.

A StatefulSet is the right choice when Pods need stable, unique identities that persist across rescheduling. Concretely: a three-node PostgreSQL cluster where each instance has its own data directory and a distinct role (primary or replica). With a Deployment, if a Pod dies, the replacement gets a new name and IP, and nothing ties it back to the volume and role of the Pod it replaced. A StatefulSet assigns each Pod a stable ordinal name (postgres-0, postgres-1, postgres-2), a stable network identity, and stable storage: when postgres-1 dies, its replacement is also named postgres-1 and gets the same PersistentVolume. Deployments treat all Pods as interchangeable; StatefulSets preserve identity.
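The naming scheme is deterministic, which is what makes the identities stable. The sketch below derives the Pod name, DNS name, and PersistentVolumeClaim name for each ordinal. The function name is hypothetical, and the headless Service name (`postgres`), namespace (`default`), volume claim template name (`data`), and cluster domain (`cluster.local`) are assumptions chosen for illustration.

```python
def statefulset_identities(
    name: str,
    service: str,
    namespace: str,
    claim_template: str,
    replicas: int,
    cluster_domain: str = "cluster.local",
) -> list[tuple[str, str, str]]:
    """Return (pod name, DNS name, PVC name) for every ordinal of a StatefulSet."""
    identities = []
    for ordinal in range(replicas):
        pod = f"{name}-{ordinal}"                                   # stable ordinal name
        dns = f"{pod}.{service}.{namespace}.svc.{cluster_domain}"   # via headless Service
        pvc = f"{claim_template}-{pod}"                             # claim bound to this identity
        identities.append((pod, dns, pvc))
    return identities


for identity in statefulset_identities("postgres", "postgres", "default", "data", 3):
    print(identity)
```

Because the names depend only on the StatefulSet definition and the ordinal, a replacement for postgres-1 computes to exactly the same name, DNS entry, and claim as the Pod it replaces.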

6. A Deployment has replicas: 3. One worker node crashes entirely. Describe what Kubernetes does, step by step, and why.

1. The kubelet on the crashed node stops sending heartbeats to the API server. After a timeout (default ~40 seconds), the node controller marks the node as NotReady. 2. The status of the Pods on that node becomes Unknown, since no kubelet is left to report on them. 3. After a further grace period (five minutes with the default tolerations), the Pods on that node are evicted. 4. The ReplicaSet controller observes that the number of healthy Pods is now below the desired number (3 from the Deployment spec). Only the Pods that ran on the crashed node are lost: if one of the three replicas was there, two remain. 5. The ReplicaSet controller creates one replacement Pod object for each lost Pod. 6. The scheduler assigns the replacements to healthy nodes (assuming sufficient capacity). 7. The kubelet on each target node starts the new containers. The system converges back to 3 replicas. If a Service is fronting these Pods, the endpoint list is updated to remove the dead Pods and add the new ones once they are ready, and kube-proxy adjusts its routing rules accordingly, so traffic keeps flowing to the surviving replicas (2/3 of capacity in this example) during the transition.

7. What is the role of the Container Runtime Interface (CRI), and why was it important for Kubernetes to define it?

The CRI is a plugin interface that decouples Kubernetes from any specific container runtime. Originally the kubelet was tightly coupled to Docker; once the CRI existed, Docker remained supported through an in-tree adapter called dockershim. The CRI allows any runtime that implements the interface (containerd, CRI-O, Mirantis Container Runtime) to be used interchangeably. This was important for three reasons: (1) it prevented vendor lock-in to Docker; (2) it allowed the community to develop purpose-built runtimes optimized for Kubernetes (like CRI-O); (3) when Kubernetes deprecated dockershim in version 1.20 and later removed it in 1.24, the ecosystem could transition smoothly because all major runtimes already supported CRI.

8. What happens during a rolling update of a Deployment from nginx:1.14 to nginx:1.16? How does Kubernetes prevent downtime?

The Deployment controller creates a new ReplicaSet for the updated Pod template. It then gradually scales up the new ReplicaSet while scaling down the old one, respecting the maxSurge and maxUnavailable parameters (both default to 25%). Typically: one new Pod is created; once it is healthy and ready, one old Pod is terminated; this repeats until all Pods are from the new ReplicaSet. The Service continues routing traffic to both old and new Pods during the transition, and because the number of available Pods never drops below the desired count minus maxUnavailable, serving capacity is preserved throughout; setting maxUnavailable to 0 keeps the full desired count available at all times. If the new Pods fail their readiness checks, the rollout stalls and the remaining old Pods keep serving traffic. Kubernetes does not roll back on its own: the operator (or external tooling) reverts to the previous ReplicaSet with kubectl rollout undo.
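The interplay of maxSurge and maxUnavailable can be traced with a small simulation. This is a deliberately simplified sketch: the function `rolling_update` is hypothetical, both parameters are given as absolute Pod counts rather than percentages, and every new Pod is assumed to become ready immediately, which a real cluster does not guarantee.

```python
def rolling_update(replicas: int, max_surge: int, max_unavailable: int) -> list[tuple[int, int]]:
    """Trace (old_pods, new_pods) through a rolling update.

    Two invariants are enforced at every step:
      total Pods      <= replicas + max_surge
      available Pods  >= replicas - max_unavailable
    """
    if max_surge == 0 and max_unavailable == 0:
        raise ValueError("maxSurge and maxUnavailable cannot both be zero")
    old, new = replicas, 0
    steps = [(old, new)]
    while old > 0 or new < replicas:
        # Scale up the new ReplicaSet as far as the surge budget allows.
        add = min(replicas - new, replicas + max_surge - (old + new))
        if add > 0:
            new += add
            steps.append((old, new))
        # Scale down the old ReplicaSet without dropping below the availability floor.
        remove = min(old, (old + new) - (replicas - max_unavailable))
        if remove > 0:
            old -= remove
            steps.append((old, new))
    return steps


print(rolling_update(replicas=3, max_surge=1, max_unavailable=0))
# [(3, 0), (3, 1), (2, 1), (2, 2), (1, 2), (1, 3), (0, 3)]
```

With maxSurge=1 and maxUnavailable=0 the total never falls below 3, which is the "one new Pod up, then one old Pod down" pattern described above.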

9. Why are Kompose-generated Kubernetes manifests not production-ready? List at least four missing concerns.

Kompose performs a mechanical syntax translation from Compose to Kubernetes. It typically misses: (1) liveness and readiness probes — without them, Kubernetes cannot detect hung vs. healthy containers; (2) resource requests and limits — without them, the scheduler cannot make informed placement decisions; (3) security contexts — Compose has no concept of PodSecurityContext, non-root users, or read-only root filesystems; (4) Pod disruption budgets — to protect availability during voluntary disruptions like node drains; (5) ConfigMaps and Secrets — Compose uses environment variables or files; Kubernetes has first-class objects for configuration and secrets management; (6) network policies — Compose networks are permissive by default; production Kubernetes should restrict inter-service communication.

10. Explain the three pillars of observability (metrics, logs, traces) and map each to the tools discussed in the lecture.

Metrics: quantitative data over time (CPU usage, request rate, error count). Collected by Prometheus, which scrapes endpoints and stores time-series data. Visualised in Grafana dashboards. Logs: structured or semi-structured event records emitted by applications and system components. The lecture references monitoring integration points; the standard Kubernetes logging stack includes Fluentd (collection) + Elasticsearch (storage) + Kibana (visualisation) — the EFK stack — though not covered in detail in this lecture. Traces: end-to-end request flows across services. While not covered in the lecture slides, tools like Jaeger or OpenTelemetry integrate with Kubernetes to provide distributed tracing, showing latency breakdowns per service hop.

11. (Oral-style) Prove or argue: "If a Kubernetes Service of type ClusterIP has a selector that matches zero Pods, the Service is useless." Is this always true?

Not always. If the selector matches zero Pods, the Service's endpoint list is empty — any request to the ClusterIP will be rejected (connection refused or timeout), so it cannot forward traffic. However, the Service still provides a stable DNS name and ClusterIP that can exist before Pods are ready. This is useful for bootstrapping: you can create all Services first, then deploy Pods later. Once Pods matching the selector appear, the Service starts routing to them automatically — no reconfiguration needed. So the Service is not useless; it is an inert name reservation waiting for backends. This is standard practice in GitOps workflows where manifests are applied in dependency order.

12. (Oral-style) Describe a scenario where the HPA's CPU threshold is 50% and the maxReplicas is set to 10, but the system still becomes unresponsive under extreme load. What could be limiting the system?

Several possibilities: (1) Downstream bottleneck: the backend may be waiting on Redis or a database that cannot scale, so adding more backend replicas does not improve throughput; (2) Node resource exhaustion: if all nodes in the cluster are at capacity and the cluster autoscaler cannot add new nodes fast enough (or at all), new Pods remain in Pending state; (3) Scaling lag: metrics are collected and evaluated periodically (the HPA controller re-evaluates every 15 seconds by default), and new Pods still need time to be scheduled, pull their image, and become ready, so a sudden spike can overwhelm the system before extra capacity arrives; (4) Connection pool saturation: each Pod may have a limited connection pool to an external resource, and adding Pods saturates the downstream faster; (5) CPU is not the bottleneck: the real constraint might be I/O, network bandwidth, or file descriptors, and the HPA is watching the wrong metric; (6) The replica ceiling itself: once maxReplicas (10) is reached, the HPA stops scaling no matter how high utilisation climbs.
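The ceiling effect follows directly from the scaling rule the HPA applies: the desired replica count is the current count multiplied by the ratio of observed to target metric, rounded up and then clamped to the configured bounds. The sketch below implements only that core rule; the function name and the sample numbers are illustrative, and refinements such as the tolerance band and the handling of not-yet-ready Pods are omitted.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_utilization: int,
    target_utilization: int,
    min_replicas: int,
    max_replicas: int,
) -> int:
    """Core HPA rule: ceil(current * observed / target), clamped to [min, max]."""
    raw = math.ceil(current_replicas * current_utilization / target_utilization)
    return max(min_replicas, min(max_replicas, raw))


# 4 replicas at 90% CPU against a 50% target: scale out to 8.
print(desired_replicas(4, 90, 50, min_replicas=2, max_replicas=10))    # 8

# 10 replicas at 95% CPU: the rule asks for 19, but maxReplicas caps it at 10.
print(desired_replicas(10, 95, 50, min_replicas=2, max_replicas=10))   # 10
```

In the second call the system is already saturated and the autoscaler has nothing left to give, which is the situation the question describes.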

Addressing production-ready distributed systems through Kubernetes
