Cilium Architecture

Cilium is an eBPF-powered platform for Kubernetes networking, security, load balancing, and observability. The agent runs as a DaemonSet on each node, programming the kernel’s eBPF datapath directly for high-performance packet processing. A dedicated operator reconciles cluster-wide state, while modular components handle policies, identities, and multi-cluster federation.

High-Level Architecture

graph TD
  K8s[Kubernetes API Server]
  CRDs["Custom Resources<br/>(CNP, CEP, CIDR, etc.)"]
  Op[Cilium Operator<br/>pkg/identity, pkg/cidr]
  Agent[Cilium Agent<br/>daemon/main.go<br/>pkg/hive]
  Datapath[eBPF Datapath<br/>bpf/, pkg/datapath]
  Proxy[Envoy Proxy<br/>L7 Policies<br/>pkg/proxy]
  Endpoints[Endpoints/Pods]
  KV["KVStore<br/>(etcd/consul)"]
  Hubble[Hubble Relay/UI<br/>hubble/]
  ClusterMesh[ClusterMesh<br/>pkg/clustermesh]

  K8s -->|Watch CRDs| Op
  K8s -->|Watch Endpoints/Svcs| Agent
  Op -->|Allocate IDs, CIDRs| KV
  Op -.->|Reconcile| Agent
  Agent -->|Load/Control| Datapath
  Agent -->|Access Maps| KV
  Agent -->|Configure| Proxy
  Endpoints <-->|Packets| Datapath
  Datapath <-->|Flows/Metrics| Hubble
  Datapath -->|L7 Redirect| Proxy
  Agent <-->|Federation| ClusterMesh
  ClusterMesh <-->|Remote KV| KV

This diagram illustrates the core flow: Kubernetes resources drive the operator and agent, which program the eBPF datapath. Flow data from the datapath feeds Hubble for observability, and Envoy handles L7 policy enforcement.

Component Breakdown

Cilium Agent

Responsibility: Node-local controller that manages endpoints, loads eBPF programs and maps, enforces policies, handles service load balancing (replacing kube-proxy), and coordinates with the operator via the KVStore. Uses a cell-based architecture via pkg/hive for modular lifecycle management (start/stop/dependencies).

Key Files/Directories:

  • daemon/: agent entry point (daemon/main.go) and command setup.
  • pkg/hive: cell-based lifecycle and dependency wiring.
  • pkg/endpoint: endpoint state and regeneration.
  • pkg/k8s: Kubernetes watchers and resource handling.

Interfaces: Watches K8s resources via informers, reads/writes the KVStore (pkg/kvstore), controls eBPF via github.com/cilium/ebpf (pinned maps in the BPF filesystem), and redirects L7 traffic to Envoy.
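
To make the cell pattern concrete, here is a minimal, self-contained Go sketch of the idea: constructors declare their dependencies, register start/stop hooks on a shared lifecycle, and run in dependency order. It illustrates the pattern only, not the actual pkg/hive API (which derives the ordering via dependency injection rather than the manual wiring shown here).

```go
// Simplified sketch of the cell pattern used by pkg/hive: cells declare the
// constructors they provide and the hooks they register, and the framework
// wires them into a dependency graph with ordered start/stop. This is an
// illustrative stand-in, not the real pkg/hive API.
package main

import (
	"context"
	"fmt"
)

// Hook is a start/stop pair tied to one cell, run in dependency order.
type Hook struct {
	OnStart func(context.Context) error
	OnStop  func(context.Context) error
}

// Lifecycle collects hooks as cells are constructed.
type Lifecycle struct{ hooks []Hook }

func (lc *Lifecycle) Append(h Hook) { lc.hooks = append(lc.hooks, h) }

// IdentityAllocator and EndpointManager stand in for real agent cells;
// EndpointManager depends on IdentityAllocator, mirroring the DAG described
// under Key Design Decisions below.
type IdentityAllocator struct{}

func newIdentityAllocator(lc *Lifecycle) *IdentityAllocator {
	a := &IdentityAllocator{}
	lc.Append(Hook{
		OnStart: func(context.Context) error { fmt.Println("identity allocator started"); return nil },
		OnStop:  func(context.Context) error { fmt.Println("identity allocator stopped"); return nil },
	})
	return a
}

type EndpointManager struct{ ids *IdentityAllocator }

func newEndpointManager(lc *Lifecycle, ids *IdentityAllocator) *EndpointManager {
	m := &EndpointManager{ids: ids}
	lc.Append(Hook{
		OnStart: func(context.Context) error { fmt.Println("endpoint manager started"); return nil },
	})
	return m
}

func main() {
	lc := &Lifecycle{}
	// Constructors run in dependency order; hive derives this order
	// automatically from constructor signatures instead of hard-coding it.
	ids := newIdentityAllocator(lc)
	_ = newEndpointManager(lc, ids)

	ctx := context.Background()
	for _, h := range lc.hooks { // start in order
		if h.OnStart != nil {
			_ = h.OnStart(ctx)
		}
	}
	for i := len(lc.hooks) - 1; i >= 0; i-- { // stop in reverse order
		if h := lc.hooks[i]; h.OnStop != nil {
			_ = h.OnStop(ctx)
		}
	}
}
```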

eBPF Datapath

Responsibility: Kernel-space packet processing for L3/L4 forwarding, NAT, encryption (WireGuard/IPsec), and identity-aware filtering. Bypasses iptables/conntrack, using BPF hash maps for policy and service lookups to scale.

Key Files/Directories:

  • bpf/: eBPF C programs (e.g., bpf_lxc.c for per-endpoint traffic, bpf_sock.c for socket-level operations).
  • pkg/datapath/linux: Go loader for BPF objects, map population.
  • bpf/bpftool/: Custom helpers for map gen.

Interfaces: The agent populates maps (e.g., cilium_ipcache, cilium_policy) via bpf.Map.Update(). Programs hook into TC (qdisc), XDP (NIC), cgroups (socket ops/egress), and tracepoints. Events and metrics are exported to the agent via perf ring buffers.
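
A minimal sketch of the userspace side of this interface, using github.com/cilium/ebpf to open a pinned map and upsert an entry. The map path and the key/value structs are simplified stand-ins for illustration; the real cilium_ipcache layout (prefix lengths, cluster IDs, flags) differs.

```go
// Minimal sketch of how a userspace agent can drive a pinned BPF map with
// github.com/cilium/ebpf. The map name and the key/value layout here are
// simplified stand-ins, not the real cilium_ipcache structs.
package main

import (
	"log"

	"github.com/cilium/ebpf"
)

// Hypothetical key/value layout: map a /32 IPv4 address to a numeric
// security identity. Fields must be fixed-size so the library can marshal them.
type ipKey struct {
	Addr [4]byte
}

type identityValue struct {
	Identity uint32
}

func main() {
	// Open a map that a BPF program pinned under the BPF filesystem
	// (the agent pins its maps under /sys/fs/bpf/tc/globals).
	m, err := ebpf.LoadPinnedMap("/sys/fs/bpf/tc/globals/example_ipcache", nil)
	if err != nil {
		log.Fatalf("opening pinned map: %v", err)
	}
	defer m.Close()

	key := ipKey{Addr: [4]byte{10, 0, 1, 23}}
	val := identityValue{Identity: 4242}

	// UpdateAny creates or overwrites the entry; the BPF program sees the new
	// value on its next lookup, with no program reload required.
	if err := m.Update(&key, &val, ebpf.UpdateAny); err != nil {
		log.Fatalf("updating map: %v", err)
	}

	var got identityValue
	if err := m.Lookup(&key, &got); err != nil {
		log.Fatalf("lookup: %v", err)
	}
	log.Printf("identity for 10.0.1.23: %d", got.Identity)
}
```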

Cilium Operator

Responsibility: Cluster-wide reconciliation for identities, CIDR allocation, background policy compilation, and CRD status updates. Largely stateless; multiple replicas can run for availability, with leader election determining the active instance.

Key Files/Directories:

  • operator/: operator entry point and cluster-wide controllers.
  • pkg/identity, pkg/cidr: identity and CIDR allocation logic (see diagram above).

Interfaces: Uses client-go informers on CRDs like CiliumNetworkPolicy. Publishes to KVStore for agent consumption. Leader election via leases.
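
A sketch of the CRD watch mechanics using a client-go dynamic informer on CiliumNetworkPolicy. Cilium itself uses its generated typed clientset; the dynamic client is used here only to keep the example self-contained.

```go
// Sketch of watching CiliumNetworkPolicy objects with a client-go dynamic
// informer, the general mechanism behind the operator's and agent's CRD
// watches. Illustrative only; Cilium uses generated typed clients in-tree.
package main

import (
	"log"
	"time"

	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/dynamic"
	"k8s.io/client-go/dynamic/dynamicinformer"
	"k8s.io/client-go/tools/cache"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		log.Fatal(err)
	}
	client, err := dynamic.NewForConfig(cfg)
	if err != nil {
		log.Fatal(err)
	}

	// CiliumNetworkPolicy is served by the cilium.io/v2 API group.
	cnpGVR := schema.GroupVersionResource{
		Group: "cilium.io", Version: "v2", Resource: "ciliumnetworkpolicies",
	}

	factory := dynamicinformer.NewDynamicSharedInformerFactory(client, 10*time.Minute)
	informer := factory.ForResource(cnpGVR).Informer()
	informer.AddEventHandler(cache.ResourceEventHandlerFuncs{
		AddFunc: func(obj interface{}) {
			cnp := obj.(*unstructured.Unstructured)
			log.Printf("CNP added: %s/%s", cnp.GetNamespace(), cnp.GetName())
		},
		UpdateFunc: func(_, newObj interface{}) {
			cnp := newObj.(*unstructured.Unstructured)
			log.Printf("CNP updated: %s/%s", cnp.GetNamespace(), cnp.GetName())
		},
	})

	stop := make(chan struct{})
	defer close(stop)
	factory.Start(stop)
	factory.WaitForCacheSync(stop)
	select {} // keep watching
}
```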

Envoy Proxy (L7 Integration)

Responsibility: Handles L7 policy enforcement (HTTP/gRPC/Kafka), transparent proxying via eBPF redirects.

Key Files/Directories:

  • pkg/proxy: Manages Envoy instances, generates configs.
  • github.com/cilium/proxy (go.mod dependency): Envoy control-plane integration.

Interfaces: eBPF redirects sockets to Envoy listeners. Agent configures via xDS (gRPC) using go-control-plane.

Hubble (Observability)

Responsibility: Collects flow logs and metrics from the eBPF datapath and provides L7 visibility. Relay aggregates per-node data and serves it to the UI/CLI.

Key Files/Directories:

  • hubble/: Relay and observability tooling (see diagram above).
  • api/v1/observer: gRPC flow API exposed by each agent.

Interfaces: Each agent exposes a gRPC flow API; Relay connects to the per-node servers and aggregates their flows. Uses statedb for efficient querying.
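
A sketch of a client consuming the Observer API from github.com/cilium/cilium/api/v1/observer, the same gRPC service Relay and the CLI use. The localhost:4245 address assumes Relay is reachable locally (its default port); adjust for your setup.

```go
// Sketch of streaming flows from the Hubble Observer gRPC API.
package main

import (
	"context"
	"io"
	"log"

	observerpb "github.com/cilium/cilium/api/v1/observer"
	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
)

func main() {
	// Assumes Hubble Relay (or a local Hubble server) listens on 4245 without TLS.
	conn, err := grpc.Dial("localhost:4245", grpc.WithTransportCredentials(insecure.NewCredentials()))
	if err != nil {
		log.Fatalf("dialing hubble: %v", err)
	}
	defer conn.Close()

	client := observerpb.NewObserverClient(conn)

	// Follow the live flow stream, similar to `hubble observe --follow`.
	stream, err := client.GetFlows(context.Background(), &observerpb.GetFlowsRequest{
		Follow: true,
	})
	if err != nil {
		log.Fatalf("GetFlows: %v", err)
	}

	for {
		resp, err := stream.Recv()
		if err == io.EOF {
			return
		}
		if err != nil {
			log.Fatalf("stream: %v", err)
		}
		// Responses can also carry node status or lost-event notices; only
		// print actual flows here.
		if f := resp.GetFlow(); f != nil {
			log.Printf("%s -> %s verdict=%s",
				f.GetIP().GetSource(), f.GetIP().GetDestination(), f.GetVerdict())
		}
	}
}
```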

KVStore & ClusterMesh

Responsibility: Shared state (identities, services) across agents/nodes/clusters. ClusterMesh enables global policies.

Key Files/Directories:

  • pkg/kvstore: etcd/consul client backends and watch helpers.
  • pkg/clustermesh: cross-cluster federation of services and identities.

Interfaces: etcd watches trigger endpoint regenerations in the agents. Optimistic concurrency via etcd revisions.
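
A sketch of both patterns with the plain etcd v3 client: a revision-based compare-and-swap for optimistic concurrency and a prefix watch of the kind that triggers regenerations. The key prefix is illustrative, not necessarily Cilium's exact key schema.

```go
// Sketch of the two KVStore interaction patterns: prefix watch and
// revision-based compare-and-swap, using go.etcd.io/etcd/client/v3 directly.
package main

import (
	"context"
	"log"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
)

func main() {
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"127.0.0.1:2379"},
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		log.Fatal(err)
	}
	defer cli.Close()

	ctx := context.Background()
	const key = "cilium/state/identities/v1/id/4242" // illustrative key

	// Read the current value and remember its revision for the CAS below.
	get, err := cli.Get(ctx, key)
	if err != nil {
		log.Fatal(err)
	}
	var rev int64
	if len(get.Kvs) > 0 {
		rev = get.Kvs[0].ModRevision
	}

	// Optimistic concurrency: only write if nobody changed the key since we read it.
	txn, err := cli.Txn(ctx).
		If(clientv3.Compare(clientv3.ModRevision(key), "=", rev)).
		Then(clientv3.OpPut(key, `{"labels":{"app":"web"}}`)).
		Commit()
	if err != nil {
		log.Fatal(err)
	}
	log.Printf("CAS succeeded: %v", txn.Succeeded)

	// Prefix watch: every change under the prefix shows up as an event, which
	// is how agents learn about new identities and trigger endpoint regeneration.
	for wresp := range cli.Watch(ctx, "cilium/state/identities/", clientv3.WithPrefix()) {
		for _, ev := range wresp.Events {
			log.Printf("event %s key=%s", ev.Type, ev.Kv.Key)
		}
	}
}
```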

Data Flow

Typical pod-to-pod policy-enforced packet flow (sequence diagram):

sequenceDiagram
  participant PodA as Pod A (TX)
  participant TC as TC Ingress (Kernel)
  participant BPF as eBPF Prog<br/>bpf/bpf_lxc.c
  participant Maps as BPF Maps<br/>(ipcache, policy)
  participant Agent as Cilium Agent
  participant PodB as Pod B (RX)

  PodA->>TC: send packet (dst IP)
  TC->>BPF: tail call entrypoint
  BPF->>Maps: lookup src/dst security identity
  alt No identity
    Maps->>Agent: async notify (ipcache miss)
    Agent->>Maps: populate from KVStore/CRDs
  end
  Maps->>BPF: policy verdict (allow/deny/log)
  alt Denied
    BPF->>BPF: drop/redirect
  else Allowed
    BPF->>BPF: decap/NAT/forward
    BPF->>PodB: deliver
    Note over BPF: Perf ring sample -> Agent -> Hubble
  end

On identity/policy changes: K8s CNP -> Operator -> KVStore -> Agent watch -> endpoint regeneration -> map updates -> BPF programs pick up the new state on their next map lookups.
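
The perf-ring note in the diagram corresponds to a userspace reader draining per-CPU rings. A minimal sketch with github.com/cilium/ebpf/perf follows; the pinned map path is an assumption, and the agent's monitor subsystem decodes typed trace/drop events rather than raw bytes.

```go
// Sketch of consuming datapath samples from a perf ring buffer, the mechanism
// behind the "Perf ring sample -> Agent -> Hubble" note above. The pinned map
// path is an assumption for illustration.
package main

import (
	"errors"
	"log"
	"os"

	"github.com/cilium/ebpf"
	"github.com/cilium/ebpf/perf"
)

func main() {
	// Open the PERF_EVENT_ARRAY map that the datapath writes samples into.
	events, err := ebpf.LoadPinnedMap("/sys/fs/bpf/tc/globals/cilium_events", nil)
	if err != nil {
		log.Fatalf("opening events map: %v", err)
	}
	defer events.Close()

	// One ring per CPU; the second argument is the per-CPU buffer size.
	rd, err := perf.NewReader(events, 8*os.Getpagesize())
	if err != nil {
		log.Fatalf("creating perf reader: %v", err)
	}
	defer rd.Close()

	for {
		rec, err := rd.Read()
		if err != nil {
			if errors.Is(err, perf.ErrClosed) {
				return
			}
			log.Fatalf("reading perf ring: %v", err)
		}
		if rec.LostSamples > 0 {
			log.Printf("lost %d samples (ring overflow)", rec.LostSamples)
			continue
		}
		// In the agent these raw bytes are decoded into trace/drop/debug events
		// and forwarded to Hubble; here we just report the sample size.
		log.Printf("cpu=%d sample=%d bytes", rec.CPU, len(rec.RawSample))
	}
}
```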

Key Design Decisions

  • eBPF Datapath (Kernel/User-Space Split): The Go agent in userspace programs kernel eBPF bytecode and maps for zero-copy, high-scale packet processing (scales to millions of policy entries). Trade-off: kernel version lock-in (4.9+ at minimum), but it enables kube-proxy replacement with O(1) map lookups instead of linear iptables chains. Clever: tail calls into per-endpoint policy programs let policy logic be swapped without reloading the whole datapath.

  • Hive Cell Architecture (pkg/hive): The agent is composed of modular “cells” with declared dependencies and a DAG-ordered lifecycle, e.g., the EndpointManager cell depends on the IdentityAllocator (see the lifecycle sketch in the Cilium Agent section above). Pattern: dependency injection + lifecycle hooks. Trade-off: some abstraction overhead, in exchange for the ability to restart cells independently.

  • Identity-Based Security: Policies on labels/identities (not IPs), allocated cluster-wide by operator. Clever: Revocable, multi-cluster friendly. Trade-off: Requires KVStore coordination (etcd latency).

  • StateDB (github.com/cilium/statedb): Embedded MVCC store for efficient diffs/queries (e.g., Hubble). Pattern: event-sourced state. Avoids thundering-herd reprocessing of full K8s API state on every change.

  • Monolith-per-Node with Microservices: The agent is a single process per node for performance but internally cell-modular; the operator handles cluster-wide work and scales out. No central database; the KVStore provides loose coupling. Trade-off: watches give eventual rather than strong consistency, in exchange for performance.

  • Multi-Cluster (ClusterMesh): KVStore federation without VPNs. Notable: Global service discovery/policies via identity sharing.

This design prioritizes performance/scalability (eBPF), Kubernetes-native (CRDs/informers), and extensibility (maps as APIs). For deep dives, trace hive.Lifecycle in agent startup or BPF map pins in /sys/fs/bpf/tc/globals.