# Cilium Codebase Walkthrough
Cilium is an eBPF-based networking, observability, and security platform for Kubernetes. The codebase is a monorepo that builds multiple binaries (agent, operator, CLI, etc.) via the `Makefile`, which orchestrates subdirectories like `daemon`, `operator`, and `bpf`. It leverages Hive (a custom DI/lifecycle framework), `cilium/ebpf`, and `statedb` for state management. The design emphasizes modularity (cells), kernel/userspace sharing via eBPF maps, and Kubernetes reconciliation.
## 1. Where Execution Starts
Cilium has 10+ entry points for distinct binaries and services. Each `main()` follows the same pattern: flag parsing, logging setup, config validation, Hive lifecycle boot, and a run loop. There is no single monolith: the daemon is the core agent (it runs on every node), the operator manages cluster-wide state, and Hubble handles observability.
### Primary Entry Points
| Binary | Purpose | Entry Function | Startup Flow |
|---|---|---|---|
| Daemon (Agent) | Node-local eBPF datapath, endpoints, policies | `daemon/main.go:main` | Flags → `NewDaemonCommand()` (Cobra CLI) → Hive cells → `Run()` loop with signal handling |
| Operator | K8s controller for identities, CNI, CRDs | `operator/main.go:main` | Flags → `NewOperatorCommand()` → Hive → K8s client init → reconcile loop |
| Hubble (Observability) | Flow/metrics export | `hubble/main.go:main` | Flags → gRPC server → Relay connection |
| Hubble Relay | Aggregates Hubble flows | `hubble-relay/main.go:main` | Flags → metrics/gRPC servers → daemon connections |
| CLI | `cilium` tool | `cilium-cli/cmd/cilium/main.go:main` | Cobra CLI tree → commands like `status`, `install` |
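
All of these binaries share the same Cobra-rooted boot shape. Here is a minimal sketch of that pattern; the command name and flag are illustrative, not the actual `NewDaemonCommand()` implementation:

```go
package main

import (
	"fmt"
	"os"

	"github.com/spf13/cobra"
)

func main() {
	// Root command: flags are parsed and validated, then the agent loop runs.
	rootCmd := &cobra.Command{
		Use:   "cilium-agent",
		Short: "Run the node-local agent",
		RunE: func(cmd *cobra.Command, args []string) error {
			debug, _ := cmd.Flags().GetBool("debug")
			fmt.Println("starting agent, debug =", debug)
			return nil // the real agent would boot Hive and block here
		},
	}
	rootCmd.Flags().Bool("debug", false, "enable debug logging")

	if err := rootCmd.Execute(); err != nil {
		os.Exit(1)
	}
}
```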
Daemon Startup Deep Dive (core path):

1. `main()` in `daemon/main.go` parses CLI flags/env (e.g., `--config`, `--debug`).
2. Initializes logging via `lumberjack` (a `go.mod` dep).
3. Calls `cmd.NewDaemonCommand()` (Cobra root) → executes subcommands or defaults to agent mode.
4. Boots Hive in `pkg/daemon/hive/main.go` (or equivalent): `hive.New(...)` registers cells (e.g., datapath, policy), then `h.Run(ctx)` starts the lifecycle (Init → Start → Stop hooks).
5. Enters the signal loop (`SIGTERM` → graceful shutdown via context cancel).
```mermaid
flowchart TD
    A["main()"] --> B["Parse flags/env<br>Validate config"]
    B --> C["Init logging/metrics"]
    C --> D["hive.New(hive.Group{...})<br>Register cells"]
    D --> E["h.Run(ctx)"]
    E --> F["Cells: Init() → Start()"]
    F --> G["Main loop:<br>Watch K8s, process events"]
    G --> H["SIGTERM → h.Shutdown()"]
    style A fill:#f9f
```
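The flow above reduces to ordered start hooks plus signal-driven teardown. A stdlib-only sketch of that shape (this is the pattern, not the real Hive API):

```go
package main

import (
	"context"
	"fmt"
	"os/signal"
	"syscall"
)

// hook mimics a cell's lifecycle contribution: a start and a stop function.
type hook struct {
	name        string
	start, stop func(context.Context) error
}

func main() {
	// Cells register hooks at wiring time; the runtime executes them in order.
	hooks := []hook{
		{"logging", startNoop, stopNoop},
		{"datapath", startNoop, stopNoop},
		{"endpoint-manager", startNoop, stopNoop},
	}

	// SIGTERM/SIGINT cancel the context, which is the shutdown signal.
	ctx, cancel := signal.NotifyContext(context.Background(), syscall.SIGTERM, syscall.SIGINT)
	defer cancel()

	for _, h := range hooks {
		if err := h.start(ctx); err != nil {
			fmt.Println("start failed:", h.name, err)
			return
		}
	}

	<-ctx.Done() // the main loop would live here; we just wait for a signal

	// Stop hooks run in reverse registration order, like hive's lifecycle.
	for i := len(hooks) - 1; i >= 0; i-- {
		_ = hooks[i].stop(context.Background())
	}
}

func startNoop(context.Context) error { return nil }
func stopNoop(context.Context) error  { return nil }
```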
Trade-off: Multi-binary reduces attack surface but increases deployment complexity (Helm charts handle this).
## 2. Core Abstractions
Cilium’s internals revolve around modularity via Hive cells, the eBPF datapath, and state machines (endpoints, identities). Key design point: userspace orchestrates the kernel via eBPF maps (shared memory, so hot paths avoid syscalls).
### Key Types/Interfaces
- Hive Cell: Modular unit with `lifecycle.Lifecycle`, params, and deps. E.g., `DatapathCell` provides `*Datapath`.
- Endpoint: Stateful pod representation (`pkg/endpoint.Endpoint`). Tracks identity, policy, and status.
- Identity: Labels → numeric ID (`pkg/identity.Identity`). Cached in BPF maps.
- PolicyRepository: Enforces L3-L7 rules (`pkg/policy/api`).
- BPF Maps: Userspace wrappers around `ebpf.Map` (e.g., `sockops`, `neighbors`).
- StateDB: Transactional in-memory DB for identities/endpoints (`pkg/statedb`).
```mermaid
classDiagram
    class Hive {
        +Run(ctx context.Context)
        +Shutdown()
        +Group(map[string]Cell)
    }
    class Cell {
        +Params() Params
        +Init(lf Lifecycle) (interface{}, error)
        +Start(lf Lifecycle) error
        +Deps() []Ref
    }
    class Datapath {
        +Loader
        +Maps (ebpf.Map)
        +Regenerate()
    }
    class Endpoint {
        +ID uint16
        +Identity *identity.Identity
        +Policy *PolicyComputationResult
        +Regenerate()
    }
    class Identity {
        +ID NumericIdentity
        +Labels Labels
    }
    Hive "1" o-- "*" Cell : manages
    Cell ..> Datapath : provides
    Datapath "1" o-- "*" Endpoint : owns
    Endpoint --> Identity : has
    Endpoint --> PolicyRepository : queries
```
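Translated into Go, the diagram corresponds roughly to stubs like these (illustrative only; the real definitions in `pkg/hive`, `pkg/endpoint`, and `pkg/identity` carry far more state and different signatures):

```go
package walkthrough

import "context"

// Lifecycle collects start/stop hooks, as cells see it.
type Lifecycle interface {
	Append(start, stop func(context.Context) error)
}

// Cell is the modular unit: it contributes hooks and declares deps.
type Cell interface {
	Init(lf Lifecycle) error
}

// NumericIdentity is the kernel-visible identity number.
type NumericIdentity uint32

// Identity maps a label set to a numeric ID cached in BPF maps.
type Identity struct {
	ID     NumericIdentity
	Labels map[string]string
}

// Endpoint is the stateful pod representation owned by the datapath.
type Endpoint struct {
	ID       uint16
	Identity *Identity
}

// Regenerate recomputes policy and reloads BPF programs for the endpoint.
func (e *Endpoint) Regenerate(ctx context.Context) error {
	// Real implementation: compute policy, compile/load progs, swap maps.
	return nil
}
```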
Clever Patterns:

- Hive DI: Cells declare deps (e.g., `hive.Ref[*k8s.Client]`), which are auto-wired. Trade-off: a steep learning curve, but it enforces isolation (vs. global state).
- Regeneration: Idempotent endpoint rebuilds on config change (e.g., a policy update → `endpoint.Regenerate()` pins new BPF progs).
- Map-in-Map: Nested BPF maps for scalability (e.g., per-endpoint policy maps); see the sketch below.
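The map-in-map idea is directly expressible with `cilium/ebpf`: the outer map's values are inner-map references. A minimal sketch (sizes and key meanings are invented for illustration; creating maps requires root/CAP_BPF):

```go
package main

import (
	"log"

	"github.com/cilium/ebpf"
)

func main() {
	// Inner map template: e.g., a per-endpoint policy table.
	inner := &ebpf.MapSpec{
		Type:       ebpf.Hash,
		KeySize:    4, // e.g., peer identity
		ValueSize:  8, // e.g., verdict + counters
		MaxEntries: 1024,
	}

	// Outer map: endpoint ID -> inner policy map.
	outer, err := ebpf.NewMap(&ebpf.MapSpec{
		Type:       ebpf.HashOfMaps,
		KeySize:    4, // endpoint ID
		ValueSize:  4, // inner map reference, managed by the kernel
		MaxEntries: 256,
		InnerMap:   inner,
	})
	if err != nil {
		log.Fatal(err)
	}
	defer outer.Close()

	// Create one inner map and slot it in for endpoint 42.
	epMap, err := ebpf.NewMap(inner)
	if err != nil {
		log.Fatal(err)
	}
	defer epMap.Close()

	if err := outer.Put(uint32(42), epMap); err != nil {
		log.Fatal(err)
	}
}
```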
## 3. Request/Operation Lifecycle
Example: endpoint creation (the typical operation: K8s Pod → Cilium Endpoint), traced from operator → daemon → eBPF.
1. The operator watches Pods/node labels → allocates the `CiliumEndpoint`/`EndpointSlice` CRD → assigns an Identity via `pkg/identity/cache.go:AllocateIdentity` (see `operator/watchers/endpoints.go`).
2. The daemon's `EndpointManager` cell watches EndpointSlices via informers → `pkg/endpointmanager/manager.go:UpsertEndpoint` creates the `Endpoint`.
3. Endpoint `Apply()`: computes policy (`ComputePolicy()` in the policy repository, `pkg/policy`) → pins BPF progs/maps (`pkg/datapath/loader/loader.go:LoadDatapath`).
4. Hot path: a packet hits the TC/XDP eBPF program → map lookups (identity, policy) → tail call into the L7 proxy.
```mermaid
sequenceDiagram
    participant K8s as Kubernetes API
    participant Op as Operator
    participant Daemon as Daemon (Hive)
    participant EP as Endpoint
    participant BPF as eBPF Kernel
    K8s->>Op: Pod Created
    Op->>Op: AllocateIdentity()
    Op->>K8s: Create EndpointSlice CRD
    Daemon->>K8s: Watch EndpointSlice
    Daemon->>EP: manager.UpsertEndpoint()
    EP->>EP: ComputePolicy()
    EP->>BPF: Load progs/maps (TailCall)
    Note over Daemon,BPF: Packet Flow: TC/XDP → BPF Lookup → Allow/Deny
```
Key Functions:

- `operator/watchers/endpoints.go:upsertEndpoints`: reconciles CRD → Identity.
- `pkg/endpoint/endpoint.go:NewEndpoint`: allocates and regenerates.
- `pkg/datapath/datapath.go:Loader`: BPF loading.
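The daemon side of step 2 boils down to an informer-driven upsert loop that skips no-op updates. A schematic version with hypothetical types (the real `EndpointManager` in `pkg/endpointmanager` is far richer):

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// EndpointEvent is what a K8s watch/informer would deliver.
type EndpointEvent struct {
	ID       uint16
	Identity uint32
	Deleted  bool
}

// Manager mirrors desired state and regenerates endpoints on change.
type Manager struct {
	mu        sync.Mutex
	endpoints map[uint16]uint32 // endpoint ID -> identity
}

// Upsert reconciles one event; identical state is a no-op (idempotency).
func (m *Manager) Upsert(ev EndpointEvent) {
	m.mu.Lock()
	defer m.mu.Unlock()
	if ev.Deleted {
		delete(m.endpoints, ev.ID)
		return
	}
	if old, ok := m.endpoints[ev.ID]; ok && old == ev.Identity {
		return // nothing changed: skip regeneration
	}
	m.endpoints[ev.ID] = ev.Identity
	fmt.Println("regenerating endpoint", ev.ID) // would pin new BPF progs/maps
}

// Run drains events until the context is cancelled (graceful shutdown).
func (m *Manager) Run(ctx context.Context, events <-chan EndpointEvent) {
	for {
		select {
		case <-ctx.Done():
			return
		case ev := <-events:
			m.Upsert(ev)
		}
	}
}

func main() {
	m := &Manager{endpoints: map[uint16]uint32{}}
	m.Upsert(EndpointEvent{ID: 7, Identity: 1234}) // first sight: regenerate
	m.Upsert(EndpointEvent{ID: 7, Identity: 1234}) // duplicate event: no-op
}
```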
Trade-off: Eventual consistency (watches can lag), mitigated by statedb snapshots.
## 4. Reading Order
Prioritize build system → daemon → core packages. The time estimates below assume experienced Go/K8s developers.
1. Build System (15 min): `Makefile` (targets like `build-container`), `go.mod` (deps: ebpf, hive).
2. Entrypoints (30 min): `daemon/main.go`, `operator/main.go`; see the Cobra + Hive boot.
3. Hive Framework (1 hr): `pkg/hive` (`cell.go`, `lifecycle.go`). Understand cells and deps.
4. Daemon Core (2 hr): `pkg/daemon/daemon.go`, `pkg/daemon/hive`; startup orchestration.
5. Datapath/eBPF (3 hr): `pkg/datapath` (`datapath_linux.go`, maps, loaders). BPF C code lives in `bpf/`.
6. Endpoints/Policy (2 hr): `pkg/endpoint`, `pkg/policy`; the regeneration flow.
7. Operator (1 hr): `operator/main.go`, `operator/watchers`.
8. Advanced: Hubble (`pkg/hubble`), statedb (`pkg/statedb`).
Pro Tip: `make -C bpf` builds the eBPF side; `cilium bpf lb list` (run in-cluster) inspects runtime state.
## 5. Common Patterns
- Hive Cells Everywhere: 100+ cells (e.g., `k8sClientCell`, `endpointManagerCell`). Declare deps with `hive.Provide`, and use lifecycle hooks for cleanup. Idiom: `cell.Invoke(func(lf lifecycle.Lifecycle) {...})`.
- Context Propagation: All loops use `context.Context` + cancel for shutdown. Goroutines run via `workerpool` (a `go.mod` dep).
- BPF Maps as State: `pkg/maps` wrappers (e.g., `maps.OpenMostRecent`). Pattern: load → update the userspace mirror → sync on change; see the sketch after this list.
- Regeneration Idempotency: `endpoint.Regenerate()` + status tracking prevents thrashing.
- Config Layers: flags > env > config file > defaults. Validation lives in `option.Config`.
- Testing: `pkg/testutils`, fuzzing (go-fuzz), e2e in `.github/actions/e2e`.
- Trade-offs: Heavy eBPF reliance (kernel compat issues) → runtime detection and fallbacks. Scalability via sharding (e.g., per-CPU maps).
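To make the "BPF maps as state" pattern concrete: because maps are pinned in bpffs, a restarted agent can re-attach and reconcile rather than rebuild. A sketch using `cilium/ebpf` (the pin path and key/value layout are invented for illustration; needs a mounted bpffs and privileges):

```go
package main

import (
	"log"

	"github.com/cilium/ebpf"
)

func main() {
	// Re-open a map that a previous agent instance pinned in bpffs.
	m, err := ebpf.LoadPinnedMap("/sys/fs/bpf/tc/globals/demo_lb", nil)
	if err != nil {
		log.Fatal(err)
	}
	defer m.Close()

	// Userspace writes; the eBPF datapath reads on the hot path.
	var key uint32 = 1
	var val uint64 = 42
	if err := m.Put(&key, &val); err != nil {
		log.Fatal(err)
	}

	// Mirror kernel state back into userspace for reconciliation.
	var got uint64
	if err := m.Lookup(&key, &got); err != nil {
		log.Fatal(err)
	}
	log.Println("value:", got)
}
```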
This covers roughly 80% of the internals; dive into `bpf/` for the kernel-side magic. Contribute via PRs; workflows auto-label them (`.github/workflows/auto-labeler.yaml`).