Cilium Codebase Walkthrough

Cilium is an eBPF-based networking, observability, and security platform for Kubernetes. The codebase is a monorepo building multiple binaries (agent, operator, CLI, etc.) via the Makefile, which orchestrates subdirectories like daemon, operator, and bpf. It leverages Hive (a custom DI/lifecycle framework), cilium/ebpf, and statedb for state management. The design emphasizes modularity (cells), kernel/userspace sharing via eBPF maps, and Kubernetes reconciliation.

1. Where Execution Starts

Cilium has 10+ entry points for distinct binaries/services. Each main() follows a pattern: flag parsing, logging setup, config validation, Hive lifecycle boot, and run loop. There is no single monolith: the daemon is the core agent (runs on every node), the operator manages cluster-wide state, and Hubble handles observability.

Primary Entry Points

| Binary | Purpose | Entry Function | Startup Flow |
| --- | --- | --- | --- |
| Daemon (Agent) | Node-local eBPF datapath, endpoints, policies | daemon/main.go:main | Flags → NewDaemonCommand() (Cobra CLI) → Hive cells → Run() loop with signal handling |
| Operator | K8s controller for identities, CNI, CRDs | operator/main.go:main | Flags → NewOperatorCommand() → Hive → K8s client init → Reconcile loop |
| Hubble (Observability) | Flows/metrics export | hubble/main.go:main | Flags → GRPC server → Relay connection |
| Hubble Relay | Aggregates Hubble flows | hubble-relay/main.go:main | Flags → Metrics/GRPC servers → Daemon connections |
| CLI | cilium tool | cilium-cli/cmd/cilium/main.go:main | Cobra CLI tree → Commands like status, install |
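
Every binary in this table follows the same thin-main shape: parse flags, build a Cobra command tree, and defer the real work to the command's run function. Below is a minimal sketch of that shape; the command name and flags are illustrative, not Cilium's actual definitions.

```go
// Minimal sketch of the shared entry-point shape: main() builds a Cobra
// root command, Execute() parses flags, and the run function hands off to
// the long-running agent logic. Flag names here are illustrative.
package main

import (
	"fmt"
	"os"

	"github.com/spf13/cobra"
)

func newRootCommand() *cobra.Command {
	var configPath string
	var debug bool

	cmd := &cobra.Command{
		Use:   "example-agent",
		Short: "Illustrative agent entry point",
		RunE: func(cmd *cobra.Command, args []string) error {
			// In the real daemon this is where config validation,
			// logging setup, and the Hive boot happen.
			fmt.Printf("starting with config=%q debug=%v\n", configPath, debug)
			return nil
		},
	}
	cmd.PersistentFlags().StringVar(&configPath, "config", "", "path to config file")
	cmd.PersistentFlags().BoolVar(&debug, "debug", false, "enable debug logging")
	return cmd
}

func main() {
	if err := newRootCommand().Execute(); err != nil {
		os.Exit(1)
	}
}
```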

Daemon Startup Deep Dive (core path):

  1. main() in daemon/main.go parses CLI flags/env (e.g., --config, --debug).
  2. Initializes logging via lumberjack (see go.mod).
  3. Calls cmd.NewDaemonCommand() (Cobra root) → executes subcommands or defaults to agent mode.
  4. Boots Hive: hive.New(...) registers the cells (e.g., datapath, policy) wired from the daemon's command package, then h.Run(...) starts the lifecycle (constructors first, then Start hooks; Stop hooks run on shutdown). See the sketch after the diagram below.
  5. Enters signal loop (SIGTERM → graceful shutdown via context cancel).
flowchart TD
    A[main()] --> B[Parse flags/env<br>Validate config]
    B --> C[Init logging/metrics]
    C --> D[hive.New(hive.Group{...})<br>Register cells]
    D --> E[h.Run(ctx)]
    E --> F[Cells: Init() → Start()]
    F --> G[Main loop:<br>Watch K8s, process events]
    G --> H[SIGTERM → h.Shutdown()]
    style A fill:#f9f
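
To make the cell vocabulary concrete, here is a minimal sketch of the Hive pattern, assuming the github.com/cilium/hive module's cell API (cell.Module, cell.Provide, cell.Invoke, cell.Lifecycle, cell.Hook). The cell names are made up and exact signatures (notably Run) have shifted between Cilium releases, so read it as the shape of the pattern rather than the daemon's actual wiring.

```go
// Minimal sketch of the Hive cell pattern, assuming the github.com/cilium/hive
// cell API. Names are illustrative; signatures differ between Cilium versions.
package main

import (
	"log/slog"

	"github.com/cilium/hive"
	"github.com/cilium/hive/cell"
)

// Watcher is a toy dependency: provided by the cell, consumed by an Invoke.
type Watcher struct{}

func newWatcher() *Watcher { return &Watcher{} }

var watcherCell = cell.Module(
	"example-watcher",
	"Provides a toy watcher and ties it to the hive lifecycle",

	// Provide registers a constructor; Hive wires its result into any
	// function that asks for a *Watcher.
	cell.Provide(newWatcher),

	// Invoke forces construction and lets us register start/stop hooks.
	cell.Invoke(func(lc cell.Lifecycle, w *Watcher) {
		lc.Append(cell.Hook{
			OnStart: func(cell.HookContext) error {
				slog.Info("watcher started")
				return nil
			},
			OnStop: func(cell.HookContext) error {
				slog.Info("watcher stopped")
				return nil
			},
		})
	}),
)

func main() {
	h := hive.New(watcherCell)
	// Assumption: recent hive versions take a *slog.Logger here; older
	// in-tree pkg/hive versions exposed Run() without arguments.
	if err := h.Run(slog.Default()); err != nil {
		slog.Error("hive run failed", "error", err)
	}
}
```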

Trade-off: Multi-binary reduces attack surface but increases deployment complexity (Helm charts handle this).

2. Core Abstractions

Cilium’s internals revolve around modularity via Hive cells, the eBPF datapath, and state machines (endpoints, identities). Key design: userspace orchestrates the kernel through eBPF maps. The agent writes map entries via syscalls at configuration time, but the per-packet hot path in the kernel only performs map lookups and never crosses back into userspace.
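
The map-as-shared-state point is easiest to see from the userspace side: the agent opens a map that the datapath programs also reference and reads or updates entries through github.com/cilium/ebpf, while the kernel hot path performs its own lookups. A minimal sketch, assuming a hypothetical pinned map with uint32 keys and values (not one of Cilium's real maps):

```go
// Minimal sketch of userspace <-> datapath coordination through a pinned BPF
// map, using github.com/cilium/ebpf. The pin path and key/value layout are
// hypothetical; Cilium's real maps have richer structs in pkg/maps.
package main

import (
	"fmt"
	"log"

	"github.com/cilium/ebpf"
)

func main() {
	// Open a map that a datapath program has pinned to bpffs.
	m, err := ebpf.LoadPinnedMap("/sys/fs/bpf/example_map", &ebpf.LoadPinOptions{})
	if err != nil {
		log.Fatalf("opening pinned map: %v", err)
	}
	defer m.Close()

	// Userspace writes configuration; the eBPF program reads it per packet
	// with a plain map lookup, no userspace round trip involved.
	var key, value uint32 = 1, 42
	if err := m.Put(key, value); err != nil {
		log.Fatalf("updating map: %v", err)
	}

	var got uint32
	if err := m.Lookup(key, &got); err != nil {
		log.Fatalf("reading map: %v", err)
	}
	fmt.Printf("key %d -> %d\n", key, got)
}
```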

Key Types/Interfaces

  • Hive Cell: Modular unit with lifecycle.Lifecycle, params, deps. E.g., DatapathCell provides *Datapath.
  • Endpoint: Stateful pod representation (pkg/endpoint.Endpoint). Tracks identity, policy, status.
  • Identity: Labels → numeric ID (pkg/identity.Identity). Cached in BPF maps.
  • PolicyRepository: Stores and resolves L3-L7 rules (pkg/policy, rule types in pkg/policy/api).
  • BPF Maps: Userspace wrappers around ebpf.Map (e.g., sockops, neighbors).
  • StateDB: Transactional in-memory tables backing agent state such as endpoints and identities (pkg/statedb).
classDiagram
    class Hive {
        +Run(ctx context.Context)
        +Shutdown()
        +Group(map[string]Cell)
    }
    class Cell {
        +Params() Params
        +Init(lf Lifecycle) (interface{}, error)
        +Start(lf Lifecycle) error
        +Deps() []Ref
    }
    class Datapath {
        +Loader
        +Maps (ebpf.Map)
        +Regenerate()
    }
    class Endpoint {
        +ID uint16
        +Identity *identity.Identity
        +Policy *PolicyComputationResult
        +Regenerate()
    }
    class Identity {
        +ID NumericIdentity
        +Labels Labels
    }
    Hive "1" o-- "*" Cell : manages
    Cell ..> Datapath : provides
    Datapath "1" o-- "*" Endpoint : owns
    Endpoint --> Identity : has
    Endpoint --> PolicyRepository : queries

Clever Patterns:

  • Hive DI: Cells declare dependencies as constructor parameters (e.g., a *k8s.Client argument), auto-wired by Hive. Trade-off: steep learning curve, but it enforces isolation (vs. global state).
  • Regeneration: Idempotent endpoint rebuilds on config change (e.g., policy update → endpoint.Regenerate() pins new BPF progs).
  • Map-in-Map: Nested BPF maps for scalability (e.g., per-endpoint policy maps).
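
To illustrate the map-in-map idea above, the sketch below creates an outer array-of-maps whose slots hold per-endpoint inner hash maps, using github.com/cilium/ebpf. Names and sizes are made up; Cilium's real per-endpoint policy maps are defined under bpf/ and pkg/maps and are created by the loader, not ad hoc like this.

```go
// Minimal map-in-map sketch with github.com/cilium/ebpf: an outer array of
// maps, where each slot can hold a per-endpoint inner hash map. Names and
// sizes are illustrative, not Cilium's real layout. Requires CAP_BPF/root.
package main

import "github.com/cilium/ebpf"

func main() {
	innerSpec := &ebpf.MapSpec{
		Name:       "inner_policy",
		Type:       ebpf.Hash,
		KeySize:    4, // e.g. peer identity
		ValueSize:  8, // e.g. verdict + counters
		MaxEntries: 1024,
	}

	outer, err := ebpf.NewMap(&ebpf.MapSpec{
		Name:       "outer_by_endpoint",
		Type:       ebpf.ArrayOfMaps,
		KeySize:    4, // endpoint ID
		ValueSize:  4, // value is an fd to the inner map
		MaxEntries: 64,
		InnerMap:   innerSpec,
	})
	if err != nil {
		panic(err)
	}
	defer outer.Close()

	// Create one inner map and install it into slot 0 of the outer map.
	inner, err := ebpf.NewMap(innerSpec)
	if err != nil {
		panic(err)
	}
	defer inner.Close()

	if err := outer.Put(uint32(0), inner); err != nil {
		panic(err)
	}
}
```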

3. Request/Operation Lifecycle

Example: Endpoint Creation (typical op: K8s Pod → Cilium Endpoint). Traces operator → daemon → eBPF.

  1. Operator watches Pods/NodeLabels → allocates CiliumEndpoint/EndpointSlice CRD → assigns Identity via pkg/identity/cache.go:AllocateIdentity (operator/watchers/endpoints.go).
  2. Daemon cell EndpointManager watches EndpointSlices via informers → pkg/endpointmanager/manager.go:UpsertEndpoint creates the Endpoint (informer pattern sketched after the sequence diagram below).
  3. Endpoint Apply(): Computes policy via the repository in pkg/policy (ComputePolicy()) → Pins BPF progs/maps (pkg/datapath/loader/loader.go:LoadDatapath).
  4. Hot path: Packet hits TC/XDP eBPF → Lookup maps (identity, policy) → Tail call to L7 proxy.
sequenceDiagram
    participant K8s as Kubernetes API
    participant Op as Operator
    participant Daemon as Daemon (Hive)
    participant EP as Endpoint
    participant BPF as eBPF Kernel
    K8s->>Op: Pod Created
    Op->>Op: AllocateIdentity()
    Op->>K8s: Create EndpointSlice CRD
    Daemon->>K8s: Watch EndpointSlice
    Daemon->>EP: manager.UpsertEndpoint()
    EP->>EP: ComputePolicy()
    EP->>BPF: Load progs/maps (TailCall)
    Note over Daemon,BPF: Packet Flow: TC/XDP → BPF Lookup → Allow/Deny
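
The daemon side of this flow is the standard Kubernetes informer pattern: register event handlers, let the shared informer keep a local cache in sync, and react to add/update/delete events. The sketch below uses plain client-go Pod informers for brevity; Cilium watches its own CRDs (CiliumEndpoint and friends) through generated clients in pkg/k8s, so the types and handler bodies here are stand-ins.

```go
// Sketch of the daemon-side watch/react loop with client-go shared informers.
// Cilium watches its CRDs through generated clients in pkg/k8s; plain Pod
// informers and the handler bodies here are stand-ins.
package main

import (
	"context"
	"log"
	"os"
	"os/signal"
	"syscall"
	"time"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/cache"
)

func main() {
	// Shut down on SIGTERM/SIGINT, mirroring the agent's signal handling.
	ctx, stop := signal.NotifyContext(context.Background(), os.Interrupt, syscall.SIGTERM)
	defer stop()

	cfg, err := rest.InClusterConfig()
	if err != nil {
		log.Fatal(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	factory := informers.NewSharedInformerFactory(client, 10*time.Minute)
	podInformer := factory.Core().V1().Pods().Informer()

	podInformer.AddEventHandler(cache.ResourceEventHandlerFuncs{
		AddFunc: func(obj interface{}) {
			pod := obj.(*corev1.Pod)
			// Roughly where Cilium would upsert an endpoint and
			// schedule a policy computation / regeneration.
			log.Printf("pod added: %s/%s", pod.Namespace, pod.Name)
		},
		DeleteFunc: func(obj interface{}) {
			log.Printf("pod deleted")
		},
	})

	factory.Start(ctx.Done())
	factory.WaitForCacheSync(ctx.Done())
	<-ctx.Done()
}
```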

Trade-off: Eventual consistency (watches can lag), mitigated by statedb snapshots.

4. Reading Order

Prioritize build → daemon → core pkgs. Per-step times below are rough estimates for experienced Go/K8s devs.

  1. Build System (15min): Makefile (targets like build-container), go.mod (deps: ebpf, hive).
  2. Entrypoints (30min): daemon/main.go, operator/main.go—see Cobra + Hive boot.
  3. Hive Framework (1hr): pkg/hive (cell and lifecycle plumbing). Understand cells/deps.
  4. Daemon Core (2hr): daemon/cmd (daemon.go plus the Hive cell wiring) for startup orchestration.
  5. Datapath/eBPF (3hr): pkg/datapath (Linux datapath implementation, maps, loader). BPF programs in bpf/.
  6. Endpoints/Policy (2hr): pkg/endpoint, pkg/policy—regeneration flow.
  7. Operator (1hr): operator/main.go, operator/watchers.
  8. Advanced: Hubble (pkg/hubble), statedb (pkg/statedb).

Pro Tip: make -C bpf for eBPF, cilium bpf lb list in-cluster for runtime inspection.

5. Common Patterns

  • Hive Cells Everywhere: 100+ cells (e.g., k8sClientCell, endpointManagerCell). Constructors are registered with cell.Provide, side effects and lifecycle hooks with cell.Invoke; Stop hooks handle cleanup. Idiom: cell.Invoke(func(lc cell.Lifecycle, ...) { lc.Append(...) }).
  • Context Propagation: All loops use context.Context + cancel for shutdown. Goroutines via workerpool (go.mod).
  • BPF Maps as State: pkg/maps wrappers (e.g., maps.OpenMostRecent). Pattern: Load → Update userspace mirror → Sync on change.
  • Regeneration Idempotency: endpoint.Regenerate() + status tracking prevents thrashing.
  • Config Layers: Flags > Env > ConfigFile > Defaults. Validation in option.Config; see the sketch after this list.
  • Testing: pkg/testutils, fuzzing (go-fuzz), e2e in .github/actions/e2e.
  • Trade-offs: Heavy eBPF reliance (kernel compat issues) → runtime detection/fallbacks. Scalability via sharding (e.g., per-CPU maps).
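
The Flags > Env > ConfigFile > Defaults precedence from the config-layers bullet is what spf13/viper gives you once flags are bound and env lookup is enabled; the sketch below shows that layering in isolation. The flag and env names are invented, and Cilium's real handling lives in pkg/option, so take this as the shape of the pattern only.

```go
// Sketch of layered configuration with pflag + viper: a set flag beats env,
// env beats the config file, and the flag default is the bottom layer.
// Names (EXAMPLE_ prefix, ipam flag) are made up.
package main

import (
	"fmt"

	"github.com/spf13/pflag"
	"github.com/spf13/viper"
)

func main() {
	pflag.String("ipam", "cluster-pool", "IPAM mode (flag default is the bottom layer)")
	pflag.Parse()

	v := viper.New()
	v.SetConfigFile("/etc/example/config.yaml") // overrides the default if present
	_ = v.ReadInConfig()                        // ignore a missing file in this sketch

	v.SetEnvPrefix("EXAMPLE") // EXAMPLE_IPAM overrides the config file
	v.AutomaticEnv()

	_ = v.BindPFlags(pflag.CommandLine) // an explicitly set --ipam wins over everything

	fmt.Println("ipam mode:", v.GetString("ipam"))
}
```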

This covers ~80% of internals; dive into bpf/ for kernel magic. Contribute via PRs—workflows auto-label (.github/workflows/auto-labeler.yaml).