Cilium Agent Deep-Dive
1. What This Component Does
The Cilium Agent (also referred to as the “Daemon”) is the per-node userspace process that orchestrates all Cilium functionality on a host. It runs as a long-lived daemon on every node in the cluster (e.g., Kubernetes worker nodes) and is responsible for:
- Programming the datapath: Loading eBPF programs into the kernel for L3/L4/L7 networking, security policy enforcement, load balancing, and observability.
- Managing cluster state: Allocating security identities, enforcing NetworkPolicies, handling service translations, and monitoring endpoint lifecycle (pod/container add/remove).
- Interfacing with orchestrators: Integrates with Kubernetes via API server watches, CRDs, and CNI plugins.
- Exposing APIs and metrics: Runs gRPC/HTTP servers for health checks, Hubble (observability), and operator coordination.
When/why use it? Deploy it on every node for eBPF-powered CNI networking in Kubernetes (or Nomad/others). It is essential for replacing kube-proxy, enabling transparent encryption (WireGuard/IPsec), and scaling to large clusters without performance cliffs. Without the eBPF kube-proxy replacement, service handling falls back to iptables, which scales poorly.
2. How It Works
The Cilium Agent follows a startup → initialization → watch-loop model:
- Parse config/flags and validate environment (eBPF FS mount, kernel headers).
- Instantiate the `Daemon` struct via `NewDaemon`, which wires up 20+ subcomponents (datapath, identity allocator, policy repository, endpoint manager).
- Initialize core subsystems serially: Load eBPF templates, create maps, attach XDP/TC programs.
- Start background controllers/watchers in goroutines: K8s watches for pods/services/policies, identity GC, status reporters.
- Enter main loop: Serve REST/gRPC APIs, handle signals (SIGHUP reload), health checks, and process queued events (e.g., endpoint regen).
- Packet flow (async): Userspace queues events (e.g., “sync endpoint”) and controllers update eBPF maps atomically; per-packet decisions are then made entirely by kernel eBPF programs via tail calls, with no userspace round-trip on the hot path. This startup-then-loop shape is sketched below.
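The following Go skeleton is a simplified, hypothetical sketch of that startup choreography; all names (`newDaemon`, `initDatapath`, `watchK8s`, etc.) are illustrative stand-ins, not Cilium's actual API:

```go
// Hypothetical skeleton of the startup choreography; all names are
// illustrative stand-ins, not Cilium's actual API.
package main

import (
	"context"
	"log"
	"os"
	"os/signal"
	"syscall"
)

type daemon struct{} // would hold datapath, allocator, policy repo, ...

// newDaemon runs serial initialization: each step must succeed before
// the next one starts.
func newDaemon(ctx context.Context) (*daemon, error) {
	d := &daemon{}
	steps := []func(context.Context) error{
		d.initDatapath,    // load eBPF templates, create maps
		d.initIdentities,  // identity allocator
		d.initPolicyRepo,  // policy repository
		d.initEndpointMgr, // endpoint manager
	}
	for _, step := range steps {
		if err := step(ctx); err != nil {
			return nil, err
		}
	}
	return d, nil
}

func (d *daemon) initDatapath(context.Context) error    { return nil }
func (d *daemon) initIdentities(context.Context) error  { return nil }
func (d *daemon) initPolicyRepo(context.Context) error  { return nil }
func (d *daemon) initEndpointMgr(context.Context) error { return nil }

// run starts background watchers/controllers, then blocks until shutdown.
func (d *daemon) run(ctx context.Context) {
	go d.watchK8s(ctx)       // informers for pods/services/policies
	go d.runControllers(ctx) // identity GC, status reporters, ...
	<-ctx.Done()             // main loop lives until SIGINT/SIGTERM
}

func (d *daemon) watchK8s(context.Context)       {}
func (d *daemon) runControllers(context.Context) {}

func main() {
	ctx, stop := signal.NotifyContext(context.Background(),
		os.Interrupt, syscall.SIGTERM)
	defer stop()

	d, err := newDaemon(ctx)
	if err != nil {
		log.Fatalf("agent init failed: %v", err)
	}
	d.run(ctx)
}
```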
Key algorithm: Event-driven with eventual consistency. Changes propagate via pub/sub (channels, notifiers) to eBPF maps. Identities are reference-counted and allocated lazily from a global allocator synced via KV store (etcd).
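To make the reference-counting idea concrete, here is a minimal, hypothetical allocator sketch (not Cilium's implementation, which additionally syncs allocations through the KV store):

```go
// Hypothetical reference-counted identity allocator; illustrative only.
// Cilium's real allocator also syncs allocations through the KV store.
package main

import (
	"fmt"
	"sync"
)

type ID uint32

type entry struct {
	id   ID
	refs int
}

type Allocator struct {
	mu    sync.Mutex
	next  ID
	byKey map[string]*entry // label-set key -> identity
}

func NewAllocator() *Allocator {
	return &Allocator{next: 1, byKey: make(map[string]*entry)}
}

// Acquire returns the identity for a label-set key, allocating lazily on
// first use and bumping the reference count on subsequent calls.
func (a *Allocator) Acquire(key string) ID {
	a.mu.Lock()
	defer a.mu.Unlock()
	e, ok := a.byKey[key]
	if !ok {
		e = &entry{id: a.next}
		a.next++
		a.byKey[key] = e
	}
	e.refs++
	return e.id
}

// Release drops one reference; the identity is freed at refcount zero.
func (a *Allocator) Release(key string) {
	a.mu.Lock()
	defer a.mu.Unlock()
	if e, ok := a.byKey[key]; ok {
		if e.refs--; e.refs == 0 {
			delete(a.byKey, key)
		}
	}
}

func main() {
	a := NewAllocator()
	id1 := a.Acquire("app=web") // first reference allocates
	id2 := a.Acquire("app=web") // second reference reuses the identity
	fmt.Println(id1 == id2)     // true
	a.Release("app=web")
	a.Release("app=web") // last release frees the identity
}
```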
Internal Flow Diagram:
```mermaid
graph TD
    A[cilium-agent Entry<br/>daemon/cmd/daemon.go] --> B[Parse Flags/Config<br/>daemon/options.ParseFlags()]
    B --> C[NewDaemon<br/>daemon/daemon.go:NewDaemon()]
    C --> D[InitDatapath<br/>pkg/datapath/datapath.go:InitDatapath()]
    D --> E[Init IdentityAllocator<br/>pkg/identity/cache/local.go]
    E --> F[Init PolicyRepository<br/>pkg/policy/repository.go]
    F --> G[Init EndpointManager<br/>pkg/endpointmanager/manager.go]
    G --> H[Start K8s Watchers<br/>pkg/k8s/watchers/]
    H --> I[Start Controllers<br/>pkg/controller/manager.go:Run()]
    I --> J[Main Serve Loop<br/>Daemon.Run():<br/>Health, API, Metrics Servers]
    J --> K[Event Queue Loop<br/>Endpoint sync, Policy update<br/>→ eBPF Map Writes]
    K -->|Kernel| L[eBPF Tailcalls:<br/>XDP/TC/LB/Policy/Encrypt]
    L -.->|Metrics/Logs| J
    style L fill:#ff9999
```
Step-by-step process:
- Bootstrap: Ensures `/sys/fs/bpf` is mounted; loads verifier-approved eBPF objects.
- State sync: Pulls identities/policies from the KV store; watches K8s resources via informers.
- Endpoint lifecycle: On pod-add → allocate identity → derive L7 policy → regenerate BPF → attach to TC/XDP (see the sketch after this list).
- Regeneration: Idempotent BPF reloads via `endpoint.Regenerate()`, lock-free where possible.
- Trade-off: Userspace-centralized for simplicity, but scales via sharded (per-CPU) maps and batched updates. A clever touch: “bpfgen” templates allow runtime policy injection without full recompiles.
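The pod-add pipeline above can be condensed into a small event-loop sketch; everything here (`regenerate`, `allocateIdentity`, etc.) is a hypothetical stand-in for the real code paths:

```go
// Hypothetical event loop for the endpoint lifecycle; every name here is
// an illustrative stand-in for the real code paths.
package main

import "fmt"

type event struct {
	kind     string // "pod-add", "policy-update", ...
	endpoint string
}

// regenerate is idempotent: replaying the same event converges on the
// same eBPF state, so retries after transient failures are safe.
func regenerate(ev event) error {
	id := allocateIdentity(ev.endpoint) // 1. security identity
	pol := derivePolicy(id)             // 2. policy for that identity
	return reloadBPF(ev.endpoint, pol)  // 3. rebuild + attach datapath
}

func allocateIdentity(ep string) uint32 { return 42 } // toy value
func derivePolicy(id uint32) string     { return fmt.Sprintf("policy-%d", id) }
func reloadBPF(ep, pol string) error    { return nil }

func main() {
	events := make(chan event, 64) // filled by K8s watchers in the agent
	events <- event{kind: "pod-add", endpoint: "pod-1"}
	close(events)

	for ev := range events {
		if err := regenerate(ev); err != nil {
			// The real agent retries with backoff; eventual consistency
			// tolerates transient failures.
			fmt.Println("regeneration failed:", err)
		}
	}
}
```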
3. Key Code Paths
Main Files:
- `daemon/daemon.go` - Core `Daemon` struct (~1.5k LoC); orchestrates everything.
- `daemon/cmd/daemon.go` - CLI entrypoint and signal handling.
- `pkg/datapath/datapath.go` - eBPF loader and map management.
- `pkg/endpointmanager/manager.go` - Tracks endpoint state.
- `pkg/controller/manager.go` - Goroutine pool for background tasks.
- `pkg/k8s/watchers/` - K8s resource informers.
Key Functions (with explanations):
- `daemon/daemon.go:NewDaemon` - Dependency injection: wires config → components (returns an error if the kernel is incompatible).
- `daemon/daemon.go:Run` - Startup choreography: Init() → controllers.Start() → servers.Serve() → waitgroup.
- `daemon/daemon.go:CompileBasePrograms` - Loads core eBPF objects (e.g., `bpf_lxc.o`) and pins maps; caches compiled artifacts via the loader.
- `pkg/endpointmanager/manager.go:UpdateEndpointStatus` - Pub/sub for endpoint changes → BPF sync.
- `pkg/datapath/loader/loader.go:ReplaceBPFPrograms` - Atomic map/program swaps (zero downtime).
- `pkg/policy/repository.go:PolicyCalculateIdentityIndependentResult` - L7 policy derivation (regex/HTTP parsing).
Hot path: Endpoint regeneration (~1-10 ms) → `endpoint.Regenerate()` → map lookups/writes.
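The zero-downtime replacement done by `ReplaceBPFPrograms` follows a general prepare-then-publish pattern. The sketch below illustrates that shape with a Go atomic pointer; real BPF replacement swaps program/map references via kernel primitives, so this is an analogy, not the actual mechanism:

```go
// Prepare-then-publish pattern, shown with a Go atomic pointer. Real BPF
// replacement swaps program/map references via kernel primitives; this
// only illustrates the zero-downtime shape, not the actual mechanism.
package main

import (
	"fmt"
	"sync/atomic"
)

type policyTable struct {
	version int
	allow   map[string]bool
}

var active atomic.Pointer[policyTable]

// The hot path always reads a fully constructed table.
func allowed(peer string) bool {
	return active.Load().allow[peer]
}

// update builds the new table entirely off to the side, then publishes it
// in a single atomic step; readers never see a half-written state.
func update(version int, allow map[string]bool) {
	active.Store(&policyTable{version: version, allow: allow})
}

func main() {
	update(1, map[string]bool{"frontend": true})
	fmt.Println("frontend allowed:", allowed("frontend"))

	update(2, map[string]bool{"frontend": true, "backend": true})
	fmt.Println("backend allowed:", allowed("backend"))
}
```

The datapath analog: traffic keeps hitting the old program until the new one is fully loaded and verified, at which point the attachment is repointed in one step.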
4. Configuration
Loaded via CLI flags (highest precedence), then `/var/run/cilium/agent-flags.config`, env vars, and the ConfigMap (K8s).
Key Options (from `daemon/options/options.go`):
- `--bpf-root=/sys/fs/bpf`: eBPF FS path.
- `--enable-k8s-without-kube-proxy=true`: Replace kube-proxy.
- `--identity-allocation-mode=kvstore`: Identity sync mode (`crd`/`kvstore`).
- `--enable-l7-proxy=true`: Envoy for L7.
- `--enable-encryption=wireguard`: Node-to-node mesh encryption.
- Env vars: `CILIUM_K8S_API_SERVER=...`, `CILIUM_ENABLE_IPV4=false`.
Defaults: `pkg/defaults/defaults.go`. Reload via SIGHUP, handled in `daemon/cmd/daemon.go:handleSignals`.
K8s-specific: Helm values → ConfigMap → downward API env vars.
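The precedence chain (flags > config file > env vars > built-in defaults) amounts to a first-non-empty lookup. A minimal illustrative resolver follows; the `CILIUM_BPF_ROOT` env var name is made up for this example, and this is not Cilium's actual option-loading code:

```go
// Illustrative precedence resolver: CLI flag > config file > env var >
// built-in default. Not Cilium's actual option-loading code; the
// CILIUM_BPF_ROOT env var name is made up for this example.
package main

import (
	"flag"
	"fmt"
	"os"
)

// resolve returns the first non-empty value in precedence order.
func resolve(flagVal, fileVal, envVar, def string) string {
	if flagVal != "" {
		return flagVal
	}
	if fileVal != "" {
		return fileVal
	}
	if v := os.Getenv(envVar); v != "" {
		return v
	}
	return def
}

func main() {
	bpfRoot := flag.String("bpf-root", "", "eBPF filesystem path")
	flag.Parse()

	// fileVal would come from parsing the agent config file; empty here.
	fmt.Println("bpf-root =", resolve(*bpfRoot, "", "CILIUM_BPF_ROOT", "/sys/fs/bpf"))
}
```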
5. Extension Points
Cilium is highly modular; extend via interfaces and hooks:
- Custom Controllers: Implement `controller.Manager.Run()` in `Daemon.RunK8sCiliumAgent`; add goroutines for custom watches.
- Interfaces:
  - `IdentityAllocator` (`pkg/identity/allocator.go`): Swap in a custom ID scheme.
  - `PolicyRepository` (`pkg/policy/interface.go`): Inject selectors/L7 parsers.
  - `Datapath` loader hooks in `pkg/datapath/datapath.go`.
- eBPF Hooks: Extend templates in `bpf/`, regenerate via `make bpf`.
- Notifiers: `pkg/agent/notifications/` - Pub/sub for endpoint/policy changes.
- Plugins: CNI chain (`pkg/datapath/cni/`); Hubble modules.
- Modify: Fork `NewDaemon` to inject custom implementations (e.g., `daemon.Daemon.IdentityAllocator = myAllocator`) and rebuild the agent (see the sketch after this list).
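A hedged sketch of that injection pattern, using simplified stand-in types (the real `IdentityAllocator` interface in `pkg/identity` has a richer shape):

```go
// Simplified stand-in types showing interface-based injection; the real
// IdentityAllocator interface in pkg/identity has a richer shape.
package main

import "fmt"

type IdentityAllocator interface {
	Allocate(labels map[string]string) (uint32, error)
}

type Daemon struct {
	identityAllocator IdentityAllocator
}

// myAllocator is a toy implementation keyed off a single label.
type myAllocator struct{ next uint32 }

func (m *myAllocator) Allocate(labels map[string]string) (uint32, error) {
	m.next++
	fmt.Printf("allocated %d for team=%s\n", m.next, labels["team"])
	return m.next, nil
}

func main() {
	// Analogous to forking NewDaemon and swapping in the custom impl.
	d := &Daemon{identityAllocator: &myAllocator{}}
	id, _ := d.identityAllocator.Allocate(map[string]string{"team": "payments"})
	fmt.Println("endpoint identity:", id)
}
```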
Trade-offs: Interfaces are narrow (policy-focused); deep changes need eBPF expertise. Test via `cilium-dbg` or unit tests in `TestDaemon*`.