# eBPF Datapath Deep-Dive

## What This Component Does
The eBPF Datapath is Cilium's kernel-level packet-processing engine. It loads eBPF programs and maps into the kernel to enforce identity-aware L3/L4 network security policies, perform service load balancing (replacing kube-proxy), handle encryption (WireGuard/IPsec), and provide observability (metrics, tracing) at line rate.

It intercepts packets at kernel hooks such as XDP (the earliest possible point in the receive path) and TC (ingress/egress on veth and host interfaces) to minimize context switches and CPU overhead; benchmarks have shown 10-100x improvements over iptables, with the gap widening as rule counts grow.

Use cases: deploy in Kubernetes as the CNI for pod-to-pod, host-to-pod, and external traffic with zero-trust security. Enabled via `--enable-bpf-datapath=true` (the default in modern Cilium); the iptables fallback is needed only where eBPF is unsupported (rare on kernels >= 4.9).
## How It Works

The eBPF Datapath processes packets through a chain of eBPF programs attached at kernel hooks, using tail calls for modularity (e.g., from the entrypoint into policy/LB/CT subprograms). Core state lives in eBPF maps (hash, LPM trie, LRU) shared across programs for identity resolution, connection tracking (CT), policy, and endpoint lookup.
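A rough userspace analogy for this tail-call chaining is a program array indexed by subprogram ID, through which control jumps from one handler to the next. The subprogram names and ordering below are invented for illustration; real jumps use `bpf_tail_call()` into a `BPF_MAP_TYPE_PROG_ARRAY` and never return to the caller.

```c
/* Simulated tail-call dispatch: an array of subprograms indexed like a
 * BPF_MAP_TYPE_PROG_ARRAY. Names and order are illustrative only. */
enum prog_index { PROG_CT = 0, PROG_POLICY = 1, PROG_LB = 2, PROG_MAX };

struct pkt_ctx {
    int verdict; /* 1 = allowed */
    int steps;   /* how many subprograms ran */
};

static void handle_ct(struct pkt_ctx *c)     { c->steps++; /* track connection */ }
static void handle_policy(struct pkt_ctx *c) { c->steps++; c->verdict = 1; }
static void handle_lb(struct pkt_ctx *c)     { c->steps++; /* pick a backend */ }

static void (*prog_array[PROG_MAX])(struct pkt_ctx *) = {
    [PROG_CT]     = handle_ct,
    [PROG_POLICY] = handle_policy,
    [PROG_LB]     = handle_lb,
};

/* Analogous to bpf_tail_call(): an empty slot is simply a no-op. */
static void tail_call(struct pkt_ctx *c, enum prog_index i) {
    if (i < PROG_MAX && prog_array[i])
        prog_array[i](c);
}

/* Entry program chaining into CT, policy, and LB subprograms. */
static void run_datapath(struct pkt_ctx *c) {
    tail_call(c, PROG_CT);
    tail_call(c, PROG_POLICY);
    if (c->verdict)
        tail_call(c, PROG_LB);
}
```

In real BPF, a successful `bpf_tail_call()` replaces the running program entirely, which is why state that must survive the jump lives in the packet context or in maps rather than on the stack.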
### Internal Flow Diagram

```mermaid
flowchart TD
    A[Packet Ingress<br/>TC/XDP Hook<br/>e.g., bpf_host.c:handle_ipv4] --> B[Parse L3/L4<br/>Extract src/dst IP:port]
    B --> C[Lookup Prefix in LPM Map<br/>pkg/datapath/maps/maps.go:MapTypeIdentity<br/>Resolve Sec. Identities]
    C --> D[Conntrack Lookup<br/>maps: sock, CT4/6 global<br/>New? Create entry]
    D --> E[Policy Check<br/>maps: policy, caller<br/>Allow/Deny based on ID pair]
    E --> F{Allowed?}
    F -->|No| G[Drop/Redirect<br/>Metrics++ via perf event]
    F -->|Yes| H[Load Balancing?<br/>maps: lb4/6 services, backends<br/>Maglev/random hashing]
    H -->|Yes| I[DNAT to backend<br/>Endpoint map lookup]
    H -->|No| J[Masquerade/SNAT if needed<br/>maps: nat]
    I --> K[Update CT state<br/>Tail call to egress/forward]
    J --> K
    K --> L[Tx to iface<br/>Encap if tunneling]
    style A fill:#f9f
    style G fill:#ff9
    style L fill:#9f9
```
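The conntrack lookup in the flow above keys each connection on its 5-tuple. A minimal userspace sketch of the miss-then-create cycle follows; the field layout, timeout value, and linear table are invented for illustration (the real structures live in `bpf/lib/conntrack.h`, and `cilium_ct4_global` is an LRU hash map, not a linear scan).

```c
#include <stdint.h>

/* Illustrative 5-tuple key and CT entry; not Cilium's actual layout. */
struct ct_key {
    uint32_t saddr, daddr; /* IPv4 source/destination */
    uint16_t sport, dport; /* L4 ports */
    uint8_t  proto;        /* e.g. 6 for TCP */
};

struct ct_entry {
    uint8_t  seen_reply; /* set once the reverse direction is observed */
    uint32_t lifetime;   /* pseudo timeout, refreshed on every hit */
};

/* Tiny linear table standing in for the cilium_ct4_global hash map. */
#define CT_SLOTS 8
static struct ct_key   ct_keys[CT_SLOTS];
static struct ct_entry ct_vals[CT_SLOTS];
static int             ct_used[CT_SLOTS];

static int key_eq(const struct ct_key *a, const struct ct_key *b) {
    return a->saddr == b->saddr && a->daddr == b->daddr &&
           a->sport == b->sport && a->dport == b->dport &&
           a->proto == b->proto;
}

static struct ct_entry *ct_lookup(const struct ct_key *k) {
    for (int i = 0; i < CT_SLOTS; i++)
        if (ct_used[i] && key_eq(&ct_keys[i], k))
            return &ct_vals[i];
    return 0;
}

static struct ct_entry *ct_create(const struct ct_key *k) {
    for (int i = 0; i < CT_SLOTS; i++)
        if (!ct_used[i]) {
            ct_used[i] = 1;
            ct_keys[i] = *k;
            ct_vals[i] = (struct ct_entry){ .seen_reply = 0, .lifetime = 21600 };
            return &ct_vals[i];
        }
    return 0; /* full; a real LRU map would evict the oldest entry */
}
```

A first packet misses `ct_lookup()` and calls `ct_create()`; when a reply arrives, the entry is found and `seen_reply` is set, which is what lets policy treat established return traffic differently from new connections.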
Step-by-Step Process:
- Hook Attachment: Programs like `cilium_tc` attach to `cls_bpf` on interfaces (`pkg/datapath/ifaces.go`); XDP is used for early RX drops.
- Parsing & Lookup: Entry functions (e.g., `handle_ipv4()` in `bpf/bpf_host.c`) parse the skb and resolve source/destination identities in the `bpf_lpm_ipv4/6` maps (an LPM trie mapping CIDR -> identity).
- Conntrack (CT): Check and update per-connection state in the `cilium_ct4_global` maps (flags for reply/seen; timeouts enforced via GC).
- Policy Enforcement: Query `cilium_policy_<ID>` or `bpf_cilium_policy` maps for L4 rules (allow/deny ports based on identity pairs); proxy policy is expressed as from-to matrices.
- Load Balancing: For service traffic, hashing (Maglev) selects a backend from `lb4_services`/`backends`, followed by DNAT and an endpoint-map lookup for forwarding.
- Post-Processing: SNAT (masquerade), tunneling (VXLAN/Geneve), and encryption (WireGuard key lookup). Tail calls (`bpf_tail_call`) jump to helpers like `do_nat_ipv4()`.
- Metrics/Tracing: Per-CPU arrays count drops/accepts; perf event ring buffers feed Hubble observability.
- GC & Updates: Userspace (the Cilium agent) regenerates maps via `bpf_map_update/delete_elem` on endpoint and policy changes.
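The identity resolution step above is a longest-prefix match: the most specific CIDR wins. The kernel side uses a `BPF_MAP_TYPE_LPM_TRIE` for this; here is a linear-scan sketch of the same semantics in plain C (the prefixes and identity numbers are made up for illustration).

```c
#include <stdint.h>

/* CIDR -> security-identity rule. The real datapath stores these in an
 * LPM trie map, not an array; identities here are invented. */
struct lpm_rule {
    uint32_t prefix;   /* network address, host byte order */
    int      len;      /* prefix length in bits, 0-32 */
    uint32_t identity; /* numeric security identity */
};

/* Longest-prefix match: the most specific matching rule wins. */
static uint32_t lpm_lookup(const struct lpm_rule *rules, int n, uint32_t addr) {
    uint32_t best_id = 0; /* 0 = unknown / no identity */
    int best_len = -1;
    for (int i = 0; i < n; i++) {
        uint32_t mask = rules[i].len ? ~0u << (32 - rules[i].len) : 0;
        if ((addr & mask) == (rules[i].prefix & mask) && rules[i].len > best_len) {
            best_len = rules[i].len;
            best_id = rules[i].identity;
        }
    }
    return best_id;
}
```

For example, with rules 10.0.0.0/8 -> identity 100 and 10.1.0.0/16 -> identity 200, the address 10.1.2.3 resolves to 200 because the /16 is more specific than the /8.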
Design Patterns & Trade-offs:
- Tail Calls: Modular (chains up to 33 deep) and verifier-friendly; trade-off: a slight per-call overhead versus a monolithic program.
- Maps as IPC: Kernel-userspace synchronization happens through pinned maps and the `bpf()` syscall; dynamic sizing avoids OOM.
- Verifier Constraints: Instruction-count limits and bounded loops only; trade-off: no recursion, so complex logic becomes manual state machines.
- eBPF-less Fallback: Hybrid mode pins legacy maps for iptables compatibility (`pkg/datapath/ebpfless`).
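The Maglev hashing mentioned for load balancing builds a fixed-size lookup table in which every backend owns a near-equal share of slots, so a flow hash maps to a backend with minimal disruption when the backend set changes. Below is a toy construction after the Maglev paper (SIGCOMM'16); the hash functions, table size, and backend count are arbitrary, and Cilium's implementation differs.

```c
/* Toy Maglev table construction; the table size M must be prime. */
#define MAGLEV_M  13
#define NBACKENDS 3

/* Per-backend permutation parameters: an offset and a nonzero skip. */
static unsigned mg_offset(int b) { return (unsigned)b * 2654435761u % MAGLEV_M; }
static unsigned mg_skip(int b)   { return (unsigned)b * 40503u % (MAGLEV_M - 1) + 1; }

/* Round-robin: each backend claims the next free slot along its own
 * permutation of the table until every slot is owned. */
static void maglev_build(int table[MAGLEV_M]) {
    unsigned next[NBACKENDS] = {0};
    int filled = 0;
    for (int i = 0; i < MAGLEV_M; i++)
        table[i] = -1;
    while (filled < MAGLEV_M) {
        for (int b = 0; b < NBACKENDS && filled < MAGLEV_M; b++) {
            unsigned c = (mg_offset(b) + next[b] * mg_skip(b)) % MAGLEV_M;
            while (table[c] >= 0) {
                next[b]++;
                c = (mg_offset(b) + next[b] * mg_skip(b)) % MAGLEV_M;
            }
            table[c] = b;
            next[b]++;
            filled++;
        }
    }
}

/* Per-packet selection is then a single indexed read. */
static int maglev_select(const int table[MAGLEV_M], unsigned flow_hash) {
    return table[flow_hash % MAGLEV_M];
}
```

Because the table is filled round-robin across backends, slot counts differ by at most one, which is what gives near-uniform load; and because each backend's permutation is stable, removing one backend only remaps the slots it owned.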
## Key Code Paths

### Core BPF Programs (C sources, compiled to .o)

- `bpf/bpf_host.c` - host namespace ingress/egress (entry: `handle_ipv4()`, ~L150).
- `bpf/bpf_lxc.c` - pod veth processing (policy/CT core, ~L200).
- `bpf/nodeport.c` - external LB (DSR revDNAT, ~L300).
- `bpf/encap.c` - VXLAN/Geneve tunneling helpers.
### Userspace Loaders & Maps

- `pkg/datapath/maps/maps.go` - defines all maps (e.g., `MapTypeSockRevNat4`); auto-generated from YAML.
- `pkg/datapath/loader/programs.go` - `loadPrograms()` pins the .o files and attaches them to interfaces (~L200).
- `pkg/datapath/datapath_linux.go` - `InitDatapath()` creates maps and loaders (~L100).
- `pkg/datapath/bpf.go` - `Update()` regenerates state on config changes.
### Compilation

`bpf/Makefile` compiles the C sources to BPF bytecode via clang/LLVM; dependency-graph (DAG) generation covers the sockmap programs.
## Configuration

Controlled via Cilium agent flags (also settable as environment variables prefixed `CILIUM_`) and the ConfigMap:
| Flag/Env | Type | Default | Effect |
|---|---|---|---|
| `--enable-bpf-datapath` | bool | `true` | Enables full eBPF mode (disables iptables). |
| `--bpf-root` | string | `/var/lib/cilium/bpf` | Directory for pinned .o files/maps. |
| `--bpf-map-dynamic-size-ratio` | float | `0.002` | Auto-grows maps based on usage. |
| `--bpf-lb-map-max` | int | `65536` | Sizes `lb4_services_v2` etc.; tune for scale. |
| `--enable-bpf-masquerade` | bool | `true` | SNAT for NodePort/external traffic. |
| `--bpf-policy-map-max` | int | `16384` | Per-endpoint policy rule capacity. |
| `--enable-l7-proxy` | bool | `false` | Envoy proxy for L7 (uses sockmap). |
Maps are resized dynamically when `--bpf-map-dynamic-size-ratio` > 0. See `pkg/option/config.go`.
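As a back-of-the-envelope illustration of ratio-based sizing: a fraction of total system memory is divided by a per-entry cost, with a floor. The 64-byte entry size and the clamping below are assumptions for this sketch, not Cilium's exact formula; Cilium derives real values from its map definitions.

```c
#include <stdint.h>

/* Sketch: entries = (total memory * ratio) / per-entry cost, clamped to
 * a minimum. Entry size and minimum are invented for illustration. */
static uint64_t map_entries(uint64_t total_mem_bytes, double ratio,
                            uint64_t entry_bytes, uint64_t min_entries) {
    uint64_t n = (uint64_t)((double)total_mem_bytes * ratio) / entry_bytes;
    return n < min_entries ? min_entries : n;
}
```

On an 8 GiB node with the default 0.002 ratio and hypothetical 64-byte CT entries, this yields roughly 268k entries, comfortably above a 65536 floor.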
## Extension Points

- New BPF Program:
  - Add a `.c` file in `bpf/` (e.g., starting from the `bpf_helpers.h` includes) and define maps with `SEC("maps")`.
  - Hook it in via the tail call map (`jt_tail_call` array).
  - Compile with `make bpf` in `bpf/`; the loader auto-pins the object.
- Custom Maps:
  - Extend `pkg/datapath/maps/maps.yaml`.
  - Regenerate: `go generate ./pkg/datapath/maps`.
  - Access in C: `bpf_map_lookup_elem(&map, &key)` (returns a value pointer, or NULL on a miss).
- Userspace Hooks:
  - Implement the `EndpointUpdater` interface in `pkg/datapath/types.go`.
  - Regenerate on events: `d.UpdateIdentity()` triggers the BPF map updates.
- Working with the Verifier:
  - Use helpers like `bpf_redirect_map()`; test with `tc exec bpf dbg`.
  - For complex logic, split into tail-called subprograms to stay within verifier complexity limits.
- Plugins: XDP for custom drops (`bpf/xdp.c`); Hubble for tracing via ring buffers.
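One detail worth calling out for custom maps: `bpf_map_lookup_elem()` returns a pointer that must be NULL-checked before dereferencing, or the verifier rejects the program. The following userspace stand-in demonstrates the same lookup-then-check pattern; the policy key/value shapes and the linear table are invented for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Invented key/value shapes for a per-endpoint L4 policy map. */
struct policy_key { uint32_t identity; uint16_t dport; };
struct policy_val { uint8_t allow; };

#define POLICY_SLOTS 8
static struct policy_key p_keys[POLICY_SLOTS];
static struct policy_val p_vals[POLICY_SLOTS];
static int               p_used[POLICY_SLOTS];

/* Stand-in for bpf_map_lookup_elem(): value pointer on hit, NULL on miss. */
static struct policy_val *map_lookup(const struct policy_key *k) {
    for (int i = 0; i < POLICY_SLOTS; i++)
        if (p_used[i] && p_keys[i].identity == k->identity &&
            p_keys[i].dport == k->dport)
            return &p_vals[i];
    return NULL;
}

static int map_insert(struct policy_key k, struct policy_val v) {
    for (int i = 0; i < POLICY_SLOTS; i++)
        if (!p_used[i]) {
            p_used[i] = 1;
            p_keys[i] = k;
            p_vals[i] = v;
            return 0;
        }
    return -1;
}

/* The pattern the verifier enforces: look up, check for NULL, then deref. */
static int policy_verdict(uint32_t identity, uint16_t dport) {
    struct policy_key k = { identity, dport };
    struct policy_val *v = map_lookup(&k);
    if (!v)
        return 0; /* default deny on a map miss */
    return v->allow;
}
```

The default-deny-on-miss branch doubles as the zero-trust posture: traffic with no matching rule is dropped rather than allowed.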
To hack locally: `make -C bpf && cilium-agent --bpf-root ./bpf/` in a dev environment; inspect maps with `cilium bpf map list` / `cilium bpf map dump`.