eBPF Datapath

eBPF Datapath Deep-Dive

What This Component Does

The eBPF Datapath is Cilium’s kernel-level packet processing engine. It leverages eBPF programs and maps to enforce identity-aware L3/L4 network security policies, perform service load balancing (kube-proxy replacement), handle encryption (WireGuard/IPsec), and provide observability (metrics, tracing) at near line-rate performance.

It intercepts packets at kernel hooks such as XDP (the earliest point, at the driver) and TC (ingress/egress on veth/host interfaces) to minimize context switches and CPU overhead; benchmarks report 10-100x gains over iptables, whose rule matching scales linearly with rule count.
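
For concreteness, here is a minimal, hedged XDP sketch of the hook-level idea (not Cilium's actual bpf_xdp.c): a program attached at the driver RX hook can drop or pass frames before the kernel even allocates an skb. The program name and the dropped prefix below are invented for illustration.

```c
#include <linux/bpf.h>
#include <linux/if_ether.h>
#include <linux/ip.h>
#include <bpf/bpf_endian.h>
#include <bpf/bpf_helpers.h>

SEC("xdp")
int xdp_prefilter(struct xdp_md *ctx)
{
    void *data     = (void *)(long)ctx->data;
    void *data_end = (void *)(long)ctx->data_end;
    struct ethhdr *eth = data;

    /* The verifier requires explicit bounds checks before header access. */
    if ((void *)(eth + 1) > data_end)
        return XDP_PASS;
    if (eth->h_proto != bpf_htons(ETH_P_IP))
        return XDP_PASS;

    struct iphdr *ip = (void *)(eth + 1);
    if ((void *)(ip + 1) > data_end)
        return XDP_PASS;

    /* Hypothetical early drop: discard traffic from 192.0.2.0/24
     * before an skb is ever allocated. */
    if ((ip->saddr & bpf_htonl(0xffffff00)) == bpf_htonl(0xc0000200))
        return XDP_DROP;

    return XDP_PASS; /* everything else continues to the TC-level programs */
}

char _license[] SEC("license") = "GPL";
```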

Use cases: Deploy in Kubernetes as the CNI for pod-to-pod, host-to-pod, and external traffic with zero-trust security. Enabled via --enable-bpf-datapath=true (the default in modern Cilium); it falls back to iptables only if eBPF is unsupported (rare on kernels >= 4.9).

How It Works

The eBPF Datapath processes packets via a chain of eBPF programs loaded at kernel hooks, using tail calls for modularity (e.g., from entrypoint to policy/lb/CT subprograms). Core state is stored in eBPF maps (hash, LPM trie, LRU) shared across programs for identity resolution, connection tracking (CT), policies, and endpoints.
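
A minimal sketch of these two building blocks, assuming illustrative names rather than the exact ones under bpf/lib/: a shared LRU hash map visible to every program, and a program-array map used as the jump table for bpf_tail_call().

```c
#include <linux/bpf.h>
#include <linux/pkt_cls.h>
#include <bpf/bpf_helpers.h>

/* A connection-tracking-style LRU hash map shared by all datapath programs. */
struct ct_key   { __u32 saddr, daddr; __u16 sport, dport; __u8 proto, pad[3]; };
struct ct_entry { __u64 packets, last_seen; __u8 flags, pad[7]; };

struct {
    __uint(type, BPF_MAP_TYPE_LRU_HASH);
    __uint(max_entries, 65536);        /* the agent may resize this before load */
    __type(key, struct ct_key);
    __type(value, struct ct_entry);
} conntrack_map SEC(".maps");

/* A program array used as the jump table for tail calls between subprograms. */
struct {
    __uint(type, BPF_MAP_TYPE_PROG_ARRAY);
    __uint(max_entries, 8);
    __uint(key_size, sizeof(__u32));
    __uint(value_size, sizeof(__u32));
} call_map SEC(".maps");

enum { SLOT_POLICY = 0, SLOT_LB = 1 };

SEC("tc")
int entrypoint(struct __sk_buff *skb)
{
    /* ... parse headers, consult conntrack_map ... */

    /* Jump into the policy subprogram; on success execution never returns here. */
    bpf_tail_call(skb, &call_map, SLOT_POLICY);

    /* Reached only if the tail call failed (e.g. the slot is empty). */
    return TC_ACT_OK;
}

char _license[] SEC("license") = "GPL";
```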

Internal Flow Diagram

```mermaid
flowchart TD
    A[Packet Ingress<br/>TC/XDP Hook<br/>e.g., bpf_host.c:handle_ipv4] --> B[Parse L3/L4<br/>Extract src/dst IP:port]
    B --> C[Lookup Prefix in LPM Map<br/>pkg/datapath/maps/maps.go:MapTypeIdentity<br/>Resolve Sec. Identities]
    C --> D[Conntrack Lookup<br/>maps: sock, CT4/6 global<br/>New? Create entry]
    D --> E[Policy Check<br/>maps: policy, caller<br/>Allow/Deny based on ID pair]
    E --> F{Allowed?}
    F -->|No| G[Drop/Redirect<br/>Metrics++ via perf event]
    F -->|Yes| H[Load Balancing?<br/>maps: lb4/6 services, backends<br/>Maglev hashing]
    H -->|Yes| I[DNAT to backend<br/>Endpoint map lookup]
    H -->|No| J[Masquerade/SNAT if needed<br/>maps: nat]
    J --> K[Update CT state<br/>Tail call to egress/forward]
    K --> L[Tx to iface<br/>Encap if tunneling]
    
    style A fill:#f9f
    style G fill:#ff9
    style L fill:#9f9
```

Step-by-Step Process:

  1. Hook Attachment: TC programs (e.g., cilium_tc) attach via cls_bpf to interfaces (pkg/datapath/ifaces.go); an optional XDP program handles early RX-path drops.
  2. Parsing & Lookup: Entry functions (e.g., handle_ipv4() in bpf/bpf_host.c) parse the skb and look up source/destination identities in the bpf_lpm_ipv4/6 maps (an LPM trie mapping CIDR -> identity; an identity/policy lookup is sketched after this list).
  3. Conntrack (CT): Check/update per-connection state in cilium_ct4_global maps (flags for reply/seen, timeouts via GC).
  4. Policy Enforcement: Query the cilium_policy_<ID> or bpf_cilium_policy maps for L4 rules (allow/deny ports based on identity pairs); proxy (L7) policy is encoded in the same from-to matrices.
  5. Load Balancing: For service traffic, hashing (Maglev) selects a backend from the lb4_services/backends maps; the packet is DNATed and forwarded via the endpoint map (a backend-selection sketch also follows this list).
  6. Post-Processing: SNAT (masquerade), tunneling (VXLAN/Geneve), encryption (lookup WireGuard keys). Tail calls (bpf_tail_call) jump to helpers like do_nat_ipv4().
  7. Metrics/Tracing: Per-CPU arrays count drops/accepts; perf event ring buffers feed Hubble observability.
  8. GC & Updates: Userspace (Cilium agent) regenerates maps via bpf_map_update/delete_elem on endpoint/policy changes.
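
As referenced in steps 2 and 4, here is a hedged sketch of the identity and policy lookups. The map layouts (ipcache, policy) are invented for the example and differ from Cilium's real structures: an LPM trie resolves an IP to a numeric security identity, and a hash map keyed by the identity pair plus L4 destination yields the verdict. The helper would be called from the TC entry program.

```c
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

/* LPM trie: CIDR -> security identity. prefixlen counts significant bits of `ip`. */
struct ipcache_key { __u32 prefixlen; __u32 ip; };

struct {
    __uint(type, BPF_MAP_TYPE_LPM_TRIE);
    __uint(map_flags, BPF_F_NO_PREALLOC);   /* mandatory for LPM tries */
    __uint(max_entries, 16384);
    __type(key, struct ipcache_key);
    __type(value, __u32);                   /* security identity */
} ipcache SEC(".maps");

/* Policy map keyed by (source identity, destination identity, dport, proto). */
struct policy_key { __u32 src_id, dst_id; __u16 dport; __u8 proto, pad; };

struct {
    __uint(type, BPF_MAP_TYPE_HASH);
    __uint(max_entries, 16384);
    __type(key, struct policy_key);
    __type(value, __u8);                    /* 1 = allow */
} policy SEC(".maps");

static __always_inline int policy_allows(__u32 saddr, __u32 daddr,
                                         __u16 dport, __u8 proto)
{
    struct ipcache_key k = { .prefixlen = 32 };
    __u32 *src_id, *dst_id;

    k.ip = saddr;
    src_id = bpf_map_lookup_elem(&ipcache, &k);   /* longest-prefix match */
    k.ip = daddr;
    dst_id = bpf_map_lookup_elem(&ipcache, &k);
    if (!src_id || !dst_id)
        return 0;                                 /* unknown identity: deny */

    struct policy_key pk = {
        .src_id = *src_id, .dst_id = *dst_id,
        .dport  = dport,   .proto  = proto,
    };
    __u8 *verdict = bpf_map_lookup_elem(&policy, &pk);
    return verdict && *verdict == 1;
}
```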
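
A companion sketch for step 5: resolving a service VIP to a backend before DNAT. It substitutes a simple hash-modulo pick where Cilium offers Maglev or random selection, and the services/backends layouts are invented; flow_hash could come from bpf_get_hash_recalc().

```c
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

struct svc_key   { __u32 vip; __u16 dport; __u8 proto, pad; };
struct svc_value { __u32 backend_count; __u32 base_slot; };
struct backend   { __u32 ip; __u16 port, pad; };

struct {
    __uint(type, BPF_MAP_TYPE_HASH);
    __uint(max_entries, 65536);              /* cf. --bpf-lb-map-max */
    __type(key, struct svc_key);
    __type(value, struct svc_value);
} services SEC(".maps");

struct {
    __uint(type, BPF_MAP_TYPE_ARRAY);
    __uint(max_entries, 65536);
    __type(key, __u32);
    __type(value, struct backend);
} backends SEC(".maps");

static __always_inline struct backend *
select_backend(__u32 daddr, __u16 dport, __u8 proto, __u32 flow_hash)
{
    struct svc_key k = { .vip = daddr, .dport = dport, .proto = proto };
    struct svc_value *svc = bpf_map_lookup_elem(&services, &k);
    if (!svc || svc->backend_count == 0)
        return NULL;                         /* destination is not a service VIP */

    /* Per-flow deterministic slot; the caller then DNATs to backend->ip:port. */
    __u32 slot = svc->base_slot + (flow_hash % svc->backend_count);
    return bpf_map_lookup_elem(&backends, &slot);
}
```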

Design Patterns & Trade-offs:

  • Tail Calls: Modular (up to 33 deep), verifier-friendly; trade-off: slight perf hit vs. monolith.
  • Maps as IPC: Kernel programs and the userspace agent synchronize through pinned BPF maps updated via the bpf() syscall (see the userspace sketch after this list); dynamic sizing avoids OOM.
  • Verifier Constraints: Bounded program size (the verifier caps complexity at ~1M verified instructions on recent kernels) and bounded loops; trade-off: no recursion, state machines written by hand.
  • eBPFLess Fallback: Hybrid mode pins legacy maps for iptables compat (pkg/datapath/ebpfless).
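
To illustrate the maps-as-IPC point, here is a hedged userspace-side sketch in plain libbpf C (the agent's real code lives in Go under pkg/): open a map pinned in the BPF filesystem and insert a policy entry, which kernel programs observe on their next lookup. The pin path and key layout are hypothetical.

```c
#include <stdio.h>
#include <stdint.h>
#include <bpf/bpf.h>   /* libbpf's low-level wrappers around the bpf() syscall */

/* Must match the key layout compiled into the kernel-side program. */
struct policy_key { uint32_t src_id, dst_id; uint16_t dport; uint8_t proto, pad; };

int allow_pair(uint32_t src_id, uint32_t dst_id, uint16_t dport, uint8_t proto)
{
    /* Hypothetical pin path; the loader pins maps under the BPF filesystem. */
    int fd = bpf_obj_get("/sys/fs/bpf/tc/globals/example_policy");
    if (fd < 0) {
        perror("bpf_obj_get");
        return -1;
    }

    struct policy_key k = { .src_id = src_id, .dst_id = dst_id,
                            .dport = dport, .proto = proto };
    uint8_t verdict = 1;   /* 1 = allow */

    /* The datapath sees the new entry on its next bpf_map_lookup_elem(). */
    if (bpf_map_update_elem(fd, &k, &verdict, BPF_ANY) < 0) {
        perror("bpf_map_update_elem");
        return -1;
    }
    return 0;
}
```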

Key Code Paths

Core BPF Programs (C sources, compiled to .o)

Userspace Loaders & Maps

Compilation

  • bpf/ Makefile compiles C to BPF via clang/LLVM; DAG gen for sockmaps.

Configuration

Controlled via Cilium agent flags (env vars prefixed CILIUM_) and ConfigMap:

| Flag/Env | Type | Default | Effect |
| --- | --- | --- | --- |
| --enable-bpf-datapath | bool | true | Enables full eBPF mode (disables iptables). |
| --bpf-root | string | /var/lib/cilium/bpf | Directory for pinned .o/maps. |
| --bpf-map-dynamic-size-ratio | float | 0.002 | Auto-grow maps based on usage. |
| --bpf-lb-map-max | int | 65536 | Sizes lb4_services_v2 etc.; tune for scale. |
| --enable-bpf-masquerade | bool | true | SNAT for NodePort/external traffic. |
| --bpf-policy-map-max | int | 16384 | Per-endpoint policy rules. |
| --enable-l7-proxy | bool | false | Envoy sidecar for L7 (uses sockmap). |

Maps are resized dynamically when --bpf-map-dynamic-size-ratio > 0; see pkg/option/config.go.

Extension Points

  1. New BPF Program:

    • Add a .c file in bpf/ (reusing shared headers such as bpf_helpers.h) and define its maps with SEC("maps").
    • Hook via tail call maps (jt_tail_call array).
    • Compile with make bpf in bpf/; the loader auto-pins the objects and maps. A skeleton program is sketched after this list.
  2. Custom Maps:

  3. Userspace Hooks:

    • Implement EndpointUpdater interface in pkg/datapath/types.go.
    • Regenerate on events: d.UpdateIdentity() calls BPF map updates.
  4. Working Within Verifier Limits:

    • Use helpers like bpf_redirect_map(); test with tc exec bpf dbg.
    • For complex logic: Split into tail-called subprogs (<512 instr each).
  5. Plugins: XDP for custom drops (bpf/xdp.c); Hubble consumes the tracing ring buffers.
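
Tying extension point 1 to the metrics path described earlier, here is a hedged skeleton of a new program added under bpf/: it drops matching traffic, bumps a per-CPU counter, and emits a perf event for an observability reader. All names are invented; a real addition would reuse the shared headers and pinned maps.

```c
#include <linux/bpf.h>
#include <linux/pkt_cls.h>
#include <bpf/bpf_helpers.h>

/* Per-CPU drop counter: no atomics needed, userspace sums across CPUs. */
struct {
    __uint(type, BPF_MAP_TYPE_PERCPU_ARRAY);
    __uint(max_entries, 1);
    __type(key, __u32);
    __type(value, __u64);
} drop_count SEC(".maps");

/* Perf event array feeding trace events to a userspace consumer. */
struct {
    __uint(type, BPF_MAP_TYPE_PERF_EVENT_ARRAY);
    __uint(key_size, sizeof(__u32));
    __uint(value_size, sizeof(__u32));
} events SEC(".maps");

struct drop_event { __u32 ifindex; __u32 len; };

SEC("tc")
int custom_drop(struct __sk_buff *skb)
{
    __u32 zero = 0;
    __u64 *cnt = bpf_map_lookup_elem(&drop_count, &zero);
    if (cnt)
        (*cnt)++;

    struct drop_event ev = { .ifindex = skb->ifindex, .len = skb->len };
    bpf_perf_event_output(skb, &events, BPF_F_CURRENT_CPU, &ev, sizeof(ev));

    return TC_ACT_SHOT;   /* drop the packet */
}

char _license[] SEC("license") = "GPL";
```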

To hack on it: run make -C bpf and start cilium-agent --bpf-root ./bpf/ in a dev environment; inspect maps with cilium bpf map list/dump.