Cluster Manager Deep-Dive

What This Component Does

The Cluster Manager is Envoy’s upstream connection management system. It owns all upstream clusters (logical groups of backend hosts), their connection pools, load balancing policies, health checking, circuit breaking, and outlier detection. When the Router filter needs to forward a request to an upstream service, it asks the Cluster Manager for a thread-local cluster handle, which provides lock-free access to host sets and connection pools.

The Cluster Manager integrates with CDS (Cluster Discovery Service) and EDS (Endpoint Discovery Service) for dynamic cluster and endpoint management, enabling seamless scaling, rolling deployments, and multi-cluster topologies without proxy restarts.

Use cases: Every upstream request in Envoy flows through the Cluster Manager. It handles service discovery, load balancing algorithms (Round Robin, Least Request, Maglev, Ring Hash), active health checking, passive health monitoring (outlier detection), connection pooling (HTTP/1.1, HTTP/2, TCP), circuit breaking, and upstream TLS/mTLS origination.

How It Works

The Cluster Manager operates as a two-tier system: a global ClusterManagerImpl on the main thread that manages cluster lifecycle and configuration, and per-worker ThreadLocalClusterManagerImpl instances that provide lock-free access to host sets, load balancers, and connection pools.

Internal Flow Diagram

```mermaid
flowchart TD
    A[Router Filter<br/>Needs upstream connection] --> B[ClusterManager::getThreadLocalCluster<br/>Thread-local lookup by name]
    B --> C[ThreadLocalCluster<br/>Per-worker, lock-free]
    C --> D[LoadBalancer::chooseHost<br/>Select endpoint]
    D --> E{LB Algorithm}
    E -->|Round Robin| F[RoundRobinLB<br/>Rotate through healthy hosts]
    E -->|Least Request| G[LeastRequestLB<br/>Pick host with fewest active]
    E -->|Ring Hash| H[RingHashLB<br/>Consistent hashing by key]
    E -->|Maglev| I[MaglevLB<br/>Minimal disruption hashing]
    E -->|Random| J[RandomLB<br/>Uniform random selection]
    F & G & H & I & J --> K[Host Selected<br/>Check circuit breaker]
    K --> L{Circuit Breaker<br/>Open?}
    L -->|Yes| M[Return overflow error<br/>503 response]
    L -->|No| N[Connection Pool<br/>Get or create connection]
    N --> O{Pool Type}
    O -->|HTTP/1.1| P[Per-host connection<br/>Keep-alive or new]
    O -->|HTTP/2| Q[Multiplexed streams<br/>Over shared connections]
    O -->|TCP| R[Raw TCP connection]
    P & Q & R --> S[Upstream Request<br/>Forward headers + body]
    S --> T[Health Check / Outlier<br/>Update host status]

    style A fill:#f9f
    style S fill:#9f9
    style M fill:#ff9
```

Step-by-Step Process:

  1. Cluster Registration: At startup or via CDS, clusters are created by ClusterManagerImpl::addOrUpdateCluster() in source/common/upstream/cluster_manager_impl.cc. Each cluster creates its ClusterInfoImpl (config, stats, LB policy), host set, health checkers, and outlier detector.

  2. Thread-Local Distribution: The main thread posts cluster updates to all workers via TLS slots. Each worker maintains a ThreadLocalClusterManagerImpl with its own ThreadLocalCluster instances. This avoids any locking on the request path.

  3. Endpoint Discovery (EDS): EdsClusterImpl subscribes to EDS for dynamic endpoint updates (source/extensions/clusters/eds/eds.cc). Updates modify the PrioritySet (host sets grouped by priority/locality), triggering LB rebuild and TLS updates.

  4. Load Balancing: When the Router calls chooseHost(), the thread-local load balancer selects from healthy hosts in the PrioritySet. Priority-based failover means that if priority 0 hosts become unhealthy, traffic spills over to priority 1. Locality-weighted routing then distributes traffic within each priority according to locality weights.

  5. Connection Pooling: ConnPoolImpl maintains per-host, per-protocol connection pools. HTTP/1.1 pools manage connection-per-request (or keep-alive reuse). HTTP/2 pools multiplex streams over a configurable number of connections. Connections are created lazily and drained on host health changes.

  6. Health Checking: Active health checkers (source/extensions/health_checkers/) periodically probe endpoints with HTTP, TCP, or gRPC checks. Unhealthy hosts are removed from the LB’s host set. Health check events propagate via HostSetCallbacks.

  7. Outlier Detection: Passive monitoring in source/common/upstream/outlier_detection_impl.cc tracks consecutive 5xx errors, gateway errors, success rate, and latency. Hosts exceeding thresholds are ejected for a configurable period, progressively increasing on repeat ejections.

  8. Circuit Breaking: Per-cluster limits on connections, pending requests, retries, and requests (source/common/upstream/resource_manager_impl.cc). When limits are hit, requests receive immediate 503s, preventing cascade failures.
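Several of the steps above (3, 6, and 7) map directly onto per-cluster configuration. A minimal sketch, in which the cluster name, thresholds, and probe path are illustrative (field names are from envoy.config.cluster.v3.Cluster):

```yaml
name: backend
type: EDS
eds_cluster_config:               # step 3: subscribe to EDS for endpoints
  eds_config:
    resource_api_version: V3
    ads: {}                       # deliver updates over the shared ADS stream
health_checks:                    # step 6: active health checking
- timeout: 1s
  interval: 5s
  unhealthy_threshold: 3          # consecutive failed probes before removal from the LB
  healthy_threshold: 2            # consecutive passes before reinstatement
  http_health_check:
    path: /healthz                # illustrative probe path
outlier_detection:                # step 7: passive monitoring of real traffic
  consecutive_5xx: 5              # eject a host after 5 consecutive 5xx responses
  interval: 10s                   # sweep interval for ejection analysis
  base_ejection_time: 30s         # ejection duration grows with repeat ejections
  max_ejection_percent: 50        # cap on the fraction of hosts ejected at once
```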

Design Patterns & Trade-offs:

  • Thread-local cluster data: Lock-free hot path at the cost of memory duplication (each worker has its own copy of host sets and LB state). Updates are asynchronous (millisecond-level lag).
  • Priority-based failover: Allows zone-aware routing with graceful degradation. Trade-off: more complex LB configuration and potential uneven distribution during partial failures.
  • Connection pool per-host: Isolates failure domains but increases file descriptor usage. HTTP/2 multiplexing mitigates this significantly.
  • Separate health check vs. outlier detection: Active checks probe health independently of traffic; outlier detection reacts to real traffic patterns. Together they provide comprehensive failure detection but add configuration complexity.
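The priority-based failover and locality weighting discussed above are expressed in a ClusterLoadAssignment. A sketch with illustrative zones and addresses; note that locality weights only take effect when locality_weighted_lb_config is enabled under common_lb_config:

```yaml
load_assignment:
  cluster_name: backend             # illustrative cluster name
  endpoints:
  - priority: 0                     # preferred: receives traffic while healthy
    locality:
      zone: us-east-1a
    load_balancing_weight: 90       # weight relative to other localities in priority 0
    lb_endpoints:
    - endpoint:
        address:
          socket_address: {address: 10.0.0.10, port_value: 8080}
  - priority: 1                     # failover: used as priority 0 loses healthy hosts
    locality:
      zone: us-east-1b
    lb_endpoints:
    - endpoint:
        address:
          socket_address: {address: 10.0.1.10, port_value: 8080}
```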

Key Code Paths

  • Cluster Manager Core: source/common/upstream/cluster_manager_impl.cc
  • Cluster Info & Upstream: source/common/upstream/upstream_impl.cc
  • Load Balancers: source/common/upstream/ and source/extensions/load_balancing_policies/
  • Health Checking: source/extensions/health_checkers/
  • Outlier Detection & Circuit Breaking: source/common/upstream/outlier_detection_impl.cc, source/common/upstream/resource_manager_impl.cc
  • EDS & Cluster Types: source/extensions/clusters/eds/eds.cc and source/extensions/clusters/

Configuration

Clusters are configured via envoy.config.cluster.v3.Cluster protobuf (bootstrap static_resources.clusters or CDS):

| Field | Type | Default | Effect |
|---|---|---|---|
| name | string | required | Cluster name referenced by routes |
| type | enum | STATIC | Discovery type: STATIC, STRICT_DNS, LOGICAL_DNS, EDS, ORIGINAL_DST |
| lb_policy | enum | ROUND_ROBIN | LB algorithm: ROUND_ROBIN, LEAST_REQUEST, RING_HASH, RANDOM, MAGLEV |
| load_assignment | ClusterLoadAssignment | - | Static endpoint list (STATIC and DNS discovery types) |
| eds_cluster_config | EdsClusterConfig | - | EDS service config (API source) |
| health_checks | repeated HealthCheck | - | Active health check configurations |
| outlier_detection | OutlierDetection | - | Passive health monitoring settings |
| circuit_breakers | CircuitBreakers | see below | Per-priority resource limits |
| connect_timeout | Duration | 5s | TCP connection timeout to upstream |
| per_connection_buffer_limit_bytes | uint32 | 1MB | Buffer limit per connection |
| http2_protocol_options | Http2ProtocolOptions | - | HTTP/2 settings (max concurrent streams, etc.) |
| upstream_connection_options | UpstreamConnectionOptions | - | TCP keep-alive and socket options |
| transport_socket | TransportSocket | - | TLS/mTLS config for upstream connections |
| common_lb_config | CommonLbConfig | - | Zone-aware routing, healthy panic threshold |
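Tying several of these fields together, a minimal cluster entry might look like the following sketch (the name, hostname, and SNI are illustrative):

```yaml
clusters:
- name: backend
  type: STRICT_DNS                # resolve the hostname below via DNS
  lb_policy: LEAST_REQUEST
  connect_timeout: 5s
  load_assignment:
    cluster_name: backend
    endpoints:
    - lb_endpoints:
      - endpoint:
          address:
            socket_address: {address: backend.internal, port_value: 443}
  transport_socket:               # originate TLS to the upstream
    name: envoy.transport_sockets.tls
    typed_config:
      "@type": type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.UpstreamTlsContext
      sni: backend.internal
```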

Circuit Breaker Defaults (per-priority):

| Resource | Default Limit | Effect |
|---|---|---|
| max_connections | 1024 | Max TCP connections to cluster |
| max_pending_requests | 1024 | Max requests queued without a connection |
| max_requests | 1024 | Max concurrent requests (HTTP/2 multiplexing) |
| max_retries | 3 | Max concurrent retries |
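These limits can be overridden per routing priority under the cluster's circuit_breakers field. A sketch that restates the defaults explicitly:

```yaml
circuit_breakers:
  thresholds:
  - priority: DEFAULT             # a separate thresholds entry may be set for HIGH
    max_connections: 1024
    max_pending_requests: 1024
    max_requests: 1024
    max_retries: 3
```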

See api/envoy/config/cluster/v3/cluster.proto for the full protobuf definition.
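Note that recent Envoy versions deprecate the cluster-level http2_protocol_options field in favor of typed_extension_protocol_options carrying an upstream HttpProtocolOptions message. A sketch of that form (the stream cap is illustrative):

```yaml
typed_extension_protocol_options:
  envoy.extensions.upstreams.http.v3.HttpProtocolOptions:
    "@type": type.googleapis.com/envoy.extensions.upstreams.http.v3.HttpProtocolOptions
    explicit_http_config:
      http2_protocol_options:
        max_concurrent_streams: 100   # streams multiplexed per upstream connection
```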

Extension Points

  1. Custom Cluster Types:

    • Implement Upstream::ClusterFactory interface.
    • Register with REGISTER_FACTORY.
    • Examples: ORIGINAL_DST (transparent proxy), AGGREGATE (combines multiple clusters).
    • See source/extensions/clusters/.
  2. Custom Load Balancers:

    • Implement a load balancer factory (e.g. Upstream::TypedLoadBalancerFactory) and register with REGISTER_FACTORY.
    • Selected via the cluster's lb_policy or load_balancing_policy field.
    • See source/extensions/load_balancing_policies/.
  3. Custom Health Checkers:

    • Implement a health checker factory in the envoy.health_checkers category.
    • Selected via the custom_health_check field of HealthCheck.
    • See source/extensions/health_checkers/ (e.g. Redis, Thrift).
  4. Custom Retry Policies:

    • Implement Upstream::RetryHostPredicate or Upstream::RetryPriority interfaces.
    • Control which hosts/priorities are tried on retry.
    • See source/extensions/retry/.
  5. Upstream Transport Sockets:

    • Implement a transport socket factory and register with REGISTER_FACTORY.
    • Selected via the cluster's transport_socket field; examples include TLS, ALTS, and proxy protocol.
    • See source/extensions/transport_sockets/.
  6. Upstream HTTP Filters:

    • Similar to downstream HTTP filters but applied on the upstream connection.
    • Configured via upstream_http_filters in the cluster config.
    • Used for upstream-specific transformations.
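Retry host predicates (extension point 4) attach to a route's retry_policy rather than to the cluster. A sketch using the built-in previous-hosts predicate, which avoids retrying a host that already failed the request:

```yaml
retry_policy:
  retry_on: 5xx
  num_retries: 2
  retry_host_predicate:
  - name: envoy.retry_host_predicates.previous_hosts
    typed_config:
      "@type": type.googleapis.com/envoy.extensions.retry.host.previous_hosts.v3.PreviousHostsPredicate
```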

To develop: Use bazel test //test/common/upstream/... for cluster manager tests. Integration tests in test/integration/ verify end-to-end upstream behavior. The admin endpoint /clusters provides runtime cluster status, host health, and stats for debugging.