Cluster Manager Deep-Dive

What This Component Does

The Cluster Manager is Envoy’s upstream connection management system. It owns all upstream clusters (logical groups of backend hosts), their connection pools, load balancing policies, health checking, circuit breaking, and outlier detection. When the Router filter needs to forward a request to an upstream service, it asks the Cluster Manager for a thread-local cluster handle, which provides lock-free access to host sets and connection pools.

The Cluster Manager integrates with CDS (Cluster Discovery Service) and EDS (Endpoint Discovery Service) for dynamic cluster and endpoint management, enabling seamless scaling, rolling deployments, and multi-cluster topologies without proxy restarts.

Use cases: Every upstream request in Envoy flows through the Cluster Manager. It handles service discovery, load balancing algorithms (Round Robin, Least Request, Maglev, Ring Hash), active health checking, passive health monitoring (outlier detection), connection pooling (HTTP/1.1, HTTP/2, TCP), circuit breaking, and upstream TLS/mTLS origination.

How It Works

The Cluster Manager operates as a two-tier system: a global ClusterManagerImpl on the main thread that manages cluster lifecycle and configuration, and per-worker ThreadLocalClusterManagerImpl instances that provide lock-free access to host sets, load balancers, and connection pools.

Internal Flow Diagram

```mermaid
flowchart TD
    A[Router Filter<br/>Needs upstream connection] --> B[ClusterManager::getThreadLocalCluster<br/>Thread-local lookup by name]
    B --> C[ThreadLocalCluster<br/>Per-worker, lock-free]
    C --> D[LoadBalancer::chooseHost<br/>Select endpoint]
    D --> E{LB Algorithm}
    E -->|Round Robin| F[RoundRobinLB<br/>Rotate through healthy hosts]
    E -->|Least Request| G[LeastRequestLB<br/>Pick host with fewest active]
    E -->|Ring Hash| H[RingHashLB<br/>Consistent hashing by key]
    E -->|Maglev| I[MaglevLB<br/>Minimal disruption hashing]
    E -->|Random| J[RandomLB<br/>Uniform random selection]
    F & G & H & I & J --> K[Host Selected<br/>Check circuit breaker]
    K --> L{Circuit Breaker<br/>Open?}
    L -->|Yes| M[Return overflow error<br/>503 response]
    L -->|No| N[Connection Pool<br/>Get or create connection]
    N --> O{Pool Type}
    O -->|HTTP/1.1| P[Per-host connection<br/>Keep-alive or new]
    O -->|HTTP/2| Q[Multiplexed streams<br/>Over shared connections]
    O -->|TCP| R[Raw TCP connection]
    P & Q & R --> S[Upstream Request<br/>Forward headers + body]
    S --> T[Health Check / Outlier<br/>Update host status]

    style A fill:#f9f
    style S fill:#9f9
    style M fill:#ff9
```

Step-by-Step Process:

  1. Cluster Registration: At startup or via CDS, clusters are created by ClusterManagerImpl::addOrUpdateCluster() in source/common/upstream/cluster_manager_impl.cc. Each cluster creates its ClusterInfoImpl (config, stats, LB policy), host set, health checkers, and outlier detector.

  2. Thread-Local Distribution: The main thread posts cluster updates to all workers via TLS slots. Each worker maintains a ThreadLocalClusterManagerImpl with its own ThreadLocalCluster instances. This avoids any locking on the request path.

  3. Endpoint Discovery (EDS): EdsClusterImpl subscribes to EDS for dynamic endpoint updates (source/extensions/clusters/eds/eds.cc). Updates modify the PrioritySet (host sets grouped by priority/locality), triggering LB rebuild and TLS updates.

  4. Load Balancing: When the Router calls chooseHost(), the thread-local load balancer selects from healthy hosts in the PrioritySet. Priority-based failover means that if priority 0 hosts become unhealthy, traffic spills over to priority 1. Locality-weighted routing then distributes traffic within each priority according to locality weights.

  5. Connection Pooling: ConnPoolImpl maintains per-host, per-protocol connection pools. HTTP/1.1 pools manage connection-per-request (or keep-alive reuse). HTTP/2 pools multiplex streams over a configurable number of connections. Connections are created lazily and drained on host health changes.

  6. Health Checking: Active health checkers (source/extensions/health_checkers/) periodically probe endpoints with HTTP, TCP, or gRPC checks. Unhealthy hosts are removed from the LB’s host set. Health check events propagate via HostSetCallbacks.

  7. Outlier Detection: Passive monitoring in source/common/upstream/outlier_detection_impl.cc tracks consecutive 5xx errors, gateway errors, success rate, and latency. Hosts exceeding thresholds are ejected for a configurable period, progressively increasing on repeat ejections.

  8. Circuit Breaking: Per-cluster limits on connections, pending requests, retries, and requests (source/common/upstream/resource_manager_impl.cc). When limits are hit, requests receive immediate 503s, preventing cascade failures.
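Several of the steps above (3, 6, and 7) map directly onto per-cluster configuration. A minimal sketch, in which the cluster name, thresholds, and probe path are illustrative (field names are from envoy.config.cluster.v3.Cluster):

```yaml
name: backend
type: EDS
eds_cluster_config:               # step 3: subscribe to EDS for endpoints
  eds_config:
    resource_api_version: V3
    ads: {}                       # deliver updates over the shared ADS stream
health_checks:                    # step 6: active health checking
- timeout: 1s
  interval: 5s
  unhealthy_threshold: 3          # consecutive failed probes before removal from the LB
  healthy_threshold: 2            # consecutive passes before reinstatement
  http_health_check:
    path: /healthz                # illustrative probe path
outlier_detection:                # step 7: passive monitoring of real traffic
  consecutive_5xx: 5              # eject a host after 5 consecutive 5xx responses
  interval: 10s                   # sweep interval for ejection analysis
  base_ejection_time: 30s         # ejection duration grows with repeat ejections
  max_ejection_percent: 50        # cap on the fraction of hosts ejected at once
```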

Design Patterns & Trade-offs:

  • Thread-local cluster data: Lock-free hot path at the cost of memory duplication (each worker has its own copy of host sets and LB state). Updates are asynchronous (millisecond-level lag).
  • Priority-based failover: Allows zone-aware routing with graceful degradation. Trade-off: more complex LB configuration and potential uneven distribution during partial failures.
  • Connection pool per-host: Isolates failure domains but increases file descriptor usage. HTTP/2 multiplexing mitigates this significantly.
  • Separate health check vs. outlier detection: Active checks probe health independently of traffic; outlier detection reacts to real traffic patterns. Together they provide comprehensive failure detection but add configuration complexity.
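The priority-based failover and locality weighting discussed above are expressed in a ClusterLoadAssignment. A sketch with illustrative zones and addresses; note that locality weights only take effect when locality_weighted_lb_config is enabled under common_lb_config:

```yaml
load_assignment:
  cluster_name: backend             # illustrative cluster name
  endpoints:
  - priority: 0                     # preferred: receives traffic while healthy
    locality:
      zone: us-east-1a
    load_balancing_weight: 90       # weight relative to other localities in priority 0
    lb_endpoints:
    - endpoint:
        address:
          socket_address: {address: 10.0.0.10, port_value: 8080}
  - priority: 1                     # failover: used as priority 0 loses healthy hosts
    locality:
      zone: us-east-1b
    lb_endpoints:
    - endpoint:
        address:
          socket_address: {address: 10.0.1.10, port_value: 8080}
```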

Key Code Paths

  • Cluster Manager Core: source/common/upstream/cluster_manager_impl.cc
  • Cluster Info & Upstream: source/common/upstream/upstream_impl.cc
  • Load Balancers: source/common/upstream/ and source/extensions/load_balancing_policies/
  • Health Checking: source/extensions/health_checkers/
  • Outlier Detection & Circuit Breaking: source/common/upstream/outlier_detection_impl.cc, source/common/upstream/resource_manager_impl.cc
  • EDS & Cluster Types: source/extensions/clusters/eds/eds.cc and source/extensions/clusters/

Configuration

Clusters are configured via envoy.config.cluster.v3.Cluster protobuf (bootstrap static_resources.clusters or CDS):

| Field | Type | Default | Effect |
|---|---|---|---|
| name | string | required | Cluster name referenced by routes |
| type | enum | STATIC | Discovery type: STATIC, STRICT_DNS, LOGICAL_DNS, EDS, ORIGINAL_DST |
| lb_policy | enum | ROUND_ROBIN | LB algorithm: ROUND_ROBIN, LEAST_REQUEST, RING_HASH, RANDOM, MAGLEV |
| load_assignment | ClusterLoadAssignment | - | Static endpoint list (STATIC and DNS discovery types) |
| eds_cluster_config | EdsClusterConfig | - | EDS service config (API source) |
| health_checks | repeated HealthCheck | - | Active health check configurations |
| outlier_detection | OutlierDetection | - | Passive health monitoring settings |
| circuit_breakers | CircuitBreakers | see below | Per-priority resource limits |
| connect_timeout | Duration | 5s | TCP connection timeout to upstream |
| per_connection_buffer_limit_bytes | uint32 | 1MB | Buffer limit per connection |
| http2_protocol_options | Http2ProtocolOptions | - | HTTP/2 settings (max concurrent streams, etc.) |
| upstream_connection_options | UpstreamConnectionOptions | - | TCP keep-alive and socket options |
| transport_socket | TransportSocket | - | TLS/mTLS config for upstream connections |
| common_lb_config | CommonLbConfig | - | Zone-aware routing, healthy panic threshold |
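Tying several of these fields together, a minimal cluster entry might look like the following sketch (the name, hostname, and SNI are illustrative):

```yaml
clusters:
- name: backend
  type: STRICT_DNS                # resolve the hostname below via DNS
  lb_policy: LEAST_REQUEST
  connect_timeout: 5s
  load_assignment:
    cluster_name: backend
    endpoints:
    - lb_endpoints:
      - endpoint:
          address:
            socket_address: {address: backend.internal, port_value: 443}
  transport_socket:               # originate TLS to the upstream
    name: envoy.transport_sockets.tls
    typed_config:
      "@type": type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.UpstreamTlsContext
      sni: backend.internal
```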

Circuit Breaker Defaults (per-priority):

| Resource | Default Limit | Effect |
|---|---|---|
| max_connections | 1024 | Max TCP connections to cluster |
| max_pending_requests | 1024 | Max requests queued without a connection |
| max_requests | 1024 | Max concurrent requests (HTTP/2 multiplexing) |
| max_retries | 3 | Max concurrent retries |
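These limits can be overridden per routing priority under the cluster's circuit_breakers field. A sketch that restates the defaults explicitly:

```yaml
circuit_breakers:
  thresholds:
  - priority: DEFAULT             # a separate thresholds entry may be set for HIGH
    max_connections: 1024
    max_pending_requests: 1024
    max_requests: 1024
    max_retries: 3
```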

See api/envoy/config/cluster/v3/cluster.proto for the full protobuf definition.
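Note that recent Envoy versions deprecate the cluster-level http2_protocol_options field in favor of typed_extension_protocol_options carrying an upstream HttpProtocolOptions message. A sketch of that form (the stream cap is illustrative):

```yaml
typed_extension_protocol_options:
  envoy.extensions.upstreams.http.v3.HttpProtocolOptions:
    "@type": type.googleapis.com/envoy.extensions.upstreams.http.v3.HttpProtocolOptions
    explicit_http_config:
      http2_protocol_options:
        max_concurrent_streams: 100   # streams multiplexed per upstream connection
```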

Extension Points

  1. Custom Cluster Types:

    • Implement Upstream::ClusterFactory interface.
    • Register with REGISTER_FACTORY.
    • Examples: ORIGINAL_DST (transparent proxy), AGGREGATE (combines multiple clusters).
    • See source/extensions/clusters/.
  2. Custom Load Balancers:

    • Implement a load balancer factory (e.g. Upstream::TypedLoadBalancerFactory) and register with REGISTER_FACTORY.
    • Selected via the cluster's lb_policy or load_balancing_policy field.
    • See source/extensions/load_balancing_policies/.
  3. Custom Health Checkers:

    • Implement a health checker factory in the envoy.health_checkers category.
    • Selected via the custom_health_check field of HealthCheck.
    • See source/extensions/health_checkers/ (e.g. Redis, Thrift).
  4. Custom Retry Policies:

    • Implement Upstream::RetryHostPredicate or Upstream::RetryPriority interfaces.
    • Control which hosts/priorities are tried on retry.
    • See source/extensions/retry/.
  5. Upstream Transport Sockets:

    • Implement a transport socket factory and register with REGISTER_FACTORY.
    • Selected via the cluster's transport_socket field; examples include TLS, ALTS, and proxy protocol.
    • See source/extensions/transport_sockets/.
  6. Upstream HTTP Filters:

    • Similar to downstream HTTP filters but applied on the upstream connection.
    • Configured via upstream_http_filters in the cluster config.
    • Used for upstream-specific transformations.
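Retry host predicates (extension point 4) attach to a route's retry_policy rather than to the cluster. A sketch using the built-in previous-hosts predicate, which avoids retrying a host that already failed the request:

```yaml
retry_policy:
  retry_on: 5xx
  num_retries: 2
  retry_host_predicate:
  - name: envoy.retry_host_predicates.previous_hosts
    typed_config:
      "@type": type.googleapis.com/envoy.extensions.retry.host.previous_hosts.v3.PreviousHostsPredicate
```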

To develop: Use bazel test //test/common/upstream/... for cluster manager tests. Integration tests in test/integration/ verify end-to-end upstream behavior. The admin endpoint /clusters provides runtime cluster status, host health, and stats for debugging.