Cluster Manager Deep-Dive
What This Component Does
The Cluster Manager is Envoy’s upstream connection management system. It owns all upstream clusters (logical groups of backend hosts), their connection pools, load balancing policies, health checking, circuit breaking, and outlier detection. When the Router filter needs to forward a request to an upstream service, it asks the Cluster Manager for a thread-local cluster handle, which provides lock-free access to host sets and connection pools.
The Cluster Manager integrates with CDS (Cluster Discovery Service) and EDS (Endpoint Discovery Service) for dynamic cluster and endpoint management, enabling seamless scaling, rolling deployments, and multi-cluster topologies without proxy restarts.
Use cases: Every upstream request in Envoy flows through the Cluster Manager. It handles service discovery, load balancing algorithms (Round Robin, Least Request, Maglev, Ring Hash), active health checking, passive health monitoring (outlier detection), connection pooling (HTTP/1.1, HTTP/2, TCP), circuit breaking, and upstream TLS/mTLS origination.
How It Works
The Cluster Manager operates as a two-tier system: a global ClusterManagerImpl on the main thread that manages cluster lifecycle and configuration, and per-worker ThreadLocalClusterManagerImpl instances that provide lock-free access to host sets, load balancers, and connection pools.
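The two-tier idea can be sketched in a few dozen lines. This is a hypothetical, heavily simplified illustration (the class and method names below are not Envoy's, except `addOrUpdateCluster`, which mirrors the real entry point): the main thread owns canonical cluster state and pushes immutable snapshots to each worker, so the request path reads worker-local data with no locks. Real Envoy does the delivery asynchronously through TLS slots and per-worker event dispatchers.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <vector>

// An immutable per-cluster snapshot, shared (read-only) across workers.
struct ClusterSnapshot {
  std::string name;
  std::vector<std::string> hosts;  // endpoint addresses
};

class WorkerLocalClusters {
 public:
  // Called only from this worker's event loop, so no lock is needed.
  const ClusterSnapshot* get(const std::string& name) const {
    auto it = clusters_.find(name);
    return it == clusters_.end() ? nullptr : it->second.get();
  }
  void apply(std::shared_ptr<const ClusterSnapshot> snap) {
    clusters_[snap->name] = std::move(snap);
  }

 private:
  std::map<std::string, std::shared_ptr<const ClusterSnapshot>> clusters_;
};

class MainThreadClusterManager {
 public:
  explicit MainThreadClusterManager(size_t workers) : workers_(workers) {}

  // Mirrors addOrUpdateCluster(): build one immutable snapshot and
  // deliver it to every worker. Here delivery is synchronous; real
  // Envoy posts the update to each worker's dispatcher.
  void addOrUpdateCluster(const std::string& name,
                          std::vector<std::string> hosts) {
    auto snap = std::make_shared<const ClusterSnapshot>(
        ClusterSnapshot{name, std::move(hosts)});
    for (auto& w : workers_) w.apply(snap);
  }

  WorkerLocalClusters& worker(size_t i) { return workers_[i]; }

 private:
  std::vector<WorkerLocalClusters> workers_;
};
```

The trade-off is the one noted below under "Design Patterns": each worker duplicates the host data, and updates land with a small lag, in exchange for a lock-free hot path.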
Internal Flow Diagram
```mermaid
flowchart TD
    A[Router Filter<br/>Needs upstream connection] --> B[ClusterManager::getThreadLocalCluster<br/>Thread-local lookup by name]
    B --> C[ThreadLocalCluster<br/>Per-worker, lock-free]
    C --> D[LoadBalancer::chooseHost<br/>Select endpoint]
    D --> E{LB Algorithm}
    E -->|Round Robin| F[RoundRobinLB<br/>Rotate through healthy hosts]
    E -->|Least Request| G[LeastRequestLB<br/>Pick host with fewest active]
    E -->|Ring Hash| H[RingHashLB<br/>Consistent hashing by key]
    E -->|Maglev| I[MaglevLB<br/>Minimal disruption hashing]
    E -->|Random| J[RandomLB<br/>Uniform random selection]
    F & G & H & I & J --> K[Host Selected<br/>Check circuit breaker]
    K --> L{Circuit Breaker<br/>Open?}
    L -->|Yes| M[Return overflow error<br/>503 response]
    L -->|No| N[Connection Pool<br/>Get or create connection]
    N --> O{Pool Type}
    O -->|HTTP/1.1| P[Per-host connection<br/>Keep-alive or new]
    O -->|HTTP/2| Q[Multiplexed streams<br/>Over shared connections]
    O -->|TCP| R[Raw TCP connection]
    P & Q & R --> S[Upstream Request<br/>Forward headers + body]
    S --> T[Health Check / Outlier<br/>Update host status]
    style A fill:#f9f
    style S fill:#9f9
    style M fill:#ff9
```
Step-by-Step Process:
1. **Cluster Registration:** At startup or via CDS, clusters are created by `ClusterManagerImpl::addOrUpdateCluster()` in `source/common/upstream/cluster_manager_impl.cc`. Each cluster creates its `ClusterInfoImpl` (config, stats, LB policy), host set, health checkers, and outlier detector.
2. **Thread-Local Distribution:** The main thread posts cluster updates to all workers via TLS slots. Each worker maintains a `ThreadLocalClusterManagerImpl` with its own `ThreadLocalCluster` instances. This avoids any locking on the request path.
3. **Endpoint Discovery (EDS):** `EdsClusterImpl` subscribes to EDS for dynamic endpoint updates (`source/extensions/clusters/eds/eds.cc`). Updates modify the `PrioritySet` (host sets grouped by priority/locality), triggering LB rebuild and TLS updates.
4. **Load Balancing:** When the Router calls `chooseHost()`, the thread-local load balancer selects from healthy hosts in the `PrioritySet`. Priority-based failover: if priority 0 hosts are unhealthy, traffic falls to priority 1. Locality-weighted routing distributes within priorities based on locality weights.
5. **Connection Pooling:** `ConnPoolImpl` maintains per-host, per-protocol connection pools. HTTP/1.1 pools manage a connection per request (or keep-alive reuse). HTTP/2 pools multiplex streams over a configurable number of connections. Connections are created lazily and drained on host health changes.
6. **Health Checking:** Active health checkers (`source/extensions/health_checkers/`) periodically probe endpoints with HTTP, TCP, or gRPC checks. Unhealthy hosts are removed from the LB's host set. Health check events propagate via `HostSet` callbacks.
7. **Outlier Detection:** Passive monitoring in `source/common/upstream/outlier_detection_impl.cc` tracks consecutive 5xx errors, gateway errors, success rate, and latency. Hosts exceeding thresholds are ejected for a configurable period, which increases progressively on repeat ejections.
8. **Circuit Breaking:** Per-cluster limits on connections, pending requests, retries, and requests (`source/common/upstream/resource_manager_impl.cc`). When limits are hit, requests receive immediate 503s, preventing cascade failures.
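As a concrete example of the load-balancing step, Envoy's least-request balancer uses a power-of-two-choices strategy: sample two hosts at random and pick the one with fewer active requests, which captures most of the benefit of a full scan at O(1) cost. The sketch below is illustrative (names are not Envoy's), with the RNG injected so the example is deterministic and testable.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

struct Host {
  int active_requests = 0;
};

// `pick_index` stands in for the balancer's random source. Sample two
// candidates and return the index of the one with fewer active requests.
size_t chooseLeastRequest(const std::vector<Host>& hosts,
                          const std::function<size_t()>& pick_index) {
  size_t a = pick_index() % hosts.size();
  size_t b = pick_index() % hosts.size();
  return hosts[a].active_requests <= hosts[b].active_requests ? a : b;
}
```

With a real RNG the two samples are independent, so a heavily loaded host is simply unlikely to win either draw; no global sort over hosts is ever needed.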
Design Patterns & Trade-offs:
- Thread-local cluster data: Lock-free hot path at the cost of memory duplication (each worker has its own copy of host sets and LB state). Updates are asynchronous (millisecond-level lag).
- Priority-based failover: Allows zone-aware routing with graceful degradation. Trade-off: more complex LB configuration and potential uneven distribution during partial failures.
- Connection pool per-host: Isolates failure domains but increases file descriptor usage. HTTP/2 multiplexing mitigates this significantly.
- Separate health check vs. outlier detection: Active checks probe health independently of traffic; outlier detection reacts to real traffic patterns. Together they provide comprehensive failure detection but add configuration complexity.
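The passive side of that pairing can be illustrated with a toy consecutive-5xx tracker. This is a sketch, not Envoy's API: it assumes the ejection period grows linearly with the number of prior ejections (Envoy scales `base_ejection_time` by the ejection count, capped by `max_ejection_time`, which this sketch omits).

```cpp
#include <cassert>
#include <cstdint>

// Simplified outlier-detection state for one host: after N consecutive
// 5xx responses the host is ejected, and each repeat ejection lengthens
// the ejection period (progressive backoff).
class OutlierTracker {
 public:
  OutlierTracker(int consecutive_5xx_threshold, uint64_t base_ejection_ms)
      : threshold_(consecutive_5xx_threshold), base_ms_(base_ejection_ms) {}

  // Returns the ejection duration in ms if this response triggers an
  // ejection, or 0 otherwise.
  uint64_t onResponse(int status_code) {
    if (status_code >= 500) {
      if (++consecutive_5xx_ >= threshold_) {
        consecutive_5xx_ = 0;
        ++ejections_;
        return base_ms_ * ejections_;  // linear backoff on repeats
      }
    } else {
      consecutive_5xx_ = 0;  // any success resets the streak
    }
    return 0;
  }

 private:
  int threshold_;
  uint64_t base_ms_;
  int consecutive_5xx_ = 0;
  int ejections_ = 0;
};
```
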
Key Code Paths
Cluster Manager Core
- `source/common/upstream/cluster_manager_impl.h` - `ClusterManagerImpl` class with `ThreadLocalClusterManagerImpl` inner class.
- `source/common/upstream/cluster_manager_impl.cc` - Core logic: `addOrUpdateCluster()` (~L300), `getThreadLocalCluster()` (~L500), cluster warming, CDS integration.
Cluster Info & Upstream
- `source/common/upstream/upstream_impl.h` - `ClusterInfoImpl` (cluster config + stats), `HostImpl` (single endpoint).
- `source/common/upstream/upstream_impl.cc` - Host metadata, locality, health flags, weight.
- `source/common/upstream/priority_conn_pool_map_impl.h` - Per-priority connection pool management.
Load Balancers
- `source/common/upstream/load_balancer_impl.cc` - `RoundRobinLoadBalancer`, `LeastRequestLoadBalancer`, `RandomLoadBalancer` implementations.
- `source/common/upstream/ring_hash_lb.cc` - Consistent ring hash LB (Ketama-compatible).
- `source/common/upstream/maglev_lb.cc` - Maglev consistent hashing (Google's algorithm). Provides better distribution than ring hash with O(1) lookup.
- `source/extensions/load_balancing_policies/` - Pluggable LB policies (extension point).
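To make the consistent-hashing idea concrete, here is a minimal hash ring in the spirit of `RingHashLB`. It is illustrative only: Envoy uses xxHash and honors per-host weights, while this sketch uses `std::hash` and a fixed replica count. Each host is placed at several points on the ring; a request key maps to the first host clockwise from its own hash, so adding or removing one host only remaps keys near that host's points.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <functional>
#include <string>
#include <utility>
#include <vector>

class HashRing {
 public:
  HashRing(const std::vector<std::string>& hosts, int replicas) {
    for (const auto& h : hosts)
      for (int r = 0; r < replicas; ++r)
        ring_.push_back(
            {std::hash<std::string>{}(h + "#" + std::to_string(r)), h});
    std::sort(ring_.begin(), ring_.end());  // ring is a sorted array
  }

  // Binary-search for the first ring point at or after the key's hash,
  // wrapping to the start of the ring if the key hashes past the end.
  const std::string& pick(const std::string& key) const {
    uint64_t h = std::hash<std::string>{}(key);
    auto it = std::lower_bound(ring_.begin(), ring_.end(),
                               std::make_pair(h, std::string()));
    if (it == ring_.end()) it = ring_.begin();
    return it->second;
  }

 private:
  std::vector<std::pair<uint64_t, std::string>> ring_;
};
```

The property worth noting is stability: the same key always lands on the same host, which is what makes ring hash (and Maglev) suitable for session affinity and cache locality.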
Health Checking
- `source/common/upstream/health_checker_impl.cc` - Base health checker; manages check intervals and host state transitions.
- `source/extensions/health_checkers/http/` - HTTP health checker (configurable path, expected status codes).
- `source/extensions/health_checkers/grpc/` - gRPC health checker (standard health protocol).
Outlier Detection & Circuit Breaking
- `source/common/upstream/outlier_detection_impl.cc` - Tracks error rates, ejection logic, backoff timers.
- `source/common/upstream/resource_manager_impl.cc` - Circuit breaker resource counters (connections, pending, retries, requests).
EDS & Cluster Types
- `source/extensions/clusters/eds/eds.cc` - `EdsClusterImpl` handles endpoint updates from EDS.
- `source/extensions/clusters/static/static_cluster.cc` - Static clusters with hardcoded endpoints.
- `source/extensions/clusters/strict_dns/strict_dns_cluster.cc` - DNS-resolved clusters.
- `source/extensions/clusters/logical_dns/logical_dns_cluster.cc` - Logical DNS (single resolved host).
Configuration
Clusters are configured via the `envoy.config.cluster.v3.Cluster` protobuf (bootstrap `static_resources.clusters` or delivered via CDS):
| Field | Type | Default | Effect |
|---|---|---|---|
| `name` | string | required | Cluster name referenced by routes |
| `type` | enum | `STATIC` | Discovery type: `STATIC`, `STRICT_DNS`, `LOGICAL_DNS`, `EDS`, `ORIGINAL_DST` |
| `lb_policy` | enum | `ROUND_ROBIN` | LB algorithm: `ROUND_ROBIN`, `LEAST_REQUEST`, `RING_HASH`, `RANDOM`, `MAGLEV` |
| `load_assignment` | ClusterLoadAssignment | - | Inline endpoint list (for non-EDS discovery types) |
| `eds_cluster_config` | EdsClusterConfig | - | EDS service config (API source) |
| `health_checks` | repeated HealthCheck | - | Active health check configurations |
| `outlier_detection` | OutlierDetection | - | Passive health monitoring settings |
| `circuit_breakers` | CircuitBreakers | see below | Per-priority resource limits |
| `connect_timeout` | Duration | 5s | TCP connection timeout to upstream |
| `per_connection_buffer_limit_bytes` | uint32 | 1MiB | Buffer limit per connection |
| `http2_protocol_options` | Http2ProtocolOptions | - | HTTP/2 settings (max concurrent streams, etc.) |
| `upstream_connection_options` | UpstreamConnectionOptions | - | TCP keep-alive, socket options |
| `transport_socket` | TransportSocket | - | TLS/mTLS config for upstream connections |
| `common_lb_config` | CommonLbConfig | - | Zone-aware routing, healthy panic threshold |
Circuit Breaker Defaults (per-priority):
| Resource | Default Limit | Effect |
|---|---|---|
| `max_connections` | 1024 | Max TCP connections to cluster |
| `max_pending_requests` | 1024 | Max requests queued without a connection |
| `max_requests` | 1024 | Max concurrent requests (HTTP/2 multiplexing) |
| `max_retries` | 3 | Max concurrent retries |
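These limits are plain concurrency counters, not time-windowed error-rate breakers. A toy version of the accounting in `resource_manager_impl.cc` (method names here are illustrative): each resource tracks a current count against a max, callers try to take a unit before acting, and release it when done.

```cpp
#include <cassert>
#include <cstdint>

// One circuit-breaker resource (e.g. max_pending_requests).
class Resource {
 public:
  explicit Resource(uint64_t max) : max_(max) {}

  bool tryInc() {
    if (current_ >= max_) return false;  // over limit: caller sends a 503
    ++current_;
    return true;
  }
  void dec() {
    if (current_ > 0) --current_;
  }
  uint64_t count() const { return current_; }

 private:
  uint64_t max_;
  uint64_t current_ = 0;
};
```

Because the counters reset as soon as in-flight work completes, admission resumes immediately when load drops; there is no explicit "half-open" recovery phase as in classic circuit-breaker libraries.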
See `api/envoy/config/cluster/v3/cluster.proto` for the full protobuf definition.
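Tying the tables together, a minimal cluster definition might look like the following. This is a sketch: field names follow the v3 API, but the hostnames, ports, and threshold values are illustrative.

```yaml
clusters:
- name: backend
  type: STRICT_DNS
  connect_timeout: 1s
  lb_policy: LEAST_REQUEST
  load_assignment:
    cluster_name: backend
    endpoints:
    - lb_endpoints:
      - endpoint:
          address:
            socket_address: { address: backend.svc, port_value: 8080 }
  health_checks:
  - timeout: 2s
    interval: 10s
    unhealthy_threshold: 3
    healthy_threshold: 2
    http_health_check: { path: /healthz }
  outlier_detection:
    consecutive_5xx: 5
    base_ejection_time: 30s
  circuit_breakers:
    thresholds:
    - priority: DEFAULT
      max_connections: 512
      max_pending_requests: 256
```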
Extension Points
- **Custom Cluster Types:**
  - Implement the `Upstream::ClusterFactory` interface.
  - Register with `REGISTER_FACTORY`.
  - Examples: `ORIGINAL_DST` (transparent proxy), `AGGREGATE` (combines multiple clusters).
  - See `source/extensions/clusters/`.
- **Custom Load Balancers:**
  - Implement the `Upstream::LoadBalancerFactory` and `Upstream::LoadBalancer` interfaces.
  - Register as a load balancing policy extension.
  - See `source/extensions/load_balancing_policies/` for pluggable LB policies.
- **Custom Health Checkers:**
  - Implement the `Upstream::HealthChecker` interface.
  - Register the factory via `REGISTER_FACTORY`.
  - Built-in: HTTP, TCP, gRPC, Redis, Thrift.
  - See `source/extensions/health_checkers/`.
- **Custom Retry Policies:**
  - Implement the `Upstream::RetryHostPredicate` or `Upstream::RetryPriority` interfaces.
  - Control which hosts/priorities are tried on retry.
  - See `source/extensions/retry/`.
- **Upstream Transport Sockets:**
  - Implement `Network::TransportSocketFactory` for custom upstream transport.
  - Used for mTLS, ALTS (Google), and custom encryption.
  - See `source/extensions/transport_sockets/tls/`.
- **Upstream HTTP Filters:**
  - Similar to downstream HTTP filters but applied on the upstream connection.
  - Configured via `upstream_http_filters` in the cluster config.
  - Used for upstream-specific transformations.
To develop: use `bazel test //test/common/upstream/...` for cluster manager tests. Integration tests in `test/integration/` verify end-to-end upstream behavior. The admin endpoint `/clusters` provides runtime cluster status, host health, and stats for debugging.