Sizing guidelines
The Consul server nodes process all read and write operations from the agents and maintain consensus across the cluster. As a result, they are I/O bound for writes and CPU bound for reads. Take this into account when sizing the servers, and put monitoring in place so that you can adjust resources to match the type of workload in the cluster.
Workloads on virtual machines require Consul client agents for service discovery and service mesh, so the following guidelines are based on that requirement.
As a general rule, we recommend a maximum of 5,000 Consul client agents in a single datacenter. This estimate is based on the impact on recovery time, the volume of write and read requests, and other factors. We recommend deploying read replicas for improved scalability in read-heavy clusters. Some customers have scaled Consul to tens of thousands of agents per cluster, but whether that is achievable depends heavily on the cluster's read and write workloads. As the cluster scales, customers must optimize for stability at the gossip layer. The two main factors that affect gossip stability with client agents are:
- Total size of the gossip pool
- The churn of nodes/agents in the pool
Control plane on EC2
We recommend deploying, at a minimum, the following instance types for the Consul servers. The recommendations are broken down by cluster size, from Initial to Extra-Large. We recommend starting with the Initial size and, once adoption grows, vertically scaling the servers to a larger size.
Provider | Size | Potential Instance Type | CPU (vCPU) | Memory (GiB) | Disk Capacity | Disk IO |
---|---|---|---|---|---|---|
AWS | Initial | m5.large | 2 | 8 | min: 100 GB (gp3) | min: 3000 IOPS |
AWS | Small | m5.xlarge | 4 | 16 | min: 100 GB (gp3) | min: 3000 IOPS |
AWS | Large | m5.2xlarge | 8 | 32 | min: 200 GB (gp3) | min: 7500 IOPS |
AWS | Extra-Large | m5.4xlarge | 16 | 64 | min: 200 GB (gp3) | min: 7500 IOPS |
The above architecture supports a large number of agent-based clients. If you provision a single datacenter this way, we strongly recommend monitoring cluster metrics to establish a baseline and set threshold levels.
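One way to turn collected metric samples into a baseline and an alert threshold is sketched below. It is a minimal illustration, not a HashiCorp-prescribed method: the function names, the choice of "mean plus three standard deviations", and the sample values are all assumptions, and the metric could be any server metric you already collect (for example, Raft commit time).

```python
from statistics import mean, stdev


def baseline_and_threshold(samples, sigmas=3.0):
    """Return (baseline, threshold) for a list of numeric metric samples.

    Assumption: the baseline is the mean of the samples and the threshold
    is the mean plus `sigmas` sample standard deviations. Tune `sigmas`
    to your own tolerance for alert noise.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples to estimate spread")
    baseline = mean(samples)
    threshold = baseline + sigmas * stdev(samples)
    return baseline, threshold


def breaches(samples, threshold):
    """Return the samples that exceed the threshold, in their original order."""
    return [value for value in samples if value > threshold]


# Hypothetical commit-time samples in milliseconds, made up for illustration.
history_ms = [10, 12, 11, 13, 14]
baseline_ms, threshold_ms = baseline_and_threshold(history_ms)
```

New samples can then be passed to `breaches` with the computed threshold to flag values that warrant investigation or a resize.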
Control plane on EKS
You can use the CPU and memory recommendations when selecting resource limits for the Consul pods, and the disk recommendations when selecting resource limits and configuring persistent volumes. You need to set both limits and requests in the Helm chart. Below is an example snippet of Helm configuration for a Consul server in a large environment.
server:
  resources: |
    requests:
      memory: "32Gi"
      cpu: "4"
    limits:
      memory: "32Gi"
      cpu: "4"
  storage: 50Gi
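If you manage several environments, you may want to generate this stanza from your chosen CPU, memory, and storage figures rather than editing it by hand. The sketch below shows one way to do that; `helm_server_values` is a hypothetical helper, and it assumes requests and limits are set to the same values, as in the snippet above.

```python
def helm_server_values(cpu: int, memory_gib: int, storage_gib: int) -> str:
    """Render a `server` stanza for the Consul Helm values file.

    Assumption: requests and limits are identical, mirroring the example
    configuration in this section. The helper only builds text; it does
    not validate the values against the chart.
    """
    lines = [
        "server:",
        "  resources: |",
        "    requests:",
        f'      memory: "{memory_gib}Gi"',
        f'      cpu: "{cpu}"',
        "    limits:",
        f'      memory: "{memory_gib}Gi"',
        f'      cpu: "{cpu}"',
        f"  storage: {storage_gib}Gi",
    ]
    return "\n".join(lines) + "\n"


# Renders a stanza with the same values as the example in this section.
large_values = helm_server_values(cpu=4, memory_gib=32, storage_gib=50)
```

Writing `large_values` to a file and passing it to Helm as a values file is left to your own tooling.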
HashiCorp recommends monitoring your production deployment so that you can make data-driven decisions about when to raise the server resource limits or vertically scale the VM deployments.