Multi-platform service discovery
The Consul service catalog is a core component of Consul's service discovery capabilities. Registering services in the catalog makes them discoverable by other services and tools within your infrastructure using regular DNS queries, even if they are unaware of Consul.
Service registration overview
Below is an example of a service registration definition in HCL format that an agent can load at startup.
service {
  name = "hashicups"
  id   = "hashicups"
  meta = {
    environment = "prod"
    version     = "1.1"
    az          = "us-east-1a"
  }
  port = 8000
}
We specify a name for the service, the port it is listening on, and any identifying information about this specific service instance in the meta section so that consumers can filter on it when querying the service. These filters help steer traffic to particular instances of a common service during application deployments or when feature flagging specific instances.
Health checking
Consul's health checking system is another important aspect of its service discovery capabilities. By understanding and effectively implementing health checks, you can ensure that only healthy instances of your services are discoverable and receive traffic. Here is a deeper dive into the health-checking workflow:
At its core, a health check in Consul is a method to determine the status of a service or a node. Consul supports multiple types of health checks, each tailored to different scenarios. When a service instance fails a health check, Consul automatically removes it from the list of healthy instances for that service, ensuring that consumers of that service don't send traffic to an unhealthy instance.
Consul's health checking builds on its foundational architecture and the protocols it employs. Central to this is the Serf library, which uses a gossip protocol to manage membership and detect node failures. Unlike many centralized systems, the gossip protocol's decentralized nature allows for swift failure detection and minimizes single points of failure. The protocol also broadcasts messages efficiently to all cluster nodes, so health state changes propagate rapidly without overburdening the network. Moreover, it is self-healing: if a node becomes unreachable, the protocol routes around it, so health state information continues to propagate.
Another strength of Consul lies in its scalable health state distribution. Rather than relying on a polling mechanism, Consul adopts an event-based model. When there's a change in a service's health state, this change is immediately propagated to all relevant nodes, reducing unnecessary network traffic and ensuring timely updates.
Consul's integrated service discovery and health-checking approach offers a significant advantage. By combining these two functions, Consul ensures immediate updates to service discovery data when there's a change in a service's health state.
Application developers register health checks alongside their service registration definitions. Use the same method that templates or distributes your service registration definitions to also deliver the relevant health check definitions for your applications.
Below is an example of a service registration definition in HCL format that includes an HTTP health check.
service {
  name = "hashicups"
  id   = "hashicups"
  meta = {
    environment = "prod"
    version     = "1.1"
    az          = "us-east-1a"
  }
  port = 8000

  # The check is nested inside the service block so that it is tied to this service.
  check = {
    id              = "api"
    name            = "HTTP API on port 5000"
    http            = "https://localhost:5000/health"
    tls_server_name = ""
    tls_skip_verify = false
    method          = "POST"
    header = {
      Content-Type = ["application/json"]
    }
    body              = "{\"method\":\"health\"}"
    disable_redirects = true
    interval          = "10s" # how often the check runs
    timeout           = "1s"  # how long to wait for a response
  }
}
Health check status
Health checks within Consul can return three different states when evaluating the health of your application.
Status | Description |
---|---|
Passing | The service is healthy and can serve traffic. |
Warning | The service might be experiencing issues but can still serve traffic. |
Critical | The service is unhealthy and shouldn't receive traffic. |
When querying via DNS, Consul will include services with health reporting in the Passing and Warning states by default. You can change this default behavior to only return Passing services with the only_passing configuration option in the dns_config configuration block. All three of these states can also be queried via the health API within the service catalog.
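As a minimal sketch of the DNS setting described above, the agent configuration fragment below sets only_passing inside dns_config. It is an illustrative fragment rather than a complete agent configuration, and it would be placed alongside your other agent settings.

```hcl
# Agent configuration fragment (illustrative sketch).
# With only_passing enabled, DNS answers exclude instances whose
# checks are in the Warning or Critical state.
dns_config {
  only_passing = true
}
```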
User interface
The Consul UI is a useful reference for providing an overall view of all registered services. The UI also provides more detailed views that can help you investigate failure states for every individual instance.
The UI can be enabled on an agent-by-agent basis using the ui_config block. Refer to the Consul agent configuration documentation for details.
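A minimal sketch of enabling the UI on a single agent is shown below. Only the enabled option is shown; any other ui_config settings are left at their defaults.

```hcl
# Agent configuration fragment (illustrative sketch).
# Serves the Consul UI from this agent's HTTP(S) listener.
ui_config {
  enabled = true
}
```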
Services
The Services tab shows a list of all services registered to the cluster, by default emphasizing services that have unhealthy instances.
Service overview
Selecting a specific service loads the next level of detail, which provides a view of all instances of the chosen service and their individual health status.
Instance detail
The response from each health check can be viewed in the instance detail view. In this example the instance in warning status can be seen to be responding with a 429 Too Many Requests HTTP code.
Health check types
Health check types | Descriptions |
---|---|
HTTP | HTTP checks make an HTTP request to the specified URL (GET by default, or another method set with the method field) and wait for the specified amount of time. HTTP checks are one of the most common types of checks. |
TCP | TCP checks attempt to connect to an IP or hostname and port over TCP and wait for the specified amount of time. |
UDP | UDP checks send UDP datagrams to the specified IP or hostname and port and wait for the specified amount of time. |
OSService | OSService checks if an OS service is running on the host. OSService checks support Windows services on Windows hosts. |
Time-to-live (TTL) | Time-to-live (TTL) checks are passive checks that await updates from the service. If the check does not receive a status update before the specified duration, the health check enters a critical state. |
Docker | Docker checks are dependent on external applications packaged with a Docker container that are triggered by calls to the Docker exec API endpoint. |
gRPC | gRPC checks probe applications that support the standard gRPC health checking protocol. |
H2ping | H2ping checks test an endpoint that uses HTTP/2. The check connects to the endpoint and sends a ping frame. |
Alias | Alias checks represent the health state of another registered node or service. |
Script | Script checks invoke an external application that performs the health check, exits with an appropriate exit code, and potentially generates output. Script checks carry a higher risk because they allow for code execution on client machines if someone has access to the client agent API. |
Refer to the Consul health check documentation for detailed configuration information for all health check types.
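To illustrate how check types other than HTTP are declared, the sketch below attaches a TCP check and a TTL check to the hashicups service using a checks list. The check IDs, names, and durations are hypothetical values chosen for illustration.

```hcl
# Service definition with two checks (illustrative sketch;
# IDs, names, and durations are hypothetical).
service {
  name = "hashicups"
  id   = "hashicups"
  port = 8000

  checks = [
    {
      # Active check: the agent attempts a TCP connection every 10 seconds.
      id       = "hashicups-tcp"
      name     = "TCP connect on port 8000"
      tcp      = "localhost:8000"
      interval = "10s"
      timeout  = "1s"
    },
    {
      # Passive check: the application must report its status to the agent
      # at least once every 30 seconds, otherwise the check turns critical.
      id   = "hashicups-ttl"
      name = "Application heartbeat"
      ttl  = "30s"
    }
  ]
}
```

With a TTL check, the application is responsible for updating the check through the local agent's check API before the TTL expires.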
Service registration workflows
Application developers can register their services and health checks with the Consul Enterprise service catalog in several ways. We will highlight some of the most common methods to ensure that, as an operator, you understand the different approaches available to your consumers as they start adopting Consul Enterprise.
Configuration management (CM) service registration
Many organizations use configuration management tools like Chef, Ansible, or Puppet to manage their infrastructure. These tools can be leveraged to register services in Consul.
Workflow:
- First, define the service registration and health check definition in JSON or HCL format.
- Use the templating functionality of your configuration management tool to render the service registration definition with any relevant variables substituted into the Consul configuration directory. Configuration is loaded in lexical order from the Consul configuration directory.
- Trigger a Consul service reload either by using the consul reload CLI command, which is non-disruptive and keeps the agent running, or by restarting the Consul agent service.
Application deployment service registration
For application developers, it is often convenient to register services as part of the application deployment process.
Workflow:
- First, define the service registration and health check definition in JSON or HCL format.
- Include the service registration definition with the application code or deployment artifacts in a place where the Consul agent can pick up the definition. Configuration is loaded in lexical order from the Consul configuration directory.
- As part of the deployment automation (e.g., Jenkins, GitLab CI/CD, Spinnaker), use a script or tool to execute consul reload or restart the Consul agent service after deploying the application.
Terraform resource registration
Terraform, an Infrastructure as Code tool, can create resources like load balancers, databases, and VMs. These resources' IP addresses or DNS names can be registered in Consul as external services. This method does not support direct health checking of the service without using an external service monitor.
Workflow:
- In your Terraform code, use the consul_service resource to define an external service alongside any infrastructure code that you want the service to reference (e.g., a load balancer or IP address from an instance).
- Tie the attribute from the infrastructure resource that includes the IP address or DNS name to the consul_node resource associated with the external service.
- When you apply your Terraform configuration, Terraform will create the infrastructure resource (e.g., a load balancer) and register the information from the corresponding consul_service resource in the Consul service catalog.
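The workflow above can be sketched with the Terraform Consul provider as follows. The resource names, the port, and the lb_dns_name variable are hypothetical. In a real configuration the address would typically come from an attribute of the infrastructure resource itself (for example, a load balancer's DNS name) rather than from an input variable.

```hcl
# Illustrative sketch; names, port, and the input variable are hypothetical.
terraform {
  required_providers {
    consul = {
      source = "hashicorp/consul"
    }
  }
}

# Stand-in for an attribute of an infrastructure resource,
# such as the DNS name of a load balancer created in the same configuration.
variable "lb_dns_name" {
  type        = string
  description = "DNS name or IP address of the external resource"
}

# The external node carries the address of the resource.
resource "consul_node" "hashicups_lb" {
  name    = "hashicups-lb"
  address = var.lb_dns_name
}

# The external service is registered against that node.
resource "consul_service" "hashicups" {
  name = "hashicups"
  node = consul_node.hashicups_lb.name
  port = 443

  meta = {
    environment = "prod"
  }
}
```

Because the service references consul_node.hashicups_lb.name, Terraform creates the node before it registers the service.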
Consul catalog sync for Kubernetes
For organizations using Kubernetes, Consul provides a catalog sync feature, which automatically syncs services between Consul and Kubernetes.
Workflow:
- Deploy one instance of the Consul catalog sync agent per Kubernetes cluster into a dedicated Partition inside your Consul Enterprise deployment using an external server configuration. Refer to our Solution Design Guide for Kubernetes-specific deployment considerations.
- Configure the admin Partition and Namespaces references, disable the connectInject functionality, and enable ACL management in your Helm chart configuration. The connectInject functionality is not required for Helm service discovery deployments that won't use sidecar containers for DNS proxying or mesh gateway components.
- Configure the catalog sync agent to synchronize services from Kubernetes into Consul for specific Kubernetes namespaces.
- Refer to the catalog sync Helm reference to configure settings for the service types that apply in your environment (e.g., ClusterIP, NodePort, or Ingress Controller).
Example configuration for step 2:
global:
  adminPartitions:
    enabled: true
    name: <non-default Partition name>
  enableConsulNamespaces: true
  acls:
    manageSystemACLs: true
connectInject:
  enabled: false
externalServers:
  enabled: true
  hosts: ["<Consul API destination>"]
  httpsPort: 8501
  grpcPort: 8503 # The grpc_tls port on Consul Servers
  tlsServerName: "<hostname tied to the https API>"
Example configuration for step 3:
syncCatalog:
  enabled: true
  toConsul: true
  toK8S: false
  k8sAllowNamespaces: ["specific-namespace1", "specific-namespace2"]
  consulNamespaces:
    mirroringK8S: true
  addK8SNamespaceSuffix: false
  syncClusterIPServices: null
  ingress:
    enabled: null
    loadBalancerIPs: null
  nodePortSyncType: null
  aclSyncToken: null
Library or API registration
Several existing libraries for programming languages or application platforms include native support for registering and discovering services within the Consul service catalog. If there isn’t library support directly in your application programming language, the REST API for registering services can also be leveraged to register your services dynamically at application runtime.
REST workflow:
- First, define the service registration and health check definition as a JSON object within your application code.
- Use the agent service registration API (PUT /v1/agent/service/register) to register the service with the local Consul agent on the host where your application is running.
Operational considerations
As you begin working with application teams to register their services into Consul Enterprise, there are several critical operational considerations to think about to prepare each consumer from the beginning.
ACLs
The primary ACL permission required for service registration is service:write, tied to the specific service name you are registering. If you are creating policies or roles for a large group of consumers who share a particular Namespace, you can group services under a common name prefix and then grant service_prefix:write with the appropriate prefix match for that group of services.
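As a sketch of what such rules can look like, the ACL policy below grants write access to one exact service name and to every service whose name starts with a shared prefix. The "payments-" prefix is a hypothetical naming convention used only for illustration.

```hcl
# ACL policy rules (illustrative sketch; the prefix is hypothetical).

# Allows registering the service named exactly "hashicups".
service "hashicups" {
  policy = "write"
}

# Allows registering any service whose name begins with "payments-".
service_prefix "payments-" {
  policy = "write"
}
```

A token carrying this policy could then be used by the agent or application that performs the registration.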
Service naming conventions
Maintain a consistent naming convention for your services across platforms and regions. This will help simplify the consumer workflow and create failover patterns with more advanced Consul features that can provide failover for similar services deployed in multiple places.
When breaking out Consul datacenters for different teams, you should leverage the enterprise namespace capabilities outlined in the initial configuration section or include prefixes in your service names based on application team names to ensure they are unique.
Leverage service meta and node meta fields
Use the meta field when registering services to include rich key-value pair information and associate it directly with your service instance. Meta values are preferred over tags because they allow easier filtering against distinct keys or values.
In addition to adding meta values directly to your service instances, you can also apply meta values to the Consul agents where the services are running in the node_meta configuration block. These meta values allow you to create filters based on specific characteristics of the underlying infrastructure where the service is running (e.g., operating system, kernel version), or unique attributes of the individual service instance (e.g., version, feature flags).
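A minimal sketch of node-level metadata in an agent configuration is shown below. The keys and values are hypothetical examples of infrastructure characteristics, not required names.

```hcl
# Agent configuration fragment (illustrative sketch; keys and values are hypothetical).
# Every service registered on this agent can be filtered by these node attributes.
node_meta {
  os             = "linux"
  kernel_version = "5.15"
}
```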
One consideration is that tags are directly exposed through the DNS interface: you include a tag as a prefix when querying a service name (for example, prod.hashicups.service.consul for the hypothetical tag prod). To do more advanced filtering against meta and node meta values, leverage the prepared query functionality outlined in the service catalog discovery section.