Health Assessment
HCP Terraform can automatically run health assessments in a workspace to check whether its real infrastructure matches the requirements defined in its Terraform configuration. Health assessments include the following types of evaluations:
- Drift detection: Think of your Terraform configuration as the blueprint for your infrastructure. Drift detection automatically spots when the real infrastructure no longer matches that blueprint, so you can fix problems before they cause failures.
- Continuous validation: HCP Terraform doesn't just provision your infrastructure; it can also check your standards over time. Define conditions for security, cost, or anything else important, and continuous validation regularly checks that they are still met. Continuous validation is covered in the HVD Terraform — Scaling Guide; refer to the HCP Terraform documentation for more details.
Note: Health assessments are available in HCP Terraform Plus. Refer to HCP Terraform pricing for details. In Terraform Enterprise, health assessments are part of the Governance & Policy module.
Before reviewing the best practices below, read the official HCP Terraform documentation on health assessments.
Drift Detection
Drift detection in HCP Terraform is designed to ensure the integrity and consistency of your infrastructure. This feature identifies when the actual state of your infrastructure deviates from the expected state defined in your Terraform configurations, a scenario commonly referred to as configuration drift.
It's important to understand the difference between configuration drift and state drift.
Configuration drift occurs when changes are made directly to the infrastructure outside of Terraform's managed processes and those changes invalidate your infrastructure configuration, creating discrepancies between the live infrastructure and the desired state defined in code. Common causes include manual interventions, emergency fixes, and changes applied through other automation tools.
State drift occurs when external changes affect remote objects but do not invalidate your infrastructure configuration. Drift detection detects configuration drift; it does not detect state drift.
Limitations
Drift detection identifies configuration drift in the resources that Terraform actively manages, but keep the following limitations in mind:
- Unmanaged attributes: Drift detection focuses on attributes explicitly managed by Terraform. Changes to attributes that your configuration does not manage are not flagged. For example, if you manually modify a security setting on a virtual machine instance and that setting is not defined in your Terraform configuration, drift detection won't report the change.
- External additions: HCP Terraform can't detect resources that are added to your infrastructure entirely outside of Terraform's purview. For instance, if you manually create new IAM users in your cloud environment, drift detection won't identify these additions.
Prescriptive Guidance
Organization owners can either enforce health assessments for all workspaces or let each workspace control the setting. Enforcing health assessments at the organization level overrides workspace-level settings. HashiCorp recommends enabling health assessments for all workspaces by using this organization-wide setting, available under Settings > Health.
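The organization-wide setting can also be applied through the HCP Terraform API instead of the UI, which can help a platform team that manages several organizations. The following is a minimal sketch, assuming that the Organizations API exposes the setting as the assessments-enforced attribute on PATCH /organizations/:organization_name and that the token has organization owner permissions. Confirm both against the API reference. The API base URL is an assumption for HCP Terraform; Terraform Enterprise uses your own hostname.

```python
import json
import urllib.request

# Assumption: the public HCP Terraform API. For Terraform Enterprise,
# replace the hostname with your own installation's.
API_BASE = "https://app.terraform.io/api/v2"


def build_enforce_assessments_request(
    organization: str, token: str, enforced: bool = True
) -> urllib.request.Request:
    """Build, but do not send, a PATCH request that sets the organization's
    (assumed) assessments-enforced attribute."""
    payload = {
        "data": {
            "type": "organizations",
            "attributes": {"assessments-enforced": enforced},
        }
    }
    return urllib.request.Request(
        url=f"{API_BASE}/organizations/{organization}",
        data=json.dumps(payload).encode("utf-8"),
        method="PATCH",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/vnd.api+json",
        },
    )


def send(request: urllib.request.Request) -> int:
    """Send a prepared request and return the HTTP status code."""
    with urllib.request.urlopen(request) as response:
        return response.status
```

For example, send(build_enforce_assessments_request("example-org", token)) would enforce assessments for a hypothetical organization named example-org. Keeping request construction separate from sending makes the payload easy to review before it changes a live organization.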
Alternatively, a Platform Team admin can choose to enable health assessments only on selected workspaces in the following circumstances:
- The provider API's rate limits are being reached, and the number of API calls must be reduced.
- Terraform Enterprise (self-hosted): The load on your Terraform Enterprise instance or its agents is too high, and vertically scaling the compute layer is undesirable or impossible. (Terraform Enterprise administrators can also adjust how frequently health assessments run.)
- Remediating every detected drift would significantly increase the team's operational workload.
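When health assessments are not enforced at the organization level, a platform team can turn them on or off for selected workspaces. The following minimal sketch assumes that the Workspaces API accepts the assessments-enabled attribute on PATCH /workspaces/:workspace_id. The workspace IDs in the usage note are hypothetical. Organization-level enforcement overrides workspace settings, so these requests only take effect when enforcement is off.

```python
import json
import urllib.request

# Assumption: public HCP Terraform API; use your own hostname for Terraform Enterprise.
API_BASE = "https://app.terraform.io/api/v2"


def build_workspace_assessments_request(
    workspace_id: str, token: str, enabled: bool
) -> urllib.request.Request:
    """Build, but do not send, a PATCH request that turns health
    assessments on or off for a single workspace."""
    payload = {
        "data": {
            "type": "workspaces",
            "attributes": {"assessments-enabled": enabled},
        }
    }
    return urllib.request.Request(
        url=f"{API_BASE}/workspaces/{workspace_id}",
        data=json.dumps(payload).encode("utf-8"),
        method="PATCH",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/vnd.api+json",
        },
    )


def build_selective_requests(
    settings: dict[str, bool], token: str
) -> list[urllib.request.Request]:
    """Turn a {workspace_id: enabled} mapping into one request per
    workspace, in the order the workspaces were listed."""
    return [
        build_workspace_assessments_request(ws_id, token, enabled)
        for ws_id, enabled in settings.items()
    ]
```

For example, build_selective_requests({"ws-aaa": True, "ws-bbb": False}, token) prepares one request that enables assessments on a critical workspace and one that disables them on a workspace that triggers provider rate limits. Each request can then be sent with urllib.request.urlopen. Keeping the mapping in version control records which workspaces were opted out and why.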
Terraform Enterprise administrators can modify their installation's assessment frequency and the maximum number of concurrent assessments from the admin settings console.
We recommend enabling workspace notifications so that workspace admins and owners are notified (for example, through Slack or email) when a workspace has a health assessment issue. This helps the responsible team act quickly to resolve problems with the infrastructure.
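Notification configurations can also be created through the API so that every new workspace gets the same alerting. The following minimal sketch creates a Slack destination. It assumes the notification configurations endpoint is POST /workspaces/:workspace_id/notification-configurations and that the health assessment trigger names are assessment:drifted, assessment:check_failure, and assessment:failed. Verify these trigger names against the notification configurations API reference. Email destinations use different attributes and are not shown. The webhook URL in the usage note is a placeholder.

```python
import json
import urllib.request

# Assumption: public HCP Terraform API; use your own hostname for Terraform Enterprise.
API_BASE = "https://app.terraform.io/api/v2"

# Assumed trigger names for health assessment events: drift detected,
# a continuous validation check failed, or the assessment itself failed.
ASSESSMENT_TRIGGERS = [
    "assessment:drifted",
    "assessment:check_failure",
    "assessment:failed",
]


def build_assessment_notification_request(
    workspace_id: str,
    token: str,
    slack_webhook_url: str,
    name: str = "health-assessment-alerts",
) -> urllib.request.Request:
    """Build, but do not send, a POST request that creates a Slack
    notification configuration for health assessment events."""
    payload = {
        "data": {
            "type": "notification-configurations",
            "attributes": {
                "destination-type": "slack",
                "enabled": True,
                "name": name,
                "url": slack_webhook_url,
                "triggers": list(ASSESSMENT_TRIGGERS),
            },
        }
    }
    return urllib.request.Request(
        url=f"{API_BASE}/workspaces/{workspace_id}/notification-configurations",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/vnd.api+json",
        },
    )
```

For example, build_assessment_notification_request("ws-aaa", token, "https://hooks.slack.com/services/EXAMPLE") prepares the request for a hypothetical workspace. Running it as part of workspace provisioning ensures that no workspace is created without health assessment alerts.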
Drift Resolution workflow
When drift is detected, the workspace can notify the application team, which is responsible for deciding the best way to resolve it.
The platform team can also use the HCP Terraform explorer to review which workspaces have drift and contact the owning application teams to resolve it.
On the organization's workspaces page, HCP Terraform displays a health warning status for workspaces with infrastructure drift or failed continuous validation checks.
On the right of a workspace’s overview page, HCP Terraform displays a Health bar that summarizes the results of the last health assessment.
- The Drift summary shows the total number of resources in the configuration and the number of resources that have drifted.
- The Checks summary shows the number of passed, failed, and unknown statuses for objects with continuous validation checks.
To resolve drift, choose one of the following options:
- Overwrite drift: For undesired changes, start a new Terraform plan and apply it to revert the resources to their configuration-defined state.
- Update Terraform configuration: If you want to keep the drifted changes, modify your Terraform configuration to include them and push a new configuration version. This prevents Terraform from reverting the drift during the next apply.
In HCP Terraform, refreshing state and updating the Terraform configuration are distinct operations that play critical roles in infrastructure management, especially in drift detection and remediation. Understanding how they differ, and when to use each, is vital for effective infrastructure as code practices.
- Refresh state: Refreshing the state updates Terraform's state file to match the actual infrastructure as it exists in cloud or on-premises environments. This operation does not modify the infrastructure; instead, it updates Terraform's record (the state file) to reflect any changes that occurred outside of Terraform's management. The primary purpose of refreshing the state is to keep Terraform's understanding of the infrastructure accurate, which is essential for identifying drift: the divergence between the desired state defined in your Terraform configuration and the actual state of the infrastructure.
- Update Terraform configuration: Updating a Terraform configuration version involves submitting updated Terraform configuration files to HCP Terraform and executing a plan and apply operation based on these updates. This process changes the infrastructure to match the desired state defined in the new configuration version.
To remediate drift, follow these steps:
- Update Terraform configurations: Once drift is identified, update your Terraform configurations to either realign the infrastructure with the desired state or update the desired state to incorporate intentional changes made outside of Terraform.
- Review and apply changes: Submit the updated configurations as a new configuration version in HCP Terraform. Take advantage of HCP Terraform's planning and review process to ensure all team members understand the changes and their impacts before applying them.
- Monitor and validate: After applying the new configuration version, monitor the apply operation's outcome to ensure the intended changes were successfully implemented and validate that the infrastructure state aligns with the desired configuration.
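Both a refresh-only run and a standard plan-and-apply run (which overwrites drift) can be started through the Runs API, and the outcome can be monitored by polling the run status. The following minimal sketch assumes that POST /runs accepts a refresh-only attribute with a workspace relationship, and that GET /runs/:run_id returns the run's status under data.attributes.status. It also assumes that the statuses in TERMINAL_STATUSES mark a finished run. That set may not be exhaustive, so check the Runs API reference.

```python
import json
import time
import urllib.request
from typing import Callable, Optional

# Assumption: public HCP Terraform API; use your own hostname for Terraform Enterprise.
API_BASE = "https://app.terraform.io/api/v2"

# Assumed statuses after which HCP Terraform does no further work on a run.
TERMINAL_STATUSES = {
    "applied",
    "planned_and_finished",
    "errored",
    "discarded",
    "canceled",
    "force_canceled",
}


def _api_request(
    method: str, path: str, token: str, payload: Optional[dict] = None
) -> urllib.request.Request:
    """Build a JSON:API request against the HCP Terraform API."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    return urllib.request.Request(
        url=f"{API_BASE}{path}",
        data=data,
        method=method,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/vnd.api+json",
        },
    )


def build_run_request(
    workspace_id: str, token: str, refresh_only: bool, message: str
) -> urllib.request.Request:
    """refresh_only=True requests a refresh-only run (update state to match
    real infrastructure); False requests a standard plan and apply, which
    reverts drift to the configuration-defined state."""
    payload = {
        "data": {
            "type": "runs",
            "attributes": {"refresh-only": refresh_only, "message": message},
            "relationships": {
                "workspace": {"data": {"type": "workspaces", "id": workspace_id}}
            },
        }
    }
    return _api_request("POST", "/runs", token, payload)


def fetch_run_status(run_id: str, token: str) -> str:
    """Return the current status of a run (performs a network call)."""
    with urllib.request.urlopen(_api_request("GET", f"/runs/{run_id}", token)) as resp:
        return json.load(resp)["data"]["attributes"]["status"]


def wait_for_run(
    fetch_status: Callable[[], str],
    poll_seconds: float = 10.0,
    timeout_seconds: float = 1800.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the run reaches a terminal status and return that status.
    Raises TimeoutError if the run is still in progress after timeout_seconds."""
    waited = 0.0
    while True:
        status = fetch_status()
        if status in TERMINAL_STATUSES:
            return status
        if waited >= timeout_seconds:
            raise TimeoutError(f"run still {status!r} after {waited:.0f} seconds")
        sleep(poll_seconds)
        waited += poll_seconds
```

After reading the run ID from the response to build_run_request, a call such as wait_for_run(lambda: fetch_run_status(run_id, token)) blocks until the run finishes. A result of "applied" or "planned_and_finished" indicates success. A run that is waiting for manual confirmation or a policy override is not terminal, and the timeout prevents waiting on it indefinitely. Because fetch_status and sleep are passed in, the polling logic can be tested without network access.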
Additional guidance
- Continuous integration/continuous deployment (CI/CD): Integrate both the state refresh and apply operations into your CI/CD pipelines to facilitate continuous monitoring and timely remediation of drift.
- Version control and code reviews: Use version control systems for managing Terraform configurations. Implement a code review process for changes to Terraform files to ensure accuracy and intent of changes before they are applied.
- Change management: Incorporate these Terraform operations within your organization's change management framework to ensure changes are tracked, audited, and aligned with organizational policies and compliance requirements.
- Documentation and training: Document your workflow and operations, including state refresh and configuration updates, and provide training to your team. This ensures a consistent approach to managing and remediating drift across your organization.
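A CI/CD pipeline can add its own drift check alongside HCP Terraform's built-in assessments. The following minimal sketch makes several assumptions. The Terraform CLI must be on PATH, terraform init must already have run in the working directory, and the backend and execution mode must support saved plans (for example, local execution). The sketch reads the resource_drift array from the JSON plan representation produced by terraform show -json. The plan file name drift.tfplan is arbitrary.

```python
import json
import subprocess
from typing import Any


def drifted_resources(plan: dict[str, Any]) -> list[str]:
    """Return the addresses of resources listed under resource_drift in a
    `terraform show -json` plan representation, skipping no-op entries."""
    addresses = []
    for entry in plan.get("resource_drift", []):
        actions = entry.get("change", {}).get("actions", [])
        if actions and actions != ["no-op"]:
            addresses.append(entry["address"])
    return addresses


def check_drift(workdir: str, plan_file: str = "drift.tfplan") -> list[str]:
    """Run a refresh-only plan in workdir and return drifted resource
    addresses. Raises CalledProcessError if either Terraform command fails."""
    subprocess.run(
        ["terraform", "plan", "-refresh-only", "-input=false", f"-out={plan_file}"],
        cwd=workdir,
        check=True,
    )
    shown = subprocess.run(
        ["terraform", "show", "-json", plan_file],
        cwd=workdir,
        check=True,
        capture_output=True,
        text=True,
    )
    return drifted_resources(json.loads(shown.stdout))
```

A pipeline step could, for example, fail the job when check_drift(".") returns a non-empty list, which makes the drift visible in the same change management and code review process as any other infrastructure change. A refresh-only plan does not modify infrastructure or state, so this check is safe to run on a schedule.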
For a detailed example, refer to the manage resource drift tutorial.