In the evolving world of Infrastructure as Code (IaC), Terraform has become a top choice for provisioning and managing cloud resources. As more organizations adopt Terraform for infrastructure automation, they need to ensure consistently that the infrastructure’s actual state matches its intended state. This challenge brings Terraform drift detection to the forefront.
What is Terraform drift? It’s when your live infrastructure deviates from what’s described in your Terraform configuration files. Such deviations can happen in several ways. One common cause is changes made outside Terraform’s oversight, such as manual adjustments made via a console or the CLI. These changes exist in the cloud, but they’re absent from the Terraform code and state file, causing a mismatch.
Why does Terraform drift detection matter, especially for those managing AWS infrastructure?
The reasons are compelling. It helps you identify and tackle these mismatches. By doing so, you bolster security, ensure compliance, and boost both functionality and reliability. In this article, we’ll navigate these crucial areas.
What Does Terraform Drift Look Like?
Let’s start with the Terraform state file. The state file holds details about the resources that Terraform has created or manages. Within this file, you’ll find information like resource type, name, provider, configuration, and present state. Terraform relies on the state file to track infrastructure changes and ensure everything aligns with your configuration.
Terraform drift emerges when there’s a disparity between how Terraform understands a resource’s state (based on the state file) and how that resource actually exists.
Comparing Real-Time Infrastructure to the Terraform State File
So how do we identify such a drift? The Terraform state file serves as a snapshot of your infrastructure as overseen by Terraform.
By comparing this state file with the actual status of your infrastructure, you can pinpoint any alterations made outside of Terraform’s purview, such as manual changes to the infrastructure. To facilitate this, you can run the ‘terraform plan’ command. It refreshes the state in memory by reading the current condition of your infrastructure, and then compares the result to the desired state defined in the Terraform configuration files. The output of the command shows the differences between the actual state and the desired state. Those differences include both drift and any intentional changes that you have added to the configuration but not yet applied.
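The comparison described above can be pictured with a toy model. The sketch below is only a conceptual illustration, not how Terraform is implemented: it treats the recorded state and the live resource as plain dictionaries and reports every attribute that differs. The attribute names and values are hypothetical.

```python
def find_drift(recorded: dict, actual: dict) -> dict:
    """Return {attribute: (recorded_value, actual_value)} for every mismatch."""
    drift = {}
    for key in sorted(set(recorded) | set(actual)):
        if recorded.get(key) != actual.get(key):
            drift[key] = (recorded.get(key), actual.get(key))
    return drift


# Hypothetical attribute snapshots for a single security group.
recorded_state = {"name": "allow_ssh", "ingress_ports": [22]}
live_resource = {"name": "allow_ssh", "ingress_ports": [22, 80, 443]}

drift = find_drift(recorded_state, live_resource)
print(drift)  # {'ingress_ports': ([22], [22, 80, 443])}
```

An empty result means the recorded state and the live resource agree on every attribute the model knows about.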
Spotting Terraform Drift in Action: A Practical Example
Let’s dive into a hands-on illustration to understand drift detection more concretely.
Step 1: Initial Configuration with Terraform
We use Terraform to set up a security group. In the initial code, only one rule is configured, with the description “SSH from VPC”:
Upon deploying this Terraform code, the AWS console will reflect this setup:
Step 2: Manual Modifications in AWS Console
Imagine a scenario where someone, maybe due to urgent requirements, goes directly into the AWS console and adds two new rules to the security group: HTTP and HTTPS.
Step 3: Terraform’s Response to Manual Changes
After the manual changes, when you run the “terraform plan” command, Terraform detects the drift between its state file and the current AWS setup. The plan summary in the terminal reports “1 to change”, which reflects this disparity.
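If you want to process the plan result in a script rather than read it in the terminal, one option is to save the plan with `terraform plan -out=tfplan` and export it with `terraform show -json tfplan`. The sketch below assumes that JSON has already been loaded and lists the resources whose planned actions are anything other than "no-op" or "read". The sample document is hypothetical and heavily trimmed, and the resource addresses in it are made up for illustration.

```python
import json


def changed_resources(plan: dict) -> list:
    """List (address, actions) for every resource the plan would modify."""
    changes = []
    for rc in plan.get("resource_changes", []):
        actions = rc.get("change", {}).get("actions", [])
        if actions and actions not in (["no-op"], ["read"]):
            changes.append((rc["address"], actions))
    return changes


# Hypothetical, trimmed-down plan JSON; a real export contains many more fields.
sample_plan = json.loads("""
{
  "resource_changes": [
    {"address": "aws_security_group.allow_ssh",
     "change": {"actions": ["update"]}},
    {"address": "aws_vpc.main",
     "change": {"actions": ["no-op"]}}
  ]
}
""")

for address, actions in changed_resources(sample_plan):
    print(address, actions)  # aws_security_group.allow_ssh ['update']
```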
Reasons for Terraform Drift
1. Lack of Automation
Drift can arise when there’s an absence of systematic processes. When there’s no automation in place, infrastructure changes often happen manually, leading to potential errors and inconsistencies. Implementing a CI/CD (Continuous Integration/Continuous Deployment) system ensures changes are made consistently, tested, and deployed automatically. This minimizes drift by streamlining the creation, testing, and deployment of code. Furthermore, without a clear strategy for updating AWS infrastructure, ad hoc changes might go undocumented, laying the groundwork for drift.
2. Urgency of Hotfixes
When urgent issues arise, hotfixes can be a quick solution. However, when these fixes are applied manually, especially directly through the AWS interface, they might bypass regular procedures. Such changes can introduce drift as they aren’t reflected in the Terraform code or state file. For example, this may occur if an on-call team member fixes a resource configuration directly from the AWS console to address a production bug reported at 2 a.m.
3. Insufficient Team Training
A well-informed team is crucial for maintaining consistent infrastructure. If team members, unfamiliar with Terraform, opt to make updates directly through the AWS console, it creates a blind spot for Terraform. Since Terraform doesn’t recognize these console-based changes, drift can unintentionally be introduced.
4. Third-party Automation
Not all automation tools are created equal, and using third-party automation software can be a Terraform drift culprit. For instance, imagine a third-party security system that uses AWS CLI or AWS SDK to modify a rule in one of your AWS security groups.
These tools don’t have access to your Terraform configuration files and don’t update the Terraform code, so the change is never recorded there.
As a result, neither the code nor the Terraform state file reflects the current condition of your infrastructure, which is exactly what drift is.
Implications of Terraform Drift
Security at Risk
Infrastructure drift in cloud systems can lead to heightened security risks. For instance, security group rules might be manually adjusted for testing purposes, granting unintended public access.
The consequences? Potential data breaches, financial setbacks, compliance breaches, and damage to the company’s reputation.
Compliance in Jeopardy
Terraform drift can cause breaches in compliance, introducing significant risks that deeply impact a business. This unmanaged change might alter operational practices, compromise data integrity, or weaken security safeguards. For example, if drift results in the unintentional public disclosure of user data or unauthorized resource access, it can lead to major compliance concerns.
Financial Implications
Infrastructure drift can also have financial implications. Human-induced changes might increase operational costs.
Imagine a team member who changes an RDS DB instance type from “db.m5.xlarge” to “db.m5.4xlarge”.
This means the organization is going to pay roughly four times the expected price.
Let’s look at the numbers from the AWS pricing page for the above change to understand the cost implication better.
For example:
- Pricing for db.m5.xlarge DB server:
- Pricing for db.m5.4xlarge DB server:
In this example, the organization is going to pay an extra 10,000 USD due to the unintentional RDS instance type change.
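The arithmetic behind such an estimate is simple enough to sketch. The hourly rate below is a placeholder chosen only to make the calculation easy to follow. It is not taken from the AWS pricing page, so substitute the actual on-demand rates for your region and database engine. The 4x factor follows the assumption stated above that the larger instance costs four times as much.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours, assuming the instance runs all year


def extra_annual_cost(expected_hourly_usd, actual_hourly_usd, hours=HOURS_PER_YEAR):
    """Extra yearly spend caused by running a pricier instance than planned."""
    return (actual_hourly_usd - expected_hourly_usd) * hours


# Placeholder rates, NOT real AWS prices.
expected_rate = 1.00              # hypothetical hourly rate for db.m5.xlarge
actual_rate = 4 * expected_rate   # assumes db.m5.4xlarge costs four times as much

print(actual_rate / expected_rate)                    # 4.0
print(extra_annual_cost(expected_rate, actual_rate))  # 26280.0
```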
Irrelevance of Terraform Code
Significant drift can make Terraform code obsolete.
Outdated Terraform code no longer describes how your systems actually look and behave.
The older and larger the drift gets, the less relevant the code becomes, until you can no longer rely on it to manage your infrastructure.
How to Detect Terraform Drifts
Stay one step ahead of infrastructure drifts using the terraform plan command.
Think of it as a health check-up for your setup. It assesses how your current infrastructure measures up against the blueprint laid out in your Terraform configurations.
Spot a mismatch? This command will point out the changes needed to bridge that gap.
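For scripted checks, `terraform plan` accepts a `-detailed-exitcode` flag that encodes the result in the process exit code: 0 means the plan is empty, 1 means an error occurred, and 2 means the plan contains changes. The following is a minimal sketch of a wrapper around that flag. The function names and status labels are hypothetical, and it assumes the `terraform` binary is on the PATH and the working directory has already been initialized.

```python
import subprocess

# Exit codes documented for `terraform plan -detailed-exitcode`.
PLAN_STATUS = {0: "in_sync", 1: "error", 2: "changes_pending"}


def interpret_plan_exit_code(code: int) -> str:
    """Translate a plan exit code into a status label (labels are our own)."""
    return PLAN_STATUS.get(code, "error")


def check_for_drift(workdir: str) -> str:
    """Run a plan in workdir and report whether anything would change."""
    result = subprocess.run(
        ["terraform", "plan", "-detailed-exitcode", "-input=false", "-no-color"],
        cwd=workdir,
        capture_output=True,
        text=True,
    )
    return interpret_plan_exit_code(result.returncode)
```

Note that a "changes_pending" result covers unapplied configuration changes as well as drift, so running the check against code that has already been applied gives the cleanest signal.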
Periodic Checks
Regular checks, a few times a day or at least once a day, are a good habit. It’s like routine maintenance; catch the small issues before they snowball into bigger ones.
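In practice, a cron job or a scheduled CI pipeline is the usual way to run such checks. Purely to illustrate the idea, here is a small sketch of a loop that calls a check function at a fixed interval. The `check` callable is a stand-in for whatever drift check you use, and the function name and parameters are hypothetical.

```python
import time


def run_periodic_checks(check, interval_seconds, max_runs, sleep=time.sleep):
    """Call check() max_runs times, pausing interval_seconds between runs."""
    results = []
    for run in range(max_runs):
        results.append(check())
        if run < max_runs - 1:  # no need to wait after the final run
            sleep(interval_seconds)
    return results


# Demo with a stub check and a fake sleep, so it finishes immediately.
pauses = []
statuses = run_periodic_checks(
    check=lambda: "in_sync",
    interval_seconds=0,
    max_runs=3,
    sleep=pauses.append,
)
print(statuses)  # ['in_sync', 'in_sync', 'in_sync']
```

For a check a few times a day, `interval_seconds` would be on the order of several hours, for example 6 * 60 * 60.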
Visibility and Notification
To protect the security and compliance of your infrastructure, you need a reliable visibility and alerting mechanism for Terraform drifts. With one in place, you can quickly spot any problems that pose a risk.
A dashboard is an excellent tool for displaying all open drifts.
Such a dashboard should provide details on the type of drift, the resources impacted, and the drift severity.
The severity of a drift should be determined by its possible effects, such as whether it may result in a compliance or security breach.
Another good practice is to have a notification system in place to keep you updated on any new drifts. It should let users receive notifications by email or other channels.
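As a rough illustration of how severity and notifications could fit together, the sketch below rates a drift by the kind of resource it touches and formats a short alert message. The set of high-risk resource types and the severity rules are hypothetical examples, not a recommended policy, and delivery of the message (email, chat, and so on) is left out.

```python
from dataclasses import dataclass

# Hypothetical choice of resource types treated as security-sensitive.
HIGH_RISK_TYPES = {"aws_security_group", "aws_iam_policy", "aws_s3_bucket_policy"}


@dataclass
class Drift:
    address: str        # e.g. "aws_security_group.allow_ssh"
    resource_type: str  # e.g. "aws_security_group"
    actions: list       # planned actions, e.g. ["update"]


def severity(drift: Drift) -> str:
    """Rate a drift by its potential impact (illustrative rules only)."""
    if drift.resource_type in HIGH_RISK_TYPES:
        return "high"
    if "delete" in drift.actions:
        return "medium"
    return "low"


def format_notification(drift: Drift) -> str:
    """Build a one-line message suitable for email or chat."""
    actions = ", ".join(drift.actions)
    return f"[{severity(drift).upper()}] drift on {drift.address}: {actions}"


example = Drift("aws_security_group.allow_ssh", "aws_security_group", ["update"])
print(format_notification(example))
# [HIGH] drift on aws_security_group.allow_ssh: update
```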
Terraform Drift Remediation: Two Effective Approaches
1. Reconcile
What’s the goal? Return everything to how the original Terraform code intended it to be.
When to use? Best when changes made outside the Terraform code need to be reversed.
Here’s a Simple Breakdown:
- Your Terraform code sets up an EC2 instance type as t2.micro:
- This leads to the creation of the mentioned EC2 instance:
- Oops! Someone manually changes the instance type to t2.nano from the AWS console.
- Running terraform plan highlights this discrepancy:
- Solution? Run ‘terraform apply’ again. Terraform identifies the drift and changes the EC2 instance back to the original t2.micro type, returning it to the intended state. (For an instance type change, the AWS provider typically stops and restarts the instance to apply the update, so expect a brief interruption.)
2. Align the Code
What’s the goal? Update the Terraform code to mirror the real-time state of your infrastructure.
When to use? Ideal for when changes made outside Terraform are deemed necessary and should remain.
Here’s How it Works:
- This is the initial EC2 setup by Terraform:
- This leads to the creation of the mentioned EC2 instance:
- Yet again, someone switches the instance type to t2.nano.
- Solution? Instead of rolling back, you opt to adjust the Terraform code.
This means altering the instance type to t2.nano in the code to match the actual setup on the AWS console.
- Now running the plan shows ‘No changes’ since the code and the actual setup are aligned:
- And there you have it. Both the Terraform state file and AWS console are in sync and up-to-date.
Out-of-the-box Drift Detection & Remediation
To summarize, detecting Terraform drifts is vital for maintaining the security and efficiency of your cloud environments. It’s essential to take action when drift is identified, understand its causes, and be aware of the problems it might introduce. By using effective methods to detect and rectify drift, you can ensure your system stays in its desired state, and prevent incidents, while ensuring operational excellence.
ControlMonkey is a platform that enhances your Terraform operations and provides a drift detection and remediation mechanism out of the box.
ControlMonkey’s Drift Detection consistently compares the current state of your infrastructure with the desired state, promptly notifying you of any discrepancies encountered.
Not only does ControlMonkey excel at pinpointing disparities, but it also offers effective means of rectifying drift.
It presents a user-friendly dashboard for managing detected drifts, issues timely notifications for new deviations, and offers a convenient one-click option for addressing and remediating any drifts.