The Definitive Guide For Terraform Drift Detection

In the evolving world of Infrastructure as Code (IaC), Terraform has made its mark. It’s now a top choice for provisioning and managing cloud resources. As more organizations embrace Terraform for their infrastructure automation, they face a pressing need: consistently ensuring that the infrastructure’s actual state aligns with its intended state. This challenge brings Terraform drift detection to the forefront.

What is Terraform drift? It’s when your live infrastructure deviates from what’s described in your Terraform configuration files. Such deviations can happen in multiple ways. One common cause is changes made outside Terraform’s oversight. Imagine manual adjustments made via a console or the CLI. These changes are on the cloud, but they’re absent from the Terraform state file, causing a mismatch.

Why does Terraform drift detection matter, especially for those managing AWS infrastructure?
The reasons are compelling. It helps you identify and tackle these mismatches. By doing so, you bolster security, ensure compliance, and boost both functionality and reliability. In this article, we’ll navigate these crucial areas.

What Does Terraform Drift Look Like?

Let’s start with the Terraform state file. The state file holds details about the resources that Terraform has created or manages. Within this file, you’ll find information like resource type, name, provider, configuration, and present state. Terraform relies on this state file to track infrastructure changes and ensure everything aligns with your configuration.

Terraform drift emerges when there’s a disparity between how Terraform understands a resource’s state (based on the state file) and how that resource actually exists in reality.

Comparing Real-Time Infrastructure to the Terraform State File

So how do we identify such a drift? The Terraform state file serves as a snapshot of your infrastructure as overseen by Terraform.
By comparing this state file with the actual status of your infrastructure, you can pinpoint any alterations made outside of Terraform’s purview, such as manual changes to the infrastructure. To facilitate this, you can run the ‘terraform plan’ command. It refreshes the state in memory from the existing state of your infrastructure and then compares it to the desired state defined in the Terraform configuration file. The output shows the differences between the actual state and the desired state: both the drift and any purposeful new changes that are defined in the configuration file but not yet applied.

Spotting Terraform Drift in Action: A Practical Example

Let’s dive into a hands-on illustration to understand drift detection more concretely.

Step 1: Initial Configuration with Terraform

We use Terraform to set up a security group. In the initial code, only one rule is configured, bearing the description “SSH from VPC”.
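The original listing is not reproduced here, so the following is only a minimal sketch of what such a configuration could look like. The resource name, the security group name, and the two input variables are assumptions made for illustration; only the single ingress rule and its “SSH from VPC” description come from the text above. An AWS provider configuration is assumed to exist elsewhere in the project.

```hcl
# Hypothetical inputs; the article does not specify them.
variable "vpc_id" {
  type        = string
  description = "ID of the VPC that hosts the security group"
}

variable "vpc_cidr" {
  type        = string
  description = "CIDR range of the VPC that is allowed to connect over SSH"
}

resource "aws_security_group" "example" {
  name        = "drift-example-sg" # assumed name
  description = "Security group used to illustrate drift"
  vpc_id      = var.vpc_id

  # The only rule managed by Terraform at this point.
  ingress {
    description = "SSH from VPC"
    from_port   = 22
    to_port     = 22
    protocol    = "tcp"
    cidr_blocks = [var.vpc_cidr]
  }
}
```

Because the rule is declared inline on the security group, Terraform treats the listed rules as the complete set for that group, so rules added elsewhere should surface as a pending change to this one resource.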

Once this Terraform code is deployed, the AWS console reflects this setup, showing the single SSH rule.

Step 2: Manual Modifications in AWS Console

Imagine a scenario where someone, maybe due to urgent requirements, goes directly into the AWS console and adds two new rules to the security group: HTTP and HTTPS.

Step 3: Terraform’s Response to Manual Changes

After the manual changes, when you run the “terraform plan” command, Terraform detects the drift between its state file and the current AWS setup. The plan summary in the terminal indicates “1 to change”, signifying that the security group no longer matches the configuration.

Reasons for Terraform Drift

1. Lack of Automation

Drift can arise when there’s an absence of systematic processes. When there’s no automation in place, infrastructure changes often happen manually, leading to potential errors and inconsistencies. Implementing a CI/CD (Continuous Integration/Continuous Deployment) system ensures changes are made consistently, tested, and deployed automatically. This minimizes drift by streamlining the creation, testing, and deployment of code. Furthermore, without a clear strategy for updating AWS infrastructure, ad hoc changes might go undocumented, laying the groundwork for drift.

2. Urgency of Hotfixes

When urgent issues arise, hotfixes can be a quick solution. However, when these fixes are applied manually, especially directly through the AWS interface, they might bypass regular procedures. Such changes can introduce drift as they aren’t reflected in the Terraform state file. For example, this may occur if an on-call team member fixes a resource configuration directly from the AWS console to address a production bug reported at 2 am.

3. Insufficient Team Training

A well-informed team is crucial for maintaining consistent infrastructure. If team members, unfamiliar with Terraform, opt to make updates directly through the AWS console, it creates a blind spot for Terraform. Since Terraform doesn’t recognize these console-based changes, drift can unintentionally be introduced.

4. Third-party Automation

Not all automation tools are created equal, and using third-party automation software can be a Terraform drift culprit. For instance, imagine a third-party security system that uses AWS CLI or AWS SDK to modify a rule in one of your AWS security groups.

These tools don’t have access to the Terraform configuration file and can’t change the Terraform code, so the change never makes it into the code.
In this situation, neither the Terraform code nor the state file reflects the current condition of your infrastructure, thus causing a drift.

Implications of Terraform Drift

Security at Risk

Infrastructure drift in cloud systems can lead to heightened security risks. For instance, security group rules might be manually adjusted for testing purposes, granting unintended public access.
The consequences? Potential data breaches, financial setbacks, compliance breaches, and damage to the company’s reputation.

Compliance in Jeopardy

Terraform drift can cause breaches in compliance, introducing significant risks that deeply impact a business. This unmanaged change might alter operational practices, compromise data integrity, or weaken security safeguards. For example, if drift results in the unintentional public disclosure of user data or unauthorized resource access, it can lead to major compliance concerns.

Financial Implications

Infrastructure drift can also have financial implications. Human-induced changes might increase operational costs.
Imagine a team member who changes an RDS DB instance type from “db.m5.xlarge” to “db.m5.4xlarge”.
This means the organization is going to pay roughly four times the expected price for that instance.
Let’s look at the numbers from the AWS pricing page for the above change to understand the cost implication better.
For example:

Pricing for db.m5.xlarge DB server:

Pricing for db.m5.4xlarge DB server:

In this example, the organization is going to pay an extra 10,000 USD due to the unintentional RDS instance type change.

Irrelevance of Terraform Code

Significant drift can make Terraform code obsolete.
When Terraform code becomes outdated, the code you wrote to build and maintain your systems no longer matches the way those systems actually look and work.
The older and larger the drift gets, the less relevant the code becomes, until you can no longer safely use it to manage your infrastructure.

How to Detect Terraform Drifts

Stay one step ahead of infrastructure drifts using the terraform plan command.
Think of it as a health check-up for your setup. It assesses how your current infrastructure measures up against the blueprint laid out in your Terraform configurations.
Spot a mismatch? This command will point out the changes needed to bridge that gap.

Periodic Checks

Regular checks, a few times a day or at least once a day, are a good habit. It’s like routine maintenance; catch the small issues before they snowball into bigger ones.

Visibility and Notification

To maintain the security and compliance of your infrastructure, you need a reliable visibility and alerting mechanism for Terraform drifts. With one in place, you can quickly spot any problems that might cause risk.
A dashboard is an excellent tool for displaying all open drifts.
Such a dashboard should provide details on the type of drift, the resources impacted, and the drift severity.
The importance of the drift should be determined by its possible effects, such as whether it may result in a compliance or security breach.

Another good practice is to have a notification system in place to keep you updated on any new drifts. It should let users receive notifications by email or other channels.

Terraform Drift Remediation: Two Effective Approaches

1. Reconcile

What’s the goal? Return everything to how the original Terraform code intended it to be.

When to use? Best for when changes, made outside the Terraform code, need to be reversed.

Here’s a Simple Breakdown:

Your Terraform code sets up an EC2 instance with the t2.micro instance type.
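As a minimal sketch of such a configuration (the resource name and the `ami_id` variable are assumptions for illustration; only the t2.micro instance type comes from the text), the code might look like this:

```hcl
# Hypothetical input; the article does not say which AMI is used.
variable "ami_id" {
  type        = string
  description = "AMI to launch the example instance from"
}

resource "aws_instance" "example" {
  ami           = var.ami_id
  instance_type = "t2.micro" # the type the code intends
}
```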

This leads to the creation of the mentioned EC2 instance.

Then someone changes the instance type outside Terraform, for example to t2.nano in the AWS console.

Solution? Run ‘terraform apply’ again. It identifies the drift and returns the EC2 instance to the original t2.micro type. Just like that, it’s back to the original state. (For an instance type change, the AWS provider normally stops and restarts the existing instance rather than replacing it, so expect a brief interruption.)

2. Align the Code

What’s the goal? Update the Terraform code to mirror the real-time state of your infrastructure.

When to use? Ideal for when changes made outside Terraform are deemed necessary and should remain.

Here’s How it Works:

The initial EC2 setup in Terraform is the same as in the previous approach: an instance with the t2.micro instance type.

This leads to the creation of the mentioned EC2 instance.

Yet again, someone switches the instance type to t2.nano.
Solution? Instead of rolling back, you opt to adjust the Terraform code.
This means altering the instance type to t2.nano in the code to match the actual setup on the AWS console.
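A minimal sketch of the aligned code, under the same assumptions as before (the resource name and the `ami_id` variable are hypothetical; only the instance types come from the text):

```hcl
# Hypothetical input; the article does not say which AMI is used.
variable "ami_id" {
  type        = string
  description = "AMI to launch the example instance from"
}

resource "aws_instance" "example" {
  ami           = var.ami_id
  instance_type = "t2.nano" # was "t2.micro"; updated to match the console
}
```

Committing this change through your usual review process keeps a record of why the manual change was accepted.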

Now running the plan shows ‘No changes’, since the code and the actual setup are aligned.

And there you have it. The Terraform code and the AWS console are in sync and up-to-date, and the state file catches up the next time Terraform refreshes it, for example on the next apply.

Out-of-the-box Terraform Drift Detection & Remediation

To summarize, detecting Terraform drifts is vital for maintaining the security and efficiency of your cloud environments. It’s essential to take action when drift is identified, understand its causes, and be aware of the problems it might introduce. By using effective methods to detect and rectify drift, you can ensure your system stays in its desired state, and prevent incidents, while ensuring operational excellence.

ControlMonkey is a platform that enhances your Terraform operations and provides a Drift detection and remediation mechanism out of the box.
ControlMonkey’s Terraform Drift Detection consistently compares the current state of your infrastructure with the desired state, promptly notifying you of any discrepancies encountered.
Not only does ControlMonkey excel at pinpointing disparities, but it also offers effective means of rectifying drift.

It presents a user-friendly dashboard for managing detected drifts, issues timely notifications for new deviations, and offers a convenient one-click option for addressing and remediating any drifts.

Author

Ori Yemini

CTO & Co-Founder

Ori Yemini is the CTO and Co-Founder of ControlMonkey. Before founding ControlMonkey, he spent five years at Spot (acquired by NetApp for $400M). Ori holds degrees from Tel Aviv and Hebrew University.
