As cloud infrastructure becomes increasingly complex, many DevOps teams use AWS with Atlantis to automate Terraform workflows. This open-source tool links Git pull requests to Terraform operations. It helps teams improve Infrastructure as Code practices across different environments. It also helps maintain governance on a large scale.
Terraform is widely adopted for provisioning AWS infrastructure—but as environments grow, teams encounter new layers of complexity:
- Multiple DevOps teams making concurrent changes
- Hundreds of thousands of resources across accounts
- Complex dependencies between modules and services
- Security, IAM, and compliance constraints
- Need for consistent, auditable deployments at scale
Many teams start with Atlantis—but as infrastructure scales, so do the limitations. This post is your deep-dive guide to scaling Terraform on AWS with Atlantis—and making it work in high-scale, multi-team environments.
What is Atlantis?
Atlantis is an open-source tool that automates the Terraform workflow using pull requests. It bridges your version control system (GitHub, GitLab, or Bitbucket) and Terraform execution and enables collaborative infrastructure development.
How Atlantis Works with Terraform
Atlantis listens for webhook events in your repository hosting service. When a pull request modifies Terraform configuration files, Atlantis automatically:
- Runs terraform plan for the directories containing the changed files
- Posts the plan output as a comment directly on the pull request
- Lets reviewers apply the changes by commenting atlantis apply on the pull request
- Locks the affected directory and workspace to prevent concurrent changes
Here’s a typical diagram of where Atlantis fits within your workflow:
Key Features of Atlantis:
- Pull Request-based Workflow: Atlantis clones your Git repository and automatically triggers Terraform runs when pull requests are opened or updated.
- Approval Process: Atlantis supports approval requirements, so teams can review Terraform plans before deployment and confirm that changes are compliant and secure.
- Multi-Environment Support: It handles multiple Terraform configurations for different environments, so teams can work in parallel without affecting each other.
- Locking: Atlantis locks a project's directory and workspace while a pull request has an outstanding plan. Together with Terraform's own state locking, this prevents concurrent runs from overwriting each other.
To see how Atlantis compares to other Terraform automation tools, check out our in-depth Atlantis alternatives guide.
5 Best Practices for Scaling Terraform with AWS Atlantis
Before diving into Terraform scaling on AWS with Atlantis, you need to understand some basics about the tool. Here are five key points about Atlantis to help you start scaling your Terraform workflow:
1. Use Terraform Workspaces for Multi-Environment
When dealing with large AWS infrastructures, you should split your infrastructure into multiple environments (e.g., dev, staging, production). Terraform workspaces fit well with Atlantis: each workspace gets its own state file, so you can serve several environments from a single codebase.
Example of Workspace Configuration:
terraform workspace new dev
terraform workspace select dev
terraform apply -var="environment=dev"
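The commands above work on a first run, but terraform workspace new exits with an error once the workspace already exists. A minimal sketch of an idempotent wrapper, assuming bash; the function name ensure_workspace is hypothetical and not part of Terraform or Atlantis:

```shell
#!/bin/bash
# Hypothetical helper: select a Terraform workspace, creating it only if missing.
ensure_workspace() {
  local env="$1"
  if [[ -z "$env" ]]; then
    echo "usage: ensure_workspace <environment>" >&2
    return 2
  fi
  # `workspace select` exits non-zero when the workspace does not exist yet.
  if ! terraform workspace select "$env" >/dev/null 2>&1; then
    terraform workspace new "$env" >/dev/null
  fi
  echo "workspace: $env"
}

# Example usage (assumes `terraform init` has already run in this directory):
#   ensure_workspace dev && terraform plan -var="environment=dev"
```

Trying select first and falling back to new keeps the helper safe to rerun in automation.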
2. Custom Workflows for Complex Pipelines
Atlantis's default workflow (plan → apply) works for simple cases, but complex infrastructure often requires custom steps:
Custom workflow definition in atlantis.yaml:
workflows:
  custom:
    plan:
      steps:
        - run: terraform init -input=false
        - run: terraform validate
        - run: terraform plan -input=false -out=$PLANFILE
        # Archive the plan file for auditing
        - run: aws s3 cp $PLANFILE s3://terraform-audit-bucket/plans/$WORKSPACE-$PULL_NUM.tfplan
    apply:
      steps:
        - run: terraform apply -input=false $PLANFILE
        # $USER_NAME is the VCS user who triggered the command
        - run: ./notify-slack.sh "Applied changes to $WORKSPACE by $USER_NAME"
3. Handling State Files Securely
As you scale, managing Terraform state becomes critical. Atlantis works best with remote state storage:
terraform {
  backend "s3" {
    # Backend blocks cannot reference variables, so the bucket name is a literal.
    # Supply per-environment values with terraform init -backend-config=... instead.
    bucket         = "terraform-state-dev"
    key            = "network/terraform.tfstate"
    region         = "us-east-1"
    dynamodb_table = "terraform-locks"
    encrypt        = true
  }
}
4. Security and Access Control for Atlantis
Atlantis works with SSH keys and AWS IAM roles to secure its communications. It also lets you restrict who can approve and apply Terraform plans, which adds security and accountability. The example below creates an IAM role that an Atlantis server can use to reach AWS resources securely:
resource "aws_iam_role" "atlantis" {
  name = "atlantis-execution-role"
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Action = "sts:AssumeRole"
      Effect = "Allow"
      Principal = {
        # Assumes Atlantis runs on an EC2 instance
        Service = "ec2.amazonaws.com"
      }
    }]
  })
}
resource "aws_iam_role_policy_attachment" "atlantis_policy" {
  role = aws_iam_role.atlantis.name
  # PowerUserAccess is broad and excludes IAM management; scope it to your needs
  policy_arn = "arn:aws:iam::aws:policy/PowerUserAccess"
}
Assuming Different Roles for Different Environments
# In your provider configuration
provider "aws" {
  region = "us-west-2"
  assume_role {
    role_arn = "arn:aws:iam::${var.account_id}:role/TerraformExecutionRole"
  }
}
5. Automating Terraform Plans and Applies
Once Atlantis is connected to your Git repository, terraform plan runs automatically for every opened or updated pull request. After the pull request is approved, Atlantis can apply the Terraform changes directly from a pull request comment. This removes the need to run Terraform inside a separate CI/CD pipeline.
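To make the automatic planning concrete, the sketch below approximates how a list of changed files maps to the directories that need a plan. It is an illustration in bash, not Atlantis's actual implementation; the function name changed_tf_dirs and the origin/main base branch are assumptions.

```shell
#!/bin/bash
# Hypothetical helper: read changed file paths on stdin and print the unique
# directories that contain a modified .tf file, one per line.
changed_tf_dirs() {
  local path
  while IFS= read -r path; do
    case "$path" in
      *.tf) dirname "$path" ;;
    esac
  done | sort -u
}

# Example usage inside a Git checkout (the base branch name is an assumption):
#   git diff --name-only origin/main...HEAD | changed_tf_dirs
```

Files that are not Terraform configuration, such as a README, are ignored, so a documentation-only pull request would produce no directories to plan.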
AWS Atlantis Challenges When Scaling Terraform
1. Slow Plan and Apply Times
As infrastructure grows, Terraform operations slow down. Plans for large configurations can take 5 to 10 minutes or longer and become a bottleneck.
Solution: Use Workspace Splitting
Divide monolithic configurations into separate, focused projects:
atlantis.yaml with parallel execution:
version: 3
parallel_plan: true
parallel_apply: true
projects:
  - name: networking
    dir: networking
  - name: databases
    dir: databases
  - name: compute
    dir: compute
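With dozens of directories, maintaining the projects list by hand gets error-prone. A minimal sketch, assuming bash and one Atlantis project per directory, that prints an atlantis.yaml in the same shape as the example above. The function name generate_atlantis_yaml and the rule for deriving project names are hypothetical:

```shell
#!/bin/bash
# Hypothetical generator: print an atlantis.yaml with one project per directory
# passed as an argument. Project names replace "/" with "-".
generate_atlantis_yaml() {
  printf 'version: 3\nparallel_plan: true\nparallel_apply: true\nprojects:\n'
  local dir
  for dir in "$@"; do
    printf '  - name: %s\n    dir: %s\n' "${dir//\//-}" "$dir"
  done
}

# Example usage:
#   generate_atlantis_yaml networking databases compute > atlantis.yaml
```

Generating the file from the directory list keeps project definitions consistent as the repository grows.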
2. Managing Permissions Across Multiple AWS Accounts
In the case of multiple AWS accounts, managing permissions becomes complex.
Solution: Use Cross-Account Role Assumption
# Create roles in each account that Atlantis can assume
resource "aws_iam_role" "terraform_execution_role" {
  name = "terraform-execution-role"
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Action = "sts:AssumeRole"
      Effect = "Allow"
      Principal = {
        AWS = "arn:aws:iam::${var.atlantis_account_id}:role/atlantis-role"
      }
    }]
  })
}
# In your provider configuration
provider "aws" {
  alias  = "production"
  region = "us-west-2"
  assume_role {
    role_arn = "arn:aws:iam::${var.production_account_id}:role/terraform-execution-role"
  }
}
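Before relying on the provider configuration above, it can help to confirm that the trust policy actually lets the Atlantis identity assume the role. A minimal preflight sketch, assuming bash and an installed AWS CLI; the helper names and the atlantis-preflight session name are hypothetical:

```shell
#!/bin/bash
# Hypothetical helpers for checking cross-account access before a plan.

# Build the ARN of the execution role in a target account.
execution_role_arn() {
  local account_id="$1"
  local role_name="${2:-terraform-execution-role}"
  if [[ ! "$account_id" =~ ^[0-9]{12}$ ]]; then
    echo "invalid AWS account id: $account_id" >&2
    return 1
  fi
  echo "arn:aws:iam::${account_id}:role/${role_name}"
}

# Try to assume the role; succeeds only if the trust policy allows the caller.
can_assume_role() {
  local arn
  arn="$(execution_role_arn "$1" "${2:-}")" || return 1
  aws sts assume-role \
    --role-arn "$arn" \
    --role-session-name "atlantis-preflight" \
    --query 'Credentials.Expiration' \
    --output text >/dev/null
}

# Example usage (the account ID is a placeholder):
#   can_assume_role 123456789012 || echo "cannot assume role" >&2
```

Running this check once per target account surfaces trust-policy mistakes before they show up as failed plans.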
3. Managing Terraform Version Compatibility
As your infrastructure expands, it becomes challenging to manage Terraform version updates.
Solution: Use Terraform Version Control with Atlantis
# atlantis.yaml
version: 3
projects:
  - name: legacy-system
    dir: legacy
    terraform_version: 0.14.11
  - name: new-system
    dir: new
    terraform_version: 1.5.7
4. Sensitive Variable Control
Managing secrets securely with Terraform and Atlantis requires careful consideration.
Solution: AWS Secrets Manager Integration
Create a wrapper script for Terraform that fetches secrets:
#!/bin/bash
# fetch-secrets.sh
# Stop immediately if the secret cannot be fetched
set -euo pipefail
# Get database password from Secrets Manager
DB_PASSWORD=$(aws secretsmanager get-secret-value --secret-id db/password --query SecretString --output text)
# Export as environment variable for Terraform
export TF_VAR_db_password="$DB_PASSWORD"
# Execute terraform with all arguments passed to this script
terraform "$@"
Then update your Atlantis workflow:
workflows:
  secure:
    plan:
      steps:
        - run: ./fetch-secrets.sh init -input=false
        - run: ./fetch-secrets.sh plan -input=false -out=$PLANFILE
    apply:
      steps:
        - run: ./fetch-secrets.sh apply -input=false $PLANFILE
How Teams Automate Workflows to Scale Terraform Deployments on AWS
Step 1: Implement Repository Structure for Scale
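Organize your Terraform code for maximum parallelization and clear ownership. A layout consistent with the project paths used in Step 2 (other directories omitted) looks like this:
accounts/
  production/
    networking/
    databases/
modules/
  networking/
  database/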
Step 2: Set Up Advanced Atlantis Configuration
# atlantis.yaml
version: 3
automerge: true
delete_source_branch_on_merge: true
parallel_plan: true
parallel_apply: true
workflows:
  production:
    plan:
      steps:
        - run: terraform init -input=false
        - run: terraform validate
        - run: terraform plan -input=false -out=$PLANFILE
        - run: ./policy-check.sh
    apply:
      steps:
        - run: ./pre-apply-checks.sh
        - run: terraform apply -input=false $PLANFILE
        - run: ./post-apply-validation.sh
        # $USER_NAME is the VCS user who triggered the command
        - run: ./notify-teams.sh "$WORKSPACE changes applied by $USER_NAME"
projects:
  - name: prod-network
    dir: accounts/production/networking
    workflow: production
    autoplan:
      when_modified: ["*.tf", "../../../modules/networking/**/*.tf"]
    apply_requirements: ["approved", "mergeable"]
  - name: prod-databases
    dir: accounts/production/databases
    workflow: production
    autoplan:
      when_modified: ["*.tf", "../../../modules/database/**/*.tf"]
    apply_requirements: ["approved", "mergeable"]
# Additional projects would be defined similarly
Step 3: Implement Dependency Management
Create a script to manage dependencies between projects:
#!/bin/bash
# dependency-manager.sh
# Usage: ./dependency-manager.sh <project-name>

# Define dependencies (associative arrays require bash 4+)
declare -A dependencies
dependencies["prod-compute"]="prod-network prod-databases"
dependencies["staging-compute"]="staging-network staging-databases"

# Check if a dependency has been successfully applied.
# NOTE: this URL is illustrative; confirm that your Atlantis version (or
# another status service) exposes an equivalent endpoint.
check_dependency() {
  local dependency=$1
  local status
  status=$(curl -s "http://atlantis-server:4141/api/projects/$dependency" | jq -r '.status')
  [[ "$status" == "applied" ]]
}

# Check all dependencies for the current project
PROJECT_NAME=${1:?usage: dependency-manager.sh <project-name>}
if [[ -n "${dependencies[$PROJECT_NAME]}" ]]; then
  for dep in ${dependencies[$PROJECT_NAME]}; do
    if ! check_dependency "$dep"; then
      echo "Dependency $dep is not in applied state. Cannot proceed."
      exit 1
    fi
  done
fi

# If we get here, all dependencies are met
echo "All dependencies satisfied, proceeding with Terraform operation"
exit 0
Step 4: Implement Drift Detection
Create a scheduled task to detect infrastructure drift:
resource "aws_cloudwatch_event_rule" "drift_detection" {
  name                = "terraform-drift-detection"
  description         = "Triggers Terraform drift detection"
  schedule_expression = "cron(0 4 * * ? *)" # Run daily at 4 AM UTC
}
resource "aws_cloudwatch_event_target" "drift_detection_lambda" {
  rule      = aws_cloudwatch_event_rule.drift_detection.name
  target_id = "DriftDetectionLambda"
  arn       = aws_lambda_function.drift_detection.arn
}
resource "aws_lambda_function" "drift_detection" {
  function_name = "terraform-drift-detection"
  role          = aws_iam_role.drift_detection_lambda.arn
  handler       = "index.handler"
  runtime       = "nodejs20.x"
  timeout       = 300
  # Deployment package arguments (e.g. filename) omitted for brevity
  environment {
    variables = {
      ATLANTIS_URL = "https://atlantis.controlmonkey.com"
      # Terraform does not resolve CloudFormation-style {{resolve:...}} references,
      # so pass the secret's ID and let the function read it at runtime.
      GITHUB_TOKEN_SECRET_ID = "github/token"
    }
  }
}
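The scheduled function above needs something to run. One common way to detect drift is terraform plan -detailed-exitcode, which exits with 0 when nothing would change, 1 on error, and 2 when changes are present. The sketch below shows that check in bash; it illustrates the technique rather than the Lambda's actual code, and the function names are hypothetical:

```shell
#!/bin/bash
# Hypothetical drift check built on `terraform plan -detailed-exitcode`.

# Translate the plan exit code into a status word:
#   0 = no changes, 1 = error, 2 = changes present (drift).
classify_plan_exit() {
  case "$1" in
    0) echo "in-sync" ;;
    2) echo "drift" ;;
    *) echo "error" ;;
  esac
}

# Run a plan in one project directory and report its status.
# Returns 0 only when the directory is in sync.
check_drift() {
  local dir="$1" code=0
  terraform -chdir="$dir" plan -detailed-exitcode -input=false -lock=false \
    >/dev/null 2>&1 || code=$?
  echo "$dir: $(classify_plan_exit "$code")"
  [[ "$code" -eq 0 ]]
}

# Example usage (directory names follow the earlier examples):
#   for d in networking databases compute; do check_drift "$d"; done
```

Passing -lock=false keeps a read-only drift scan from blocking a real apply that happens to run at the same time.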
Step 5: Implement Approval Workflows with AWS Services
resource "aws_lambda_function" "approval_notification" {
  function_name = "terraform-approval-notification"
  role          = aws_iam_role.approval_lambda.arn
  handler       = "index.handler"
  runtime       = "nodejs20.x"
  # Deployment package arguments (e.g. filename) omitted for brevity
  environment {
    variables = {
      SNS_TOPIC_ARN = aws_sns_topic.terraform_approvals.arn
    }
  }
}
resource "aws_sns_topic" "terraform_approvals" {
  name = "terraform-approval-requests"
}
resource "aws_sns_topic_subscription" "approval_email" {
  topic_arn = aws_sns_topic.terraform_approvals.arn
  protocol  = "email"
  # Address is redacted in the source; replace with your approvers' mailbox
  endpoint  = "[email protected]"
}
resource "aws_api_gateway_resource" "webhook" {
  rest_api_id = aws_api_gateway_rest_api.atlantis_extensions.id
  parent_id   = aws_api_gateway_rest_api.atlantis_extensions.root_resource_id
  path_part   = "webhook"
}
resource "aws_api_gateway_method" "webhook_post" {
  rest_api_id   = aws_api_gateway_rest_api.atlantis_extensions.id
  resource_id   = aws_api_gateway_resource.webhook.id
  http_method   = "POST"
  # The argument is named authorization; NONE leaves the endpoint unauthenticated
  authorization = "NONE"
}
What If Atlantis with AWS Isn’t Enough?
If your team is managing thousands of Terraform resources, dozens of AWS accounts, or struggling with policy enforcement and visibility—you may have outgrown Atlantis.
While Atlantis is a solid open-source tool for automating Terraform plans and applies through pull requests, it wasn’t designed for enterprise-scale cloud governance. Teams scaling Terraform on AWS often face challenges around:
- Large, complex configurations
- Multi-account IAM permissions
- Policy enforcement and compliance gaps
- ClickOps and infrastructure drift
This is where a platform like ControlMonkey comes in—offering full visibility, automated drift detection, real-time policy enforcement, and Terraform CI/CD that works across cloud and code.
Infrastructure automation should grow with your cloud footprint. If Atlantis is slowing you down, it’s time to explore what’s next.
👉 Book a demo and see how ControlMonkey scales what Atlantis started.