A leading virtual education platform serves millions of students and educators worldwide. The company runs a distributed architecture on AWS and relies heavily on Terraform to orchestrate its infrastructure. With more than five years of Terraform code spread across 50 repositories, it struggled to manage its large-scale cloud environment, facing Terraform drift, inconsistent configurations, and a lack of visibility.
The company turned to ControlMonkey to address these issues, aiming to modernize its Terraform operations and build a standardized, scalable infrastructure.
The Challenges
Before adopting ControlMonkey, the DevOps team encountered several operational challenges that worsened as they scaled.
- Terraform Drifts: Years of accumulated Terraform code, combined with multiple users modifying production resources through ClickOps, had led to mismatched versions and frequent drift that took hours of manual investigation to untangle.
- Lack of Visibility: Understanding the state of resources across 50 repositories was challenging without a central control plane.
- Inconsistent CI/CD Pipelines: Each repository used different CI scripts, leading to inefficiencies, confusion, and lack of standardization.
- Compliance Audits: Ensuring compliance with CIS and PCI standards in multiple regions was time-consuming, especially during audit preparation.
The Solution
After discovering ControlMonkey through an online search, the team quickly started a POC and began implementing the platform. Within a day, they had onboarded their first organization, and the benefits became evident to other DevOps team members and to teams across the engineering group. The POC’s results were promising, so the company chose ControlMonkey as its Terraform Automation Platform.
The key capabilities the DevOps team leveraged include the following:
- Drift Detection & Remediation: Real-time detection of infrastructure drifts with one-click remediation significantly reduced manual investigation time, saving the DevOps team countless hours.
- Terraform CI/CD: Unified pipelines with GitOps standardization ensured compliance across all repositories, simplifying audits for standards like CIS and PCI. This boosted developer confidence in Terraform and made it more user-friendly.
- Complete Asset Inventory: A centralized control plane provided a clear yet deep view of the infrastructure, allowing the team to fully understand the current state of their cloud resources and transition from reactive to proactive management.
- Identifying Unused Resources: With the ControlMonkey platform supplying the visibility that had been missing, the team identified and eliminated zombie resources, resulting in immediate savings that covered the cost of the platform.
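For readers unfamiliar with what drift detection involves, the idea can be illustrated with plain Terraform. This is a minimal sketch, not ControlMonkey's implementation, which this case study does not describe. It relies on the `-detailed-exitcode` flag of `terraform plan`, which exits with 0 when there is no diff, 1 on error, and 2 when the plan is non-empty. The function names, status labels, and the assumption that each directory has already been initialized with `terraform init` are illustrative only.

```python
import subprocess

# Exit codes of `terraform plan -detailed-exitcode`.
# The status labels on the right are hypothetical names chosen for this sketch.
PLAN_STATUS = {
    0: "in_sync",           # plan succeeded, empty diff
    1: "error",             # plan failed
    2: "changes_detected",  # plan succeeded, non-empty diff
}


def classify_plan_exit_code(code: int) -> str:
    """Map a plan exit code to a status label; unexpected codes become 'unknown'."""
    return PLAN_STATUS.get(code, "unknown")


def check_workdir(workdir: str) -> str:
    """Run a read-only plan in one Terraform directory and classify the result.

    Assumes `terraform` is on PATH and `terraform init` has already been run.
    """
    result = subprocess.run(
        ["terraform", "plan", "-detailed-exitcode", "-input=false", "-no-color"],
        cwd=workdir,
        capture_output=True,
        text=True,
    )
    return classify_plan_exit_code(result.returncode)


def summarize(workdirs):
    """Return a {directory: status} report across many Terraform directories."""
    return {workdir: check_workdir(workdir) for workdir in workdirs}
```

Note that a non-empty plan does not by itself prove drift: it also appears when code has been merged but not yet applied. Separating the two cases, and doing so continuously across 50 repositories, is the kind of manual work the team describes having handed off to the platform.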
The Results
Within a year of using ControlMonkey to automate and govern its cloud infrastructure, the company achieved:
- Improved DevOps Efficiency: Terraform task lead times were cut by an average of 20%, freeing the team to focus on higher-value work and strategic innovations.
- Terraform Accessibility and Collaboration: ControlMonkey is used by 15 DevOps engineers daily and by more than 100 developers weekly. ControlMonkey is a cross-team infrastructure collaboration platform that makes Terraform more accessible to less proficient developers.
- Standardization Across Repos: ControlMonkey brought consistency to their CI/CD pipelines, reducing errors and misconfigurations, enhancing reliability, and ensuring compliance.
- Increased Scalability: The ControlMonkey platform enabled faster, more efficient scaling of cloud resources on AWS infrastructure.
- Cost Savings: Identifying unused VPCs and other zombie resources has already saved the company significant costs.
“The cost savings were so significant that we could cover the annual expense of the ControlMonkey platform, which I found remarkable.”
Ivan Carrion, Principal DevOps Engineer
Future Plans
ControlMonkey is helping the DevOps team to implement new Terraform CI/CD policies to restrict resource usage, enhance compliance, and cut costs.
The platform has allowed for the complete shutdown of console operations (ClickOps), enabling them to fully utilize Infrastructure as Code (IaC) and maximize Terraform’s potential.
Conclusion
ControlMonkey has been crucial in helping the DevOps team modernize its Terraform operations, improve visibility, and reduce costs. Having resolved significant bottlenecks and optimized its workflows with Terraform Automation, the team is better prepared to scale infrastructure and concentrate more on innovation.