Terraform Infrastructure as Code
Overview
Our infrastructure is managed using Terraform, providing declarative, version-controlled infrastructure deployment across Google Cloud Platform. All production resources are defined in code, ensuring reproducibility and consistency.
Repository Structure
infrastructure/
├── terraform/
│ ├── environments/
│ │ ├── production/
│ │ │ ├── main.tf
│ │ │ ├── variables.tf
│ │ │ ├── outputs.tf
│ │ │ └── terraform.tfvars
│ │ └── staging/
│ ├── modules/
│ │ ├── gke-cluster/
│ │ ├── tigerbeetle/
│ │ ├── cloudsql/
│ │ └── networking/
│ └── backend.tf
Core Modules
GKE Cluster Module
Manages the Google Kubernetes Engine cluster configuration:
- Node pools (TigerBeetle dedicated, default)
- Autoscaling policies
- Network configuration
- Workload Identity setup
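The sketch below shows how the GKE module might wire these pieces together. It is not the module's actual contents. The subnet name (`platform-subnet`), the secondary range names (`pods`, `services`) and the `dedicated=tigerbeetle` taint are hypothetical. Local SSD configuration for the `-lssd` machine type is omitted.

```hcl
# Sketch: VPC-native GKE cluster with Workload Identity and a dedicated TigerBeetle pool.
# Subnet name, secondary range names, and the taint are hypothetical.
resource "google_container_cluster" "primary" {
  name     = var.cluster_name
  location = var.region

  # Node pools are managed as separate resources, so drop the built-in default pool.
  remove_default_node_pool = true
  initial_node_count       = 1

  network    = "platform-network"
  subnetwork = "platform-subnet" # hypothetical subnet name

  # Pod and Service IPs come from the subnet's secondary ranges.
  ip_allocation_policy {
    cluster_secondary_range_name  = "pods"     # hypothetical
    services_secondary_range_name = "services" # hypothetical
  }

  # Enables Workload Identity for the cluster.
  workload_identity_config {
    workload_pool = "${var.project_id}.svc.id.goog"
  }
}

resource "google_container_node_pool" "tigerbeetle" {
  name     = "tigerbeetle-pool"
  cluster  = google_container_cluster.primary.id
  location = var.region

  # For a regional cluster node_count is per zone, so 1 gives one node in each zone.
  node_count = 1

  node_config {
    machine_type = "c3-standard-4-lssd"

    # Keep general workloads off the ledger nodes (hypothetical taint).
    taint {
      key    = "dedicated"
      value  = "tigerbeetle"
      effect = "NO_SCHEDULE"
    }

    # Expose the GKE metadata server so pods can use Workload Identity.
    workload_metadata_config {
      mode = "GKE_METADATA"
    }
  }
}
```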
TigerBeetle Module
Deploys and configures TigerBeetle financial ledger:
- StatefulSet with 3 replicas
- Local SSD storage provisioning
- Service and LoadBalancer configuration
- Monitoring integration
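The LoadBalancer Service is the resource that the `tigerbeetle_external_ip` output reads. A minimal sketch follows. The namespace, the `app = "tigerbeetle"` selector label and port 3000 are assumptions and must match whatever the StatefulSet actually uses.

```hcl
# Sketch of the TigerBeetle LoadBalancer Service.
# Namespace, selector labels, and port are assumptions.
resource "kubernetes_service" "tigerbeetle" {
  metadata {
    name      = "tigerbeetle"
    namespace = "tigerbeetle" # hypothetical namespace
  }

  spec {
    type = "LoadBalancer"

    # Must match the pod labels in the StatefulSet template.
    selector = {
      app = "tigerbeetle"
    }

    port {
      name        = "tigerbeetle"
      port        = 3000 # assumed client port
      target_port = 3000
    }
  }
}
```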
CloudSQL Module
Manages PostgreSQL instances for Temporal:
- Instance configuration (db-g1-small)
- Database creation
- Private IP allocation
- Backup policies
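A sketch of the instance and both Temporal databases, assuming PostgreSQL 15. The network reference `google_compute_network.platform` is hypothetical. Private IP also requires a private services access connection on the VPC, which is omitted here.

```hcl
# Sketch of the Temporal CloudSQL instance: private IP only, automated backups.
# POSTGRES_15 and the network reference are assumptions.
resource "google_sql_database_instance" "temporal" {
  name             = "temporal-production"
  database_version = "POSTGRES_15" # assumed version
  region           = var.region

  settings {
    tier = "db-g1-small"

    ip_configuration {
      ipv4_enabled    = false                              # no public IP
      private_network = google_compute_network.platform.id # hypothetical reference
    }

    backup_configuration {
      enabled = true
    }
  }

  deletion_protection = true
}

# The two databases Temporal uses, created on the same instance.
resource "google_sql_database" "temporal" {
  for_each = toset(["temporal", "temporal_visibility"])

  name     = each.value
  instance = google_sql_database_instance.temporal.name
}
```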
Networking Module
Configures VPC and network security:
- Custom VPC network
- Subnets with secondary ranges
- Cloud NAT for outbound connectivity
- Firewall rules
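The sketch below covers the VPC, a /20 subnet with GKE secondary ranges, and Cloud NAT; firewall rules are omitted. The subnet, range, router and NAT names and all CIDR values are assumptions for illustration.

```hcl
# Sketch: custom-mode VPC, subnet with secondary ranges, and Cloud NAT.
# Names other than platform-network and all CIDRs are hypothetical.
resource "google_compute_network" "platform" {
  name                    = "platform-network"
  auto_create_subnetworks = false # custom-mode VPC
}

resource "google_compute_subnetwork" "platform" {
  name          = "platform-subnet"
  region        = var.region
  network       = google_compute_network.platform.id
  ip_cidr_range = "10.0.0.0/20"

  # Secondary ranges used by GKE for pod and Service IPs.
  secondary_ip_range {
    range_name    = "pods"
    ip_cidr_range = "10.16.0.0/14"
  }
  secondary_ip_range {
    range_name    = "services"
    ip_cidr_range = "10.20.0.0/20"
  }
}

# Cloud NAT runs on a Cloud Router in the same region.
resource "google_compute_router" "platform" {
  name    = "platform-router"
  region  = var.region
  network = google_compute_network.platform.id
}

resource "google_compute_router_nat" "platform" {
  name                               = "platform-nat"
  router                             = google_compute_router.platform.name
  region                             = var.region
  nat_ip_allocate_option             = "AUTO_ONLY"
  source_subnetwork_ip_ranges_to_nat = "ALL_SUBNETWORKS_ALL_IP_RANGES"
}
```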
State Management
Remote State
terraform {
backend "gcs" {
bucket = "earna-terraform-state"
prefix = "production"
}
}
State Locking
- Uses GCS native locking
- Prevents concurrent modifications
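- Lock is released when the operation ends, including on most errors; a crashed or interrupted run can leave a stale lock that must be cleared with terraform force-unlock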
Deployment Workflow
Local Development
# Initialize Terraform
cd infrastructure/terraform/environments/production
terraform init
# Plan changes
terraform plan -out=tfplan
# Apply changes
terraform apply tfplan
CI/CD Pipeline
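# GitHub Actions workflow (steps excerpt)
- name: Terraform Plan
  working-directory: infrastructure/terraform/environments/production
  run: |
    terraform init
    terraform plan -out=tfplan
- name: Terraform Apply
  if: github.ref == 'refs/heads/main'
  working-directory: infrastructure/terraform/environments/production
  run: terraform apply -auto-approve tfplan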
Resource Tagging
All resources that support GCP labels carry a standard label set for cost tracking and management:
labels = {
environment = "production"
managed_by = "terraform"
team = "platform"
cost_center = "engineering"
}
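One way to apply this label set uniformly is the provider-level `default_labels` argument, available in `hashicorp/google` 5.0 and later. It adds the labels to every resource that supports labels. This is a sketch of the approach; the source does not say whether the modules already work this way.

```hcl
# Sketch: attach the standard labels at the provider level (google provider 5.0+),
# so individual resources need not repeat them.
provider "google" {
  project = var.project_id
  region  = var.region

  default_labels = {
    environment = "production"
    managed_by  = "terraform"
    team        = "platform"
    cost_center = "engineering"
  }
}
```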
Key Resources Managed
Compute Resources
- GKE cluster: platform-production
- Node pools: tigerbeetle-pool, default-pool
- Compute instances for TigerBeetle nodes
Storage Resources
- Local SSDs for TigerBeetle (375 GB per node)
- Persistent disks for stateful workloads
- GCS buckets for backups and artifacts
Networking Resources
- VPC: platform-network
- Subnets with /20 CIDR blocks
- Cloud NAT gateway
- External LoadBalancers
Database Resources
- CloudSQL PostgreSQL: temporal-production
- Databases: temporal, temporal_visibility
IAM & Security
- Service accounts for workload identity
- IAM bindings for GKE nodes
- Secret Manager for sensitive data
Terraform Variables
Required Variables
variable "project_id" {
description = "GCP Project ID"
type = string
}
variable "region" {
description = "GCP Region"
type = string
default = "us-central1"
}
variable "cluster_name" {
description = "GKE Cluster Name"
type = string
default = "platform-production"
}
Environment-Specific Values
# terraform.tfvars
project_id = "production-earna-ai"
region = "us-central1"
tigerbeetle_replicas = 3
tigerbeetle_storage_size = 375
node_pool_configs = {
tigerbeetle = {
machine_type = "c3-standard-4-lssd"
node_count = 3
}
default = {
machine_type = "e2-standard-4"
node_count = 2
}
}
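The tfvars above set variables that the Required Variables list does not declare. A sketch of matching declarations follows. The types are inferred from the values, not taken from the repository. The 1-to-6 replica bound reflects TigerBeetle's replica limit as an assumption and should be checked against the deployed version.

```hcl
# Sketch of declarations matching terraform.tfvars (types inferred, not from the repo).
variable "tigerbeetle_replicas" {
  description = "Number of TigerBeetle replicas"
  type        = number
  default     = 3

  validation {
    # Assumed TigerBeetle limit of 1 to 6 replicas per cluster.
    condition     = var.tigerbeetle_replicas >= 1 && var.tigerbeetle_replicas <= 6
    error_message = "tigerbeetle_replicas must be between 1 and 6."
  }
}

variable "tigerbeetle_storage_size" {
  description = "Local SSD capacity per TigerBeetle node, in GB"
  type        = number
  default     = 375
}

variable "node_pool_configs" {
  description = "Machine type and node count for each GKE node pool"
  type = map(object({
    machine_type = string
    node_count   = number
  }))

  validation {
    # Every pool must request at least one node.
    condition     = alltrue([for p in values(var.node_pool_configs) : p.node_count > 0])
    error_message = "Each node pool needs node_count > 0."
  }
}
```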
Outputs
Key outputs exported for use by other tools:
output "cluster_endpoint" {
value = google_container_cluster.primary.endpoint
}
output "tigerbeetle_external_ip" {
value = kubernetes_service.tigerbeetle.status[0].load_balancer[0].ingress[0].ip
}
output "grafana_url" {
value = "http://${kubernetes_service.grafana.status[0].load_balancer[0].ingress[0].ip}"
}
Best Practices
Version Pinning
terraform {
required_version = ">= 1.5.0"
required_providers {
google = {
source = "hashicorp/google"
version = "~> 5.0"
}
kubernetes = {
source = "hashicorp/kubernetes"
version = "~> 2.23"
}
}
}
Resource Naming Convention
resource "google_compute_instance" "example" {
name = "${var.environment}-${var.service}-${var.component}"
# Example: production-tigerbeetle-node-1
# Other required arguments (machine_type, boot_disk, network_interface) omitted
}
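A sketch of computing the name once in `locals` and checking it against GCE naming rules: lowercase letters, digits and hyphens, starting with a letter, at most 63 characters. The `environment`, `service` and `component` variables and their defaults are hypothetical.

```hcl
# Sketch: build the <environment>-<service>-<component> name once and validate it.
# Variable names and defaults are hypothetical.
variable "environment" {
  type    = string
  default = "production"
}

variable "service" {
  type    = string
  default = "tigerbeetle"
}

variable "component" {
  type    = string
  default = "node-1"
}

locals {
  resource_name = "${var.environment}-${var.service}-${var.component}"
}

output "resource_name" {
  # "production-tigerbeetle-node-1" with the defaults above
  value = local.resource_name

  precondition {
    condition     = can(regex("^[a-z]([-a-z0-9]{0,61}[a-z0-9])?$", local.resource_name))
    error_message = "Names must be lowercase letters, digits, and hyphens (max 63 characters)."
  }
}
```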
Cost Optimization
- Use Spot (preemptible) nodes for non-critical workloads
- Implement auto-scaling policies
- Regular review of unused resources
- Committed use discounts for stable workloads
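The first two points can be combined in a node pool that uses Spot VMs, the successor to preemptible VMs, with autoscaling. In this sketch the pool name, machine type, autoscaling bounds and `workload-class=spot` taint are all hypothetical.

```hcl
# Sketch: autoscaled Spot VM node pool for non-critical workloads.
# Pool name, machine type, bounds, and taint are hypothetical.
resource "google_container_node_pool" "batch_spot" {
  name     = "batch-spot-pool"
  cluster  = google_container_cluster.primary.id
  location = var.region

  # Bounds apply per zone in a regional cluster; 0 lets the pool scale to zero.
  autoscaling {
    min_node_count = 0
    max_node_count = 3
  }

  node_config {
    machine_type = "e2-standard-4"
    spot         = true # nodes can be reclaimed by GCP at any time

    # Only pods that tolerate this taint are scheduled onto Spot nodes.
    taint {
      key    = "workload-class"
      value  = "spot"
      effect = "NO_SCHEDULE"
    }
  }
}
```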
Disaster Recovery
Backup Strategy
- Terraform state backed up to GCS with versioning
- Infrastructure can be recreated from code
- Data backups handled separately
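A sketch of a versioned state bucket follows. A configuration cannot create its own backend bucket, so a bucket like this usually lives in a separate bootstrap configuration. The location and retention rule are assumptions.

```hcl
# Sketch: state bucket with object versioning so earlier state revisions can be restored.
# Location and lifecycle rule are assumptions.
resource "google_storage_bucket" "terraform_state" {
  name     = "earna-terraform-state"
  location = "US"

  uniform_bucket_level_access = true
  public_access_prevention    = "enforced"

  versioning {
    enabled = true
  }

  # Delete a noncurrent version once 10 newer versions of the object exist.
  lifecycle_rule {
    condition {
      num_newer_versions = 10
      with_state         = "ARCHIVED"
    }
    action {
      type = "Delete"
    }
  }
}
```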
Recovery Procedures
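# Fetch the backup copy of the state
gsutil cp gs://earna-terraform-state-backup/latest.tfstate .
# Re-initialize against the GCS backend
terraform init -reconfigure
# Upload the backup as the current remote state
# (add -force if the backup's serial is older than the remote state; use with caution)
terraform state push latest.tfstate
# Reconcile state with real infrastructure, then review pending changes
terraform apply -refresh-only
terraform plan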
Security Considerations
Secrets Management
- Never commit secrets to repository
- Use Secret Manager for sensitive values
- Reference secrets in Terraform:
data "google_secret_manager_secret_version" "api_key" {
secret = "api-key"
}
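The secret value is exposed as the `secret_data` attribute, which the provider marks sensitive. The sketch below passes it into a Kubernetes Secret; the Kubernetes Secret name and namespace are hypothetical. Values read this way are still written to the Terraform state in plaintext, so access to the state bucket must be restricted.

```hcl
# Sketch: pass the secret read above into a Kubernetes Secret.
# Uses the api_key data source declared above; name and namespace are hypothetical.
resource "kubernetes_secret" "api_key" {
  metadata {
    name      = "api-key"
    namespace = "default"
  }

  # The provider base64-encodes these values when creating the Secret.
  data = {
    API_KEY = data.google_secret_manager_secret_version.api_key.secret_data
  }
}
```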
Access Control
- Terraform service account with minimal permissions
- Separate accounts for different environments
- Audit logging enabled for all changes
Monitoring & Alerts
Terraform Cloud Integration
- Remote execution for production changes
- Policy as code with Sentinel
- Cost estimation before apply
Change Notifications
- Slack notifications for apply operations
- Email alerts for failed runs
- Audit trail in Cloud Logging
Common Operations
Scaling Node Pools
# Update node_count for the pool in node_pool_configs (terraform.tfvars), then:
terraform plan -out=tfplan
terraform apply tfplan
Updating TigerBeetle
# Update version
terraform apply -var="tigerbeetle_version=0.15.6"
Adding New Resources
- Create a module in modules/
- Reference it in the environment's main.tf
- Run terraform plan to preview
- Apply after review
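A sketch of how a new module might be referenced from `environments/production/main.tf`. The module name `new_service`, its path and its inputs are hypothetical.

```hcl
# Sketch: reference a new module from environments/production/main.tf.
# Module name, path, and inputs are hypothetical.
module "new_service" {
  source = "../../modules/new-service"

  project_id = var.project_id
  region     = var.region
}
```

After adding a new module block, run terraform init again so Terraform installs the module before planning.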
Troubleshooting
State Lock Issues
# Force unlock (use with caution)
terraform force-unlock <lock-id>
Import Existing Resources
# Import GKE cluster
terraform import google_container_cluster.primary projects/production-earna-ai/locations/us-central1/clusters/platform-production
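With Terraform 1.5 or later (the minimum version pinned above), the same import can be declared in configuration instead. It is then previewed by terraform plan and performed by terraform apply. The `google_container_cluster.primary` resource block must already exist, and the import block can be removed after a successful apply.

```hcl
# Declarative import (Terraform 1.5+), reviewed in plan and executed on apply.
import {
  to = google_container_cluster.primary
  id = "projects/production-earna-ai/locations/us-central1/clusters/platform-production"
}
```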
Debugging
# Enable debug logging
export TF_LOG=DEBUG
terraform plan
Future Improvements
- Implement Terragrunt for DRY configurations
- Add automated testing with Terratest
- Implement cost policies with Infracost
- Multi-region disaster recovery setup