I've been managing infrastructure for three teams across staging and production, and I've learned the hard way that Terraform state is either your best friend or your worst enemy depending on how you handle it.
I started with local state files committed to Git. This lasted about two weeks before someone accidentally pushed credentials and we had a production incident. Then I moved everything to S3 with state locking via DynamoDB, which is what I'd recommend now.
The difference is night and day. With remote state in S3, I can: share one source of truth across all three teams, rely on DynamoDB locking to stop concurrent writes, and keep the state file encrypted at rest instead of sitting in Git.
Here's my actual setup:
terraform {
  backend "s3" {
    bucket         = "my-org-terraform-state"
    key            = "prod/terraform.tfstate"
    region         = "us-east-1"
    encrypt        = true
    dynamodb_table = "terraform-locks"
  }
}
I also split state by environment (dev/staging/prod) and by domain (networking/databases/services). This prevents one person's VPC experiment from blocking production database changes.
The only gotcha I hit was IAM permissions. Teams needed to access their state files without accessing everything. I use resource-based policies to lock access by prefix patterns.
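A prefix-scoped bucket policy can look something like this. This is a sketch, not my exact policy: the bucket name matches the backend config in the post, but the account ID, role name, and prod/ prefix are illustrative.

```hcl
# Sketch: let one team's role touch only its own state prefix.
# The principal ARN and "prod/*" prefix are hypothetical examples.
resource "aws_s3_bucket_policy" "state_access" {
  bucket = "my-org-terraform-state"
  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Sid       = "ProdTeamPrefixOnly"
      Effect    = "Allow"
      Principal = { AWS = "arn:aws:iam::123456789012:role/prod-team" }
      Action    = ["s3:GetObject", "s3:PutObject"]
      Resource  = "arn:aws:s3:::my-org-terraform-state/prod/*"
    }]
  })
}
```

Teams also need s3:ListBucket on the bucket itself and read/write access to the DynamoDB lock table; scope those the same way.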
Local state still has a place for personal experimentation, but for anything shared or production-bound, remote state with locking is non-negotiable. The cost of S3 and DynamoDB is negligible compared to the headaches it prevents.
S3 + DynamoDB is the right move, but honestly the real win is splitting state by environment and ownership. One monolithic state file across three teams is a recipe for conflicts and accidental rollbacks.
What actually saved us: separate Terraform workspaces per team, remote state in S3, and strict IAM policies so teams can't touch each other's infrastructure. Added a pre-commit hook to catch credential leaks before they hit Git.
The credentials thing never stops being a problem though. I've seen it happen even with remote state. Consider using something like Vault or AWS Secrets Manager for anything sensitive, not Terraform variables.
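Pulling secrets at plan time instead of passing them in as variables can look like this. A sketch assuming a secret already exists in AWS Secrets Manager; the secret name and the resource it feeds are illustrative.

```hcl
# Read the secret at plan/apply time rather than committing it
# as a Terraform variable. "db/prod/password" is a hypothetical name.
data "aws_secretsmanager_secret_version" "db" {
  secret_id = "db/prod/password"
}

resource "aws_db_instance" "main" {
  # ... other arguments ...
  password = data.aws_secretsmanager_secret_version.db.secret_string
}
```

One caveat: the resolved value still lands in the state file, which is yet another reason to encrypt state and lock down who can read it.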
S3 + DynamoDB is the right call, yeah. But honestly, the real win is splitting state by environment and team boundary, not just throwing everything behind locking. We run separate state files per service per environment. Makes rollbacks way less scary and keeps blast radius small.
One thing I'd add: lock timeout tuning matters more than people think. Set it too high and a crashed CI job blocks everyone for hours. We use 30s with exponential backoff. Also, encrypt that state file at rest. S3 encryption is free.
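The timeout is just a CLI flag, so it's easy to set in CI. The backoff loop is our own wrapper script, not a Terraform feature:

```shell
# Wait up to 30s for the DynamoDB lock before giving up;
# our CI retries the command with increasing delays.
terraform apply -lock-timeout=30s
```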
S3 + DynamoDB is solid, but honestly I'd push harder on the team process side. We had the same setup and still had someone manually run terraform apply from their laptop because they "just needed to fix one thing quickly."
Remote state only solves half the problem. You need policy: one person deploys to prod, state locking actually enforced (not just configured), maybe a plan approval step. We started running terraform plan in CI and posting diffs to Slack before anyone touched prod. Sounds heavy but caught mistakes constantly.
The credentials thing though - use AWS IAM roles instead of storing keys anywhere. Game changer for peace of mind.
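For anyone who hasn't set this up: the provider can assume a role directly, so no access keys ever live on a laptop or in CI config. The account ID and role name here are illustrative.

```hcl
# Assume a deploy role instead of distributing long-lived keys.
# Credentials come from the ambient session (SSO, instance profile, OIDC).
provider "aws" {
  region = "us-east-1"
  assume_role {
    role_arn     = "arn:aws:iam::123456789012:role/terraform-deployer"
    session_name = "terraform"
  }
}
```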
S3 with DynamoDB locking is solid, but I'd add a few operational things I've learned:
Use separate state files per environment and per service. One monolithic state file means one person locks everyone out. I structure mine like terraform/services/{service-name}/{environment}/.
Enable versioning and MFA delete on your S3 bucket. Saves you when someone runs terraform destroy in the wrong workspace.
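Versioning is a one-resource change in Terraform. MFA delete is the exception: it has to be enabled by the root account through the S3 API, so it isn't shown here. The bucket name matches the config below.

```hcl
# Versioning on the state bucket lets you roll back a bad write
# or an accidental destroy of the state object itself.
resource "aws_s3_bucket_versioning" "state" {
  bucket = "my-tfstate"
  versioning_configuration {
    status = "Enabled"
  }
}
```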
Also enforce read-only access for most team members. Use IAM roles so people can only plan, not apply. Approvals go through CI/CD—I use GitHub Actions to run terraform plan, then require manual approval before the workflow runs apply.
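The approval gate is roughly this shape. A sketch, not my exact workflow: the "production" environment is configured in GitHub with required reviewers, which is what pauses the apply job; a real setup would also pass the saved plan between jobs as an artifact.

```yaml
# Plan on every push; apply only after a human approves the
# "production" environment gate. Names and steps are illustrative.
name: terraform
on: [push]
jobs:
  plan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
      - run: terraform init
      - run: terraform plan -out=tfplan
  apply:
    needs: plan
    runs-on: ubuntu-latest
    environment: production   # required reviewers = manual approval
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
      - run: terraform init
      - run: terraform apply -auto-approve
```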
terraform {
  backend "s3" {
    bucket         = "my-tfstate"
    key            = "services/api/prod/terraform.tfstate"
    dynamodb_table = "terraform-locks"
  }
}
The real win is making it so developers can't accidentally blow up production from their laptop.
I'd add one critical piece: separate state files by environment and team ownership. I've seen teams try to manage everything in one state, and scaling becomes a nightmare—one typo risks the entire infrastructure.
What worked for me: one state per environment (staging/prod), organized by logical component. Use terraform_remote_state data sources for cross-stack references. This way teams own their boundaries clearly.
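For anyone new to cross-stack references, the data source looks like this. The bucket, key, and output name are illustrative: a services stack reading an output published by a networking stack.

```hcl
# Read another stack's outputs instead of sharing its state file.
data "terraform_remote_state" "network" {
  backend = "s3"
  config = {
    bucket = "my-org-terraform-state"
    key    = "prod/networking/terraform.tfstate"
    region = "us-east-1"
  }
}

resource "aws_instance" "api" {
  # ... other arguments ...
  subnet_id = data.terraform_remote_state.network.outputs.private_subnet_id
}
```

This keeps the consumer read-only: teams can depend on each other's outputs without being able to mutate each other's state.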
Also enforce state locking religiously. I've watched people disable it "just this once" during deploys. Never again. DynamoDB locking has saved us from concurrent modifications more than once.
The real win: pair this with clear RBAC on S3 and DynamoDB. State contains secrets and sensitive outputs—treat it like production data.
S3 + DynamoDB is solid. One thing I'd add though: encrypt that S3 bucket and enable versioning. Saw a team lose state to a bad terraform apply once because they skipped versioning. Also lock down IAM aggressively. I've found state files become a privilege escalation vector if you're not careful.
One gotcha: DynamoDB locking can fail silently under network partitions. We use it but monitor for stuck locks. Otherwise you get devs force-unlocking and clobbering each other's changes. Been there.
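When a lock genuinely is stuck, the fix is a one-liner, but only after you've confirmed the process holding it is actually dead. Terraform prints the lock ID in the error message when it fails to acquire the lock:

```shell
# Release a stale lock. Never run this while an apply is in flight;
# that's exactly the clobbering scenario the lock exists to prevent.
terraform force-unlock <LOCK_ID>
```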
S3 + DynamoDB is the baseline. Good call moving off local state.
What actually saved us though was splitting state by environment and team ownership. One big monolithic state file becomes a nightmare when three teams need to deploy simultaneously. We went repo-per-team with their own backend config pointing to separate S3 buckets. Eliminates lock contention and makes blast radius predictable.
Also got tired of people manually running terraform destroy in prod. Now everything flows through CI/CD. GitHub Actions plan output gets reviewed, approve button triggers apply. Single source of truth, audit trail, no late night surprises.
Tom Lindgren
Senior dev. PostgreSQL and data engineering.
S3 + DynamoDB is solid, but I'd push back on one thing: you still need to think hard about state organization. I've seen teams put everything in one bucket and it becomes a nightmare when you need to audit or rotate credentials.
What actually works: separate state per environment per service, with IAM policies tight enough that teams can't read production state they shouldn't. And terraform_remote_state data sources are your friend for cross-stack references, not shared state files.
On the credentials thing you mentioned: remember the state file itself is sensitive. Terraform's S3 backend doesn't request server-side encryption unless you set encrypt = true in the backend block. Enable it, always.
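Belt and suspenders: set default encryption on the bucket itself too, so every state write is encrypted even if someone forgets the backend flag. Bucket name matches the one from the original post; this sketch uses KMS, but SSE-S3 works as well.

```hcl
# Bucket-level default encryption as a backstop for encrypt = true.
resource "aws_s3_bucket_server_side_encryption_configuration" "state" {
  bucket = "my-org-terraform-state"
  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm = "aws:kms"
    }
  }
}
```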