Building a Production VPC with Terraform: My Standard Setup
The VPC architecture I deploy for every new AWS project - multi-AZ, public/private subnets, NAT gateways, and cost optimizations.
Every AWS project starts with a VPC. Over time, I've developed a standard architecture that handles most production workloads: multi-AZ for high availability, properly segmented for security, with cost optimizations I've learned through experience.
Here's what I deploy and why.
The Architecture
┌────────────────────────────────────────────────┐
│               VPC (10.0.0.0/16)                │
│  ┌─────────────────┐    ┌─────────────────┐    │
│  │  AZ us-east-1a  │    │  AZ us-east-1b  │    │
│  │  ┌───────────┐  │    │  ┌───────────┐  │    │
│  │  │  Public   │  │    │  │  Public   │  │    │
│  │  │ 10.0.1.0  │  │    │  │ 10.0.2.0  │  │    │
│  │  └───────────┘  │    │  └───────────┘  │    │
│  │  ┌───────────┐  │    │  ┌───────────┐  │    │
│  │  │  Private  │  │    │  │  Private  │  │    │
│  │  │ 10.0.11.0 │  │    │  │ 10.0.12.0 │  │    │
│  │  └───────────┘  │    │  └───────────┘  │    │
│  │  ┌───────────┐  │    │  ┌───────────┐  │    │
│  │  │ Database  │  │    │  │ Database  │  │    │
│  │  │ 10.0.21.0 │  │    │  │ 10.0.22.0 │  │    │
│  │  └───────────┘  │    │  └───────────┘  │    │
│  └─────────────────┘    └─────────────────┘    │
└────────────────────────────────────────────────┘
Three tiers across two availability zones:
- Public subnets - Load balancers, NAT gateways, bastion hosts
- Private subnets - Application servers, containers
- Database subnets - RDS, ElastiCache, no internet access
Core VPC Setup
resource "aws_vpc" "main" {
cidr_block = "10.0.0.0/16"
enable_dns_hostnames = true
enable_dns_support = true
tags = {
Name = "${var.project}-${var.environment}-vpc"
Environment = var.environment
}
}
resource "aws_internet_gateway" "main" {
vpc_id = aws_vpc.main.id
}
I use /16 for the VPC CIDR. It provides 65,536 addresses - more than enough room to grow without re-architecting.
Subnet Strategy
I calculate CIDRs dynamically to avoid manual errors:
locals {
  azs = ["us-east-1a", "us-east-1b"]

  public_cidrs   = [for i, az in local.azs : cidrsubnet("10.0.0.0/16", 8, i + 1)]
  private_cidrs  = [for i, az in local.azs : cidrsubnet("10.0.0.0/16", 8, i + 11)]
  database_cidrs = [for i, az in local.azs : cidrsubnet("10.0.0.0/16", 8, i + 21)]
}
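To sanity-check the math: cidrsubnet adds 8 bits to the /16, producing /24s, and the third argument is the network number. You can verify the expressions in terraform console:

> cidrsubnet("10.0.0.0/16", 8, 1)
"10.0.1.0/24"
> cidrsubnet("10.0.0.0/16", 8, 11)
"10.0.11.0/24"
> cidrsubnet("10.0.0.0/16", 8, 21)
"10.0.21.0/24"

The 1/11/21 offsets keep each tier in its own visually recognizable block, matching the diagram above.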
resource "aws_subnet" "public" {
count = length(local.azs)
vpc_id = aws_vpc.main.id
cidr_block = local.public_cidrs[count.index]
availability_zone = local.azs[count.index]
map_public_ip_on_launch = true
tags = {
Name = "${var.project}-public-${local.azs[count.index]}"
Tier = "public"
}
}
resource "aws_subnet" "private" {
count = length(local.azs)
vpc_id = aws_vpc.main.id
cidr_block = local.private_cidrs[count.index]
availability_zone = local.azs[count.index]
tags = {
Name = "${var.project}-private-${local.azs[count.index]}"
Tier = "private"
}
}
resource "aws_subnet" "database" {
count = length(local.azs)
vpc_id = aws_vpc.main.id
cidr_block = local.database_cidrs[count.index]
availability_zone = local.azs[count.index]
tags = {
Name = "${var.project}-database-${local.azs[count.index]}"
Tier = "database"
}
}
NAT Gateway Configuration
NAT gateways give instances in private subnets outbound internet access without exposing them to inbound connections. They're also expensive: roughly $32/month each, plus per-GB data processing fees.
resource "aws_eip" "nat" {
count = var.single_nat_gateway ? 1 : length(local.azs)
domain = "vpc"
}
resource "aws_nat_gateway" "main" {
count = var.single_nat_gateway ? 1 : length(local.azs)
allocation_id = aws_eip.nat[count.index].id
subnet_id = aws_subnet.public[count.index].id
depends_on = [aws_internet_gateway.main]
}
My approach: one NAT gateway per AZ in production for high availability, a single shared NAT in dev/staging to save costs. The single_nat_gateway variable controls this.
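I haven't shown that variable's declaration; a minimal version, defaulting to the safer per-AZ behavior, would look like this:

variable "single_nat_gateway" {
  description = "Provision one shared NAT gateway instead of one per AZ (non-prod cost saving)"
  type        = bool
  default     = false
}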
Routing
Public subnets route to the internet gateway:
resource "aws_route_table" "public" {
vpc_id = aws_vpc.main.id
route {
cidr_block = "0.0.0.0/0"
gateway_id = aws_internet_gateway.main.id
}
}
Private subnets route through NAT:
resource "aws_route_table" "private" {
count = length(local.azs)
vpc_id = aws_vpc.main.id
route {
cidr_block = "0.0.0.0/0"
nat_gateway_id = var.single_nat_gateway ? aws_nat_gateway.main[0].id : aws_nat_gateway.main[count.index].id
}
}
Database subnets get no internet route - they should never reach out directly.
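One piece the snippets above leave implicit: subnets have to be associated with their route tables, or they silently fall back to the VPC's main route table. A sketch covering all three tiers, with the database tier on a deliberately route-less table to make the no-internet intent explicit:

resource "aws_route_table_association" "public" {
  count          = length(local.azs)
  subnet_id      = aws_subnet.public[count.index].id
  route_table_id = aws_route_table.public.id
}

resource "aws_route_table_association" "private" {
  count          = length(local.azs)
  subnet_id      = aws_subnet.private[count.index].id
  route_table_id = aws_route_table.private[count.index].id
}

# Only the implicit local route - no path out of the VPC
resource "aws_route_table" "database" {
  vpc_id = aws_vpc.main.id
}

resource "aws_route_table_association" "database" {
  count          = length(local.azs)
  subnet_id      = aws_subnet.database[count.index].id
  route_table_id = aws_route_table.database.id
}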
VPC Endpoints for Cost Savings
This one is often overlooked: traffic from private subnets to S3 and DynamoDB that flows through a NAT gateway racks up per-GB data processing charges. Gateway endpoints are free and keep that traffic on the AWS network.
resource "aws_vpc_endpoint" "s3" {
vpc_id = aws_vpc.main.id
service_name = "com.amazonaws.${var.region}.s3"
route_table_ids = concat(
[aws_route_table.public.id],
aws_route_table.private[*].id
)
}
resource "aws_vpc_endpoint" "dynamodb" {
vpc_id = aws_vpc.main.id
service_name = "com.amazonaws.${var.region}.dynamodb"
route_table_ids = aws_route_table.private[*].id
}
I've seen accounts save hundreds per month just by adding these endpoints.
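Interface endpoints are a different story: they place an ENI in each subnet you choose and bill per AZ-hour, so I only create the ones a workload actually calls. A sketch for SSM, assuming an endpoints security group (hypothetical here) that allows HTTPS from inside the VPC:

resource "aws_vpc_endpoint" "ssm" {
  vpc_id              = aws_vpc.main.id
  service_name        = "com.amazonaws.${var.region}.ssm"
  vpc_endpoint_type   = "Interface"
  subnet_ids          = aws_subnet.private[*].id
  private_dns_enabled = true

  # Hypothetical SG permitting 443 from the VPC CIDR
  security_group_ids = [aws_security_group.endpoints.id]
}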
Security Group Baseline
I lock down the default security group to prevent accidental use:
resource "aws_default_security_group" "default" {
vpc_id = aws_vpc.main.id
# No rules - blocks all traffic
}
Then create explicit security groups for each tier with minimal required access.
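As an illustration of that pattern for the private tier (the ALB security group and port here are placeholders, not part of this module):

resource "aws_security_group" "app" {
  name_prefix = "${var.project}-app-"
  vpc_id      = aws_vpc.main.id

  ingress {
    description     = "App traffic from the load balancer only"
    from_port       = 8080 # placeholder application port
    to_port         = 8080
    protocol        = "tcp"
    security_groups = [aws_security_group.alb.id] # placeholder ALB security group
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}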
What I Output
output "vpc_id" {
value = aws_vpc.main.id
}
output "public_subnet_ids" {
value = aws_subnet.public[*].id
}
output "private_subnet_ids" {
value = aws_subnet.private[*].id
}
output "database_subnet_ids" {
value = aws_subnet.database[*].id
}
These outputs feed into other modules - EKS clusters, RDS instances, load balancers.
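To illustrate (the module path and values here are made up), a root configuration might consume them like this:

module "vpc" {
  source             = "./modules/vpc" # hypothetical path
  project            = "myapp"
  environment        = "prod"
  region             = "us-east-1"
  single_nat_gateway = false
}

resource "aws_db_subnet_group" "main" {
  name       = "myapp-prod"
  subnet_ids = module.vpc.database_subnet_ids
}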
Cost Optimization
| Component | Monthly Cost | How I Optimize |
|---|---|---|
| NAT Gateway | ~$32 each + data processing | Single NAT for non-prod |
| Elastic IPs | ~$3.65 if unattached | Release when no longer needed |
| Interface Endpoints | ~$7.30 per AZ | Only enable what's needed |
| Gateway Endpoints | Free | Always enable S3 + DynamoDB |
For dev environments, I sometimes skip NAT entirely and use SSM Session Manager for access. One caveat: without internet egress, instances need interface endpoints for ssm, ssmmessages, and ec2messages to reach the service. That trio still comes in under the cost of a NAT gateway, with no open inbound ports.
Lessons Learned
Always deploy across multiple AZs. I've seen single-AZ deployments go down during AWS outages. The extra NAT gateway cost is insurance.
Database subnets should never have internet routes. If your database needs to reach the internet, something is wrong with your architecture.
Enable VPC Flow Logs for production. When security incidents happen, you'll want the network traffic logs. Store them in S3 with lifecycle policies to manage costs.
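A minimal sketch of that, assuming an existing aws_s3_bucket.flow_logs resource with lifecycle rules attached:

resource "aws_flow_log" "main" {
  vpc_id               = aws_vpc.main.id
  traffic_type         = "ALL"
  log_destination_type = "s3"
  log_destination      = aws_s3_bucket.flow_logs.arn # assumed bucket resource
}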
Tag everything. Proper tagging enables cost allocation and makes troubleshooting easier. I tag with Project, Environment, and Tier at minimum.
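One way to get those baseline tags everywhere without repeating them on every resource is the AWS provider's default_tags block; per-resource tags like Tier still merge on top:

provider "aws" {
  region = var.region

  default_tags {
    tags = {
      Project     = var.project
      Environment = var.environment
    }
  }
}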
Key Takeaways
- Three-tier architecture - Public, private, database subnets provide proper isolation
- Multi-AZ is essential - Single AZ is a single point of failure
- NAT gateways are expensive - Use single NAT for non-prod, VPC endpoints where possible
- Gateway endpoints are free - Always enable S3 and DynamoDB endpoints
- Lock down defaults - Restrict the default security group
- Plan for growth - Use /16 CIDR to avoid future re-architecture
Written by Bar Tsveker
Senior CloudOps Engineer specializing in AWS, Terraform, and infrastructure automation.
Thanks for reading! Have questions or feedback?