ferkakta.dev

90 AWS resources in 5 minutes — automating multi-tenant SaaS tenant lifecycle

I recorded our entire tenant lifecycle — create, test, destroy — with no edits. Here’s what 5 minutes of infrastructure automation looks like when there are no tickets, no handoffs, and no “can someone set up the database.”

What happens on tenant create

One GitHub Actions workflow backed by Terraform + a Kubernetes operator:

  1. Validates the tenant name, resolves container images from the latest release branch
  2. Provisions ACM wildcard cert + Route53 DNS records
  3. Creates a Tenant custom resource → operator provisions PostgreSQL databases on shared RDS, seeds credentials to SSM
  4. Terraform deploys ExternalSecrets, Deployments, Ingress — 3 services per tenant
  5. SSM parameters auto-seeded: Redis credentials, auth URLs, signing keys — ~40 config values per tenant
  6. Zero static credentials anywhere — IRSA for everything, secrets injected at runtime from SSM via External Secrets Operator

About 5 minutes from nothing to 90 AWS resources and running pods.
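
Step 1 is the cheapest place to fail. Here is a rough sketch of what a tenant-name check could look like, assuming the name ends up as a DNS label and a Kubernetes namespace, so RFC 1123 label rules apply. The function name and the reserved-name list are hypothetical, and the real workflow may enforce different rules.

```python
import re

# RFC 1123 DNS label: lowercase alphanumerics and hyphens, starting and ending
# with an alphanumeric, at most 63 characters. Kubernetes namespaces follow
# the same rule.
_LABEL_RE = re.compile(r"[a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?")

# Hypothetical reserved names; the real workflow may reserve different ones.
RESERVED = {"default", "kube-system", "www", "api"}


def validate_tenant_name(name: str) -> list[str]:
    """Return a list of problems; an empty list means the name is usable."""
    problems = []
    # fullmatch (not match) so a trailing newline or suffix cannot slip through
    if not _LABEL_RE.fullmatch(name):
        problems.append("must be a lowercase DNS label (a-z, 0-9, '-'), 1-63 chars")
    if name in RESERVED:
        problems.append(f"'{name}' is reserved")
    return problems
```

Returning a list of problems instead of raising on the first one lets the workflow print everything wrong with a name in a single run.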

The 12-layer test suite

tenant test validates every layer of the stack, including:

  1. IAM role assumption — IRSA works, pod can assume its tenant-scoped role
  2. Bedrock API — live Claude 3 Haiku call with token cost calculation (down to the fraction of a cent)
  3. S3 Vectors — tenant bucket access works
  4. CloudWatch log routing — logs arrive in the right log group
  5. Log isolation — proves the tenant CANNOT read other tenants’ logs
  6. RDS connectivity — both databases (apiserver + tenant-auth-service) accept connections
  7. HTTP health checks — all three services return 200
  8. Auth handler — shared service is reachable from tenant namespace

Log isolation

The log isolation test is my favorite. It’s not enough to prove your tenant can access its own logs — you have to prove it can’t access anyone else’s. The test attempts to describe log streams in another tenant’s log group and asserts the call is denied. That’s the test that makes auditors smile.
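
A minimal sketch of that negative test, assuming a boto3-style CloudWatch Logs client whose describe_log_streams(logGroupName=...) call raises an error carrying response["Error"]["Code"] when IAM denies it. To keep the sketch runnable without AWS, FakeLogsClient and FakeClientError are stand-ins, and the function name is hypothetical.

```python
class FakeClientError(Exception):
    """Stand-in for botocore.exceptions.ClientError (same .response shape)."""

    def __init__(self, code: str):
        super().__init__(code)
        self.response = {"Error": {"Code": code}}


class FakeLogsClient:
    """Stand-in for a CloudWatch Logs client scoped to one tenant's log group."""

    def __init__(self, allowed_group: str):
        self.allowed_group = allowed_group

    def describe_log_streams(self, logGroupName: str) -> dict:
        if logGroupName != self.allowed_group:
            raise FakeClientError("AccessDeniedException")
        return {"logStreams": []}


def cross_tenant_read_is_denied(logs_client, other_group: str) -> bool:
    """True only if reading another tenant's log group is explicitly denied."""
    try:
        logs_client.describe_log_streams(logGroupName=other_group)
    except Exception as exc:
        code = getattr(exc, "response", {}).get("Error", {}).get("Code")
        if code == "AccessDeniedException":
            return True
        raise  # any other error is inconclusive, so surface it
    return False  # the call succeeded: isolation is broken
```

The detail that matters is the inversion: a successful call is the failure case, and only an explicit denial counts as a pass. A timeout or throttling error proves nothing, so it is re-raised instead of being treated as "denied."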

Token cost tracking

After I chased surprise Bedrock bills in another AWS account with no attribution, every tenant test now prints exactly what the AI call cost. Input tokens, output tokens, price per million, total. No more mystery invoices:

Bedrock response: 127 input tokens ($0.000032), 45 output tokens ($0.000056)
Total cost: $0.000088
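
The arithmetic behind that output is simple. The sketch below reproduces the figures above, assuming prices of $0.25 per million input tokens and $1.25 per million output tokens. Those are the rates the sample numbers imply; check current Bedrock pricing for your model and region. The function name is hypothetical.

```python
from decimal import Decimal

# Assumed USD prices per million tokens, inferred from the sample output above.
INPUT_PRICE_PER_M = Decimal("0.25")
OUTPUT_PRICE_PER_M = Decimal("1.25")
MILLION = Decimal(1_000_000)


def bedrock_cost_report(input_tokens: int, output_tokens: int) -> str:
    """Format the per-call cost in the same shape as the test output."""
    input_cost = Decimal(input_tokens) * INPUT_PRICE_PER_M / MILLION
    output_cost = Decimal(output_tokens) * OUTPUT_PRICE_PER_M / MILLION
    total = input_cost + output_cost  # summed before rounding for display
    return (
        f"Bedrock response: {input_tokens} input tokens (${input_cost:.6f}), "
        f"{output_tokens} output tokens (${output_cost:.6f})\n"
        f"Total cost: ${total:.6f}"
    )


if __name__ == "__main__":
    print(bedrock_cost_report(127, 45))
```

Decimal is used instead of float so that amounts this small are computed exactly and only rounded when printed.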

Tenant destroy

tenant destroy tears it all down cleanly: deleting the Tenant custom resource triggers operator database cleanup (drops PostgreSQL databases and users), and Terraform destroys the remaining 87 resources. A clean terraform plan after destroy confirms nothing is left behind.
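
One way to turn "clean plan" into a pass/fail signal, sketched here as an assumption about how such a check could be wired up: terraform plan -detailed-exitcode exits 0 when there are no changes, 2 when changes are pending, and 1 on error. The function names and the workdir argument are hypothetical.

```python
import subprocess


def interpret_plan_exit_code(code: int) -> str:
    """Map `terraform plan -detailed-exitcode` exit codes to a verdict."""
    if code == 0:
        return "clean"  # plan succeeded, no changes
    if code == 2:
        return "drift"  # plan succeeded, changes pending
    return "error"  # 1 (or anything else): the plan itself failed


def plan_verdict(workdir: str) -> str:
    """Run the plan in workdir (a hypothetical path) and classify the result."""
    proc = subprocess.run(
        ["terraform", "plan", "-detailed-exitcode", "-input=false", "-no-color"],
        cwd=workdir,
        capture_output=True,
        text=True,
    )
    return interpret_plan_exit_code(proc.returncode)
```

Treating "error" separately from "drift" matters here: a plan that failed to run says nothing about whether resources were left behind.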

Why this matters

Multi-tenant SaaS provisioning is the kind of thing that starts with a wiki page (“Step 1: create the database, Step 2: add the DNS record…”) and eventually becomes a one-click workflow. The distance between those two states is enormous, but the end result should look boring. Boring is the point.

#aws #eks #terraform #multi-tenant