ferkakta.dev

The Over-Mighty Subject: why your site repos have too much power

Josh Marshall borrows a phrase from medieval history to describe a modern political problem: the Over-Mighty Subject. A feudal lord whose personal wealth, private army, and territorial control grew so large that he rivaled the crown itself. Not a rebel — still nominally a subject — but operating with enough independent power that the sovereign’s authority became theoretical.

I had three of them in my infrastructure: Terraform roots for static sites.

How a site repo becomes a feudal lord

I manage a dozen domains across several site repos. Each site gets a Terraform root that provisions its S3 bucket, CloudFront distribution, ACM certificate, and DNS records. The DNS lives in Cloudflare and Route53 — a multi-provider setup where Cloudflare is the registrar and Route53 hosts the zone. Creating DNS records requires a Cloudflare API token.

The first token I created was scoped to a single domain. Fine. Then I started building a reusable DNS module — multi-provider, handles NS mirroring, MX, SPF, DKIM — and I needed to test it against a second domain. I widened the token’s scope to include both zones. Then I kept adding domains to the module, and each one meant another zone on the token. Eventually I just scoped it to the whole account. I didn’t want to manage one token per domain. Reasonable at the time.

I dropped the token into a terraform.tfvars file, gitignored it, and moved on. When I set up the second site repo, I copy-pasted it into that repo’s terraform.tfvars. Then the third.
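For concreteness, the setup looked roughly like this (file contents are illustrative): a variable declaration in the root, and the secret itself sitting in the gitignored tfvars file.

```hcl
# variables.tf
variable "cloudflare_api_token" {
  type      = string
  sensitive = true   # keeps the value out of plan output, not off the disk
}

# terraform.tfvars  (gitignored, but plaintext on the filesystem)
cloudflare_api_token = "cf-token-redacted"
```

Marking the variable sensitive only redacts it from Terraform's own output; the tfvars file is still readable by anything on the machine.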

Three repos. Same account-scoped token. Each one could modify DNS for every domain I own. A site repo whose only job is to deploy a Hugo blog to S3 had the credentials to rewrite MX records, redirect traffic, or silently add a subdomain pointing anywhere.

That’s an over-mighty subject. The repo’s actual authority — “deploy this one site” — bore no resemblance to its effective power. And the scope didn’t start wide — it grew incrementally, for defensible reasons, one zone at a time.

The gitignore illusion

I’d gitignored the tfvars files, which felt like due diligence. Git would never see the token. But grep -r cloudflare_api_token ~/Sites/ returned three hits instantly. The token was plaintext on disk, in Time Machine backups, on every machine that had ever cloned these repos. Gitignore is a version control convenience, not a security boundary. It tells one tool — git — to look away. Everything else on the filesystem treats the file as readable plaintext, because that’s what it is.

I’d been confusing “not in the repo” with “not exposed.” Those aren’t the same property.

Separating the credential from the consumer

The first fix was moving the token out of the repos entirely. SSM Parameter Store holds it as a SecureString, encrypted at rest with KMS. direnv loads it into the shell when I cd into the project:

# .envrc
export TF_VAR_cloudflare_api_token=$(aws ssm get-parameter \
  --name /vanity-dns/cloudflare-api-token \
  --with-decryption --query Parameter.Value --output text)

Terraform picks up TF_VAR_ environment variables as input variables. The token lives in memory for the duration of the shell session and never touches the filesystem. Rotation is one aws ssm put-parameter --overwrite — not editing three files and hoping I didn’t miss one.
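The write side is just as small. A sketch of the seed-and-rotate command, reusing the parameter name from the .envrc above; note that SecureString uses the account's default aws/ssm KMS key unless you pass --key-id:

```shell
# Store or rotate the token as an encrypted SecureString.
# --overwrite replaces the existing value in place; shells pick up
# the new token the next time direnv re-exports the variable.
aws ssm put-parameter \
  --name /vanity-dns/cloudflare-api-token \
  --type SecureString \
  --value "$NEW_CLOUDFLARE_TOKEN" \
  --overwrite
```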

But this only solved storage. The real problem was scope.

Revoking the barony

The site repos never needed a Cloudflare API token. They needed DNS records to exist — a different thing. The records are a dependency, not a responsibility.

I moved all DNS management into a single repo — vanity-dns — built on a unified DNS module that handles both providers in one call. One module invocation per domain: Route53 zone, Cloudflare NS mirroring, MX records, SPF, DKIM — everything. The module’s v0.4.0 release fixed a for_each chicken-and-egg bug (Terraform requires for_each keys to be known at plan time, but on first apply the Route53 nameserver values haven’t been assigned yet), so it now works in a single pass. Twelve domains, one repo, one token.
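Inside vanity-dns, each domain reduces to one block. The module's actual interface isn't shown in this post, so every argument name and value below is hypothetical; the shape is what matters:

```hcl
# One invocation per domain: provisions the Route53 zone, mirrors the
# resulting NS records into Cloudflare, and writes the mail records.
module "ferkakta_dev" {
  source = "./modules/dns-domain"        # hypothetical module path

  domain     = "ferkakta.dev"
  mx_records = ["10 mx.example.net."]    # placeholder mail host
  spf_record = "v=spf1 include:example.net ~all"
}
```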

vanity-dns is the regent — the entity authorized to wield the credential on behalf of the crown. It holds the Cloudflare API token because managing DNS across all domains is its explicit, singular purpose. The site repos are subjects. They reference their zones as data sources:

data "aws_route53_zone" "main" {
  name = "ferkakta.dev"
}

No Cloudflare provider. No API token. No terraform.tfvars with secrets. The site repo can read its zone to create ACM validation records and CloudFront aliases, but it cannot modify DNS for any other domain. Its power matches its purpose. The regent holds the army. The subjects get what they need and nothing more.
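Read access to the zone is enough for everything the site repo legitimately needs. A sketch of the ACM validation records built on that data source, following the AWS provider's standard DNS-validation pattern (the aws_acm_certificate.site resource is assumed, not shown):

```hcl
# ACM DNS validation: the site repo may write records into its own
# zone, but holds no credential that reaches any other domain.
resource "aws_route53_record" "cert_validation" {
  for_each = {
    for dvo in aws_acm_certificate.site.domain_validation_options :
    dvo.domain_name => dvo
  }

  zone_id = data.aws_route53_zone.main.zone_id
  name    = each.value.resource_record_name
  type    = each.value.resource_record_type
  records = [each.value.resource_record_value]
  ttl     = 60
}
```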

The pattern I missed

Over-mighty subjects don’t announce themselves. Nobody writes a Terraform root and thinks “this repo has too much power.” The scope creep is invisible because each repo works fine in isolation. The token authenticates, the DNS records get created, the site deploys. Everything functions. The problem isn’t operational — it’s architectural. The blast radius of a compromised repo, a leaked backup, or a careless terraform destroy extends far beyond what that repo should be able to touch.

Three site repos with account-level Cloudflare credentials meant that compromising any one of them compromised all twelve domains. After the migration, compromising a site repo gets you access to one S3 bucket and one CloudFront distribution. The DNS is untouchable.

I didn’t set out to build over-mighty subjects. I scoped a token to one domain, then two, then all of them — each step defensible, each step widening the blast radius. I copy-pasted it across repos because it was fast, gitignored it because it felt responsible, and didn’t revisit the decision until grep showed me what I’d actually built. The repos had been accumulating power for months. Nobody complained. Everything worked. That’s how over-mighty subjects operate — quietly, one reasonable concession at a time, until a vassal has more power than the crown.

#terraform #aws #security #cloudflare #iam