What building infrastructure for a startup actually looks like
I spent a day doing the unglamorous infrastructure work that keeps a startup alive. Here’s everything that happened.
Morning: security audit
Audited two EKS clusters for a K8s privilege-escalation vulnerability. Found 9 service accounts with cluster-admin that didn't need it. Deleted two dead deployments, ArgoCD and Velero, both mine, both abandoned months ago. The rest are Kubeflow components we can't touch until 1.36 ships the fix in April.
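Finding over-privileged service accounts is mostly a matter of walking the cluster role bindings. A minimal sketch, assuming you feed it the dict form of `kubectl get clusterrolebindings -o json` (the function name and structure are mine, not from the audit):

```python
def cluster_admin_subjects(bindings_json):
    """Return (kind, namespace, name) for every subject bound to cluster-admin.

    Expects the parsed output of `kubectl get clusterrolebindings -o json`.
    """
    hits = []
    for crb in bindings_json.get("items", []):
        role = crb.get("roleRef", {})
        if role.get("kind") == "ClusterRole" and role.get("name") == "cluster-admin":
            # subjects can be absent on orphaned bindings, hence the `or []`
            for s in crb.get("subjects", []) or []:
                hits.append((s.get("kind"), s.get("namespace", ""), s.get("name")))
    return hits
```

Pipe it per cluster and diff the lists against what each workload actually needs; anything left over is a candidate for deletion or a narrower Role.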
Then traced VPC peering between our Jenkins box and both clusters, proved making the control planes private has zero blockers, and posted the receipts. Sometimes the most valuable thing you can do is show a teammate that the thing they’ve been afraid to do is already safe.
EKS park/unpark
Built one-click EKS park/unpark — scale node groups to zero or bring them back. A GitHub Actions workflow with a status badge in the README so anyone can check cluster state without a terminal:
# One workflow, two inputs
on:
  workflow_dispatch:
    inputs:
      action:
        type: choice
        options: [park, unpark]
Saves $87/month, which matters pre-revenue.
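The job behind that workflow doesn't need much. A sketch of what it might run, using boto3's real `list_nodegroups` / `update_nodegroup_config` calls; the cluster name and the unpark sizes here are placeholders, not our actual values:

```python
# Desired scaling per action. EKS managed node groups require maxSize >= 1,
# so "park" keeps maxSize at 1 with desiredSize 0.
SCALING = {
    "park":   {"minSize": 0, "maxSize": 1, "desiredSize": 0},
    "unpark": {"minSize": 1, "maxSize": 3, "desiredSize": 2},  # example sizes
}

def scale_nodegroups(cluster, action, eks=None):
    """Scale every managed node group in `cluster` to the park/unpark config."""
    if eks is None:
        import boto3  # imported lazily so a fake client can be injected for tests
        eks = boto3.client("eks")
    cfg = SCALING[action]
    for ng in eks.list_nodegroups(clusterName=cluster)["nodegroups"]:
        eks.update_nodegroup_config(
            clusterName=cluster, nodegroupName=ng, scalingConfig=cfg
        )
    return cfg
```

The status badge then just reflects the last workflow run, so "is the cluster up?" is answerable from the README.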
The finops rabbit hole
Provisioning a WorkSpace for our compliance advisor led me to discover a sister company had an orphaned SimpleAD running for four years — $1,752 wasted. Unattached EIPs added $2,400. A zombie transit gateway nobody remembered. No tag contract, no expiry enforcement.
So I designed a system:
- Mandatory tags on every Terraform resource: Owner, ExpiresOn, CostCenter
- Organizations tag policies + SCP denies for untagged resources
- Daily Lambda scanning for expired resources, Slack notifications, auto-quarantine after 14 days of silence
- Weekly sweeps for orphaned EIPs, idle NAT gateways, unattached volumes, stopped instances
- Monthly justify-or-kill reviews — three questions:
- Has it handled traffic in 30 days?
- Is anything referencing it?
- Can we recreate it in an hour?
Anything short of three yeses doesn't survive without written justification.
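The daily scan's core is a pure triage function, which keeps the Lambda trivial to test. A sketch under stated assumptions: resources arrive as dicts with their tags already fetched, and "14 days of silence" is approximated here as 14 days past ExpiresOn with no action taken (the real system would track Slack responses):

```python
from datetime import date, timedelta

REQUIRED_TAGS = {"Owner", "ExpiresOn", "CostCenter"}  # the tag contract

def triage(resources, today=None, grace_days=14):
    """Sort resources into ok / expired / quarantine / untagged buckets.

    `resources`: iterable of dicts like {"id": ..., "tags": {...}};
    ExpiresOn is an ISO date (YYYY-MM-DD).
    """
    today = today or date.today()
    buckets = {"ok": [], "expired": [], "quarantine": [], "untagged": []}
    for r in resources:
        tags = r.get("tags", {})
        if not REQUIRED_TAGS <= tags.keys():
            buckets["untagged"].append(r["id"])  # violates the tag contract
            continue
        expires = date.fromisoformat(tags["ExpiresOn"])
        if expires >= today:
            buckets["ok"].append(r["id"])
        elif today - expires > timedelta(days=grace_days):
            buckets["quarantine"].append(r["id"])  # past the grace window
        else:
            buckets["expired"].append(r["id"])  # notify the Owner in Slack
    return buckets
```

Everything in `expired` gets a Slack ping; everything in `quarantine` gets its security groups locked down pending the justify-or-kill review.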
SES blocker
Applied for SES production access so our IdP can send MFA codes. AWS denied it — they detected a related account with SES production already. Separate legal entities, separate AWS Organizations, but shared personnel email addresses.
Appealing. Meanwhile a teammate bypassed it with a cross-account IAM role to the other org’s dev SES. Not a permanent solution, but it unblocked us.
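The bridge itself is plain STS: assume the role in the other org, then build an SES client from the temporary credentials. A sketch, not the teammate's actual code; the role ARN and session name are hypothetical:

```python
def session_kwargs(assume_role_response):
    """Map an STS AssumeRole response onto boto3 client kwargs."""
    c = assume_role_response["Credentials"]
    return {
        "aws_access_key_id": c["AccessKeyId"],
        "aws_secret_access_key": c["SecretAccessKey"],
        "aws_session_token": c["SessionToken"],
    }

def ses_client_for(role_arn):
    """SES client in the other org via a cross-account role (hypothetical ARN)."""
    import boto3  # imported lazily; the helper above stays dependency-free
    sts = boto3.client("sts")
    resp = sts.assume_role(RoleArn=role_arn, RoleSessionName="ses-bridge")
    return boto3.client("ses", **session_kwargs(resp))
```

The role's trust policy in the other account scopes who can assume it, which is what makes this tolerable as a stopgap.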
The badge took five iterations
The security audit uncovered credentials that shouldn't exist. A Bitbucket API token had silently expired, and the resulting failure took 45 minutes to debug. None of this is content-worthy by itself.
But this is what building infrastructure for a startup actually looks like. Not architecture diagrams. Not conference talks. Just methodically reducing risk and cost, one kubectl delete at a time.