What building infrastructure for a startup actually looks like
I spent a day doing the unglamorous infrastructure work that keeps a startup alive. Here’s everything that happened.
Morning: security audit
Audited two EKS clusters for a K8s privilege-escalation vulnerability. Found 9 service accounts with cluster-admin that didn't need it. Deleted two dead deployments, ArgoCD and Velero, both mine, both abandoned months ago. The rest are Kubeflow components we can't touch until 1.36 ships the fix in April.
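Finding over-privileged service accounts is mostly a matter of walking the cluster role bindings. A minimal sketch, assuming you feed it the dict form of `kubectl get clusterrolebindings -o json` (the function name and structure are mine, not from the audit):

```python
def cluster_admin_subjects(bindings_json):
    """Return (kind, namespace, name) for every subject bound to cluster-admin.

    Expects the parsed output of `kubectl get clusterrolebindings -o json`.
    """
    hits = []
    for crb in bindings_json.get("items", []):
        role = crb.get("roleRef", {})
        if role.get("kind") == "ClusterRole" and role.get("name") == "cluster-admin":
            # subjects can be absent on orphaned bindings, hence the `or []`
            for s in crb.get("subjects", []) or []:
                hits.append((s.get("kind"), s.get("namespace", ""), s.get("name")))
    return hits
```

Pipe it per cluster and diff the lists against what each workload actually needs; anything left over is a candidate for deletion or a narrower Role.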
Then traced VPC peering between our Jenkins box and both clusters, proved making the control planes private has zero blockers, and posted the receipts. Sometimes the most valuable thing you can do is show a teammate that the thing they’ve been afraid to do is already safe.
EKS park/unpark
Built one-click EKS park/unpark — scale node groups to zero or bring them back. A GitHub Actions workflow with a status badge in the README so anyone can check cluster state without a terminal:
# One workflow, two inputs
on:
  workflow_dispatch:
    inputs:
      action:
        type: choice
        options: [park, unpark]
Saves $87/month, which matters pre-revenue.
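The job behind that workflow doesn't need much. A sketch of what it might run, using boto3's real `list_nodegroups` / `update_nodegroup_config` calls; the cluster name and the unpark sizes here are placeholders, not our actual values:

```python
# Desired scaling per action. EKS managed node groups require maxSize >= 1,
# so "park" keeps maxSize at 1 with desiredSize 0.
SCALING = {
    "park":   {"minSize": 0, "maxSize": 1, "desiredSize": 0},
    "unpark": {"minSize": 1, "maxSize": 3, "desiredSize": 2},  # example sizes
}

def scale_nodegroups(cluster, action, eks=None):
    """Scale every managed node group in `cluster` to the park/unpark config."""
    if eks is None:
        import boto3  # imported lazily so a fake client can be injected for tests
        eks = boto3.client("eks")
    cfg = SCALING[action]
    for ng in eks.list_nodegroups(clusterName=cluster)["nodegroups"]:
        eks.update_nodegroup_config(
            clusterName=cluster, nodegroupName=ng, scalingConfig=cfg
        )
    return cfg
```

The status badge then just reflects the last workflow run, so "is the cluster up?" is answerable from the README.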
The finops rabbit hole
Provisioning a WorkSpace for our compliance advisor led me to discover a sister company had an orphaned SimpleAD running for four years — $1,752 wasted. Unattached EIPs added $2,400. A zombie transit gateway nobody remembered. No tag contract, no expiry enforcement.
So I designed a system:
- Mandatory tags on every Terraform resource: Owner, ExpiresOn, CostCenter
- Organizations tag policies + SCP denies for untagged resources
- Daily Lambda scanning for expired resources, Slack notifications, auto-quarantine after 14 days of silence
- Weekly sweeps for orphaned EIPs, idle NAT gateways, unattached volumes, stopped instances
- Monthly justify-or-kill reviews — three questions:
- Has it handled traffic in 30 days?
- Is anything referencing it?
- Can we recreate it in an hour?
Anything short of three yeses doesn't survive without written justification.
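The daily scan's core is a pure triage function, which keeps the Lambda trivial to test. A sketch under stated assumptions: resources arrive as dicts with their tags already fetched, and "14 days of silence" is approximated here as 14 days past ExpiresOn with no action taken (the real system would track Slack responses):

```python
from datetime import date, timedelta

REQUIRED_TAGS = {"Owner", "ExpiresOn", "CostCenter"}  # the tag contract

def triage(resources, today=None, grace_days=14):
    """Sort resources into ok / expired / quarantine / untagged buckets.

    `resources`: iterable of dicts like {"id": ..., "tags": {...}};
    ExpiresOn is an ISO date (YYYY-MM-DD).
    """
    today = today or date.today()
    buckets = {"ok": [], "expired": [], "quarantine": [], "untagged": []}
    for r in resources:
        tags = r.get("tags", {})
        if not REQUIRED_TAGS <= tags.keys():
            buckets["untagged"].append(r["id"])  # violates the tag contract
            continue
        expires = date.fromisoformat(tags["ExpiresOn"])
        if expires >= today:
            buckets["ok"].append(r["id"])
        elif today - expires > timedelta(days=grace_days):
            buckets["quarantine"].append(r["id"])  # past the grace window
        else:
            buckets["expired"].append(r["id"])  # notify the Owner in Slack
    return buckets
```

Everything in `expired` gets a Slack ping; everything in `quarantine` gets its security groups locked down pending the justify-or-kill review.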
SES blocker
Applied for SES production access so our IdP can send MFA codes. AWS denied it — they detected a related account with SES production already. Separate legal entities, separate AWS Organizations, but shared personnel email addresses.
Appealing. Meanwhile a teammate bypassed it with a cross-account IAM role to the other org’s dev SES. Not a permanent solution, but it unblocked us.
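The bridge itself is plain STS: assume the role in the other org, then build an SES client from the temporary credentials. A sketch, not the teammate's actual code; the role ARN and session name are hypothetical:

```python
def session_kwargs(assume_role_response):
    """Map an STS AssumeRole response onto boto3 client kwargs."""
    c = assume_role_response["Credentials"]
    return {
        "aws_access_key_id": c["AccessKeyId"],
        "aws_secret_access_key": c["SecretAccessKey"],
        "aws_session_token": c["SessionToken"],
    }

def ses_client_for(role_arn):
    """SES client in the other org via a cross-account role (hypothetical ARN)."""
    import boto3  # imported lazily; the helper above stays dependency-free
    sts = boto3.client("sts")
    resp = sts.assume_role(RoleArn=role_arn, RoleSessionName="ses-bridge")
    return boto3.client("ses", **session_kwargs(resp))
```

The role's trust policy in the other account scopes who can assume it, which is what makes this tolerable as a stopgap.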
The badge took five iterations
The security audit uncovered credentials that shouldn't exist. A Bitbucket API token had silently expired, and the resulting failure took 45 minutes to debug. None of this is content-worthy by itself.
But this is what building infrastructure for a startup actually looks like. Not architecture diagrams. Not conference talks. Just methodically reducing risk and cost, one kubectl delete at a time.