Finops on ferkakta.dev

FinOps portfolio: 71 tickets over 5 years

Wed, 01 Apr 2026 15:30:00 -0500

My first finops ticket was called “Optimize the AWS infrastcuture.” The typo is still there. That was 2021, when I was a one-person infrastructure team at a startup that didn’t have the word finops in its vocabulary and didn’t know it needed the practice.

Five years later I went looking for every cost-related ticket I’d ever created. I expected maybe thirty. I found 71, spread across 8 Jira projects, touching every layer of the stack from EBS volumes to LLM inference spend. Nobody asked me to create a finops practice. I just kept looking at the bill and refusing to pay for things that didn’t earn their keep.

The $233 Day, Part 2: The Inference Iceberg

Fri, 20 Mar 2026 17:00:00 -0500

I posted the part 1 findings to the team thread — model switch, cache invalidation, 20× call volume, $173 training run. Case closed. The numbers were clean, the explanation was satisfying, and the model got reverted within the hour.

Except $173 was wrong. Not wrong in the analysis — the training run did cost that much. Wrong in scope. I’d found the visible part of the spend and stopped looking.

The $173 Training Run

Fri, 20 Mar 2026 15:00:00 -0500

The Slack message landed at 3pm on a Wednesday: “model training successful, previously 20min, now 1h30m.” I had finished an EKS 1.32-to-1.33 upgrade on the ramparts cluster that morning. My upgrade, my timeline, my problem.

The first theory wrote itself. New cluster version, fresh nodes, cold image caches. I’d fixed a broken cluster autoscaler earlier that day — the old autoscaler deployment was pinned to a node selector that no longer matched after the upgrade, so pods were stacking up in Pending until I caught it. First-run penalties after a major version bump are real. Everyone on the call nodded. I almost typed up that explanation and moved on.

Your employees are tenants and you should bill them like it

Mon, 16 Mar 2026 14:00:00 -0600

I built a Lambda that enriches every Bedrock invocation with cost data and routes it to per-tenant CloudWatch log groups. Model ID, input tokens, output tokens, estimated cost in USD, all written to /bedrock/tenants/{tenant} so each customer’s AI spend is visible in near-real-time.
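A minimal sketch of what that enrichment step could look like, not the actual Lambda. The record field names, the `enrich` and `log_group_for` helpers, the `example.model-v1` model ID, and the prices are all hypothetical placeholders; only the log group path pattern and the list of fields come from the description above.

```python
from decimal import Decimal

# Hypothetical per-1,000-token prices in USD, keyed by model ID.
# Placeholder values for illustration only, not real Bedrock pricing.
PRICES_PER_1K = {
    "example.model-v1": {"input": Decimal("0.003"), "output": Decimal("0.015")},
}


def log_group_for(tenant: str) -> str:
    """Build the per-tenant log group name from the path pattern in the post."""
    return f"/bedrock/tenants/{tenant}"


def enrich(record: dict) -> dict:
    """Return a copy of an invocation record with an estimated cost and target log group.

    Assumes the record carries 'tenant', 'model_id', 'input_tokens'
    and 'output_tokens' keys (assumed names, not a real Bedrock schema).
    """
    price = PRICES_PER_1K[record["model_id"]]
    cost = (
        Decimal(record["input_tokens"]) * price["input"]
        + Decimal(record["output_tokens"]) * price["output"]
    ) / Decimal(1000)
    return {
        **record,
        # Decimal avoids float rounding noise when many small costs are summed later.
        "estimated_cost_usd": str(cost.quantize(Decimal("0.000001"))),
        "log_group": log_group_for(record["tenant"]),
    }
```

Keeping the price table as data rather than code means a model price change is a one-line edit, and an unknown model ID fails loudly with a `KeyError` instead of silently logging a cost of zero.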

Then a developer on the team needed Bedrock access for local development, and I had a problem I hadn’t anticipated.

The invisible burn

The developer’s use case was reasonable. He was building features against the Bedrock API and needed to iterate against real models, not mocks. I created an SSO permission set with bedrock:InvokeModel and handed him the profile name.

I replaced $489/mo in AWS Client VPN with a $3 t4g.nano running Headscale

Sat, 21 Feb 2026 09:00:00 -0600

A finops sprint surfaced $489/mo in AWS Client VPN charges. Three endpoints across two accounts, plus connection-hour fees. For a VPN that four people used. I had provisioned two of the endpoints.

At the time, they felt indispensable — secure customer access, familiar tooling, predictable behavior. In reality, they were architectural inertia.

I replaced all three with a single t4g.nano running Headscale — the open-source Tailscale coordination server. Total cost: ~$3/mo.

I genericized the Terraform and open-sourced the module.

What building infrastructure for a startup actually looks like

Wed, 11 Feb 2026 09:00:00 -0600

I spent a day doing the unglamorous infrastructure work that keeps a startup alive. Here’s everything that happened.

Morning: security audit

Audited two EKS clusters for a K8s privilege escalation vulnerability. Found 9 service accounts with cluster-admin that didn’t need it. Deleted two dead deployments — ArgoCD and Velero, both mine, both abandoned months ago. The rest are kubeflow components we can’t touch until 1.36 ships the fix in April.
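One way to run that kind of service account audit, sketched under assumptions: the function below takes the parsed output of `kubectl get clusterrolebindings -o json` and lists the service accounts bound to cluster-admin. The function name is hypothetical, and the sketch only looks at ClusterRoleBindings with ServiceAccount subjects. It does not cover namespaced RoleBindings that reference cluster-admin, or group subjects that include service accounts.

```python
def cluster_admin_service_accounts(bindings: dict) -> list[str]:
    """List 'namespace/name' for service accounts bound to the cluster-admin ClusterRole.

    `bindings` is expected to be the parsed JSON of a ClusterRoleBinding list.
    """
    found = set()
    for item in bindings.get("items", []):
        role_ref = item.get("roleRef", {})
        if role_ref.get("kind") != "ClusterRole" or role_ref.get("name") != "cluster-admin":
            continue
        # 'subjects' can be missing or null on a binding with no subjects.
        for subject in item.get("subjects") or []:
            if subject.get("kind") == "ServiceAccount":
                found.add(f'{subject.get("namespace", "")}/{subject["name"]}')
    return sorted(found)


# Usage, assuming the kubectl output was saved to a file first:
#   kubectl get clusterrolebindings -o json > crb.json
#   import json
#   print(cluster_admin_service_accounts(json.load(open("crb.json"))))
```

The result is a candidate list, not a verdict: each entry still needs a human decision about whether the workload actually needs cluster-admin.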