<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Finops on ferkakta.dev</title><link>https://ferkakta.dev/tags/finops/</link><description>Recent content in Finops on ferkakta.dev</description><generator>Hugo</generator><language>en-US</language><copyright>Copyright fizz.</copyright><lastBuildDate>Wed, 01 Apr 2026 15:30:00 -0500</lastBuildDate><atom:link href="https://ferkakta.dev/tags/finops/index.xml" rel="self" type="application/rss+xml"/><item><title>FinOps portfolio: 71 tickets over 5 years</title><link>https://ferkakta.dev/finops-portfolio/</link><pubDate>Wed, 01 Apr 2026 15:30:00 -0500</pubDate><guid>https://ferkakta.dev/finops-portfolio/</guid><description>&lt;p&gt;My first finops ticket was called &amp;ldquo;Optimize the AWS infrastcuture.&amp;rdquo; The typo is still there. That was 2021 — a one-person infrastructure team at a startup that didn&amp;rsquo;t have the word finops in its vocabulary and didn&amp;rsquo;t know it needed one.&lt;/p&gt;
&lt;p&gt;Five years later I went looking for every cost-related ticket I&amp;rsquo;d ever created. I expected maybe thirty. I found 71, spread across 8 Jira projects, touching every layer of the stack from EBS volumes to LLM inference spend. Nobody asked me to create a finops practice. I just kept looking at the bill and refusing to pay for things that didn&amp;rsquo;t earn their keep.&lt;/p&gt;</description></item><item><title>The $233 Day, Part 2: The Inference Iceberg</title><link>https://ferkakta.dev/233-dollar-day-part-2/</link><pubDate>Fri, 20 Mar 2026 17:00:00 -0500</pubDate><guid>https://ferkakta.dev/233-dollar-day-part-2/</guid><description>&lt;p&gt;I posted the part 1 findings to the team thread — model switch, cache invalidation, 20× call volume, $173 training run. Case closed. The numbers were clean, the explanation was satisfying, and the model got reverted within the hour.&lt;/p&gt;
&lt;p&gt;Except $173 was wrong. Not wrong in the analysis — the training run did cost that much. Wrong in scope. I&amp;rsquo;d found the visible part of the spend and stopped looking.&lt;/p&gt;</description></item><item><title>The $173 Training Run</title><link>https://ferkakta.dev/173-dollar-training-run/</link><pubDate>Fri, 20 Mar 2026 15:00:00 -0500</pubDate><guid>https://ferkakta.dev/173-dollar-training-run/</guid><description>&lt;p&gt;The Slack message landed at 3pm on a Wednesday: &amp;ldquo;model training successful, previously 20min, now 1h30m.&amp;rdquo; I had finished an EKS 1.32-to-1.33 upgrade on the ramparts cluster that morning. My upgrade, my timeline, my problem.&lt;/p&gt;
&lt;p&gt;The first theory wrote itself. New cluster version, fresh nodes, cold image caches. I&amp;rsquo;d fixed a broken cluster autoscaler earlier that day — the old autoscaler deployment was pinned to a node selector that no longer matched after the upgrade, so pods were stacking up in Pending until I caught it. First-run penalties after a cluster version bump are real. Everyone on the call nodded. I almost typed up that explanation and moved on.&lt;/p&gt;</description></item><item><title>Your employees are tenants and you should bill them like it</title><link>https://ferkakta.dev/employees-as-tenants/</link><pubDate>Mon, 16 Mar 2026 14:00:00 -0600</pubDate><guid>https://ferkakta.dev/employees-as-tenants/</guid><description>&lt;p&gt;I built a Lambda that enriches every Bedrock invocation with cost data and routes it to per-tenant CloudWatch log groups. Model ID, input tokens, output tokens, estimated cost in USD, all written to &lt;code&gt;/bedrock/tenants/{tenant}&lt;/code&gt; so each customer&amp;rsquo;s AI spend is visible in near-real-time.&lt;/p&gt;
&lt;p&gt;Then a developer on the team needed Bedrock access for local development, and I had a problem I hadn&amp;rsquo;t anticipated.&lt;/p&gt;
&lt;h2 id="the-invisible-burn"&gt;The invisible burn&lt;/h2&gt;
&lt;p&gt;The developer&amp;rsquo;s use case was reasonable. He was building features against the Bedrock API and needed to iterate against real models, not mocks. I created an SSO permission set with &lt;code&gt;bedrock:InvokeModel&lt;/code&gt; and handed him the profile name.&lt;/p&gt;</description></item><item><title>I replaced $489/mo in AWS Client VPN with a $3 t4g.nano running Headscale</title><link>https://ferkakta.dev/headscale-aws-open-source-terraform-module/</link><pubDate>Sat, 21 Feb 2026 09:00:00 -0600</pubDate><guid>https://ferkakta.dev/headscale-aws-open-source-terraform-module/</guid><description>&lt;p&gt;A finops sprint surfaced $489/mo in AWS Client VPN charges. Three endpoints across two accounts, plus connection-hour fees. For a VPN that four people used. I had provisioned two of those endpoints.&lt;/p&gt;
&lt;p&gt;At the time, they felt indispensable — secure customer access, familiar tooling, predictable behavior.
In reality, they were architectural inertia.&lt;/p&gt;
&lt;p&gt;I replaced all three with a single t4g.nano running &lt;a href="https://github.com/juanfont/headscale"&gt;Headscale&lt;/a&gt; — the open-source Tailscale coordination server. Total cost: ~$3/mo.&lt;/p&gt;
&lt;p&gt;I genericized the Terraform and open-sourced the module.&lt;/p&gt;</description></item><item><title>What building infrastructure for a startup actually looks like</title><link>https://ferkakta.dev/startup-infra-unglamorous-work/</link><pubDate>Wed, 11 Feb 2026 09:00:00 -0600</pubDate><guid>https://ferkakta.dev/startup-infra-unglamorous-work/</guid><description>&lt;p&gt;I spent a day doing the unglamorous infrastructure work that keeps a startup alive. Here&amp;rsquo;s everything that happened.&lt;/p&gt;
&lt;h2 id="morning-security-audit"&gt;Morning: security audit&lt;/h2&gt;
&lt;p&gt;Audited two EKS clusters for a K8s privilege escalation vulnerability. Found 9 service accounts with &lt;code&gt;cluster-admin&lt;/code&gt; that didn&amp;rsquo;t need it. Deleted two dead deployments — ArgoCD and Velero, both mine, both abandoned months ago. The rest are Kubeflow components we can&amp;rsquo;t touch until 1.36 ships the fix in April.&lt;/p&gt;</description></item></channel></rss>