<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Platformengineering on ferkakta.dev</title><link>https://ferkakta.dev/tags/platformengineering/</link><description>Recent content in Platformengineering on ferkakta.dev</description><generator>Hugo</generator><language>en-US</language><copyright>Copyright fizz.</copyright><lastBuildDate>Fri, 03 Apr 2026 23:00:00 -0500</lastBuildDate><atom:link href="https://ferkakta.dev/tags/platformengineering/index.xml" rel="self" type="application/rss+xml"/><item><title>I answered 114 AWS Well-Architected Review questions from my terminal</title><link>https://ferkakta.dev/well-architected-review-from-terminal/</link><pubDate>Fri, 03 Apr 2026 23:00:00 -0500</pubDate><guid>https://ferkakta.dev/well-architected-review-from-terminal/</guid><description>&lt;p&gt;I was fourteen questions into the AWS Well-Architected Review when my wrists told me to stop. Each question is a page: read the description, check the boxes, type notes into a 2084-character text field, click Next. The Container Build Lens alone has 28 questions. I had two more lenses queued — the main Well-Architected Framework (57 questions) and the Generative AI Lens (29). That&amp;rsquo;s 114 questions total, and the console wants me to click through every one.&lt;/p&gt;</description></item><item><title>I replaced the AWS CLI completer with a datalake</title><link>https://ferkakta.dev/aws-completer-datalake-replacement/</link><pubDate>Thu, 02 Apr 2026 08:00:00 -0500</pubDate><guid>https://ferkakta.dev/aws-completer-datalake-replacement/</guid><description>&lt;p&gt;I needed to tell someone in Italy my availability in their timezone, typed &lt;code&gt;TZ=&lt;/code&gt; and hit tab, and &lt;a href="https://ferkakta.dev/blog/zsh-completions-vocabulary-construction-kit/"&gt;discovered a completer that&amp;rsquo;s apparently been sitting in zsh since the Pleistocene&lt;/a&gt;. 
That made me finally look at how completion actually works: &lt;code&gt;#compdef&lt;/code&gt;, the dispatch table, &lt;code&gt;_files&lt;/code&gt;, the whole vocabulary kit I&amp;rsquo;d been leaning on for years without really seeing. And in the middle of that I remembered the thing that had made me write off tab completion in the first place: &lt;code&gt;aws_completer&lt;/code&gt;, the Python-spawning hog that claims every argument position and still makes a mockery of my left pinky finger when it innocently asks for a filename, interrupting to say: &lt;em&gt;but wait, are you sure you don&amp;rsquo;t want to marry one of my 428 eligible daughters first?&lt;/em&gt;&lt;/p&gt;</description></item><item><title>FinOps portfolio: 71 tickets over 5 years</title><link>https://ferkakta.dev/finops-portfolio/</link><pubDate>Wed, 01 Apr 2026 15:30:00 -0500</pubDate><guid>https://ferkakta.dev/finops-portfolio/</guid><description>&lt;p&gt;My first finops ticket was called &amp;ldquo;Optimize the AWS infrastcuture.&amp;rdquo; The typo is still there. That was 2021 — a one-person infrastructure team at a startup that didn&amp;rsquo;t have the word finops in its vocabulary and didn&amp;rsquo;t know it needed one.&lt;/p&gt;
&lt;p&gt;Five years later I went looking for every cost-related ticket I&amp;rsquo;d ever created. I expected maybe thirty. I found 71, spread across 8 Jira projects, touching every layer of the stack from EBS volumes to LLM inference spend. Nobody asked me to create a finops practice. I just kept looking at the bill and refusing to pay for things that didn&amp;rsquo;t earn their keep.&lt;/p&gt;</description></item><item><title>I was wrong about shell completions for 15 years</title><link>https://ferkakta.dev/zsh-completions-vocabulary-construction-kit/</link><pubDate>Wed, 01 Apr 2026 15:00:00 -0500</pubDate><guid>https://ferkakta.dev/zsh-completions-vocabulary-construction-kit/</guid><description>&lt;p&gt;I thought shell completions were tab-operated Jinja — fill in the blank, template-style. Type half a filename, press tab, get the rest. That&amp;rsquo;s all I thought they did.&lt;/p&gt;
&lt;p&gt;And the AWS CLI completer kept confirming that impression in the worst way. When you&amp;rsquo;re typing &lt;code&gt;aws s3 cp somefile.txt s3://bucket/&lt;/code&gt;, the completer hijacks tab on the local path argument and sits there forever trying to guess AWS-side completions that will never come. You&amp;rsquo;re done with the AWS part of the command. You just want to tab-complete a filename. It won&amp;rsquo;t let you. AWS hogs the empty space where your possibilities want to go, because it thinks it&amp;rsquo;s important to block your progress in case you wanted to speak to any of its 867 pet names. As John Roderick and Merlin Mann put it on &lt;a href="http://www.intotv.com/rotl/ep1"&gt;Roderick on the Line&lt;/a&gt;: keep moving and get out of the way.&lt;/p&gt;</description></item><item><title>Three holes in the partition wall</title><link>https://ferkakta.dev/three-holes-in-the-partition-wall/</link><pubDate>Tue, 31 Mar 2026 20:00:00 -0500</pubDate><guid>https://ferkakta.dev/three-holes-in-the-partition-wall/</guid><description>&lt;p&gt;I assumed GovCloud was AWS with a different region code. I wrote a whole post about how wrong that was. The partition wall between commercial AWS and GovCloud is real — no shared IAM, no cross-partition role assumption, no federated identity, no common STS endpoints. An &lt;code&gt;arn:aws:&lt;/code&gt; principal cannot see an &lt;code&gt;arn:aws-us-gov:&lt;/code&gt; resource. They are separate universes connected by a billing relationship and nothing else.&lt;/p&gt;
&lt;p&gt;Except that&amp;rsquo;s not quite true either. There are three holes in the wall, and I found them one at a time over the course of a month.&lt;/p&gt;</description></item><item><title>Bananas Acquisition: a CMMC CRM playbook</title><link>https://ferkakta.dev/cmmc-crm-acquisition-playbook/</link><pubDate>Mon, 30 Mar 2026 20:00:00 -0500</pubDate><guid>https://ferkakta.dev/cmmc-crm-acquisition-playbook/</guid><description>&lt;p&gt;I spent a Monday getting the same document from two cloud service providers. AWS took five minutes and a command-line PDF extraction tool. Google took eight hours, two simultaneous support chats, an LLM-drafted support ticket, an escalation sherpa, and a tripartite NDA structure whose existence is unknown to Google&amp;rsquo;s own frontline support.&lt;/p&gt;
&lt;p&gt;Both vendors publish a CMMC Customer Responsibility Matrix — the spreadsheet that maps NIST 800-171 controls to inherited, shared, or customer responsibility. Both are legally required to provide it. The experience of obtaining them could not be more different.&lt;/p&gt;</description></item><item><title>Every AI session starts from zero. Mine doesn't.</title><link>https://ferkakta.dev/session-start-skill/</link><pubDate>Sat, 28 Mar 2026 01:00:00 -0500</pubDate><guid>https://ferkakta.dev/session-start-skill/</guid><description>&lt;p&gt;The first thing I do in every Claude Code session is the same thing I&amp;rsquo;d do if I woke up with amnesia in my own office. Where am I? What branch? What did I ship yesterday? What tickets moved overnight? Did someone create a new workflow I haven&amp;rsquo;t seen? Is there a live patch on the cluster that never made it into terraform?&lt;/p&gt;
&lt;p&gt;The model doesn&amp;rsquo;t know any of this. Every session starts from zero. The context window is empty. The handoff docs exist but nobody reads them unless you make it a rule. The Jira board changed while you were asleep. The teammate who was blocked yesterday merged a PR at 3am and now the branch you were working on has conflicts.&lt;/p&gt;</description></item><item><title>Do not fake smallness</title><link>https://ferkakta.dev/do-not-fake-smallness/</link><pubDate>Fri, 27 Mar 2026 15:00:00 -0500</pubDate><guid>https://ferkakta.dev/do-not-fake-smallness/</guid><description>&lt;p&gt;I found an old gist from my BAM consulting days. It shows &lt;code&gt;csvq&lt;/code&gt; — a one-line alias that turns any CSV stream into a queryable in-memory SQLite database with JSON output:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span style="display:flex;"&gt;&lt;span&gt;alias csvq&lt;span style="color:#f92672"&gt;=&lt;/span&gt;&lt;span style="color:#e6db74"&gt;&amp;#34;sqlite3 :memory: -cmd &amp;#39;.mode csv&amp;#39; -cmd &amp;#39;.import /dev/stdin s3&amp;#39; &amp;#39;.mode json&amp;#39;&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Pipe a CSV into it, write SQL, get JSON. I wrote it for a colleague I was initiating into my ways. The gist is polite. It shows the output being piped into &lt;code&gt;jq '.[]'&lt;/code&gt;, which is perfectly respectable Unix.&lt;/p&gt;</description></item><item><title>One module block per service per tenant</title><link>https://ferkakta.dev/one-module-block-per-service-per-tenant/</link><pubDate>Fri, 27 Mar 2026 00:00:00 -0500</pubDate><guid>https://ferkakta.dev/one-module-block-per-service-per-tenant/</guid><description>&lt;p&gt;Every tenant on my platform gets three services: an API server, an auth service, and a frontend. Each one is a single module block in Terraform that creates a Kubernetes deployment, a ClusterIP service, an ALB ingress, IRSA for AWS access, ESO-synced secrets from SSM, and a feature flag discovery mechanism. The module is the same for all three services. The variables are different.&lt;/p&gt;
&lt;p&gt;I extracted it into an open source module because I kept explaining the design decisions to people who asked &amp;ldquo;how do you deploy services to EKS?&amp;rdquo; and the answer was always &amp;ldquo;let me show you the module.&amp;rdquo; The module is the answer.&lt;/p&gt;</description></item><item><title>Every tool I've ever used is a CloudFormation frontend</title><link>https://ferkakta.dev/cloudformation-frontends/</link><pubDate>Thu, 26 Mar 2026 18:00:00 -0500</pubDate><guid>https://ferkakta.dev/cloudformation-frontends/</guid><description>&lt;p&gt;I was reading a job description that wanted CloudFormation experience, and I had the thought that derails the actual task: I&amp;rsquo;ve spent my entire career using tools that compile down to CloudFormation and don&amp;rsquo;t mention it until something breaks. I&amp;rsquo;ve just never framed it that way.&lt;/p&gt;
&lt;p&gt;My career is a parade of progressively nicer frontends for the same underlying control plane, one frontend at a time.&lt;/p&gt;
&lt;p&gt;The first one was the AWS console. Click, wait, refresh, click. Then CloudFormation itself, which was an improvement in the way that a paper map is an improvement over asking for directions — technically correct, nearly unusable in practice. Then Serverless Framework, which promised to abstract the whole stack into a YAML file and a deploy command. Then Terraform, which promised cloud-agnostic infrastructure as code with a state model that actually worked.&lt;/p&gt;</description></item><item><title>Newspapers aren't dead. You read one every morning.</title><link>https://ferkakta.dev/newspapers-arent-dead/</link><pubDate>Thu, 26 Mar 2026 16:00:00 -0500</pubDate><guid>https://ferkakta.dev/newspapers-arent-dead/</guid><description>&lt;p&gt;Every morning I open Slack, scan three channels, check Jira for overnight transitions, read Google Meet transcripts from the India-side scrum, glance at AWS support cases, and pull up the git log to see what merged while I slept. By the time standup starts I&amp;rsquo;ve assembled a mental model of where everything is. I do this every day. I&amp;rsquo;ve never called it what it is.&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;m reading the morning paper.&lt;/p&gt;</description></item><item><title>from feature_flags import *</title><link>https://ferkakta.dev/from-feature-flags-import-star/</link><pubDate>Wed, 25 Mar 2026 21:00:00 -0500</pubDate><guid>https://ferkakta.dev/from-feature-flags-import-star/</guid><description>&lt;p&gt;A colleague needed a feature flag enabled on one tenant. &lt;code&gt;FEATURE_FLAG_ENABLE_AGENTS=True&lt;/code&gt; — one environment variable, one pod. I added it to the K8s secret manually, restarted the pod, and he was unblocked in two minutes.&lt;/p&gt;
&lt;p&gt;Then I realized: the next terraform apply would overwrite that secret without the flag. The ExternalSecret syncs from SSM, and the flag wasn&amp;rsquo;t in SSM through any path terraform knew about. My manual fix had a shelf life of one deploy.&lt;/p&gt;</description></item><item><title>The $233 Day, Part 2: The Inference Iceberg</title><link>https://ferkakta.dev/233-dollar-day-part-2/</link><pubDate>Fri, 20 Mar 2026 17:00:00 -0500</pubDate><guid>https://ferkakta.dev/233-dollar-day-part-2/</guid><description>&lt;p&gt;I posted the part 1 findings to the team thread — model switch, cache invalidation, 20× call volume, $173 training run. Case closed. The numbers were clean, the explanation was satisfying, and the model got reverted within the hour.&lt;/p&gt;
&lt;p&gt;Except $173 was wrong. Not wrong in the analysis — the training run did cost that much. Wrong in scope. I&amp;rsquo;d found the visible part of the spend and stopped looking.&lt;/p&gt;</description></item><item><title>The $173 Training Run</title><link>https://ferkakta.dev/173-dollar-training-run/</link><pubDate>Fri, 20 Mar 2026 15:00:00 -0500</pubDate><guid>https://ferkakta.dev/173-dollar-training-run/</guid><description>&lt;p&gt;The Slack message landed at 3pm on a Wednesday: &amp;ldquo;model training successful, previously 20min, now 1h30m.&amp;rdquo; I had finished an EKS 1.32-to-1.33 upgrade on the ramparts cluster that morning. My upgrade, my timeline, my problem.&lt;/p&gt;
&lt;p&gt;The first theory wrote itself. New cluster version, fresh nodes, cold image caches. I&amp;rsquo;d fixed a broken cluster autoscaler earlier that day — the old autoscaler deployment was pinned to a node selector that no longer matched after the upgrade, so pods were stacking up in Pending until I caught it. First-run penalties after a major version bump are real. Everyone on the call nodded. I almost typed up that explanation and moved on.&lt;/p&gt;</description></item></channel></rss>