<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Eks on ferkakta.dev</title><link>https://ferkakta.dev/tags/eks/</link><description>Recent content in Eks on ferkakta.dev</description><generator>Hugo</generator><language>en-US</language><copyright>Copyright fizz.</copyright><lastBuildDate>Mon, 02 Mar 2026 09:00:00 -0600</lastBuildDate><atom:link href="https://ferkakta.dev/tags/eks/index.xml" rel="self" type="application/rss+xml"/><item><title>Zero-touch multi-tenant deploys: removing myself from the critical path</title><link>https://ferkakta.dev/zero-touch-multi-tenant-deploys-eks-terraform/</link><pubDate>Mon, 02 Mar 2026 09:00:00 -0600</pubDate><guid>https://ferkakta.dev/zero-touch-multi-tenant-deploys-eks-terraform/</guid><description>&lt;p&gt;I had provisioned two tenants when I realized the deploy process didn&amp;rsquo;t scale to three. Each tenant on &lt;a href="https://ramparts.dev"&gt;ramparts&lt;/a&gt; runs three services &amp;ndash; &lt;code&gt;api-server&lt;/code&gt;, &lt;code&gt;web-client&lt;/code&gt; (the React frontend), &lt;code&gt;tenant-auth&lt;/code&gt; &amp;ndash; each with its own Docker image in ECR. Deploying a release meant running &lt;code&gt;gh workflow run deploy-tenant.yml -f tenant_name=acme -f action=apply -f update_images=true&lt;/code&gt;, then doing it again for the next tenant. With 3 services resolving per run and N tenants, I was the bottleneck. Not Terraform, not GitHub Actions, not ECR. 
Me, remembering which tenants existed and typing their names correctly.&lt;/p&gt;</description></item><item><title>Per-Tenant CloudWatch Log Isolation on EKS, or: Why I Stopped Using aws-for-fluent-bit</title><link>https://ferkakta.dev/per-tenant-cloudwatch-log-isolation-eks/</link><pubDate>Mon, 02 Mar 2026 00:00:00 +0000</pubDate><guid>https://ferkakta.dev/per-tenant-cloudwatch-log-isolation-eks/</guid><description>&lt;h2 id="the-starting-assumption"&gt;The starting assumption&lt;/h2&gt;
&lt;p&gt;I&amp;rsquo;m building &lt;a href="https://ramparts.dev"&gt;ramparts&lt;/a&gt;, a multi-tenant compliance platform running on EKS. Each tenant gets a Kubernetes namespace &amp;ndash; &lt;code&gt;tenant-acme&lt;/code&gt;, &lt;code&gt;tenant-globex&lt;/code&gt;, whatever &amp;ndash; and the compliance controls require that their application logs land in isolated storage with 365-day retention. CMMC maps this to AU-2 (audit events), AU-3 (audit content), AU-11 (retention), and AC-4 (information flow isolation). A tenant cannot read another tenant&amp;rsquo;s container output.&lt;/p&gt;
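&lt;p&gt;A minimal sketch of the retention half of that requirement, assuming a one-log-group-per-namespace naming scheme (the group names are illustrative) and printing the AWS CLI calls as a dry run rather than executing them:&lt;/p&gt;

```shell
#!/bin/sh
# One CloudWatch log group per tenant namespace, 365-day retention (AU-11).
# The /ramparts/... naming scheme is a hypothetical stand-in; the loop
# prints the CLI calls instead of running them.
for ns in tenant-acme tenant-globex; do
  group="/ramparts/${ns}/app"
  echo "aws logs create-log-group --log-group-name ${group}"
  echo "aws logs put-retention-policy --log-group-name ${group} --retention-in-days 365"
done
```

&lt;p&gt;Isolation (AC-4) is the other half: each tenant&amp;rsquo;s log shipper and IAM policy must be scoped to its own group, so one tenant&amp;rsquo;s credentials can never read another&amp;rsquo;s.&lt;/p&gt;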
&lt;p&gt;The obvious first move was &lt;code&gt;aws-for-fluent-bit&lt;/code&gt;, AWS&amp;rsquo;s own Helm chart and container image for shipping logs to CloudWatch. AWS service, AWS chart, AWS logging destination. The blessed path.&lt;/p&gt;</description></item><item><title>Why we removed aws-for-fluent-bit from EKS</title><link>https://ferkakta.dev/why-we-removed-aws-for-fluent-bit-from-eks/</link><pubDate>Mon, 02 Mar 2026 00:00:00 +0000</pubDate><guid>https://ferkakta.dev/why-we-removed-aws-for-fluent-bit-from-eks/</guid><description>&lt;p&gt;We deployed &lt;code&gt;aws-for-fluent-bit&lt;/code&gt; because AWS recommends it.&lt;/p&gt;
&lt;p&gt;If you follow the EKS logging documentation, that&amp;rsquo;s the default path: the docs assume AWS&amp;rsquo;s distribution of Fluent Bit rather than the upstream Helm chart.&lt;/p&gt;
&lt;p&gt;We did.&lt;/p&gt;
&lt;p&gt;Two days later, we ripped it out.&lt;/p&gt;
&lt;p&gt;The AWS chart and the upstream chart are not the same thing. The differences aren&amp;rsquo;t cosmetic. They affect how quickly you receive security patches, how transparently your configuration maps to the underlying plugin, and how many boundaries sit between your logs and the CloudWatch API.&lt;/p&gt;</description></item><item><title>An orderly EKS and Kubeflow upgrade path</title><link>https://ferkakta.dev/orderly-eks-kubeflow-upgrade-path/</link><pubDate>Fri, 27 Feb 2026 00:00:00 +0000</pubDate><guid>https://ferkakta.dev/orderly-eks-kubeflow-upgrade-path/</guid><description>&lt;p&gt;When EKS extended-support pricing is on the horizon, upgrade planning gets emotional fast.&lt;/p&gt;
&lt;p&gt;The worst time to discover platform ambiguity is when finance and timelines are both tightening.&lt;/p&gt;
&lt;p&gt;Our first impulse was to ask, &amp;ldquo;how quickly can we upgrade?&amp;rdquo;&lt;/p&gt;
&lt;p&gt;The better question was, &amp;ldquo;what order of operations prevents us from compounding hidden drift during upgrade churn?&amp;rdquo;&lt;/p&gt;
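&lt;p&gt;Ordering has a hard constraint underneath it: kubelets may trail the API server by at most three minor versions (two on older Kubernetes releases). A toy skew check, with hardcoded version numbers standing in for live &lt;code&gt;kubectl version&lt;/code&gt; and &lt;code&gt;kubectl get nodes&lt;/code&gt; output:&lt;/p&gt;

```shell
#!/bin/sh
# Pre-upgrade sanity check: how far do the oldest kubelets trail the
# control plane? Versions here are hypothetical stand-ins for live output.
control_plane_minor=29   # e.g. EKS 1.29
oldest_node_minor=27     # e.g. oldest node group still on 1.27

gap=$((control_plane_minor - oldest_node_minor))
if [ "$gap" -gt 3 ]; then
  echo "skew too large: bring node groups forward before the next control-plane bump"
else
  echo "skew ok: ${gap} minor version(s) behind"
fi
```

&lt;p&gt;The point is not the arithmetic; it is that the gap, not the calendar, dictates which component moves first.&lt;/p&gt;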
&lt;h2 id="why-one-shot-upgrades-fail-in-controller-heavy-stacks"&gt;Why one-shot upgrades fail in controller-heavy stacks&lt;/h2&gt;
&lt;p&gt;On paper, &amp;ldquo;upgrade EKS then bump Kubeflow&amp;rdquo; sounds linear.&lt;/p&gt;</description></item><item><title>Your terraform apply is silently rolling back your container images</title><link>https://ferkakta.dev/state-aware-ecr-image-resolution-github-actions/</link><pubDate>Tue, 17 Feb 2026 09:00:00 -0600</pubDate><guid>https://ferkakta.dev/state-aware-ecr-image-resolution-github-actions/</guid><description>&lt;p&gt;Every &amp;ldquo;deploy to EKS with GitHub Actions&amp;rdquo; tutorial solves the same problem: build an image, push to ECR, deploy it. The tutorial ends at &amp;ldquo;your pod is running.&amp;rdquo; Nobody talks about day two.&lt;/p&gt;
&lt;h2 id="the-silent-rollback"&gt;The silent rollback&lt;/h2&gt;
&lt;p&gt;Day two: you have a running EKS cluster with three services per tenant. You need to change an IAM policy. You open a PR, touch one line of Terraform, run &lt;code&gt;terraform apply&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Your IAM policy updates. Your container images also update — to whatever was hardcoded in &lt;code&gt;variables.tf&lt;/code&gt; as the default. That default was correct three months ago. Your services just rolled back to a three-month-old image and nobody noticed because the deployment succeeded.&lt;/p&gt;</description></item><item><title>What building infrastructure for a startup actually looks like</title><link>https://ferkakta.dev/startup-infra-unglamorous-work/</link><pubDate>Wed, 11 Feb 2026 09:00:00 -0600</pubDate><guid>https://ferkakta.dev/startup-infra-unglamorous-work/</guid><description>&lt;p&gt;I spent a day doing the unglamorous infrastructure work that keeps a startup alive. Here&amp;rsquo;s everything that happened.&lt;/p&gt;
&lt;h2 id="morning-security-audit"&gt;Morning: security audit&lt;/h2&gt;
&lt;p&gt;Audited two EKS clusters for a K8s privilege escalation vulnerability. Found 9 service accounts with &lt;code&gt;cluster-admin&lt;/code&gt; that didn&amp;rsquo;t need it. Deleted two dead deployments — ArgoCD and Velero, both mine, both abandoned months ago. The rest are Kubeflow components we can&amp;rsquo;t touch until 1.36 ships the fix in April.&lt;/p&gt;</description></item><item><title>90 AWS resources in 5 minutes — automating multi-tenant SaaS tenant lifecycle</title><link>https://ferkakta.dev/multi-tenant-saas-tenant-lifecycle/</link><pubDate>Tue, 10 Feb 2026 09:00:00 -0600</pubDate><guid>https://ferkakta.dev/multi-tenant-saas-tenant-lifecycle/</guid><description>&lt;p&gt;I recorded our entire tenant lifecycle — create, test, destroy — with no edits. Here&amp;rsquo;s what 5 minutes of infrastructure automation looks like when there are no tickets, no handoffs, and no &amp;ldquo;can someone set up the database.&amp;rdquo;&lt;/p&gt;
&lt;h2 id="what-happens-on-tenant-create"&gt;What happens on &lt;code&gt;tenant create&lt;/code&gt;&lt;/h2&gt;
&lt;p&gt;One GitHub Actions workflow backed by Terraform + a Kubernetes operator:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Validates the tenant name, resolves container images from the latest release branch&lt;/li&gt;
&lt;li&gt;Provisions ACM wildcard cert + Route53 DNS records&lt;/li&gt;
&lt;li&gt;Creates the &lt;code&gt;Tenant&lt;/code&gt; CRD → operator provisions PostgreSQL databases on shared RDS, seeds credentials to SSM&lt;/li&gt;
&lt;li&gt;Terraform deploys ExternalSecrets, Deployments, Ingress — 3 services per tenant&lt;/li&gt;
&lt;li&gt;SSM parameters auto-seeded: Redis credentials, auth URLs, signing keys — ~40 config values per tenant&lt;/li&gt;
&lt;li&gt;Zero static credentials anywhere — IRSA for everything, secrets injected at runtime from SSM via External Secrets Operator&lt;/li&gt;
&lt;/ol&gt;
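&lt;p&gt;The steps above, reduced to a dry-run sketch; command names and paths are illustrative, not the workflow internals:&lt;/p&gt;

```shell
#!/bin/sh
# Dry-run sketch of tenant create. The file paths and variable names
# are hypothetical stand-ins for the real workflow.
tenant="acme"

# 1. Validate: the name becomes a namespace, so DNS-label rules apply.
echo "$tenant" | grep -Eq '^[a-z][a-z0-9-]{0,30}$' || { echo "invalid tenant name"; exit 1; }

# 2-6. Print the infrastructure steps instead of executing them.
echo "kubectl apply -f tenants/${tenant}.yaml   # Tenant CRD; operator seeds RDS and SSM"
echo "terraform apply -var tenant_name=${tenant}   # cert, DNS, secrets, 3 services"
```

&lt;p&gt;Everything after validation is declarative, which is what makes the five-minute wall-clock time repeatable.&lt;/p&gt;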
&lt;p&gt;About 5 minutes from nothing to 90 AWS resources and running pods.&lt;/p&gt;</description></item></channel></rss>