<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Multi-Tenant on ferkakta.dev</title><link>https://ferkakta.dev/tags/multi-tenant/</link><description>Recent content in Multi-Tenant on ferkakta.dev</description><generator>Hugo</generator><language>en-US</language><copyright>Copyright fizz.</copyright><lastBuildDate>Mon, 16 Mar 2026 14:00:00 -0600</lastBuildDate><atom:link href="https://ferkakta.dev/tags/multi-tenant/index.xml" rel="self" type="application/rss+xml"/><item><title>Your employees are tenants and you should bill them like it</title><link>https://ferkakta.dev/employees-as-tenants/</link><pubDate>Mon, 16 Mar 2026 14:00:00 -0600</pubDate><guid>https://ferkakta.dev/employees-as-tenants/</guid><description>&lt;p&gt;I built a Lambda that enriches every Bedrock invocation with cost data and routes it to per-tenant CloudWatch log groups. Model ID, input tokens, output tokens, estimated cost in USD, all written to &lt;code&gt;/bedrock/tenants/{tenant}&lt;/code&gt; so each customer&amp;rsquo;s AI spend is visible in near-real-time.&lt;/p&gt;
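&lt;p&gt;The enrichment itself reduces to a pricing lookup plus a deterministic log-group name. A minimal sketch of that logic; the pricing table is made up and the function names are illustrative, not the Lambda&amp;rsquo;s actual code:&lt;/p&gt;

```python
# Sketch only: per-1K-token rates below are placeholders, not real
# Bedrock price data.
PRICING_PER_1K = {
    "anthropic.claude-3-sonnet": {"input": 0.003, "output": 0.015},
}

def estimate_cost_usd(model_id, input_tokens, output_tokens):
    """Estimate invocation cost in USD from token counts."""
    rates = PRICING_PER_1K.get(model_id)
    if rates is None:
        return None  # unknown model: surface as un-priced rather than guess
    return round(
        (input_tokens / 1000) * rates["input"]
        + (output_tokens / 1000) * rates["output"],
        6,
    )

def tenant_log_group(tenant):
    """Route each enriched record to the tenant's own log group."""
    return f"/bedrock/tenants/{tenant}"
```

&lt;p&gt;The real Lambda would then ship the enriched record to the per-tenant CloudWatch log group; that call is omitted here.&lt;/p&gt;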
&lt;p&gt;Then a developer on the team needed Bedrock access for local development, and I had a problem I hadn&amp;rsquo;t anticipated.&lt;/p&gt;
&lt;h2 id="the-invisible-burn"&gt;The invisible burn&lt;/h2&gt;
&lt;p&gt;The developer&amp;rsquo;s use case was reasonable. He was building features against the Bedrock API and needed to iterate against real models, not mocks. I created an SSO permission set with &lt;code&gt;bedrock:InvokeModel&lt;/code&gt; and handed him the profile name.&lt;/p&gt;</description></item><item><title>I assumed GovCloud was AWS with a different region code. It took two weeks to prove me wrong.</title><link>https://ferkakta.dev/govcloud-surprises/</link><pubDate>Wed, 11 Mar 2026 23:00:00 -0400</pubDate><guid>https://ferkakta.dev/govcloud-surprises/</guid><description>&lt;p&gt;I needed a GovCloud account for a multi-tenant NIST compliance platform. I&amp;rsquo;d been running commercial AWS infrastructure for months — EKS, Terraform, tenant provisioning, the whole stack. GovCloud would be the same thing in a different region. That was the assumption. It lasted about four hours.&lt;/p&gt;
&lt;h2 id="the-account-that-doesnt-exist-yet"&gt;The account that doesn&amp;rsquo;t exist yet&lt;/h2&gt;
&lt;p&gt;My management account couldn&amp;rsquo;t call &lt;code&gt;CreateGovCloudAccount&lt;/code&gt;. The API returned &lt;code&gt;ConstraintViolationException&lt;/code&gt; with a message about not being &amp;ldquo;enabled for access to GovCloud&amp;rdquo; and no guidance on what that meant. I filed a support case. AWS enabled the permission two days later, and as a side effect created a standalone GovCloud account that had no relationship to my Organizations structure — an orphan floating in the partition with disconnected root credentials. I still had to find it and deal with it.&lt;/p&gt;</description></item><item><title>From eight manual steps to one command</title><link>https://ferkakta.dev/eight-manual-steps-to-one-command/</link><pubDate>Tue, 03 Mar 2026 09:00:00 -0600</pubDate><guid>https://ferkakta.dev/eight-manual-steps-to-one-command/</guid><description>&lt;p&gt;I provisioned two tenants by hand before I decided that nobody should ever provision a tenant by hand.&lt;/p&gt;
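&lt;p&gt;The shape of the fix is a single entrypoint that owns the canonical app names and builds every path in one place, so a name is never typed twice. A sketch under assumed names and an assumed path layout, not the platform&amp;rsquo;s actual CLI or SSM schema:&lt;/p&gt;

```python
# Hypothetical wrapper: the app names and path layout are illustrative.
APP_NAMES = ("apiserver", "tenant-auth-service")  # one canonical spelling each

def ssm_prefix(tenant, app):
    """Build the SSM path in exactly one place, so a mistyped app name
    can't silently scatter parameters under the wrong prefix."""
    if app not in APP_NAMES:
        raise ValueError(f"unknown app {app!r}; expected one of {APP_NAMES}")
    return f"/tenants/{tenant}/{app}"

def populate_all(tenant):
    """One command covers every app; there is no per-app flag to get wrong."""
    return [ssm_prefix(tenant, app) for app in APP_NAMES]
```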
&lt;p&gt;The provisioning flow for our multi-tenant SaaS platform was eight steps across four tools — a Python CLI, a shell script with five flags per invocation, a GitHub Actions workflow, and two Kubernetes job manifests requiring injected DB connection strings. Each step had different inputs, different env files, and subtly different flag names for the same concept. The two &lt;code&gt;populate&lt;/code&gt; runs used &lt;code&gt;--appname apiserver&lt;/code&gt; and &lt;code&gt;--appname tenant_auth_service&lt;/code&gt; — note the underscore in one and not the other. That naming inconsistency is a guaranteed typo on a Friday afternoon. Each flag is a chance to silently write 24 SSM parameters to the wrong path.&lt;/p&gt;</description></item><item><title>Your onboarding flow is your architecture's report card</title><link>https://ferkakta.dev/onboarding-flow-architecture-report-card/</link><pubDate>Tue, 03 Mar 2026 00:00:00 +0000</pubDate><guid>https://ferkakta.dev/onboarding-flow-architecture-report-card/</guid><description>&lt;p&gt;I ran a colleague&amp;rsquo;s manual tenant onboarding flow for a multi-tenant SaaS platform. Five steps, two attempts, and a list of errors that mapped precisely to every automation gap in the system. The onboarding flow wasn&amp;rsquo;t broken. It was a diagnostic.&lt;/p&gt;
&lt;h2 id="the-five-steps"&gt;The five steps&lt;/h2&gt;
&lt;p&gt;The flow to bring a new tenant from nothing to working:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Run a Python registration script that calls the auth-handler API, creates an org in the identity provider, and sends a confirmation email to the devops team.&lt;/li&gt;
&lt;li&gt;Read the devops email. Manually extract two values: a tenant hash and an org code.&lt;/li&gt;
&lt;li&gt;Run populate scripts that seed 38 SSM parameters — 24 for &lt;code&gt;apiserver&lt;/code&gt;, 14 for &lt;code&gt;tenant-auth-service&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Trigger a GitHub Actions workflow. Terraform creates the namespace, deployments, ExternalSecrets, DNS records, HTTPS.&lt;/li&gt;
&lt;li&gt;Manually apply Kubernetes jobs for ETL seed data and first-user creation.&lt;/li&gt;
&lt;/ol&gt;
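&lt;p&gt;Step 2 is the most mechanical of the manual steps: if the confirmation email has a stable layout, the two values can be parsed instead of hand-copied. A sketch that assumes a hypothetical email format, not the actual devops email:&lt;/p&gt;

```python
import re

# Assumed email layout; the real confirmation email may differ.
HASH_RE = re.compile(r"tenant hash:\s*([0-9a-f]+)", re.IGNORECASE)
ORG_RE = re.compile(r"org code:\s*([A-Z0-9-]+)", re.IGNORECASE)

def parse_confirmation(body):
    """Pull the tenant hash and org code out of the devops email
    (step 2) instead of extracting them by eye."""
    hash_match = HASH_RE.search(body)
    org_match = ORG_RE.search(body)
    if not hash_match or not org_match:
        raise ValueError("confirmation email missing expected fields")
    return hash_match.group(1), org_match.group(1)
```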
&lt;p&gt;Step 4 is automated. Steps 1, 2, 3, and 5 are manual. The manual steps are where the architecture&amp;rsquo;s seams show.&lt;/p&gt;</description></item><item><title>Zero-touch multi-tenant deploys: removing myself from the critical path</title><link>https://ferkakta.dev/zero-touch-multi-tenant-deploys-eks-terraform/</link><pubDate>Mon, 02 Mar 2026 09:00:00 -0600</pubDate><guid>https://ferkakta.dev/zero-touch-multi-tenant-deploys-eks-terraform/</guid><description>&lt;p&gt;I had provisioned two tenants when I realized the deploy process didn&amp;rsquo;t scale to three. Each tenant on &lt;a href="https://ramparts.dev"&gt;ramparts&lt;/a&gt; runs three services &amp;ndash; &lt;code&gt;api-server&lt;/code&gt;, &lt;code&gt;web-client&lt;/code&gt; (the React frontend), &lt;code&gt;tenant-auth&lt;/code&gt; &amp;ndash; each with its own Docker image in ECR. Deploying a release meant running &lt;code&gt;gh workflow run deploy-tenant.yml -f tenant_name=acme -f action=apply -f update_images=true&lt;/code&gt;, then doing it again for the next tenant. With three services to resolve per run and N tenants, I was the bottleneck. Not Terraform, not GitHub Actions, not ECR. Me, remembering which tenants existed and typing their names correctly.&lt;/p&gt;</description></item><item><title>Per-Tenant CloudWatch Log Isolation on EKS, or: Why I Stopped Using aws-for-fluent-bit</title><link>https://ferkakta.dev/per-tenant-cloudwatch-log-isolation-eks/</link><pubDate>Mon, 02 Mar 2026 00:00:00 +0000</pubDate><guid>https://ferkakta.dev/per-tenant-cloudwatch-log-isolation-eks/</guid><description>&lt;h2 id="the-starting-assumption"&gt;The starting assumption&lt;/h2&gt;
&lt;p&gt;I&amp;rsquo;m building &lt;a href="https://ramparts.dev"&gt;ramparts&lt;/a&gt;, a multi-tenant compliance platform running on EKS. Each tenant gets a Kubernetes namespace &amp;ndash; &lt;code&gt;tenant-acme&lt;/code&gt;, &lt;code&gt;tenant-globex&lt;/code&gt;, whatever &amp;ndash; and the compliance controls require that their application logs land in isolated storage with 365-day retention. CMMC maps this to AU-2 (audit events), AU-3 (audit content), AU-11 (retention), and AC-4 (information flow isolation). A tenant cannot read another tenant&amp;rsquo;s container output.&lt;/p&gt;
&lt;p&gt;The obvious first move was &lt;code&gt;aws-for-fluent-bit&lt;/code&gt;, AWS&amp;rsquo;s own Helm chart and container image for shipping logs to CloudWatch. AWS service, AWS chart, AWS logging destination. The blessed path.&lt;/p&gt;</description></item><item><title>Making a Kopf operator idempotent: three-layer existence checks and the redisReady race</title><link>https://ferkakta.dev/kopf-operator-idempotency-three-layer-check/</link><pubDate>Fri, 20 Feb 2026 12:00:00 -0500</pubDate><guid>https://ferkakta.dev/kopf-operator-idempotency-three-layer-check/</guid><description>&lt;p&gt;Our tenant operator provisions databases, cache users, and credentials for each tenant in a multi-tenant SaaS platform. PostgreSQL roles on shared RDS, ElastiCache RBAC users, SSM parameters with generated passwords. It worked exactly once per tenant. The second time it ran, it regenerated every password and overwrote every SSM parameter. Running services holding the old credentials immediately lost their database and cache connections.&lt;/p&gt;
&lt;p&gt;This was the blocker for auto-deploy.&lt;/p&gt;
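&lt;p&gt;The shape of the fix is to make the create handler safe to replay: check whether a credential already exists before generating a new one. A minimal sketch of one such layer, using an in-memory dict as a stand-in for SSM (the real operator checks PostgreSQL, ElastiCache, and SSM, not a dict):&lt;/p&gt;

```python
import secrets

def ensure_credentials(tenant, store):
    """Replay-safe provisioning: generate a password only if the
    SSM-style store doesn't already hold one for this tenant.
    `store` is a dict standing in for the real parameter store."""
    key = f"/tenants/{tenant}/db-password"
    if key in store:
        return store[key]  # existing tenant: leave running services alone
    store[key] = secrets.token_urlsafe(24)  # new tenant: first-time creation
    return store[key]
```

&lt;p&gt;Running the handler twice for the same tenant now returns the same credential instead of rotating it out from under live pods.&lt;/p&gt;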
&lt;h2 id="every-deploy-was-a-coordinated-outage"&gt;Every deploy was a coordinated outage&lt;/h2&gt;
&lt;p&gt;The orchestrator runs &lt;code&gt;terraform apply&lt;/code&gt; for each tenant on every deploy. Terraform reconciles the Tenant CRD, which fires Kopf&amp;rsquo;s &lt;code&gt;on_tenant_create&lt;/code&gt; handler. The handler doesn&amp;rsquo;t distinguish between &amp;ldquo;new tenant&amp;rdquo; and &amp;ldquo;existing tenant whose CRD was re-applied.&amp;rdquo; It generates fresh passwords, creates new PostgreSQL roles (which fail because the role exists, or worse, succeed and orphan the old one), and overwrites SSM parameters with credentials that no running pod knows about.&lt;/p&gt;</description></item><item><title>Cross-repo auto-deploy with GitHub Actions: the orchestrator pattern</title><link>https://ferkakta.dev/cross-repo-auto-deploy-orchestration-github-actions/</link><pubDate>Fri, 20 Feb 2026 10:00:00 -0500</pubDate><guid>https://ferkakta.dev/cross-repo-auto-deploy-orchestration-github-actions/</guid><description>&lt;p&gt;Two repos merged within seconds of each other. The first orchestrator run failed — &lt;code&gt;web-client&lt;/code&gt;&amp;rsquo;s ECR image didn&amp;rsquo;t exist yet because the build was still running. The GitHub Actions log showed a red X, an error annotation, and a Slack notification I didn&amp;rsquo;t need to read.&lt;/p&gt;
&lt;p&gt;Four minutes later, the second run deployed both changes. No retry logic. No manual intervention. Nobody touched anything.&lt;/p&gt;
&lt;p&gt;I&amp;rsquo;d spent my day building a cross-repo deploy pipeline for a multi-tenant platform — three app repos pushing Docker images to ECR, one infra repo deploying the new tenant service images to EKS. The race condition was the first real test. It failed exactly the way I wanted it to.&lt;/p&gt;</description></item><item><title>90 AWS resources in 5 minutes — automating multi-tenant SaaS tenant lifecycle</title><link>https://ferkakta.dev/multi-tenant-saas-tenant-lifecycle/</link><pubDate>Tue, 10 Feb 2026 09:00:00 -0600</pubDate><guid>https://ferkakta.dev/multi-tenant-saas-tenant-lifecycle/</guid><description>&lt;p&gt;I recorded our entire tenant lifecycle — create, test, destroy — with no edits. Here&amp;rsquo;s what 5 minutes of infrastructure automation looks like when there are no tickets, no handoffs, and no &amp;ldquo;can someone set up the database.&amp;rdquo;&lt;/p&gt;
&lt;h2 id="what-happens-on-tenant-create"&gt;What happens on &lt;code&gt;tenant create&lt;/code&gt;&lt;/h2&gt;
&lt;p&gt;One GitHub Actions workflow backed by Terraform + a Kubernetes operator:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Validates the tenant name, resolves container images from the latest release branch&lt;/li&gt;
&lt;li&gt;Provisions ACM wildcard cert + Route53 DNS records&lt;/li&gt;
&lt;li&gt;Creates the &lt;code&gt;Tenant&lt;/code&gt; CRD → operator provisions PostgreSQL databases on shared RDS, seeds credentials to SSM&lt;/li&gt;
&lt;li&gt;Terraform deploys ExternalSecrets, Deployments, Ingress — 3 services per tenant&lt;/li&gt;
&lt;li&gt;SSM parameters auto-seeded: Redis credentials, auth URLs, signing keys — ~40 config values per tenant&lt;/li&gt;
&lt;li&gt;Zero static credentials anywhere — IRSA for everything, secrets injected at runtime from SSM via External Secrets Operator&lt;/li&gt;
&lt;/ol&gt;
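&lt;p&gt;The name validation in step 1 matters because the tenant name ends up in namespaces, DNS records, and SSM paths. A sketch of the kind of check involved; the exact rules here are assumed, not the workflow&amp;rsquo;s actual ones:&lt;/p&gt;

```python
import re

# Assumed rules: DNS-label-safe, lowercase, bounded length.
TENANT_NAME_RE = re.compile(r"^[a-z][a-z0-9-]{1,30}[a-z0-9]$")

def validate_tenant_name(name):
    """Reject names that couldn't serve as a namespace suffix,
    a DNS label, or an SSM path segment."""
    if not TENANT_NAME_RE.match(name):
        raise ValueError(f"invalid tenant name: {name!r}")
    return name
```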
&lt;p&gt;About 5 minutes from nothing to 90 AWS resources and running pods.&lt;/p&gt;</description></item></channel></rss>