<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Reliability on ferkakta.dev</title><link>https://ferkakta.dev/tags/reliability/</link><description>Recent content in Reliability on ferkakta.dev</description><generator>Hugo</generator><language>en-US</language><copyright>Copyright fizz.</copyright><lastBuildDate>Fri, 27 Feb 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://ferkakta.dev/tags/reliability/index.xml" rel="self" type="application/rss+xml"/><item><title>An orderly EKS and Kubeflow upgrade path</title><link>https://ferkakta.dev/orderly-eks-kubeflow-upgrade-path/</link><pubDate>Fri, 27 Feb 2026 00:00:00 +0000</pubDate><guid>https://ferkakta.dev/orderly-eks-kubeflow-upgrade-path/</guid><description>&lt;p&gt;When EKS extended-support pricing is on the horizon, upgrade planning gets emotional fast.&lt;/p&gt;
&lt;p&gt;The worst time to discover platform ambiguity is when finance and timelines are both tightening.&lt;/p&gt;
&lt;p&gt;Our first impulse was to ask, &amp;ldquo;how quickly can we upgrade?&amp;rdquo;&lt;/p&gt;
&lt;p&gt;The better question was, &amp;ldquo;what order of operations prevents us from compounding hidden drift during upgrade churn?&amp;rdquo;&lt;/p&gt;
&lt;h2 id="why-one-shot-upgrades-fail-in-controller-heavy-stacks"&gt;Why one-shot upgrades fail in controller-heavy stacks&lt;/h2&gt;
&lt;p&gt;On paper, &amp;ldquo;upgrade EKS then bump Kubeflow&amp;rdquo; sounds linear.&lt;/p&gt;</description></item><item><title>Drift is an availability bug</title><link>https://ferkakta.dev/drift-is-an-availability-bug/</link><pubDate>Fri, 27 Feb 2026 00:00:00 +0000</pubDate><guid>https://ferkakta.dev/drift-is-an-availability-bug/</guid><description>&lt;p&gt;I used to think of drift as a config hygiene issue.&lt;/p&gt;
&lt;p&gt;Annoying, expensive, embarrassing — but fundamentally administrative.&lt;/p&gt;
&lt;p&gt;Then I watched two control-plane components fall into &lt;code&gt;CrashLoopBackOff&lt;/code&gt; during a production incident and realized the framing was wrong.&lt;/p&gt;
&lt;p&gt;Drift is not a paperwork problem. Drift is an availability bug.&lt;/p&gt;
&lt;h2 id="the-incident-looked-like-random-failure"&gt;The incident looked like random failure&lt;/h2&gt;
&lt;p&gt;We were already deep in one fire: a Kubeflow Pipelines frontend image that kept reverting to an old tag.&lt;/p&gt;</description></item><item><title>Kubeflow is a version matrix, not a version</title><link>https://ferkakta.dev/kubeflow-is-a-version-matrix-not-a-version/</link><pubDate>Fri, 27 Feb 2026 00:00:00 +0000</pubDate><guid>https://ferkakta.dev/kubeflow-is-a-version-matrix-not-a-version/</guid><description>&lt;p&gt;&amp;ldquo;What version of Kubeflow are we on?&amp;rdquo;&lt;/p&gt;
&lt;p&gt;That looks like a simple platform inventory question.&lt;/p&gt;
&lt;p&gt;In practice, it was one of the most misleading questions in our incident.&lt;/p&gt;
&lt;p&gt;We had already fixed one visible symptom, image reconciliation that kept reverting a frontend component to an old tag, when we started asking version questions to prevent recurrence.&lt;/p&gt;
&lt;p&gt;The expected answer was one number.&lt;/p&gt;
&lt;p&gt;The real answer was a matrix.&lt;/p&gt;
&lt;h2 id="the-false-confidence-moment"&gt;The false confidence moment&lt;/h2&gt;
&lt;p&gt;The dangerous moment was not when something failed. It was when everything looked green enough to stop looking.&lt;/p&gt;</description></item></channel></rss>