FinOps portfolio: 71 tickets over 5 years
My first finops ticket was called “Optimize the AWS infrastcuture.” The typo is still there. That was 2021 — a one-person infrastructure team at a startup that didn’t have the word finops in its vocabulary and didn’t know it needed one.
Five years later I went looking for every cost-related ticket I’d ever created. I expected maybe thirty. I found 71, spread across 8 Jira projects, touching every layer of the stack from EBS volumes to LLM inference spend. Nobody asked me to create a finops practice. I just kept looking at the bill and refusing to pay for things that didn’t earn their keep.
When I pulled the wider Jira archive, this portfolio stopped looking like an isolated sprint and started looking like the explicit cost-and-governance slice of a much longer substrate line. Before there was a FinOps spike, there were DNS cutovers, VPN setups, RDS recreations, kubecost installs, spot-instance experiments, external DNS work, and tenant plumbing. The project names changed. The instinct did not.
The arc
The tickets tell a story if you read them in order. Not a strategy — a maturation. The 71-ticket set is the part where the company had enough shared vocabulary for me to call the work FinOps. The lineage is older: infrastructure hygiene became cost visibility, then right-sizing, then deletion, then governance, and finally architecture that makes future waste harder to create.
First came audits. “Audit our AWS costs.” “Quantify the cost of our current setup.” “Create test plan for spot instances.” I was trying to see the bill clearly, because nobody else was looking.
Then visibility. I installed kubecost in 2021 to get per-namespace cost attribution on EKS. Later I enabled the AWS Cost Optimization Hub and started exporting its recommendations as FOCUS-format Parquet to S3 — which I query with DuckDB instead of clicking through the console, because ClickOps is schlepping.
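The DuckDB side of that loop looks roughly like this — a sketch, not the exact query. The bucket path and the column names (`action_type`, `estimated_monthly_savings_after_discount`) are assumptions about the Cost Optimization Hub export schema; check the columns your export actually emits before running it:

```sql
-- Hedged sketch: querying a Cost Optimization Hub data export straight
-- from S3 with DuckDB. Bucket/prefix and column names are illustrative.
INSTALL httpfs; LOAD httpfs;

SELECT action_type,
       count(*) AS recommendations,
       sum(estimated_monthly_savings_after_discount) AS monthly_savings
FROM read_parquet('s3://my-coh-export/coh/**/*.parquet')
GROUP BY action_type
ORDER BY monthly_savings DESC;
```

One query, sorted by dollars, no console tabs.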
Then right-sizing. gp2 to gp3. io1 to gp3 ($198/mo on one RDS instance). Magnetic EBS volumes nobody remembered creating. 40 orphaned EBS volumes that AWS recommended I snapshot and delete. 25 orphaned target groups. A t2.micro running for years with no name and no owner ($8/mo). I converted 17 volumes to gp3 in one evening and knocked $39/mo off the bill before dinner.
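The gp2-to-gp3 math is worth writing down once, because it is the same every time: gp3 baseline storage runs about 20% cheaper per GB-month than gp2. A minimal sketch, assuming us-east-1 list prices ($0.10 vs $0.08/GB-month — check your region's rates, and note the function name is mine, not an AWS API:

```python
# Hedged sketch: estimate monthly savings from converting gp2 volumes
# to gp3. Prices are assumed us-east-1 list rates in $/GB-month.
GP2_PRICE = 0.10
GP3_PRICE = 0.08

def gp2_to_gp3_savings(volume_sizes_gb):
    """Monthly savings from converting a list of gp2 volumes to gp3."""
    total_gb = sum(volume_sizes_gb)
    return total_gb * (GP2_PRICE - GP3_PRICE)

# Example: a dozen 100 GB volumes.
print(f"${gp2_to_gp3_savings([100] * 12):.2f}/mo")  # → $24.00/mo
```

Small numbers per volume, but they compound — and the conversion is a one-click, no-downtime change.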
Then elimination. Three AWS Client VPN endpoints that had accumulated over four years of incremental security decisions — $489/mo replaced by a $3 t4g.nano running Headscale. A Transit Gateway that never carried traffic. A SAML VPN, a cert-based VPN, a NAT gateway, a Simple AD directory — all deleted, all ticketed, all with dollar amounts. A Jenkins instance and its ALB ($55/mo). A second ALB nobody was using ($22/mo). An EKS cluster that was part of an abandoned upgrade path ($72/mo for the control plane alone). Ancient stopped instances from 2016 that were still paying for EBS volumes. A mystery AWS Data Pipeline that had been billing $1/mo from a service AWS itself had canceled — the console was gone, but the CLI still worked.
Then replacement architecture. The VPN replacement wasn’t just a deletion — it was a design decision. I replaced a managed service with open-source infrastructure, open-sourced the Terraform module, and published the post. The Bedrock log router went from per-tenant subscription filters to a shared Lambda — consolidation as architecture, not just as cost reduction.
The current estate looks the same to me. A Well-Architected Review, support-plan downgrades, GovCloud auth-handler parity, tenant onboarding cleanup, and redundant rebuild elimination are not separate chores. They are cost, governance, tenancy, and operability showing up as one system again.
The monitoring stack
I don’t check the bill once a quarter. I have three systems watching it for me.
MiserBot sends a daily Slack report showing spend changes. It installs as a CloudFormation stack — one IAM role with read-only access plus CUR write permissions, and Concurrency Labs does the rest. When MiserBot flagged a new line item in March 2026, I traced it to RDS entering extended support at $330/mo. I stopped everything and upgraded PostgreSQL 13 to 16 that week.
AWS daily spend budget alarms fire when the projected monthly run rate (daily spend × 30.3) crosses a threshold. I add a new, lower threshold every time I drive costs down. The alarm history is the progress narrative: $6,000/mo was the original. $4,000/mo was the first achievement — I wanted to know if it ever crept back. Then $3,000. Then $2,000. Each alarm is a ratchet that locks in the gains and makes regression visible.
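The ratchet logic is simple enough to sketch. The 30.3 multiplier and the threshold ladder are from my setup; the function names are mine, and real AWS Budgets do this server-side against actual aggregates rather than one day's sample:

```python
# Hedged sketch of the ratchet: project a monthly run rate from one
# day's spend and report which alarm thresholds it would trip.
DAYS_PER_MONTH = 30.3
THRESHOLDS = [6000, 4000, 3000, 2000]  # $/mo, added over time as costs fell

def monthly_run_rate(daily_spend):
    return daily_spend * DAYS_PER_MONTH

def tripped_alarms(daily_spend):
    rate = monthly_run_rate(daily_spend)
    return [t for t in THRESHOLDS if rate > t]

# A $75 day projects to ~$2,272/mo: under three ratchets, over the $2k one.
print(tripped_alarms(75))  # → [2000]
```

Every new threshold is one more tripwire between the current bill and its old self.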
AWS anomaly detection emails handle the rest. Combined with bitter experience — I’ve been burned by extended support charges more than once — they drive preemptive upgrades on RDS and EKS before AWS starts billing penalty rates.
The feedback loop: MiserBot flags a change → I open DuckDB → query the Cost Optimization Hub Parquet in S3 → investigate → act. During the January sprint, I ran the same query every morning and watched $716 in available savings drop to $490 as I worked through the recommendations. $226/mo captured in two weeks.
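The morning check reduces to a diff: sum the available savings in today's export, compare against yesterday's, and the delta is what got captured. A sketch with simplified stand-in records — the field names and numbers are illustrative, not the actual export schema:

```python
# Hedged sketch of the morning check: diff today's total available
# savings against yesterday's. Records are simplified stand-ins for
# Cost Optimization Hub recommendations.
def available_savings(recommendations):
    """Sum estimated monthly savings across open recommendations."""
    return sum(r["estimated_monthly_savings"] for r in recommendations)

day_1 = [
    {"resource": "vol-0abc", "estimated_monthly_savings": 138},
    {"resource": "eks-old", "estimated_monthly_savings": 72},
    {"resource": "rds-dev", "estimated_monthly_savings": 16},
]
day_2 = day_1[2:]  # first two recommendations acted on overnight

captured = available_savings(day_1) - available_savings(day_2)
print(f"${captured}/mo captured")  # → $210/mo captured
```

Watching that number fall toward zero is the closest thing FinOps has to a burndown chart.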
The savings plan that made itself unnecessary
I’d been renewing a 1-year Compute Savings Plan annually. Coverage reports showed 50% of our compute was covered in November, December, January — not good enough. I bought a 3-year plan to push coverage to ~82% and created a daily budget alarm: if coverage drops below 80%, I get an email.
The plan was to wait for the 1-year to expire, then stagger monthly purchases of 3-year Compute Savings Plans — replacing the outgoing annual capacity with cheaper three-year commitments, spreading the risk across months instead of one annual cliff.
Phase 1 of the finops spike eliminated so much obsolete compute that when the 1-year plan expired, there was no gap left to fill. I let it lapse instead of renewing and shelved the staggered purchases. The optimization had made the discount instrument unnecessary. That’s the best possible outcome — you don’t need the coupon because you stopped buying the thing.
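The renewal decision comes down to one subtraction: how much steady on-demand spend would be left uncovered once the expiring plan lapses. A sketch under assumed numbers — the hourly figures below are illustrative, not the actual account's:

```python
# Hedged sketch: is there a gap worth re-committing to after a savings
# plan expires? Inputs are hourly commitment figures; all numbers here
# are made up for illustration.
def renewal_gap(baseline_per_hr, committed_per_hr):
    """On-demand $/hr left uncovered by the plans still in force."""
    return max(0.0, baseline_per_hr - committed_per_hr)

# Before optimization: $5/hr compute baseline, $2.60/hr still covered
# by the 3-year plan -> a $2.40/hr gap worth replacing.
print(renewal_gap(5.0, 2.6))  # → 2.4

# After deleting obsolete compute, the baseline fell below the 3-year
# commitment alone -> nothing left for a new plan to cover.
print(renewal_gap(2.4, 2.6))  # → 0.0
```

When the gap hits zero, the right purchase is no purchase.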
The archaeology
In March I went back through an older product’s infrastructure — a platform from the company’s earlier life. g2.2xlarge GPU instances from an era when that was a reasonable instance type. CloudFormation stacks for deployment configurations nobody could name. CodeDeploy applications for services that hadn’t run in years. An RDS instance that AWS kept restarting for maintenance every week because it was “stopped temporarily” — I had to race them to delete it.
I deleted 71 EBS volumes from an abandoned cluster in a single session. I inventoried 4.7 TB of RDS snapshots in dev, some dating back to 2016. I stopped five running instances that predated Kubernetes. Every one of these was an ancestor of the current infrastructure — still breathing, still billing, invisible to anyone who didn’t go looking.
The discipline
Every optimization gets a ticket. Every ticket has a dollar amount where possible. Every threshold gets an alarm. I created a private Slack channel mid-sprint as an operational diary — daily logs of what I deleted, converted, or proposed, with volume IDs, DuckDB queries, and CLI output. I wish I’d created it years earlier.
The portfolio now spans 71 tickets across 8 Jira projects, 5 years, and a spend reduction from $6,000/mo to under $2,000/mo. Most of those tickets took less than an hour. Some took five minutes. A few — the VPN replacement, the EKS cluster consolidation proposal, the savings plan strategy — took real architectural thinking.
None of them required a finops team. They required an engineer who looked at the bill every day and kept asking: does this earn its keep?
The receipts
Everything above is the narrative. Below is the evidence — the daily diary entries from the sprint and the full ticket portfolio. Stop here if the story was enough. Stay if you want the proof.
The January sprint — daily Slack diary
I kept a private channel during the sprint. Here’s what a month of daily finops work looks like when one person is doing it between other responsibilities:
| Date | What I did |
|---|---|
| 02/10 | Deleted the AD directory ($32/mo) |
| 02/11 | Created daily savings plan coverage budget (80% threshold). Bought 3-year savings plan. Coverage: 50% → ~82% |
| 02/12 | Deleted mystery AWS Data Pipeline “Test” ($1/mo for years, from a canceled service). Set overall agenda: intelligent tiering, Graviton migrations, right-size everything, budget for everything |
| 02/13 | Decommissioned two unused RDS instances |
| 02/14 | Replaced x86 RDS instance with Graviton for 20% off |
| 02/17 | Backed up AMI for GPU workstation, terminated stopped instance + 500GB volume ($50/mo) |
| 02/18 | Removed EKS observability addon ($8/day anomaly alert) — will re-add scoped to specific namespaces |
| 02/22 | Deleted 71 EBS volumes from abandoned EKS cluster in prod |
| 02/26 | Cost Optimization Hub: $716 available savings → $490 ($226 captured). Proposed EBS lightning round: 18 volumes to gp3, 40 volumes to snapshot+delete ($138/mo). Proposed deleting unused EKS cluster ($72/mo). Proposed deleting unused AWS accounts. Killed CloudWatch logging (cost anomaly, not working). Converted 17 volumes to gp3 ($39/mo) |
| 02/27 | Converted and deleted second tranche of volumes |
| 02/28 | Applied $25 AWS customer council credit. Planned Textract cost optimization strategy |
| 03/18 | Legacy platform archaeology: stopped 5 ancient running instances (g2.2xlarge, c4.xlarge, t2s). Inventoried 11 CloudFormation stacks, 20 CodeDeploy applications. Caught RDS instance before AWS restarted it for maintenance |
| 03/22 | Inventoried 4.7 TB of RDS snapshots in dev (some from 2016) |
The full ticket portfolio — 71 tickets, 8 projects, 2021–2026
Every ticket below is real. The project prefixes are sanitized, the dollar amounts are not. Read the “Done” column as a progress bar — most of the value has already been captured. The backlog items are either blocked by other teams, deliberately on hold, or waiting for the right moment.
Epics
| Ticket | Summary | Status |
|---|---|---|
| RM-434 | FinOps Spike: Reduce Ferkakta.net AWS from $4k to <$2k | In Progress |
| RM-465 | FinOps Phase 2: Cloud Governance — Give Every Dollar a Job | Backlog |
| RM-176 | Cost reduction for infrastructure | Done |
Under Epic RM-434 (28 tickets)
| Ticket | Summary | Status | $ Impact |
|---|---|---|---|
| RM-435 | Switch RDS app-db from io1 to gp3 | Done | $198/mo |
| RM-436 | ECR cleanup: lifecycle policies + delete orphaned repos | Backlog | $200+/mo |
| RM-437 | Kill unused internal ALB | Done | $22/mo |
| RM-438 | Kill Jenkins-new instance + ALB | Done | $55/mo |
| RM-439 | Release 3 idle EIPs | Done | $15/mo |
| RM-440 | Kill unnamed t2.micro | Done | $8/mo |
| RM-441 | Delete orphaned CloudWatch log groups | Done | $5/mo |
| RM-442 | Apply S3 lifecycle policies to all buckets | Done | $30/mo |
| RM-443 | Replace Textract with Claude Vision for COI extraction | Backlog | $400/mo |
| RM-444 | Disable GuardDuty EKS Runtime Monitoring | Done | $126/mo |
| RM-445 | Investigate Simple AD in us-west-2 | Done | $37/mo |
| RM-446 | Migrate gp2 to gp3 EBS volumes | Done | $16/mo |
| RM-447 | Investigate Magnetic EBS volumes | Done | $28/mo |
| RM-448 | TGW to VPC Peering migration | Backlog | $146/mo |
| RM-449 | Consolidate 2 EKS clusters to 1 | Backlog | $148/mo |
| RM-450 | Delete 25 orphaned target groups | Done | free |
| RM-451 | Deploy Falco for K8s runtime security (replaces GuardDuty) | Backlog | — |
| RM-452 | Release 3 idle EIPs in ap-south-1 | Done | $11/mo |
| RM-453 | Delete cert-based VPN | Done | — |
| RM-454 | Delete TGW (never functional - 0 traffic) | Done | — |
| RM-455 | Deploy Headscale VPN (replace dev VPNs) | Done | — |
| RM-456 | Delete SAML VPN (after Tailscale validated) | Done | — |
| RM-457 | Delete orphaned EBS volumes (CloudWatch verified) | Done | — |
| RM-458 | Delete unused NAT Gateway | Done | — |
| RM-459 | Terminate ancient stopped instances (2016) | Done | — |
| RM-460 | Delete orphaned Simple AD | Done | — |
| RM-461 | RDS: Migrate remaining gp2 to gp3 | Done | — |
| RM-462 | Terminated abandoned spot instance | Done | — |
| RM-463 | Delete 2 orphaned EBS volumes | Done | — |
| RM-464 | RDS Storage Right-sizing | In Progress | $30/mo |
VPN replacement arc (~$489/mo → $3/mo)
| Ticket | Summary | Status |
|---|---|---|
| RM-173 | Secure VPC resources behind VPN (original setup) | Done |
| RM-394 | AWS VPN endpoints — disable endpoints not needed | Done |
| RM-453 | Delete cert-based VPN | Done |
| RM-455 | Deploy Headscale VPN (replace dev VPNs) | Done |
| RM-456 | Delete SAML VPN (after Tailscale validated) | Done |
Cost Optimization Hub cluster (INF)
| Ticket | Summary | Status |
|---|---|---|
| RM-345 | Set Up Cost Optimization Hub in AWS | Done |
| RM-346 | Follow Up on Cost Optimization Hub Findings | Pending Review |
| RM-347 | Clean Up Unused Airsim Instances | Done |
| RM-348 | Optimize unused PostgreSQL RDS instance | Done |
| RM-349 | Review and cleanup unused MySQL RDS instances | Done |
| RM-371 | Convert EBS Volumes to gp3 | Done |
| RM-372 | Snapshot and Delete Underutilized EBS Volumes | Done |
Infrastructure cleanup
| Ticket | Summary | Status |
|---|---|---|
| RM-342 | Upgrade RDS PostgreSQL 12 before end of standard support | Done |
| RM-343 | Upgrade RDS MySQL before deprecation | Done |
| RM-369 | Delete Sandbox Account in AWS | Pending Review |
| RM-370 | Delete unused EKS cluster | Done |
| RM-465 | Upgrade PostgreSQL 13→16 ($330/mo extended support) | Done |
Scale-down / auto-scaling
| Ticket | Summary | Status |
|---|---|---|
| RM-77 | Create pipeline for scaling down deployments (60min timeout) | Done |
| RM-78 | Create API for scale-down pipeline | Done |
| RM-83 | Create resource for scaling up/down APIs | Done |
| RM-377 | Fix scale-down-delay and scale-to-zero annotations | Done |
| RM-401 | Add Windows idle disconnect policy to WorkSpaces bootstrap | Done |
SaaS platform cost optimization
| Ticket | Summary | Status |
|---|---|---|
| RM-256 | Replace per-service log filters with shared router Lambda | Done |
| RM-261 | EKS cluster scale-to-zero for cost optimization | Done |
| RM-262 | Validate full cluster teardown and rebuild | Obsolete |
| RM-442 | Fix EKS park/unpark Terraform min_size drift | Done |
| RM-452 | Add pre-destroy step to delete ALB bootstrap ingress | To Do |
Right-sizing
| Ticket | Summary | Status |
|---|---|---|
| RM-14 | Optimize the AWS infrastcuture | Done |
| RM-37 | Install kubecost | Done |
| RM-42 | Create test plan for spot instances | Done |
| RM-43 | Downsize EKS nodegroup from 5 to 3 | Done |
| RM-61 | Implement EBS cleanup strategy | Done |
| RM-83 | Audit our AWS costs | Done |
| RM-201 | Reduce EBS volume cost | Done |
| RM-415 | Explore cost reduction options with archera platform | Backlog |
| RM-39 | Update EBS cleanup Jenkins jobs to delete PVCs directly | Backlog |
| RM-184 | Quantify the cost of our current setup | Done |
| RM-313 | Change default instance type to cheapest available | Done |
| RM-436 | Self-hosted GHA runner on EKS Graviton for ARM64 builds | To Do |
Visibility & governance
| Ticket | Summary | Status |
|---|---|---|
| RM-189 | Enable tenant cost allocation tag | Obsolete |
| RM-204 | Enable cost allocation tags in management account | Done |
| RM-269 | Add finops tags to all Terraform root modules | To Do |
| RM-270 | FinOps automation: expiry enforcement, cost sweeps | To Do |
Spend controls (LLM)
| Ticket | Summary | Status |
|---|---|---|
| RM-472 | Lock down expensive LLM models | Backlog |
| RM-473 | Per-model daily spend limits on LiteLLM proxy | Backlog |
Direct cost reduction
| Ticket | Summary | Status |
|---|---|---|
| RM-501 | Downgrade AWS support plans | To Do |