ferkakta.dev

FinOps portfolio: 71 tickets over 5 years

My first finops ticket was called “Optimize the AWS infrastcuture.” The typo is still there. That was 2021 — a one-person infrastructure team at a startup that didn’t have the word finops in its vocabulary and didn’t know it needed one.

Five years later I went looking for every cost-related ticket I’d ever created. I expected maybe thirty. I found 71, spread across 8 Jira projects, touching every layer of the stack from EBS volumes to LLM inference spend. Nobody asked me to create a finops practice. I just kept looking at the bill and refusing to pay for things that didn’t earn their keep.

When I pulled the wider Jira archive, this portfolio stopped looking like an isolated sprint and started looking like the explicit cost-and-governance slice of a much longer substrate line. Before there was a FinOps spike, there were DNS cutovers, VPN setups, RDS recreations, kubecost installs, spot-instance experiments, external DNS work, and tenant plumbing. The project names changed. The instinct did not.

The arc

The tickets tell a story if you read them in order. Not a strategy — a maturation. The 71-ticket set is the part where the company had enough shared vocabulary for me to call the work FinOps. The lineage is older: infrastructure hygiene became cost visibility, then right-sizing, then deletion, then governance, and finally architecture that makes future waste harder to create.

First came audits. “Audit our AWS costs.” “Quantify the cost of our current setup.” “Create test plan for spot instances.” I was trying to see the bill clearly, because nobody else was looking.

Then visibility. I installed kubecost in 2021 to get per-namespace cost attribution on EKS. Later I enabled the AWS Cost Optimization Hub and started exporting its recommendations as FOCUS-format Parquet to S3 — which I query with DuckDB instead of clicking through the console, because ClickOps is schlepping.

Then right-sizing. gp2 to gp3. io1 to gp3 ($198/mo on one RDS instance). Magnetic EBS volumes nobody remembered creating. 40 orphaned EBS volumes that AWS recommended I snapshot and delete. 25 orphaned target groups. A t2.micro running for years with no name and no owner ($8/mo). I converted 17 volumes to gp3 in one evening and knocked $39/mo off the bill before dinner.
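The gp2-to-gp3 math is simple enough to sanity-check in a few lines. A sketch assuming us-east-1 list prices of $0.10/GB-month for gp2 and $0.08/GB-month for gp3 (rates vary by region, so check yours):

```python
# Back-of-envelope gp2 -> gp3 savings. Prices below are assumed us-east-1
# list rates, not authoritative; substitute your region's pricing.
GP2_PER_GB = 0.10
GP3_PER_GB = 0.08

def monthly_savings(size_gib: int) -> float:
    """Monthly dollars saved by converting one gp2 volume of size_gib to gp3."""
    return round(size_gib * (GP2_PER_GB - GP3_PER_GB), 2)

# A 100 GiB volume saves $2/month; the conversion itself is in-place.
print(monthly_savings(100))  # -> 2.0
```

At those rates the savings are 20% per volume, which is why batch-converting an evening's worth of volumes adds up.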

Then elimination. Three AWS Client VPN endpoints that had accumulated over four years of incremental security decisions — $489/mo replaced by a $3 t4g.nano running Headscale. A Transit Gateway that never carried traffic. A SAML VPN, a cert-based VPN, a NAT gateway, a Simple AD directory — all deleted, all ticketed, all with dollar amounts. A Jenkins instance and its ALB ($55/mo). A second ALB nobody was using ($22/mo). An EKS cluster that was part of an abandoned upgrade path ($72/mo for the control plane alone). Ancient stopped instances from 2016 that were still paying for EBS volumes. A mystery AWS Data Pipeline that had been billing $1/mo from a service AWS itself had canceled — the console was gone, but the CLI still worked.

Then replacement architecture. The VPN replacement wasn’t just a deletion — it was a design decision. I replaced a managed service with open-source infrastructure, open-sourced the Terraform module, and published the post. The Bedrock log router went from per-tenant subscription filters to a shared Lambda — consolidation as architecture, not just as cost reduction.

The current estate looks the same to me. A Well-Architected Review, support-plan downgrades, GovCloud auth-handler parity, tenant onboarding cleanup, and redundant rebuild elimination are not separate chores. They are cost, governance, tenancy, and operability showing up as one system again.

The monitoring stack

I don’t check the bill once a quarter. I have three systems watching it for me.

MiserBot sends a daily Slack report showing spend changes. It installs as a CloudFormation stack — one IAM role with read-only access plus CUR write permissions, and Concurrency Labs does the rest. When MiserBot flagged a new line item in March 2026, I traced it to RDS entering extended support at $330/mo. I stopped everything and upgraded PostgreSQL 13 to 16 that week.

AWS daily spend budget alarms fire when the projected monthly run rate (daily spend × 30.3) crosses a threshold. I add a new, lower threshold every time I drive costs down. The alarm history is the progress narrative: $6,000/mo was the original. $4,000/mo was the first achievement — I wanted to know if it ever crept back. Then $3,000. Then $2,000. Each alarm is a ratchet that locks in the gains and makes regression visible.
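The threshold math can be sketched in a few lines. The ratchet sequence matches the one above; the $70/day figure is illustrative:

```python
# Sketch of the daily-budget ratchet: compare (daily spend x 30.3) against
# each monthly threshold. Thresholds mirror the ratchet sequence described
# above; the daily figures are made up.
DAYS_PER_MONTH = 30.3

def monthly_run_rate(daily_spend: float) -> float:
    """Project a single day's spend to a monthly run rate."""
    return daily_spend * DAYS_PER_MONTH

def breached(daily_spend: float, thresholds=(6000, 4000, 3000, 2000)) -> list:
    """Return every ratchet threshold the current run rate exceeds."""
    run_rate = monthly_run_rate(daily_spend)
    return [t for t in thresholds if run_rate > t]

# $70/day projects to ~$2,121/mo: only the $2,000 alarm fires.
print(breached(70.0))  # -> [2000]
```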

AWS anomaly detection emails handle the rest. Combined with bitter experience — I’ve been burned by extended support charges more than once — they drive preemptive upgrades on RDS and EKS before AWS starts billing penalty rates.

The feedback loop: MiserBot flags a change → I open DuckDB → query the Cost Optimization Hub Parquet in S3 → investigate → act. During the January sprint, I ran the same query every morning and watched $716 in available savings drop to $490 as I worked through the recommendations. $226/mo captured in two weeks.

The savings plan that made itself unnecessary

I’d been renewing a 1-year Compute Savings Plan annually. Coverage reports showed 50% of our compute was covered in November, December, January — not good enough. I bought a 3-year plan to push coverage to ~82% and created a daily budget alarm: if coverage drops below 80%, I get an email.
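The coverage check is the same kind of arithmetic. A hedged sketch (AWS reports coverage directly in the Savings Plans console; this just mirrors the alarm logic, and the numbers are illustrative):

```python
# Sketch of the savings-plan coverage alarm: coverage is the share of
# eligible compute spend covered by the plan. Figures are illustrative.
def coverage_pct(covered_spend: float, total_spend: float) -> float:
    """Coverage as a percentage of eligible compute spend."""
    return round(100 * covered_spend / total_spend, 1)

def coverage_alarm(covered_spend: float, total_spend: float,
                   floor: float = 80.0) -> bool:
    """True when the daily coverage budget should send an email."""
    return coverage_pct(covered_spend, total_spend) < floor

print(coverage_pct(50, 100))    # -> 50.0
print(coverage_alarm(82, 100))  # -> False (above the 80% floor)
```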

The plan was to wait for the 1-year to expire, then stagger monthly purchases of 3-year Compute Savings Plans — replacing the outgoing annual capacity with cheaper three-year commitments, spreading the risk across months instead of one annual cliff.

Phase 1 of the finops spike eliminated so much obsolete compute that when the 1-year plan expired, there was no gap left to fill. I let it lapse instead of renewing. The optimization had made the discount instrument unnecessary. That’s the best possible outcome — you don’t need the coupon because you stopped buying the thing.

The archaeology

In March I went back through an older product’s infrastructure — a platform from the company’s earlier life. g2.2xlarge GPU instances from an era when that was a reasonable instance type. CloudFormation stacks for deployment configurations nobody could name. CodeDeploy applications for services that hadn’t run in years. An RDS instance that AWS kept restarting for maintenance every week because it was “stopped temporarily” — I had to race them to delete it.

I deleted 71 EBS volumes from an abandoned cluster in a single session. I inventoried 4.7 TB of RDS snapshots in dev, some dating back to 2016. I stopped five running instances that predated Kubernetes. Every one of these was an ancestor of the current infrastructure — still breathing, still billing, invisible to anyone who didn’t go looking.
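The orphan hunt boils down to filtering on volume state. A sketch over a hard-coded describe-volumes-style sample — in practice the input comes from `aws ec2 describe-volumes` or boto3, and the IDs and sizes below are made up:

```python
# Hedged sketch: find unattached ("available") EBS volumes in a
# describe-volumes-shaped list. Sample data is fabricated for illustration.
volumes = [
    {"VolumeId": "vol-0aaa", "State": "available", "Size": 500},
    {"VolumeId": "vol-0bbb", "State": "in-use",    "Size": 100},
    {"VolumeId": "vol-0ccc", "State": "available", "Size": 8},
]

# A volume in "available" state is attached to nothing and billing anyway.
orphans = [v for v in volumes if v["State"] == "available"]

print([v["VolumeId"] for v in orphans])  # -> ['vol-0aaa', 'vol-0ccc']
print(sum(v["Size"] for v in orphans), "GiB reclaimable")
```

Snapshot before deleting anything you cannot positively identify; that is what the "snapshot and delete" recommendations are for.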

The discipline

Every optimization gets a ticket. Every ticket has a dollar amount where possible. Every threshold gets an alarm. I created a private Slack channel mid-sprint as an operational diary — daily logs of what I deleted, converted, or proposed, with volume IDs, DuckDB queries, and CLI output. I wish I’d created it years earlier.

The portfolio now spans 71 tickets across 8 Jira projects, 5 years, and a spend reduction from $6,000/mo to under $2,000/mo. Most of those tickets took less than an hour. Some took five minutes. A few — the VPN replacement, the EKS cluster consolidation proposal, the savings plan strategy — took real architectural thinking.

None of them required a finops team. They required an engineer who looked at the bill every day and kept asking: does this earn its keep?

The receipts

Everything above is the narrative. Below is the evidence — the daily diary entries from the sprint and the full ticket portfolio. Scroll if you want the story. Stay if you want the proof.

The January sprint — daily Slack diary

I kept a private channel during the sprint. Here’s what a month of daily finops work looks like when one person is doing it between other responsibilities:

Date | What I did
02/10 | Deleted the AD directory ($32/mo)
02/11 | Created daily savings plan coverage budget (80% threshold). Bought 3-year savings plan. Coverage: 50% → ~82%
02/12 | Deleted mystery AWS Data Pipeline “Test” ($1/mo for years, from a canceled service). Set overall agenda: intelligent tiering, Graviton migrations, right-size everything, budget for everything
02/13 | Decommissioned two unused RDS instances
02/14 | Replaced x86 RDS instance with Graviton for 20% off
02/17 | Backed up AMI for GPU workstation, terminated stopped instance + 500GB volume ($50/mo)
02/18 | Removed EKS observability addon ($8/day anomaly alert) — will re-add scoped to specific namespaces
02/22 | Deleted 71 EBS volumes from abandoned EKS cluster in prod
02/26 | Cost Optimization Hub: $716 available savings → $490 ($226 captured). Proposed EBS lightning round: 18 volumes to gp3, 40 volumes to snapshot+delete ($138/mo). Proposed deleting unused EKS cluster ($72/mo). Proposed deleting unused AWS accounts. Killed CloudWatch logging (cost anomaly, not working). Converted 17 volumes to gp3 ($39/mo)
02/27 | Converted and deleted second tranche of volumes
02/28 | Applied $25 AWS customer council credit. Planned Textract cost optimization strategy
03/18 | Legacy platform archaeology: stopped 5 ancient running instances (g2.2xlarge, c4.xlarge, t2s). Inventoried 11 CloudFormation stacks, 20 CodeDeploy applications. Caught RDS instance before AWS restarted it for maintenance
03/22 | Inventoried 4.7 TB of RDS snapshots in dev (some from 2016)

The full ticket portfolio — 71 tickets, 8 projects, 2021–2026

Every ticket below is real. The project prefixes are sanitized, the dollar amounts are not. Read the “Done” column as a progress bar — most of the value has already been captured. The backlog items are either blocked by other teams, deliberately on hold, or waiting for the right moment.

Epics

Ticket | Summary | Status
RM-434 | FinOps Spike: Reduce Ferkakta.net AWS from $4k to <$2k | In Progress
RM-465 | FinOps Phase 2: Cloud Governance — Give Every Dollar a Job | Backlog
RM-176 | Cost reduction for infrastructure | Done

Under Epic RM-434 (28 tickets)

Ticket | Summary | Status | $ Impact
RM-435 | Switch RDS app-db from io1 to gp3 | Done | $198/mo
RM-436 | ECR cleanup: lifecycle policies + delete orphaned repos | Backlog | $200+/mo
RM-437 | Kill unused internal ALB | Done | $22/mo
RM-438 | Kill Jenkins-new instance + ALB | Done | $55/mo
RM-439 | Release 3 idle EIPs | Done | $15/mo
RM-440 | Kill unnamed t2.micro | Done | $8/mo
RM-441 | Delete orphaned CloudWatch log groups | Done | $5/mo
RM-442 | Apply S3 lifecycle policies to all buckets | Done | $30/mo
RM-443 | Replace Textract with Claude Vision for COI extraction | Backlog | $400/mo
RM-444 | Disable GuardDuty EKS Runtime Monitoring | Done | $126/mo
RM-445 | Investigate Simple AD in us-west-2 | Done | $37/mo
RM-446 | Migrate gp2 to gp3 EBS volumes | Done | $16/mo
RM-447 | Investigate Magnetic EBS volumes | Done | $28/mo
RM-448 | TGW to VPC Peering migration | Backlog | $146/mo
RM-449 | Consolidate 2 EKS clusters to 1 | Backlog | $148/mo
RM-450 | Delete 25 orphaned target groups | Done | free
RM-451 | Deploy Falco for K8s runtime security (replaces GuardDuty) | Backlog |
RM-452 | Release 3 idle EIPs in ap-south-1 | Done | $11/mo
RM-453 | Delete cert-based VPN | Done |
RM-454 | Delete TGW (never functional - 0 traffic) | Done |
RM-455 | Deploy Headscale VPN (replace dev VPNs) | Done |
RM-456 | Delete SAML VPN (after Tailscale validated) | Done |
RM-457 | Delete orphaned EBS volumes (CloudWatch verified) | Done |
RM-458 | Delete unused NAT Gateway | Done |
RM-459 | Terminate ancient stopped instances (2016) | Done |
RM-460 | Delete orphaned Simple AD | Done |
RM-461 | RDS: Migrate remaining gp2 to gp3 | Done |
RM-462 | Terminated abandoned spot instance | Done |
RM-463 | Delete 2 orphaned EBS volumes | Done |
RM-464 | RDS Storage Right-sizing | In Progress | $30/mo

VPN replacement arc (~$489/mo → $3/mo)

Ticket | Summary | Status
RM-173 | Secure VPC resources behind VPN (original setup) | Done
RM-394 | AWS VPN endpoints — disable endpoints not needed | Done
RM-453 | Delete cert-based VPN | Done
RM-455 | Deploy Headscale VPN (replace dev VPNs) | Done
RM-456 | Delete SAML VPN (after Tailscale validated) | Done

Cost Optimization Hub cluster (INF)

Ticket | Summary | Status
RM-345 | Set Up Cost Optimization Hub in AWS | Done
RM-346 | Follow Up on Cost Optimization Hub Findings | Pending Review
RM-347 | Clean Up Unused Airsim Instances | Done
RM-348 | Optimize unused PostgreSQL RDS instance | Done
RM-349 | Review and cleanup unused MySQL RDS instances | Done
RM-371 | Convert EBS Volumes to gp3 | Done
RM-372 | Snapshot and Delete Underutilized EBS Volumes | Done

Infrastructure cleanup

Ticket | Summary | Status
RM-342 | Upgrade RDS PostgreSQL 12 before end of standard support | Done
RM-343 | Upgrade RDS MySQL before deprecation | Done
RM-369 | Delete Sandbox Account in AWS | Pending Review
RM-370 | Delete unused EKS cluster | Done
RM-465 | Upgrade PostgreSQL 13→16 ($330/mo extended support) | Done

Scale-down / auto-scaling

Ticket | Summary | Status
RM-77 | Create pipeline for scaling down deployments (60min timeout) | Done
RM-78 | Create API for scale-down pipeline | Done
RM-83 | Create resource for scaling up/down APIs | Done
RM-377 | Fix scale-down-delay and scale-to-zero annotations | Done
RM-401 | Add Windows idle disconnect policy to WorkSpaces bootstrap | Done

SaaS platform cost optimization

Ticket | Summary | Status
RM-256 | Replace per-service log filters with shared router Lambda | Done
RM-261 | EKS cluster scale-to-zero for cost optimization | Done
RM-262 | Validate full cluster teardown and rebuild | Obsolete
RM-442 | Fix EKS park/unpark Terraform min_size drift | Done
RM-452 | Add pre-destroy step to delete ALB bootstrap ingress | To Do

Right-sizing

Ticket | Summary | Status
RM-14 | Optimize the AWS infrastcuture | Done
RM-37 | Install kubecost | Done
RM-42 | Create test plan for spot instances | Done
RM-43 | Downsize EKS nodegroup from 5 to 3 | Done
RM-61 | Implement EBS cleanup strategy | Done
RM-83 | Audit our AWS costs | Done
RM-201 | Reduce EBS volume cost | Done
RM-415 | Explore cost reduction options with archera platform | Backlog
RM-39 | Update EBS cleanup Jenkins jobs to delete PVCs directly | Backlog
RM-184 | Quantify the cost of our current setup | Done
RM-313 | Change default instance type to cheapest available | Done
RM-436 | Self-hosted GHA runner on EKS Graviton for ARM64 builds | To Do

Visibility & governance

Ticket | Summary | Status
RM-189 | Enable tenant cost allocation tag | Obsolete
RM-204 | Enable cost allocation tags in management account | Done
RM-269 | Add finops tags to all Terraform root modules | To Do
RM-270 | FinOps automation: expiry enforcement, cost sweeps | To Do

Spend controls (LLM)

Ticket | Summary | Status
RM-472 | Lock down expensive LLM models | Backlog
RM-473 | Per-model daily spend limits on LiteLLM proxy | Backlog

Direct cost reduction

Ticket | Summary | Status
RM-501 | Downgrade AWS support plans | To Do

#finops #aws #platformengineering #duckdb