Your employees are tenants and you should bill them like it

2026-03-16

I built a Lambda that enriches every Bedrock invocation with cost data and routes it to per-tenant CloudWatch log groups. Model ID, input tokens, output tokens, estimated cost in USD, all written to /bedrock/tenants/{tenant} so each customer’s AI spend is visible in near-real-time.

Then a developer on the team needed Bedrock access for local development, and I had a problem I hadn’t anticipated.

The invisible burn

The developer’s use case was reasonable. He was building features against the Bedrock API and needed to iterate against real models, not mocks. I created an SSO permission set with bedrock:InvokeModel and handed him the profile name.

Three days later I checked the Bedrock usage in Cost Explorer and found a number I couldn’t attribute to any tenant. It wasn’t large — maybe forty dollars — but it was unattributed. No log group, no cost enrichment, no record of which models were called or how many tokens were consumed. The developer’s calls went straight to the Bedrock API and the only evidence they happened was the line item on the monthly bill.

Forty dollars is nothing. But the pattern is everything. We’re running on AWS Activate credits, and the whole point of the enrichment pipeline is that nobody burns through shared resources without visibility. I’d built an airtight attribution system for customers and then handed an employee a hole in the floor.

The fix was smaller than the insight

The Bedrock log router Lambda already receives Bedrock's invocation logs from CloudWatch Logs via a subscription filter. Every invocation log event hits the Lambda, which parses the model ID, reads the input and output token counts, looks up the per-model pricing table, calculates estimated cost, and writes an enriched record to the appropriate tenant log group.
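As a rough sketch of the first step, this is how a Lambda can unpack a CloudWatch Logs subscription delivery. The envelope (base64 plus gzip under awslogs.data, with a logEvents list) is the standard subscription format. The function name and the assumption that each message is a single JSON invocation record are illustrative, not taken from the actual router.

```python
import base64
import gzip
import json


def decode_subscription_event(event: dict) -> list[dict]:
    """Return the invocation records carried by one subscription delivery."""
    # CloudWatch Logs delivers subscription data base64-encoded and gzip-compressed.
    compressed = base64.b64decode(event["awslogs"]["data"])
    payload = json.loads(gzip.decompress(compressed))

    # CONTROL_MESSAGE deliveries are reachability checks and carry no log data.
    if payload.get("messageType") != "DATA_MESSAGE":
        return []

    # Assumption: each log event's message is one JSON-encoded invocation record.
    return [json.loads(e["message"]) for e in payload["logEvents"]]
```

Everything after this point, cost calculation and routing alike, works on plain dictionaries and can be tested without AWS.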

The routing decision is a single function. It looks at the IAM principal ARN in the log event and extracts the tenant name. For assumed roles created by the tenant operator, the ARN contains the tenant identifier directly. The function maps that to /bedrock/tenants/{tenant} and writes the enriched log there.

Adding employee routing meant teaching that same function to recognize a different ARN pattern. When a developer authenticates through SSO, the principal ARN contains the SSO username — it’s right there in the role session name. A regex extracts it and routes to /bedrock/users/{username} instead of /bedrock/tenants/{tenant}.

Same Lambda. Same pricing table. Same token counting. Same estimated_cost_usd field in the output. Same CloudWatch Logs destination with a different prefix. The diff was about fifteen lines.

The developer became a tenant

Not metaphorically. The developer’s Bedrock usage now flows through the identical enrichment pipeline as every customer tenant. His calls get the same cost attribution, the same model-level breakdown, the same per-invocation records. The only difference is the log group prefix and the extraction regex for the routing key.

This means I can answer the same questions about a developer that I can answer about a tenant. Which models is he calling? How many tokens per invocation? What’s his daily spend curve? Is he hammering Claude Sonnet when Haiku would suffice for his test cases? All of it visible in CloudWatch, queryable, graphable, alertable.
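To make the "daily spend curve" question concrete, here is a small sketch that rolls enriched records up by day and model. It assumes each record carries an ISO 8601 timestamp and a modelId alongside estimated_cost_usd. The function name daily_spend is hypothetical, and in practice the same aggregation could run as a CloudWatch query rather than in Python.

```python
from collections import defaultdict


def daily_spend(records: list[dict]) -> dict:
    """Sum estimated cost per (day, model) from enriched invocation records."""
    totals = defaultdict(float)
    for record in records:
        # Assumes ISO 8601 timestamps such as 2026-03-16T09:30:00Z,
        # so the first ten characters are the calendar day.
        day = record["timestamp"][:10]
        totals[(day, record["modelId"])] += record["estimated_cost_usd"]
    return dict(totals)
```

Because tenant and developer records share one shape, the same function works on either log group.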

Before, the developer’s usage was a black box that resolved into a line item thirty days later. Now it’s a stream with the same fidelity as production tenant traffic.

The generalization

The tenant abstraction isn’t a billing concept. It’s a resource attribution concept. Anyone who consumes shared infrastructure resources — a customer, a developer, a CI pipeline, a load test harness — is an entity that needs cost visibility. The question isn’t “are they a paying customer?” The question is “are they consuming resources that cost money?”

When I built the enrichment pipeline, I was thinking about it as a customer-facing feature. Per-tenant cost dashboards, usage-based billing inputs, spend alerts. The customer is the entity that matters, and the pipeline exists to serve them.

That framing left a blind spot exactly where it hurts most: internal consumption. Developers experimenting with models. Integration test suites calling Bedrock in CI. Demo environments running inference for investor presentations. All of these burn real dollars against real credits, and none of them had attribution until I widened the aperture from “tenants” to “entities.”

Dogfooding falls out naturally

There’s a secondary benefit I didn’t design for but got anyway. When the company’s own developers flow through the same cost attribution pipeline as customers, the company is dogfooding its own finops infrastructure.

If the enrichment Lambda drops events, developers notice because their dashboards go dark — before a customer reports the same gap. If the pricing table drifts from AWS’s actual rates, the discrepancy shows up in internal cost tracking first. If the token counting is wrong, someone on the team sees a number that doesn’t match their expectations from a call they just made.

The developer-as-tenant pattern converts internal usage from a cost center with no telemetry into a canary for the production pipeline. Every developer call is an implicit integration test of the cost attribution system.

The routing key is the only difference

I want to be specific about what changed and what didn’t, because the smallness of the change is the point.

The Lambda’s enrichment logic — parse model ID, extract token counts from the response metadata, look up per-thousand-token pricing, multiply, write the enriched record — is identical for tenants and employees. The cost calculation doesn’t know or care who made the call. It sees a Bedrock invocation log entry and produces a cost-annotated version of it.
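A minimal sketch of that calculation, under stated assumptions: the field names modelId, input.inputTokenCount and output.outputTokenCount follow the Bedrock invocation log schema as I understand it, the model ID and prices in PRICING_PER_1K are placeholders rather than real AWS rates, and enrich is a hypothetical name.

```python
from decimal import Decimal

# Placeholder per-1,000-token prices in USD; NOT real AWS rates.
PRICING_PER_1K = {
    "example.model-small": {"input": Decimal("0.003"), "output": Decimal("0.015")},
}


def enrich(record: dict) -> dict:
    """Return a copy of the invocation record with token and cost fields added."""
    input_tokens = record.get("input", {}).get("inputTokenCount", 0)
    output_tokens = record.get("output", {}).get("outputTokenCount", 0)

    price = PRICING_PER_1K.get(record["modelId"])
    if price is None:
        # Unknown model: keep the record, but do not invent a cost for it.
        cost = None
    else:
        # Decimal avoids float rounding noise while multiplying small prices.
        total = (Decimal(input_tokens) * price["input"]
                 + Decimal(output_tokens) * price["output"]) / Decimal(1000)
        cost = float(total)

    return {
        **record,
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        "estimated_cost_usd": cost,
    }
```

The function never looks at the caller's identity, which is what lets the same code serve tenants and employees.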

The routing function examines the principal ARN. If it matches the tenant role pattern, extract the tenant name and write to /bedrock/tenants/{name}. If it matches the SSO session pattern, extract the username and write to /bedrock/users/{name}. If it matches neither, write to a dead-letter log group so nothing disappears silently.
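The three branches can be sketched as follows. The tenant role naming convention (tenant-<name>) and the dead-letter group name /bedrock/unrouted are assumptions made for illustration. The AWSReservedSSO_ prefix is how IAM Identity Center names its provisioned roles. The sanitizing step is also an addition: SSO usernames are often email addresses, and as far as I know "@" is not a legal character in a log group name.

```python
import re

# Assumption: tenant roles are named "tenant-<name>".
TENANT_ARN = re.compile(r"^arn:aws:sts::\d{12}:assumed-role/tenant-(?P<key>[^/]+)/")
# IAM Identity Center roles start with AWSReservedSSO_; the session name is the username.
SSO_ARN = re.compile(r"^arn:aws:sts::\d{12}:assumed-role/AWSReservedSSO_[^/]+/(?P<key>.+)$")
DEAD_LETTER_GROUP = "/bedrock/unrouted"  # hypothetical name


def sanitize(key: str) -> str:
    """Replace characters that are not safe inside a single log group path segment."""
    return re.sub(r"[^A-Za-z0-9_.#-]", "_", key)


def route(principal_arn: str) -> str:
    """Map a principal ARN to the log group that should receive its records."""
    match = TENANT_ARN.match(principal_arn)
    if match:
        return f"/bedrock/tenants/{sanitize(match.group('key'))}"

    match = SSO_ARN.match(principal_arn)
    if match:
        return f"/bedrock/users/{sanitize(match.group('key'))}"

    # Neither pattern matched: keep the record visible instead of dropping it.
    return DEAD_LETTER_GROUP
```

For example, a session ARN ending in AWSReservedSSO_BedrockDev_0123456789abcdef/jdoe routes to /bedrock/users/jdoe, while a plain IAM user ARN falls through to the dead-letter group.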

That’s it. The “employee as tenant” capability is a conditional branch in a routing function. Everything upstream and downstream is shared.

What this means for the next entity

The pattern extends without touching the enrichment logic. When I add a CI pipeline that calls Bedrock for automated testing, its service role ARN gets a third branch in the routing function and its logs land in /bedrock/ci/{pipeline}. When we stand up a demo environment for investor presentations, same thing: /bedrock/demos/{demo_name}.
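One way to keep those branches cheap is to hold them as data rather than as a growing if/elif chain. This is a sketch of that idea, not the router as written: the role naming conventions (tenant-, ci-, demo-), the Rule type, and the fallback group name are all hypothetical.

```python
import re
from typing import NamedTuple


class Rule(NamedTuple):
    pattern: re.Pattern  # must define a named group called "key"
    prefix: str          # log group prefix for this entity type


ROLE = r"^arn:aws:sts::\d{12}:assumed-role/"

# Ordered list: the first matching rule wins. Role name prefixes are assumptions.
RULES = [
    Rule(re.compile(ROLE + r"tenant-(?P<key>[^/]+)/"), "/bedrock/tenants"),
    Rule(re.compile(ROLE + r"AWSReservedSSO_[^/]+/(?P<key>[^/]+)$"), "/bedrock/users"),
    Rule(re.compile(ROLE + r"ci-(?P<key>[^/]+)/"), "/bedrock/ci"),
    Rule(re.compile(ROLE + r"demo-(?P<key>[^/]+)/"), "/bedrock/demos"),
]


def log_group_for(principal_arn: str, rules=RULES, fallback="/bedrock/unrouted") -> str:
    """Return the log group for the first rule whose pattern matches the ARN."""
    for rule in rules:
        match = rule.pattern.match(principal_arn)
        if match:
            return f"{rule.prefix}/{match.group('key')}"
    return fallback
```

With this shape, adding an entity type is one more line in RULES and nothing else changes.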

Each new entity type is a new regex match and a new log group prefix. The enrichment pipeline, the pricing table, the cost calculation, and the log structure all stay the same.

I didn’t design it this way on purpose. I designed it for tenants and then discovered that the abstraction was already general enough. The insight came from a forty-dollar line item I couldn’t explain. The fix took fifteen minutes. The lesson will outlast the code.

Cost attribution isn’t a billing feature. It’s how you make resource consumption visible, and visibility doesn’t care whether the consumer is a customer, an employee, or a robot.

#finops #multi-tenant #bedrock #aws