IAM eventual consistency is 4 seconds — if your policy still doesn’t work, you have a bug
I changed an IAM inline policy on a role — added an sts:AssumeRole statement so a pod could assume a cross-account SES role. Ran terraform apply. Checked the policy with get-role-policy. The old policy came back. No new statement.
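The change-and-verify loop looked roughly like this (role and policy names here are illustrative, not the real ones; policy.json is the updated document with the sts:AssumeRole statement):

```shell
# Write the updated inline policy onto the role...
aws iam put-role-policy \
  --role-name myapp-apiserver-tenant-foo \
  --policy-name cross-account-ses \
  --policy-document file://policy.json

# ...then read it back to confirm the new statement is there.
aws iam get-role-policy \
  --role-name myapp-apiserver-tenant-foo \
  --policy-name cross-account-ses
```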
I said “propagation delay” and moved on to other work.
Twenty minutes later I checked again. Same old policy. That’s not propagation.
What eventual consistency actually means
AWS IAM uses a distributed computing model. Changes to policies, roles, and credentials take time to replicate across endpoints. AWS documents this explicitly and recommends not including IAM changes in critical code paths.
The control plane is immediately consistent — API calls reflect changes the moment they return. Authorization enforcement is eventually consistent. The window is approximately 4 seconds.
OFFENSAI published research in 2025 measuring this across us-east-1 and eu-central-1. Deleted credentials remained valid for ~4 seconds. Detached policies continued to authorize for ~4 seconds. AWS acknowledged the findings and patched one specific scenario (deleted access keys creating new access keys). The remaining behaviors — policy attachment, role assumption, group membership, permission boundaries — all propagate in the same ~4 second window.
Four seconds. Not four minutes. Not twenty.
My actual bug
terraform apply had run against empty state. The state file for this tenant was missing from S3, so Terraform saw zero existing resources and created everything fresh — 54 new resources including a complete parallel set of IAM roles alongside the existing ones.
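This failure mode is cheap to catch before applying. A sketch, using the standard Terraform CLI:

```shell
# An empty or missing state file makes Terraform plan to create everything.
terraform state list   # prints nothing if the state is empty
terraform plan         # scan the output for unexpected "will be created" lines
```

A plan that wants to create resources you know already exist in the account is a state problem, not a propagation problem.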
The old role: myapp-apiserver-tenant-foo-42b1fbfc0a. Created 15 hours earlier. Bound to the running pod.
The new role: myapp-apiserver-tenant-foo. Created by my apply. Had the correct policy. Bound to nothing.
I was querying the old role and reading the old policy back to myself. The new policy was exactly where Terraform put it — on a role that no pod was using.
aws iam list-roles filtered by name showed two CreateDate values 15 hours apart. The problem was obvious the moment I looked at the right data.
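The query that surfaced the duplicates looked something like this (the name prefix is from this incident; the exact JMESPath filter is a sketch):

```shell
# List every role sharing the tenant's name prefix, with creation timestamps.
# Two rows 15 hours apart means duplicate roles, not a slow policy.
aws iam list-roles \
  --query "Roles[?starts_with(RoleName, 'myapp-apiserver-tenant-foo')].[RoleName, CreateDate]" \
  --output table
```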
Why “propagation delay” is comfortable
It requires no investigation. The system is slow, not wrong. Wait and it fixes itself. It’s the distributed systems equivalent of “it works on my machine” — a hypothesis that conveniently absolves you of looking harder.
Every time I’ve blamed IAM propagation delay for something broken longer than 10 seconds, the real cause was one of these:
- Wrong role. Duplicate resources, stale state, different naming convention.
- Wrong policy. Syntactically valid, logically wrong — mismatched ARN, missing condition key, action/resource mismatch.
- Wrong side of the assumption. sts:AssumeRole needs to be permitted in the caller's IAM policy AND by the target role's trust policy. I had the trust policy right and the caller's policy missing.
- Wrong credentials. The pod was using cached STS tokens from a previous role binding.
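Every item on that list can be checked directly instead of waited out. A sketch of the commands (account IDs and the ses-sender role name are placeholders; the get-role call assumes credentials in the target account):

```shell
# Who is actually calling? Cached pod credentials show up here.
aws sts get-caller-identity

# Caller side: does my identity's policy allow sts:AssumeRole on the target?
aws iam simulate-principal-policy \
  --policy-source-arn arn:aws:iam::111111111111:role/myapp-apiserver-tenant-foo \
  --action-names sts:AssumeRole \
  --resource-arns arn:aws:iam::222222222222:role/ses-sender

# Target side: does the trust policy allow my principal to assume it?
aws iam get-role --role-name ses-sender \
  --query Role.AssumeRolePolicyDocument
```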
Every one of these looks like “the policy isn’t working yet.” None of them are propagation.
The 10-second rule
IAM eventual consistency is real, documented, and measurable at ~4 seconds. If an IAM change doesn’t take effect within 10 seconds, the change didn’t land where you think it did. Stop waiting. Start querying. List the roles, read the policy, check the caller identity, verify the trust policy. The system is behaving correctly — the question is whether I am.
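The rule fits in a small bash helper: retry a check once per second for at most 10 seconds, then stop waiting and fall through to diagnostics. The check command is whatever verifies your change, such as a get-role-policy call piped to grep; the function name is my own.

```shell
# Retry "$@" once per second for up to 10 seconds. IAM's ~4 s window fits
# comfortably inside that; anything still failing afterwards is a real bug.
wait_then_diagnose() {
  local deadline=$(( SECONDS + 10 ))
  until "$@"; do
    if (( SECONDS >= deadline )); then
      echo "still failing after 10s: stop waiting, start querying" >&2
      return 1
    fi
    sleep 1
  done
}
```

A zero exit within the window means the change landed; a nonzero exit means it is time to list the roles and read the policy, not to keep sleeping.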