ferkakta.dev

Your ACM certificate request is a beacon — scanners are watching Certificate Transparency logs

I accidentally exposed production secrets on a public endpoint. Here’s what happened and what I learned about Certificate Transparency.

The setup

We’re building a multi-tenant SaaS platform on EKS. During development, our Terraform module defaulted to ealen/echo-server for three microservices — a lightweight HTTP server that echoes back request info. Seemed harmless.

What I missed: echo-server echoes EVERYTHING. Every environment variable in the container, including ones injected from AWS SSM via External Secrets Operator. Database connection strings. Redis auth tokens. OAuth client secrets. Signing keys. A single unauthenticated GET / returns it all as JSON.
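To make the failure mode concrete, here's a sketch of what an env-echoing handler effectively does. This is not echo-server's actual source, and the variable names are made up — but any server that serializes its environment behaves the same way:

```python
import json

def echo_payload(environ: dict) -> str:
    """Build the JSON body an env-echoing server returns: every
    environment variable in the container, verbatim, to any caller."""
    return json.dumps({"environment": dict(environ)}, indent=2)

# Hypothetical variables, injected the way External Secrets Operator
# injects them: as plain env vars inside the container.
fake_env = {
    "HOSTNAME": "echo-server-7d4f9",
    "DATABASE_URL": "postgres://app:s3cr3t@db.internal:5432/prod",
    "REDIS_AUTH_TOKEN": "redis-token-value",
}

body = echo_payload(fake_env)
print("s3cr3t" in body)  # True: the DB password is in the response
```

There's no bug to exploit here — echoing the environment is the advertised feature. The vulnerability is pointing that feature at a container full of real secrets.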

Over a month, we spun up and tore down test tenants to validate provisioning automation. At least six ran echo-server with real secrets on internet-facing endpoints. Exposure windows ranged from 6 minutes to 3 hours per tenant.

Why exposure time doesn’t matter

When you request an ACM certificate, it lands in public Certificate Transparency logs at issuance — within seconds, and ACM logs to CT by default. CT is a public, append-only ledger of every TLS certificate issued. It exists so domain owners can detect unauthorized certs — but it also means scanners like Censys and Shodan can monitor every new hostname in real time.

A new *.something.yourdomain.com cert is a neon sign saying “new infrastructure here.”
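You can watch the same feed the scanners watch. A minimal sketch against crt.sh's public JSON endpoint — the query-URL format is real, but the response here is a canned sample (real responses carry more fields), so this runs without network access:

```python
import json
from urllib.parse import quote

def crtsh_query_url(domain: str) -> str:
    """URL for crt.sh's JSON endpoint, matching any cert issued under
    the domain. This is the same public data scanners ingest."""
    return f"https://crt.sh/?q={quote('%.' + domain)}&output=json"

def new_hostnames(crtsh_json: str) -> set[str]:
    """Pull hostnames out of a crt.sh JSON response. Each entry's
    name_value can hold several newline-separated names."""
    names = set()
    for entry in json.loads(crtsh_json):
        names.update(entry["name_value"].split("\n"))
    return names

# Canned sample shaped like a crt.sh response entry.
sample = ('[{"name_value": '
          '"tenant-42.app.example.com\\n*.tenant-42.app.example.com"}]')
print(new_hostnames(sample))
```

Polling this for your own domains is a cheap way to learn what attackers learn about you, at the same moment they learn it.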

On the one tenant with WAF logging enabled, we saw 39 unique IPs probing within 3 hours. They hit:

/.env
/.git/config
/.vscode/sftp.json
/wp-admin/
/actuator/env

These aren’t humans. They’re automated pipelines that harvest CT logs and scan every new hostname immediately. Your exposure window starts the moment the cert is issued, not when you think you’re “ready.”
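Probe paths like the ones above are also a cheap detection signal. A sketch of a log-line classifier — the path list is taken from what hit our WAF; treat it as a starting point, not a complete signature set:

```python
# Paths from the probes we observed; real scanners try many more.
SCANNER_PROBES = {
    "/.env", "/.git/config", "/.vscode/sftp.json",
    "/wp-admin/", "/actuator/env",
}

def is_scanner_probe(path: str) -> bool:
    """Flag request paths that only automated secret-hunting scanners
    ask for. Normal clients never request these on a fresh deployment."""
    return any(path == p or path.startswith(p.rstrip("/") + "/")
               for p in SCANNER_PROBES)

print(is_scanner_probe("/.env"))     # True
print(is_scanner_probe("/healthz"))  # False
```

Even a crude filter like this, wired to an alert, would have told us about the probing while it was happening rather than afterward.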

What saved us (accidentally)

Most test tenants were missing SSM parameters that External Secrets needed. Pods failed with CreateContainerConfigError before echo-server could start. Broken provisioning was accidentally a kill switch.
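The accidental kill switch is easy to check for deliberately. A sketch over the JSON shape `kubectl get pod -o json` returns — the pod data here is canned, so no cluster is needed to run it:

```python
def blocked_on_config(pod: dict) -> bool:
    """True if any container is stuck in CreateContainerConfigError,
    e.g. because a secret or SSM-backed parameter it needs is missing."""
    for cs in pod.get("status", {}).get("containerStatuses", []):
        waiting = cs.get("state", {}).get("waiting") or {}
        if waiting.get("reason") == "CreateContainerConfigError":
            return True
    return False

# Canned pod status in the shape kubectl get pod -o json returns.
pod = {"status": {"containerStatuses": [
    {"state": {"waiting": {"reason": "CreateContainerConfigError",
                           "message": 'secret "db-creds" not found'}}},
]}}
print(blocked_on_config(pod))  # True
```

A provisioning pipeline that treats this state as a hard failure — instead of shrugging and moving on — turns the accidental safety we got into an intentional control.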

But we can’t rely on accidental safety. The tenants where provisioning worked had everything exposed.

The reconstruction

We had no centralized logging for most of the exposure period. WAF logging was only on one tenant. We had to piece the timeline together from whatever fragments survived.

This is why observability on day one matters. Not day two. Day one.

Takeaways

  1. Never use echo-server as a default in deployment templates. If provisioning fails, it should fail loudly — not deploy something that silently leaks your environment.

  2. Certificate Transparency means your infrastructure is public the moment you request a cert. If you’re not ready for traffic, don’t create the DNS record.

  3. WAF logging isn’t optional. Enable it on day one, even in dev. You can’t investigate an incident if you have no logs from the exposure window.

  4. Audit your container images like you audit your dependencies. A “harmless testing tool” with access to your secret store is a data exfiltration vector.

  5. Defense in depth works even when individual layers fail. Short-lived tenants, missing parameters, and automated teardown limited exposure. No single control was sufficient, but together they contained the damage.
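Takeaways 1 and 4 can be enforced mechanically. A sketch of a CI-time image audit — `ealen/echo-server` is the real offender from this incident; the other image name is made up, and a real check would walk your rendered manifests rather than a hardcoded list:

```python
# Images that must never ship in a rendered manifest, even in dev.
BANNED_IMAGES = ("ealen/echo-server",)

def audit_images(images: list[str]) -> list[str]:
    """Return the images that match the denylist (tag ignored),
    so a CI step can fail loudly instead of deploying a leaky default."""
    offenders = []
    for image in images:
        repo = image.split(":", 1)[0]  # strip the tag, keep the repository
        if repo in BANNED_IMAGES:
            offenders.append(image)
    return offenders

print(audit_images(["ealen/echo-server:0.9.2", "myorg/api:1.4.0"]))
# ['ealen/echo-server:0.9.2']
```

The same check fits naturally in an admission controller, so the ban holds even when someone deploys outside CI.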

The scanners are fast. Your observability needs to be faster.

#aws #security #acm