ferkakta.dev

from feature_flags import *

A colleague needed a feature flag enabled on one tenant. FEATURE_FLAG_ENABLE_AGENTS=True — one environment variable, one pod. I added it to the K8s secret manually, restarted the pod, and he was unblocked in two minutes.

Then I realized: the next terraform apply would overwrite that secret without the flag. The ExternalSecret rebuilds the K8s Secret from SSM, and the flag didn’t exist in SSM under any path terraform knew about. My manual fix had a shelf life of one deploy.

The obvious solution — add the flag to the terraform module’s ssm_secrets map — means a PR, a plan, a review, an apply, and a deploy cycle. For one boolean. And the next flag would need the same cycle. And the one after that.

I wanted from feature_flags import *.

The explicit import problem

The tenant deployment already had 30+ environment variables, each individually mapped from SSM Parameter Store through External Secrets Operator. Every variable declared in terraform, every SSM path explicit:

ssm_secrets = {
  DATABASE_URL         = "/ramparts/dev/tenants/momcorp/apiserver/db_connection_string"
  REDIS_CACHE_HOST     = "/ramparts/dev/tenants/momcorp/apiserver/REDIS_CACHE_HOST"
  SECRET_KEY           = "/ramparts/dev/tenants/momcorp/apiserver/SECRET_KEY"
  # ... 27 more
}

Each one becomes an env entry on the deployment with a secretKeyRef pointing to the ESO-managed K8s Secret. Explicit, auditable, and completely static. Adding a variable means editing terraform. That’s the right pattern for database URLs and API keys — you want to see every secret in the code.

Feature flags are different. They’re intentionally dynamic. The whole point is that someone can flip a flag without a code change. Requiring a terraform PR for each flag defeats the purpose.

What ESO can do that I didn’t know about

External Secrets Operator has a dataFrom.find field that discovers parameters by path prefix and name pattern. Instead of listing each parameter individually, you tell ESO “find everything under this path that matches this regex”:

dataFrom:
  - find:
      path: /ramparts/dev/tenants/momcorp/apiserver
      name:
        regexp: "FEATURE_FLAG_"
      conversionStrategy: Default
      decodingStrategy: None
    rewrite:
      - regexp:
          source: ".*/(FEATURE_FLAG_.*)"
          target: "$1"

ESO walks the SSM path, finds every parameter whose name contains FEATURE_FLAG_, and syncs them into a K8s Secret. The rewrite strips the SSM path prefix so /ramparts/dev/tenants/momcorp/apiserver/FEATURE_FLAG_ENABLE_AGENTS becomes just FEATURE_FLAG_ENABLE_AGENTS in the secret.
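To see what the name filter and the rewrite do to a parameter name, here is a minimal Python sketch that mimics the two regex steps. It is only a simulation, not ESO code: the parameter list is made up for illustration, ESO uses Go's regexp engine (where the capture group is written `$1` rather than `\1`), and the path filtering that ESO does before matching is assumed to have already happened.

```python
import re

# Hypothetical parameter names already found under one tenant's SSM path.
PARAMETER_NAMES = [
    "/ramparts/dev/tenants/momcorp/apiserver/SECRET_KEY",
    "/ramparts/dev/tenants/momcorp/apiserver/FEATURE_FLAG_ENABLE_AGENTS",
    "/ramparts/dev/tenants/momcorp/apiserver/FEATURE_FLAG_ENABLE_ADMIN",
]

NAME_PATTERN = re.compile(r"FEATURE_FLAG_")           # find.name.regexp
REWRITE_SOURCE = re.compile(r".*/(FEATURE_FLAG_.*)")  # rewrite.regexp.source
REWRITE_TARGET = r"\1"                                # "$1" in the YAML


def discover_secret_keys(names):
    """Return the secret keys that the find + rewrite steps would produce."""
    # Step 1: keep only names that contain the pattern (unanchored search).
    matched = [name for name in names if NAME_PATTERN.search(name)]
    # Step 2: strip the path prefix, leaving the bare variable name.
    return sorted(REWRITE_SOURCE.sub(REWRITE_TARGET, name) for name in matched)


print(discover_secret_keys(PARAMETER_NAMES))
# ['FEATURE_FLAG_ENABLE_ADMIN', 'FEATURE_FLAG_ENABLE_AGENTS']
```

SECRET_KEY never reaches the rewrite step because the name filter drops it first, which is why the explicit secrets and the discovered flags stay separate.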

I didn’t know this worked on the SSM provider. The first attempt failed with “unexpected find operator” — which turned out to be a YAML serialization bug in my terraform code, not a missing feature. ESO has supported this since v0.6.0.

Two secrets, two import strategies

The discovered flags can’t go in the same secret as the explicit variables. The deployment reads explicit variables via individual env entries with secretKeyRef — it only sees keys it was told about. A flag that lands in the secret via dataFrom would be invisible to the pod unless there’s an envFrom block loading the entire secret.

But loading the entire secret via envFrom would duplicate every explicit variable: DATABASE_URL set once by env and again by envFrom. Functional, because Kubernetes lets an explicit env entry win over envFrom when both define the same key, but messy.

The clean solution: a separate secret for feature flags. The explicit config stays in its own ExternalSecret with individual data entries. The feature flags get their own ExternalSecret with dataFrom.find. The deployment loads the flag secret via envFrom:

# from feature_flags import *
env_from {
  secret_ref {
    name = "apiserver-feature-flags"
  }
}

Explicit imports for known config. Wildcard imports for intentionally-dynamic flags. Two strategies, cleanly separated.
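On the application side, the wildcard import could look something like the sketch below. The post doesn't show how the apiserver reads its flags, so everything here is an assumption: the `load_feature_flags` helper, the lowercase key names, and the set of strings treated as true are all hypothetical.

```python
import os

FLAG_PREFIX = "FEATURE_FLAG_"
# Assumed convention for "on" values; not taken from the original setup.
TRUTHY = {"1", "true", "yes", "on"}


def load_feature_flags(environ=None):
    """Collect every FEATURE_FLAG_* variable into a dict of booleans."""
    env = os.environ if environ is None else environ
    return {
        # FEATURE_FLAG_ENABLE_AGENTS -> "enable_agents"
        name[len(FLAG_PREFIX):].lower(): value.strip().lower() in TRUTHY
        for name, value in env.items()
        if name.startswith(FLAG_PREFIX)
    }


flags = load_feature_flags({"FEATURE_FLAG_ENABLE_AGENTS": "True", "PORT": "8080"})
print(flags)  # {'enable_agents': True}
```

Parsing the string explicitly matters because environment values are always strings, and `bool("False")` is `True` in Python. A flag that is absent from the environment simply doesn't appear in the dict, so callers can use `flags.get("enable_agents", False)` to default to off.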

Per-tenant feature rollout for free

In a multi-tenant system, each tenant has its own SSM path prefix. That prefix is a feature flag namespace. Enabling a feature for one tenant without touching the others is just creating a parameter under that tenant’s path:

aws ssm put-parameter \
  --name "/ramparts/dev/tenants/momcorp/apiserver/FEATURE_FLAG_ENABLE_AGENTS" \
  --value "True" --type String

The staging tenant doesn’t have this parameter, so agents stay off there. No feature flag service. No LaunchDarkly. SSM is the feature flag backend, the path prefix is the targeting mechanism, and you already had both.

Roll out a feature to the debug tenant, watch it, then add the same parameter to the next tenant when you’re ready.
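The path-prefix-as-namespace idea can be sketched as a small grouping function: given parameter names from several tenants, it reports which flags each tenant has. This assumes the `/ramparts/dev/tenants/<tenant>/apiserver/<NAME>` layout seen in the examples above; the `staging` tenant name and the `flags_by_tenant` helper are hypothetical.

```python
TENANT_ROOT = "/ramparts/dev/tenants/"


def flags_by_tenant(parameter_names, root=TENANT_ROOT):
    """Group FEATURE_FLAG_* parameter names by the tenant in their path."""
    rollout = {}
    for name in parameter_names:
        if not name.startswith(root):
            continue
        # "momcorp/apiserver/FEATURE_FLAG_X" -> tenant "momcorp"
        tenant, _, rest = name[len(root):].partition("/")
        flags = rollout.setdefault(tenant, set())  # record tenants with no flags too
        key = rest.rsplit("/", 1)[-1]
        if key.startswith("FEATURE_FLAG_"):
            flags.add(key)
    return {tenant: sorted(flags) for tenant, flags in rollout.items()}


names = [
    "/ramparts/dev/tenants/momcorp/apiserver/FEATURE_FLAG_ENABLE_AGENTS",
    "/ramparts/dev/tenants/momcorp/apiserver/SECRET_KEY",
    "/ramparts/dev/tenants/staging/apiserver/SECRET_KEY",  # hypothetical tenant
]
print(flags_by_tenant(names))
# {'momcorp': ['FEATURE_FLAG_ENABLE_AGENTS'], 'staging': []}
```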

The terraform blind spot

There’s a catch. Terraform’s plan output for an ESO ExternalSecret is opaque: you see the YAML structure changing but not which parameters the wildcard will discover at runtime. A PR reviewer can’t tell from the plan what flags exist for each tenant.

I added an external data source that queries SSM at plan time and assembles an env manifest output:

Changes to Outputs:
  + apiserver_env_manifest = {
      + feature_flags = [
          + "FEATURE_FLAG_ENABLE_ADMIN",
          + "FEATURE_FLAG_ENABLE_AGENTS",
          + "FEATURE_FLAG_ENABLE_AUTH",
        ]
      + plain_env     = [
          + "ENVIRONMENT",
          + "PORT",
          + "TENANT_NAME",
        ]
      + ssm_secrets   = [
          + "DATABASE_URL",
          + "REDIS_CACHE_HOST",
          ...26 keys
        ]
    }

The plan shows everything the pod will see — explicit env vars, SSM-backed secrets, and discovered feature flags. When someone adds a flag via SSM, the next plan shows it in the manifest even though no terraform code changed.
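The program behind such an external data source could be as small as the Python sketch below. This is not the script from the post, just one way to satisfy the external data source protocol: terraform passes the query as a JSON object on stdin and expects a JSON object on stdout whose values are all strings, so the list of flags is JSON-encoded into a single string. The `path` query key, the `feature_flags` result key, and the function names are assumptions, and boto3 must be installed for `main` to work.

```python
import json
import sys


def flag_names(pages, prefix="FEATURE_FLAG_"):
    """Extract sorted flag keys from GetParametersByPath result pages."""
    names = set()
    for page in pages:
        for param in page.get("Parameters", []):
            key = param["Name"].rsplit("/", 1)[-1]  # drop the SSM path prefix
            if key.startswith(prefix):
                names.add(key)
    return sorted(names)


def build_result(pages):
    """Shape the result for terraform: a flat JSON object of string values."""
    return {"feature_flags": json.dumps(flag_names(pages))}


def main():
    import boto3  # imported lazily so the helpers above work without AWS access

    query = json.load(sys.stdin)  # assumed shape: {"path": "<tenant SSM path>"}
    paginator = boto3.client("ssm").get_paginator("get_parameters_by_path")
    pages = paginator.paginate(Path=query["path"], Recursive=False)
    json.dump(build_result(pages), sys.stdout)
```

A real script would end by calling `main()`. On the terraform side, the encoded list could then be recovered with `jsondecode` on the data source's `result.feature_flags` attribute and merged into the manifest output.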

The module

I extracted the pattern into a reusable terraform module: terraform-eso-feature-flags. It creates the ESO ExternalSecret with dataFrom.find, rewrite, and a dedicated secret. You wire the output into your deployment’s envFrom block.

Adding a flag is three commands:

# Create the parameter
aws ssm put-parameter \
  --name "/ramparts/dev/tenants/momcorp/apiserver/FEATURE_FLAG_ENABLE_DARK_MODE" \
  --value "True" --type String

# Tell ESO to re-sync now instead of waiting for its refresh interval
kubectl annotate externalsecret apiserver-feature-flags \
  -n tenant-momcorp \
  force-sync=$(date +%s) --overwrite

# Restart the deployment
kubectl rollout restart deployment/apiserver -n tenant-momcorp

No PR. No plan. No apply. The flag is in the pod within 30 seconds.

#kubernetes #terraform #eso #platformengineering #aws