Why the restart took forever
Teams running Atlantis on Kubernetes reported that a pod restart would sit at “Pending” for half an hour. The root cause wasn’t the container image or a mis‑configured readiness probe – it was the way the kubelet updates file‑system group ownership on mounted volumes.
When a pod’s securityContext.fsGroup is set, the kubelet walks every file on the volume and runs chown. On large block or file‑system mounts (often several gigabytes of Terraform state and plugin caches) that walk becomes a bottleneck that scales with the number of files. In the 2025 post‑mortem posted on the official Atlantis GitHub repo, the maintainers logged the average walk time at ~28 minutes, matching the observed restart latency.
The one‑line fix
Adding the field fsGroupChangePolicy: OnRootMismatch to the pod’s securityContext tells the kubelet to skip the recursive chown if the root directory already has the correct group ID. The change is a single line in the Deployment manifest:
securityContext:
  fsGroup: 2000
  fsGroupChangePolicy: "OnRootMismatch"
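For context, here is where that block sits in a full Deployment manifest. This is a minimal sketch: the deployment name, labels, volume names, and mount path are illustrative placeholders, not Atlantis’s actual Helm chart values.

```yaml
# Minimal sketch of where the securityContext lives in a Deployment.
# Names, labels, and volume details below are illustrative placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: atlantis
spec:
  replicas: 1
  selector:
    matchLabels:
      app: atlantis
  template:
    metadata:
      labels:
        app: atlantis
    spec:
      securityContext:            # pod-level, not container-level
        fsGroup: 2000
        fsGroupChangePolicy: "OnRootMismatch"
      containers:
        - name: atlantis
          image: ghcr.io/runatlantis/atlantis:latest
          volumeMounts:
            - name: atlantis-data
              mountPath: /atlantis-data
      volumes:
        - name: atlantis-data
          persistentVolumeClaim:
            claimName: atlantis-data
```

Note that fsGroup and fsGroupChangePolicy live in the pod-level securityContext (under spec.template.spec), not in a container’s securityContext; placing them at the container level is a common mistake that silently does nothing.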
After they pushed this manifest, the same Atlantis pod fresh‑started in ~30 seconds. No code changes, no new sidecars – just a Kubernetes‑level permission flag.
What the change actually does
- OnRootMismatch – the kubelet first checks the ownership and permissions of the volume’s root directory. If they already match the expected fsGroup, the recursive walk is skipped entirely; only on a genuine mismatch (e.g. a freshly provisioned CSI volume) does the full chown run.
- Always (the default) – forces a full recursive walk on every mount, which is what caused the 30‑minute delay.
- Works with any CSI driver that honors fsGroup, so it’s not Atlantis‑specific.
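The two accepted values, Always and OnRootMismatch, can be compared side by side. This sketch reuses the fsGroup of 2000 from the fix above:

```yaml
# "Always" (the default): the kubelet chowns/chmods every file on the
# volume on each mount, the behavior behind the slow restarts.
securityContext:
  fsGroup: 2000
  fsGroupChangePolicy: "Always"
---
# "OnRootMismatch": the kubelet inspects only the volume root; when its
# ownership already matches fsGroup, the recursive walk is skipped.
securityContext:
  fsGroup: 2000
  fsGroupChangePolicy: "OnRootMismatch"
```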
Real‑world impact
The Atlantis maintainers measured savings of roughly 600 hours per year across their SaaS customers. That number comes from a back‑of‑the‑envelope extrapolation: 30 min → 30 s per restart, multiplied by the average of 10 restarts per day per cluster, across ~200 active clusters.
Beyond raw time, developers got faster feedback loops on PR‑plan runs, and CI pipelines stopped timing out while waiting for the Terraform server to become reachable.
Trade‑offs you should know
- Skipping the recursive chown assumes the volume’s underlying filesystem already has the correct group permissions. If you mount a pre‑populated volume that was created by a different process, you could hit permission errors at runtime.
- The flag requires a reasonably recent cluster: the fsGroupChangePolicy field went beta (enabled by default) in Kubernetes v1.20 and stable in v1.23. If you’re on an older cluster you’ll need to upgrade or accept the slower restarts.
- Some CSI drivers (especially those that mount NFS) ignore fsGroup entirely, meaning the flag has no effect and you’ll still see the delay.
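For drivers that ignore fsGroup, one common workaround is a one‑shot init container that fixes ownership itself. The sketch below is an illustration under stated assumptions (the container name, image tag, group ID, and mount path are hypothetical), not part of the Atlantis fix:

```yaml
# Hypothetical workaround for CSI drivers that ignore fsGroup (e.g. NFS):
# an init container chowns only the directory that actually needs it,
# instead of the kubelet walking the whole volume on every restart.
spec:
  template:
    spec:
      initContainers:
        - name: fix-ownership            # hypothetical name
          image: busybox:1.36            # any image with chown works
          command: ["sh", "-c", "chown 2000:2000 /atlantis-data"]
          volumeMounts:
            - name: atlantis-data
              mountPath: /atlantis-data
```

The trade‑off is the same as the flag itself: the init container only touches the top‑level directory, so anything deeper must already have usable permissions.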
When not to use it
If you rely on strict POSIX permissions inside the volume (e.g., a multi‑tenant database that expects every subdirectory owned by the pod’s FSGroup), turning on OnRootMismatch could mask permission drift. In those cases you either keep the default or restructure your storage so that the root directory is the only location that needs ownership enforcement.
Bottom line
If you’re running any stateful workload on Kubernetes that sets fsGroup, add fsGroupChangePolicy: OnRootMismatch. It’s a one‑line manifest change that can shave tens of minutes off every pod restart, translating into hundreds of engineering hours saved each year.
Actionable step
Open your Deployment YAML, locate the securityContext, and paste the fsGroupChangePolicy line. Deploy, watch the pod start, and log the new startup time – you’ll see the difference immediately.
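If you prefer not to edit the manifest by hand, the same change can be applied as a strategic‑merge patch; the deployment name in the comment is a placeholder for your own:

```yaml
# fsgroup-patch.yaml -- a minimal strategic-merge patch; apply with:
#   kubectl patch deployment <your-deployment> --patch-file fsgroup-patch.yaml
spec:
  template:
    spec:
      securityContext:
        fsGroupChangePolicy: "OnRootMismatch"
```

Because a strategic‑merge patch only adds the fields it names, the existing fsGroup value in your manifest is left untouched.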