In-place pod restarts: Boosting efficiency and workload reliability in Kubernetes v1.35

by Duncan Campbell & Giuseppe Tinti Tomio, Kubernetes

Operational efficiency and system resilience are critical when running scaled platforms. Yet, in Kubernetes, recovering from software crashes remains a headache because you couldn't trigger a clean restart of a Pod's containers without recreating the entire Pod object, leading to some amount of resource waste.
To address this, Restart All Containers on Container Exits graduated to beta and is enabled by default in Kubernetes v1.36. Developed in close collaboration with the CNCF community, this capability represents Google's commitment to investing in the success of foundation-led open source projects. By sharing best practices from running large distributed systems internally, we are helping build a more resilient and efficient ecosystem. Letting containers restart while keeping the Pod's runtime identity provides a built-in way to perform in-place Pod recovery, boosting application reliability and saving resource costs.

The Problem: The High Cost of Pod Re-creation

Historically, Kubernetes managed failures using pod level restart policies. While sufficient for simple services, modern multi-container Pods often have complex dependencies. When a failure requires a full environment reset, your only option was deleting and recreating the entire Pod.
This introduces massive control plane churn, causing latency and pressure on the etcd backend during large failures:

Initialization Dependencies: If a main container corrupts a local environment, for example, single-use secrets that must be re-requested, restarting just that container is insufficient; the setup must run again.
Watcher Interoperability: If a watcher sidecar detects a fatal error, it must trigger a full recreate of the entire pod and its infrastructure, including the sandbox.
Stale States: If a database sidecar proxy restarts, the main application can get stuck attempting to use stale, broken connections.
Resource Race Conditions: When a large job finds a proper set of nodes, recreating Pods can lead to other pending Pods taking over those resources. In-place restarts eliminate this race condition risk.

Previously, resolving these failures required destroying the entire Pod. For large batch or AI/ML workloads, where thousands of Pods might fail simultaneously, this can lead to "Thundering Herd" scheduling requests, delaying recovery and wasting expensive GPU/TPU compute time.

Introducing In-Place Restarts: The RestartAllContainers Action

Kubernetes v1.35 introduces the RestartAllContainers action, enabled by the RestartAllContainersOnContainerExits feature gate, which graduated to beta in 1.36 alongside its dependencies ContainerRestartRules and NodeDeclaredFeatures. This lets a container's exit behavior trigger a fast, in-place restart of the entire Pod on its existing node.
The Kubelet halts all containers while keeping the Pod sandbox intact, preserving critical infrastructure:

Network Identity: Keeps the same IP, network namespace, and UID, completely bypassing IP reassignment.
Hardware and Devices: Keeps GPUs/TPUs bound, eliminating scheduling and re-allocation delays.
Storage Mounts: Volumes, including emptyDir and PVCs, remain fully mounted; their content is not cleared during restarts.

Once terminated, the Kubelet re-runs init containers (including sidecars, which are part of the init sequence) in order, guaranteeing a clean setup in a known-good environment.

A Native Pod Specification Example

You can implement this under the container's restartPolicyRules field. Here is a quick example of how a watcher sidecar can trigger an in-place restart of the entire Pod by exiting with code 88:
YAML
Note: Image names and paths in the YAML below are for illustrative purposes.

apiVersion: v1
kind: Pod
metadata:
  name: ml-worker-pod
spec:
  restartPolicy: Never
  initContainers:
    - name: setup-environment
      image: registry.k8s.io/ml-tools/setup-worker:v1.0
    - name: watcher-sidecar
      image: registry.k8s.io/ml-tools/watcher:v1.0
      restartPolicy: Always
      restartPolicyRules:
        - action: RestartAllContainers
          exitCodes:
            operator: In
            values: [88]
  containers:
    - name: main-application
      image: registry.k8s.io/ml-tools/training-app:v1.0

The Operational Impact of In-Place Restarts

For organizations running distributed workloads, RestartAllContainers provides serious operational advantages:

No Control Plane Overhead: By preserving identity, clusters avoid scheduling latency and DNS propagation. This was a key factor for JobSet using this feature to reduce recovery from minutes to seconds.
Node Locality Preservation: Since the Pod stays anchored to the same node, restarted containers can instantly access local, warm storage caches.
Maximized Hardware Efficiency: In distributed AI training, losing a single node halts the entire job. Keeping accelerators like GPUs/TPUs bound lets workloads resume training significantly faster, directly reducing compute costs.

Observability and SRE Best Practices

To support monitoring, Kubernetes v1.35 introduces the AllContainersRestarting Pod condition. Set to True during restarts, it alerts SREs and autoscalers, preventing false-positive alerts, while container restart counts increment to let Prometheus easily track recovery events.
To use in-place restarts successfully, shift your mental model to "persistent sandboxes" and follow three best practices:

Ensure Reentrancy: Kubelet only guarantees "at least once" execution for init containers. Reentrancy is now a standard requirement, so your code must be fully idempotent.
Plan for Termination Handling: Graceful termination (preStop hooks) is not supported for in-place restarts. SIGKILL is almost immediate, so applications must handle sudden exits gracefully.
Prepare External Tooling: CD and observability tools should expect re-running init containers without interpreting them as new deployments.

What's Next?

This beta capability is a major step toward fluid workload management and serves as a building block for advanced community features like JobSet in-place restarts (KEP-467).
Our work on KEP-5532 reflects our commitment to transparent open source governance. Developed collaboratively within SIG Node, this feature shows how we hold ourselves to high citizenship standards; making our design, goals, and intentions transparent while building shared best practices that benefit everyone. We encourage you to experiment with Kubernetes v1.35 and share your feedback with the community!

Learn More

Read the Kubernetes Pod Lifecycle Documentation.
Explore KEP-5532: Restart All Containers on Container Exits.
Review theKubernetes v1.35: Restart All Containers Blog Post.
Join the SIG Node Community on Slack (#sig-node).

opensource.google.com

Google Open Source Blog