Resource exhaustion (ephemeral-storage) / DoS
Description
The commit fixes a vulnerability in the kubelet's ephemeral-storage eviction logic: containerEphemeralStorageLimitEviction() built its per-container threshold map from pod.Spec.Containers only, so restartable init containers (sidecars) listed under InitContainers were never checked. A sidecar could therefore exceed its declared ephemeral-storage limit indefinitely without triggering eviction, risking resource exhaustion/DoS on the node. The patch iterates over the restartable init containers and adds their ephemeral-storage limits to the threshold map, so the existing per-container comparison enforces the limit on these containers as well.
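As a rough illustration of the change described above, the following Go sketch builds a threshold map with and without restartable init containers. It is not the kubelet code: Container, PodSpec, isRestartableInit and buildThresholds are hypothetical, pared-down stand-ins for the real v1 types and helpers, and limits are plain byte counts rather than resource.Quantity values.

```go
package main

import "fmt"

// Container is a hypothetical, simplified stand-in for v1.Container.
type Container struct {
	Name           string
	RestartPolicy  string // "Always" marks a restartable init container (sidecar)
	EphemeralLimit int64  // ephemeral-storage limit in bytes; 0 means none declared
}

// PodSpec is a hypothetical, simplified stand-in for v1.PodSpec.
type PodSpec struct {
	InitContainers []Container
	Containers     []Container
}

// isRestartableInit mimics the role of podutil.IsRestartableInitContainer.
func isRestartableInit(c Container) bool {
	return c.RestartPolicy == "Always"
}

// buildThresholds returns container name -> ephemeral-storage limit.
// With includeSidecars=false it models the pre-fix behaviour (regular
// containers only); with true it also adds restartable init containers.
func buildThresholds(spec PodSpec, includeSidecars bool) map[string]int64 {
	thresholds := make(map[string]int64)
	for _, c := range spec.Containers {
		if c.EphemeralLimit != 0 {
			thresholds[c.Name] = c.EphemeralLimit
		}
	}
	if !includeSidecars {
		return thresholds
	}
	for _, c := range spec.InitContainers {
		if !isRestartableInit(c) {
			continue // ordinary init containers are still skipped
		}
		if c.EphemeralLimit != 0 {
			thresholds[c.Name] = c.EphemeralLimit
		}
	}
	return thresholds
}

func main() {
	spec := PodSpec{
		InitContainers: []Container{
			{Name: "sidecar", RestartPolicy: "Always", EphemeralLimit: 10 << 20},
			{Name: "setup", EphemeralLimit: 5 << 20}, // regular init container
		},
		Containers: []Container{{Name: "main", EphemeralLimit: 2 << 20}},
	}
	fmt.Println("pre-fix entries: ", len(buildThresholds(spec, false)))
	fmt.Println("post-fix entries:", len(buildThresholds(spec, true)))
}
```

In this sketch the pre-fix map holds only "main", while the post-fix map also holds "sidecar"; the non-restartable "setup" init container is absent from both.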
Proof of Concept
Proof-of-concept steps:
1) Prepare a Kubernetes cluster whose kubelet enforces local ephemeral-storage limits (local storage capacity isolation, as exercised by the tests in this commit) and that supports restartable init containers (sidecars).
2) Save the following Pod manifest as ephemeral-sidecar-poc.yaml. The init container is declared as a restartable init container (sidecar) with a small ephemeral-storage limit and writes more data than that limit to its writable layer. The main container stays lightweight.
apiVersion: v1
kind: Pod
metadata:
  name: ephemeral-sidecar-poc
spec:
  restartPolicy: Always
  initContainers:
  - name: sidecar
    image: alpine:3.18
    # Container-level restartPolicy: Always makes this a restartable init container (sidecar).
    restartPolicy: Always
    command: ["sh","-c","dd if=/dev/zero of=/tmp/ephemeral.file bs=1M count=18; sleep 3600"]
    resources:
      limits:
        ephemeral-storage: 10Mi
  containers:
  - name: main
    image: alpine:3.18
    command: ["sh","-c","sleep 3600"]
    resources:
      limits:
        ephemeral-storage: 2Mi
Notes:
- The init container is restartable because it sets restartPolicy: Always at the container level. The Pod-level restartPolicy does not turn an init container into a sidecar; the e2e helper in this commit, for example, pairs a Pod-level RestartPolicyNever with a container-level Always.
- No volume is mounted at /tmp, so the file lands in the sidecar's writable layer, which is the usage the per-container check measures. Data written to an emptyDir volume would not count toward the container's own usage.
- The sidecar writes 18Mi into its writable layer (limit is 10Mi), which should trigger eviction once the patched logic accounts for restartable init containers.
3) Apply the manifest: kubectl apply -f ephemeral-sidecar-poc.yaml
4) Observe eviction events associated with the pod (e.g., kubectl describe pod ephemeral-sidecar-poc or kubectl get events). With the patch, the pod should be evicted when the sidecar exceeds its ephemeral-storage limit. Without the patch, eviction may not be triggered as expected.
This PoC demonstrates the vulnerability path: a restartable init container (sidecar) exceeding its ephemeral-storage limit would previously evade eviction checks; the fix ensures such containers are included in the eviction thresholds and will be evicted accordingly.
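The path the PoC is meant to exercise can be sketched in Go as a per-container comparison run against a threshold map with and without the sidecar entry. ContainerUsage and firstOverLimit are hypothetical names, and treating usage as log bytes plus writable-layer bytes is a simplifying assumption based on the fields the unit test populates, not a copy of the kubelet's accounting.

```go
package main

import "fmt"

// Mi is one mebibyte in bytes.
const Mi = 1 << 20

// ContainerUsage is a hypothetical, simplified stand-in for the
// per-container stats the eviction manager reads.
type ContainerUsage struct {
	Name        string
	LogsBytes   uint64
	RootfsBytes uint64 // writable-layer usage
}

// firstOverLimit returns the first container whose usage exceeds its
// threshold. Containers without a map entry are never checked, which is
// exactly how a sidecar escaped enforcement before the fix.
func firstOverLimit(thresholds map[string]uint64, usage []ContainerUsage) (string, bool) {
	for _, u := range usage {
		limit, ok := thresholds[u.Name]
		if !ok {
			continue
		}
		if u.LogsBytes+u.RootfsBytes > limit {
			return u.Name, true
		}
	}
	return "", false
}

func main() {
	// Usage mirroring the PoC: the sidecar wrote 18Mi, main wrote nothing.
	usage := []ContainerUsage{
		{Name: "sidecar", RootfsBytes: 18 * Mi},
		{Name: "main"},
	}
	unpatched := map[string]uint64{"main": 2 * Mi}
	patched := map[string]uint64{"main": 2 * Mi, "sidecar": 10 * Mi}

	name, evict := firstOverLimit(unpatched, usage)
	fmt.Printf("unpatched: evict=%v container=%q\n", evict, name)
	name, evict = firstOverLimit(patched, usage)
	fmt.Printf("patched:   evict=%v container=%q\n", evict, name)
}
```

With the unpatched map the sidecar has no entry and nothing is flagged; with the patched map the 18Mi of usage is compared against the 10Mi limit and the sidecar is reported.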
Commit Details
Author: Shachar Tal
Date: 2026-04-19 05:14 UTC
Message:
kubelet: enforce ephemeral-storage limits on restartable init containers
containerEphemeralStorageLimitEviction() only iterated
pod.Spec.Containers when building the per-container ephemeral-storage
threshold map. Restartable init containers (sidecars) were never
checked against their declared limit, allowing them to exceed it
indefinitely without triggering eviction.
Include restartable init containers in the threshold map so the
existing per-container comparison covers them.
Triage Assessment
Vulnerability Type: Resource exhaustion (ephemeral-storage) / DoS
Confidence: HIGH
Reasoning:
The change ensures restartable init containers (sidecars) are included in the ephemeral-storage eviction threshold checks, preventing them from exceeding their declared storage limits indefinitely. This mitigates a potential resource-exhaustion/DoS scenario where a sidecar could bypass eviction logic and consume node storage.
Verification Assessment
Vulnerability Type: Resource exhaustion (ephemeral-storage) / DoS
Confidence: HIGH
Affected Versions: Kubernetes v1.36.0-beta.0 and earlier (1.36 line) prior to this fix
Code Diff
diff --git a/pkg/kubelet/eviction/eviction_manager.go b/pkg/kubelet/eviction/eviction_manager.go
index 4019d6aa3f59e..875bb22cb5457 100644
--- a/pkg/kubelet/eviction/eviction_manager.go
+++ b/pkg/kubelet/eviction/eviction_manager.go
@@ -588,6 +588,15 @@ func (m *managerImpl) containerEphemeralStorageLimitEviction(logger klog.Logger,
 			thresholdsMap[container.Name] = ephemeralLimit
 		}
 	}
+	for i, container := range pod.Spec.InitContainers {
+		if !podutil.IsRestartableInitContainer(&pod.Spec.InitContainers[i]) {
+			continue
+		}
+		ephemeralLimit := container.Resources.Limits.StorageEphemeral()
+		if ephemeralLimit != nil && ephemeralLimit.Value() != 0 {
+			thresholdsMap[container.Name] = ephemeralLimit
+		}
+	}
 	for _, containerStat := range podStats.Containers {
 		containerUsed := diskUsage(containerStat.Logs)
diff --git a/pkg/kubelet/eviction/eviction_manager_test.go b/pkg/kubelet/eviction/eviction_manager_test.go
index de8c6984665a0..67783c0f98ae6 100644
--- a/pkg/kubelet/eviction/eviction_manager_test.go
+++ b/pkg/kubelet/eviction/eviction_manager_test.go
@@ -3095,3 +3095,79 @@ func TestManagerWithLocalStorageCapacityIsolationOpen(t *testing.T) {
 		t.Fatalf("Unexpected evicted pod (-want,+got):\n%s", diff)
 	}
 }
+
+func TestContainerEphemeralStorageLimitEvictionForRestartableInitContainers(t *testing.T) {
+	tCtx := ktesting.Init(t)
+
+	initContainer := newRestartableInitContainer("sidecar", newResourceList("", "", ""), newResourceList("", "", "10Mi"))
+	mainContainer := newContainer("main", newResourceList("", "", ""), newResourceList("", "", ""))
+
+	pod := newPod("sidecar-ephemeral-repro", 0, []v1.Container{mainContainer}, nil)
+	pod.Spec.InitContainers = []v1.Container{initContainer}
+
+	sidecarQuantity := resource.MustParse("50Mi")
+	sidecarUsed := uint64(sidecarQuantity.Value())
+	mainUsed := uint64(0)
+	podStats := statsapi.PodStats{
+		PodRef: statsapi.PodReference{
+			Name: pod.Name, Namespace: pod.Namespace, UID: string(pod.UID),
+		},
+		Containers: []statsapi.ContainerStats{
+			{
+				Name: "sidecar",
+				Logs: &statsapi.FsStats{UsedBytes: &sidecarUsed},
+				Rootfs: &statsapi.FsStats{UsedBytes: &sidecarUsed},
+			},
+			{
+				Name: "main",
+				Logs: &statsapi.FsStats{UsedBytes: &mainUsed},
+				Rootfs: &statsapi.FsStats{UsedBytes: &mainUsed},
+			},
+		},
+	}
+
+	diskStat := diskStats{
+		rootFsAvailableBytes: "1Gi",
+		imageFsAvailableBytes: "200Mi",
+		podStats: map[*v1.Pod]statsapi.PodStats{pod: podStats},
+	}
+	summaryProvider := &fakeSummaryProvider{result: makeDiskStats(diskStat)}
+
+	config := Config{
+		MaxPodGracePeriodSeconds: 5,
+		PressureTransitionPeriod: time.Minute * 5,
+		Thresholds: []evictionapi.Threshold{},
+	}
+
+	podKiller := &mockPodKiller{}
+	nodeRef := &v1.ObjectReference{Kind: "Node", Name: "test", UID: types.UID("test"), Namespace: ""}
+	fakeClock := testingclock.NewFakeClock(time.Now())
+
+	mgr := &managerImpl{
+		clock: fakeClock,
+		killPodFunc: podKiller.killPodNow,
+		imageGC: &mockDiskGC{err: nil},
+		containerGC: &mockDiskGC{err: nil},
+		config: config,
+		recorder: &record.FakeRecorder{},
+		summaryProvider: summaryProvider,
+		nodeRef: nodeRef,
+		localStorageCapacityIsolation: true,
+		dedicatedImageFs: ptr.To(false),
+	}
+
+	activePodsFunc := func() []*v1.Pod {
+		return []*v1.Pod{pod}
+	}
+
+	evictedPods, err := mgr.synchronize(tCtx, &mockDiskInfoProvider{dedicatedImageFs: ptr.To(false)}, activePodsFunc)
+	if err != nil {
+		t.Fatalf("Manager should not have error but got %v", err)
+	}
+	if podKiller.pod == nil {
+		t.Fatalf("Manager should have evicted the pod for restartable init container exceeding ephemeral-storage limit")
+	}
+	if len(evictedPods) != 1 || evictedPods[0].Name != pod.Name {
+		t.Fatalf("Expected evicted pod %q, got %v", pod.Name, evictedPods)
+	}
+}
diff --git a/test/e2e_node/eviction_test.go b/test/e2e_node/eviction_test.go
index 4081f4c973c4e..ace076fee415f 100644
--- a/test/e2e_node/eviction_test.go
+++ b/test/e2e_node/eviction_test.go
@@ -435,6 +435,14 @@ var _ = SIGDescribe("LocalStorageCapacityIsolationEviction", framework.WithSlow(
 				evictionPriority: 0, // This pod should not be evicted because it uses less than its limit
 				pod: diskConsumingPod("container-disk-below-sizelimit", useUnderLimit, nil, v1.ResourceRequirements{Limits: containerLimit}),
 			},
+			{
+				evictionPriority: 1, // The restartable init container (sidecar) exceeds its container ephemeral-storage limit, so the pod should be evicted.
+				pod: diskConsumingSidecarPod("sidecar-container-disk-limit", useOverLimit, v1.ResourceRequirements{Limits: containerLimit}),
+			},
+			{
+				evictionPriority: 0, // The restartable init container (sidecar) stays under its limit, so the pod should not be evicted.
+				pod: diskConsumingSidecarPod("sidecar-container-disk-below-sizelimit", useUnderLimit, v1.ResourceRequirements{Limits: containerLimit}),
+			},
 		})
 	})
 })
@@ -1280,6 +1288,41 @@ func diskConsumingPod(name string, diskConsumedMB int, volumeSource *v1.VolumeSo
 	return podWithCommand(volumeSource, resources, diskConsumedMB, name, fmt.Sprintf("dd if=/dev/urandom of=%s${i} bs=1048576 count=1 2>/dev/null; sleep .1;", filepath.Join(path, "file")), true)
 }
+// diskConsumingSidecarPod returns a pod whose restartable init container (sidecar)
+// writes diskConsumedMB MB to its writable layer, with the supplied resource
+// requirements applied to the sidecar. The main container only sleeps so that the
+// pod's eligibility for eviction is determined entirely by the sidecar's disk usage.
+func diskConsumingSidecarPod(name string, diskConsumedMB int, sidecarResources v1.ResourceRequirements) *v1.Pod {
+	var gracePeriod int64 = 1
+	return &v1.Pod{
+		ObjectMeta: metav1.ObjectMeta{Name: fmt.Sprintf("%s-pod", name)},
+		Spec: v1.PodSpec{
+			RestartPolicy: v1.RestartPolicyNever,
+			TerminationGracePeriodSeconds: &gracePeriod,
+			InitContainers: []v1.Container{
+				{
+					Image: busyboxImage,
+					Name: fmt.Sprintf("%s-sidecar", name),
+					RestartPolicy: &containerRestartPolicyAlways,
+					Command: []string{
+						"sh",
+						"-c",
+						fmt.Sprintf("i=0; while [ $i -lt %d ]; do dd if=/dev/urandom of=file${i} bs=1048576 count=1 2>/dev/null; sleep .1; i=$(($i+1)); done; while true; do sleep 5; done", diskConsumedMB),
+					},
+					Resources: sidecarResources,
+				},
+			},
+			Containers: []v1.Container{
+				{
+					Image: busyboxImage,
+					Name: fmt.Sprintf("%s-container", name),
+					Command: []string{"sh", "-c", "while true; do sleep 5; done"},
+				},
+			},
+		},
+	}
+}
+
 func pidConsumingPod(name string, numProcesses int) *v1.Pod {
 	// Slowing down the iteration speed to prevent a race condition where eviction may occur
 	// before the correct number of processes is captured in the stats during a sudden surge in processes.