Handling Out of Resource Errors
Overview
This guide describes how to prevent Alauda Container Platform nodes from running out of memory (OOM) or disk space. Stable node operation is critical, especially for non-compressible resources like memory and disk. Resource exhaustion can lead to node instability.
Administrators can configure eviction policies to monitor nodes and reclaim resources before stability is compromised.
This document covers how Alauda Container Platform handles out-of-resource scenarios, including resource reclamation, pod eviction, pod scheduling, and the Out of Memory Killer. Example configurations and best practices are also provided.
If swap memory is enabled on a node, memory pressure cannot be detected. Disable swap to enable memory-based evictions.
Configuring Eviction Policies
Eviction policies allow nodes to terminate pods when resources are low, reclaiming needed resources. Policies combine eviction signals and threshold values, set in the node configuration or via command line. Evictions can be:
- Hard: Immediate action when a threshold is exceeded.
- Soft: Grace period before action is taken.
Properly configured eviction policies help nodes proactively prevent resource exhaustion.
When a pod is evicted, all containers in the pod are terminated, and the PodPhase transitions to Failed.
For disk pressure, nodes monitor both nodefs (root filesystem) and imagefs (container image storage).
- nodefs (rootfs): Used for local disk volumes, logs, and other local storage (for example, /var/lib/kubelet).
- imagefs: Used by the container runtime to store images and writable container layers.
Without local storage isolation (ephemeral storage) or XFS quota (volumeConfig), pod disk usage cannot be limited.
Creating Eviction Policies in Node Configuration
To set eviction thresholds, edit the node configuration map under eviction-hard or eviction-soft.
Hard Eviction Example:
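The following is a minimal sketch of a hard eviction policy in the node configuration map. The kubeletArguments stanza and the specific values are illustrative assumptions; adapt them to your cluster's node configuration format.

```yaml
kubeletArguments:
  eviction-hard:            # hard thresholds: act immediately when crossed
  - memory.available<500Mi
  - nodefs.available<10%
  - nodefs.inodesFree<5%    # inodesFree must be a percentage
  - imagefs.available<15%
```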
- The type of eviction: use eviction-hard for hard eviction thresholds.
- Each eviction threshold is defined as <eviction_signal><operator><quantity>, such as memory.available<500Mi or nodefs.available<10%.
Use percentage values for inodesFree thresholds. Other parameters accept either percentages or absolute quantities.
Soft Eviction Example:
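A corresponding sketch for a soft eviction policy, under the same assumed kubeletArguments format; each soft threshold is paired with a grace period:

```yaml
kubeletArguments:
  eviction-soft:                # soft thresholds: act only after the grace period elapses
  - memory.available<500Mi
  - nodefs.available<10%
  eviction-soft-grace-period:   # how long a threshold may be exceeded before eviction
  - memory.available=1m30s
  - nodefs.available=1m30s
```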
- The type of eviction: use eviction-soft for soft eviction thresholds.
- Each eviction threshold is defined as <eviction_signal><operator><quantity>, such as memory.available<500Mi or nodefs.available<10%.
- The grace period for the soft eviction. Leave the default values for optimal performance.
Restart the kubelet service for changes to take effect:
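For example, assuming the kubelet runs as a systemd unit named kubelet (the unit name may differ on your nodes):

```bash
systemctl restart kubelet
```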
Eviction Signals
Nodes can trigger evictions based on the following signals:
- memory.available: available memory on the node has fallen below the threshold.
- nodefs.available: available disk space on the node's root filesystem has fallen below the threshold.
- nodefs.inodesFree: free inodes on the node's root filesystem have fallen below the threshold.
- imagefs.available: available disk space on the image filesystem has fallen below the threshold.
- imagefs.inodesFree: free inodes on the image filesystem have fallen below the threshold.
- inodesFree must be specified as a percentage.
- Memory calculations exclude reclaimable inactive file memory.
- Do not use free -m in containers; it reports host-level memory, not container-aware values.
Nodes monitor these filesystems every 10 seconds. Dedicated filesystems for volumes/logs are not monitored.
Before evicting pods due to disk pressure, nodes perform container and image garbage collection.
Eviction Thresholds
Eviction thresholds trigger resource reclamation. When a threshold is met, the node reports a pressure condition, preventing new pods from being scheduled until resources are reclaimed.
- Hard thresholds: Immediate action.
- Soft thresholds: Action after a grace period.
Thresholds are configured in the form <eviction_signal><operator><quantity>.
Example:
memory.available<1Gi
memory.available<10%
Nodes evaluate thresholds every 10 seconds.
Hard Eviction Thresholds
No grace period; immediate action is taken.
Example:
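As an illustration, hard thresholds can also be passed directly on the kubelet command line via the upstream --eviction-hard flag, shown here in isolation with example values (note the quoting, since < is shell syntax):

```bash
kubelet --eviction-hard='memory.available<500Mi,nodefs.available<10%,nodefs.inodesFree<5%'
```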
Default Hard Eviction Thresholds
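For reference, the upstream kubelet ships with the following hard eviction defaults, shown in the same assumed configuration form; verify the values in effect on your platform:

```yaml
kubeletArguments:
  eviction-hard:
  - memory.available<100Mi
  - nodefs.available<10%
  - nodefs.inodesFree<5%
  - imagefs.available<15%
```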
Soft Eviction Thresholds
Soft thresholds require a grace period. Optionally, set a maximum pod termination grace period (eviction-max-pod-grace-period).
Example:
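A sketch combining a soft threshold, its grace period, and the optional cap on pod termination grace, using the same assumed format as above (values are illustrative):

```yaml
kubeletArguments:
  eviction-soft:
  - memory.available<500Mi
  eviction-soft-grace-period:
  - memory.available=1m30s
  eviction-max-pod-grace-period:  # upper bound, in seconds, on a pod's termination grace period during eviction
  - '30'
```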
Configuring Allocatable Resources for Scheduling
Control how much node resource is available for scheduling by setting system-reserved for system daemons. Evictions occur only if pods exceed their requested resources.
- Capacity: Total resource on the node.
- Allocatable: Resource available for pod scheduling (capacity minus system-reserved).
Example:
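A sketch reserving CPU and memory for system daemons (amounts are illustrative and depend on expected pod density per node):

```yaml
kubeletArguments:
  system-reserved:
  - cpu=500m,memory=1Gi
```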
Determine appropriate values using the node summary API.
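For example, a node's observed usage can be read through the API server's proxy to the kubelet summary endpoint (kubectl access is assumed; replace <node-name> with a real node):

```bash
kubectl get --raw "/api/v1/nodes/<node-name>/proxy/stats/summary"
```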
Restart the kubelet service for the changes to take effect, as shown earlier.
Preventing Node Condition Oscillation
To avoid oscillation above/below soft eviction thresholds, set eviction-pressure-transition-period:
Example:
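For example, to hold a pressure condition for five minutes before clearing it (same assumed kubeletArguments form):

```yaml
kubeletArguments:
  eviction-pressure-transition-period:
  - "5m"
```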
The default is 5 minutes. Restart the kubelet service for the change to take effect.
Reclaiming Node-level Resources
When eviction criteria are met, nodes reclaim resources before evicting user pods.
- With a dedicated imagefs:
  - If the nodefs threshold is met: delete dead pods and containers.
  - If the imagefs threshold is met: delete unused images.
- Without a dedicated imagefs:
  - If the nodefs threshold is met: delete dead pods and containers, then delete unused images.
Pod Eviction
If an eviction threshold is met and the grace period has elapsed, the node evicts pods until the signal falls below the threshold.
Pods are ranked for eviction by quality of service (QoS) class and by resource consumption relative to their requests.
Guaranteed pods are only evicted if system daemons exceed reserved resources or only guaranteed pods remain.
Disk is a best-effort resource; pods are evicted one at a time to reclaim disk space, ranked by QoS and disk usage.
Quality of Service and Out of Memory Killer
If a system OOM event occurs before memory can be reclaimed, the OOM killer responds.
OOM scores (oom_score_adj) are set based on the QoS class of the container:
- Guaranteed: -998
- Burstable: min(max(2, 1000 - (1000 * memoryRequestBytes) / machineMemoryCapacityBytes), 999)
- BestEffort: 1000
The OOM killer terminates the container with the highest score, so containers with the lowest QoS class that consume the most memory are killed first. Killed containers may be restarted according to the node's restart policy.
Scheduler and Out of Resource Conditions
The scheduler considers node conditions when placing pods: no new BestEffort pods are scheduled on a node reporting MemoryPressure, and no new pods are scheduled on a node reporting DiskPressure.
Example Scenario
An operator wants the following:
- Node with 10Gi memory.
- Reserve 10% for system daemons.
- Evict pods at 95% memory utilization.
Calculation:
capacity = 10Gi
system-reserved = 1Gi
allocatable = 9Gi
To also trigger a soft eviction when available memory falls below 10% for 30 seconds, and a hard eviction immediately when it falls below 5%, system-reserved must additionally cover the 1Gi of memory held back by the 10% eviction threshold:
system-reserved = 2Gi
allocatable = 8Gi
Configuration:
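A sketch tying the scenario together, again assuming the kubeletArguments form used above:

```yaml
kubeletArguments:
  system-reserved:              # 1Gi for daemons + 1Gi covered by the 10% eviction threshold
  - memory=2Gi
  eviction-hard:                # evict immediately below 5% available memory
  - memory.available<5%
  eviction-soft:                # evict below 10% available memory...
  - memory.available<10%
  eviction-soft-grace-period:   # ...when sustained for 30 seconds
  - memory.available=30s
```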
This configuration ensures the scheduler does not place pods on the node that would immediately induce memory pressure and trigger eviction, provided those pods use less than their configured requests.
Recommended Practices
Daemon Sets and Out of Resource Handling
Pods created by daemon sets are immediately recreated on the same node if evicted, so eviction provides no relief. Daemon sets should therefore avoid creating best-effort pods and instead use the Guaranteed QoS class to reduce eviction risk.