Proxmox LXCs & RAM

Proxmox 9.1

Make sure your LXC's cache usage doesn't go into the red. I had to quadruple mine, up to around 4 GB, after the server basically stalled.

When LXC cache usage goes critical and stalls the server, it's often due to Linux's memory pressure mechanisms—not a failure to allocate RAM, but a failure to release it fast enough under contention.

Here's a breakdown of what's happening and why it matters:


🧠 Why Linux Cache Can Stall LXCs in Proxmox

Linux aggressively uses spare RAM for disk cache to improve performance. This is normal and beneficial—until memory pressure builds and cache isn't freed quickly enough:

  • Linux treats cache as "used" memory, but it's supposed to be reclaimable when needed. However, under certain conditions (especially in LXCs), this reclaiming can lag or fail.
  • LXC containers may not trigger cache eviction properly, especially if memory limits are soft or misconfigured. This leads to a situation where the container appears to have memory but can't allocate new pages fast enough.
  • High CPU usage and swap thrashing often follow, as the kernel struggles to manage memory under pressure. This is exactly what my dashboard showed: 99% CPU, 70% RAM, 61% swap, classic signs of memory contention and cache pressure.
  • Proxmox’s UI reports “used” memory including cache, which can be misleading. The real issue is available memory for new allocations, not total usage (see the quick check below).
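
A quick way to see the difference, using standard tools (nothing here is Proxmox-specific):

```bash
# "free" separates buff/cache from truly used memory; the "available"
# column estimates what new allocations can actually get.
free -m

# MemAvailable is the kernel's own estimate of allocatable memory
# without heavy swapping; watch this rather than MemTotal - MemFree.
grep -E '^(MemTotal|MemFree|MemAvailable)' /proc/meminfo
```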

🔍 What Causes the Stall?

| Factor | Impact | Explanation |
|---|---|---|
| Cache not evicted | 🟥 Critical | Linux delays cache eviction until memory pressure is extreme. LXCs may not trigger it early enough. |
| Swap usage | 🟧 High | Swap is used when RAM is exhausted, but swap is slow and excessive use causes stalls. |
| No swap partition (ZFS setups) | 🟨 Risky | Without swap, the kernel has fewer options under pressure. ZFS ARC can consume large chunks of RAM. |
| Soft memory limits | 🟦 Misleading | Containers may exceed their limits with cache, yet still stall when allocating new memory. |
| Pressure Stall Information (PSI) | 🟩 Diagnostic | PSI metrics show how long processes are delayed by memory, CPU, or I/O contention. |
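
The “soft memory limits” row is easy to verify from the Proxmox host. A minimal sketch, assuming cgroup v2 (the default on current Proxmox releases) and a hypothetical container ID of 101:

```bash
# cgroup v2 path for an LXC guest on the Proxmox host; 101 is a placeholder ID.
CG=/sys/fs/cgroup/lxc/101

# Total memory charged to the container, page cache included.
cat "$CG/memory.current"

# Hard ceiling and soft throttle point, in bytes (or "max" if unset).
# Proxmox derives memory.max from the container's configured RAM.
cat "$CG/memory.max" "$CG/memory.high"

# Split the charge into reclaimable file cache vs. anonymous pages.
grep -E '^(file|anon) ' "$CG/memory.stat"
```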

🛠️ How to Mitigate It

Here are some practical steps to reduce stalls:

  • Use memory.high and memory.max in LXC configs to enforce stricter memory boundaries; a soft limit below the hard cap triggers cache eviction earlier (see the config sketch after this list).
  • Enable swap for containers, even with ZFS. A small swap allowance can prevent total stalls.
  • Tune the ZFS ARC size using zfs_arc_max to keep it from consuming all RAM (example below).
  • Monitor PSI metrics (cat /proc/pressure/memory) to detect early signs of memory contention (watcher snippet below).
  • Use free -m or htop inside containers to see actual available memory, not just the “used” figure shown in the Proxmox UI.
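
A sketch of the first two steps, again with the hypothetical container 101. Proxmox's UI only exposes the hard RAM and swap limits; memory.high has to go in as a raw cgroup2 key in the container's config file:

```bash
# Supported way to set the hard RAM limit (memory.max) and swap allowance, in MB.
pct set 101 --memory 4096 --swap 512

# Add a soft limit below the hard cap so reclaim (cache eviction) starts
# earlier, before the container hits memory.max and stalls.
echo 'lxc.cgroup2.memory.high=3G' >> /etc/pve/lxc/101.conf

# Restart the container to apply the raw cgroup key.
pct stop 101 && pct start 101
```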
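
Capping the ZFS ARC takes a byte value; the 4 GiB figure below is purely illustrative, not a sizing recommendation:

```bash
# Runtime cap (applies immediately, lost on reboot).
echo $((4 * 1024 * 1024 * 1024)) > /sys/module/zfs/parameters/zfs_arc_max

# Persistent cap: module option plus an initramfs rebuild.
# Note this overwrites any existing options in that file.
echo 'options zfs zfs_arc_max=4294967296' > /etc/modprobe.d/zfs.conf
update-initramfs -u

# Verify the current ARC size against its target maximum.
awk '/^(size|c_max) / {print $1, $3}' /proc/spl/kstat/zfs/arcstats
```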
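
Finally, the PSI watcher. On the “some” line, avg10 is the percentage of the last 10 seconds during which at least one task was stalled waiting on memory; “full” means every runnable task was stalled at once. The 1% threshold below is an arbitrary example:

```bash
# One-off reading of memory pressure.
cat /proc/pressure/memory

# Crude polling loop: warn when the 10-second "some" average exceeds 1%.
while sleep 10; do
  avg10=$(awk '/^some/ { sub("avg10=", "", $2); print $2 }' /proc/pressure/memory)
  awk -v v="$avg10" 'BEGIN { if (v + 0 > 1.0) print "memory pressure: some avg10=" v "%" }'
done
```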

🧩 Final Insight

The stall isn’t because Proxmox failed to allocate RAM—it’s because Linux’s memory reclaim logic didn’t respond fast enough under pressure. LXCs are especially sensitive to this because they rely on cgroups and kernel memory accounting, which can behave differently from full VMs.

Lots to learn

#enoughsaid