Proxmox GPU Script updated

Leon SCOTT

13 Jun 2026 • 31 min read
Proxmox 9.2 Nvidia GPU Problem
There is a significant issue with Kernel 7 and various Nvidia graphics cards. Mine is rather dated, which is becoming a problem.
This post has been updated as of 26th July 2026.
Reasons for script review
Kernel 7 is breaking the Nvidia passthrough and other stuff
Kernel updated within the 6.17 branch only at this point
Proxmox VE is actually version 9.2.5 just para 2 kernel.
Minor issues in relation to removing old kernels
More robust and slightly tuned code
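To make "within the 6.17 branch only" concrete, here is a minimal sketch of the kind of check involved. It reads the output of a simulated upgrade and prints any kernel or header package whose X.Y series differs from the one you approve. The helper name unapproved_kernels is hypothetical and is not a function from the script; the sketch assumes the usual "Inst <package> ..." lines that apt-get -s prints and the proxmox-kernel-* / proxmox-headers-* package naming.

```shell
#!/bin/bash
# Sketch only: list kernel/header packages in a simulated upgrade that fall
# outside one approved series. unapproved_kernels is a hypothetical helper,
# not part of setup-gpu-pxe.sh.
unapproved_kernels() {
  local approved="$1" pkg series
  # apt-get -s prints one "Inst <package> ..." line per package it would install.
  awk '/^Inst /{print $2}' | while IFS= read -r pkg; do
    case "$pkg" in
      proxmox-kernel-*|proxmox-headers-*) ;;   # kernel or header package: inspect it
      *) continue ;;                           # anything else is out of scope here
    esac
    # Strip the package prefix, then keep only the leading X.Y (e.g. 6.17 or 7.0).
    series="${pkg#proxmox-kernel-}"
    series="${series#proxmox-headers-}"
    series="$(printf '%s\n' "$series" | grep -oE '^[0-9]+\.[0-9]+' || true)"
    if [[ -n "$series" && "$series" != "$approved" ]]; then
      printf '%s\n' "$pkg"
    fi
  done
}
```

Run by hand it would look like: LANG=C apt-get -s full-upgrade | unapproved_kernels 6.17. Empty output means nothing outside 6.17 is incoming. The script's own --preflight does the same job with more reporting, so treat this as an illustration rather than a replacement.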
Just to be super clear: no apt full-upgrade is run without running this script first. So it's very manual and very safe, assuming I have not made another logic error.
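If you would rather have that rule enforced than remembered, one option is a small wrapper that only runs the upgrade when the preflight passes. This is a sketch under assumptions: guarded_upgrade is a hypothetical name, and it relies only on the documented behaviour that --preflight exits 2 when the upgrade is blocked and 0 when it is safe.

```shell
#!/bin/bash
# Sketch only: run a command only if the preflight gate passes.
# guarded_upgrade is a hypothetical wrapper, not part of setup-gpu-pxe.sh.
guarded_upgrade() {
  local preflight_cmd="$1" rc=0
  shift                                  # the remaining arguments are the upgrade command
  "$preflight_cmd" --preflight || rc=$?  # capture the exit code without aborting
  if [[ "$rc" -eq 2 ]]; then
    echo "Preflight blocked the upgrade (unapproved kernel series)." >&2
    return 2
  elif [[ "$rc" -ne 0 ]]; then
    echo "Preflight failed with exit code ${rc}; not upgrading." >&2
    return "$rc"
  fi
  "$@"                                   # preflight passed: run the upgrade command
}
```

As root, after apt update, the call would be: guarded_upgrade ./setup-gpu-pxe.sh apt full-upgrade. The DKMS check, the re-pin and the reboot still happen by hand afterwards, so the process stays as manual as described above.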
GPU Script for Host setup

Please find the script below. It has been refined and updated with every problem discovered while updating the Proxmox host over time.
Updates are a manual affair, as I have learned the hard way.
#!/bin/bash
# -------------------------------------------------------------------------------------
# setup-gpu-pxe.sh - Proxmox VE NVIDIA Driver Installer + LXC GPU Passthrough + Kernel Guard
# -------------------------------------------------------------------------------------
# VERSION:    4.2.2
# UPDATED:    2026-07-26
# CREATED:    see git history - github.com/Braedach
# TARGET:     Proxmox VE 9.x (Debian 13 / Trixie) | NVIDIA RTX 3070 or similar
# PROXMOX:    9.2.5
# AUTHOR:     braedach / Leon
#
# PURPOSE
#   Installs NVIDIA drivers on the Proxmox HOST (Debian-packaged, trixie-backports),
#   wires up LXC GPU passthrough, and GUARDS the host against the kernel-series jump
#   that breaks NVIDIA DKMS. Modules live on the host only; containers bind-mount the
#   /dev/nvidia* nodes. NEVER install the NVIDIA driver inside an LXC.
#
# WHY THE GUARD EXISTS (2026-06)
#   A routine 6.17 SECURITY update pulled in a whole new kernel series (7.0, Ubuntu
#   26.04 base) via the proxmox-default-kernel meta. NVIDIA 550 does not build on 7.0
#   (Debian validates only up to Linux 6.19), so the DKMS build failed mid-apt, wedged
#   dpkg, and left a bootable-but-broken 7.0 kernel as the GRUB default. The guard
#   turns that silent break into a loud, deliberate decision.
#
# THE GUARD - THREE INDEPENDENT LAYERS
#   1 PREFLIGHT (--preflight)    Simulates full-upgrade; BLOCKS if an unapproved kernel
#                                series would install. Run BEFORE every apt full-upgrade.
#   2 APT HOLDS (--apply-guard)  Holds the selector metas + unapproved-series packages.
#                                Surgical: approved-series security point-updates still flow.
#   3 BOOT PIN  (--apply-guard)  Pins the running kernel (only if it has a built NVIDIA
#                                module) so a bad kernel can never boot.
#   Only series in APPROVED_KERNEL_SERIES (below) are allowed. Editing that list is how
#   you approve a new series - on purpose, after confirming NVIDIA builds against it.
#
# COMMANDS
#   (none)                    Full install (drivers, nodes, service) then apply guard
#   --preflight               Gate an upgrade BEFORE apt (exit 2 = blocked)
#   --apply-guard             Apply holds + pin the running kernel
#   --remove-guard            Lift holds (after approving a new series)
#   --guard-status            Approved series, holds, pin, kernels, stale boot entries
#   --repoint-selector [VER]  Move proxmox-default-kernel/headers to an approved-series
#                             version (the "get off 7.0 as default" fix) + re-hold
#   --cleanup-boot            Reconcile ESP + remove stale boot entries + refresh
#   --prune-dkms              Remove DKMS builds for kernels no longer installed
#   --check-only              System + GPU + guard health
#   --force-rebuild           Force DKMS rebuild
#   --lxc-config ID           Print LXC dev* passthrough snippet
#   --purge                   Remove all NVIDIA components (leaves holds)
#   --dry-run                 Preview any action     --help  Show help

# KERNEL UPDATES:  DO MANUALLY AT ALL TIMES
#
#   IMPORTANT: Never reboot after a kernel update without first confirming the new kernel series is valid.
#   IMPORTANT: Pin the new kernel before rebooting after dkms validation.
#
#     1.  apt update
#     2.  sudo ./setup-gpu-pxe.sh --preflight
#     3.  apt full-upgrade
#           Installs the new kernel; DKMS builds NVIDIA against it.
#           (Use full-upgrade, never plain 'upgrade', on Proxmox.)
#     4.  dkms status
#           Confirm the NEW kernel shows 'installed'. Do this while STILL on the OLD
#           kernel. If it did not build, do NOT reboot - investigate first.
#     5.  proxmox-boot-tool kernel pin <NEW-KERNEL>
#           Re-pin to the new kernel BEFORE rebooting.
#     6.  reboot
#     7.  uname -r
#           Confirm you are now on <NEW-KERNEL>.
#     8.  sudo ./setup-gpu-pxe.sh --check-only
#           Verify the GPU is healthy on the new kernel.
#     9.  sudo ./setup-gpu-pxe.sh --apply-guard
#           Reconfirm apt holds and the boot pin for the running (new) kernel.
#    10.  Start passthrough LXCs, then run nvidia-smi inside each.
#
#   Optional cleanup, once the new kernel is proven and a spare kernel remains:
#     sudo ./setup-gpu-pxe.sh --prune-dkms     # drop DKMS builds for removed kernels
#     sudo ./setup-gpu-pxe.sh --cleanup-boot   # clear stale boot entries
#
# ARCHITECTURE (detail in git)
#   P1 Driver : nvidia-kernel-dkms + nvidia-driver + nvidia-persistenced (backports),
#               pve-headers, nouveau blacklisted, Secure Boot/MOK handled.
#   P2 Nodes  : nvidia-setup-nodes.service creates /dev/nvidia-uvm* at sysinit
#               (Before=lxc/pve-container) so LXC binds real char devices, not stubs.
#   P3 LXC    : modern dev* syntax (print_lxc_config is authoritative).
#   Files: /etc/systemd/system/nvidia-setup-nodes.service, /usr/local/sbin/nvidia-setup-nodes.sh,
#          /etc/udev/rules.d/71-nvidia-uvm.rules, /etc/modprobe.d/blacklist-nouveau.conf,
#          /etc/modules-load.d/nvidia.conf
#
# CHANGELOG (full history: github.com/Braedach)
#   4.2.2 (2026-07-26) - Rewrite the summary for clarity.
#         4 kernel updates have been tested on this script.
#   4.2.1 (2026-07-20) - Bugfix: --help printed raw escape codes (\033[...]). Colour
#         variables now use $'...' C-quoting (real ESC bytes) so they render in the
#         cat-heredoc help as well as in echo -e output.
#   4.2.0 (2026-07-20) - Clearer maintenance docs + ASCII-only + DKMS pruning.
#       + Rewrote the KERNEL UPDATE SOP to match real practice: because the boot pin
#         holds you on the current kernel, you must pin the NEW kernel BEFORE rebooting
#         so ONE reboot lands on it. The old doc implied a single reboot moved you
#         automatically, which it does not (it boots the old pinned kernel).
#       + --prune-dkms: removes DKMS builds for kernels no longer installed, never the
#         running kernel. Automates the manual 'dkms remove <mod> -k <old-kernel>'.
#       + Replaced all non-ASCII characters (em dashes, smart quotes) in comments and
#         runtime messages with plain ASCII.
# -------------------------------------------------------------------------------------
set -euo pipefail
IFS=$'\n\t'

SCRIPT_VERSION="4.2.2"
SCRIPT_NAME="$(basename "$0")"

# -------------------------------------------------------------------------------------
# KERNEL SERIES POLICY  (edit APPROVED_KERNEL_SERIES to approve a new series)
# -------------------------------------------------------------------------------------
# Only kernel series listed here are allowed to be installed/followed. A series
# is the X.Y portion of a proxmox kernel package, e.g. "6.17" or "7.0".
# Add a new series ONLY after confirming NVIDIA builds against it (see header).
APPROVED_KERNEL_SERIES=("6.17")

# -------------------------------------------------------------------------------------
# CONFIGURATION CONSTANTS
# -------------------------------------------------------------------------------------
UVM_RULE_FILE="/etc/udev/rules.d/71-nvidia-uvm.rules"
NOUVEAU_BLACKLIST="/etc/modprobe.d/blacklist-nouveau.conf"
MODULES_LOAD_FILE="/etc/modules-load.d/nvidia.conf"
NODE_SCRIPT="/usr/local/sbin/nvidia-setup-nodes.sh"
NODE_SERVICE="/etc/systemd/system/nvidia-setup-nodes.service"
LXC_VIDEO_GID=44   # 'video' group on Debian systems

# Selector meta-packages that decide which kernel series is the default.
SELECTOR_PACKAGES=(proxmox-default-kernel proxmox-default-headers)

# -------------------------------------------------------------------------------------
# FLAGS (set by argument parsing)
# -------------------------------------------------------------------------------------
DRY_RUN=0
PURGE=0
FORCE_REBUILD=0
CHECK_ONLY=0
LXC_CONFIG_ONLY=0
PREFLIGHT_ONLY=0
APPLY_GUARD_ONLY=0
REMOVE_GUARD_ONLY=0
GUARD_STATUS_ONLY=0
CLEANUP_BOOT_ONLY=0
REPOINT_SELECTOR_ONLY=0
REPOINT_VERSION=""
PRUNE_DKMS_ONLY=0
LXC_VMID=""

# -------------------------------------------------------------------------------------
# COLOUR / LOGGING
# -------------------------------------------------------------------------------------
RED=$'\033[0;31m'; YELLOW=$'\033[0;33m'; GREEN=$'\033[0;32m'
CYAN=$'\033[0;36m'; BOLD=$'\033[1m'; RESET=$'\033[0m'

info()    { echo -e "${GREEN}[INFO]${RESET}  $*"; }
warn()    { echo -e "${YELLOW}[WARN]${RESET}  $*" >&2; }
error()   { echo -e "${RED}[ERROR]${RESET} $*" >&2; }
section() { echo -e "\n${BOLD}${CYAN}=== $* ===${RESET}"; }
ok()      { echo -e "  ${GREEN}[ok]${RESET} $*"; }
fail()    { echo -e "  ${RED}[x]${RESET} $*"; }
skip()    { echo -e "  ${YELLOW}[-]${RESET} $*"; }

run() {
  if [[ "${DRY_RUN}" -eq 1 ]]; then
    echo -e "  ${CYAN}[dry-run]${RESET} $*"
  else
    eval "$@"
  fi
}

# -------------------------------------------------------------------------------------
# KERNEL SERIES HELPERS
# -------------------------------------------------------------------------------------
# Extract the X.Y series from a proxmox kernel/header package name.
# proxmox-kernel-6.17.13-13-pve  -> 6.17
# proxmox-kernel-7.0.6-2-pve     -> 7.0
# proxmox-kernel-6.17            -> 6.17   (series meta-package)
# proxmox-default-kernel         -> ""     (selector; handled separately)
kernel_series_from_pkg() {
  local pkg="$1" rest=""
  rest="${pkg#proxmox-kernel-}"
  if [[ "$rest" == "$pkg" ]]; then
    rest="${pkg#proxmox-headers-}"
  fi
  if [[ "$rest" == "$pkg" ]]; then
    echo ""
    return
  fi
  echo "$rest" | grep -oE '^[0-9]+\.[0-9]+' || true
}

is_approved_series() {
  local s="$1" a
  for a in "${APPROVED_KERNEL_SERIES[@]}"; do
    [[ "$s" == "$a" ]] && return 0
  done
  return 1
}

# -------------------------------------------------------------------------------------
# PREREQUISITES
# -------------------------------------------------------------------------------------
check_root() {
  [[ "$(id -u)" -eq 0 ]] || { error "Must run as root. Use: sudo $0 $*"; exit 1; }
}

check_prerequisites() {
  section "Checking Prerequisites"
  local missing=()
  for cmd in systemctl apt apt-mark dpkg dpkg-query wget lspci modprobe dkms mokutil; do
    if command -v "$cmd" &>/dev/null; then
      ok "$cmd found"
    else
      fail "$cmd missing"
      missing+=("$cmd")
    fi
  done
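  # mokutil is optional (used only by the Secure Boot check). proxmox-boot-tool is
  # not checked here - the functions that need it test for it themselves.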
  if [[ ${#missing[@]} -gt 0 ]]; then
    local required_missing=()
    for m in "${missing[@]}"; do
      [[ "$m" != "mokutil" ]] && required_missing+=("$m")
    done
    if [[ ${#required_missing[@]} -gt 0 ]]; then
      error "Required tools missing: ${required_missing[*]}"
      error "Install with: apt install -y ${required_missing[*]}"
      exit 1
    fi
  fi
}

# -------------------------------------------------------------------------------------
# GPU DETECTION
# -------------------------------------------------------------------------------------
detect_nvidia_gpu() {
  section "Detecting NVIDIA GPU"
  local gpu_list
  gpu_list="$(lspci | grep -iE '(vga|3d|display)' || true)"

  if echo "$gpu_list" | grep -qi nvidia; then
    local nvidia_line
    nvidia_line="$(echo "$gpu_list" | grep -i nvidia | head -1)"
    ok "NVIDIA GPU detected: ${nvidia_line}"
    if [[ "$(echo "$gpu_list" | wc -l)" -gt 1 ]]; then
      info "All display controllers detected:"
      while IFS= read -r line; do
        info "  $line"
      done <<< "$gpu_list"
    fi
    return 0
  else
    error "No NVIDIA GPU found via lspci."
    info "lspci output:"
    echo "$gpu_list"
    exit 1
  fi
}

# -------------------------------------------------------------------------------------
# DEBIAN SOURCES - NON-FREE + TRIXIE-BACKPORTS
# -------------------------------------------------------------------------------------
ensure_debian_sources() {
  section "Ensuring Debian Sources (non-free + trixie-backports)"
  local sources_file="/etc/apt/sources.list"
  local needs_update=0

  if [[ "${DRY_RUN}" -eq 1 ]]; then
    info "[dry-run] Would verify non-free and trixie-backports in apt sources"
    return
  fi

  if grep -rq 'non-free-firmware' /etc/apt/sources.list /etc/apt/sources.list.d/ 2>/dev/null; then
    ok "non-free-firmware already in sources"
  else
    warn "non-free-firmware not found in apt sources"
    warn "Adding to ${sources_file} - review if this causes issues"
    if ! grep -q 'deb.debian.org/debian trixie' "${sources_file}" 2>/dev/null; then
      echo "" >> "${sources_file}"
      echo "# Added by setup-gpu-pxe.sh for NVIDIA drivers" >> "${sources_file}"
      echo "deb http://deb.debian.org/debian trixie main contrib non-free non-free-firmware" >> "${sources_file}"
      ok "Added Debian trixie non-free line to ${sources_file}"
    else
      if grep 'deb.debian.org/debian trixie' "${sources_file}" | grep -qv 'non-free'; then
        warn "Found trixie line without non-free - please add non-free non-free-firmware manually:"
        warn "  Edit: ${sources_file}"
        warn "  Change: 'deb http://deb.debian.org/debian trixie main'"
        warn "  To:     'deb http://deb.debian.org/debian trixie main contrib non-free non-free-firmware'"
      else
        ok "non-free appears to be present in trixie sources"
      fi
    fi
    needs_update=1
  fi

  if grep -rq 'trixie-backports' /etc/apt/sources.list /etc/apt/sources.list.d/ 2>/dev/null; then
    ok "trixie-backports already in sources"
  else
    info "Adding trixie-backports to ${sources_file}..."
    echo "" >> "${sources_file}"
    echo "# Added by setup-gpu-pxe.sh - required for NVIDIA drivers on kernel 6.17+" >> "${sources_file}"
    echo "deb http://deb.debian.org/debian trixie-backports main contrib non-free non-free-firmware" >> "${sources_file}"
    ok "trixie-backports added"
    needs_update=1
  fi

  if [[ "$needs_update" -eq 1 ]]; then
    apt_update
  fi
}

# -------------------------------------------------------------------------------------
# APT HELPERS WITH RETRY
# -------------------------------------------------------------------------------------
apt_update() {
  local attempt max_attempts=3 delay=5
  for ((attempt=1; attempt<=max_attempts; attempt++)); do
    if [[ "${DRY_RUN}" -eq 1 ]]; then
      info "[dry-run] apt-get update"
      return 0
    fi
    info "apt-get update (attempt ${attempt}/${max_attempts})..."
    if DEBIAN_FRONTEND=noninteractive apt-get update -qq; then
      return 0
    fi
    warn "apt update failed (attempt ${attempt}). Retrying in ${delay}s..."
    sleep "$delay"
    delay=$((delay * 2))
  done
  error "apt-get update failed after ${max_attempts} attempts."
  return 1
}

download_with_retry() {
  local url="$1" dest="$2"
  local attempt max_attempts=3 delay=5
  for ((attempt=1; attempt<=max_attempts; attempt++)); do
    if [[ "${DRY_RUN}" -eq 1 ]]; then
      info "[dry-run] wget -q -O '${dest}' '${url}'"
      return 0
    fi
    info "Downloading: $(basename "$url") (attempt ${attempt}/${max_attempts})..."
    if wget -q --timeout=60 -O "$dest" "$url"; then
      ok "Downloaded: $(basename "$url")"
      return 0
    fi
    warn "Download failed (attempt ${attempt}). Retrying in ${delay}s..."
    sleep "$delay"
    delay=$((delay * 2))
  done
  error "Failed to download: $url after ${max_attempts} attempts."
  return 1
}

# -------------------------------------------------------------------------------------
# NOUVEAU BLACKLIST
# -------------------------------------------------------------------------------------
write_nouveau_blacklist() {
  cat > "${NOUVEAU_BLACKLIST}" << 'BLEOF'
blacklist nouveau
options nouveau modeset=0
BLEOF
}

blacklist_nouveau() {
  section "Blacklisting Nouveau Driver"
  if [[ -f "$NOUVEAU_BLACKLIST" ]]; then
    ok "nouveau already blacklisted"
    return
  fi
  if [[ "${DRY_RUN}" -eq 1 ]]; then
    info "[dry-run] Would write ${NOUVEAU_BLACKLIST}"
    return
  fi
  write_nouveau_blacklist
  ok "nouveau blacklisted"
  update-initramfs -u -k all 2>/dev/null || true
}

# -------------------------------------------------------------------------------------
# SECURE BOOT CHECK
# -------------------------------------------------------------------------------------
check_secure_boot() {
  section "Checking Secure Boot"
  if ! command -v mokutil &>/dev/null; then
    skip "mokutil not available - skipping Secure Boot check"
    return
  fi
  local sb_state
  sb_state="$(mokutil --sb-state 2>/dev/null || echo "unknown")"
  if echo "$sb_state" | grep -qi "SecureBoot enabled"; then
    warn "Secure Boot is ENABLED."
    warn "NVIDIA DKMS modules must be signed to load."
    warn "If modules fail to load after install, enrol a MOK key:"
    warn "  openssl req -new -x509 -newkey rsa:2048 -keyout /root/mok.key"
    warn "    -out /root/mok.crt -days 3650 -subj /CN=NVIDIA-DKMS-MOK/ -nodes"
    warn "  mokutil --import /root/mok.crt"
    warn "  (reboot, enrol key in MOK manager, then reboot again)"
  else
    ok "Secure Boot: ${sb_state}"
  fi
}

# -------------------------------------------------------------------------------------
# INSTALL NVIDIA DRIVERS FROM TRIXIE-BACKPORTS
# -------------------------------------------------------------------------------------
install_nvidia_backports() {
  section "Installing NVIDIA Drivers (trixie-backports)"
  local pkgs_needed=()

  if dpkg -l pve-headers 2>/dev/null | grep -q '^ii'; then
    ok "pve-headers already installed"
  else
    pkgs_needed+=(pve-headers)
  fi

  if dpkg -l nvidia-kernel-dkms 2>/dev/null | grep -q '^ii'; then
    ok "nvidia-kernel-dkms already installed"
  else
    pkgs_needed+=(nvidia-kernel-dkms)
  fi

  if dpkg -l nvidia-driver 2>/dev/null | grep -q '^ii'; then
    ok "nvidia-driver already installed"
  else
    pkgs_needed+=(nvidia-driver)
  fi

  if dpkg -l nvidia-persistenced 2>/dev/null | grep -q '^ii'; then
    ok "nvidia-persistenced already installed"
  else
    pkgs_needed+=(nvidia-persistenced)
  fi

  if [[ ${#pkgs_needed[@]} -eq 0 ]]; then
    info "All NVIDIA driver packages already installed."
    return
  fi

  info "Installing from trixie-backports: ${pkgs_needed[*]}"
  if [[ "${DRY_RUN}" -eq 1 ]]; then
    info "[dry-run] apt-get install -t trixie-backports -y ${pkgs_needed[*]}"
    return
  fi

  local attempt max_attempts=3 delay=5
  for ((attempt=1; attempt<=max_attempts; attempt++)); do
    info "Installing (attempt ${attempt}/${max_attempts})..."
    if DEBIAN_FRONTEND=noninteractive apt-get install -t trixie-backports -y -qq "${pkgs_needed[@]}"; then
      ok "NVIDIA driver packages installed from trixie-backports"
      return 0
    fi
    warn "Install failed (attempt ${attempt}). Retrying in ${delay}s..."
    apt_update
    sleep "$delay"
    delay=$((delay * 2))
  done
  error "Failed to install NVIDIA driver packages after ${max_attempts} attempts."
  error "Check: apt-get install -t trixie-backports nvidia-kernel-dkms nvidia-driver"
  return 1
}

# -------------------------------------------------------------------------------------
# DKMS STATUS CHECK + RUNNING-KERNEL SELF-HEAL
# -------------------------------------------------------------------------------------
build_dkms_all_kernels() {
  section "Verifying DKMS Build"
  if [[ "${DRY_RUN}" -eq 1 ]]; then
    info "[dry-run] Would verify DKMS status and autoinstall for running kernel if needed"
    return
  fi

  local running_kernel
  running_kernel="$(uname -r)"
  info "Running kernel: ${running_kernel}"

  info "DKMS status:"
  local dkms_out
  dkms_out="$(dkms status 2>/dev/null || true)"
  if [[ -z "$dkms_out" ]]; then
    warn "No DKMS modules registered yet."
  else
    while IFS= read -r line; do
      if echo "$line" | grep -q "installed"; then
        ok "$line"
      else
        warn "$line"
      fi
    done <<< "$dkms_out"
  fi

  if echo "$dkms_out" | grep -F "$running_kernel" | grep -q "installed"; then
    ok "nvidia DKMS module installed for running kernel ${running_kernel}"
    return 0
  fi

  warn "No installed nvidia DKMS module for running kernel ${running_kernel}"
  warn "Attempting 'dkms autoinstall' to build it now..."
  if dkms autoinstall 2>&1 | tee /tmp/dkms_autoinstall.log; then
    if dkms status 2>/dev/null | grep -F "$running_kernel" | grep -q "installed"; then
      ok "DKMS module built for running kernel ${running_kernel}"
    else
      warn "autoinstall ran but module still not shown installed for ${running_kernel}"
      warn "If you just updated the kernel, REBOOT into it and re-run --check-only."
      warn "See /tmp/dkms_autoinstall.log for build details."
    fi
  else
    error "dkms autoinstall failed - see /tmp/dkms_autoinstall.log"
    warn "Ensure headers for ${running_kernel} are installed (proxmox-headers-*)."
  fi
}

# -------------------------------------------------------------------------------------
# MODULE AUTOLOAD
# -------------------------------------------------------------------------------------
write_module_autoload() {
  cat > "${MODULES_LOAD_FILE}" << 'MLEOF'
nvidia
nvidia_modeset
nvidia_drm
nvidia_uvm
MLEOF
}

configure_module_autoload() {
  section "Configuring Module Autoload"
  if [[ -f "$MODULES_LOAD_FILE" ]]; then
    ok "Module autoload already configured: $MODULES_LOAD_FILE"
    return
  fi
  if [[ "${DRY_RUN}" -eq 1 ]]; then
    info "[dry-run] Would write ${MODULES_LOAD_FILE}"
    return
  fi
  write_module_autoload
  ok "Module autoload configured"
}

# -------------------------------------------------------------------------------------
# PILLAR 2 - DEVICE NODE STABILITY
# -------------------------------------------------------------------------------------
write_node_creation_script() {
  cat > "${NODE_SCRIPT}" << 'NODEEOF'
#!/bin/bash
# nvidia-setup-nodes.sh - called by nvidia-setup-nodes.service at sysinit
# Ensures all /dev/nvidia* device nodes exist with correct permissions.
# This runs BEFORE any LXC containers start (see Before= in service unit).

set -euo pipefail

log() { echo "[nvidia-setup-nodes] $*"; }

if ! lsmod | grep -q '^nvidia_uvm'; then
  log "Loading nvidia_uvm module..."
  if ! modprobe nvidia_uvm 2>/tmp/nvidia_uvm_modprobe.err; then
    log "ERROR: failed to load nvidia_uvm - see /tmp/nvidia_uvm_modprobe.err"
    cat /tmp/nvidia_uvm_modprobe.err || true
    exit 1
  fi
fi

for i in $(seq 1 10); do
  grep -q 'nvidia-uvm' /proc/devices 2>/dev/null && break
  sleep 1
done

UVM_MAJOR="$(awk '$2=="nvidia-uvm"{print $1}' /proc/devices || true)"

if [[ -z "${UVM_MAJOR}" ]]; then
  log "ERROR: nvidia-uvm major not found in /proc/devices after 10 seconds"
  exit 1
fi

log "nvidia-uvm major: ${UVM_MAJOR}"

if [[ ! -c /dev/nvidia-uvm ]]; then
  [[ -e /dev/nvidia-uvm ]] && rm -f /dev/nvidia-uvm
  mknod -m 0666 /dev/nvidia-uvm c "${UVM_MAJOR}" 0
  log "Created /dev/nvidia-uvm"
fi

if [[ ! -c /dev/nvidia-uvm-tools ]]; then
  [[ -e /dev/nvidia-uvm-tools ]] && rm -f /dev/nvidia-uvm-tools
  mknod -m 0666 /dev/nvidia-uvm-tools c "${UVM_MAJOR}" 1
  log "Created /dev/nvidia-uvm-tools"
fi

chmod 0666 /dev/nvidia* 2>/dev/null || true
chmod 0666 /dev/nvidia-uvm* 2>/dev/null || true

log "All nvidia device nodes verified."
NODEEOF
  chmod 755 "${NODE_SCRIPT}"
}

install_node_creation_script() {
  info "Installing node creation script: ${NODE_SCRIPT}"
  if [[ "${DRY_RUN}" -eq 1 ]]; then
    info "[dry-run] Would write ${NODE_SCRIPT}"
    return
  fi
  write_node_creation_script
  ok "Node creation script installed"
}

write_node_service() {
  cat > "${NODE_SERVICE}" << 'SVCEOF'
[Unit]
Description=NVIDIA Device Node Setup (must run before LXC containers)
Documentation=https://github.com/braedach/homelab
After=systemd-modules-load.service
Before=lxc.service
Before=pve-container@.service
ConditionPathExists=/proc/devices
DefaultDependencies=no

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/usr/local/sbin/nvidia-setup-nodes.sh
StandardOutput=journal
StandardError=journal

[Install]
WantedBy=sysinit.target
SVCEOF
}

install_node_service() {
  info "Installing systemd service: ${NODE_SERVICE}"
  if [[ "${DRY_RUN}" -eq 1 ]]; then
    info "[dry-run] Would write ${NODE_SERVICE}"
    return
  fi
  write_node_service
  ok "Service unit installed"
}

write_udev_rules() {
  cat > "${UVM_RULE_FILE}" << 'UDEVEOF'
# nvidia-uvm device permissions - managed by setup-gpu-pxe.sh
KERNEL=="nvidia-uvm",       MODE="0666"
KERNEL=="nvidia-uvm-tools", MODE="0666"
KERNEL=="nvidia*",          MODE="0666"
UDEVEOF
}

install_udev_rules() {
  info "Installing udev rules: ${UVM_RULE_FILE}"
  if [[ "${DRY_RUN}" -eq 1 ]]; then
    info "[dry-run] Would write ${UVM_RULE_FILE}"
    return
  fi
  write_udev_rules
  ok "udev rules installed"
  udevadm control --reload-rules 2>/dev/null || true
  udevadm trigger 2>/dev/null || true
}

enable_nvidia_persistenced() {
  section "Enabling nvidia-persistenced"
  if ! systemctl list-unit-files nvidia-persistenced.service &>/dev/null; then
    warn "nvidia-persistenced.service not present - package may not be installed yet."
    warn "It is installed by install_nvidia_backports; re-run after a reboot if missing."
    return
  fi
  if systemctl is-enabled nvidia-persistenced &>/dev/null; then
    ok "nvidia-persistenced already enabled"
  else
    run "systemctl enable nvidia-persistenced 2>/dev/null || true"
    run "systemctl start  nvidia-persistenced 2>/dev/null || true"
    ok "nvidia-persistenced enabled"
  fi
}

setup_device_stability() {
  section "Setting Up Device Node Stability (Pillar 2)"
  install_node_creation_script
  install_node_service
  install_udev_rules

  if [[ "${DRY_RUN}" -eq 0 ]]; then
    systemctl daemon-reload
    systemctl enable nvidia-setup-nodes.service
    systemctl restart nvidia-setup-nodes.service || true
    ok "nvidia-setup-nodes.service enabled and started"
  else
    info "[dry-run] systemctl daemon-reload && systemctl enable nvidia-setup-nodes.service"
  fi

  enable_nvidia_persistenced
}

# -------------------------------------------------------------------------------------
# MODULE LOAD + VERIFICATION
# -------------------------------------------------------------------------------------
warm_up_gpu() {
  # Touch the GPU so any idle-unloaded modules reload before we sample lsmod.
  if command -v nvidia-smi &>/dev/null; then
    nvidia-smi >/dev/null 2>&1 || true
  fi
}

load_and_verify_modules() {
  section "Loading and Verifying NVIDIA Modules"
  if [[ "${DRY_RUN}" -eq 1 ]]; then
    info "[dry-run] Would load and verify nvidia modules"
    return
  fi

  local modules=(nvidia nvidia_modeset nvidia_drm nvidia_uvm)
  local failed=()
  for mod in "${modules[@]}"; do
    if lsmod | grep -q "^${mod}[[:space:]]"; then
      ok "Module loaded: $mod"
    else
      info "Loading module: $mod"
      if modprobe "$mod" 2>/tmp/modprobe_err_${mod}.log; then
        ok "Module loaded: $mod"
      else
        fail "Module failed: $mod (see /tmp/modprobe_err_${mod}.log)"
        failed+=("$mod")
      fi
    fi
  done

  if [[ ${#failed[@]} -gt 0 ]]; then
    warn "Some modules failed to load: ${failed[*]}"
    warn "This is expected if a reboot is needed after DKMS build."
    warn "Reboot and re-run with --check-only to verify."
  fi
}

run_nvidia_smi_check() {
  if [[ "${DRY_RUN}" -eq 1 ]]; then
    info "[dry-run] Would run nvidia-smi"
    return
  fi
  section "nvidia-smi Verification"
  if command -v nvidia-smi &>/dev/null; then
    if nvidia-smi; then
      ok "nvidia-smi successful"
    else
      warn "nvidia-smi returned an error - reboot may be required"
    fi
  else
    warn "nvidia-smi not found - driver may need a reboot to activate"
  fi
}

# -------------------------------------------------------------------------------------
# BOOT PIN HELPERS
# -------------------------------------------------------------------------------------
get_pinned_kernel() {
  if command -v proxmox-boot-tool &>/dev/null; then
    proxmox-boot-tool kernel list 2>/dev/null \
      | awk '/^Pinned kernel:/{getline; gsub(/^[ \t]+/,""); print; exit}'
  fi
}

pin_running_kernel() {
  section "Pinning Boot Kernel (Guard Layer 3)"
  local rk; rk="$(uname -r)"

  if ! command -v proxmox-boot-tool &>/dev/null; then
    warn "proxmox-boot-tool not found - cannot auto-pin."
    warn "Ensure your bootloader defaults to ${rk} manually."
    return
  fi

  # Never pin a kernel that has no built NVIDIA module.
  if ! dkms status 2>/dev/null | grep -F "$rk" | grep -q installed; then
    warn "Running kernel ${rk} has no installed nvidia DKMS module - NOT pinning."
    warn "Resolve the driver build first, then pin manually:"
    warn "  proxmox-boot-tool kernel pin ${rk}"
    return
  fi

  local current_pin; current_pin="$(get_pinned_kernel || true)"
  if [[ "$current_pin" == "$rk" ]]; then
    ok "Boot kernel already pinned to ${rk}"
    return
  fi

  if [[ "${DRY_RUN}" -eq 1 ]]; then
    info "[dry-run] proxmox-boot-tool kernel pin ${rk}"
    return
  fi

  info "Pinning running kernel: ${rk}"
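  # Report success only if the pin command actually succeeded.
  if proxmox-boot-tool kernel pin "${rk}"; then
    ok "Boot kernel pinned to ${rk}"
  else
    warn "Pin command returned an error - boot kernel is NOT confirmed pinned."
  fi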
}

# -------------------------------------------------------------------------------------
# GUARD LAYER 1 - PREFLIGHT GATE
# -------------------------------------------------------------------------------------
# Returns 0 if safe, 2 if an unapproved kernel series would be installed.
do_preflight() {
  section "Preflight - Kernel Series Guard (Layer 1)"
  info "Approved kernel series: ${APPROVED_KERNEL_SERIES[*]}"
  info "Simulating: apt-get full-upgrade ..."

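  # Fail closed: if the simulation itself fails (e.g. dpkg is wedged), the
  # upgrade cannot be vetted, so treat it as blocked rather than passing it.
  local sim
  if ! sim="$(LANG=C apt-get -s full-upgrade 2>/dev/null)"; then
    error "apt-get simulation failed - cannot vet this upgrade. Treating as BLOCKED."
    return 2
  fi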

  local incoming
  incoming="$(echo "$sim" | awk '/^Inst /{print $2}' || true)"

  local kernel_incoming=() bad=() selector_moving=0
  while IFS= read -r pkg; do
    [[ -z "$pkg" ]] && continue
    case "$pkg" in
      proxmox-default-kernel|proxmox-default-headers) selector_moving=1 ;;
    esac
    if [[ "$pkg" == proxmox-kernel-* || "$pkg" == proxmox-headers-* ]]; then
      kernel_incoming+=("$pkg")
      local s; s="$(kernel_series_from_pkg "$pkg")"
      if [[ -n "$s" ]] && ! is_approved_series "$s"; then
        bad+=("$pkg (series ${s})")
      fi
    fi
  done <<< "$incoming"

  if [[ ${#kernel_incoming[@]} -gt 0 ]]; then
    info "Kernel/header packages this upgrade would install:"
    for p in "${kernel_incoming[@]}"; do info "  $p"; done
  else
    ok "No kernel or header packages in this upgrade."
  fi

  if [[ "$selector_moving" -eq 1 ]]; then
    warn "proxmox-default-kernel/headers would change - the DEFAULT series is moving."
    warn "This is exactly how the 7.0 series jump arrived. Inspect carefully."
  fi

  if [[ ${#bad[@]} -gt 0 ]]; then
    echo ""
    error "=================================================================="
    error " BLOCKED - upgrade would install an UNAPPROVED kernel series"
    error "=================================================================="
    for b in "${bad[@]}"; do error "  $b"; done
    error ""
    error "Approved series: ${APPROVED_KERNEL_SERIES[*]}"
    error "An unapproved series will FAIL the NVIDIA DKMS build, can wedge dpkg,"
    error "and can leave a bootable-but-broken kernel as the GRUB default."
    error ""
    error "DO NOT run 'apt full-upgrade' until you have either:"
    error "  (a) applied the guard:   sudo $SCRIPT_NAME --apply-guard"
    error "      (holds the offending packages so apt keeps them back), OR"
    error "  (b) CONFIRMED NVIDIA supports the new series, then approved it"
    error "      by editing APPROVED_KERNEL_SERIES and running --remove-guard."
    error "=================================================================="
    return 2
  fi

  ok "Preflight PASSED - no unapproved kernel series incoming."
  ok "Safe to proceed with: apt full-upgrade"
  return 0
}

# -------------------------------------------------------------------------------------
# GUARD LAYER 2 - APT HOLDS  (+ triggers Layer 3 pin)
# -------------------------------------------------------------------------------------
build_hold_list() {
  # Echo (one per line) the packages that should be held:
  #   - the selector meta-packages (always)
  #   - any installed kernel/header packages from an unapproved series
  local p
  for p in "${SELECTOR_PACKAGES[@]}"; do echo "$p"; done
  while IFS= read -r pkg; do
    [[ -z "$pkg" ]] && continue
    local s; s="$(kernel_series_from_pkg "$pkg")"
    [[ -z "$s" ]] && continue
    if ! is_approved_series "$s"; then echo "$pkg"; fi
  done < <(dpkg-query -W -f='${Package}\n' 'proxmox-kernel-*' 'proxmox-headers-*' 2>/dev/null | sort -u || true)
}

apply_kernel_guard() {
  section "Applying Kernel Series Guard (Layers 2 + 3)"
  info "Approved kernel series: ${APPROVED_KERNEL_SERIES[*]}"

  local raw uniq=() seen=" "
  raw="$(build_hold_list)"
  while IFS= read -r p; do
    [[ -z "$p" ]] && continue
    if [[ "$seen" != *" $p "* ]]; then uniq+=("$p"); seen+="$p "; fi
  done <<< "$raw"

  if [[ ${#uniq[@]} -eq 0 ]]; then
    warn "No packages resolved for holding (unexpected)."
  else
    info "Packages to hold (apt will keep these back on upgrade):"
    for p in "${uniq[@]}"; do info "  $p"; done
    if [[ "${DRY_RUN}" -eq 1 ]]; then
      info "[dry-run] apt-mark hold ${uniq[*]}"
    else
      apt-mark hold "${uniq[@]}" || warn "Some holds may have failed (package not installed?)."
      ok "Holds applied. 'proxmox-default-kernel' will now show as kept-back."
      ok "6.17 security point-updates STILL flow (proxmox-kernel-6.17 is not held)."
    fi
  fi

  pin_running_kernel
}

# -------------------------------------------------------------------------------------
# GUARD - REMOVE
# -------------------------------------------------------------------------------------
remove_kernel_guard() {
  section "Removing Kernel Series Guard"
  warn "This lifts the apt holds that protect you from an unsupported kernel series."
  warn "Only do this once you have CONFIRMED NVIDIA builds against the new series"
  warn "AND added that series to APPROVED_KERNEL_SERIES."
  echo ""

  local held
  held="$(apt-mark showhold 2>/dev/null | grep -E '^proxmox-(default|kernel|headers)-' || true)"
  if [[ -z "$held" ]]; then
    ok "No matching proxmox kernel/header holds present."
    return
  fi

  info "Currently held:"
  echo "$held" | while read -r p; do info "  $p"; done
  echo ""

  if [[ "${DRY_RUN}" -eq 1 ]]; then
    info "[dry-run] apt-mark unhold ${held//$'\n'/ }"
    return
  fi

  read -r -p "Type 'unhold' to confirm lifting these holds: " confirm
  [[ "$confirm" == "unhold" ]] || { info "Cancelled - holds left in place."; return; }

  # shellcheck disable=SC2086
  apt-mark unhold $held || warn "Some unholds may have failed."
  ok "Holds removed. Re-run --preflight BEFORE upgrading."
  info "The boot-kernel pin (if set) is unchanged - manage via proxmox-boot-tool."
}

# -------------------------------------------------------------------------------------
# GUARD - STATUS
# -------------------------------------------------------------------------------------
guard_status() {
  section "Kernel Guard Status"

  echo ""
  echo "-- Approved Kernel Series -------------------------------"
  info "${APPROVED_KERNEL_SERIES[*]}"

  echo ""
  echo "-- Running Kernel ---------------------------------------"
  info "$(uname -r)"

  echo ""
  echo "-- apt Holds (Layer 2) ----------------------------------"
  local holds
  holds="$(apt-mark showhold 2>/dev/null | grep -E '^proxmox-(default|kernel|headers)-' || true)"
  if [[ -n "$holds" ]]; then
    echo "$holds" | while read -r p; do ok "$p [held]"; done
  else
    fail "No proxmox kernel/header holds set - guard Layer 2 NOT active."
    warn "Apply with: sudo $SCRIPT_NAME --apply-guard"
  fi

  echo ""
  echo "-- Boot Kernel Pin (Layer 3) ----------------------------"
  if command -v proxmox-boot-tool &>/dev/null; then
    local pinned; pinned="$(get_pinned_kernel || true)"
    if [[ -n "$pinned" && "$pinned" != "None." ]]; then
      ok "Pinned kernel: ${pinned}"
      [[ "$pinned" == "$(uname -r)" ]] || warn "Pinned kernel differs from running kernel."
    else
      fail "No boot kernel pinned - guard Layer 3 NOT active."
      warn "Apply with: sudo $SCRIPT_NAME --apply-guard"
    fi
  else
    skip "proxmox-boot-tool not available - cannot report pin state."
  fi

  echo ""
  echo "-- Installed Proxmox Kernels ----------------------------"
  local pk
  while IFS= read -r pk; do
    [[ -z "$pk" ]] && continue
    local s; s="$(kernel_series_from_pkg "$pk")"
    if [[ -n "$s" ]] && is_approved_series "$s"; then
      ok "$pk  (series ${s}, approved)"
    elif [[ -n "$s" ]]; then
      warn "$pk  (series ${s}, NOT approved - should be held)"
    fi
  done < <(dpkg-query -W -f='${db:Status-Abbrev} ${Package}\n' 'proxmox-kernel-[0-9]*' 2>/dev/null | awk '$1 == "ii" {print $2}' | sort -u || true)

  echo ""
  echo "-- Stale Boot Entries -----------------------------------"
  local se; se="$(stale_boot_kernels || true)"
  if [[ -n "$se" ]]; then
    echo "$se" | while read -r k; do warn "$k (boot entry present, package not installed) - run --cleanup-boot"; done
  else
    ok "None."
  fi
}

# -------------------------------------------------------------------------------------
# PILLAR 3 - LXC CONTAINER CONFIGURATION
# -------------------------------------------------------------------------------------
print_lxc_config() {
  local vmid="${1:-<VMID>}"
  echo ""
  echo "======================================================="
  echo "  LXC GPU Passthrough Configuration"
  echo "  Container: ${vmid}"
  echo "  File: /etc/pve/lxc/${vmid}.conf"
  echo "======================================================="
  echo ""
  echo "# NVIDIA GPU passthrough - generated by setup-gpu-pxe.sh v${SCRIPT_VERSION}"
  echo "# Uses Proxmox 8.1+ dev* syntax (handles device type detection automatically)"
  echo "# gid=${LXC_VIDEO_GID} must match the 'video' group inside the container (44 on Debian)"
  echo "# Verify with: getent group video  (inside the container)"
  echo "#"
  echo "dev0: /dev/nvidia0,gid=${LXC_VIDEO_GID}"
  echo "dev1: /dev/nvidiactl,gid=${LXC_VIDEO_GID}"
  echo "dev2: /dev/nvidia-modeset,gid=${LXC_VIDEO_GID}"
  echo "dev3: /dev/nvidia-uvm,gid=${LXC_VIDEO_GID}"
  echo "dev4: /dev/nvidia-uvm-tools,gid=${LXC_VIDEO_GID}"
  echo "dev5: /dev/nvidia-caps/nvidia-cap1,gid=${LXC_VIDEO_GID}"
  echo "dev6: /dev/nvidia-caps/nvidia-cap2,gid=${LXC_VIDEO_GID}"
  echo ""
  echo "# Also recommended: container startup delay to allow host nodes to settle"
  echo "# startup: order=2,up=15"
  echo ""
  echo "======================================================="
  echo ""
  echo "NOTES:"
  echo "  1. Do NOT install the full NVIDIA driver package inside the container."
  echo "     Only install userspace libraries (libnvidia-compute-*) if needed."
  echo "  2. The dev* directive handles cgroup2 permissions automatically."
  echo "     You do NOT need separate lxc.cgroup2.devices.allow lines."
  echo "  3. If you have multiple GPUs (e.g. dual RTX), increment the dev* index"
  echo "     and add: dev7: /dev/nvidia1,gid=${LXC_VIDEO_GID}"
  echo "  4. After applying config, restart the container:"
  echo "     pct stop ${vmid} && pct start ${vmid}"
  echo "  5. Verify inside the container:"
  echo "     ls -la /dev/nvidia*"
  echo "     nvidia-smi"
  echo ""
}

# -------------------------------------------------------------------------------------
# PURGE
# -------------------------------------------------------------------------------------
do_purge() {
  section "Purging All NVIDIA Components"
  warn "This will remove ALL NVIDIA packages and configuration."
  read -r -p "Are you sure? (yes/no): " confirm
  [[ "$confirm" == "yes" ]] || { info "Purge cancelled."; exit 0; }

  run "systemctl stop nvidia-setup-nodes.service 2>/dev/null || true"
  run "systemctl disable nvidia-setup-nodes.service 2>/dev/null || true"
  run "systemctl stop nvidia-persistenced 2>/dev/null || true"
  run "systemctl disable nvidia-persistenced 2>/dev/null || true"

  run "rm -f '${NODE_SERVICE}' '${NODE_SCRIPT}' '${UVM_RULE_FILE}' '${MODULES_LOAD_FILE}' '${NOUVEAU_BLACKLIST}'"

  if [[ "${DRY_RUN}" -eq 0 ]]; then
    systemctl daemon-reload
    udevadm control --reload-rules 2>/dev/null || true
  fi

  local nvidia_pkgs
  nvidia_pkgs="$(dpkg -l '*nvidia*' '*libnvidia*' 2>/dev/null | awk '/^ii/{print $2}' || true)"

  if [[ -n "$nvidia_pkgs" ]]; then
    info "Removing packages:"
    echo "$nvidia_pkgs" | while read -r p; do info "  $p"; done
    if [[ "${DRY_RUN}" -eq 0 ]]; then
      # shellcheck disable=SC2086
      DEBIAN_FRONTEND=noninteractive apt-get purge -y $nvidia_pkgs || true
      DEBIAN_FRONTEND=noninteractive apt-get autoremove -y || true
    else
      info "[dry-run] apt-get purge -y ${nvidia_pkgs//$'\n'/ }"
    fi
  fi

  ok "Purge complete. You may want to reboot."
  info "Note: --purge does NOT touch kernel guard holds. Use --remove-guard for those."
}

# -------------------------------------------------------------------------------------
# HEALTH CHECK / --check-only
# -------------------------------------------------------------------------------------
do_check() {
  section "NVIDIA System Health Check"

  # Warm the GPU first so idle-unloaded modules reload before we sample state.
  warm_up_gpu
  local smi_ok=0
  if command -v nvidia-smi &>/dev/null && nvidia-smi &>/dev/null; then
    smi_ok=1
  fi

  echo ""
  echo "-- Running Kernel ---------------------------------------"
  info "$(uname -r)"

  echo ""
  echo "-- Kernel Modules ---------------------------------------"
  for mod in nvidia nvidia_modeset nvidia_drm nvidia_uvm; do
    if lsmod | grep -q "^${mod}[[:space:]]"; then
      ok "$mod loaded"
    elif [[ "$smi_ok" -eq 1 ]]; then
      skip "$mod not currently loaded (driver functional; loads on access)"
    else
      fail "$mod NOT loaded"
    fi
  done

  echo ""
  echo "-- Device Nodes -----------------------------------------"
  for dev in /dev/nvidia0 /dev/nvidiactl /dev/nvidia-modeset /dev/nvidia-uvm /dev/nvidia-uvm-tools; do
    if [[ -c "$dev" ]]; then
      local info_str; info_str="$(ls -la "$dev" 2>/dev/null)"
      ok "$dev  ->  $info_str"
    elif [[ -e "$dev" ]]; then
      fail "$dev EXISTS but is NOT a character device (stub file - timing bug)"
    else
      fail "$dev MISSING"
    fi
  done
  for cap in /dev/nvidia-caps/nvidia-cap1 /dev/nvidia-caps/nvidia-cap2; do
    if [[ -c "$cap" ]]; then
      ok "$cap  ->  $(ls -la "$cap" 2>/dev/null)"
    else
      skip "$cap not found (normal on GPUs without MIG/caps support)"
    fi
  done

  echo ""
  echo "-- systemd Services -------------------------------------"
  for svc in nvidia-setup-nodes.service nvidia-persistenced.service; do
    local state; state="$(systemctl is-active "$svc" 2>/dev/null || echo "inactive")"
    local enabled; enabled="$(systemctl is-enabled "$svc" 2>/dev/null || echo "disabled")"
    if [[ "$state" == "active" && "$enabled" == "enabled" ]]; then
      ok "$svc  [active/enabled]"
    else
      fail "$svc  [${state}/${enabled}]"
    fi
  done

  echo ""
  echo "-- DKMS Status (running kernel = $(uname -r)) -----------"
  if command -v dkms &>/dev/null; then
    local rk; rk="$(uname -r)"
    dkms status | while IFS= read -r line; do
      if echo "$line" | grep -q "installed"; then
        ok "$line"
      else
        fail "$line"
      fi
    done
    # Filter on nvidia first so another DKMS module built for this kernel cannot pass the check.
    if dkms status 2>/dev/null | grep -i '^nvidia' | grep -F "$rk" | grep -q "installed"; then
      ok "nvidia module present for running kernel ${rk}"
    else
      fail "no installed nvidia module for running kernel ${rk} - run a full install or --force-rebuild"
    fi
  else
    skip "dkms not installed"
  fi

  echo ""
  echo "-- nvidia-smi -------------------------------------------"
  if command -v nvidia-smi &>/dev/null; then
    nvidia-smi || warn "nvidia-smi failed"
  else
    fail "nvidia-smi not found"
  fi

  echo ""
  echo "-- Kernel Guard -----------------------------------------"
  local holds pin
  holds="$(apt-mark showhold 2>/dev/null | grep -E '^proxmox-(default|kernel|headers)-' || true)"
  if [[ -n "$holds" ]]; then
    ok "apt holds active (Layer 2):"
    echo "$holds" | while read -r p; do ok "  $p"; done
  else
    fail "No proxmox kernel/header apt holds - guard Layer 2 NOT active (run --apply-guard)"
  fi
  if command -v proxmox-boot-tool &>/dev/null; then
    pin="$(get_pinned_kernel || true)"
    if [[ -n "$pin" && "$pin" != "None." ]]; then
      ok "Boot kernel pinned (Layer 3): ${pin}"
    else
      fail "No boot kernel pinned - guard Layer 3 NOT active (run --apply-guard)"
    fi
  fi
  local sb; sb="$(stale_boot_kernels || true)"
  if [[ -n "$sb" ]]; then
    echo "$sb" | while read -r k; do warn "stale boot entry: $k - run --cleanup-boot"; done
  else
    ok "No stale boot entries"
  fi

  echo ""
  echo "-- Installed Files --------------------------------------"
  for f in "$NODE_SERVICE" "$NODE_SCRIPT" "$UVM_RULE_FILE" "$NOUVEAU_BLACKLIST" "$MODULES_LOAD_FILE"; do
    if [[ -f "$f" ]]; then
      ok "$f"
    else
      fail "$f MISSING"
    fi
  done

  echo ""
  print_lxc_config "YOUR_VMID"
}

# -------------------------------------------------------------------------------------
# FORCE REBUILD
# -------------------------------------------------------------------------------------
do_force_rebuild() {
  section "Force Rebuilding DKMS Modules"
  if [[ "${DRY_RUN}" -eq 1 ]]; then
    info "[dry-run] dkms autoinstall --force"
    return
  fi
  info "Forcing DKMS rebuild (dkms autoinstall targets the running kernel unless -k is given)..."
  dkms autoinstall --force 2>&1 | tee /tmp/dkms_rebuild.log || true
  info "Force rebuild finished. Exit status is not checked here - review /tmp/dkms_rebuild.log for errors."
}

# -------------------------------------------------------------------------------------
# HELP
# -------------------------------------------------------------------------------------
show_help() {
  cat <<EOF

${BOLD}setup-gpu-pxe.sh v${SCRIPT_VERSION}${RESET}
Proxmox VE NVIDIA Driver Host Installer + LXC GPU Configurator + Kernel Guard

${BOLD}USAGE${RESET}
  sudo $SCRIPT_NAME [OPTION]

${BOLD}INSTALL / VERIFY${RESET}
  (none)              Full install - drivers, services, nodes, THEN applies guard
  --check-only        System + GPU health (also reports guard status)
  --force-rebuild     Force DKMS module rebuild (after a kernel update)
  --lxc-config [ID]   Print LXC container config snippet (optional VMID)

${BOLD}KERNEL GUARD${RESET}
  --preflight         Simulate full-upgrade; BLOCK if an unapproved series is
                      incoming. Run this BEFORE 'apt full-upgrade'.
  --apply-guard       apt-mark hold the series selector + unapproved-series
                      packages, then pin the running kernel.
  --guard-status      Show approved series, apt holds, and boot pin.
  --remove-guard      Lift the apt holds (after approving a new series).
  --repoint-selector [VER]
                      Move proxmox-default-kernel/headers to an approved-series
                      version and re-hold (the "get off 7.0 as default" fix).
  --cleanup-boot      Reconcile the ESP and remove stale boot entries.
  --prune-dkms        Remove DKMS builds for kernels no longer installed.

${BOLD}MAINTENANCE${RESET}
  --purge             Remove all NVIDIA components (does not touch holds)
  --dry-run           Preview any action without making changes
  --help              Show this help

${BOLD}APPROVED KERNEL SERIES${RESET}  (edit APPROVED_KERNEL_SERIES near the top)
  ${APPROVED_KERNEL_SERIES[*]}

${BOLD}KERNEL UPDATE SOP (manual, production)${RESET}
  The boot pin holds you on the current kernel, so a new kernel does NOT auto-boot.
  Pin the NEW kernel BEFORE rebooting so a SINGLE reboot lands on it.

   1. apt update
   2. sudo $SCRIPT_NAME --preflight          hard gate; STOP if it prints BLOCKED
   3. apt full-upgrade                       installs new kernel; DKMS builds NVIDIA
   4. dkms status                            confirm NEW kernel shows 'installed'
                                             (do this while still on the OLD kernel)
   5. proxmox-boot-tool kernel pin <NEW>     re-pin BEFORE rebooting
   6. reboot
   7. uname -r                               confirm you are on <NEW>
   8. sudo $SCRIPT_NAME --check-only         verify GPU on the new kernel
   9. sudo $SCRIPT_NAME --apply-guard        reconfirm holds + pin (running kernel)
  10. start passthrough LXCs; nvidia-smi inside each

  Optional cleanup, once the new kernel is proven and a spare remains:
   - sudo $SCRIPT_NAME --prune-dkms          drop DKMS builds for removed kernels
   - sudo $SCRIPT_NAME --cleanup-boot        clear stale boot entries

EOF
}

# -------------------------------------------------------------------------------------
# BOOT ENTRY / SELECTOR HELPERS (v4.1.0)
# -------------------------------------------------------------------------------------
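# Kernel version strings (matching uname -r) for installed proxmox-kernel packages.
# Only packages in dpkg state "ii" count: a removed kernel that still has config files
# ("rc") is known to dpkg-query but is not bootable, so it must not mask a stale entry.
installed_kernel_versions() {
  local pkg v
  while IFS= read -r pkg; do
    [[ -z "$pkg" ]] && continue
    v="${pkg#proxmox-kernel-}"
    v="${v%-signed}"
    echo "$v"
  done < <(dpkg-query -W -f='${db:Status-Abbrev} ${Package}\n' 'proxmox-kernel-[0-9]*' 2>/dev/null \
           | awk '$1 == "ii" && $2 ~ /-pve(-signed)?$/ {print $2}' | sort -u || true)
}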

# Kernels registered for boot in proxmox-boot-tool's "Automatically selected" list.
boot_listed_kernels() {
  command -v proxmox-boot-tool &>/dev/null || return 0
  proxmox-boot-tool kernel list 2>/dev/null \
    | awk '/[Aa]utomatically selected kernels:/{f=1;next} /Pinned kernel:/{f=0} f&&NF{gsub(/^[ \t]+/,"");print}'
}

# Boot-listed kernels whose package is NOT installed, excluding running + pinned.
stale_boot_kernels() {
  command -v proxmox-boot-tool &>/dev/null || return 0
  local running pinned inst k
  running="$(uname -r)"
  pinned="$(get_pinned_kernel || true)"
  inst=" $(installed_kernel_versions | tr '\n' ' ') "
  while IFS= read -r k; do
    [[ -z "$k" ]] && continue
    [[ "$k" == "$running" ]] && continue
    [[ "$k" == "$pinned"  ]] && continue
    [[ "$inst" == *" $k "* ]] && continue
    echo "$k"
  done < <(boot_listed_kernels)
}

# Emit "VERSION SERIES" for each available proxmox-default-kernel version.
list_selector_versions() {
  apt-cache show proxmox-default-kernel 2>/dev/null | awk '
    /^Version:/ { ver=$2 }
    /^Depends:/ {
      if (match($0, /proxmox-kernel-[0-9]+\.[0-9]+/)) {
        s=substr($0, RSTART, RLENGTH); sub(/proxmox-kernel-/,"",s);
        print ver, s
      }
    }'
}

# -------------------------------------------------------------------------------------
# --cleanup-boot : reconcile ESP and drop stale boot entries
# -------------------------------------------------------------------------------------
cleanup_boot() {
  section "Boot Kernel Cleanup"
  if ! command -v proxmox-boot-tool &>/dev/null; then
    warn "proxmox-boot-tool not found - nothing to reconcile."
    return
  fi
  local running pinned
  running="$(uname -r)"; pinned="$(get_pinned_kernel || true)"
  info "Running kernel: ${running}"
  info "Pinned kernel:  ${pinned:-<none>}"

  if [[ "${DRY_RUN}" -eq 1 ]]; then
    info "[dry-run] proxmox-boot-tool clean"
  else
    info "Reconciling ESP (proxmox-boot-tool clean)..."
    proxmox-boot-tool clean || warn "clean returned an error."
  fi

  local stale=() k
  while IFS= read -r k; do [[ -n "$k" ]] && stale+=("$k"); done < <(stale_boot_kernels)

  if [[ ${#stale[@]} -eq 0 ]]; then
    ok "No stale boot entries (every listed kernel is installed / running / pinned)."
  else
    warn "Stale boot entries (listed for boot but package not installed):"
    for k in "${stale[@]}"; do warn "  $k"; done
    if [[ "${DRY_RUN}" -eq 1 ]]; then
      for k in "${stale[@]}"; do info "[dry-run] proxmox-boot-tool kernel remove $k"; done
    else
      for k in "${stale[@]}"; do
        info "Removing stale boot entry: $k"
        proxmox-boot-tool kernel remove "$k" || warn "  remove failed for $k"
      done
    fi
  fi

  if [[ "${DRY_RUN}" -eq 1 ]]; then
    info "[dry-run] proxmox-boot-tool refresh"
  else
    info "Refreshing boot configuration..."
    proxmox-boot-tool refresh || warn "refresh returned an error."
  fi

  echo ""
  info "Current kernel list:"
  proxmox-boot-tool kernel list 2>/dev/null || true
}

# -------------------------------------------------------------------------------------
# --prune-dkms : remove DKMS builds for kernels that are no longer installed
# -------------------------------------------------------------------------------------
# After a kernel update, an old kernel's package may be gone while its NVIDIA DKMS
# build lingers in 'dkms status'. This removes those obsolete builds. It NEVER touches
# the build for the running kernel or for any kernel whose package is still installed.
prune_dkms() {
  section "Pruning Obsolete DKMS Builds"
  if ! command -v dkms &>/dev/null; then
    warn "dkms not found - nothing to do."
    return
  fi

  local running inst
  running="$(uname -r)"
  inst=" $(installed_kernel_versions | tr '\n' ' ') "

  # dkms status line: "nvidia-current/550.163.01, 6.17.4-2-pve, x86_64: installed"
  local obsolete=() line mod kern
  while IFS= read -r line; do
    [[ -z "$line" ]] && continue
    mod="$(echo "$line"  | awk -F, '{print $1}' | xargs)"   # nvidia-current/550.163.01
    kern="$(echo "$line" | awk -F, '{print $2}' | xargs)"   # 6.17.4-2-pve
    [[ -z "$mod" || -z "$kern" ]] && continue
    [[ "$kern" =~ ^[0-9]+\.[0-9]+ ]] || continue            # sanity: looks like a version
    [[ "$kern" == "$running" ]] && continue                 # never the running kernel
    [[ "$inst" == *" $kern "* ]] && continue                # never an installed kernel
    obsolete+=("${mod}|${kern}")
  done < <(dkms status 2>/dev/null || true)

  if [[ ${#obsolete[@]} -eq 0 ]]; then
    ok "No obsolete DKMS builds (every build is for an installed or running kernel)."
    return
  fi

  info "Obsolete DKMS builds (kernel no longer installed):"
  local entry
  for entry in "${obsolete[@]}"; do
    info "  ${entry%|*}  for kernel  ${entry#*|}"
  done

  if [[ "${DRY_RUN}" -eq 1 ]]; then
    for entry in "${obsolete[@]}"; do
      info "[dry-run] dkms remove ${entry%|*} -k ${entry#*|}"
    done
    return
  fi

  read -r -p "Remove the DKMS builds listed above? Type 'yes': " c
  [[ "$c" == "yes" ]] || { info "Cancelled - no changes made."; return; }

  for entry in "${obsolete[@]}"; do
    mod="${entry%|*}"; kern="${entry#*|}"
    info "Removing: ${mod} for kernel ${kern}"
    dkms remove "${mod}" -k "${kern}" || warn "  remove failed for ${mod} / ${kern}"
  done
  ok "Obsolete DKMS builds removed."
  info "Current dkms status:"
  dkms status 2>/dev/null || true
}

# -------------------------------------------------------------------------------------
# --repoint-selector [VER] : move proxmox-default-kernel/headers to an approved series
# -------------------------------------------------------------------------------------
# Automates the manual fix used to get OFF the 7.0 default without touching proxmox-ve:
#   apt-get install proxmox-default-kernel=2.0.2 proxmox-default-headers=2.0.2
#   apt-mark hold  proxmox-default-kernel proxmox-default-headers
# Downgrading the SELECTOR meta orphans the unapproved-series kernel, which can then be
# removed with 'apt-get autoremove' + '--cleanup-boot'. proxmox-ve stays satisfied
# because the approved-series selector still fulfils its dependency.
do_repoint_selector() {
  section "Repoint Default-Kernel Selector to an Approved Series"
  local want="${REPOINT_VERSION:-}"

  local rows; rows="$(list_selector_versions | sort -rV || true)"
  if [[ -z "$rows" ]]; then
    warn "Could not read proxmox-default-kernel versions from apt-cache."
    warn "Run 'apt update' first, then retry."
    return
  fi

  info "Available proxmox-default-kernel versions (version -> kernel series):"
  while read -r v s; do
    [[ -z "$v" ]] && continue
    if is_approved_series "$s"; then ok "  $v -> $s (approved)"; else warn "  $v -> $s (NOT approved)"; fi
  done <<< "$rows"

  local cur cur_series
  cur="$(dpkg-query -W -f='${Version}' proxmox-default-kernel 2>/dev/null || true)"
  cur_series="$(echo "$rows" | awk -v v="$cur" '$1==v{print $2; exit}')"
  echo ""
  info "Currently installed: ${cur:-<none>} (series ${cur_series:-unknown})"

  if [[ -n "$cur_series" ]] && is_approved_series "$cur_series"; then
    ok "Selector already points at an approved series - nothing to do."
    return
  fi

  # Choose target: explicit VER, else highest available approved-series version.
  local target="$want"
  if [[ -z "$target" ]]; then
    while read -r v s; do
      if is_approved_series "$s"; then target="$v"; break; fi   # rows sorted desc
    done <<< "$rows"
  fi
  if [[ -z "$target" ]]; then
    error "No available selector version depends on an approved series (${APPROVED_KERNEL_SERIES[*]})."
    error "Either an approved-series meta isn't in your repos, or APPROVED_KERNEL_SERIES needs updating."
    return
  fi

  local target_series; target_series="$(echo "$rows" | awk -v v="$target" '$1==v{print $2; exit}')"
  if [[ -z "$target_series" ]]; then
    error "Version ${target} not found among available proxmox-default-kernel versions."
    return
  fi
  if ! is_approved_series "$target_series"; then
    error "Refusing: target ${target} depends on series ${target_series}, which is NOT approved."
    return
  fi

  info "Target: proxmox-default-kernel=${target} / proxmox-default-headers=${target} (series ${target_series})"

  if [[ "${DRY_RUN}" -eq 1 ]]; then
    info "[dry-run] apt-mark unhold proxmox-default-kernel proxmox-default-headers"
    info "[dry-run] apt-get install --allow-downgrades proxmox-default-kernel=${target} proxmox-default-headers=${target}"
    info "[dry-run] apt-mark hold proxmox-default-kernel proxmox-default-headers"
    return
  fi

  warn "apt will show its plan next. REVIEW IT: it must NOT remove proxmox-ve or any"
  warn "approved-series (${APPROVED_KERNEL_SERIES[*]}) kernel. Answer 'n' at apt if it does."
  read -r -p "Proceed to repoint selectors to ${target}? Type 'yes': " c
  [[ "$c" == "yes" ]] || { info "Cancelled - no changes made."; return; }

  apt-mark unhold proxmox-default-kernel proxmox-default-headers 2>/dev/null || true

  local ok_install=0
  if DEBIAN_FRONTEND=noninteractive apt-get install --allow-downgrades \
       "proxmox-default-kernel=${target}" "proxmox-default-headers=${target}"; then
    ok_install=1
  fi

  # Always restore protective holds, whatever happened above.
  apt-mark hold proxmox-default-kernel proxmox-default-headers 2>/dev/null || true

  if [[ "$ok_install" -eq 1 ]]; then
    ok "Selectors repointed to ${target} and re-held."
    info "Next steps to finish clearing the old series:"
    info "  apt-get autoremove                 # drop now-orphaned unapproved kernels"
    info "  sudo $SCRIPT_NAME --cleanup-boot   # clear stale boot entries"
  else
    warn "Downgrade did not complete (aborted or failed). Selectors re-held at current version."
    warn "No harm done - the guard state is intact. Review the apt output above."
  fi
}

# -------------------------------------------------------------------------------------
# ARGUMENT PARSING
# -------------------------------------------------------------------------------------
parse_args() {
  while [[ $# -gt 0 ]]; do
    case "$1" in
      --dry-run)        DRY_RUN=1 ;;
      --purge)          PURGE=1 ;;
      --force-rebuild)  FORCE_REBUILD=1 ;;
      --check-only)     CHECK_ONLY=1 ;;
      --preflight)      PREFLIGHT_ONLY=1 ;;
      --apply-guard)    APPLY_GUARD_ONLY=1 ;;
      --remove-guard)   REMOVE_GUARD_ONLY=1 ;;
      --guard-status)   GUARD_STATUS_ONLY=1 ;;
      --cleanup-boot)   CLEANUP_BOOT_ONLY=1 ;;
      --prune-dkms)     PRUNE_DKMS_ONLY=1 ;;
      --repoint-selector)
        REPOINT_SELECTOR_ONLY=1
        if [[ $# -gt 1 && "$2" =~ ^[0-9] ]]; then REPOINT_VERSION="$2"; shift; fi
        ;;
      --lxc-config)
        LXC_CONFIG_ONLY=1
        if [[ $# -gt 1 && "$2" =~ ^[0-9]+$ ]]; then
          LXC_VMID="$2"; shift
        fi
        ;;
      --help|-h)        show_help; exit 0 ;;
      *)
        error "Unknown argument: $1"
        show_help
        exit 1
        ;;
    esac
    shift
  done
}

# -------------------------------------------------------------------------------------
# CLEANUP TRAP
# -------------------------------------------------------------------------------------
cleanup() {
  local exit_code=$?
  if [[ $exit_code -ne 0 && $exit_code -ne 2 ]]; then
    warn "Script exited with code ${exit_code}."
    warn "Partial installation may have occurred."
    warn "Check the output above, then re-run or use --purge to reset."
  fi
}
trap cleanup EXIT

# -------------------------------------------------------------------------------------
# MAIN
# -------------------------------------------------------------------------------------
main() {
  parse_args "$@"
  check_root "$@"

  echo ""
  echo -e "${BOLD}${CYAN}setup-gpu-pxe.sh v${SCRIPT_VERSION} - Proxmox VE NVIDIA Driver Installer + Kernel Guard${RESET}"
  echo -e "Target: Proxmox VE 9.x / Debian 13 (Trixie)"
  echo -e "Approved kernel series: ${APPROVED_KERNEL_SERIES[*]}"
  [[ "${DRY_RUN}" -eq 1 ]] && echo -e "${YELLOW}*** DRY RUN MODE - no changes will be made ***${RESET}"
  echo ""

  # Short-circuit modes
  if [[ "${LXC_CONFIG_ONLY}" -eq 1 ]]; then
    print_lxc_config "${LXC_VMID}"; exit 0
  fi
  if [[ "${PREFLIGHT_ONLY}" -eq 1 ]]; then
    do_preflight || exit $?     # exit 2 on block
    exit 0
  fi
  if [[ "${GUARD_STATUS_ONLY}" -eq 1 ]]; then
    guard_status; exit 0
  fi
  if [[ "${CLEANUP_BOOT_ONLY}" -eq 1 ]]; then
    cleanup_boot; exit 0
  fi
  if [[ "${PRUNE_DKMS_ONLY}" -eq 1 ]]; then
    prune_dkms; exit 0
  fi
  if [[ "${REPOINT_SELECTOR_ONLY}" -eq 1 ]]; then
    do_repoint_selector; exit 0
  fi
  if [[ "${APPLY_GUARD_ONLY}" -eq 1 ]]; then
    apply_kernel_guard; exit 0
  fi
  if [[ "${REMOVE_GUARD_ONLY}" -eq 1 ]]; then
    remove_kernel_guard; exit 0
  fi
  if [[ "${CHECK_ONLY}" -eq 1 ]]; then
    do_check; exit 0
  fi
  if [[ "${PURGE}" -eq 1 ]]; then
    do_purge; exit 0
  fi
  if [[ "${FORCE_REBUILD}" -eq 1 ]]; then
    do_force_rebuild; exit 0
  fi

  # Full install sequence
  check_prerequisites
  detect_nvidia_gpu
  check_secure_boot
  blacklist_nouveau
  ensure_debian_sources         # non-free + trixie-backports
  install_nvidia_backports      # nvidia-kernel-dkms + nvidia-driver + persistenced
  build_dkms_all_kernels        # verify DKMS + self-heal running kernel
  configure_module_autoload
  setup_device_stability        # Pillar 2 - boot ordering + persistence
  load_and_verify_modules
  run_nvidia_smi_check
  apply_kernel_guard            # Guard Layers 2 + 3

  section "Installation Complete"
  echo ""
  ok "NVIDIA drivers installed (trixie-backports)"
  ok "nvidia-setup-nodes.service enabled (runs before LXC at boot)"
  ok "nvidia-persistenced installed and enabled"
  ok "Kernel guard applied (apt holds + boot pin)"
  echo ""
  info "Note: a normal re-run VERIFIES the DKMS build and rebuilds only if the"
  info "running kernel lacks a module. Before EVERY 'apt full-upgrade', run:"
  info "  sudo $SCRIPT_NAME --preflight"
  echo ""
  echo -e "${BOLD}NEXT STEPS:${RESET}"
  echo "  1. Reboot the Proxmox host (if this was a fresh driver install)"
  echo "  2. Verify: sudo $SCRIPT_NAME --check-only"
  echo "  3. Get container config: sudo $SCRIPT_NAME --lxc-config <VMID>"
  echo "  4. Apply config to /etc/pve/lxc/<VMID>.conf"
  echo "  5. Restart containers: pct stop <VMID> && pct start <VMID>"
  echo ""
  print_lxc_config "YOUR_VMID"
}

main "$@"
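The completion message in the script asks for a `--preflight` run before every `apt full-upgrade`. A minimal sketch of how that habit could be enforced with a small wrapper follows. The function name `guarded_upgrade` is hypothetical and is not part of setup-gpu-pxe.sh; the only behaviour assumed from the script is that `--preflight` returns non-zero (exit 2) when it blocks.

```bash
# Hypothetical helper, not part of setup-gpu-pxe.sh.
# Usage: guarded_upgrade <preflight-script> <upgrade-command...>
# Runs "<preflight-script> --preflight" and only runs the upgrade command
# when the preflight returns 0.
guarded_upgrade() {
  local preflight="$1"
  shift
  local rc=0
  "${preflight}" --preflight || rc=$?
  if [ "${rc}" -ne 0 ]; then
    echo "Preflight returned ${rc}; upgrade skipped." >&2
    return "${rc}"
  fi
  "$@"
}

# Example (not executed here):
#   guarded_upgrade ./setup-gpu-pxe.sh apt full-upgrade
```

Passing the upgrade command as arguments keeps the wrapper easy to dry-run with a harmless command before trusting it with a real upgrade.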
This should work fine, but be advised that my card is dated and its drivers are not supported on version 7 of the kernel.
I checked the following information this morning; this is the advice I received.
Short answer: yes, but the picture is more nuanced than a clean version bump, and it's directly relevant to the DKMS fragility problem you already flagged for baden.
Debian's own packaged driver is still stuck at 550.x
Trixie's in-repo nvidia-driver package is 550.163.01, and the Debian wiki confirms this build no longer compiles on kernels past v6.19 that landed in trixie-backports; this was unresolved as of the last update. So if you're pulling from apt, you're capped there and it breaks on the newer kernel lines.
The kernel jump that's causing the pain
Proxmox VE 9.2's pve kernel line moved into the 7.x branch, which included a rework of the VMA locking API (the VMA_LOCK_OFFSET macro was removed from mm_types.h). That's what actually broke DKMS compiles across the board: a kernel internals change, not a driver-version gate.
What actually works now
NVIDIA's own 580.x branch (from download.nvidia.com, not Debian's package) is the one built against the new VMA API. Early 580 releases still failed to compile against kernel 7.0, but a fix landed: 580.159.04 is confirmed compiling clean against kernel 7.0.6 in a resolved Proxmox forum thread, and against 6.17 as well. People running RTX 3070/Ampere cards on Trixie are installing it manually via the .run installer with --dkms, not from apt.
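The version threshold in that advice can be turned into a quick check. This is a minimal sketch, assuming GNU `sort -V` is available (it ships with Debian's coreutils). The names `version_ge` and `MIN_DRIVER_FOR_KERNEL7` are hypothetical, and the threshold itself comes from the forum report mentioned above rather than from anything verified here.

```bash
# Threshold taken from the forum report quoted above; treat it as an assumption.
MIN_DRIVER_FOR_KERNEL7="580.159.04"

# version_ge A B: succeeds when version A is greater than or equal to version B.
# Uses GNU sort's -V (version sort) so 580.65.06 orders before 580.159.04.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Example on a host where the driver is already loaded (not executed here):
#   installed="$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)"
#   version_ge "${installed}" "${MIN_DRIVER_FOR_KERNEL7}" || echo "driver too old for kernel 7.x"
```

A plain string comparison would get this wrong, because "580.65.06" sorts after "580.159.04" alphabetically; version sort compares each numeric field as a number.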
One repo gotcha
Debian 13 tightened package signing in Feb 2026 (rejects SHA1-signed sources), which trips up some of NVIDIA's own repos. Most people working around this are either using the .run file directly or pointing at the bookworm CUDA repo with a signature-check workaround. This is worth knowing before you go down that path.
On LXC passthrough specifically
Nothing changed in the actual passthrough mechanism. It's the same cgroup2 device allow rules and bind-mounting /dev/nvidia* into the container as always. That was never gated by driver version; it only breaks when the module fails to compile against the host kernel in the first place, which is exactly what's been happening.
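For readers who have not seen those two pieces side by side, here is a minimal sketch of a helper that prints them for one device node. The function name `lxc_lines_for_device` is hypothetical, and the output is not necessarily what the script's own `print_lxc_config` emits. Major number 195 is the usual one for /dev/nvidia0 and /dev/nvidiactl; the major for /dev/nvidia-uvm is assigned dynamically, so read it from `ls -l /dev/nvidia*` on the host rather than trusting a hard-coded value.

```bash
# Hypothetical helper, for illustration only.
# Args: <device-path> <char-device-major>
# Prints the cgroup2 allow rule and the bind-mount entry for one device node.
lxc_lines_for_device() {
  local dev="$1"
  local major="$2"
  printf 'lxc.cgroup2.devices.allow: c %s:* rwm\n' "${major}"
  # The mount target is relative to the container rootfs, so drop the leading "/".
  printf 'lxc.mount.entry: %s %s none bind,optional,create=file\n' "${dev}" "${dev#/}"
}

# Example (the output would go into /etc/pve/lxc/<VMID>.conf):
#   lxc_lines_for_device /dev/nvidia0 195
#   lxc_lines_for_device /dev/nvidiactl 195
```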
Given setup-gpu-pxe.sh's current APPROVED_KERNEL_SERIES=("6.17") guard and your DKMS-fragility rationale for the AMD swap: 580.159.04+ does appear to have stabilized on both 6.17 and kernel 7.x now, but only via the manual .run route, since Debian's own packaging is still frozen at 550. So the underlying fragility (the apt-packaged driver lagging kernel support) hasn't gone away; it has shifted from "doesn't work at all" to "manual install required". Worth knowing whether that changes your calculus on the RX 9060 XT swap or not.
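The series guard mentioned there boils down to comparing the "major.minor" part of a kernel release against an approved list. A minimal sketch of that comparison follows. The function names `kernel_series_of` and `is_approved_kernel` are hypothetical and the script's real guard may work differently; the release strings in the comments are only examples of the format `uname -r` reports on a Proxmox host.

```bash
# Hypothetical helpers, for illustration only.

# kernel_series_of <release>: print the "major.minor" series,
# e.g. 6.17.4-2-pve gives 6.17.
kernel_series_of() {
  local release="$1"
  local major="${release%%.*}"      # text before the first dot
  local rest="${release#*.}"        # text after the first dot
  local minor="${rest%%[!0-9]*}"    # leading digits of the remainder
  printf '%s.%s\n' "${major}" "${minor}"
}

# is_approved_kernel <release> <approved-series...>
# Returns 0 when the release belongs to one of the approved series, 1 otherwise.
is_approved_kernel() {
  local series
  series="$(kernel_series_of "$1")"
  shift
  local approved
  for approved in "$@"; do
    [ "${series}" = "${approved}" ] && return 0
  done
  return 1
}

# Example with the array from the script (not executed here):
#   is_approved_kernel "$(uname -r)" "${APPROVED_KERNEL_SERIES[@]}" || echo "kernel series not approved"
```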
So at this point I am still not moving until they catch up, or until I ditch the card as previously discussed.
#enoughsaid