Axelera Metis M.2 Max Edge AI module

Oh baby, more research required

The new Metis M.2 Max also offers a slimmer profile, advanced thermal management features, and additional security capabilities. It is equipped with up to 16 GB of memory, and versions for both a standard operating temperature range (-20°C to +70°C) and an extended operating temperature range (-40°C to +85°C) will be offered. These enhancements make Metis M.2 Max ideal for applications in industrial manufacturing, retail, security, healthcare, and public safety.
Source: "Axelera Metis M.2 Max Edge AI module doubles LLM and VLM processing speed" - CNX Software: "Axelera AI’s Metis M.2 Max is an M.2 module based on an upgraded Metis AI processor unit (AIPU) delivering twice the memory bandwidth of the current Metis …"

I need to get on with moving my GPU to my LXC container on my Proxmox server.

Apparently, the process is considerably different from using it in a VM. Once that's done, I need to get my own personal AI going again. The above module would be extremely handy for boosting performance. I only have one M.2 slot left, though I did find an expansion card somewhere.

WARNING: Tested 2025-10-22 - see below


🔎 Step 1: Identify Your GPU in Debian 13 (Proxmox Server)

Run:

lspci -nn | grep -E "VGA|3D|Display"

Example output:

01:00.0 VGA compatible controller [0300]: NVIDIA Corporation GP104 [GeForce GTX 1080 Ti] [10de:1b06]

For AMD:

01:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 21 [1002:73bf]

You can also check whether the kernel recognised the card (in my case it didn't - see below):

dmesg | grep -i nvidia
dmesg | grep -i amdgpu
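
Another quick check is which kernel driver is currently bound to the card; if "Kernel driver in use" shows nouveau, or nothing at all, the proprietary driver isn't active yet:

lspci -nnk | grep -A 3 -E "VGA|3D|Display"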

🛠 Step 2: Install GPU Drivers on the Server

NVIDIA

  1. Update system:
apt update && apt upgrade -y
apt install -y build-essential dkms pve-headers-$(uname -r)
  2. Download the latest driver from NVIDIA’s site. Example:
wget "https://us.download.nvidia.com/XFree86/Linux-x86_64/580.95.05/NVIDIA-Linux-x86_64-580.95.05.run"

chmod +x NVIDIA-Linux-x86_64-580.95.05.run
./NVIDIA-Linux-x86_64-580.95.05.run
  3. Run the installer and follow the prompts.
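
If you'd rather not click through the prompts, the .run installer also has an unattended mode (check ./NVIDIA-Linux-x86_64-580.95.05.run --help on your version before relying on these flags):

./NVIDIA-Linux-x86_64-580.95.05.run --silent --dkms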

Verify the installation

nvidia-smi
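
If the installer aborts because the open-source nouveau driver is loaded, blacklist it and rebuild the initramfs first (a standard workaround; the file name is just a convention):

echo "blacklist nouveau" > /etc/modprobe.d/blacklist-nouveau.conf
echo "options nouveau modeset=0" >> /etc/modprobe.d/blacklist-nouveau.conf
update-initramfs -u
reboot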

AMD - not tested, as I don't own an AMD GPU

For ROCm compute:

apt install -y firmware-amd-graphics libdrm-amdgpu1

Then follow ROCm installation steps.


🛠 Step 3: Expose GPU Devices to LXC

  1. Find device nodes:
    • NVIDIA: /dev/nvidia0, /dev/nvidiactl, /dev/nvidia-uvm
    • AMD: /dev/dri/renderD128, /dev/kfd
  2. Edit the LXC config:
nano /etc/pve/lxc/<CTID>.conf

Add lines (NVIDIA example):

lxc.cgroup2.devices.allow: c 195:* rwm
lxc.cgroup2.devices.allow: c 511:* rwm
lxc.mount.entry: /dev/nvidia0 dev/nvidia0 none bind,optional,create=file
lxc.mount.entry: /dev/nvidiactl dev/nvidiactl none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-uvm dev/nvidia-uvm none bind,optional,create=file

AMD example:

lxc.cgroup2.devices.allow: c 226:* rwm
lxc.mount.entry: /dev/dri dev/dri none bind,optional,create=dir
lxc.cgroup2.devices.allow: c 238:* rwm
lxc.mount.entry: /dev/kfd dev/kfd none bind,optional,create=file
  3. Restart the container:
pct reboot <CTID>
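
Note that the character-device major numbers in the cgroup2 lines above (195, 511, 226, 238) can vary between hosts - nvidia-uvm and kfd in particular are assigned dynamic majors. Confirm them on the host before trusting my values:

ls -l /dev/nvidia* /dev/dri/* /dev/kfd 2>/dev/null
# Example: crw-rw-rw- 1 root root 195, 0 Oct 22 10:00 /dev/nvidia0
#          the first number (195) is the major for lxc.cgroup2.devices.allow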

🛠 Step 4: Install User-Space Libraries Inside the Container

  • For NVIDIA CUDA workloads:
# Ensure you have backed up the container prior to the installation

# Add the NVIDIA GPG key and repositories to the LXC
wget https://developer.download.nvidia.com/compute/cuda/repos/debian12/x86_64/3bf863cc.pub
gpg --dearmor -o /usr/share/keyrings/nvidia-archive-keyring.gpg 3bf863cc.pub

echo "deb [signed-by=/usr/share/keyrings/nvidia-archive-keyring.gpg] https://developer.download.nvidia.com/compute/cuda/repos/debian12/x86_64/ /" > /etc/apt/sources.list.d/nvidia-cuda.list
apt update

wget "https://us.download.nvidia.com/XFree86/Linux-x86_64/580.95.05/NVIDIA-Linux-x86_64-580.95.05.run"
chmod +x NVIDIA-Linux-x86_64-580.95.05.run
./NVIDIA-Linux-x86_64-580.95.05.run --extract-only
cp NVIDIA-Linux-x86_64-580.95.05/nvidia-smi /usr/bin/
reboot
  • For AMD ROCm workloads:
apt install -y rocm-dev

Then test inside the container:

nvidia-smi   # NVIDIA - SUCCESS - AI READY
rocminfo     # AMD
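
Before moving on, it's worth checking that the driver version reported inside the container matches the host; a mismatch produces the NVML error covered in the next section:

# Run on the host and inside the container - the two values must match
nvidia-smi --query-gpu=driver_version --format=csv,noheader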

🔧 How to fix a mismatch

Failed to initialize NVML: Driver/library version mismatch
NVML library version: 535.247

The user-space NVML library inside your container (v535.247) doesn't match the kernel-space NVIDIA driver version installed on your Proxmox host.
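
A quick way to see both sides of the mismatch (the library file name carries its version in a standard install):

# On the host: kernel-side driver version
cat /proc/driver/nvidia/version
# Inside the container: user-space library version
ls -l /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.*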

You need to ensure that the CUDA toolkit and user-space libraries inside the container match the host driver version. Here's how:

Remove existing NVIDIA libraries inside the container:

# I just restored from the backup I made of the LXC prior to this endeavour

apt remove --purge nvidia-cuda-toolkit libnvidia-ml-dev

✅ Option 1: Match container libraries to host driver - NOT tested

Install matching version manually:

Go to NVIDIA's CUDA archive and find the toolkit release that matches driver 580.95.05 (the CUDA 13.0 series).

Download the .deb or .run installer inside the container and install it.

Verify inside container:

nvidia-smi

✅ Option 2: Use host libraries via bind mount - Used

If you want to avoid installing CUDA inside the container, you can bind-mount the host’s /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.* into the container:

Add to /etc/pve/lxc/<CTID>.conf:

lxc.mount.entry: /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1 usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1 none bind,optional,create=file
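
If the container also runs CUDA workloads, libcuda typically needs the same treatment (the path below assumes a standard .run driver install on the host - verify it with ls before adding the entry):

lxc.mount.entry: /usr/lib/x86_64-linux-gnu/libcuda.so.1 usr/lib/x86_64-linux-gnu/libcuda.so.1 none bind,optional,create=file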

Then restart the container:

pct reboot <CTID>

# Then return to step 4, as you should not have got here in the first place

⚡ Automation Script (Server-Side)

Here’s a script to detect GPU, install drivers, and prepare LXC passthrough:

#!/bin/bash
# proxmox-lxc-gpu.sh
# Automates GPU passthrough setup for LXC containers on Proxmox VE 9

set -e

echo "[*] Detecting GPU..."
GPU=$(lspci -nn | grep -E "VGA|3D|Display" || true)

if [[ $GPU == *"NVIDIA"* ]]; then
  echo "[*] NVIDIA GPU detected"
  apt update && apt install -y build-essential dkms pve-headers-$(uname -r)
  echo "[*] Please manually download and install the NVIDIA driver from nvidia.com"
elif [[ $GPU == *"AMD"* ]]; then
  echo "[*] AMD GPU detected"
  apt update && apt install -y firmware-amd-graphics libdrm-amdgpu1
  echo "[*] For compute workloads, install ROCm runtime inside the container"
else
  echo "[-] No supported GPU detected"
  exit 1
fi

echo "[+] Host GPU drivers installed. Now edit /etc/pve/lxc/<CTID>.conf to add device passthrough entries."

✅ Key Differences from VM Passthrough

  • Do NOT bind GPU to VFIO (that would hide it from the host, breaking LXC passthrough).
  • Drivers must be installed on the server.
  • Containers only need user‑space libraries, not kernel modules.
  • WARNING: this setup may break whenever the host updates its NVIDIA driver, because the container's user-space libraries must keep matching it

Tested 2025-10-22

#enoughsaid