mirror of
https://github.com/MacRimi/ProxMenux.git
synced 2026-06-01 21:14:49 +00:00
5ca3463bf6
Full rewrite of the docs site under app/[locale]/ with next-intl in localePrefix:"always" mode. Every page now exists at both /en/<path> and /es/<path>; the root / shows a meta-refresh + JS redirect to /<defaultLocale>/ so GitHub Pages serves something on the apex URL. Highlights: - 107 doc pages migrated to file-per-page JSON namespaces under messages/en/ and messages/es/. Spanish content is fully translated (no copy-of-English placeholders). - New documentation for the Active Suppressions section in the Settings tab and the per-event Dismiss dropdown in the Health Monitor modal. - New screenshots: dismiss-duration-dropdown.png and an updated health-suppression-settings.png. - Pagefind integrated for client-side search; index is built on every CI deploy (not committed). - RSS feeds: per-locale at /<locale>/rss.xml plus root /rss.xml for backward compat. - Removed the dead app/[locale]/guides/[slug]/ route — every guide now has its own static page and no markdown source remains. - Fixed orphan link /guides/nvidia -> /guides/nvidia-manual in docs/hardware/nvidia-host. - Removed obsolete components (footer2, calendar, drawer). Verified locally with `npm ci && npm run build`: 2804 files in out/, 231 pages indexed by pagefind, root redirect intact, both locale roots and the new Active Suppressions docs render OK.
154 lines
14 KiB
JSON
{
"meta": {
"title": "Installing NVIDIA Drivers on Proxmox VE 9 (manual procedure) | ProxMenux Guides",
"description": "Manual installation and configuration of NVIDIA drivers on a Proxmox VE 9 host, plus the LXC wiring needed to expose the GPU to one or more containers. Covers driver install, persistence service, optional NVENC patch and the per-container setup.",
"ogTitle": "Installing NVIDIA Drivers on Proxmox VE 9 (manual procedure)",
"ogDescription": "Manual NVIDIA driver install on PVE 9 — host driver, persistence service, optional NVENC patch and per-container LXC wiring."
},
"header": {
"title": "Installing NVIDIA Graphics Card Drivers on Proxmox VE 9 (manual procedure)",
"description": "Manual installation and configuration of NVIDIA drivers on a Proxmox VE 9 host, plus the LXC wiring needed to expose the GPU to one or more containers. Covers driver install, persistence service, optional NVENC patch and the per-container setup.",
"section": "Guides"
},
"intro": {
"calloutTitle": "Note",
"calloutBody": "This is the manual procedure preserved as a reference. For most users the recommended path is the automated ProxMenux flow: <strong>Install NVIDIA Drivers (Host)</strong> in the <em>GPUs and Coral-TPU</em> menu, and <strong>Add GPU to LXC</strong> (or <strong>Add GPU to VM</strong>) for the per-guest wiring. The automated flow handles the same steps documented here (with extra safety checks: kernel-headers compatibility, IOMMU validation, VFIO conflict resolution). This page is for operators who want to understand each command, or who need to deviate from the standard flow.",
"targetNote": "Targeted at <strong>Proxmox VE 9</strong> (Debian Trixie). PVE 7 (Bullseye) and PVE 8 (Bookworm) are no longer covered.",
"stepsTitle": "What you'll do",
"steps": [
"Prepare the Proxmox VE 9 host (blacklist nouveau, repos, prerequisites).",
"Install the NVIDIA driver on the host.",
"Install the NVIDIA persistence service.",
"(Optional) Apply the keylase patch to lift the NVENC concurrent-session limit on consumer GPUs.",
"Wire the GPU into one or more LXC containers and install the matching driver inside each."
]
},
"prepareHost": {
"heading": "1. Prepare the host (PVE 9)",
"blacklistHeading": "1.1 Blacklist nouveau",
"blacklistBody": "Check whether the open-source <code>nouveau</code> driver is already blacklisted:",
"blacklistCheckCode": "cat /etc/modprobe.d/blacklist.conf",
"blacklistAdd": "If <code>blacklist nouveau</code> does not appear, add it and reboot:",
"blacklistAddCode": "echo \"blacklist nouveau\" '>>' /etc/modprobe.d/blacklist.conf\nreboot",
"blacklistImageAlt": "Blacklist check",
"reposHeading": "1.2 Verify repositories (PVE 9 / Trixie)",
"reposBody": "If you ran the ProxMenux Post-Install script (or any other Proxmox post-install tooling), the repositories are already in place — skip this step.",
"reposOtherwise": "Otherwise, on a vanilla PVE 9 install with no enterprise subscription:",
"reposEditCode": "nano /etc/apt/sources.list.d/proxmox.sources",
"reposPveBody": "Make sure it contains the no-subscription source for Trixie:",
"reposPveCode": "Types: deb\nURIs: http://download.proxmox.com/debian/pve\nSuites: trixie\nComponents: pve-no-subscription\nSigned-By: /usr/share/keyrings/proxmox-archive-keyring.gpg",
"reposDebianBody": "And the Debian sources at <code>/etc/apt/sources.list.d/debian.sources</code>:",
"reposDebianCode": "Types: deb\nURIs: http://deb.debian.org/debian/\nSuites: trixie trixie-updates\nComponents: main contrib non-free non-free-firmware\nSigned-By: /usr/share/keyrings/debian-archive-keyring.gpg\n\nTypes: deb\nURIs: http://security.debian.org/debian-security/\nSuites: trixie-security\nComponents: main contrib non-free non-free-firmware\nSigned-By: /usr/share/keyrings/debian-archive-keyring.gpg",
"updateHeading": "1.3 Update the system and install prerequisites",
"updateCode": "apt update && apt dist-upgrade -y",
"buildToolsBody": "Install the build tools and kernel headers needed to compile the NVIDIA kernel module:",
"buildToolsCode": "apt-get install -y git\napt-get install -qqy pve-headers-$(uname -r) gcc make"
},
"installDriver": {
"heading": "2. Install the NVIDIA driver on the host",
"pickHeading": "2.1 Pick a driver version",
"pickBody": "Check the latest stable driver:",
"pickUrlCode": "https://download.nvidia.com/XFree86/Linux-x86_64/latest.txt",
"nvencCallout": "If you plan to apply the NVENC patch (step 4), verify the patch supports your chosen driver version first: <patchLink>github.com/keylase/nvidia-patch</patchLink>",
"nvencCalloutTitle": "Heads up",
"pickReplace": "Replace <code>latest.txt</code> in the URL with the version number to find the installer file ending in <code>.run</code>. The full driver list is at:",
"pickListCode": "https://download.nvidia.com/XFree86/Linux-x86_64/",
"pickImageAlt": "NVIDIA driver download",
"pickVersionNote": "Throughout the rest of this guide, replace <code>'<'VERSION'>'</code> with the actual version (for example <code>580.95.05</code>).",
"downloadHeading": "2.2 Download and run the installer",
"downloadCode": "mkdir -p /opt/nvidia\ncd /opt/nvidia\nwget https://download.nvidia.com/XFree86/Linux-x86_64/'<'VERSION'>'/NVIDIA-Linux-x86_64-'<'VERSION'>'.run\nchmod +x NVIDIA-Linux-x86_64-'<'VERSION'>'.run",
"firstPassBody": "First pass — this disables <code>nouveau</code> and prepares the system:",
"firstPassCode": "./NVIDIA-Linux-x86_64-'<'VERSION'>'.run --no-questions --ui=none --disable-nouveau\nreboot",
"secondPassBody": "After the host comes back, run the installer again to compile and install the kernel module:",
"secondPassCode": "/opt/nvidia/NVIDIA-Linux-x86_64-'<'VERSION'>'.run --no-questions --ui=none",
"modulesHeading": "2.3 Load NVIDIA modules at boot",
"modulesBody": "Edit the modules-load configuration:",
"modulesEditCode": "nano /etc/modules-load.d/modules.conf",
"modulesAddBody": "Add the VFIO and NVIDIA modules:",
"modulesAddCode": "vfio\nvfio_iommu_type1\nvfio_pci\nnvidia\nnvidia_uvm",
"modulesSaveBody": "Save and exit (<code>Ctrl+X</code>, then <code>Y</code> and <code>Enter</code>), then rebuild the initramfs:",
"modulesSaveCode": "update-initramfs -u -k all",
"udevHeading": "2.4 Create udev rules",
"udevBody": "Add udev rules so that the <code>/dev/nvidia*</code> device nodes are created when the modules load:",
"udevEditCode": "nano /etc/udev/rules.d/70-nvidia.rules",
"udevRulesCode": "# /etc/udev/rules.d/70-nvidia.rules\n# Create /dev/nvidia0, /dev/nvidia1 ... and /dev/nvidiactl when nvidia module is loaded\nKERNEL==\"nvidia\", RUN+=\"/bin/bash -c ''/usr/bin/nvidia-smi -L''\"\n\n# Create the CUDA node when nvidia_uvm CUDA module is loaded\nKERNEL==\"nvidia_uvm\", RUN+=\"/bin/bash -c ''/usr/bin/nvidia-modprobe -c0 -u''\"",
"udevSaveBody": "Save and exit (<code>Ctrl+X</code>, then <code>Y</code> and <code>Enter</code>)."
},
"persistence": {
"heading": "3. NVIDIA driver persistence service",
"body": "The persistence daemon keeps the GPU initialised between uses, which avoids the latency hit and occasional state loss that happens when the kernel module is loaded and unloaded repeatedly:",
"installCode": "cd /opt/nvidia\ngit clone https://github.com/NVIDIA/nvidia-persistenced.git\ncd nvidia-persistenced/init\n./install.sh\nreboot",
"verifyBody": "Verify the driver is loaded and the service is running after reboot:",
"verifySmiCode": "nvidia-smi",
"smiImageAlt": "NVIDIA SMI output",
"verifyServiceCode": "systemctl status nvidia-persistenced",
"serviceImageAlt": "NVIDIA persistence service status"
},
"nvenc": {
"heading": "4. (Optional) Lift the NVENC concurrent-session limit",
"body": "Consumer NVIDIA GPUs ship with a hardcoded limit on the number of simultaneous NVENC encoding sessions (typically 3, 5 or 8 depending on the driver version). The keylase patch removes that restriction, which is useful when running Plex, Jellyfin or Frigate transcoding workloads.",
"code": "cd /opt/nvidia\ngit clone https://github.com/keylase/nvidia-patch.git\ncd nvidia-patch\n./patch.sh",
"imageAlt": "NVIDIA patch application",
"after": "The patch must be re-applied after every driver update. The keylase repo also includes <code>patch-fbc.sh</code> for the FBC (frame buffer capture) limit if you need it."
},
"lxcSetup": {
"heading": "5. Configure an LXC container to use the GPU",
"identifyHeading": "5.1 Identify the device numbers",
"identifyBody": "On the host:",
"identifyCode": "ls -l /dev/nv*",
"identifyImageAlt": "NVIDIA device list",
"identifyNote": "Note the major numbers — they vary between systems. Typical values:",
"tableHeaders": {
"device": "Device",
"major": "Typical major"
},
"tableRows": [
{ "device": "<code>/dev/nvidia0</code>, <code>/dev/nvidiactl</code>", "major": "195" },
{ "device": "<code>/dev/nvidia-uvm</code>, <code>/dev/nvidia-uvm-tools</code>", "major": "509 (varies)" },
{ "device": "<code>/dev/dri/*</code>", "major": "226" },
{ "device": "<code>/dev/nvidia-modeset</code>", "major": "195 (shares with nvidia)" }
],
"editHeading": "5.2 Edit the LXC config",
"editBody": "Stop the container first if it's running. Open its config file (replace <code>'<'CTID'>'</code> with the container ID):",
"editCode": "nano /etc/pve/lxc/'<'CTID'>'.conf",
"editConfigBody": "Comment out any pre-existing <code>lxc.cgroup2.devices.allow</code> or <code>/dev/dri</code> lines that conflict, then append the NVIDIA wiring (adjust the major numbers to match what <code>ls -l /dev/nv*</code> showed on <strong>your</strong> host):",
"editConfigCode": "lxc.cgroup2.devices.allow: c 195:* rwm\nlxc.cgroup2.devices.allow: c 509:* rwm\nlxc.cgroup2.devices.allow: c 10:* rwm\nlxc.cgroup2.devices.allow: c 238:* rwm\nlxc.mount.entry: /dev/nvidia0 dev/nvidia0 none bind,optional,create=file\nlxc.mount.entry: /dev/nvidiactl dev/nvidiactl none bind,optional,create=file\nlxc.mount.entry: /dev/nvidia-uvm dev/nvidia-uvm none bind,optional,create=file\nlxc.mount.entry: /dev/nvidia-modeset dev/nvidia-modeset none bind,optional,create=file\nlxc.mount.entry: /dev/nvidia-uvm-tools dev/nvidia-uvm-tools none bind,optional,create=file\nlxc.mount.entry: /dev/nvram dev/nvram none bind,optional,create=file",
"editConfigImageAlt": "LXC configuration",
"editSave": "Save and exit (<code>Ctrl+X</code>, then <code>Y</code> and <code>Enter</code>), then start the container.",
"installCtHeading": "5.3 Install the driver inside the container",
"installCtCalloutTitle": "Important",
"installCtCalloutBody": "This part runs <strong>inside</strong> the container, not on the host.",
"installCtBody": "The kernel module is already loaded by the host — the container only needs the userland libraries that match the same driver version:",
"installCtCode": "mkdir -p /opt/nvidia\ncd /opt/nvidia\nwget https://download.nvidia.com/XFree86/Linux-x86_64/'<'VERSION'>'/NVIDIA-Linux-x86_64-'<'VERSION'>'.run\nchmod +x NVIDIA-Linux-x86_64-'<'VERSION'>'.run\n./NVIDIA-Linux-x86_64-'<'VERSION'>'.run --no-kernel-module",
"installCtAfter": "Accept the defaults at every prompt.",
"installCtImageAlt": "NVIDIA driver installation",
"verifyCtHeading": "5.4 Verify inside the container",
"verifyCtSmiCode": "nvidia-smi",
"verifyCtSmiImageAlt": "NVIDIA SMI in LXC",
"verifyCtLsCode": "ls -l /dev/nv*",
"verifyCtLsImageAlt": "NVIDIA devices in LXC",
"verifyCtAfter": "You should see the GPU listed and the device nodes mounted into the container's filesystem.",
"workloadHeading": "5.5 Confirm a real workload picks up the GPU",
"workloadBody": "For Plex or Jellyfin, start a transcode and check the dashboard or logs to confirm that hardware-accelerated transcoding is active.",
"workloadImage1Alt": "Plex using NVIDIA GPU",
"workloadImage2Alt": "Plex using NVIDIA GPU - active session",
"repeatNote": "To wire the GPU into another container, repeat <strong>section 5</strong> for each additional CTID. The driver install inside the container only needs to be done once per container."
},
"docker": {
"heading": "6. (Optional) NVIDIA Docker inside an LXC",
"body": "If the container runs Docker and you want containers-inside-the-container to use the GPU, install <code>nvidia-docker2</code>. From inside the LXC:",
"code": "wget https://raw.githubusercontent.com/MacRimi/manuales/main/NVIDIA/nvidia-docker.sh\nchmod +x nvidia-docker.sh\n./nvidia-docker.sh",
"after": "The script handles the repository setup, package install and Docker daemon configuration in one go."
},
"troubleshoot": {
"heading": "Troubleshooting",
"items": [
"<strong><code>nvidia-smi</code> on the host shows the GPU, but inside the container it errors with \"No devices found\":</strong> the driver versions don't match. Re-download the same <code>'<'VERSION'>'</code> inside the container and run with <code>--no-kernel-module</code>.",
"<strong>Driver compile fails on the host with \"No precompiled kernel interface was found\":</strong> kernel headers are missing or out of sync. Re-run <code>apt-get install pve-headers-$(uname -r)</code> and confirm the installed headers version matches the output of <code>uname -r</code> (the running kernel).",
"<strong>NVENC sessions still capped after applying the patch:</strong> the patch was overwritten by a driver update. Re-run <code>./patch.sh</code> from <code>/opt/nvidia/nvidia-patch</code>.",
"<strong>GPU stops responding after a few hours of idle:</strong> the persistence daemon isn't running. Check with <code>systemctl status nvidia-persistenced</code>, then start and enable it.",
"<strong>Container starts but <code>ls /dev/nv*</code> shows nothing:</strong> the major numbers in the LXC config don't match the host's. Re-run <code>ls -l /dev/nv*</code> on the host and adjust the <code>lxc.cgroup2.devices.allow</code> lines accordingly."
]
}
}