mirror of
https://github.com/MacRimi/ProxMenux.git
synced 2026-06-01 13:04:42 +00:00
complete i18n migration to /[locale]/ with EN+ES content
Full rewrite of the docs site under app/[locale]/ with next-intl in localePrefix:"always" mode. Every page now exists at both /en/<path> and /es/<path>; the root / shows a meta-refresh + JS redirect to /<defaultLocale>/ so GitHub Pages serves something on the apex URL.

Highlights:
- 107 doc pages migrated to file-per-page JSON namespaces under messages/en/ and messages/es/. Spanish content is fully translated (no copy-of-English placeholders).
- New documentation for the Active Suppressions section in the Settings tab and the per-event Dismiss dropdown in the Health Monitor modal.
- New screenshots: dismiss-duration-dropdown.png and an updated health-suppression-settings.png.
- Pagefind integrated for client-side search; index is built on every CI deploy (not committed).
- RSS feeds: per-locale at /<locale>/rss.xml plus root /rss.xml for backward compat.
- Removed the dead app/[locale]/guides/[slug]/ route; every guide now has its own static page and no markdown source remains.
- Fixed orphan link /guides/nvidia -> /guides/nvidia-manual in docs/hardware/nvidia-host.
- Removed obsolete components (footer2, calendar, drawer).

Verified locally with `npm ci && npm run build`: 2804 files in out/, 231 pages indexed by pagefind, root redirect intact, both locale roots and the new Active Suppressions docs render OK.
@@ -0,0 +1,396 @@
{
"meta": {
"title": "Proxmox Dashboard Authentication — 2FA, API Tokens, User Profile, Reverse Proxy | ProxMenux Monitor",
"description": "Reach and secure ProxMenux Monitor: first-launch protect-your-dashboard flow, password authentication with display name + avatar, profile page, TOTP 2FA enrolment, long-lived API tokens for scripts, HTTPS configuration, reverse-proxy snippets for Nginx, Caddy and Traefik, the audit log and the optional Fail2Ban jail.",
"ogTitle": "Proxmox Dashboard Authentication — 2FA, API Tokens, Reverse Proxy",
"ogDescription": "Secure ProxMenux Monitor with password + TOTP 2FA, long-lived API tokens, HTTPS, reverse-proxy snippets and an optional Fail2Ban jail.",
"twitterTitle": "Proxmox Dashboard Authentication | ProxMenux Monitor",
"twitterDescription": "Password + TOTP 2FA, API tokens, HTTPS, Nginx/Caddy/Traefik snippets and audit log."
},
"header": {
"title": "Access & Authentication",
"description": "Reaching the dashboard, the first-launch security flow, and every layer that can sit between an attacker and the host: password + TOTP, JWT sessions, long-lived API tokens, HTTPS, reverse proxies, Secure Gateway, and the optional Fail2Ban jail.",
"section": "ProxMenux Monitor"
},
"intro": {
"title": "Authentication is opt-in",
"body": "On first launch the dashboard shows a single dialog — <em>\"Protect Your Dashboard?\"</em> — with two buttons: <strong>Yes, Setup Password</strong> and <strong>No, Continue Without Protection</strong>. Saying no leaves every API endpoint open on TCP 8008 — fine for an isolated lab LAN, dangerous on anything else. Two-Factor Authentication (2FA) is <strong>not</strong> part of this initial choice; it's configured later from <strong>the Security tab</strong> once a password is set."
},
"reaching": {
"heading": "Reaching the dashboard",
"intro": "ProxMenux Monitor binds to <code>0.0.0.0:8008</code>. There are three common ways to reach it:",
"outro": "Direct access matches what the systemd unit ships out of the box. The reverse-proxy and Secure Gateway sections below cover the other two. The Monitor honours <code>X-Forwarded-For</code>, <code>X-Forwarded-Proto</code> and <code>X-Forwarded-Host</code> so URLs and CORS work behind any of them without manual configuration."
},
"firstLaunch": {
"heading": "First-launch flow",
"intro": "The first time you open the dashboard, the frontend calls <code>GET /api/auth/status</code>. If the auth config has never been written (<code>configured: false</code>), a single dialog appears titled <em>\"Protect Your Dashboard?\"</em> with two choices:",
"imageAlt": "First-launch dialog 'Protect Your Dashboard?' with two buttons: Yes Setup Password, No Continue Without Protection",
"imageCaption": "The first-launch authentication chooser. Two buttons — password protection or skip. Re-runs after a fresh install or after \"Disable authentication\" from Settings.",
"headerButton": "Button",
"headerWhat": "What happens",
"headerApi": "API call",
"rows": [
{
"button": "Yes, Setup Password",
"what": "Opens a form with the mandatory username + password and an optional <em>display name</em> + <em>avatar image</em>. Stores them in <code>auth.json</code> with <code>enabled: true</code>. Returns a JWT so you're logged in immediately. The form is documented in detail below.",
"api": "POST /api/auth/setup"
},
{
"button": "No, Continue Without Protection",
"what": "Marks <code>declined: true</code> in <code>auth.json</code>. Every API endpoint is publicly accessible until you change your mind from Settings.",
"api": "POST /api/auth/skip"
}
],
"twofaCalloutTitle": "2FA is configured later, not here",
"twofaCalloutBody": "The first-launch dialog covers only the password decision. <strong>Two-Factor Authentication (TOTP)</strong> is set up afterwards from <strong>the Security tab</strong> once you're logged in with a password. The full TOTP walkthrough is further down this page.",
"createTitle": "Creating the first user",
"createIntro": "Clicking <em>Yes, Setup Password</em> opens a single form that creates the account and, optionally, seeds the user's profile in one go so the avatar appears in the header right after saving. The fields are:",
"headerField": "Field",
"headerRequired": "Required",
"headerNotes": "Notes",
"fieldRows": [
{
"field": "Username",
"required": "Yes",
"notes": "The login identifier. Cannot be changed later from the UI; editing it requires touching <code>auth.json</code> directly."
},
{
"field": "Password",
"required": "Yes",
"notes": "Minimum 10 characters, with at least 3 of the 4 categories (lowercase, uppercase, digit, symbol). A short list of obvious passwords (<code>password</code>, <code>12345678</code>, <code>proxmenux</code>…) is rejected outright. The same rules are enforced server-side, so a curl call cannot bypass the front-end check."
},
{
"field": "Display name",
"required": "No",
"notes": "Friendly label shown in the header dropdown and on the profile page. Falls back to the username when empty. Can be changed later from <strong>Avatar → View profile</strong>."
},
{
"field": "Avatar image",
"required": "No",
"notes": "PNG, JPEG, WebP or GIF up to 2 MB. Rendered as a circle in the header and on the profile page. When empty, the header shows the first letter of the display name (or username) on a coloured circle. Can be uploaded, replaced or removed later from the profile page."
}
],
"createImageAlt": "Create-user form with mandatory Username + Password fields and the optional Display name + Avatar upload section",
"createImageCaption": "The first-launch create-user form. Display name and avatar are optional — leaving them empty creates the account and falls back to a single-letter circle in the header.",
"saveCalloutTitle": "One save, three API calls under the hood",
"saveCalloutBody": "The form submits <code>POST /api/auth/setup</code> first (username + password). On success it uses the freshly-issued JWT to follow up with <code>PUT /api/auth/profile</code> (display name) and <code>POST /api/auth/profile/avatar</code> (avatar bytes) if those fields were filled. Failures on the profile calls are non-fatal — the account is already created and you can finish the profile later from the dedicated page.",
"avatarTitle": "Avatar menu and profile page",
"avatarBody1": "Once authentication is configured, an avatar circle appears at the top-right of every dashboard page next to the theme toggle. Clicking it opens a small dropdown with shortcuts to the profile page and the Security tab, plus a <strong>Sign out</strong> action — closing the session can be done from here or from the Security tab, whichever is closer to where you are.",
"avatarBody2": "The profile page itself is a small card with an avatar preview, the username (read-only), and the display name with an inline edit button. Avatar uploads, replacements and removals are atomic — the header avatar refreshes automatically when any of them succeed, so there is no need to reload the page. The same set of endpoints documented in the next section are used by both the create-user form and the profile page.",
"profileImageAlt": "Profile page with avatar preview circle, Upload / Replace / Remove buttons, read-only username and editable display name field",
"profileImageCaption": "The dedicated profile page. Username is read-only; display name and avatar can be edited from here without touching the Security tab.",
"headerEndpoint": "Endpoint",
"headerEpWhat": "What it does",
"endpointRows": [
{
"endpoint": "GET /api/auth/profile",
"what": "Returns the current username, display name and whether an avatar is set (<code>has_avatar</code>, <code>avatar_mtime</code>)."
},
{
"endpoint": "PUT /api/auth/profile",
"what": "Updates the display name. Body: <code>'{' \"display_name\": \"...\" '}'</code>."
},
{
"endpoint": "GET /api/auth/profile/avatar",
"what": "Returns the avatar bytes (PNG / JPEG / WebP / GIF) with the matching content type. Requires the Bearer header — the front-end fetches it as a blob and converts to a local object URL for rendering."
},
{
"endpoint": "POST /api/auth/profile/avatar",
"what": "Uploads a new avatar (max 2 MB). Content type must match the file. Old avatar is replaced atomically."
},
{
"endpoint": "DELETE /api/auth/profile/avatar",
"what": "Removes the avatar. The header falls back to the initial-on-coloured-circle placeholder."
}
],
"reversibleTitle": "Continuing without protection is reversible — but only from the host",
"reversibleBody": "Once you click <em>No, Continue Without Protection</em>, the welcome dialog never appears again. You can re-enable authentication from <strong>the Security tab</strong> inside the dashboard, or by editing <code>/root/.config/proxmenux-monitor/auth.json</code> and removing the <code>declined</code> flag, then restarting the service."
},
"password": {
"heading": "Password authentication",
"intro": "After Set up, every API call (except the few public endpoints listed below) requires a JWT in <code>Authorization: Bearer <token></code>:",
"items": [
"<strong>Session token (login):</strong> 24-hour expiration. Issued by <code>POST /api/auth/login</code>.",
"<strong>API token (integrations):</strong> 365-day expiration. Issued by <code>POST /api/auth/generate-api-token</code>. Documented separately in the next section."
],
"loginImageAlt": "Login screen shown after authentication is configured — username and password fields",
"loginImageCaption": "Once authentication is configured, every visit to the dashboard starts here. With 2FA enabled, the screen asks for the 6-digit code in a second step after the password is accepted.",
"loginFlowTitle": "Login flow",
"twofaIntro": "With 2FA enabled, the same call returns <code>requires_totp: true</code> first. Re-issue with the 6-digit code:",
"publicTitle": "Public endpoints (no token)",
"publicIntro": "These are the only endpoints that work without authentication, even when auth is enabled:",
"publicItems": [
"<code>/api/auth/login</code>, <code>/api/auth/status</code>, <code>/api/auth/setup</code> — the auth flow itself, by necessity.",
"<code>/api/system-info</code> — lightweight system snapshot (hostname, uptime, <code>health.status</code>). The right endpoint for external probes (Uptime Kuma, load-balancer health checks, status pages)."
],
"cryptoTitle": "Cryptography and storage",
"cryptoIntro": "ProxMenux Monitor is open source — none of this is secret. Documenting the stack here explicitly is a deliberate choice: operators who store credentials on their host deserve to know how those credentials are protected before they decide to trust them. The algorithms below are the same ones the code in <code>scripts/auth_manager.py</code> uses; this section is a contract, not a marketing promise.",
"headerAsset": "Asset",
"headerAlgo": "Algorithm",
"headerWhere": "Where it lives",
"cryptoRows": [
{
"asset": "Password",
"algorithm": "PBKDF2-HMAC-SHA256 with a per-password random salt and a high iteration count (OWASP 2023+ baseline). Stored as <code>pbkdf2_sha256$<iters>$<salt>$<hash></code>.",
"where": "<code>auth.json</code> → <code>password_hash</code>"
},
{
"asset": "Session / API JWT",
"algorithm": "HS256 signed with a per-install secret minted at first launch (<code>secrets.token_urlsafe</code>, ≥48 bytes). Tokens carry <code>iss=proxmenux-monitor</code> + <code>aud=api</code> claims; the signature is validated against the current secret on every request.",
"where": "Secret: <code>auth.json</code> → <code>jwt_secret</code>. JWT itself: only on the client."
},
{
"asset": "API token metadata",
"algorithm": "SHA-256 of the JWT stored alongside a <code>signed_with</code> fingerprint of the <code>jwt_secret</code> used to mint it — used to display the token in the UI and to detect tokens whose signing secret has been rotated.",
"where": "<code>auth.json</code> → <code>api_tokens[]</code>"
},
{
"asset": "2FA TOTP secret",
"algorithm": "Standard TOTP (RFC 6238) base32-encoded. Backup codes are pre-generated, single-use, hashed with the same PBKDF2 scheme as the password.",
"where": "<code>auth.json</code> → <code>totp_secret</code> + <code>backup_codes[]</code>"
},
{
"asset": "Revocations",
"algorithm": "When a token or session is revoked, its SHA-256 is added to a deny-list checked on every verification (mem-cached for ~30 s to avoid disk reads on the hot path).",
"where": "<code>auth.json</code> → <code>revoked_tokens[]</code>"
}
],
"authJsonTitle": "auth.json — what it contains and how it's protected",
"authJsonBody": "Everything ProxMenux Monitor needs to authenticate you lives in a single file: <code>/root/.config/proxmenux-monitor/auth.json</code>, mode <code>0600</code>, owner <code>root</code>. The file holds <em>hashes</em> (PBKDF2) and <em>signing material</em> (<code>jwt_secret</code>, <code>totp_secret</code>) — never a plaintext password. Treat it like any other root-only secret: if you back up or replicate the host, encrypt the destination, and never commit it to version control.",
"rotateTitle": "Rotating jwt_secret invalidates all existing JWTs",
"rotateBody": "If <code>auth.json</code> is regenerated (manual delete, reinstall, restore from a backup with a different secret) the <code>jwt_secret</code> changes and every previously-issued JWT — both interactive sessions and long-lived API tokens — fails verification with \"Invalid or expired token\". The UI flags affected API tokens with an <strong>Invalid — regenerate</strong> badge so the operator knows to revoke and re-mint them; Home Assistant / scripts / any external client needs a fresh token after that.",
"recoverTitle": "Recovering a lost password",
"recoverIntro": "There is no online \"forgot password\" flow — by design, since the dashboard runs on the operator's own host and the recovery path is shell access to that host. ProxMenux ships a guided reset inside the configuration menu so you don't have to hand-edit <code>auth.json</code>:",
"survivesTitle": "What survives the reset",
"survivesBody": "Only the interactive login is wiped. The <code>jwt_secret</code> and the registered <code>api_tokens</code> are preserved — so Home Assistant and any other script using a long-lived API token continue to work without reconfiguration. If you want a fully clean slate (also rotate the JWT secret), delete <code>auth.json</code> manually and restart the service. The next launch generates a fresh secret and all old tokens become invalid.",
"physicalTitle": "Physical-access prerequisite",
"physicalBody": "This reset path needs <strong>root shell on the host</strong>. That is the trust anchor of the whole authentication scheme: anyone who can run <code>menu</code> as root can already do anything on the box, so giving them password reset is not a privilege increase. The corollary: if you let an untrusted user reach the Proxmox shell, the Monitor login won't protect anything that user couldn't already destroy by other means."
},
"twofa": {
"heading": "Two-Factor Authentication (TOTP)",
"intro": "2FA adds a second factor on top of your password: a 6-digit code that rotates every 30 seconds, generated on a phone or password manager you control. Even if someone obtains the password, they still can't log in without the code from your device. ProxMenux Monitor implements the standard <strong>TOTP</strong> protocol (RFC 6238), so any authenticator app works.",
"pickTitle": "Pick an authenticator app",
"pickIntro": "If you already use one for Google / GitHub / your bank, that one will work — skip to the setup walkthrough. If not, here's a survey of common options. All of them are free; the differences are mainly about which platforms they run on and how (or whether) they back up your secrets.",
"headerApp": "App",
"headerPlatforms": "Platforms",
"headerAppNotes": "Notes",
"apps": [
{
"name": "Google Authenticator",
"href": "https://safety.google/authentication/",
"platforms": "iOS, Android",
"notes": "The default for many users. Optional Google-account cloud backup."
},
{
"name": "Microsoft Authenticator",
"href": "https://www.microsoft.com/en-us/security/mobile-authenticator-app",
"platforms": "iOS, Android",
"notes": "Microsoft-account backup. Also handles MS push notifications if you use them at work."
},
{
"name": "Authy",
"href": "https://authy.com/",
"platforms": "iOS, Android, desktop",
"notes": "Multi-device encrypted sync (the desktop app is being phased out — check the latest status)."
},
{
"name": "Apple Passwords",
"href": "https://support.apple.com/guide/passwords/welcome/mac",
"platforms": "iOS, iPadOS, macOS, visionOS, Windows (via iCloud)",
"notes": "Built into Apple OSes; standalone Passwords app since iOS 18 / macOS Sequoia. Stores TOTP next to the password and syncs across devices via iCloud Keychain."
},
{
"name": "Bitwarden",
"href": "https://bitwarden.com/",
"platforms": "iOS, Android, desktop, browser",
"notes": "Open source. TOTP lives next to the password it protects (handy if you also use BW for passwords; defeats \"separate device\" if you don't)."
},
{
"name": "1Password",
"href": "https://1password.com/",
"platforms": "iOS, Android, desktop, browser",
"notes": "Same idea as Bitwarden — TOTP integrated with the password vault. Subscription."
},
{
"name": "Aegis Authenticator",
"href": "https://getaegis.app/",
"platforms": "Android",
"notes": "Open source. Encrypted on-device backup file you control. No cloud, no account required."
},
{
"name": "Raivo OTP",
"href": "https://raivo-otp.com/",
"platforms": "iOS, macOS",
"notes": "Open source. Optional iCloud sync. The Apple-ecosystem counterpart to Aegis."
},
{
"name": "Ente Auth",
"href": "https://ente.io/auth/",
"platforms": "iOS, Android, desktop, web",
"notes": "Open source. End-to-end encrypted cloud sync across devices."
},
{
"name": "2FAS",
"href": "https://2fas.com/",
"platforms": "iOS, Android, browser extension",
"notes": "Open source. Optional encrypted cloud backup; browser extension can autofill codes."
},
{
"name": "FreeOTP+",
"href": "https://github.com/helloworld1/FreeOTPPlus",
"platforms": "Android, iOS",
"notes": "Open source (Red Hat-led). Minimal — no cloud, no account."
}
],
"backupTitle": "What \"backup\" really matters for",
"backupBody": "If you lose the device that has the authenticator on it, the only ways back in are (1) a backup code saved when you enabled 2FA, or (2) a backup of the authenticator's vault. Apps with cloud sync (Google Auth, Microsoft Auth, Authy, Apple Passwords, Ente, 2FAS, Bitwarden, 1Password) can restore on a new device. Apps without cloud (Aegis, Raivo, FreeOTP+) need an encrypted export file you've copied somewhere safe. Either approach works — the bad case is \"no backup at all\".",
"setupTitle": "Step-by-step setup from the dashboard",
"setupImageAlt": "2FA setup screen with QR code and backup codes",
"setupImageCaption": "The 2FA setup dialog — QR code, Base32 secret (for manual entry), and the ten one-time backup codes. The codes are only displayed here; if you close the dialog without copying them, they're gone.",
"setupSteps": [
"<strong>Install the authenticator app on your phone</strong> (or open your password manager). One of the apps from the table above. You only need to do this once — the same app will hold codes for every service you protect.",
"<strong>Log into the dashboard</strong> with your username and password.",
"<strong>Open the Security tab</strong> in the dashboard sidebar, then click <strong>Enable 2FA</strong>. A dialog opens with a QR code, a long string in Base32 format and ten short codes labelled \"backup codes\".",
"<strong>Add the entry to the authenticator app:</strong>",
"<strong>Save the backup codes.</strong> Copy the ten codes somewhere safe — a password manager, an encrypted note, a printed copy in a drawer. Treat them like spare keys: each works exactly once and gets you in if your phone is gone or broken.",
"<strong>Confirm by typing the current 6-digit code</strong> from the app into the \"Verification code\" field of the setup dialog and submit. Codes refresh every 30 seconds, so if it expires while you're typing, just enter the next one.",
"<strong>Done.</strong> 2FA is now active. Next time you log in, the dashboard asks for the password first; once it's accepted it asks for the current 6-digit code."
],
"setupStep4Sub": [
"<em>Easy path:</em> in the app, tap <em>Add account</em> → <em>Scan QR code</em>, point the camera at the QR on the screen. The app names the entry automatically (something like <code>ProxMenux Monitor (your-username)</code>) and starts showing a 6-digit code that refreshes every 30 seconds.",
"<em>Manual fallback</em> (when scanning isn't possible — e.g. setting up on the same phone you opened the dashboard with): tap <em>Add account</em> → <em>Enter setup key</em>. Type any name (e.g. <em>Proxmox Monitor</em>), paste the Base32 string from the dialog, leave <em>Type</em> as <em>Time-based</em>, save."
],
"testTitle": "Test before logging out",
"testBody": "Once you click Save, log out and log back in <em>immediately</em>, while the setup dialog is still fresh in your mind. If the code is rejected (clock-skew between server and phone is the most common cause), you can still fix it from the open session. Logging out without testing first means a one-trip-no-return — at that point only a backup code or editing <code>auth.json</code> on the host gets you back in.",
"lostTitle": "Lost authenticator",
"lostIntro": "Three escape hatches, in order of how disruptive they are:",
"lostItems": [
"<strong>Use a backup code.</strong> At the login screen, in the TOTP field, type one of the ten codes you saved at setup time. Each works once and is then consumed; the remaining codes still work. Once you're in, regenerate 2FA from Settings to get a fresh ten.",
"<strong>Restore the authenticator from cloud / backup.</strong> If your app has a cloud sync (Google, Microsoft, Authy, Apple Passwords via iCloud Keychain, Ente, 2FAS) install it on a new device, sign in, and the entries reappear. If your app uses an encrypted export file (Aegis, Raivo, FreeOTP+), install the app on the new device and import the file.",
"<strong>Disable 2FA from the host shell.</strong> When the previous options aren't available, edit <code>/root/.config/proxmenux-monitor/auth.json</code> on the Proxmox host (you need root SSH or console access), set <code>totp_enabled</code> to <code>false</code>, save, and restart the service:"
],
"lostShellOutro": "You can log in with username + password only, then re-enable 2FA from Settings.",
"disableTitle": "Disable 2FA",
"disableBody": "From the dashboard, open the <strong>Security</strong> tab and click <strong>Disable 2FA</strong>. The endpoint <code>POST /api/auth/totp/disable</code> requires the current password as confirmation, then deletes the TOTP secret and clears the backup codes. Remember to also remove the entry in the authenticator app — the app doesn't know the server side is gone, so the dead entry will sit there forever otherwise.",
"rejectedTitle": "The 6-digit code is always rejected",
"rejectedIntro": "TOTP is time-based — server clock and phone clock must agree to within ~30 s. Two checks:",
"rejectedItems": [
"<strong>Phone:</strong> Settings → Date & Time → automatic / network sync ON.",
"<strong>Proxmox host:</strong> <code>timedatectl status</code> — \"System clock synchronized: yes\" should be visible. If not, <code>timedatectl set-ntp true</code> and wait a minute."
],
"rejectedOutro": "Once both clocks agree, the code is accepted within the next 30-second window."
},
"apiTokens": {
"heading": "API tokens (long-lived)",
"intro": "Browser sessions expire after 24 hours. For unattended integrations (Homepage widgets, Home Assistant sensors, Grafana scrapers, Uptime Kuma probes…) you generate a separate <strong>API token</strong> that lives 365 days. The token is a JWT signed with the same secret as the session token, but its <code>token_name</code> claim makes it easy to track and revoke individually.",
"imageAlt": "API tokens panel showing the token list with name, prefix, created date and expiry",
"imageCaption": "The API tokens list under Settings — name, prefix (last 4 chars are shown for identification), created and expiry dates, revoke action.",
"generateTitle": "Generate a token",
"generateIntro": "From the dashboard:",
"generateSteps": [
"Navigate to <strong>the Security tab → API Access Tokens</strong> section.",
"Type a descriptive name (<em>e.g. \"Home Assistant\"</em>).",
"Re-enter your password. If 2FA is on, also the current 6-digit code.",
"Click <strong>Generate Token</strong>. The token appears <strong>once</strong> — copy it immediately."
],
"generateCli": "From the command line:",
"useTitle": "Use a token",
"revokeTitle": "Revoke a token",
"revokeBody": "From the panel above: each row has a <strong>Revoke</strong> action that adds the token hash to <code>revoked_tokens</code> in <code>auth.json</code>. Revoked tokens fail validation immediately on the next request.",
"cheatTitle": "Token security cheat-sheet",
"cheatItems": [
"Store tokens in your integration's native secrets store — Homepage <code>secrets.yaml</code>, Home Assistant <code>!secret</code>, environment variables, etc. Never commit them to git.",
"One token per integration, named after the consumer. Revoke individually when retiring an integration.",
"Rotate every 6–12 months. The expiry is a hard limit, not a recommendation."
],
"outro": "Full storage best-practices and integration recipes live in <apiLink>API Reference → Token Management</apiLink> and <intLink>Integrations</intLink>."
},
"https": {
"heading": "HTTPS",
"intro": "Two paths to TLS:",
"items": [
"<strong>Reverse proxy (recommended).</strong> Terminate TLS on Nginx / Caddy / Traefik and forward HTTP on port 8008 to the Flask process. Snippets below.",
"<strong>Direct HTTPS in the AppImage.</strong> Configure a certificate via <code>POST /api/ssl/configure</code> (UI: <strong>Settings → SSL</strong>). When SSL is configured the process switches from Flask's dev server to <code>gevent.pywsgi</code> with the gevent-websocket handler so WebSocket terminal also works over WSS. The cert files live wherever you point them; the paths are stored in the SSL config."
],
"calloutTitle": "Direct HTTPS limitations",
"calloutBody": "The bundled gevent path is suitable for self-signed or LAN-only certificates. For Let's Encrypt / ACME and automatic renewal, run a real reverse proxy in front — Caddy auto-renews and Traefik / Nginx have well-known patterns. The Monitor doesn't implement ACME on its own."
},
"gateway": {
"heading": "Secure Gateway (Tailscale)",
"intro": "Reverse proxies are the classic answer to \"reach the dashboard from outside\" but they require a public domain, certificate, and an open port on the edge. <strong>Secure Gateway</strong> is the zero-port alternative shipped inside the Monitor itself — a pre-built deployable app that spins up an Alpine LXC running <a>Tailscale</a> as a subnet router. Once joined to your tailnet, every device on it can hit the Monitor at the host's own LAN IP — from a laptop on holiday, a phone on 5G, or another node — without exposing TCP 8008 to the internet.",
"calloutTitle": "Why this is convenient",
"calloutBody": "The URL stays the same as on the LAN — <code>http://<proxmox-lan-ip>:8008</code> works everywhere Tailscale works. No certificates, no DNS, no port forwarding. The Monitor itself sees the request as coming from a tailnet IP (typically <code>100.x.y.z</code>), so the auth log and the Fail2Ban hook still function as on the LAN.",
"deployBody": "The deploy flow is one screen — pick the host LXC storage, paste a Tailscale auth-key (generated at <a>login.tailscale.com/admin/settings/keys</a>), choose which subnets to advertise, click Deploy. The LXC takes ~30 seconds to bootstrap and registers in the tailnet automatically.",
"outro": "Step-by-step deployment, subnet-routes configuration, Tailscale ACLs and Exit Node mode are documented separately in <link>Dashboard → Security → Secure Gateway</link> — that's where the deploy wizard lives in the dashboard UI. This page only covers the access pattern."
},
"proxy": {
"heading": "Reverse proxy snippets",
"intro": "The simplest layout is a <strong>dedicated host name</strong> for the Monitor (e.g. <code>monitor.example.com</code>) pointing at port 8008 on the Proxmox host. Snippets below use that pattern. Sub-path mounts (<code>example.com/proxmenux-monitor/</code>) are possible but require extra rewriting and are not the default — see the callout at the end.",
|
||||
"nginxTitle": "Nginx",
|
||||
"caddyTitle": "Caddy",
|
||||
"traefikTitle": "Traefik (labels — Docker / Kubernetes)",
|
||||
"subPathTitle": "Advanced: sub-path mounts under an existing domain",
|
||||
"subPathBody": "If you don't want a dedicated host name, you can mount the Monitor under a path on an existing domain, for example <code>example.com/proxmenux-monitor/</code>. The Next.js build uses relative asset paths so static files resolve, but the proxy must <strong>strip the prefix</strong> before forwarding so the Monitor still receives plain <code>/api/*</code> URLs. On Nginx that's a <code>location /proxmenux-monitor/ '{' proxy_pass http://127.0.0.1:8008/; '}'</code> (the trailing slash on <code>proxy_pass</code> does the strip). On Caddy, use <code>handle_path /proxmenux-monitor/*</code>. A dedicated host name is simpler."
|
||||
},
|
||||
"audit": {
|
||||
"heading": "Audit log",
|
||||
"intro": "Every authentication event (success and failure) is appended to <code>/var/log/proxmenux-auth.log</code> in a single line, syslog-style format:",
|
||||
"outro": "Tail it the usual way: <code>tail -F /var/log/proxmenux-auth.log</code>. The file is rotated by <code>logrotate</code> if a config drop-in is added; the Monitor itself does not rotate it."
|
||||
},
|
||||
"fail2ban": {
|
||||
"heading": "Optional: Fail2Ban jail",
|
||||
"calloutTitle": "Fail2Ban is not bundled with the Monitor",
|
||||
"calloutBody": "Fail2Ban is <strong>not</strong> installed by ProxMenux Monitor itself. Install it via <link>Security → Fail2Ban</link> in the ProxMenux menu (or with the standard Debian package). Without it, the Monitor still writes the audit log above — it just doesn't auto-ban repeat offenders.",
|
||||
"intro": "When Fail2Ban is installed, the ProxMenux integration ships a <code>[proxmenux]</code> jail that:",
|
||||
"items": [
|
||||
"Reads <code>/var/log/proxmenux-auth.log</code>.",
|
||||
"Matches the <code>authentication failure; rhost=<ip></code> pattern with a dedicated filter.",
|
||||
"Bans the offending IP at the kernel firewall level by default.",
|
||||
"Is queried by the Flask <code>before_request</code> hook every 30 s — so even when the firewall can't block (because the connection comes from the reverse proxy), the application returns HTTP 403 to banned IPs based on what Fail2Ban knows."
|
||||
],
|
||||
"outro": "Configuration, ban time tuning and unban procedures are in <link>Security → Fail2Ban</link>."
|
||||
},
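As a rough illustration of what the jail's filter looks for, the sketch below counts failed logins per source IP. Only the `authentication failure; rhost=<ip>` fragment comes from the list above. The surrounding sample lines, the function name `count_failures` and the threshold of two failures are assumptions for the example, not the shipped Fail2Ban filter.

```python
import re
from collections import Counter

# Only this fragment is documented; the rest of each log line is assumed.
FAILURE_RE = re.compile(r"authentication failure; rhost=(?P<ip>[0-9a-fA-F.:]+)")


def count_failures(lines):
    """Return a Counter mapping source IP -> number of failed logins."""
    hits = Counter()
    for line in lines:
        match = FAILURE_RE.search(line)
        if match:
            hits[match.group("ip")] += 1
    return hits


# Hypothetical sample lines; the real prefix format may differ.
sample = [
    "proxmenux-auth: authentication failure; rhost=203.0.113.7 user=admin",
    "proxmenux-auth: authentication failure; rhost=203.0.113.7 user=root",
    "proxmenux-auth: authentication success; rhost=192.168.1.20 user=admin",
]

failures = count_failures(sample)
# Illustrative threshold only; real ban thresholds live in the jail config.
repeat_offenders = [ip for ip, n in failures.items() if n >= 2]
```

To run it against the real file, pass an open handle on `/var/log/proxmenux-auth.log` to `count_failures` instead of the sample list.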
|
||||
"troubleshoot": {
|
||||
"heading": "Troubleshooting",
|
||||
"noScreenTitle": "The first-launch screen never appears",
|
||||
"noScreenBody": "Either auth is already configured (<code>configured: true</code>) or somebody already chose Skip. To start fresh:",
|
||||
"noScreenOutro": "This wipes the auth state, including any TOTP secrets and API tokens. Back up <code>auth.json</code> first if you have tokens you want to keep.",
|
||||
"tokenTitle": "HTTP 401 on every request from a working API token",
|
||||
"tokenBody": "The token has either expired (365-day limit) or been added to the <code>revoked_tokens</code> list. Generate a new one in Settings and update the integration. To check:",
|
||||
"tokenOutro": "Expired or revoked tokens return <code>'{'\"error\":\"Invalid or expired token\"'}'</code>.",
|
||||
"no2faTitle": "Can't log in after enabling 2FA, no authenticator at hand",
|
||||
"no2faBody": "Use a backup code in the TOTP field. If those are gone, edit <code>/root/.config/proxmenux-monitor/auth.json</code> from a host shell, set <code>totp_enabled</code> to <code>false</code>, restart the service.",
|
||||
"wsTitle": "Reverse proxy works but the terminal tab disconnects every minute",
|
||||
"wsBody": "WebSocket idle timeout in the proxy. Bump the read timeout (Nginx: <code>proxy_read_timeout 86400s</code>; Traefik: <code>idleTimeout</code> in the entry-point or middleware) and confirm <code>proxy_set_header Upgrade $http_upgrade</code> and <code>Connection \"upgrade\"</code> are present."
|
||||
},
|
||||
"whereNext": {
|
||||
"heading": "Where to next",
|
||||
"items": [
|
||||
{
|
||||
"label": "API Reference → Token Management",
|
||||
"href": "/docs/monitor/api",
|
||||
"tail": " — full lifecycle of API tokens (generate / list / revoke), security best-practices, secrets storage patterns."
|
||||
},
|
||||
{
|
||||
"label": "Integrations",
|
||||
"href": "/docs/monitor/integrations",
|
||||
"tail": " — Homepage, Home Assistant, Grafana / Prometheus, Uptime Kuma, generic cURL."
|
||||
},
|
||||
{
|
||||
"label": "Dashboard → Security → Secure Gateway",
|
||||
"href": "/docs/monitor/dashboard/security",
|
||||
"tail": " — deploy the Tailscale gateway LXC step by step (subnet routes, ACLs, Exit Node mode)."
|
||||
},
|
||||
{
|
||||
"label": "Security → Fail2Ban",
|
||||
"href": "/docs/security/fail2ban",
|
||||
"tail": " — how to install and configure the optional jail."
|
||||
},
|
||||
{
|
||||
"label": "Settings → ProxMenux Monitor",
|
||||
"href": "/docs/settings/proxmenux-monitor",
|
||||
"tail": " — start / stop the systemd service from the ProxMenux TUI."
|
||||
}
|
||||
]
|
||||
}
|
||||
}
|
||||
@@ -0,0 +1,342 @@
|
||||
{
|
||||
"meta": {
|
||||
"title": "Proxmox AI Assistant — OpenAI, Claude, Gemini, Groq, Ollama | ProxMenux Monitor",
|
||||
"description": "ProxMenux Monitor's optional AI Assistant rewrites Proxmox VE notification bodies in plain language. Six providers supported: OpenAI, Anthropic Claude, Google Gemini, Groq, OpenRouter and local Ollama. Twelve languages, three detail levels per channel, custom prompt mode with a community library, and a context-enrichment layer that adds uptime, recurrence and SMART data to disk events.",
|
||||
"ogTitle": "Proxmox AI Assistant — OpenAI, Claude, Gemini, Groq, Ollama",
|
||||
"ogDescription": "Rewrite Proxmox VE notifications in plain language with OpenAI, Anthropic Claude, Google Gemini, Groq, OpenRouter or local Ollama.",
|
||||
"twitterTitle": "Proxmox AI Assistant | ProxMenux Monitor",
|
||||
"twitterDescription": "Rewrite Proxmox VE notifications in plain language with OpenAI, Anthropic, Gemini, Groq, OpenRouter or local Ollama."
|
||||
},
|
||||
"header": {
|
||||
"title": "AI Assistant",
|
||||
"description": "The opt-in rewriter that runs every outbound notification through an LLM before fan-out — turning structured templates into plain-language messages, with per-channel detail levels, twelve languages, a custom prompt mode with a public community library, and a context-enrichment layer that adds uptime, recurrence, SMART data and known-error matches to the prompt.",
|
||||
"section": "ProxMenux Monitor"
|
||||
},
|
||||
"intro": {
|
||||
"title": "Off by default, single switch to enable",
|
||||
"body": "AI rewrite is opt-in. Until you flip the master toggle inside <em>Settings → Notifications → Advanced: AI Enhancement</em>, every event is dispatched with the original templated body. When the toggle is on and a provider call fails or times out, the dispatcher falls back to the templated body silently — your notifications never block on the LLM."
|
||||
},
|
||||
"howItWorks": {
|
||||
"heading": "How it works",
|
||||
"intro": "Every event the Monitor dispatches goes through the same pipeline. The AI rewriter is one optional stage that sits between the templated body and the channel send. End to end, a single event walks through four steps:",
|
||||
"steps": [
|
||||
"<strong>Event + template.</strong> An event arrives from one of the six collectors and is rendered into a structured plain-text body by <code>notification_templates.py</code>. This is the body the channel would send if AI rewrite were off.",
|
||||
"<strong>Context enrichment.</strong> The dispatcher inspects the event and conditionally appends the relevant extra signals — system uptime (only for critical system failures), event frequency, SMART data (only for disk events), Known Errors database matches and journal log lines.",
|
||||
"<strong>Prompt builder.</strong> The system prompt is assembled from the template plus the per-channel settings: target language, detail level, emoji rules from <em>Rich messages</em>, and the AI Suggestions addon if enabled. In Custom prompt mode, your prompt replaces the system prompt entirely. The user message is built from the templated body plus the enriched context blocks.",
|
||||
"<strong>Provider call.</strong> The configured provider (Groq, OpenAI, Anthropic, Gemini, Ollama or OpenRouter) returns a rewritten title and body. The dispatcher parses the response, replaces the original title and body for that channel, and hands it off to the channel adapter for delivery."
|
||||
],
|
||||
"notesIntro": "Three details worth holding on to before reading the rest of this page:",
|
||||
"notes": [
|
||||
"<strong>The AI does not produce events.</strong> Every event is born from a real signal (Health Monitor scan, journal line, PVE webhook, etc.) and is rendered into a templated body before the AI ever sees it. The AI is a translator and re-formatter, not a watcher.",
|
||||
"<strong>The AI runs per-channel.</strong> Telegram and Discord can use a brief rewrite while Email gets a detailed report — same event, different shape, all from one provider call per channel.",
|
||||
"<strong>Failure is silent.</strong> If the provider 5xx's, times out, returns malformed output or rejects the request, the dispatcher logs the error and falls back to the original templated body for that channel. You never lose a notification because the LLM had a bad day."
|
||||
]
|
||||
},
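The silent-failure behaviour described in the notes above can be sketched as a small wrapper. This is a minimal illustration, assuming a provider call that takes and returns a `(title, body)` pair. The names `rewrite_or_fallback`, `flaky_provider` and `healthy_provider` are hypothetical and do not come from the Monitor's code.

```python
from typing import Callable, Tuple

Message = Tuple[str, str]  # (title, body)


def rewrite_or_fallback(provider_call: Callable[[str, str], Message],
                        title: str, body: str) -> Message:
    """Try the AI rewrite; on any failure keep the templated message."""
    try:
        new_title, new_body = provider_call(title, body)
    except Exception as exc:  # timeout, HTTP 5xx, rejected request, bad output
        print(f"AI rewrite failed, using template: {exc}")
        return title, body
    if not new_title.strip() or not new_body.strip():
        # Treat empty output as malformed and fall back as well.
        return title, body
    return new_title, new_body


def flaky_provider(title: str, body: str) -> Message:
    """Stand-in for a provider that times out."""
    raise TimeoutError("provider did not answer in time")


def healthy_provider(title: str, body: str) -> Message:
    """Stand-in for a provider that returns a rewritten message."""
    return f"[pve1] {title}", f"In plain words: {body}"


fallback = rewrite_or_fallback(flaky_provider, "Backup failed", "vzdump exit code 1")
rewritten = rewrite_or_fallback(healthy_provider, "Backup failed", "vzdump exit code 1")
```

The point of the design is that the templated message is always available as the return value, so delivery never depends on the provider.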
|
||||
"enabling": {
|
||||
"heading": "Enabling the panel",
|
||||
"intro": "The AI configuration lives at the bottom of the Notifications panel inside the Settings tab, as a collapsible <em>Advanced: AI Enhancement</em> block. Click the header to expand:",
|
||||
"collapsedAlt": "Collapsed Advanced AI Enhancement header with chevron indicator",
|
||||
"collapsedCaption": "Collapsed by default — one click expands the panel.",
|
||||
"panelAlt": "Expanded AI Enhancement panel showing AI-Enhanced Messages master toggle, Provider Google Gemini, API Key masked, Model gemini-2.5-flash, Prompt Mode Default, Language English, Detail Level per Channel for Telegram Discord Gotify Email, AI Suggestions toggle and Test Connection button",
|
||||
"panelCaption": "The full AI Enhancement panel — every control documented in this page maps to one of the fields above.",
|
||||
"outro": "Top to bottom, the panel exposes: the <em>AI-Enhanced Messages</em> master toggle, the provider selector with an information modal next to it, the API key input (or Ollama URL for local mode), the model dropdown (loaded from the provider after entering the key), the prompt mode (<em>Default</em> / <em>Custom</em>), the output language, the per-channel detail level, the <em>AI Suggestions</em> opt-in, and a <em>Test Connection</em> button that sends a probe message to the provider to validate the credentials."
|
||||
},
|
||||
"context": {
|
||||
"heading": "What context the AI receives",
|
||||
"intro": "Before the prompt is built, the dispatcher walks a context-enrichment routine that decides which extra signals are relevant to the event at hand. The aim is to give the LLM enough information to produce a useful message, without flooding it (and your wallet) with noise that doesn't apply. Five context blocks can be added to the user message:",
|
||||
"headerBlock": "Block",
|
||||
"headerWhen": "When it's injected",
|
||||
"headerWhat": "What it carries",
|
||||
"rows": [
|
||||
{
|
||||
"block": "System uptime",
|
||||
"when": "Only for critical system-level failures: <code>crash</code>, <code>panic</code>, <code>oom</code>, <code>kernel</code>, <code>split_brain</code>, <code>quorum_lost</code>, <code>node_offline</code>, <code>node_fail</code>, <code>system_fail</code>, <code>boot_fail</code>. Skipped for disk errors, warnings and routine operations to keep the prompt tight.",
|
||||
"what": "A line such as <code>System uptime: 14 days (stable system)</code>. Lets the LLM distinguish startup issues from long-running failures."
|
||||
},
|
||||
{
|
||||
"block": "Event frequency",
|
||||
"when": "Always, when the Monitor has seen the same fingerprint before.",
|
||||
"what": "Occurrence count, first-seen timestamp, optional pattern label (recurring / one-off / spike). The LLM uses this to phrase \"recurring issue\" vs \"first time seen\"."
|
||||
},
|
||||
{
|
||||
"block": "SMART data",
|
||||
"when": "Only for disk-related events (event type contains <code>disk</code>, <code>smart</code>, <code>storage</code>, <code>io_error</code>, or the body mentions <code>/dev/sd</code>, <code>ata</code>, <code>i/o error</code>).",
|
||||
"what": "Output of <code>smartctl</code> for the affected device — overall health (PASSED / FAILED) plus the relevant attributes for the failure mode."
|
||||
},
|
||||
{
|
||||
"block": "Known errors DB",
|
||||
"when": "When the body or journal context matches a Proxmox-specific error pattern shipped with the Monitor.",
|
||||
"what": "A <code>KNOWN PROXMOX ERROR DETECTED</code> block with the matched cause and a concrete solution. The prompt instructs the LLM to translate this verbatim — no paraphrasing of the recommended fix."
|
||||
},
|
||||
{
|
||||
"block": "Journal logs",
|
||||
"when": "Whenever the originating collector captured journal lines for the event (mostly the journal watcher and the task watcher).",
|
||||
"what": "Raw <code>journalctl</code> excerpts. The prompt tells the LLM to extract IDs, timestamps and root-cause hints, and to ignore unrelated entries."
|
||||
}
|
||||
],
|
||||
"afterBlocks": "Once these blocks are joined, the user message sent to the LLM has this shape:",
|
||||
"calloutTitle": "No telemetry beyond the event itself",
|
||||
"calloutBody": "The Monitor only sends what it has on hand for the current event — no system-wide telemetry, no historical metric series, no inventory dumps. The five blocks above are the upper bound of what leaves the host on a single AI rewrite."
|
||||
},
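The selection rules in the table above can be approximated with a few substring checks. The sketch below is an assumption-laden illustration: the keyword lists are copied from the table, but the function name `select_blocks`, its parameters and the use of plain substring matching are hypothetical and may differ from the dispatcher's real logic.

```python
# Keyword lists copied from the table above.
UPTIME_TYPES = ("crash", "panic", "oom", "kernel", "split_brain", "quorum_lost",
                "node_offline", "node_fail", "system_fail", "boot_fail")
DISK_TYPE_HINTS = ("disk", "smart", "storage", "io_error")
DISK_BODY_HINTS = ("/dev/sd", "ata", "i/o error")


def select_blocks(event_type, body, seen_before=False,
                  known_error=False, journal_lines=()):
    """Return the names of the context blocks that would be attached."""
    etype = event_type.lower()
    text = body.lower()
    blocks = []
    if any(key in etype for key in UPTIME_TYPES):
        blocks.append("uptime")
    if seen_before:
        blocks.append("frequency")
    # Naive substring match: "ata" would also hit words such as "data".
    if any(k in etype for k in DISK_TYPE_HINTS) or any(k in text for k in DISK_BODY_HINTS):
        blocks.append("smart")
    if known_error:
        blocks.append("known_errors")
    if journal_lines:
        blocks.append("journal")
    return blocks


routine = select_blocks("backup_complete", "Backup of VM 100 finished")
disk = select_blocks("disk_io_error", "I/O error on /dev/sda",
                     seen_before=True, journal_lines=["ata1: error"])
crash = select_blocks("kernel_panic", "Kernel panic - not syncing")
```

A routine event ends up with no extra blocks, which is what keeps the prompt small for the common case.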
|
||||
"tokens": {
|
||||
"heading": "Tokens — what they are and how they're consumed",
|
||||
"intro1": "Every commercial provider charges per <em>token</em>, so it's worth understanding what a token is before picking a plan. A token is roughly four characters of English text or about three quarters of a word. The phrase <em>\"Backup completed on storage local-bak\"</em> is around eight tokens. A short journal excerpt of ten lines can be 200-400 tokens depending on the technical density.",
|
||||
"intro2": "Two things are billed on every call:",
|
||||
"items": [
|
||||
"<strong>Input tokens</strong> — the system prompt plus the user message (severity, title, body, enriched context). For ProxMenux the system prompt alone is on the order of 1.5-2 KB (≈ 400-500 tokens) and the user message varies from 50 tokens (a clean backup-complete) to ~1500 tokens (a disk error with 30 lines of journal context).",
|
||||
"<strong>Output tokens</strong> — what the model writes back. The Monitor caps this with <code>max_tokens</code> (see the table below). The cap is a <em>limit</em>, not a charge: if the model produces 250 tokens with a cap of 1500, you pay for 250."
|
||||
],
|
||||
"capsIntro": "These are the actual caps the dispatcher applies, taken straight from <code>AI_DETAIL_TOKENS</code> in <code>notification_templates.py</code>:",
|
||||
"headerLevel": "Detail level",
|
||||
"headerCap": "Output cap (tokens)",
|
||||
"headerConsumption": "Typical real consumption",
|
||||
"capRows": [
|
||||
{
|
||||
"level": "brief",
|
||||
"cap": "500",
|
||||
"consumption": "50-200 output tokens for short events."
|
||||
},
|
||||
{
|
||||
"level": "standard",
|
||||
"cap": "1500",
|
||||
"consumption": "200-700 output tokens for typical events with light context."
|
||||
},
|
||||
{
|
||||
"level": "detailed",
|
||||
"cap": "3000",
|
||||
"consumption": "500-2000 output tokens for full email reports with logs and SMART tables."
|
||||
}
|
||||
],
|
||||
"customNote": "Custom prompt mode uses a fixed cap of 500 output tokens regardless of detail level — the custom prompt is in your control and the cap protects against runaway responses.",
|
||||
"sizingTitle": "Practical sizing",
|
||||
"sizingBody": "A homelab with 50-100 events per day on <code>standard</code> typically consumes tens of thousands of tokens per day, since every call carries the ≈ 400-500-token system prompt on top of the user message and the output. With the free tiers offered by Groq and Gemini, that fits without touching a paid plan. With OpenAI or Anthropic, billed per-token, the cost lands in the cents-per-month range for that volume. If your event count is much higher, the <link>Detail level per channel</link> section explains how to keep chat channels on <code>brief</code> while letting Email take the full report."
|
||||
},
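The figures in this section can be turned into a back-of-the-envelope estimate. The sketch below uses only the rule of thumb (about four characters per token) and the token ranges quoted above. The helper names `estimate_tokens` and `daily_tokens` are hypothetical, and real tokenizers will give somewhat different counts.

```python
def estimate_tokens(text: str) -> int:
    """Rule of thumb from this section: about four characters per token."""
    return max(1, round(len(text) / 4))


def daily_tokens(events_per_day: int, system_prompt_tokens: int,
                 user_tokens: int, output_tokens: int) -> int:
    """Total billed tokens per day for one channel (input plus output)."""
    per_call = system_prompt_tokens + user_tokens + output_tokens
    return events_per_day * per_call


phrase_tokens = estimate_tokens("Backup completed on storage local-bak")

# Low end: 50 events, small system prompt, clean events, short answers.
light_day = daily_tokens(50, 400, 50, 200)

# High end: 100 events, larger prompt, heavy journal context, long answers.
heavy_day = daily_tokens(100, 500, 1500, 700)
```

Because the rewrite runs once per channel, multiply the result by the number of AI-enabled channels to get a total for the host.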
|
||||
"providers": {
|
||||
"heading": "AI providers",
|
||||
"intro": "Six providers are wired into the Monitor. The provider dropdown in the UI shows them all; an information button next to it opens a modal with a one-line description for each. Below is the full reference, with the URL for getting an API key, the description shown in the UI and the relevant notes from the codebase.",
|
||||
"imageAlt": "AI Providers Information modal listing the six supported providers — Groq, OpenAI, Anthropic Claude, Google Gemini, Ollama, OpenRouter — each with its icon and one-line description, and a special OpenAI-Compatible APIs note for OpenAI",
|
||||
"imageCaption": "The in-app modal — six cards, one per provider, with the same descriptions documented below.",
|
||||
"groq": {
|
||||
"heading": "Groq",
|
||||
"tagline": "Very fast, generous free tier (30 req/min). Ideal to start.",
|
||||
"items": [
|
||||
"API key: <a>console.groq.com/keys</a>",
|
||||
"Verified models: <code>llama-3.3-70b-versatile</code>, <code>llama-3.1-70b-versatile</code>, <code>llama-3.1-8b-instant</code>, <code>llama3-70b-8192</code>, <code>llama3-8b-8192</code>, <code>mixtral-8x7b-32768</code>, <code>gemma2-9b-it</code>.",
|
||||
"Recommended: <strong><code>llama-3.3-70b-versatile</code></strong> — best quality at full Groq inference speed."
|
||||
]
|
||||
},
|
||||
"openai": {
|
||||
"heading": "OpenAI",
|
||||
"tagline": "Industry standard. Very accurate and widely used.",
|
||||
"items": [
|
||||
"API key: <a>platform.openai.com/api-keys</a>",
|
||||
"Verified models: <code>gpt-4.1-nano</code>, <code>gpt-4.1-mini</code>, <code>gpt-4o-mini</code>, <code>gpt-4.1</code>, <code>gpt-4o</code>, <code>gpt-5-chat-latest</code>, <code>gpt-5.4-nano</code>, <code>gpt-5.4-mini</code>.",
|
||||
"Recommended: <strong><code>gpt-4.1-nano</code></strong> — the cheapest member of the chat family, sufficient quality for translation and re-formatting. Reasoning models (o-series, gpt-5 non-chat) are supported by the provider plumbing but kept off the verified list: higher latency without measurable quality gain on this workload."
|
||||
],
|
||||
"baseUrlTitle": "OpenAI-compatible base URL",
|
||||
"baseUrlBody": "The OpenAI provider also accepts a custom <em>Base URL</em>, which lets you point the Monitor at any OpenAI-compatible endpoint. Confirmed to work with <strong>BytePlus / ByteDance (Kimi K2.5)</strong>, <strong>LocalAI</strong>, <strong>LM Studio</strong>, <strong>vLLM</strong>, <strong>Together AI</strong>, <strong>Fireworks AI</strong>, and any other service that speaks the OpenAI <code>/v1/chat/completions</code> dialect. Set the URL in the OpenAI tab next to the API key field."
|
||||
},
|
||||
"anthropic": {
|
||||
"heading": "Anthropic (Claude)",
|
||||
"tagline": "Excellent for writing and translation. Fast and economical.",
|
||||
"items": [
|
||||
"API key: <a>console.anthropic.com/settings/keys</a>",
|
||||
"Verified models: <code>claude-3-5-haiku-latest</code>, <code>claude-3-5-sonnet-latest</code>, <code>claude-3-opus-latest</code>.",
|
||||
"Recommended: <strong><code>claude-3-5-haiku-latest</code></strong> — Claude's smallest, fastest model with strong language quality for the translation workload."
|
||||
]
|
||||
},
|
||||
"gemini": {
|
||||
"heading": "Google Gemini",
|
||||
"tagline": "Free tier available, great quality/price ratio.",
|
||||
"items": [
|
||||
"API key: <a>aistudio.google.com/app/apikey</a>",
|
||||
"Verified models: <code>gemini-2.5-flash-lite</code>, <code>gemini-2.5-flash</code>, <code>gemini-3-flash-preview</code>.",
|
||||
"Recommended: <strong><code>gemini-2.5-flash-lite</code></strong> — flash and flash-lite pass the verifier consistently. The pro variants reject the <code>thinkingBudget=0</code> setting the Monitor uses and are overkill for this workload."
|
||||
]
|
||||
},
|
||||
"openrouter": {
|
||||
"heading": "OpenRouter",
|
||||
"tagline": "Aggregator with access to 100+ models using a single API key. Maximum flexibility.",
|
||||
"items": [
|
||||
"API key: <a>openrouter.ai/keys</a>",
|
||||
"Verified models: <code>meta-llama/llama-3.3-70b-instruct</code>, <code>meta-llama/llama-3.1-70b-instruct</code>, <code>meta-llama/llama-3.1-8b-instruct</code>, <code>anthropic/claude-3.5-haiku</code>, <code>anthropic/claude-3.5-sonnet</code>, <code>google/gemini-flash-1.5</code>, <code>openai/gpt-4o-mini</code>, <code>mistralai/mistral-7b-instruct</code>, <code>mistralai/mixtral-8x7b-instruct</code>.",
|
||||
"Recommended: <strong><code>meta-llama/llama-3.3-70b-instruct</code></strong> — same model as the Groq entry but routed through OpenRouter, which means a single key for all the listed models."
|
||||
]
|
||||
},
|
||||
"ollama": {
|
||||
"heading": "Ollama (Local)",
|
||||
"tagline": "Uses models available on your Ollama server. 100% local, no costs, total privacy.",
|
||||
"items": [
|
||||
"No API key. Set the <em>Ollama URL</em> field to your server (default <code>http://localhost:11434</code> or whatever host runs your Ollama instance).",
|
||||
"Models: <strong>not filtered.</strong> The Monitor reads whichever models you have pulled on the Ollama side via <code>ollama pull <model></code>. The dropdown is populated from <code>GET /api/tags</code> on your Ollama server.",
|
||||
"Install: <a>ollama.com/download</a> — runs on Linux, macOS, Windows. For best results pick a model that fits in RAM with a large enough context window for the journal blocks the dispatcher injects."
|
||||
]
|
||||
}
|
||||
},
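To see which models the Ollama dropdown would offer, you can query the same endpoint yourself. The sketch below is an illustration using only the Python standard library. `GET /api/tags` and the default URL come from the Ollama entry above; the helper names `model_names` and `fetch_models` and the sample payload are assumptions for the example.

```python
import json
import urllib.request


def model_names(tags_payload: dict) -> list:
    """Extract model names from an Ollama /api/tags response body."""
    return [m["name"] for m in tags_payload.get("models", []) if "name" in m]


def fetch_models(base_url: str = "http://localhost:11434",
                 timeout: float = 5.0) -> list:
    """Query the Ollama server; raises on network or HTTP errors."""
    url = f"{base_url.rstrip('/')}/api/tags"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return model_names(json.load(resp))


# Hypothetical payload shaped like an /api/tags answer, so no server is needed.
sample_payload = {"models": [{"name": "llama3.1:8b"}, {"name": "qwen2.5:7b"}]}
names = model_names(sample_payload)
```

Call `fetch_models()` with the URL from the *Ollama URL* field to check connectivity before enabling the provider.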
|
||||
"models": {
|
||||
"heading": "Why these specific models",
|
||||
"intro": "The model dropdown for each commercial provider is populated from a curated list shipped with the Monitor (<code>verified_ai_models.json</code>). Models on this list have been tested end-to-end with the chat / completions API format the Monitor uses, with the exact <code>system_prompt + user_message + max_tokens</code> shape the AI Enhancer sends. The list is refreshed before each ProxMenux release with a private verifier tool that probes every candidate model and prunes the ones that misbehave.",
|
||||
"consequencesIntro": "Two consequences worth being aware of:",
|
||||
"consequences": [
|
||||
"<strong>The recommended model</strong> for each provider is the one that has the best balance of quality, latency and cost for notification translation specifically — not the most capable model the provider sells. Notification rewrites don't need frontier-model reasoning; they need fast and cheap.",
|
||||
"<strong>You can still pick another verified model</strong> from the dropdown — sometimes you have a free-tier quota you want to spend on a particular model, or you have a strong preference. Pick any of the listed entries; they've all passed the verifier."
|
||||
],
|
||||
"ollamaTitle": "Ollama is the exception",
|
||||
"ollamaBody": "Ollama models are local and the Monitor doesn't filter them — the dropdown reflects whatever you have pulled. Pick a model in the 7B-13B range with at least an 8K context window for the AI rewrite to behave reasonably with the journal context blocks."
|
||||
},
|
||||
"defaultPrompt": {
|
||||
"heading": "Default prompt",
|
||||
"intro": "With prompt mode set to <em>Default</em>, the Monitor uses the system prompt below. The prompt is templated at runtime: <code>'{'language'}'</code>, <code>'{'detail_level'}'</code>, <code>'{'emoji_instructions'}'</code> and <code>'{'suggestions_addon'}'</code> are replaced before the call. The variants for rich vs plain channels and the AI Suggestions addon are shown immediately after.",
|
||||
"showFullSummary": "Show full default system prompt",
|
||||
"passagesIntro": "Two passages in the prompt above are placeholders that get swapped depending on the per-channel <em>Rich messages</em> toggle:",
|
||||
"passages": [
|
||||
"<strong>Rich on</strong> → an emoji block is injected listing the icons the LLM may use, plus a hostname rule (the LLM must keep the hostname prefix from the title verbatim) and a handful of formatted examples (backup start, backup complete, updates, VM start, health degraded). This is what produces the emoji-prefixed messages on Telegram and Discord.",
|
||||
"<strong>Rich off</strong> → a one-line block tells the LLM to use plain ASCII only — no emojis, no Unicode symbols. Used for email and any channel where formatting noise hurts inbox rules or readability."
|
||||
],
|
||||
"suggestionsPlaceholder": "And the <code>'{'suggestions_addon'}'</code> placeholder is empty unless you enable AI Suggestions (next section), in which case this block gets injected:",
|
||||
"showAddonSummary": "Show AI Suggestions addon"
|
||||
},
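The placeholder substitution described above can be sketched in a few lines. Only the four placeholder names come from this section. The template text, the rich and plain blocks, the addon text and the function name `build_system_prompt` are short stand-ins invented for the example, not the bundled prompt.

```python
# Stand-in template; the real default prompt is much longer.
PROMPT_TEMPLATE = (
    "Translate alerts into {language}. Detail level: {detail_level}.\n"
    "{emoji_instructions}\n"
    "{suggestions_addon}"
)
RICH_BLOCK = "You may use the listed emojis and must keep the hostname prefix."
PLAIN_BLOCK = "Use plain ASCII only: no emojis, no Unicode symbols."
SUGGESTIONS_BLOCK = "Optionally append one short, specific tip."


def build_system_prompt(template: str, language: str, detail_level: str,
                        rich: bool, suggestions: bool) -> str:
    """Fill the four documented placeholders in the template."""
    values = {
        "language": language,
        "detail_level": detail_level,
        "emoji_instructions": RICH_BLOCK if rich else PLAIN_BLOCK,
        "suggestions_addon": SUGGESTIONS_BLOCK if suggestions else "",
    }
    prompt = template
    for key, value in values.items():
        # str.replace leaves any other braces in the prompt untouched.
        prompt = prompt.replace("{" + key + "}", value)
    return prompt.strip()


email_prompt = build_system_prompt(PROMPT_TEMPLATE, "Spanish", "detailed",
                                   rich=False, suggestions=False)
```

Using `str.replace` instead of `str.format` is a deliberate choice here: a long prompt may contain literal braces in its examples, and `format` would raise on them.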
|
||||
"customPrompt": {
|
||||
"heading": "Custom prompt mode",
|
||||
"intro": "Switching the prompt mode to <em>Custom</em> swaps the entire default system prompt for one you write yourself. The custom prompt is stored in the Monitor's SQLite settings and sent verbatim on every AI rewrite call. It's the right escape hatch when you want a completely different voice, structure or focus than the bundled prompt offers.",
|
||||
"imageAlt": "Custom Prompt mode showing a textarea with translation rules and Export Import buttons plus a Community prompts link",
|
||||
"imageCaption": "Custom Prompt — large textarea with the user's prompt, plus <em>Export</em>, <em>Import</em> and a link to the community gallery on GitHub.",
|
||||
"changesTitle": "What changes when Custom is on",
|
||||
"changes": [
|
||||
"<strong>The default prompt is replaced entirely.</strong> The Proxmox mappings, the context-handling rules and the emoji instructions are all gone. If you want to keep any of them, paste them into your prompt — the bundled <code>EXAMPLE_CUSTOM_PROMPT</code> shown below is a starting point.",
|
||||
"<strong>The Language selector is ignored.</strong> The default prompt has a <code>'{'language'}'</code> placeholder; the custom prompt does not. If you want output in a specific language, declare it inside your prompt (\"Translate to Spanish\", \"Output everything in French\").",
|
||||
"<strong>Detail level still applies</strong> in the sense that it's available as a setting per channel, but the cap on output tokens becomes a fixed 500 in custom mode (vs the 500 / 1500 / 3000 ramp of the default prompt). Editing the prompt does not raise that cap, so if your custom prompt asks for a long report, expect it to be cut off at 500 tokens; ask for shorter output or split the request.",
|
||||
"<strong>The Output Format markers are still mandatory.</strong> The Monitor parses the response by looking for <code>[TITLE]</code> and <code>[BODY]</code> on their own lines. A custom prompt that doesn't emit those markers will break the parser and fall back to the original templated body.",
|
||||
"<strong>Rich messages emoji rules are not auto-injected.</strong> If you want emojis, tell the prompt to use them. If you want plain text, tell it not to. The toggle only gates the bundled blocks of the default prompt, not your custom string."
|
||||
],
|
||||
"starterTitle": "Starter prompt",
|
||||
"starterIntro": "The <em>Custom Prompt</em> textarea ships pre-filled with a minimal example you can adapt:",
|
||||
"showStarterSummary": "Show starter custom prompt",
|
||||
"shareTitle": "Sharing prompts with the community",
|
||||
"shareIntro": "The <em>Export</em> button writes your current custom prompt to a file (<code>.txt</code> / <code>.md</code>) you can keep as a backup or hand to someone else. <em>Import</em> pulls one back in. The third button next to them links to a public community gallery on GitHub:",
|
||||
"shareLinkLabel": "github.com/MacRimi/ProxMenux/discussions — Share custom prompts for AI notifications",
|
||||
"shareOutro": "Browse the discussion to see what other operators have built — terse pager-style alerts, verbose technical reports, language-specific variants. If you tweak yours and like the result, post it there: even a one-paragraph description of what your prompt optimises for helps people pick a good starting point. Feedback on what works and what doesn't is equally welcome."
|
||||
},
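Since a custom prompt must still emit the `[TITLE]` and `[BODY]` markers, it can help to check a model's reply before relying on it. The sketch below is one possible parser, assuming the markers sit on their own lines as described above. The function name `parse_response` and the choice to return `None` on failure are hypothetical; the Monitor's real parser may differ.

```python
def parse_response(text: str):
    """Return (title, body), or None when the markers are missing or empty."""
    lines = text.strip().splitlines()
    try:
        t = next(i for i, line in enumerate(lines) if line.strip() == "[TITLE]")
        b = next(i for i, line in enumerate(lines) if line.strip() == "[BODY]")
    except StopIteration:
        return None  # at least one marker is absent
    if b <= t:
        return None  # markers in the wrong order
    title = "\n".join(lines[t + 1:b]).strip()
    body = "\n".join(lines[b + 1:]).strip()
    if not title or not body:
        return None
    return title, body


good = parse_response("[TITLE]\nBackup done\n[BODY]\nVM 100 backed up.\nTook 3 min.")
bad = parse_response("Backup done. VM 100 backed up.")
```

A `None` result corresponds to the case where the dispatcher would fall back to the original templated body.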
|
||||
"suggestions": {
|
||||
"heading": "AI Suggestions (BETA)",
|
||||
"intro": "AI Suggestions is an opt-in addon that lets the LLM append <strong>one</strong> short, actionable tip at the end of the body. It only activates when the prompt mode is <em>Default</em>, the master AI toggle is on, and the <em>AI Suggestions</em> switch is flipped — and even then, the prompt instructs the model to skip the tip whenever the cause or solution is unclear.",
|
||||
"formatIntro": "When a tip is added, it follows this exact format:",
|
||||
"rulesIntro": "The rules baked into the addon (visible in the collapsible block under the Default prompt section above):",
|
||||
"rules": [
|
||||
"The tip is included <em>only</em> if the journal context or the Known Errors database clearly points to a specific fix.",
|
||||
"The tip is capped at 100 characters.",
|
||||
"It must be specific (concrete command or path) — generic advice is rejected by the prompt itself.",
|
||||
"If a Known Error provides a solution, the LLM must use that solution, not invent a new one.",
|
||||
"If nothing in the input gives the LLM enough certainty to suggest a concrete fix, the tip is skipped — no guessing."
|
||||
],
|
||||
"betaTitle": "Why BETA",
|
||||
"betaBody": "Tip quality depends on two things outside of ProxMenux: the model picked, and how rich the journal context for the event was. With a strong model (Claude 3.5 Haiku, GPT-4.1 Mini, Llama 3.3 70B) and a disk error that comes with smartctl + journal lines, the tip is consistently useful. With a tiny local Ollama model and a one-line event, the tip can fall flat or get skipped entirely. Disable the toggle if you find the tips noisy and re-enable it when you want it back."
|
||||
},
|
||||
"detailLevel": {
|
||||
"heading": "Detail level per channel",
|
||||
"intro": "Each of the four channels (Telegram, Discord, Gotify, Email) has its own detail-level dropdown. Three values are available, mapped to specific output token caps and to specific instructions in the default prompt:",
|
||||
"headerLevel": "Level",
|
||||
"headerLabel": "UI label",
|
||||
"headerCap": "Output cap",
|
||||
"headerProduce": "What the prompt asks the LLM to produce",
|
||||
"rows": [
|
||||
{
|
||||
"level": "brief",
|
||||
"label": "2-3 lines, essential only",
|
||||
"cap": "500 tokens",
|
||||
"produce": "\"What happened + where\". Nothing else."
|
||||
},
|
||||
{
|
||||
"level": "standard",
|
||||
"label": "Concise with basic context",
|
||||
"cap": "1500 tokens",
|
||||
"produce": "3-6 lines: what, where, cause, affected devices."
|
||||
},
|
||||
{
|
||||
"level": "detailed",
|
||||
"label": "Complete technical details",
|
||||
"cap": "3000 tokens",
|
||||
"produce": "Full report: what, where, cause, affected, logs, SMART data, history."
|
||||
}
|
||||
],
|
||||
"defaultsIntro": "Defaults the Monitor applies on first install:",
|
||||
"defaults": [
|
||||
"<strong>Telegram, Discord, Gotify</strong> — <code>standard</code>.",
|
||||
"<strong>Email</strong> — <code>detailed</code>. Email is the channel where you typically want the full picture for archival."
|
||||
],
|
||||
"emailTitle": "Email detail level appends the original",
|
||||
"emailBody": "When Email is on <code>detailed</code> and the original templated body has substantial content (more than 50 characters), the dispatcher appends the original message at the bottom of the AI rewrite, separated by a 40-dash divider and an <code>Original message:</code> label. This means a detailed email always carries both the AI-friendly version and the machine-friendly raw template — useful when you want to grep an old alert later."
|
||||
},
|
||||
"language": {
|
||||
"heading": "Language",
|
||||
"intro": "Twelve languages are wired in. The dropdown sets <code>ai_language</code> in the config and the value is interpolated into the system prompt at the place where the prompt says <em>\"translate alerts into '{'language'}'\"</em>. The full list:",
|
||||
"list": "English (<code>en</code>), Spanish (<code>es</code>), French (<code>fr</code>), German (<code>de</code>), Portuguese (<code>pt</code>), Italian (<code>it</code>), Russian (<code>ru</code>), Swedish (<code>sv</code>), Norwegian (<code>no</code>), Japanese (<code>ja</code>), Chinese (<code>zh</code>), Dutch (<code>nl</code>).",
|
||||
"rulesIntro": "Two important rules taken straight from the prompt:",
|
||||
"rules": [
|
||||
"<strong>Translate</strong>: labels, descriptions, status words, units (e.g. GB → Go in French).",
|
||||
"<strong>Do not translate</strong>: hostnames, IPs, paths, VM/CT IDs, device names like <code>/dev/sdX</code>, technical identifiers. These stay verbatim regardless of language."
|
||||
],
|
||||
"customNote": "Custom prompt mode <strong>does not use</strong> the Language selector. If you switch to Custom and want a specific output language, declare it inside your prompt."
|
||||
},
|
||||
"templates": {
|
||||
"heading": "A note on templates",
|
||||
"body1": "The body the AI receives is not raw event data — it's a pre-rendered template. Each event type (<code>backup_complete</code>, <code>vm_start</code>, <code>auth_fail</code>, <code>health_degraded</code>, etc.) has a template in <code>notification_templates.py</code> that knows how to format that specific event into a structured plain-text body. The AI rewrites that body, it doesn't replace the templating step.",
|
||||
"body2": "Two practical implications: the AI never sees a hostname or VMID it has to invent — those fields are placed by the template before the rewrite. And if AI is disabled, the templated body is what gets dispatched directly. The <link>Notifications</link> page documents the dispatch pipeline in full and is the right cross-reference for everything that happens to an event before it reaches this layer."
|
||||
},
|
||||
"privacy": {
|
||||
"heading": "Privacy & data flow",
|
||||
"intro": "With AI rewrite enabled, the Monitor sends the rendered notification body plus the enriched context blocks to the configured provider. That can include hostnames, IPs, usernames, device paths, journal log lines and SMART attributes for the affected device. Whether that leaves the host depends on which provider you chose:",
|
||||
"headerProvider": "Provider",
|
||||
"headerDestination": "Data destination",
|
||||
"rows": [
|
||||
{
|
||||
"provider": "Ollama",
|
||||
"destination": "Stays on the Ollama host. If Ollama runs on the same Proxmox node, nothing leaves the network at all."
|
||||
},
|
||||
{
|
||||
"provider": "OpenAI",
|
||||
"destination": "<code>api.openai.com</code> (or your custom Base URL endpoint). Subject to OpenAI's data-handling policy at the time of the call."
|
||||
},
|
||||
{
|
||||
"provider": "Anthropic",
|
||||
"destination": "<code>api.anthropic.com</code>."
|
||||
},
|
||||
{
|
||||
"provider": "Google Gemini",
|
||||
"destination": "<code>generativelanguage.googleapis.com</code>."
|
||||
},
|
||||
{
|
||||
"provider": "Groq",
|
||||
"destination": "<code>api.groq.com</code>."
|
||||
},
|
||||
{
|
||||
"provider": "OpenRouter",
|
||||
"destination": "<code>openrouter.ai</code>, which forwards to the underlying model provider chosen in the <code>model</code> field. Two hops instead of one."
|
||||
}
|
||||
],
|
||||
"calloutTitle": "If event content cannot leave the network",
|
||||
"calloutBody": "Use Ollama, or disable the AI rewriter. There is no middle ground — the dispatcher does not try to redact hostnames or IPs before sending; the prompt is built from the actual event payload as the Monitor sees it."
|
||||
},
|
||||
"whereNext": {
|
||||
"heading": "Where to next",
|
||||
"items": [
|
||||
{
|
||||
"label": "Notifications",
|
||||
"href": "/docs/monitor/notifications",
|
||||
"tail": " — the dispatch pipeline, channels, per-event toggles and the PVE webhook integration that surrounds this layer."
|
||||
},
|
||||
{
|
||||
"label": "Health Monitor",
|
||||
"href": "/docs/monitor/health-monitor",
|
||||
"tail": " — the largest single producer of events the AI ends up rewriting, with its own per-category suppression durations."
|
||||
},
|
||||
{
|
||||
"label": "Architecture",
|
||||
"href": "/docs/monitor/architecture",
|
||||
"tail": " — where the AI Enhancer fits into the wider Monitor process (it's a module, not a separate service)."
|
||||
}
|
||||
],
|
||||
"communityLabel": "Community prompts on GitHub",
|
||||
"communityTail": " — browse, share, ask."
|
||||
}
|
||||
}
|
||||
@@ -0,0 +1,681 @@
|
||||
{
|
||||
"meta": {
|
||||
"title": "Proxmox Monitor API — Integration Reference | ProxMenux",
|
||||
"description": "HTTP endpoints exposed by ProxMenux Monitor for integrations: Home Assistant, Homepage, Grafana, Prometheus, n8n and custom dashboards. Read-only data export plus the safe write operations needed by automations — VM control, backup trigger, notification dispatch, alert acknowledgement.",
|
||||
"ogTitle": "Proxmox Monitor API — Integration Reference",
|
||||
"ogDescription": "Endpoints exposed by ProxMenux Monitor for Home Assistant, Homepage, Grafana, Prometheus, n8n and custom dashboards.",
|
||||
"twitterTitle": "Proxmox Monitor API | ProxMenux",
|
||||
"twitterDescription": "Endpoints for Home Assistant, Homepage, Grafana, Prometheus, n8n and custom dashboards."
|
||||
},
|
||||
"header": {
|
||||
"title": "API Reference",
|
||||
"description": "The HTTP endpoints integrators use to read state and trigger safe actions on ProxMenux Monitor — Home Assistant sensors, Homepage cards, Grafana dashboards, Prometheus scrapes, n8n flows and custom scripts. Every category, plus a complete list of Prometheus metrics, with curl examples.",
|
||||
"section": "ProxMenux Monitor"
|
||||
},
|
||||
"intro": {
|
||||
"title": "What this page is for",
|
||||
"body": "This page lists the endpoints we expect external integrations to use — read-only data export across every part of the Monitor, plus the small set of write operations that automations legitimately need (trigger a backup, send a custom notification, acknowledge an alert). The whole API runs from the same Flask process that serves the dashboard UI on TCP <strong>8008</strong>; bind address and TLS are configured in <link>Access & Authentication</link>."
|
||||
},
|
||||
"headerEndpoint": "Endpoint",
|
||||
"headerMethod": "Method",
|
||||
"headerUse": "Use",
|
||||
"auth": {
|
||||
"heading": "Authentication",
|
||||
"intro": "Every endpoint marked \"authenticated\" expects a JWT bearer token in the request header:",
|
||||
"tokensIntro": "Two ways to obtain the token:",
|
||||
"items": [
|
||||
"<strong>API tokens (recommended for integrations).</strong> Long-lived tokens minted from <strong>Settings → Security → API Tokens</strong> in the dashboard. Each token is named, can be revoked individually, and is what you should hand to Home Assistant / Homepage / Grafana / n8n.",
|
||||
"<strong>Login flow (short-lived JWT).</strong> <code>POST /api/auth/login</code> with a username, password and TOTP token if 2FA is enabled. The returned JWT is short-lived and refreshed by the dashboard automatically; useful for ad-hoc scripts that authenticate as a human user."
|
||||
],
|
||||
"flowLink": "The auth flow, password policy, 2FA setup, audit log and TLS configuration all live in <link>Access & Authentication</link>.",
|
||||
"rows": [
|
||||
{
|
||||
"endpoint": "/api/auth/login",
|
||||
"method": "POST",
|
||||
"use": "Body: <code>'{'\"username\",\"password\",\"totp_token?\"'}'</code>. Returns a short-lived JWT."
|
||||
},
|
||||
{
|
||||
"endpoint": "/api/auth/api-tokens",
|
||||
"method": "GET",
|
||||
"use": "List API tokens (metadata only — names, prefixes, dates; never the actual secret)."
|
||||
},
|
||||
{
|
||||
"endpoint": "/api/auth/api-tokens",
|
||||
"method": "POST",
|
||||
"use": "Mint a new long-lived API token. Body: <code>'{'\"name\":\"'<'label'>'\"'}'</code>. The token value is returned once and cannot be retrieved again."
|
||||
},
|
||||
{
|
||||
"endpoint": "/api/auth/api-tokens/<id>",
|
||||
"method": "DELETE",
|
||||
"use": "Revoke a specific API token by its ID."
|
||||
}
|
||||
]
|
||||
},
|
||||
"conventions": {
|
||||
"heading": "Conventions",
|
||||
"items": [
|
||||
"All requests and responses are JSON unless explicitly noted (log download is plain text, <code>/api/prometheus</code> is text/plain in the OpenMetrics format, task log is plain text).",
|
||||
"Successful mutating endpoints return <code>'{'\"success\": true, ...'}'</code>. Error responses use a non-2xx HTTP status with <code>'{'\"success\": false, \"error\": \"'<'reason'>'\"'}'</code>.",
|
||||
"List endpoints accept optional <code>limit</code>, <code>offset</code>, <code>since</code> and category-specific filters via query string.",
|
||||
"Time fields are returned as Unix epoch seconds or ISO-8601 with explicit timezone, never as locale strings."
|
||||
]
|
||||
},
|
||||
"system": {
|
||||
"heading": "System & hardware",
|
||||
"rows": [
|
||||
{
|
||||
"endpoint": "/api/system",
|
||||
"method": "GET",
|
||||
"use": "CPU, memory, swap, uptime, load — current snapshot."
|
||||
},
|
||||
{
|
||||
"endpoint": "/api/info",
|
||||
"method": "GET",
|
||||
"use": "Static host info: hostname, kernel, PVE version, CPU model, distro."
|
||||
},
|
||||
{
|
||||
"endpoint": "/api/system-info",
|
||||
"method": "GET",
|
||||
"use": "Extended system snapshot used by the dashboard header (overall metrics + boot time)."
|
||||
},
|
||||
{
|
||||
"endpoint": "/api/hardware",
|
||||
"method": "GET",
|
||||
"use": "Detailed hardware inventory — PCI devices, GPUs, sensors map."
|
||||
},
|
||||
{
|
||||
"endpoint": "/api/hardware/live",
|
||||
"method": "GET",
|
||||
"use": "Live values for sensors that change second to second."
|
||||
},
|
||||
{
|
||||
"endpoint": "/api/temperature/history",
|
||||
"method": "GET",
|
||||
"use": "Time series of CPU / package temperatures."
|
||||
},
|
||||
{
|
||||
"endpoint": "/api/gpu/<slot>/realtime",
|
||||
"method": "GET",
|
||||
"use": "NVIDIA / Intel / AMD GPU live metrics by PCI slot."
|
||||
}
|
||||
]
|
||||
},
|
||||
"health": {
|
||||
"heading": "Health Monitor",
|
||||
"rows": [
|
||||
{
|
||||
"endpoint": "/api/health",
|
||||
"method": "GET",
|
||||
"use": "Small health probe — returns JSON with <code>status</code>, <code>timestamp</code>, <code>version</code>. Suitable for Uptime Kuma keyword checks; the receiver must send the bearer header."
|
||||
},
|
||||
{
|
||||
"endpoint": "/api/health/status",
|
||||
"method": "GET",
|
||||
"use": "Overall health verdict — single severity + summary string."
|
||||
},
|
||||
{
|
||||
"endpoint": "/api/health/details",
|
||||
"method": "GET",
|
||||
"use": "All ten categories with per-category statuses and the structured payload that produced each."
|
||||
},
|
||||
{
|
||||
"endpoint": "/api/health/full",
|
||||
"method": "GET",
|
||||
"use": "Full snapshot — categories + active errors + dismissed list + custom suppression settings. Backs the modal in one round-trip; uses a 6-min background cache for instant response."
|
||||
},
|
||||
{
|
||||
"endpoint": "/api/health/active-errors",
|
||||
"method": "GET",
|
||||
"use": "Active list, filterable by <code>?category=<name></code>."
|
||||
},
|
||||
{
|
||||
"endpoint": "/api/health/dismissed",
|
||||
"method": "GET",
|
||||
"use": "Dismissed list with remaining suppression hours."
|
||||
},
|
||||
{
|
||||
"endpoint": "/api/health/settings",
|
||||
"method": "GET",
|
||||
"use": "Per-category Suppression Duration values currently configured."
|
||||
},
|
||||
{
|
||||
"endpoint": "/api/health/remote-storages",
|
||||
"method": "GET",
|
||||
"use": "Inventory of Proxmox-defined remote storages, with online state."
|
||||
},
|
||||
{
|
||||
"endpoint": "/api/health/interfaces",
|
||||
"method": "GET",
|
||||
"use": "Inventory of network interfaces with type (bridge / bond / physical), IP and link speed."
|
||||
},
|
||||
{
|
||||
"endpoint": "/api/health/acknowledge",
|
||||
"method": "POST",
|
||||
"use": "Body: <code>'{'\"error_key\":\"smart_sdh\"'}'</code>. Dismiss an alert with the category's configured Suppression Duration."
|
||||
},
|
||||
{
|
||||
"endpoint": "/api/health/cleanup-orphans",
|
||||
"method": "POST",
|
||||
"use": "Manual cleanup of errors whose underlying device or VM is gone. Idempotent."
|
||||
}
|
||||
],
|
||||
"outro": "Response shapes and the semantics of the categories live in <link>Health Monitor</link>."
|
||||
},
|
||||
"storage": {
|
||||
"heading": "Storage",
|
||||
"rows": [
|
||||
{
|
||||
"endpoint": "/api/storage",
|
||||
"method": "GET",
|
||||
"use": "All disks visible to the host (block devices, ZFS pools, LVM)."
|
||||
},
|
||||
{
|
||||
"endpoint": "/api/storage/summary",
|
||||
"method": "GET",
|
||||
"use": "Compact summary used by dashboard cards."
|
||||
},
|
||||
{
|
||||
"endpoint": "/api/proxmox-storage",
|
||||
"method": "GET",
|
||||
"use": "Proxmox-defined storages from <code>/etc/pve/storage.cfg</code> with online state and free space."
|
||||
},
|
||||
{
|
||||
"endpoint": "/api/storage/observations",
|
||||
"method": "GET",
|
||||
"use": "Permanent disk observation history — SMART warnings, I/O errors, ZFS pool events, kept across error auto-resolves."
|
||||
},
|
||||
{
|
||||
"endpoint": "/api/storage/smart/<disk>",
|
||||
"method": "GET",
|
||||
"use": "Current SMART attributes for one disk."
|
||||
},
|
||||
{
|
||||
"endpoint": "/api/storage/smart/<disk>/latest",
|
||||
"method": "GET",
|
||||
"use": "Most recent SMART self-test for the disk."
|
||||
},
|
||||
{
|
||||
"endpoint": "/api/storage/smart/<disk>/history",
|
||||
"method": "GET",
|
||||
"use": "List of stored SMART reports for the disk."
|
||||
},
|
||||
{
|
||||
"endpoint": "/api/storage/smart/<disk>/history/<file>",
|
||||
"method": "GET",
|
||||
"use": "Read a specific stored SMART report."
|
||||
},
|
||||
{
|
||||
"endpoint": "/api/storage/smart/<disk>/test",
|
||||
"method": "POST",
|
||||
"use": "Trigger a SMART self-test. Body: <code>'{'\"type\":\"short\"|\"long\"'}'</code>."
|
||||
},
|
||||
{
|
||||
"endpoint": "/api/storage/smart/schedules",
|
||||
"method": "GET",
|
||||
"use": "List the currently scheduled SMART tests."
|
||||
},
|
||||
{
|
||||
"endpoint": "/api/storage/smart/tools",
|
||||
"method": "GET",
|
||||
"use": "Detect whether <code>smartctl</code> and friends are installed."
|
||||
}
|
||||
]
|
||||
},
|
||||
"network": {
|
||||
"heading": "Network",
|
||||
"rows": [
|
||||
{
|
||||
"endpoint": "/api/network",
|
||||
"method": "GET",
|
||||
"use": "All interfaces with link state and addresses."
|
||||
},
|
||||
{
|
||||
"endpoint": "/api/network/summary",
|
||||
"method": "GET",
|
||||
"use": "Compact view used by the dashboard."
|
||||
},
|
||||
{
|
||||
"endpoint": "/api/network/<iface>/metrics",
|
||||
"method": "GET",
|
||||
"use": "Per-interface RX / TX, error counters, RRD time series."
|
||||
},
|
||||
{
|
||||
"endpoint": "/api/network/latency/current",
|
||||
"method": "GET",
|
||||
"use": "Latest gateway latency probe."
|
||||
},
|
||||
{
|
||||
"endpoint": "/api/network/latency/history",
|
||||
"method": "GET",
|
||||
"use": "Time series of gateway latency."
|
||||
}
|
||||
]
|
||||
},
|
||||
"vms": {
|
||||
"heading": "VMs & containers",
|
||||
"rows": [
|
||||
{
|
||||
"endpoint": "/api/vms",
|
||||
"method": "GET",
|
||||
"use": "List of all VMs and CTs with status, vmid, name."
|
||||
},
|
||||
{
|
||||
"endpoint": "/api/vms/<vmid>",
|
||||
"method": "GET",
|
||||
"use": "Full detail for one guest (config, network, disks)."
|
||||
},
|
||||
{
|
||||
"endpoint": "/api/vms/<vmid>/metrics",
|
||||
"method": "GET",
|
||||
"use": "CPU / memory / disk I/O time series for one guest (RRD)."
|
||||
},
|
||||
{
|
||||
"endpoint": "/api/vms/<vmid>/logs",
|
||||
"method": "GET",
|
||||
"use": "Recent task logs scoped to that guest."
|
||||
},
|
||||
{
|
||||
"endpoint": "/api/vms/<vmid>/backups",
|
||||
"method": "GET",
|
||||
"use": "List backups currently held for this guest."
|
||||
},
|
||||
{
|
||||
"endpoint": "/api/vms/<vmid>/control",
|
||||
"method": "POST",
|
||||
"use": "Body: <code>'{'\"action\":\"start|stop|reboot|shutdown\"'}'</code>. Power-cycle a guest."
|
||||
},
|
||||
{
|
||||
"endpoint": "/api/vms/<vmid>/backup",
|
||||
"method": "POST",
|
||||
"use": "Trigger <code>vzdump</code> for that guest. Body chooses storage and mode (<code>snapshot</code> / <code>suspend</code> / <code>stop</code>)."
|
||||
},
|
||||
{
|
||||
"endpoint": "/api/vms/<vmid>/config",
|
||||
"method": "PUT",
|
||||
"use": "Update the description (notes) field of a VM / CT. Other config keys are not modifiable from this endpoint."
|
||||
},
|
||||
{
|
||||
"endpoint": "/api/node/metrics",
|
||||
"method": "GET",
|
||||
"use": "Aggregated node-level metrics (RRD)."
|
||||
}
|
||||
]
|
||||
},
|
||||
"backups": {
|
||||
"heading": "Backups",
|
||||
"rows": [
|
||||
{
|
||||
"endpoint": "/api/backups",
|
||||
"method": "GET",
|
||||
"use": "Cluster-wide list of backups."
|
||||
},
|
||||
{
|
||||
"endpoint": "/api/backup-storages",
|
||||
"method": "GET",
|
||||
"use": "Storages flagged as backup targets, with free space."
|
||||
}
|
||||
]
|
||||
},
|
||||
"logs": {
|
||||
"heading": "Logs, tasks, events",
|
||||
"rows": [
|
||||
{
|
||||
"endpoint": "/api/logs",
|
||||
"method": "GET",
|
||||
"use": "Filtered journal entries. Query: <code>?level=&service=&limit=</code>."
|
||||
},
|
||||
{
|
||||
"endpoint": "/api/logs/download",
|
||||
"method": "GET",
|
||||
"use": "Plain-text dump of the filtered range."
|
||||
},
|
||||
{
|
||||
"endpoint": "/api/events",
|
||||
"method": "GET",
|
||||
"use": "Internal event stream — the same one that feeds notifications."
|
||||
},
|
||||
{
|
||||
"endpoint": "/api/task-log/<upid>",
|
||||
"method": "GET",
|
||||
"use": "Plain-text complete log for one Proxmox task by UPID."
|
||||
}
|
||||
]
|
||||
},
|
||||
"notifications": {
|
||||
"heading": "Notifications & AI",
|
||||
"intro": "The dispatch pipeline, channel walk-throughs and AI rewriter setup live in <notifLink>Notifications</notifLink> and <aiLink>AI Assistant</aiLink>.",
|
||||
"rows": [
|
||||
{
|
||||
"endpoint": "/api/notifications",
|
||||
"method": "GET",
|
||||
"use": "Recent notifications surfaced in the dashboard."
|
||||
},
|
||||
{
|
||||
"endpoint": "/api/notifications/download",
|
||||
"method": "GET",
|
||||
"use": "Export notifications as text."
|
||||
},
|
||||
{
|
||||
"endpoint": "/api/notifications/status",
|
||||
"method": "GET",
|
||||
"use": "Dispatcher status — whether the background thread is running, queue depth, last send."
|
||||
},
|
||||
{
|
||||
"endpoint": "/api/notifications/settings",
|
||||
"method": "GET",
|
||||
"use": "Read the full notification config (channels, per-event toggles, AI rewriter, Display Name)."
|
||||
},
|
||||
{
|
||||
"endpoint": "/api/notifications/history",
|
||||
"method": "GET",
|
||||
"use": "Dispatch history. Query: <code>?limit=&offset=&severity=&channel=</code>."
|
||||
},
|
||||
{
|
||||
"endpoint": "/api/notifications/history",
|
||||
"method": "DELETE",
|
||||
"use": "Wipe the dispatch history table."
|
||||
},
|
||||
{
|
||||
"endpoint": "/api/notifications/test",
|
||||
"method": "POST",
|
||||
"use": "Send a test notification to one channel. Body: <code>'{'\"channel\":\"telegram\"'}'</code>."
|
||||
},
|
||||
{
|
||||
"endpoint": "/api/notifications/send",
|
||||
"method": "POST",
|
||||
"use": "Emit a custom event. Body: <code>'{'\"event_type\":\"custom\",\"severity\":\"WARNING\",\"title\":\"...\",\"body\":\"...\",\"data\":'{''}''}'</code>."
|
||||
},
|
||||
{
|
||||
"endpoint": "/api/notifications/test-ai",
|
||||
"method": "POST",
|
||||
"use": "Test the AI provider connection. Body: <code>'{'\"provider\",\"api_key\",\"model\",\"ollama_url?\"'}'</code>."
|
||||
},
|
||||
{
|
||||
"endpoint": "/api/notifications/provider-models",
|
||||
"method": "POST",
|
||||
"use": "List available models for the selected AI provider."
|
||||
},
|
||||
{
|
||||
"endpoint": "/api/notifications/proxmox/setup-webhook",
|
||||
"method": "POST",
|
||||
"use": "Register the Monitor as a webhook target in <code>/etc/pve/notifications.cfg</code>."
|
||||
},
|
||||
{
|
||||
"endpoint": "/api/notifications/proxmox/cleanup-webhook",
|
||||
"method": "POST",
|
||||
"use": "Remove the Monitor target from PVE's notification config."
|
||||
},
|
||||
{
|
||||
"endpoint": "/api/notifications/proxmox/read-cfg",
|
||||
"method": "GET",
|
||||
"use": "Read PVE's current <code>notifications.cfg</code> as PVE sees it."
|
||||
}
|
||||
]
|
||||
},
|
||||
"security": {
|
||||
"heading": "Security (read)",
|
||||
"rows": [
|
||||
{
|
||||
"endpoint": "/api/security/firewall/status",
|
||||
"method": "GET",
|
||||
"use": "PVE firewall state and active rules."
|
||||
},
|
||||
{
|
||||
"endpoint": "/api/security/fail2ban/details",
|
||||
"method": "GET",
|
||||
"use": "Fail2Ban jail status — only useful when the optional jail is installed."
|
||||
},
|
||||
{
|
||||
"endpoint": "/api/security/fail2ban/activity",
|
||||
"method": "GET",
|
||||
"use": "Recent Fail2Ban events (bans, unbans, jail starts)."
|
||||
},
|
||||
{
|
||||
"endpoint": "/api/security/lynis/status",
|
||||
"method": "GET",
|
||||
"use": "Lynis last-run status (whether installed, last scan timestamp)."
|
||||
},
|
||||
{
|
||||
"endpoint": "/api/security/lynis/report",
|
||||
"method": "GET",
|
||||
"use": "Latest Lynis audit report (warnings, suggestions, hardening index)."
|
||||
},
|
||||
{
|
||||
"endpoint": "/api/security/tools",
|
||||
"method": "GET",
|
||||
"use": "Detect which optional security tools (Fail2Ban, Lynis) are installed."
|
||||
}
|
||||
]
|
||||
},
|
||||
"proxmenuxIntegration": {
|
||||
"heading": "ProxMenux integration",
|
||||
"rows": [
|
||||
{
|
||||
"endpoint": "/api/proxmenux/update-status",
|
||||
"method": "GET",
|
||||
"use": "Whether ProxMenux Monitor has an update available, current and latest version."
|
||||
},
|
||||
{
|
||||
"endpoint": "/api/proxmenux/installed-tools",
|
||||
"method": "GET",
|
||||
"use": "List of every ProxMenux post-install optimization currently registered on the host (from <code>/usr/local/share/proxmenux/installed_tools.json</code>)."
|
||||
},
|
||||
{
|
||||
"endpoint": "/api/proxmenux/tool-source/<key>",
|
||||
"method": "GET",
|
||||
"use": "Source code of a specific post-install function — the exact bash that was applied."
|
||||
}
|
||||
]
|
||||
},
|
||||
"prometheus": {
|
||||
"heading": "Prometheus metrics",
|
||||
"intro": "ProxMenux Monitor exposes a Prometheus-format scrape endpoint at <code>GET /api/prometheus</code> (authenticated) returning OpenMetrics-format text. Every metric is labelled with <code>node=\"<hostname>\"</code> and carries an explicit timestamp so it ingests cleanly into Prometheus, VictoriaMetrics, Mimir or any compatible TSDB.",
|
||||
"exportedTitle": "Exported metrics",
|
||||
"headerGroup": "Group",
|
||||
"headerMetric": "Metric",
|
||||
"headerDesc": "Description",
|
||||
"groups": [
|
||||
{
|
||||
"group": "System",
|
||||
"metrics": [
|
||||
{
|
||||
"metric": "proxmox_cpu_usage",
|
||||
"desc": "CPU usage percentage (gauge)."
|
||||
},
|
||||
{
|
||||
"metric": "proxmox_memory_total_bytes",
|
||||
"desc": "Total physical memory in bytes."
|
||||
},
|
||||
{
|
||||
"metric": "proxmox_memory_used_bytes",
|
||||
"desc": "Used memory in bytes."
|
||||
},
|
||||
{
|
||||
"metric": "proxmox_memory_usage_percent",
|
||||
"desc": "Memory usage percentage."
|
||||
},
|
||||
{
|
||||
"metric": "proxmox_load_average",
|
||||
"desc": "System load average. Label <code>period=\"1m\" | \"5m\" | \"15m\"</code>."
|
||||
}
|
||||
]
|
||||
},
|
||||
{
|
||||
"group": "Uptime",
|
||||
"metrics": [
|
||||
{
|
||||
"metric": "proxmox_uptime_seconds",
|
||||
"desc": "Seconds since last boot."
|
||||
}
|
||||
]
|
||||
},
|
||||
{
|
||||
"group": "Hardware",
|
||||
"metrics": [
|
||||
{
|
||||
"metric": "proxmox_cpu_temperature_celsius",
|
||||
"desc": "CPU package temperature."
|
||||
},
|
||||
{
|
||||
"metric": "proxmox_disk_temperature_celsius",
|
||||
"desc": "Per-disk temperature. Label <code>device</code>."
|
||||
},
|
||||
{
|
||||
"metric": "proxmox_fan_speed_rpm",
|
||||
"desc": "Fan speeds. Label <code>fan</code>."
|
||||
}
|
||||
]
|
||||
},
|
||||
{
|
||||
"group": "Disk space",
|
||||
"metrics": [
|
||||
{
|
||||
"metric": "proxmox_disk_total_bytes",
|
||||
"desc": "Total disk space per mount. Label <code>mountpoint</code>."
|
||||
},
|
||||
{
|
||||
"metric": "proxmox_disk_used_bytes",
|
||||
"desc": "Used disk space per mount."
|
||||
},
|
||||
{
|
||||
"metric": "proxmox_disk_usage_percent",
|
||||
"desc": "Disk usage percentage per mount."
|
||||
}
|
||||
]
|
||||
},
|
||||
{
|
||||
"group": "Network",
|
||||
"metrics": [
|
||||
{
|
||||
"metric": "proxmox_network_bytes_sent_total",
|
||||
"desc": "Total bytes sent (counter)."
|
||||
},
|
||||
{
|
||||
"metric": "proxmox_network_bytes_received_total",
|
||||
"desc": "Total bytes received (counter)."
|
||||
},
|
||||
{
|
||||
"metric": "proxmox_interface_bytes_sent_total",
|
||||
"desc": "Per-interface bytes sent. Label <code>interface</code>."
|
||||
},
|
||||
{
|
||||
"metric": "proxmox_interface_bytes_received_total",
|
||||
"desc": "Per-interface bytes received."
|
||||
}
|
||||
]
|
||||
},
|
||||
{
|
||||
"group": "VMs / CTs",
|
||||
"metrics": [
|
||||
{
|
||||
"metric": "proxmox_vms_total",
|
||||
"desc": "Total number of VMs and LXCs."
|
||||
},
|
||||
{
|
||||
"metric": "proxmox_vms_running",
|
||||
"desc": "Number of running guests."
|
||||
},
|
||||
{
|
||||
"metric": "proxmox_vms_stopped",
|
||||
"desc": "Number of stopped guests."
|
||||
},
|
||||
{
|
||||
"metric": "proxmox_vm_status",
|
||||
"desc": "Per-VM/CT status (1 = running, 0 = stopped). Labels <code>vmid</code>, <code>name</code>, <code>type</code>."
|
||||
},
|
||||
{
|
||||
"metric": "proxmox_vm_cpu_usage",
|
||||
"desc": "Per-VM/CT CPU usage. Same labels."
|
||||
},
|
||||
{
|
||||
"metric": "proxmox_vm_memory_used_bytes / _max_bytes",
|
||||
"desc": "Per-VM/CT memory used and configured maximum."
|
||||
}
|
||||
]
|
||||
},
|
||||
{
|
||||
"group": "GPU",
|
||||
"metrics": [
|
||||
{
|
||||
"metric": "proxmox_gpu_temperature_celsius",
|
||||
"desc": "GPU temperature. Labels <code>slot</code>, <code>vendor</code>."
|
||||
},
|
||||
{
|
||||
"metric": "proxmox_gpu_utilization_percent",
|
||||
"desc": "GPU utilization percentage."
|
||||
},
|
||||
{
|
||||
"metric": "proxmox_gpu_memory_total_bytes",
|
||||
"desc": "GPU memory total."
|
||||
},
|
||||
{
|
||||
"metric": "proxmox_gpu_power_draw_watts",
|
||||
"desc": "GPU power draw in watts."
|
||||
},
|
||||
{
|
||||
"metric": "proxmox_gpu_clock_speed_mhz",
|
||||
"desc": "GPU core clock speed."
|
||||
},
|
||||
{
|
||||
"metric": "proxmox_gpu_memory_clock_mhz",
|
||||
"desc": "GPU memory clock speed."
|
||||
}
|
||||
]
|
||||
},
|
||||
{
|
||||
"group": "UPS",
|
||||
"metrics": [
|
||||
{
|
||||
"metric": "proxmox_ups_battery_charge_percent",
|
||||
"desc": "Battery charge percentage."
|
||||
},
|
||||
{
|
||||
"metric": "proxmox_ups_load_percent",
|
||||
"desc": "Current load on the UPS."
|
||||
},
|
||||
{
|
||||
"metric": "proxmox_ups_runtime_seconds",
|
||||
"desc": "Estimated runtime on battery."
|
||||
},
|
||||
{
|
||||
"metric": "proxmox_ups_input_voltage_volts",
|
||||
"desc": "Input voltage."
|
||||
}
|
||||
]
|
||||
}
|
||||
],
|
||||
"scrapeTitle": "Prometheus scrape config",
|
||||
"scrapeIntro": "The endpoint requires authentication. Pass the API token as a bearer header in your Prometheus scrape config:",
|
||||
"perHostTitle": "Per-host scrape",
|
||||
"perHostBody": "Each ProxMenux Monitor instance exports metrics for the host it runs on. In a cluster, point Prometheus at every node — the <code>node</code> label on every series lets you distinguish them in Grafana queries (<code>proxmox_vms_running'{'node=\"pve01\"'}'</code>)."
|
||||
},
|
||||
"puttingItTogether": {
|
||||
"heading": "Putting it together",
|
||||
"body": "For end-to-end recipes wiring these endpoints into Home Assistant sensors, Homepage cards, Grafana dashboards, n8n flows and other tools, see the dedicated <link>Integrations</link> page — it walks through the typical setup for each platform with copy-paste configuration. This page stays focused on the catalogue itself."
|
||||
},
|
||||
"whereNext": {
|
||||
"heading": "Where to next",
|
||||
"items": [
|
||||
{
|
||||
"label": "Access & Authentication",
|
||||
"href": "/docs/monitor/access-auth",
|
||||
"tail": " — minting tokens, the audit log, the optional Fail2Ban jail, TLS configuration."
|
||||
},
|
||||
{
|
||||
"label": "Notifications",
|
||||
"href": "/docs/monitor/notifications",
|
||||
"tailRich": " — what each event type carries in <code>data</code> when you call <code>/api/notifications/send</code>."
|
||||
},
|
||||
{
|
||||
"label": "AI Assistant",
|
||||
"href": "/docs/monitor/ai-assistant",
|
||||
"tailRich": " — how <code>/api/notifications/test-ai</code> and <code>/api/notifications/provider-models</code> are wired."
|
||||
},
|
||||
{
|
||||
"label": "Health Monitor",
|
||||
"href": "/docs/monitor/health-monitor",
|
||||
"tailRich": " — the response shape of <code>/api/health/*</code> and the semantics of the ten categories."
|
||||
}
|
||||
]
|
||||
}
|
||||
}
|
||||
@@ -0,0 +1,303 @@
|
||||
{
|
||||
"meta": {
|
||||
"title": "ProxMenux Monitor Architecture — AppImage, Flask, SQLite, WebSocket | ProxMenux",
|
||||
"description": "How ProxMenux Monitor is built: AppImage layout, Flask blueprints, background workers, data sources (psutil, pvesh, smartctl, journalctl), SQLite persistence, WebSocket terminal, AI providers, notification channels, reverse proxy and optional Fail2Ban integration.",
|
||||
"ogTitle": "ProxMenux Monitor Architecture",
|
||||
"ogDescription": "Inside ProxMenux Monitor — AppImage layout, Flask blueprints, background workers, SQLite, WebSocket, AI providers, notification channels.",
|
||||
"twitterTitle": "ProxMenux Monitor Architecture",
|
||||
"twitterDescription": "AppImage, Flask, SQLite, WebSocket, AI providers and notification channels — inside the Monitor."
|
||||
},
|
||||
"header": {
|
||||
"title": "Architecture",
|
||||
"description": "How ProxMenux Monitor is packaged, what runs inside the AppImage, and how requests flow from the browser through the Flask backend to the host's tooling and SQLite store.",
|
||||
"section": "ProxMenux Monitor"
|
||||
},
|
||||
"intro": {
|
||||
"title": "One process, many responsibilities",
|
||||
"body": "A single Python process listens on TCP 8008. It serves the static Next.js build, exposes the REST API, handles the WebSocket terminal, runs the periodic Health Monitor, and dispatches notifications. There is no separate web server, no message broker, no external database."
|
||||
},
|
||||
"requestFlow": {
|
||||
"heading": "Request flow",
|
||||
"intro": "From the browser to the kernel, every dashboard view follows the same path:",
|
||||
"diagramCaption": "Each request is authenticated by JWT (when auth is enabled), dispatched to a blueprint, and answered with data collected on demand from host tooling. If Fail2Ban is installed and the proxmenux jail is active, the middleware also checks the request against the jail's banned IP list. The optional reverse proxy is transparent to Flask — it forwards X-Forwarded-* headers and the app recovers the real client IP from them. State that needs to outlive a request lives in SQLite.",
|
||||
"diagramArrowLabel": "HTTP / WS",
|
||||
"nodes": {
|
||||
"clientLabel": "Client",
|
||||
"clientDetail": "Browser or PWA\n+ optional\nNginx / Caddy /\nTraefik proxy",
"flaskLabel": "Flask :8008",
"flaskDetail": "Blueprints\nJWT middleware\nFail2Ban hook\n(if installed)",
"hostLabel": "Host tools",
"hostDetail": "psutil\npvesh\nsmartctl\njournalctl",
"stateLabel": "Local state",
"stateDetail": "SQLite DB\n+ auth.json"
},
"threadsIntro": "The same process also runs four <strong>background threads</strong> started at boot — they don't serve HTTP, they push state into SQLite or into the notification queue while the host is up:",
"headerThread": "Thread",
"headerCadence": "Cadence",
"headerJob": "Job",
"rows": [
{
"thread": "_temperature_collector_loop",
"cadence": "60 s",
"job": "Records CPU temperature and a network-latency sample into the history DB so the dashboard graphs have data even when no client is connected."
},
{
"thread": "_health_collector_loop",
"cadence": "5 min",
"job": "Runs the full Health Monitor cycle (10 categories), persists active errors, dismissals and disk observations, and feeds new events into the notification engine."
},
{
"thread": "_vital_signs_sampler",
"cadence": "~1 s",
"job": "High-frequency CPU + temperature sampler used for live widgets in the Overview panel."
},
{
"thread": "notification_manager.start()",
"cadence": "event-driven",
"job": "Spawns the journal / task / hook watchers (<code>JournalWatcher</code>, <code>TaskWatcher</code>, <code>ProxmoxHookWatcher</code>) and dispatches to configured channels with optional AI rewriting."
}
]
},
"systemd": {
"heading": "systemd unit",
"intro": "The installer drops a unit at <code>/etc/systemd/system/proxmenux-monitor.service</code>. Default content:",
"items": [
"<strong><code>User=root</code></strong> — required: SMART, <code>pvesh</code>, journal scopes, ZFS commands and the web terminal all need root.",
"<strong><code>Restart=on-failure</code></strong> with a 10-second back-off — non-zero exits relaunch automatically.",
"<strong><code>After=network.target</code></strong> — waits for the host network stack to be online."
],
"inspectTitle": "Inspect the live unit"
},
"appimage": {
"heading": "What the AppImage contains",
"intro": "The AppImage is a self-mounting filesystem. <code>AppRun</code> at the root sets up the environment and execs <code>flask_server.py</code>:",
"consequencesIntro": "Two consequences of this layout:",
"consequences": [
"<strong>No host Python pollution.</strong> The vendored interpreter and packages are isolated inside the AppImage — upgrading the host's system Python doesn't affect the Monitor and vice-versa.",
"<strong>Hardware tools are bundled too.</strong> <code>ipmitool</code>, <code>lm-sensors</code> and <code>upsc</code> ship inside the AppImage so the dashboard can read out-of-band sensors and UPS state without forcing the user to install Debian packages."
]
},
"flask": {
"heading": "Flask app structure",
"intro": "<code>flask_server.py</code> creates a single <code>Flask(__name__)</code> instance, enables CORS, and registers six blueprints plus a WebSocket initializer:",
"headerBlueprint": "Blueprint / module",
"headerPrefix": "Routes prefix",
"headerOwns": "Owns",
"rows": [
{
"blueprint": "flask_server.py",
"prefix": [
"/api/system",
"/api/storage",
"/api/network",
"/api/vms",
"/api/hardware",
"/api/logs",
"/api/prometheus"
],
"owns": "Core data endpoints + static dashboard serving + optional Fail2Ban app-level check (active only when Fail2Ban is installed on the host with the <code>proxmenux</code> jail)."
},
{
"blueprint": "flask_auth_routes.py",
"prefix": [
"/api/auth/*"
],
"owns": "Login, JWT issuing, TOTP setup/verify, password change, API token generation."
},
{
"blueprint": "flask_health_routes.py",
"prefix": [
"/api/health/*"
],
"owns": "Public health probe, detailed status, active / dismissed errors, suppression settings."
},
{
"blueprint": "flask_terminal_routes.py",
"prefix": [
"/api/terminal/* + WS"
],
"owns": "PTY allocation per session and WebSocket pipe to <code>xterm.js</code> in the browser."
},
{
"blueprint": "flask_notification_routes.py",
"prefix": [
"/api/notifications/*"
],
"owns": "Channel CRUD, test-send, AI provider config, history, manual sends."
},
{
"blueprint": "flask_security_routes.py",
"prefix": [
"/api/security/*"
],
"owns": "Authentication failures and, when Fail2Ban is installed, jail status, ban events and manual unban."
},
{
"blueprint": "flask_proxmenux_routes.py",
"prefix": [
"/api/proxmenux/*"
],
"owns": "Reads which ProxMenux post-install optimizations are installed on the host."
},
{
"blueprint": "flask_oci_routes.py",
"prefix": [
"/api/oci/*"
],
"owns": "OCI / container app deployment helpers (Proxmox VE 9.1+)."
}
],
"endpointsLink": "The full endpoint list with request / response shapes is in <link>API Reference</link>."
},
"dataSources": {
"heading": "Data sources",
"intro": "Nothing is collected from a custom agent — the Monitor reads the same files and runs the same commands a human admin would:",
"headerSource": "Source",
"headerUsedFor": "Used for",
"rows": [
{
"source": "psutil",
"usedFor": "CPU load, memory, swap, mountpoint usage, NIC counters, process list."
},
{
"source": "pvesh / qm / pct",
"usedFor": "Proxmox node info, VM and CT inventory and config, storage pools, task history."
},
{
"source": "smartctl",
"usedFor": "SATA / NVMe attributes, SMART health, wear / lifetime, model and serial."
},
{
"source": "zpool / zfs",
"usedFor": "Pool state (ONLINE / DEGRADED / FAULTED / UNAVAIL), scrub progress, dataset usage."
},
{
"source": "journalctl",
"usedFor": "System logs, OOM kills, ATA / NVMe / dm errors, security events, custom service units."
},
{
"source": "ip / iproute2",
"usedFor": "Interfaces, addresses, bridges, bonds, OVS-managed devices."
},
{
"source": "nvidia-smi · intel_gpu_top",
"usedFor": "GPU utilisation, VRAM, temperature, encoder / decoder load."
},
{
"source": "lspci · lscpu · dmidecode",
"usedFor": "PCIe topology, CPU model and topology, board and BIOS info."
},
{
"source": "ipmitool · sensors",
"usedFor": "Out-of-band sensors, fan speeds, board temperatures (when supported)."
},
{
"source": "upsc (NUT)",
"usedFor": "UPS battery state, load, runtime — when a NUT server is configured on the host."
}
],
"cacheTitle": "Output is cached — not every request hits the host",
"cacheBody": "Expensive sources (<code>smartctl -a</code>, <code>pvesh get</code>) are wrapped in time-bound caches inside the Flask process so a busy dashboard tab doesn't hammer the disk or the cluster API. The cache TTLs are tuned per source (a few seconds for live metrics, several minutes for SMART)."
},
"persistence": {
"heading": "Persistence",
"intro": "Two filesystem locations split state by sensitivity:",
"headerPath": "Path",
"headerOwner": "Owner",
"headerContents": "Contents",
"rows": [
{
"path": "/usr/local/share/proxmenux/health_monitor.db",
"owner": "root:root",
"contents": "SQLite DB. Tables: <code>errors</code>, <code>events</code>, <code>disk_registry</code>, <code>disk_observations</code>, <code>user_settings</code>, <code>notification_history</code>, <code>excluded_storages</code>, <code>excluded_interfaces</code>. WAL journal mode."
},
{
"path": "/usr/local/share/proxmenux/.notification_key",
"owner": "root <code>0600</code>",
"contents": "32-byte XOR key used to encrypt sensitive notification settings before storing them in the DB (Telegram tokens, AI API keys, etc.)."
},
{
"path": "/root/.config/proxmenux-monitor/auth.json",
"owner": "root:root",
"contents": "Authentication state: enabled flag, username, SHA-256 password hash, TOTP secret, backup codes, list of issued API tokens, list of revoked token hashes."
},
{
"path": "/var/log/proxmenux-auth.log",
"owner": "root:root",
"contents": "Plain-text auth event log. Always written. If Fail2Ban is installed with the <code>[proxmenux]</code> jail, the jail reads this file to ban brute-force attempts; if not, the file simply accumulates the log entries."
}
],
"backupTitle": "Back up auth.json before reinstalling",
"backupBody": "Reinstalling the AppImage replaces the binary but leaves <code>/root/.config/proxmenux-monitor/auth.json</code> and <code>/usr/local/share/proxmenux/health_monitor.db</code> intact. If you restore from a host backup, keep both files together — the API tokens stored in <code>auth.json</code> are validated against <code>JWT_SECRET</code>; if the DB and auth.json get out of sync, dismissed errors and stored tokens may misbehave."
},
"health": {
"heading": "Health Monitor cycle",
"intro": "Every 5 minutes <code>health_monitor.py</code> runs a deterministic cycle across the ten categories shown on the dashboard:",
"items": [
"Critical PVE services (<code>pveproxy</code>, <code>pvedaemon</code>, <code>pvestatd</code>, <code>pve-cluster</code>).",
"Proxmox storage pools (<code>pvesh get /storage</code> + per-storage availability).",
"Disks and filesystems: SMART, dmesg I/O errors, ZFS pool health, mountpoint capacity.",
"VMs and CTs: failed starts, crashed guests, QMP errors, shutdown failures.",
"Network: bridge / bond status, link state, latency to the gateway.",
"Updates: pending package upgrades and security patches.",
"Logs: persistent / spike / cascade pattern detection in the system journal.",
"Memory: OOM killer activity, sustained high pressure.",
"Temperature: CPU / chassis sensors against vendor thresholds.",
"Security: authentication failures, ban events, fail2ban jail status."
],
"afterIntro": "Each finding is normalised into a stable <code>error_key</code> + category + severity. The persistence layer deduplicates against the existing <code>errors</code> table — repeated events update <code>last_seen</code> and the occurrence counter without spamming notifications.",
"cycleEnd": "The cycle also auto-resolves stale errors using the per-category <em>Suppression Duration</em> setting, cleans up errors for resources that no longer exist (deleted VMs / removed disks / unmounted storages), and prunes the <code>events</code> log older than 30 days. The full catalogue of categories and the dashboard view that surfaces them is documented in <link>Dashboard → Health Monitor</link>."
},
"notifications": {
"heading": "Notification engine",
"intro": "<code>notification_manager.py</code> is the orchestrator. It loads the configured channels, owns the delivery queue, and exposes both a Python API (for Flask routes and the Health Monitor cycle) and a CLI entrypoint (for the <code>.sh</code> hook scripts shipped with ProxMenux).",
"items": [
"<strong>Watchers</strong> push events: <code>JournalWatcher</code> tails the system journal, <code>TaskWatcher</code> polls the Proxmox task list, <code>ProxmoxHookWatcher</code> reacts to backup / replication / snapshot hooks, and <code>PollingCollector</code> handles slow data sources.",
"<strong>Templates</strong> turn an event into a (title, body) pair. The same template can run through the configured AI provider (OpenAI / Anthropic / Gemini / Groq / Ollama / OpenRouter) to produce a plain-language rewrite; both versions are stored in <code>notification_history</code>.",
"<strong>Channels</strong> deliver messages: Telegram, Discord, Email, Gotify and Apprise (multi-channel). Each is implemented in <code>notification_channels.py</code> behind the same <code>create_channel()</code> / <code>send()</code> interface, so adding a new channel is a single class.",
"<strong>Encryption.</strong> Sensitive settings (<code>telegram.token</code>, <code>discord.webhook_url</code>, <code>ai_api_key_*</code>, <code>email.password</code>) are XOR-encrypted with the key in <code>.notification_key</code> before being written to the DB. Plaintext never touches disk."
],
"linksFooter": "Per-event toggles, channel overrides and AI configuration are surfaced in <notifLink>Settings → Notifications</notifLink> and <aiLink>Settings → AI Assistant</aiLink>."
},
"websocket": {
"heading": "WebSocket terminal",
"intro": "The <em>Terminal</em> tab in the dashboard is a thin <code>xterm.js</code> client wired to a server-side PTY through a WebSocket. Two transport modes:",
"items": [
"<strong>HTTP mode (default):</strong> Flask's development server with <code>flask-sock</code> handles upgrade requests. Good enough for LAN / direct access.",
"<strong>HTTPS / WSS mode:</strong> when an SSL certificate is configured, the process switches to <code>gevent.pywsgi.WSGIServer</code> with <code>geventwebsocket.handler.WebSocketHandler</code>, so WebSockets work over TLS without polyfills."
],
"outro": "The PTY is a child of the Flask process, so it inherits <code>User=root</code> from the unit. Every terminal request goes through JWT auth; the user must already be logged in to the dashboard before a PTY is allocated.",
"proxyNote": "If you access the Monitor through a reverse proxy, make sure WebSocket forwarding is enabled (the <code>Upgrade</code> and <code>Connection</code> headers). Without it the terminal won't work."
},
"proxy": {
"heading": "Reverse proxy & Fail2Ban",
"intro": "Two safeguards make sure security works the same way whether the dashboard is hit directly or through a reverse proxy:",
"items": [
"<strong>Real client IP recovery.</strong> A <code>before_request</code> hook reads <code>X-Forwarded-For</code> and <code>X-Real-IP</code> in that order, falling back to <code>request.remote_addr</code>. The recovered address is what auth logging and rate limiting see. This is always on.",
"<strong>Application-level Fail2Ban check (optional).</strong> When the dashboard sits behind a proxy, the kernel firewall can't block the real attacker IP — the connection always comes from the proxy. To plug that gap, the same hook above queries the <code>proxmenux</code> Fail2Ban jail every 30 seconds, caches the banned IP set, and short-circuits requests from those IPs with HTTP 403 inside Flask."
],
"calloutTitle": "Fail2Ban is not bundled",
"calloutBody": "Fail2Ban is <strong>not</strong> installed by ProxMenux Monitor itself. The application-level check is a no-op until you install Fail2Ban on the host (e.g. via <link>Security → Fail2Ban</link> in the ProxMenux menu). When the <code>fail2ban-client</code> binary or the <code>proxmenux</code> jail is absent, the call fails silently and requests are not gated — auth still applies, but no IP-level banning.",
"outro": "Reverse-proxy snippets (Nginx / Caddy / Traefik) and the Fail2Ban jail walkthrough are in <accessLink>Access & Authentication</accessLink> and <fail2banLink>Security → Fail2Ban</fail2banLink>."
},
"whereNext": {
"heading": "Where to next",
"items": [
{
"label": "Access & Authentication",
"href": "/docs/monitor/access-auth",
"tail": " — first-launch setup, password + TOTP 2FA, reverse-proxy snippets, Fail2Ban jail."
},
{
"label": "API Reference",
"href": "/docs/monitor/api",
"tail": " — every endpoint, token management, security best-practices."
},
{
"label": "Settings → ProxMenux Monitor",
"href": "/docs/settings/proxmenux-monitor",
"tail": " — the in-menu service toggle and status verification flow inside the ProxMenux TUI."
}
]
}
}
@@ -0,0 +1,246 @@
{
"meta": {
"title": "ProxMenux Monitor — Dashboard: Hardware tab | ProxMenux Documentation",
"description": "The Hardware tab inventories the physical machine: CPU and motherboard, memory modules, thermal sensors, GPUs (with per-slot real-time monitoring and one-click driver installer), Coral TPU accelerators, storage summary with link-speed checks, full PCI and USB device lists, power consumption, PSUs, fans and UPS state."
},
"header": {
"title": "Dashboard: Hardware tab",
"description": "The physical machine in one screen — CPU and motherboard identity, every memory module, thermal sensors across all subsystems, GPUs with live utilisation and a built-in driver installer, Coral TPUs, every PCI and USB device with its kernel driver, the full disk inventory with negotiated link speeds, plus power, cooling and the UPS.",
"section": "ProxMenux Monitor · Dashboard"
},
"intro": {
"title": "Built from standard tools",
"body": "Most of this tab is parsed from <code>lscpu</code>, <code>dmidecode</code>, <code>lspci</code>, <code>lsusb</code>, <code>lsblk</code>, <code>smartctl</code>, <code>nvme</code>, <code>sensors</code>, <code>nvidia-smi</code>, <code>intel_gpu_top</code>, <code>amdgpu_top</code>, <code>ipmitool</code> and <code>upsc</code>. Sections only render when the relevant tool returns data, so a host without a UPS won't show the UPS card and a host without IPMI won't show out-of-band power figures."
},
"thresholds": {
"title": "Status colours and thresholds applied here",
"intro": "Every temperature chip and reading on this tab follows the same classification — <green/> <strong>green</strong> below Warning, <amber/> <strong>amber</strong> from Warning to Critical, <red/> <strong>red</strong> at Critical and above. Recommended defaults shipped with ProxMenux:",
"items": [
"<strong>CPU temperature</strong> — Warning 80 °C, Critical 90 °C.",
"<strong>Disk temperature</strong> — HDD 60/65 °C · SSD 70/75 °C · NVMe 80/85 °C · SAS 55/65 °C (warning / critical)."
],
"outro": "Every value is configurable per host — <link>Settings → Health Monitor Thresholds</link> is the single source of truth and explains how to tune them."
},
"sections": {
"heading": "Sections",
"intro": "The tab renders top-to-bottom in this order. Some sections only appear when the host has the corresponding hardware or tool installed — they're marked <em>(conditional)</em> below.",
"systemInfoTitle": "System Information",
"systemInfoIntro": "Two side-by-side blocks, always present:",
"systemInfoItems": [
"<strong>CPU</strong> — model name, microarchitecture, sockets / cores / threads, base / boost frequency, virtualisation flags (VT-x / AMD-V), cache topology.",
"<strong>Motherboard</strong> — vendor, model, BIOS version, BIOS date, SMBIOS UUID. Useful for matching to vendor download pages when looking for firmware updates."
],
"memoryTitle": "Memory Modules",
"memoryBody": "One row per populated slot from <code>dmidecode</code>: slot label, module size, type (DDR4 / DDR5 / ECC variants), speed (configured and rated), manufacturer, part number and serial. Empty slots are listed greyed-out so you can see the upgrade headroom at a glance.",
"thermalTitle": "Thermal Monitoring",
"thermalIntro": "Five sub-blocks, each fed by <code>lm-sensors</code> + tool-specific scrapers. A block hides itself when no sensors are reported in that category.",
"thermalItems": [
"<strong>CPU</strong> — package and per-core temperatures.",
"<strong>GPU</strong> — discrete-GPU sensors via <code>nvidia-smi</code> / <code>amdgpu_top</code> / Intel iGPU. Includes hot-spot and memory-junction when the driver exposes them.",
"<strong>NVME</strong> — composite + per-sensor temperatures from <code>nvme</code>.",
"<strong>PCI</strong> — sensors that surface as PCI-attached devices (HBAs, network cards with internal sensors).",
"<strong>OTHER</strong> — chipset, VRM, ambient sensors that don't fit elsewhere."
]
},
"graphics": {
"heading": "Graphics Cards",
"intro": "Each detected video controller renders as its own card with vendor, model, kind (<em>Integrated</em> / <em>PCI</em> / BMC), PCI slot (BDF), kernel driver and module list. The card also exposes an inline <strong>Switch Mode</strong> control that flips the GPU between LXC sharing (native driver) and VM passthrough (<code>vfio-pci</code>) — see <link>Switch GPU Mode (VM ↔ LXC)</link> for what happens on the host when you press it.",
"vfioImageAlt": "Graphics Cards section showing a Matrox G200EH integrated GPU bound to mgag200 (Ready for LXC) and an NVIDIA Quadro P400 bound to vfio-pci (Ready for VM passthrough)",
"vfioImageCaption": "Two GPUs detected: the Matrox BMC chip is on the native driver and ready for LXC; the NVIDIA Quadro P400 is bound to <code>vfio-pci</code>, ready for VM passthrough.",
"lxcImageAlt": "Graphics Cards section showing an Intel UHD Graphics iGPU on i915 and an NVIDIA Quadro P1000 on the nvidia driver, both labelled Ready for LXC containers",
"lxcImageCaption": "Same node after switching the NVIDIA card back to the native driver — both GPUs now Ready for LXC containers.",
"realtimeTitle": "Real-time monitoring modal",
"realtimeBody": "Clicking a GPU card opens a per-slot monitoring modal that polls the appropriate vendor tool every three seconds. The modal exposes vendor, type, PCI slot, driver, kernel module(s), live engine utilisation (Render/3D, Video, Blitter, VideoEnhance), graphics & memory clocks, temperature, power draw (when reported), VRAM usage, and an Active Processes table with per-process engine load. Data is served from <code>/api/gpu/<slot>/realtime</code>.",
"toolsIntro": "The vendor tool used per GPU:",
"headerVendor": "Vendor",
"headerTool": "Tool",
"headerProject": "Project",
"tools": [
{
"vendor": "NVIDIA",
"tool": "nvidia-smi",
"projectLabel": "developer.nvidia.com",
"projectHref": "https://developer.nvidia.com/nvidia-system-management-interface"
},
{
"vendor": "Intel iGPU",
"tool": "intel_gpu_top (igt-gpu-tools)",
"projectLabel": "gitlab.freedesktop.org",
"projectHref": "https://gitlab.freedesktop.org/drm/igt-gpu-tools"
},
{
"vendor": "AMD",
"tool": "amdgpu_top",
"projectLabel": "github.com/Umio-Yasuno/amdgpu_top",
"projectHref": "https://github.com/Umio-Yasuno/amdgpu_top"
},
{
"vendor": "Matrox / ASPEED (BMC)",
"tool": "— (display only)",
"projectLabel": "Detected and labelled as BMC; no realtime block."
}
],
"nvidiaImageAlt": "GPU monitoring modal for an NVIDIA Quadro P1000: vendor NVIDIA, driver nvidia loaded, graphics clock 1.26 GHz, memory clock 2.50 GHz, temperature 50 °C, all engine utilisation bars at 0 %, no active processes, total memory 4096 MiB",
"nvidiaImageCaption": "NVIDIA Quadro P1000 with the proprietary driver loaded — clocks, temperature, engine bars and active processes all visible.",
"intelImageAlt": "GPU monitoring modal for an Intel UHD Graphics iGPU on i915 driver, showing 11.31 W power draw, 1 % video engine load and an ffmpeg process consuming 8 MB",
"intelImageCaption": "Intel iGPU with <code>i915</code> active. The Active Processes table picks up an ffmpeg job using the video engine.",
"amdImageAlt": "GPU monitoring modal for an AMD Lucienne integrated GPU on amdgpu driver, with engine utilisation bars at 0 % and amdgpu_top listed as an active process",
"amdImageCaption": "AMD iGPU monitored through <code>amdgpu_top</code> — the tool itself shows up as an active process because it's the live polling backend.",
"installTitle": "Installing the NVIDIA driver from the modal",
"installBody": "When an NVIDIA GPU is bound to <code>nouveau</code>/<code>nvidiafb</code> (no proprietary driver installed), the realtime block can't read clocks, power or per-process load. The modal then replaces the metrics with an <strong>Install NVIDIA Drivers</strong> button that wires straight into the same script documented at <link>Install NVIDIA Drivers (Host)</link>.",
"noDriverAlt": "GPU monitoring modal for an NVIDIA Quadro P620 with kernel modules nvidiafb and nouveau loaded, an Extended Monitoring Not Available callout and a blue Install NVIDIA Drivers button",
"noDriverCaption": "No proprietary driver installed yet — the modal shows a one-click installer.",
"promptAlt": "NVIDIA GPU Driver Installation confirmation dialog listing detected GPUs, LXC containers with NVIDIA passthrough and a Yes/Cancel pair",
"promptCaption": "Pre-install summary: detected GPUs, LXC containers that already have NVIDIA passthrough, and what the script will do. Nothing is touched until you confirm.",
"successAlt": "Terminal output showing the NVIDIA driver 580.105.08 installed successfully and nvidia-smi reporting a Quadro P620",
"successCaption": "Successful install — the NVIDIA <code>.run</code> built via DKMS, the persistence service is in place, and <code>nvidia-smi</code> reports the GPU.",
"warningTitle": "Pick a driver version your GPU actually supports",
"warningBody": "Newer NVIDIA driver branches drop support for older GPU families (e.g. Maxwell / Kepler). If the install finishes but <code>nvidia-smi</code> reports <em>\"No devices were found\"</em> or DKMS errors out, the chosen branch most likely doesn't cover your GPU — re-run the installer and pick an older branch (legacy 470.x for Kepler-era cards, etc.). NVIDIA publishes the per-GPU compatibility on the <a>official driver lookup page</a>.",
"whereGoIntro": "Where to go from here:",
"whereGoItems": [
"<link1>Install NVIDIA Drivers (Host)</link1> — full walk-through of the installer, kernel-compatibility matrix, optional NVENC patch and LXC propagation.",
"<link2>Switch GPU Mode (VM ↔ LXC)</link2> — what the inline <em>Switch Mode</em> control actually does.",
"<link3>Add GPU to VM (Passthrough)</link3> and <link4>Add GPU to LXC</link4> — first-time assignment of an unbound GPU."
]
},
"coral": {
"heading": "Coral TPU / AI Accelerators",
"subHeading": "(conditional)",
"intro": "Renders when the host has Google Coral or other AI-accelerator devices wired up. Each device opens a modal with its connection type (M.2 / mini-PCIe / USB), PCIe link width, vendor / product ID, kernel driver (<code>apex</code> for PCIe, <code>libedgetpu</code> for USB), kernel modules (<code>gasket</code> + <code>apex</code>), device nodes (<code>/dev/apex_*</code>), Edge TPU runtime status, live temperature and the firmware hardware-warning thresholds.",
"imageAlt": "Coral Edge TPU detail modal: PCIe / M.2 connection, PCIe 5.0 GT/s x1 link, vendor 1ac1:089a, kernel driver apex, gasket and apex modules loaded, /dev/apex_0 present, Edge TPU Runtime not installed, temperature 53.5 °C with hardware warning thresholds",
"imageCaption": "M.2 Coral with the host kernel modules loaded, the device node up and the firmware temperature warnings exposed. The runtime line goes green once the matching Edge TPU runtime is installed.",
"pathsIntro": "Two install paths exist depending on the form factor:",
"pathsItems": [
"<strong>M.2 / Mini-PCIe</strong> — the host needs the <code>gasket</code> + <code>apex</code> kernel modules built via DKMS so the device node <code>/dev/apex_0</code> appears at boot.",
"<strong>USB Accelerator</strong> — the host only needs the Edge TPU user-space runtime (<code>libedgetpu1-std</code>) from Google's APT repository."
],
"outro": "Both are handled by a single ProxMenux entry — <installLink>Install Coral TPU on the Host</installLink> — which auto-detects what you have. Background and the official runtime live at <a>coral.ai/docs</a>. Once the host side is ready, hand the device to a container with <lxcLink>Add Coral TPU to LXC</lxcLink>."
},
"storage": {
"heading": "Storage Summary",
"intro": "Every block device the kernel knows about, grouped by type. For each disk you get the kernel name (<code>sda</code>, <code>nvme0n1</code>, <code>zram0</code> …), the type tag (<em>SSD</em>, <em>HDD</em>, <em>NVMe SSD</em>), the model string and the negotiated link information. Click any disk to open a hardware-info modal with model, serial, capacity, interface and current vs maximum link speed.",
"imageAlt": "Storage Summary card listing eleven block devices (SATA SSDs, SATA HDDs, NVMe SSDs and zram) with model strings and negotiated link speeds; the two NVMe drives show 3.0 x4 with the current speed highlighted",
"imageCaption": "Eleven devices on this node. SATA links print as <em>SATA <version>, <Gb/s> (current: ...)</em>; NVMe drives print as <em><PCIe gen> x<width></em>.",
"nvmeBody": "For NVMe drives the per-card line shows both the negotiated link and the maximum the device supports. When the two don't match (e.g. a Gen3 x4 SSD running at <strong>3.0 x1</strong> because it's sitting in a chipset slot wired to a single lane), the current speed is rendered in amber so the downgrade is visible at a glance — useful when troubleshooting unexpectedly slow disks or after a BIOS update remaps the lanes.",
"nvmeModalAlt": "NVMe drive detail modal for nvme0n1: NVMe SSD type, 953.9 GB capacity, current link speed 3.0 x1 highlighted in amber, maximum link speed 3.0 x4, model WDC CL SN720, serial number, PCIe/NVMe interface",
"nvmeModalCaption": "NVMe modal showing the lane downgrade — drive supports x4 but the slot is wired x1.",
"outro": "SMART data, self-tests, history and the PDF disk report all live one tab over, in <storageLink>Dashboard: Storage tab</storageLink>. The same data feeds the script at <smartLink>SMART Disk Health & Test</smartLink> — running a long test from the script writes the JSON the Monitor displays in <em>Storage → History</em>."
},
"pci": {
"heading": "PCI Devices",
"intro": "Every PCI-addressable device, identified by its <strong>PCI BDF</strong> (Bus:Device.Function — e.g. <code>03:00.0</code>) and its device class (<em>Storage Controller</em>, <em>USB Controller</em>, <em>Graphics Card</em>, <em>Network Controller</em>, <em>Audio Controller</em> …). Each card shows the manufacturer, the device name and the <strong>kernel driver currently bound</strong> — which is the field you actually want when troubleshooting passthrough, IOMMU groups or a card the host isn't driving correctly.",
"imageAlt": "PCI Devices section listing fifteen devices grouped by class: storage controllers on ahci/nvme, USB controllers, graphics cards (one on vfio-pci, one on the native driver), network controllers on igb / tg3, an audio controller alongside a passed-through GPU",
|
||||
"imageCaption": "Fifteen devices on this node. Note the GPU and its companion audio function both bound to <code>vfio-pci</code> — that's a card prepared for VM passthrough.",
|
||||
"bdfTitle": "Reading the BDF",
|
||||
"bdfBody": "<code>03:00.0</code> means PCI bus <code>03</code>, device <code>00</code>, function <code>0</code>. Multifunction devices like discrete GPUs typically claim <code>.0</code> for the GPU and <code>.1</code> for the HDMI audio function — both have to be passed through together, which is why <link>Switch GPU Mode</link> also handles the orphan-audio cleanup when leaving VM mode."
|
||||
},
|
||||
"usb": {
|
||||
"heading": "USB Devices",
|
||||
"intro": "Every USB device the host enumerates, with manufacturer / product strings, USB version, the <code>bus:device</code> address, the <code>vendor:product</code> ID pair and the kernel driver. The renderer also classifies common roles — <em>Communications</em> (Z-Wave / Zigbee sticks), <em>UPS</em>, storage, HID — so you can spot at a glance which of your sticks is which without cross-referencing IDs.",
|
||||
"imageAlt": "USB Devices card listing three devices: an Aeotec Z-Wave Z-Stick, a ConBee II Zigbee coordinator and an Ellipse ECO UPS, each with USB version, address, vendor:product ID and bound driver",
|
||||
"imageCaption": "Three USB devices — two home-automation radios on <code>usbfs</code> and a UPS on <code>usbfs</code> (NUT talks to it through libusb)."
|
||||
},
|
||||
"power": {
|
||||
"heading": "Power Consumption",
|
||||
"subHeading": "(conditional)",
|
||||
"intro": "Renders only when the host exposes power telemetry. Two independent sources are surfaced when available:",
|
||||
"items": [
|
||||
"<strong>ACPI / IPMI total draw</strong> — whole-system wattage from a board-level sensor or the BMC. Typical on server boards.",
|
||||
"<strong>CPU package power</strong> — read from the Intel RAPL counters (or AMD equivalent). Useful to separate CPU draw from the rest of the system on consumer boards that don't expose a total figure."
|
||||
],
|
||||
"supplyImageAlt": "Power Consumption section showing 198 W total draw via ACPI interface, plus a Power Supplies card with two PSUs both reporting OK (185 W and 5 W output)",
|
||||
"supplyImageCaption": "Server board with a single ACPI power sensor and dual PSUs reported through IPMI — the second PSU is the redundant one, idling at 5 W.",
|
||||
"cpuImageAlt": "Power Consumption section on a consumer board showing only CPU Power 8.7 W via Intel RAPL",
|
||||
"cpuImageCaption": "Consumer board with no whole-system sensor — the section falls back to RAPL CPU-only."
|
||||
},
|
||||
"psu": {
|
||||
"heading": "Power Supplies",
|
||||
"subHeading": "(conditional)",
|
||||
"body": "Server-board / dual-PSU machines via IPMI: presence (PSU 1 / PSU 2 / …), input voltage, output wattage, OK / failed flag. The first thing you check after a power blip on a node with redundant PSUs."
|
||||
},
|
||||
"fans": {
|
||||
"heading": "System Fans",
|
||||
"subHeading": "(conditional)",
|
||||
"body": "Per-fan RPM with a small sparkline (when supported). On boards without per-fan reporting the section falls back to a single chassis-fan reading."
|
||||
},
|
||||
"ups": {
|
||||
"heading": "UPS Status",
|
||||
"subHeading": "(conditional)",
|
||||
"body": "Renders when a NUT (Network UPS Tools) server is configured and reachable. Shows: state (online / on battery / charging / low battery), battery charge percentage, runtime estimate, load percentage, input voltage, model and firmware. The same data feeds the <em>Security & Certificates</em> category of the Health Monitor — a UPS that goes on-battery surfaces immediately."
|
||||
},
|
||||
"dataCollected": {
|
||||
"heading": "How the data is collected",
|
||||
"headerSection": "Section of the tab",
|
||||
"headerEndpoint": "Endpoint",
|
||||
"headerSource": "Source",
|
||||
"rows": [
|
||||
{
|
||||
"section": "Static inventory (PCI, CPU, BIOS)",
|
||||
"endpoint": "/api/hardware",
|
||||
"source": "<code>lspci -vmm</code>, <code>/proc/cpuinfo</code>, <code>dmidecode</code>; cached for the lifetime of the process."
|
||||
},
|
||||
{
|
||||
"section": "Live sensor values",
|
||||
"endpoint": "/api/hardware/live",
|
||||
"source": "<code>sensors</code> (lm-sensors), package temperatures, fan RPM. Refreshed each request."
|
||||
},
|
||||
{
|
||||
"section": "CPU temperature history",
|
||||
"endpoint": "/api/temperature/history",
|
||||
"source": "Time series sampled by the Health Monitor every 5 min and persisted to SQLite."
|
||||
},
|
||||
{
|
||||
"section": "GPU live metrics",
|
||||
"endpoint": "/api/gpu/<slot>/realtime",
|
||||
"source": "NVIDIA: <code>nvidia-smi --query-gpu=...</code>. Intel: <code>intel_gpu_top</code>. AMD: sysfs <code>/sys/class/drm/cardN</code>."
|
||||
}
|
||||
],
|
||||
"codeComment1": "# Cross-check inventory against the OS view",
|
||||
"codeComment2": "# Confirm the GPU card the dashboard sees"
|
||||
},
|
||||
"whereNext": {
|
||||
"heading": "Where to next",
|
||||
"items": [
|
||||
{
|
||||
"label": "Install NVIDIA Drivers (Host)",
|
||||
"href": "/docs/hardware/nvidia-host",
|
||||
"tail": " — what the GPU modal's install button runs."
|
||||
},
|
||||
{
|
||||
"label": "Switch GPU Mode (VM ↔ LXC)",
|
||||
"href": "/docs/hardware/switch-gpu-mode",
|
||||
"tail": " — what the inline mode switch on each GPU card does to the host."
|
||||
},
|
||||
{
|
||||
"label": "Install Coral TPU on the Host",
|
||||
"href": "/docs/hardware/install-coral-tpu-host",
|
||||
"tail": " — the Coral kernel module / runtime install."
|
||||
},
|
||||
{
|
||||
"label": "SMART Disk Health & Test",
|
||||
"href": "/docs/disk-manager/smart-disk-test",
|
||||
"tail": " — the script behind the SMART data shown in the Storage tab's disk drill-in."
|
||||
},
|
||||
{
|
||||
"label": "Dashboard: Storage tab",
|
||||
"href": "/docs/monitor/dashboard/storage",
|
||||
"tail": " — full SMART attribute table, self-test history and PDF report."
|
||||
},
|
||||
{
|
||||
"label": "Health Monitor",
|
||||
"href": "/docs/monitor/health-monitor",
|
||||
"tail": " — the CPU & Temperature category that consumes the same sensors."
|
||||
},
|
||||
{
|
||||
"label": "API Reference",
|
||||
"href": "/docs/monitor/api",
|
||||
"tail": " — the hardware and GPU endpoints."
|
||||
},
|
||||
{
|
||||
"label": "Dashboard index",
|
||||
"href": "/docs/monitor/dashboard",
|
||||
"tail": " — the other tabs."
|
||||
}
|
||||
]
|
||||
}
|
||||
}
|
||||
@@ -0,0 +1,91 @@
|
||||
{
|
||||
"meta": {
|
||||
"title": "ProxMenux Monitor — Dashboard | ProxMenux Documentation",
|
||||
"description": "The dashboard is the main UI of ProxMenux Monitor: nine tabs (System Overview, Storage, Network, VMs & LXCs, Hardware, System Logs, Terminal, Security, Settings) plus the global header with the Health Monitor status pill."
|
||||
},
|
||||
"header": {
|
||||
"title": "Dashboard",
|
||||
"description": "The dashboard is the everyday view of ProxMenux Monitor — nine tabs each focused on one slice of the host plus a global header with the Health Monitor status pill, the node identity and the quick-refresh control.",
|
||||
"section": "ProxMenux Monitor"
|
||||
},
|
||||
"oneHeader": {
|
||||
"title": "One header, nine tabs",
|
||||
"body": "The header (logo, node name, status pill, uptime, refresh, theme toggle) stays visible everywhere. The active tab below it changes the entire content area. The status pill colour mirrors the worst category of the <link>Health Monitor</link> — it's the same data point seen from the dashboard."
|
||||
},
|
||||
"tabs": {
|
||||
"heading": "The nine tabs",
|
||||
"intro": "Each tab has its own dedicated page. Pages are added incrementally as the documentation is filled in; below is the full list and what each one is responsible for.",
|
||||
"headerTab": "Tab",
|
||||
"headerOwns": "What it owns",
|
||||
"rows": [
|
||||
{
|
||||
"name": "System Overview",
|
||||
"linksTo": "/docs/monitor/dashboard/system-overview",
|
||||
"owns": "CPU / memory / temperature widgets, active VM & LXC count, historical metrics charts, storage and network summaries. Default landing tab."
|
||||
},
|
||||
{
|
||||
"name": "Storage",
|
||||
"owns": "Proxmox pools, physical disks, SMART data, ZFS state, wear & lifetime, observation history."
|
||||
},
|
||||
{
|
||||
"name": "Network",
|
||||
"owns": "Every interface (physical / bond / bridge / OVS), IP/MAC, RX/TX graphs, historical RRD per interface."
|
||||
},
|
||||
{
|
||||
"name": "VMs & LXCs",
|
||||
"owns": "Inventory of guests, drill-in for config / metrics / logs, start / stop / reboot / shutdown actions."
|
||||
},
|
||||
{
|
||||
"name": "Hardware",
|
||||
"owns": "CPU model and topology, memory layout, PCIe topology, GPUs with per-slot real-time monitoring."
|
||||
},
|
||||
{
|
||||
"name": "System Logs",
|
||||
"owns": "Live <code>journalctl</code> with filters, Proxmox task history, notification log, downloadable log bundles."
|
||||
},
|
||||
{
|
||||
"name": "Terminal",
|
||||
"owns": "Browser shell to host or to any VM/CT, powered by <code>xterm.js</code> over WebSockets."
|
||||
},
|
||||
{
|
||||
"name": "Security",
|
||||
"owns": "Auth setup, password / 2FA / API tokens, audit log, optional Fail2Ban panel, Secure Gateway deployment."
|
||||
},
|
||||
{
|
||||
"name": "Settings",
|
||||
"owns": "Notification channels, AI provider, suppression durations, branding, advanced flags."
|
||||
}
|
||||
]
|
||||
},
|
||||
"headerAnatomy": {
|
||||
"heading": "Header anatomy",
|
||||
"items": [
|
||||
"<strong>Logo + product name</strong> on the left. The logo turns into an \"update available\" variant when a newer Monitor release is detected.",
|
||||
"<strong>Node identity</strong> — the Proxmox node name resolved from <code>pvesh get /nodes</code>, falling back to <code>hostname</code>.",
|
||||
"<strong>Health status pill</strong> — Healthy (green), Warning (yellow), Critical (red). Click it to open the Health Monitor modal. An extra blue <em>info</em> badge appears when there are dismissed items still inside their suppression window.",
|
||||
"<strong>Uptime</strong> — host uptime, formatted human-readable.",
|
||||
"<strong>Refresh button</strong> — re-issues all the live API calls without a full page reload.",
|
||||
"<strong>Theme toggle</strong> — light / dark / system. Persisted in <code>localStorage</code>."
|
||||
]
|
||||
},
|
||||
"whereNext": {
|
||||
"heading": "Where to next",
|
||||
"items": [
|
||||
{
|
||||
"label": "System Overview tab",
|
||||
"href": "/docs/monitor/dashboard/system-overview",
|
||||
"tail": " — the landing tab, fully documented."
|
||||
},
|
||||
{
|
||||
"label": "Health Monitor",
|
||||
"href": "/docs/monitor/health-monitor",
|
||||
"tail": " — the modal behind the header status pill, ten categories deep-dive."
|
||||
},
|
||||
{
|
||||
"label": "Architecture",
|
||||
"href": "/docs/monitor/architecture",
|
||||
"tail": " — how the dashboard talks to the Flask backend."
|
||||
}
|
||||
]
|
||||
}
|
||||
}
|
||||
@@ -0,0 +1,225 @@
|
||||
{
|
||||
"meta": {
|
||||
"title": "ProxMenux Monitor — Dashboard: Network tab | ProxMenux Documentation",
|
||||
"description": "The Network tab inventories every interface on the host: physical NICs, bridges, bonds, VLANs and the VM/LXC virtual interfaces. Per-interface drill-in shows IP / MAC / state / bond members / traffic since boot and a historical RRD chart."
|
||||
},
|
||||
"header": {
|
||||
"title": "Dashboard: Network tab",
|
||||
"description": "Every interface the kernel reports — physical NICs, bridges, bonds, VLANs and VM/LXC virtual ports — grouped into three cards. Each row is clickable and opens a drill-in with addressing, traffic counters and historical RRD data.",
|
||||
"section": "ProxMenux Monitor · Dashboard"
|
||||
},
|
||||
"intro": {
|
||||
"title": "Live + historical, both included",
|
||||
"body": "Live state comes from <code>psutil.net_if_stats()</code> and <code>ip</code>; historical bandwidth from Proxmox's RRD store via <code>/api/network/<interface>/metrics</code>. The page refreshes every ~5 seconds for live counters and pulls fresh RRD data on demand for the chart."
|
||||
},
|
||||
"topRow": {
|
||||
"heading": "Top row: four stat cards",
|
||||
"headerCard": "Card",
|
||||
"headerWhat": "What it shows",
|
||||
"rows": [
|
||||
{
|
||||
"card": "Network Traffic",
|
||||
"what": "Aggregate RX / TX rate across all interfaces, formatted in the right unit (bps / Kbps / Mbps / Gbps)."
|
||||
},
|
||||
{
|
||||
"card": "Active Interfaces",
|
||||
"what": "Two counters: <em>Physical X / Y</em> and <em>Bridges X / Y</em> (active over total). The first counter you watch when something stops working."
|
||||
},
|
||||
{
|
||||
"card": "Network Status",
|
||||
"what": "Quick verdict — Healthy / Warning / Critical based on link state, gateway reachability and bridge integrity. Mirrors the <em>Network Interfaces</em> category of the Health Monitor."
|
||||
},
|
||||
{
|
||||
"card": "Network Latency",
|
||||
"what": "Round-trip time to the gateway with a sparkline. Clicking the card opens the <strong>Network Latency modal</strong> documented further down — historical view + on-demand ping test against gateway / Cloudflare / Google with a downloadable PDF report."
|
||||
}
|
||||
]
|
||||
},
|
||||
"groups": {
|
||||
"heading": "Three interface groups",
|
||||
"intro": "Below the top row, three cards split the inventory by role. Each card has its own active-count badge in the header. Interface <strong>type</strong> is identified at a glance by a coloured badge on every row:",
|
||||
"badges": [
|
||||
"Blue <strong>Physical</strong> — real NIC.",
|
||||
"Green <strong>Bridge</strong> — Linux bridge or OVS bridge (<code>vmbr*</code>).",
|
||||
"Purple <strong>Bond</strong> — bond / LACP / active-backup aggregator.",
|
||||
"Cyan <strong>VLAN</strong> — VLAN sub-interface (<code>vmbr0.10</code>, <code>eno1.42</code>, …)."
|
||||
],
|
||||
"clickable": "<strong>Every row is clickable</strong> — physical, virtual, bridge or bond — and opens the per-interface drill-in described further down (basic info, IPs, traffic counters, historical RX/TX chart from Proxmox's RRD store).",
|
||||
"physicalTitle": "Physical Interfaces",
|
||||
"physicalBody": "Every NIC the kernel sees as a real device — <code>eno1</code>, <code>enp4s0</code>, <code>eth0</code>, <code>wlp3s0</code>, etc. One row per device with the blue <strong>Physical</strong> badge and the link state. <em>Bond members</em> (NICs enslaved to a bond) are shown here too, with a hint pointing to the parent bond.",
|
||||
"bridgeTitle": "Bridge Interfaces",
|
||||
"bridgeBody": "Linux bridges (<code>vmbr0</code>, <code>vmbr1</code>, …) and the OVS bridges Proxmox manages. Each row shows the green <strong>Bridge</strong> badge, the underlying physical interface (when it's a single-port bridge), and the bridge state. Bonds visible at this layer get the purple <strong>Bond</strong> badge; VLAN sub-interfaces get the cyan <strong>VLAN</strong> badge.",
|
||||
"vmTitle": "VM / LXC Interfaces",
|
||||
"vmBody": "The <code>tap*</code> and <code>veth*</code> interfaces created when guests start — one per virtual NIC. The card header shows <em>X / Y Active</em>; rows are linked to the VM/CT they belong to so you can jump directly to the guest in the VMs & LXCs tab. Inactive entries hang around briefly after a guest stops; they age out within a refresh cycle."
|
||||
},
|
||||
"drillIn": {
|
||||
"heading": "Per-interface drill-in",
|
||||
"intro": "Clicking any row opens a modal with five blocks:",
|
||||
"headerBlock": "Block",
|
||||
"headerContents": "Contents",
|
||||
"rows": [
|
||||
{
|
||||
"block": "Basic Information",
|
||||
"contents": "Interface name, type (physical / bridge / bond / VLAN / vm), MAC address, state (up / down), MTU, and the underlying physical interface for non-physical types."
|
||||
},
|
||||
{
|
||||
"block": "Bond Members",
|
||||
"contents": "Only for bonds. Lists each enslaved NIC with the active / failed flag, the bond mode (active-backup / 802.3ad / balance-alb / …) and the primary interface when configured."
|
||||
},
|
||||
{
|
||||
"block": "IP Addresses",
|
||||
"contents": "Every IPv4 / IPv6 address with the prefix length. Auto-configured link-local addresses are listed but greyed out."
|
||||
},
|
||||
{
|
||||
"block": "Historical chart",
|
||||
"contents": "RX / TX bandwidth over the selected timeframe (1 hour / 24 hours / 7 days / 30 days / 1 year), pulled from <code>/api/network/<interface>/metrics</code> (Proxmox RRD)."
|
||||
},
|
||||
{
|
||||
"block": "Traffic since last boot",
|
||||
"contents": "Total RX / TX bytes and packets since the host last booted, plus error and drop counters."
|
||||
}
|
||||
],
|
||||
"inactiveTitle": "Inactive interfaces still open the drill-in",
|
||||
"inactiveBody": "For an interface that is <em>down</em>, the modal renders a small \"Interface Inactive\" banner and skips the live counters. Configuration (addresses, bond members) and historical data are still shown — useful when diagnosing why something failed and when."
|
||||
},
|
||||
"latency": {
|
||||
"heading": "Network Latency modal",
|
||||
"intro": "Clicking the <em>Network Latency</em> card in the top row opens a dedicated modal. It has two modes (historical and on-demand test), three target options and a downloadable PDF report.",
|
||||
"targetsTitle": "Targets",
|
||||
"targetsIntro": "A target dropdown at the top of the modal selects what gets pinged:",
|
||||
"targets": [
|
||||
"<strong>Gateway</strong> — your LAN router. Tests the local-network leg only; useful when you suspect a switch / cabling issue and want to rule out the WAN.",
|
||||
"<strong>Cloudflare (1.1.1.1)</strong> — public DNS resolver, anycast network. Tests the WAN leg.",
|
||||
"<strong>Google (8.8.8.8)</strong> — alternative public target, useful as a sanity check or when Cloudflare is regionally degraded."
|
||||
],
|
||||
"mode1Title": "Mode 1 — Historical view",
|
||||
"mode1Alt": "Network Latency modal in historical mode — Gateway target with a 1-hour timeframe and the past samples plotted",
|
||||
"mode1Caption": "Historical view — Gateway target over the last hour, fed from the latency-history database written every 60 seconds by the temperature/latency collector thread.",
|
||||
"mode1Body1": "The default mode when the modal opens. A second dropdown picks the timeframe (<em>1 Hour / 24 Hours / 7 Days / 30 Days / 1 Year</em>); data resolution drops with the window so the chart stays readable. Headline stats — Current / Min / Avg / Max — render above the chart, with a status pill (Excellent / Good / Fair / Poor) reflecting the current value against the thresholds below.",
|
||||
"mode1Body2": "Source: the same latency samples the Health Monitor uses to detect persistent network slowdowns — sampled every 60 seconds against the gateway by the background <code>_temperature_collector_loop</code> thread, written to a local SQLite history.",
|
||||
"mode2Title": "Mode 2 — Real-time test",
|
||||
"mode2Alt": "Network Latency modal running a real-time ping test against Cloudflare — progress bar at 50%, live samples accumulating on the chart",
|
||||
"mode2Caption": "Real-time test against Cloudflare — 2-minute run with one reading every 5 seconds, samples plotted as they arrive. Click <em>Stop</em> to end early; <em>Test Again</em> appends more samples to the same dataset.",
|
||||
"mode2Intro": "Switching the target to Cloudflare or Google starts an on-demand ping test. Behaviour:",
|
||||
"mode2Items": [
|
||||
"<strong>Duration</strong> — 2 minutes per run, with a progress bar and a remaining-seconds counter.",
|
||||
"<strong>Cadence</strong> — one reading every 5 seconds (24 readings per run).",
|
||||
"<strong>Method</strong> — ICMP Echo Request (<code>ping</code>), 3 consecutive pings per sample, latency averaged.",
|
||||
"<strong>Stop</strong> — ends the test immediately; partial data is preserved.",
|
||||
"<strong>Test Again</strong> — appends new samples to the existing dataset rather than starting fresh, so you can build a longer record across several runs.",
|
||||
"<strong>Live status pill</strong> — re-evaluates after every sample using the same Excellent / Good / Fair / Poor thresholds."
|
||||
],
|
||||
"thresholdsTitle": "Performance thresholds",
|
||||
"headerStatus": "Status",
|
||||
"headerRange": "Range",
|
||||
"headerImpact": "Practical impact",
|
||||
"thresholdRows": [
|
||||
{
|
||||
"status": "Excellent",
|
||||
"range": "< 50 ms",
|
||||
"impact": "Optimal for real-time apps, gaming and video calls."
|
||||
},
|
||||
{
|
||||
"status": "Good",
|
||||
"range": "50 – 100 ms",
|
||||
"impact": "Acceptable for most applications with minimal impact."
|
||||
},
|
||||
{
|
||||
"status": "Fair",
|
||||
"range": "100 – 200 ms",
|
||||
"impact": "Noticeable delay. May affect VoIP and interactive apps."
|
||||
},
|
||||
{
|
||||
"status": "Poor",
|
||||
"range": "> 200 ms",
|
||||
"impact": "Significant latency. Investigation recommended."
|
||||
}
|
||||
],
|
||||
"reportTitle": "Network Latency Report (PDF)",
|
||||
"reportIntro": "Both modes have a <strong>Report</strong> button next to the target selector. Clicking it generates a PDF with everything you'd send to your ISP if you wanted to make a case for poor service.",
|
||||
"reportPreviewAlt": "First page of the Network Latency Report PDF — Executive Summary with the gauge, latency statistics, the latency graph and the threshold guide",
|
||||
"reportPreviewCaption": "First page of the Network Latency Report — Executive Summary with the gauge dial and headline stats, the per-second latency graph, and the threshold guide. Page 2 onwards has the per-sample table and methodology.",
|
||||
"downloadLabel": "Download sample Network Latency report (PDF)",
|
||||
"sectionsIntro": "The report has six sections:",
|
||||
"sections": [
|
||||
"<strong>Executive Summary</strong> — gauge dial (0–300+ ms with green / yellow / red zones), the status verdict (EXCELLENT / GOOD / FAIR / POOR), the target / mode / sample count and a one-line packet-loss summary.",
|
||||
"<strong>Latency Statistics</strong> — Current / Min / Avg / Max as four large tiles, plus Sample Count, Packet Loss (avg) and Test Period.",
|
||||
"<strong>Latency Graph</strong> — area chart of every sample over the test window with a min/avg/max y-axis grid.",
|
||||
"<strong>Performance Thresholds</strong> — the same four-tier scale documented above, with a coloured dot per tier.",
|
||||
"<strong>Detailed Test Results</strong> — numbered table with timestamp, latency, packet loss and status for every sample. Useful for spotting micro-bursts that the headline averages hide.",
|
||||
"<strong>Methodology</strong> — test method (ICMP Echo Request), samples per test (3 consecutive pings), target name, target IP and a final \"Performance Assessment\" paragraph derived from the verdict."
|
||||
],
|
||||
"useCaseTitle": "Use case: claims to your ISP",
|
||||
"useCaseBody": "Run the real-time test against Cloudflare for 2 minutes during a moment of perceived slowness, then click <em>Test Again</em> a few times to extend the dataset, and finally <em>Report</em>. The PDF carries the full per-sample table plus the methodology block — ISPs typically accept this as evidence, especially when correlated with timestamps from a separate complaint."
|
||||
},
|
||||
"excluding": {
|
||||
"heading": "Excluding noisy interfaces",
|
||||
"body1": "Like storages, individual interfaces can be excluded from health monitoring — useful for intentionally disabled bridges, test interfaces or NICs that are physically removed but still in the config. The flag is stored in the <code>excluded_interfaces</code> table and respected by the Health Monitor cycle: no warnings, no notifications, no contribution to the header status pill.",
|
||||
"body2": "From the row's context menu, pick <em>Exclude from monitoring</em>. The interface stays visible in the dashboard with a purple <strong>excluded</strong> badge, and you can re-enable monitoring from the same menu."
|
||||
},
|
||||
"dataCollected": {
|
||||
"heading": "How the data is collected",
|
||||
"headerSection": "Section of the tab",
|
||||
"headerEndpoint": "Endpoint",
|
||||
"headerSource": "Source",
|
||||
"rows": [
|
||||
{
|
||||
"section": "Interface inventory",
|
||||
"endpoint": "/api/network",
|
||||
"source": "<code>ip -j addr</code> + <code>ip -j link</code> + bond / bridge introspection."
|
||||
},
|
||||
{
|
||||
"section": "Summary cards",
|
||||
"endpoint": "/api/network/summary",
|
||||
"source": "Aggregation over the inventory plus per-interface up/down counts."
|
||||
},
|
||||
{
|
||||
"section": "Per-interface RX/TX time series",
|
||||
"endpoint": "/api/network/<iface>/metrics",
|
||||
"source": "<code>/proc/net/dev</code> sampled by the Health Monitor with byte-rate calculation."
|
||||
},
|
||||
{
|
||||
"section": "Latency: current",
|
||||
"endpoint": "/api/network/latency/current",
|
||||
"source": "A short <code>ping</code> burst against the configured gateway / target."
|
||||
},
|
||||
{
|
||||
"section": "Latency: historical",
|
||||
"endpoint": "/api/network/latency/history",
|
||||
"source": "Persisted samples — every 5 min by the Health Monitor cycle."
|
||||
}
|
||||
],
|
||||
"codeComment1": "# Cross-check the interface state seen by the dashboard",
|
||||
"codeComment2": "# Verify a current latency probe end-to-end"
|
||||
},
|
||||
"whereNext": {
|
||||
"heading": "Where to next",
|
||||
"items": [
|
||||
{
|
||||
"label": "Health Monitor",
|
||||
"href": "/docs/monitor/health-monitor",
|
||||
"tail": " — the Network category and the latency-history thresholds."
|
||||
},
|
||||
{
|
||||
"label": "API Reference",
|
||||
"href": "/docs/monitor/api",
|
||||
"tail": " — the network and latency endpoints."
|
||||
},
|
||||
{
|
||||
"label": "Integrations",
|
||||
"href": "/docs/monitor/integrations",
|
||||
"tail": " — Prometheus scrape exposes the same network metrics."
|
||||
},
|
||||
{
|
||||
"label": "Dashboard index",
|
||||
"href": "/docs/monitor/dashboard",
|
||||
"tail": " — the other tabs."
|
||||
},
|
||||
{
|
||||
"label": "ProxMenux → Network",
|
||||
"href": "/docs/network",
|
||||
"tail": " — the actions side: bridge analysis, persistent interface names, backup & restart, iperf3."
|
||||
}
|
||||
]
|
||||
}
|
||||
}
|
||||
@@ -0,0 +1,246 @@
|
||||
{
|
||||
"meta": {
|
||||
"title": "ProxMenux Monitor — Dashboard: Security tab | ProxMenux Documentation",
|
||||
"description": "The Security tab groups every protection-related control in two columns: ProxMenux Monitor (Authentication, SSL/HTTPS, API tokens, Secure Gateway) and Proxmox VE (Firewall, Fail2Ban, Lynis audit). Step-by-step Secure Gateway wizard, Lynis audit walkthrough, Fail2Ban install and rule tuning."
|
||||
},
|
||||
"header": {
|
||||
"title": "Dashboard: Security tab",
|
||||
"description": "Every security control in the dashboard, grouped into two clearly-labelled blocks: configuration of the Monitor itself (auth, SSL, tokens, Secure Gateway) and configuration of the Proxmox host it watches (firewall, Fail2Ban, Lynis).",
|
||||
"section": "ProxMenux Monitor · Dashboard"
|
||||
},
|
||||
"intro": {
|
||||
"title": "Two scopes, one tab",
|
||||
"body": "The Security tab is divided into two clearly separated sections: <strong>ProxMenux Monitor</strong> (settings for the dashboard itself) and <strong>Proxmox VE</strong> (settings for the host underneath). Cards render conditionally — Fail2Ban and Lynis only appear once installed."
|
||||
},
|
||||
"monitor": {
|
||||
"heading": "ProxMenux Monitor",
|
||||
"intro": "Four cards control how the dashboard itself is reached and authenticated."
|
||||
},
|
||||
"auth": {
|
||||
"heading": "Authentication",
|
||||
"imageAlt": "Authentication card showing status Enabled, Logout, Change Password, Two-Factor Authentication info, Enable Two-Factor Authentication and Disable Authentication buttons",
|
||||
"imageCaption": "Authentication card with auth enabled — status badge, Change Password, Enable 2FA and the (destructive) Disable Authentication action.",
|
||||
"intro": "The card lets you manage the dashboard's own login. The full first-launch flow, password policy, TOTP enrolment screens and lost-authenticator recovery are documented in <link>Access & Authentication</link> — this card is the day-to-day surface for those settings:",
|
||||
"items": [
|
||||
"<strong>Authentication Status</strong> — badge showing <em>Enabled</em> / <em>Disabled</em> / <em>Declined</em>.",
|
||||
"<strong>Change Password</strong> — current password + new password + confirmation.",
|
||||
"<strong>Enable / Disable Two-Factor Authentication</strong> — opens the QR enrolment dialog when enabling, asks for the current password when disabling.",
|
||||
"<strong>Disable Authentication</strong> — destructive action that re-shows the first-launch <em>Protect your dashboard?</em> dialog on next visit."
|
||||
]
|
||||
},
|
||||
"ssl": {
|
||||
"heading": "SSL / HTTPS",
|
||||
"imageAlt": "SSL / HTTPS card showing HTTP No SSL status, detected Proxmox host certificate with Subject, Issuer, Expires, Use Proxmox Certificate button and Use Custom Certificate option",
|
||||
"imageCaption": "SSL / HTTPS card with HTTPS off. The Monitor detects the certificate already installed on the Proxmox host and offers it as a one-click option, with a fallback to <em>Use Custom Certificate</em> if you have your own files elsewhere.",
|
||||
"intro": "Serves the dashboard over HTTPS without any reverse proxy in front. The card auto-detects the certificate that Proxmox itself uses (under <code>/etc/pve/local/</code>) and shows the subject, issuer and expiry so you can verify it before activating. Two paths to enable HTTPS:",
|
||||
"items": [
|
||||
"<strong>Use Proxmox Certificate</strong> — one click. The Monitor reuses the certificate installed on the host. Good fit for users who already have their PVE running on the same DNS name as the dashboard.",
|
||||
"<strong>Use Custom Certificate</strong> — paste absolute paths to your own <code>.pem</code> certificate and <code>.key</code> private key. The paths are validated before the service restarts; if loading fails, the dashboard falls back to HTTP automatically (no broken state)."
|
||||
],
|
||||
"enabledAlt": "SSL/HTTPS card with HTTPS Enabled status, Active Certificate showing pve-ssl.pem and pve-ssl.key paths, and a Disable HTTPS button",
|
||||
"enabledCaption": "HTTPS active — the card surfaces the certificate currently in use, the file paths and a <em>Disable HTTPS</em> action that drops back to HTTP on the same port.",
|
||||
"acmeTitle": "ACME / Let's Encrypt via Proxmox",
|
||||
"acmeBody": "If your Proxmox node already has Let's Encrypt configured at <em>Datacenter → Certificates → ACME</em>, that's the certificate the host serves to browsers — and that's what the dashboard reuses when you click <em>Use Proxmox Certificate</em>. You don't need separate ACME plumbing for the Monitor.",
|
||||
"walkthroughLink": "For a step-by-step walkthrough — including how the Monitor auto-detects the ACME-uploaded certificate, what gets written to disk, and how to fall back to a custom <code>.pem</code> / <code>.key</code> pair — see <link>HTTPS for ProxMenux Monitor</link>."
|
||||
},
|
||||
"apiTokens": {
|
||||
"heading": "API Access Tokens",
|
||||
"emptyAlt": "API Access Tokens card empty state with About API Tokens info box and Generate New API Token button",
|
||||
"emptyCaption": "API Access Tokens card on a fresh installation — the <em>About API Tokens</em> box summarises lifetime, usage and how to embed the token in <code>Authorization: Bearer</code> headers.",
|
||||
"intro": "Long-lived tokens (1 year) for unattended integrations — Homepage widgets, Home Assistant REST sensors, Grafana scrapers, n8n flows, custom scripts. The card walks you through three states: empty → form → generated.",
"generateBody": "<strong>Generate a token.</strong> Click <em>Generate New API Token</em>. The form asks for a descriptive <em>Token Name</em> (it helps you identify the token in the active list later) and your <em>password</em> to confirm the request. If 2FA is enabled, the form additionally asks for the current TOTP code.",
"generateAlt": "API Access Tokens generate form with Token Name input, Password input, Generate Token and Cancel buttons",
"generateCaption": "The Generate API Token form — fill in a name and confirm with your password (and TOTP if 2FA is on).",
"saveBody": "<strong>Save the token immediately.</strong> The full token string is shown <strong>only once</strong> after generation. The card highlights this with an amber notice and a copy button. There's no way to retrieve it later — you'll only see the prefix in the Active Tokens list.",
"generatedAlt": "API token generated successfully with masked token, copy button, instructions for Authorization Bearer header and Active Tokens list with prefix",
"generatedCaption": "Token generated — the value is shown once with a copy button and the exact <code>Authorization: Bearer</code> snippet. Below, the Active Tokens list keeps name + prefix + creation date so you can revoke individual tokens later.",
"outro": "The card shows every active token with a <em>Revoke</em> button per row. Revoking invalidates the token immediately; any integration using it stops working from that moment. Cookbooks for Homepage, Home Assistant, n8n and Prometheus are in <link>Integrations</link>."
},
"gateway": {
"heading": "Secure Gateway",
"cardAlt": "Secure Gateway card with Deploy Secure Gateway button before any gateway has been deployed",
"cardCaption": "Secure Gateway card on a fresh setup — one button starts the wizard.",
"intro": "Reach the dashboard, the Proxmox web UI and any guest from anywhere on your <a>Tailscale</a> tailnet, without exposing any port to the public internet. The Monitor deploys an Alpine LXC container on the host running <code>tailscaled</code> as a subnet router; once approved in the Tailscale admin console, your remote devices reach the host's LAN IP from anywhere.",
"wizardTitle": "Step-by-step wizard",
"wizardIntro": "Before clicking <em>Deploy Secure Gateway</em>, generate an auth key in your Tailscale admin console — the wizard will ask for it in step 2.",
"step0Title": "0. Generate the Tailscale auth key",
"step0Body": "Sign in to <a>login.tailscale.com/admin/settings/keys</a> and click <em>Generate auth key…</em>. Choose a <em>pre-authenticated</em> key (so the gateway doesn't need an interactive Tailscale login), and copy the value — it's shown only once.",
"step0Alt": "Tailscale admin console Settings Keys page with Generate auth key button highlighted",
"step0Caption": "Tailscale admin console: <em>Settings → Keys → Generate auth key…</em>. Create a free Tailscale account if you don't have one yet (link inside the wizard).",
"step1Title": "1. Open the wizard",
"step1Body": "Back on the Security tab, click <em>Deploy Secure Gateway</em>. The first step is an intro with what you'll get and what you need.",
"step1Alt": "Secure Gateway Setup wizard intro step explaining what the gateway provides: VPN access, no port forwarding, end-to-end encryption",
"step1Caption": "Step 1 — overview of what the Secure Gateway provides and a reminder that you'll need a free Tailscale account.",
"step2Title": "2. Tailscale Authentication",
"step2Body": "Paste the auth key from step 0 and pick a hostname (this is what the gateway will appear as in the Tailscale admin console — typically <code>proxmox-gateway</code> or your node name).",
"step2Alt": "Secure Gateway wizard step asking for Tailscale Auth Key and Device Hostname with link to generate the key",
"step2Caption": "Step 2 — paste the pre-auth key and choose the device hostname. The link below the field opens the Tailscale page from step 0 if you skipped ahead.",
"step3Title": "3. Access Scope",
"step3Intro": "Choose what your tailnet can reach through the gateway:",
"step3Items": [
"<strong>Proxmox Only</strong> — only the Proxmox UI and the Monitor. Smallest attack surface.",
"<strong>Full Local Network</strong> — the entire LAN subnet (auto-detected from the host's primary interface). Lets you reach NAS, printers, VMs and any other LAN device.",
"<strong>Custom Subnets</strong> — list specific CIDRs. For multi-VLAN setups where you want to expose some segments but not others."
],
"step3Alt": "Secure Gateway wizard Access Scope step with three options: Proxmox Only, Full Local Network, Custom Subnets",
"step3Caption": "Step 3 — pick the access scope. <em>Full Local Network</em> auto-fills with the detected LAN subnet.",
"step4Title": "4. Advanced Options (optional)",
"step4Intro": "Two optional toggles. Both are <strong>off by default</strong>:",
"step4Items": [
"<strong>Exit Node</strong> — when enabled and selected from a remote device, all that device's internet traffic exits through the Proxmox host's WAN. Useful for travel scenarios where you want your phone's traffic to look like home.",
"<strong>Accept Routes</strong> — let this gateway reach networks advertised by <em>other</em> tailnet subnet routers (for multi-site setups)."
],
"step4Alt": "Secure Gateway wizard Advanced Options step with Exit Node and Accept Routes checkboxes",
"step4Caption": "Step 4 — Exit Node and Accept Routes. Skip both if all you want is dashboard access from your phone or laptop.",
"step5Title": "5. Review & Deploy",
"step5Body": "Final summary before the deploy starts. The wizard reminds you that one manual step in Tailscale admin is still pending after deploy: <strong>approving the subnet route</strong>.",
"step5Alt": "Secure Gateway wizard Review and Deploy step with Configuration Summary showing hostname, access mode, networks, exit node, accept routes and a Deploy Gateway button",
"step5Caption": "Step 5 — review the configuration and deploy. The blue notice at the bottom flags the pending route approval.",
"approvalTitle": "One last manual step in Tailscale admin",
"approvalBody": "After deploy, go back to <a>login.tailscale.com/admin/machines</a> and approve the subnet route the gateway is advertising. Until you do, remote devices on your tailnet won't actually be able to reach LAN IPs through the gateway. Tailscale marks pending routes with a yellow warning on the device row — click <em>Edit route settings</em> and tick the route box."
},
"pve": {
"heading": "Proxmox VE",
"intro": "The host's own protections: firewall, intrusion prevention and security audit. The last two render only when their respective tools are installed."
},
"firewall": {
"heading": "Proxmox Firewall",
"imageAlt": "Proxmox Firewall card showing Cluster Firewall and Host Firewall status, Quick Access Rules for ProxMenux Monitor and Proxmox Web UI, Rules Overview counters and a list of Firewall Rules with Add Rule button",
"imageCaption": "Proxmox Firewall card — cluster-level and host-level enable / disable toggles, common ports as <em>Quick Access Rules</em>, totals across <em>Rules Overview</em>, and the full rule list with <em>+ Add Rule</em>.",
"intro": "The card surfaces the Proxmox VE built-in firewall (which is independent of any host-level <code>iptables</code> / <code>nftables</code> you may run alongside). Four blocks:",
"items": [
"<strong>Cluster Firewall + Host Firewall</strong> — global toggles. The cluster firewall must be active for any host-level rule to take effect; the card flags this dependency inline.",
"<strong>Quick Access Rules</strong> — pre-defined rows for ports that matter to ProxMenux itself: <code>8008/TCP</code> (Monitor), <code>8006/TCP</code> (Proxmox Web UI). Each row shows the current allow / deny / unprotected state. The Proxmox Web UI is allowed via the <em>built-in</em> Proxmox firewall macro and can't be removed accidentally.",
"<strong>Rules Overview</strong> — counters for total rules, accept rules, drop / reject rules and distinct ports protected. Numbers are read from <code>/etc/pve/firewall/cluster.fw</code> and <code>/etc/pve/nodes/<node>/host.fw</code>.",
"<strong>Firewall Rules</strong> — full list with action / protocol / port / source / level. <em>+ Add Rule</em> opens an inline editor; the trash icon on each row removes the rule. Edits write to the same files Proxmox uses, so changes also appear in the Proxmox UI (Datacenter / Node → Firewall)."
]
},
"fail2ban": {
"heading": "Fail2Ban",
"subHeading": "(conditional)",
"whatIs": "<strong>What it is.</strong> Fail2Ban is an open-source intrusion-prevention daemon that watches log files for repeated failed login attempts and bans the offending IP at the firewall level. It's the de-facto answer to brute-force scanners that hit SSH and web login forms 24/7. ProxMenux wires it to three jails by default: SSH, the Proxmox web UI login (port 8006), and the ProxMenux Monitor login (port 8008).",
"notBundled": "Fail2Ban is <strong>not bundled</strong>. The card detects whether it's installed and adapts: when missing it offers a one-click install; once installed it shows live jail status, banned IPs and lets you tune retries / ban time per jail.",
"notInstalledAlt": "Fail2Ban card showing Not Installed state with explanation of what it would configure: SSH, Proxmox web UI and ProxMenux Monitor protection with nftables backend, plus an Install and Configure Fail2Ban button",
"notInstalledCaption": "Fail2Ban card before install — the blue box previews what the install would configure.",
"clickBody": "Click <em>Install and Configure Fail2Ban</em> and you get a confirmation modal listing every change the script will make on the host:",
"confirmAlt": "Install Fail2Ban confirmation modal listing SSH protection aggressive mode, Proxmox web interface protection port 8006, ProxMenux Monitor protection port 8008, auto-detected nftables backend, journald log level adjustment and SSH MaxAuthTries hardening",
"confirmCaption": "Install confirmation — explicit list of jails, tweaks to journald log level (so the auth jail can read SSH events) and an SSH-hardening side effect (<code>MaxAuthTries=3</code>).",
"confirmIntro": "Confirmation triggers a streaming install panel (apt + jail config + tests). Same plumbing as the ProxMenux CLI installer.",
"progressAlt": "Fail2Ban Installation panel showing live install log: package install, journald MaxLevelStore tuned for auth logging, jails configured, nftables backend detected, MaxAuthTries hardening, fail2ban-client communication test, completion message",
"progressCaption": "Install in progress — every step is logged in the panel. Connection-closed at the bottom marks the end of the streaming session.",
"afterInstall": "After install the card flips to the live status view: jails configured, banned IPs counter, recent ban events. The big tabs split <em>Jails & Banned IPs</em> from <em>Recent Activity</em> (the last N entries from the Fail2Ban log).",
"activeAlt": "Fail2Ban card after install with Active status, three jails configured (proxmenux, proxmox, sshd), Banned IPs counter, Total Bans, Failed Attempts, and per-jail rows with retries, ban time, window and gear icon",
"activeCaption": "Fail2Ban active — the three default jails (<code>proxmenux</code>, <code>proxmox</code>, <code>sshd</code>) with their retries / ban time / window settings.",
"tuneBody": "<strong>Tune jail rules.</strong> Click the gear icon on any jail row to adjust <em>Max Retries</em>, <em>Ban Time</em> (use a permanent ban if you want offenders blocked until you manually unban) and <em>Find Time</em> (the rolling window for counting retries). Common values are documented inside the form.",
"configAlt": "Configure sshd jail form with Max Retries, Ban Time in seconds with Permanent Ban option, Find Time, common values reminder, and Save Configuration button",
"configCaption": "Editing the sshd jail — pick a stricter <em>Max Retries</em> for SSH if you only ever log in from your own subnet, or extend <em>Ban Time</em> for the public-facing dashboard.",
"outro": "The full <em>What it installs / how it's configured / how to uninstall</em> walkthrough — including the manual install path, the SSH MaxAuthTries side effect, and the relationship with the <code>proxmenux-auth.log</code> journal — is in <link>ProxMenux → Security → Fail2Ban</link>.",
"calloutTitle": "Without Fail2Ban, brute-force protection is best-effort",
"calloutBody": "ProxMenux Monitor has its own <em>application-level</em> ban hook in the Flask request pipeline — but it only takes effect if Fail2Ban is installed and writes to the bans table. Without Fail2Ban, the Monitor logs failed logins to <code>proxmenux-auth.log</code> for future inspection but doesn't actively block IPs."
},
"lynis": {
"heading": "Lynis Security Audit",
"subHeading": "(conditional)",
"whatIs": "<strong>What it is.</strong> Lynis is an open-source security auditing tool that runs ~280 tests across the host (file permissions, kernel hardening, SSH config, package vulnerabilities, crypto policy, scheduled tasks, banner grabbing, etc.) and produces a hardening score 0–100, a list of warnings and a list of suggestions. It's the de-facto baseline for \"is this server in good shape\" on Debian-based servers.",
"whyUseful": "<strong>Why it's useful.</strong> It is hard to assess a server's security posture by reading config files one by one. Lynis catches the things that are routinely overlooked: missing kernel hardening flags, weak SSH ciphers enabled, a non-persistent journal, sudoers <code>NOPASSWD</code> on default accounts, and many more. Re-running it after applying ProxMenux post-install tweaks gives you an objective number for the improvement.",
"notInstalledAlt": "Lynis Security Audit card with Not Installed state and Install Lynis button, listing features: hardening scoring, vulnerability detection, compliance checking and GitHub source",
"notInstalledCaption": "Lynis card before install — the blue box summarises what the tool does.",
"notBundled": "Lynis is also <strong>not bundled</strong>. ProxMenux installs the latest release directly from the official GitHub source (not the Debian package, which lags several minor versions).",
"confirmAlt": "Install Lynis confirmation modal listing what Lynis does: hardening scoring, vulnerability detection, configuration analysis, compliance checking, source from official GitHub repository",
"confirmCaption": "Install confirmation — explicit about the GitHub source.",
"progressAlt": "Lynis Installation streaming panel: installing latest scan tool, version 3.1.6 confirmed, installation completed message",
"progressCaption": "Install in progress — same streaming panel pattern as Fail2Ban.",
"afterInstall": "After install the card shows the version and an empty audit history. Click <em>Run Security Audit</em> to start the first scan.",
"installedAlt": "Lynis Security Audit card after install with version 3.1.6 Installed badge, Last Scan timestamp, Hardening Index 0, Warnings 0, Suggestions 0, an empty audit report row and a Run Security Audit button",
"installedCaption": "Lynis installed, no audit yet. The card prefills the metric tiles with zeros.",
"runningAlt": "Lynis Security Audit card while audit is running showing Security audit in progress message, estimated 2-5 minutes duration, and a disabled Running Audit button",
"runningCaption": "Audit in progress — the action button shows a spinner and the card explicitly warns that the scan can take 2–5 minutes.",
"finishedBody": "When it finishes, the card flips to the results view: hardening index, warnings, suggestions and an <em>Audit Reports</em> list with each historical scan.",
"resultsAlt": "Lynis Security Audit card with results: Hardening Index 71 with Lynis 66 PVE 71 breakdown, 3 warnings, 40 suggestions, Security Hardening Score progress bar Proxmox Adjusted 71 of 100 in the Good range, audit reports list with PDF download and Run Security Audit button",
"resultsCaption": "Audit results: Hardening Index <strong>71/100 (Good)</strong> on a sample run. The card also shows the \"Lynis raw score\" (66) versus the Proxmox-adjusted score (71), which adds back 5 points for findings that the Lynis test corpus flags but that are expected behaviour on Proxmox VE.",
"scoreTitle": "Lynis raw score vs Proxmox-adjusted score",
"scoreIntro": "Lynis ships rules tuned for general-purpose Debian. Proxmox legitimately diverges from some of them (services running as root for cluster reasons, custom <code>journald</code> tuning, etc.). The card shows both numbers so you can:",
"scoreItems": [
"Track your <em>Lynis raw score</em> the same way external auditors would.",
"Track the <em>Proxmox-adjusted</em> score — a fairer baseline if you're comparing nodes inside the same cluster."
],
"reportBody": "<strong>The full report.</strong> Each audit row in the list has a <em>PDF</em> button that downloads a multi-page report with the executive summary, system info, security posture, every warning with explanation, every suggestion ranked by impact, and the package inventory. It's the artifact you would attach to a security review.",
"reportAlt": "Sample first page of the Lynis Security Audit Report PDF showing executive summary with hardening 71 of 100, warnings and suggestions counts, system information block with hostname, kernel, Lynis version, report date, security posture overview",
"reportCaption": "First page of a downloaded report — executive summary, system information and security posture overview. The full report continues with detailed warnings, suggestions and the installed-packages list. A <a>sample PDF</a> is attached for reference.",
"runPeriodically": "Run the audit periodically (after major Proxmox upgrades, after applying post-install tweaks, before opening a remote-access path) and keep the reports — the trend matters more than any single number.",
"outro": "The full <em>What it installs / how it's configured / how to uninstall</em> walkthrough and a written sample report breakdown are in <link>ProxMenux → Security → Lynis</link>."
},
"dataCollected": {
"heading": "How the data is collected",
"headerCard": "Card",
"headerEndpoint": "Endpoint",
"headerSource": "Source",
"rows": [
{
"card": "Authentication, 2FA, password change",
"endpoint": "/api/auth/*",
"source": "Local SQLite + JWT issued by the Monitor."
},
{
"card": "SSL / HTTPS",
"endpoint": "/api/auth/ssl/*",
"source": "<code>openssl x509</code> on <code>/etc/pve/local/pve-ssl.pem</code> + <code>/etc/proxmenux/ssl_config.json</code>."
},
{
"card": "API tokens list / mint / revoke",
"endpoint": "/api/auth/api-tokens",
"source": "Token rows kept locally; nothing leaves the host."
},
{
"card": "Secure Gateway (deploy + status)",
"endpoint": "/api/oci/*",
"source": "Provisions Alpine LXC + <code>tailscaled</code> via <code>pct create</code> / <code>pct exec</code>."
},
{
"card": "Firewall status & rules",
"endpoint": "/api/security/firewall/*",
"source": "<code>pve-firewall</code> + <code>/etc/pve/firewall/<cluster|host>.fw</code>."
},
{
"card": "Fail2Ban (only when installed)",
"endpoint": "/api/security/fail2ban/*",
"source": "<code>fail2ban-client status</code>, <code>/var/log/fail2ban.log</code>, <code>/etc/fail2ban/jail.local</code>."
},
{
"card": "Lynis audit (only when installed)",
"endpoint": "/api/security/lynis/*",
"source": "Runs <code>lynis audit system</code> in the background; report parsed from <code>/var/log/lynis-report.dat</code>."
}
]
},
"whereNext": {
"heading": "Where to next",
"items": [
{
"label": "Access & Authentication",
"href": "/docs/monitor/access-auth",
"tail": " — full first-launch flow, 2FA app picker, lost-authenticator recovery, reverse-proxy snippets."
},
{
"label": "Integrations",
"href": "/docs/monitor/integrations",
"tail": " — cookbooks for using API tokens with Homepage, Home Assistant, Prometheus, n8n and the Secure Gateway end-to-end."
},
{
"label": "API Reference",
"href": "/docs/monitor/api",
"tailRich": " — every <code>/api/auth</code>, <code>/api/security</code> and <code>/api/oci</code> endpoint with method, body and curl examples."
},
{
"label": "ProxMenux → Security → Fail2Ban",
"href": "/docs/security/fail2ban",
"tail": " — install walkthrough, jails configured, manual install path."
},
{
"label": "ProxMenux → Security → Lynis",
"href": "/docs/security/lynis",
"tail": " — sample report, score interpretation, when to re-run."
}
]
}
}
@@ -0,0 +1,327 @@
{
"meta": {
"title": "ProxMenux Monitor — Dashboard: Settings tab | ProxMenux Documentation",
"description": "The Settings tab groups dashboard preferences (network units, suppression durations, storage / interface exclusions), the embedded notification + AI panel, and a transparent inventory of every ProxMenux post-install optimization currently active on the host with click-through to its source code."
},
"header": {
"title": "Dashboard: Settings tab",
"description": "Dashboard preferences, monitoring exclusions, the embedded notification + AI configuration panel, and a live inventory of the ProxMenux post-install optimizations currently active on the host.",
"section": "ProxMenux Monitor · Dashboard"
},
"intro": {
"title": "Where each setting actually lives",
"body": "The Settings tab is a single surface for several distinct concerns: how the dashboard renders, what gets watched by the Health Monitor, how alerts go out, and what ProxMenux has already changed on the host. Cards that have their own deep documentation page link out rather than duplicating content here — Settings is the entry point, not the manual."
},
"networkUnits": {
"heading": "Network Units",
"imageAlt": "Network Units card with Network Unit Display dropdown set to Bytes",
"imageCaption": "Choose between bits per second and bytes per second for every network rate displayed in the dashboard.",
"body": "Choose how network throughput is displayed across the dashboard: <strong>bits per second</strong> (Mbps / Gbps) or <strong>bytes per second</strong> (MB/s / GB/s). Bits is the default because it's what NIC vendors and ISPs label their products with; bytes is what most file-transfer tools report. The setting affects every chart, badge and tooltip that shows network rate — applied immediately, no reload needed."
},
"health": {
"heading": "Health Monitor",
"intro": "The card surfaces the per-category <strong>Suppression Duration</strong> setting — once an alert is dismissed, how long before the scanner is allowed to re-fire it. Each of the ten Health Monitor categories (CPU, Memory, Storage, Disks, Network, VMs, PVE Services, Logs, Updates, Security) has its own dropdown with these values:",
"items": [
"<strong>24 h</strong> — default for most transient categories.",
"<strong>72 h</strong> — for events you want a few days of silence on.",
"<strong>168 h</strong> (1 week) and <strong>720 h</strong> (1 month) — periodic checks.",
"<strong>8760 h</strong> (1 year) — effectively \"quiet for the foreseeable future\".",
"<strong>-1</strong> — permanent silence until you re-enable it manually.",
"<strong>Custom hours</strong> — any integer if you need an in-between value."
],
"imageAlt": "Settings → Health Monitor card with the per-category suppression dropdowns and the Active Suppressions section",
"imageCaption": "Health Monitor card — the per-category dropdowns set defaults for new dismisses; the Active Suppressions section below lists every alert currently silenced and lets you revert them.",
"editTitle": "Edit mode",
"editBody": "The card is read-only by default. Click <strong>Edit</strong> in the top-right of the card to enable the dropdowns and the Re-enable buttons. <strong>Save</strong> commits every pending change (suppression-duration changes and queued re-enables) in a single batch; <strong>Cancel</strong> discards them all. The Save button only activates when there is at least one pending change.",
"activeTitle": "Active Suppressions",
"activeIntro": "Below the suppression-duration dropdowns, the <strong>Active Suppressions</strong> section lists every alert that is currently silenced — both time-limited dismisses (24 h, 7 days, custom windows) and <em>Permanent</em> ones. Each row shows:",
"activeItems": [
"A coloured badge — <strong>Permanent</strong> (amber) or a countdown such as <strong>24h remaining</strong> / <strong>7d remaining</strong> (blue).",
"The alert identifier, normalized for readability (e.g. <code>pve_storage_full_PBS-Cloud</code> → <em>PVE Storage Full: PBS-Cloud</em>).",
"Category, severity and the timestamp the alert was dismissed.",
"A <strong>Re-enable</strong> button (active only in Edit mode) that queues the alert to be un-acknowledged on the next Save."
],
"activeReenableTitle": "Re-enable flow",
"activeReenableBody": "Clicking <strong>Re-enable</strong> in Edit mode marks the row in green and strikes its identifier — it is queued but not yet applied. Click again on the same row to <strong>Undo</strong> the queue. When you press <strong>Save</strong>, every queued re-enable fires <code>POST /api/health/un-acknowledge</code> in parallel and the affected rows disappear from the list. If the underlying condition is still present and the category supports re-firing, the alert reappears in the Health Monitor's Active list on the next scan cycle.",
"activePermanentNote": "Permanent dismisses (alerts dismissed with <em>Permanently</em> from the Health Monitor modal, or those whose category default is set to <code>-1</code>) <strong>can only be reverted from here</strong>. The dashboard modal does not expose an un-dismiss button for them — the Active Suppressions panel is the single audit log + revert UI.",
"activeAutoRefreshTitle": "Auto-refresh",
"activeAutoRefreshBody": "The list refreshes automatically when you dismiss or un-dismiss an alert from the Health Monitor modal (via an in-browser event), when the browser tab regains focus, and on visibility change. You do not need to reload the page after dismissing an alert from the dashboard.",
"calloutTitle": "Full semantics live in the Health Monitor page",
"calloutBody": "The escalation rules (when a re-fire becomes critical), the auto-resolve behaviour for events whose underlying device disappears, and the difference between dismissed and resolved are all documented under <link>Health Monitor → Dismissing alerts and the Suppression Duration</link>. This card just exposes the per-category dropdowns and the Active Suppressions panel."
},
"thresholds": {
"heading": "Health Monitor Thresholds",
"intro": "Where the previous card decides <em>how long to stay quiet after a dismiss</em>, this one decides <strong>at what value an alert fires in the first place</strong>. Every check the Health Monitor runs is parameterised by a pair of numbers — a <strong>Warning</strong> threshold and a <strong>Critical</strong> threshold — and both are exposed here for the operator to tune.",
"whatForTitle": "What it is for",
"whatForIntro": "Defaults that ship with ProxMenux are sane for the average Proxmox host, but every environment has its own envelope:",
"whatForItems": [
"A small homelab with a single-disk SSD may want to page earlier on capacity (75 / 90 %) to leave room for snapshots.",
"A datacenter host with redundant Ceph nodes can be more relaxed on memory warnings (a 90 % working set is normal under ZFS ARC).",
"A passively-cooled mini-PC needs lower temperature thresholds than a server with forced-air cooling — same drive class, different physical envelope.",
"A heavily-virtualized host that pegs CPU during builds should not page on every 80 % spike, but must still alert on sustained pressure."
],
"whatForOutro": "Editing a threshold takes effect <strong>on the next scan</strong> — the Health Monitor re-reads the values from <code>/usr/local/share/proxmenux/health_thresholds.json</code> on every cycle, no service restart needed. The same numbers also feed the colour ranges of the dashboard widgets (the temperature line in the disk-temperature modal, the bars on the storage cards) so the visual classification matches what triggers the alert.",
"coloursTitle": "Status colours: how Warning and Critical render in the dashboard",
"coloursIntro": "Every threshold below produces the same three-state classification across the dashboard — same colours for storage bars, CPU/memory rings, temperature chips, and the dot on the disk modal. Reading a colour anywhere in the Monitor maps to a definite range relative to the configured pair:",
"headerColour": "Colour",
"headerRange": "Range",
"headerMeaning": "Meaning",
"colourRows": [
{
"colour": "Green",
"range": "value < Warning",
"meaning": "Normal operating range. No alert fires."
},
{
"colour": "Amber",
"range": "Warning ≤ value < Critical",
"meaning": "Warning state. The Health Monitor fires a WARNING-severity event; notifications respect the channel filters and Quiet Hours."
},
{
"colour": "Red",
"range": "value ≥ Critical",
"meaning": "Critical state. The Health Monitor fires a CRITICAL event; CRITICAL bypasses Quiet Hours and always reaches the channel."
}
],
"sectionsTitle": "Sections and recommended defaults",
"sectionsIntro": "These are the values ProxMenux ships with — the recommended baseline. They're what you see on a fresh host until you override anything. Sections are ordered top-to-bottom from <em>compute</em> → <em>heat</em> → <em>storage capacity</em> so reading down moves from concrete (current load) to accumulated state (free space).",
"headerSection": "Section",
"headerWarning": "Warning",
"headerCritical": "Critical",
"headerGates": "What it gates",
"thresholdRows": [
{
"section": "CPU usage",
"warning": "85 %",
"critical": "95 %",
"gates": "Sustained-load alert when CPU averages above the threshold for the scan window."
},
{
"section": "Memory",
"warning": "85 %",
"critical": "95 %",
"gates": "RAM pressure on the host."
},
{
"section": "Swap (critical only)",
"warning": "—",
|
||||
"critical": "5 %",
|
||||
"gates": "Swap actually being used. The number is intentionally low: a healthy Proxmox host should rarely touch swap, so even 5 % is a meaningful signal of RAM pressure."
|
||||
},
|
||||
{
|
||||
"section": "CPU temperature",
|
||||
"warning": "80 °C",
|
||||
"critical": "90 °C",
|
||||
"gates": "CPU package / core temperature reading from <code>lm-sensors</code>."
|
||||
},
|
||||
{
|
||||
"section": "Disk temp — HDD",
|
||||
"warning": "60 °C",
|
||||
"critical": "65 °C",
|
||||
"gates": "Standard spinning drives. Manufacturer envelope tops out around 60–65 °C, so Critical is set right at the hard limit."
|
||||
},
|
||||
{
|
||||
"section": "Disk temp — SSD",
|
||||
"warning": "70 °C",
|
||||
"critical": "75 °C",
|
||||
"gates": "2.5'' / M.2 SATA SSDs — run cooler than NVMe but warmer than HDDs."
|
||||
},
|
||||
{
|
||||
"section": "Disk temp — NVMe",
|
||||
"warning": "80 °C",
|
||||
"critical": "85 °C",
|
||||
"gates": "NVMe drives run hotter by design; controllers self-throttle above ~85 °C, so Warning catches the climb before throttling kicks in."
|
||||
},
|
||||
{
|
||||
"section": "Disk temp — SAS",
|
||||
"warning": "55 °C",
|
||||
"critical": "65 °C",
|
||||
"gates": "Enterprise SAS drives share the same ~65 °C manufacturer limit as HDDs, but are normally deployed in rack chassis with active cooling. A reading at 55 °C already signals a cooling problem (failed fan, HVAC issue) <em>before</em> the drive itself is at risk — hence a lower Warning than HDD, not because SAS is less heat-tolerant."
|
||||
},
|
||||
{
|
||||
"section": "Disk space — host",
|
||||
"warning": "85 %",
|
||||
"critical": "95 %",
|
||||
"gates": "Capacity of <code>/</code> and every host mountpoint (<code>/var/lib/vz</code>, <code>/mnt/*</code>…)."
|
||||
},
|
||||
{
|
||||
"section": "Disk space — LXC rootfs",
|
||||
"warning": "85 %",
|
||||
"critical": "95 %",
|
||||
"gates": "Per-container root disk, evaluated against the rootfs size from PVE."
|
||||
},
|
||||
{
|
||||
"section": "LXC mount points",
|
||||
"warning": "85 %",
|
||||
"critical": "95 %",
|
||||
"gates": "Capacity of mountpoints inside running CTs (mp0, mp1, NFS, bind mounts). Excludes rootfs."
|
||||
},
|
||||
{
|
||||
"section": "PVE storage",
|
||||
"warning": "85 %",
|
||||
"critical": "95 %",
|
||||
"gates": "Block-style PVE storages (LVM, LVM-thin, ZFS-pool, RBD/Ceph, PBS)."
|
||||
},
|
||||
{
|
||||
"section": "ZFS pool",
|
||||
"warning": "85 %",
|
||||
"critical": "95 %",
|
||||
"gates": "ZFS pools at host level — independent of PVE registration."
|
||||
}
|
||||
],
|
||||
"defaultsTitle": "Defaults, overrides and reset",
|
||||
"defaultsBody": "The backend exposes a merged view: every section starts from the ProxMenux defaults (the values you see when the host is fresh) and you override only the knobs you care about. The card shows the <em>effective</em> value — the override if you set one, otherwise the default. A <strong>Reset</strong> button wipes every override and goes back to defaults across all sections at once.",
|
||||
"validationTitle": "Validation",
|
||||
"validationBody": "Saving rejects values that don't make sense (percentages outside 0–100, critical below warning, negative temperatures). The frontend shows the inline error; the backend validates again before persisting, so the API can't be tricked into a broken threshold by a hand-crafted PUT."
|
||||
},
|
||||
"lxcDetection": {
|
||||
"heading": "LXC Update Detection",
|
||||
"imageAlt": "LXC Update Detection card with a single switch — when enabled, the Monitor periodically scans running Debian/Ubuntu/Alpine LXC containers for pending package updates.",
|
||||
"imageCaption": "The toggle for the periodic <code>apt list --upgradable</code> / <code>apk list -u</code> scan across every running CT. Default is ON. The matching notification toggle in Notifications → Services only appears while detection is enabled.",
|
||||
"intro": "A dedicated toggle for the LXC update scan, sitting between the Health Monitor Thresholds and the Notifications card. When ON, ProxMenux walks every running CT on the host and queries the in-container package manager for pending updates; results land in the Hardware tab badge counts and feed the <code>lxc_updates_available</code> notification. When OFF, the scan stops entirely (no <code>pct exec</code> calls) and any existing LXC entries in <code>managed_installs.json</code> are purged immediately, so the dashboard and the <code>/api/managed-installs</code> endpoint stop reporting LXC update state without waiting for the next 24h cycle.",
|
||||
"whatRunsTitle": "What the scan actually runs",
|
||||
"whatRunsIntro": "For each CT in <code>running</code> state with a supported package manager:",
|
||||
"whatRunsItems": [
|
||||
"<strong>Cache freshness gate.</strong> If the in-container apt/apk metadata cache is older than <strong>24 hours</strong>, a best-effort refresh runs first (<code>apt-get update -qq</code> on Debian/Ubuntu, <code>apk update</code> on Alpine) with a 60 s timeout. Any failure (no network, broken repo, timeout) is swallowed silently — the listing below still runs against whatever cache exists, so a transient repo issue can never make detection worse than before.",
|
||||
"<strong>Listing.</strong> Then ProxMenux runs <code>apt list --upgradable</code> / <code>apk list -u</code> and parses the output into a structured count plus a sample of the top package names.",
|
||||
"<strong>Per-CT dedup.</strong> A fingerprint built from count, security-count and the sorted top names is stored so a stable set of pending updates doesn't re-notify daily, while a meaningfully different set does."
|
||||
],
|
||||
"selfUpdateTitle": "CTs that self-update outside apt may legitimately report 0",
|
||||
"selfUpdateBody": "Detection only sees what the in-container package manager knows about. A CT whose key software updates itself outside apt (Plex's <code>plexupdate</code> cron, Docker containers updated via <code>docker pull</code>, Frigate's built-in updater, etc.) will keep reporting low or zero apt updates even when the appliance is actively staying current — that's correct, not a bug. The apt-level base system on the same CT may still have its own pending updates, which the scan does surface.",
|
||||
"refreshTitle": "Why the 24 h auto-refresh",
|
||||
"refreshBody": "Long-running appliance CTs frequently end up with apt caches months out of date because nobody routinely runs <code>apt update</code> inside them. Without the refresh, <code>apt list --upgradable</code> reports 0 updates from a frozen snapshot and the operator never sees the backlog. The threshold matches the rest of the check cycle — if the CT was refreshed within the last 24 h, ProxMenux trusts that signal and skips the refresh.",
|
||||
"toggleTitle": "Conditional notification toggle",
|
||||
"toggleBody": "The <code>lxc_updates_available</code> per-channel notification toggle in <strong>Notifications → Services</strong> only renders while detection is enabled. When you turn detection OFF, that row disappears from every channel's category list — but its stored preference is preserved in the DB, so re-enabling detection brings the toggle back at the value it had before.",
|
||||
"purgeTitle": "What gets purged when you disable detection",
|
||||
"purgeBody": "Turning the switch OFF immediately removes every <code>type=lxc</code> entry from <code>/usr/local/share/proxmenux/managed_installs.json</code>. The Hardware tab badges drop to zero on the next dashboard refresh. Turning it back ON repopulates the registry on the next detection cycle (or sooner if you trigger a manual refresh from the API)."
|
||||
},
|
||||
"storageExclusions": {
|
||||
"heading": "Remote Storage Exclusions",
|
||||
"imageAlt": "Remote Storage Exclusions card listing PBS-Cloud, PBS and PBS2 storages with Health and Alerts toggles per row",
|
||||
"imageCaption": "Per-storage <em>Health</em> and <em>Alerts</em> toggles. Storages with both toggles off stop counting against the Health Monitor and stop generating notifications — but still render on the Storage tab marked as excluded.",
|
||||
"intro": "Mark Proxmox-managed storages (NFS / CIFS / PBS / Ceph / iSCSI / etc.) as excluded from monitoring. Two independent toggles per storage:",
|
||||
"items": [
|
||||
"<strong>Health</strong> — when off, the storage stops contributing to the Storage category of the Health Monitor. Useful for archive volumes that are intentionally offline most of the time or remote backup targets only powered up on schedule.",
|
||||
"<strong>Alerts</strong> — when off, alerts about this storage no longer go out through configured channels, even if Health checks remain enabled. Useful when you want the dashboard view but not the notifications."
|
||||
],
|
||||
"outro": "Excluded storages still render on the <link>Storage tab</link> with a purple <em>excluded</em> badge so the entry doesn't silently disappear from your inventory. State is persisted in the <code>excluded_storages</code> SQLite table."
|
||||
},
|
||||
"interfaceExclusions": {
|
||||
"heading": "Network Interface Exclusions",
|
||||
"imageAlt": "Network Interface Exclusions card listing vmbr0, vmbr1, vmbr2, bond0 and eno1 with Health and Alerts toggles per interface",
|
||||
"imageCaption": "Same shape as Storage Exclusions — per-interface <em>Health</em> and <em>Alerts</em> toggles. Each row shows the interface, type badge (bridge / bond / physical), the IP and the link speed.",
|
||||
"intro": "Same shape as Storage Exclusions but for network interfaces. Per interface: exclude from Health checks and / or exclude from notifications. Typical use cases:",
|
||||
"items": [
|
||||
"An intentionally-down spare bridge.",
|
||||
"A NIC that was physically removed but is still referenced in <code>/etc/network/interfaces</code>.",
|
||||
"A VLAN sub-interface used only during maintenance windows.",
|
||||
"A management bridge that is up but carries no traffic and flaps noisily on every cycle."
|
||||
],
|
||||
"outro": "State is persisted in the <code>excluded_interfaces</code> SQLite table. Same purple <em>excluded</em> badge on the <link>Network tab</link> so excluded interfaces stay visible."
|
||||
},
|
||||
"notifications": {
|
||||
"heading": "Notifications & AI",
|
||||
"body1": "This section of the Settings tab is where ProxMenux Monitor notifications and the AI rewriter are turned on. Pressing <em>Enable Notifications</em> starts the dispatch background thread, registers a Proxmox VE webhook target on the host so PVE-emitted events flow into the same pipeline, and unfolds the channel form so you can connect Telegram, Discord, Email, Gotify and the rest. The AI rewriter sits inside the same panel as a collapsible advanced section.",
|
||||
"body2": "Both surfaces have a lot of moving parts — channels, per-event toggles, Rich messages, the Display Name, the dispatch pipeline (dedup, cooldown, aggregation, quiet hours), the PVE webhook integration, AI providers, prompt modes — and live on their own dedicated pages rather than being repeated here:",
|
||||
"items": [
|
||||
"<notifLink>Notifications</notifLink> — channel walk-throughs (Telegram, Discord, Gotify, Email + Gmail / Microsoft app passwords, ntfy, Slack, Teams, generic webhook), per-event categories, Rich messages, Display Name, dispatch pipeline, PVE webhook integration, history and API.",
|
||||
"<aiLink>AI Assistant</aiLink> — providers (OpenAI, Anthropic, Gemini, Groq, OpenRouter, Ollama), model selection, prompt modes (default / custom), output language, per-channel detail levels and AI suggestions."
|
||||
]
|
||||
},
|
||||
"optimizations": {
|
||||
"heading": "ProxMenux Optimizations",
|
||||
"intro": "A live, transparent inventory of every ProxMenux post-install optimization currently active on the host. Every time you apply a post-install option from the Scripts side — either via the <autoLink>Automated post-install</autoLink> or via the à-la-carte <customLink>Customizable post-install</customLink> — the corresponding script registers itself in <code>/usr/local/share/proxmenux/installed_tools.json</code>. The Monitor reads that file and renders this card so you can see, at a glance, what's been changed on your server.",
|
||||
"imageAlt": "ProxMenux Optimizations card with grid of installed tools, each row showing a green dot, tool name and version. Examples include APT IPv4 Force, Bashrc Customization, Fastfetch, Log2ram SSD Protection, Memory Settings Optimization, Setting persistent network interfaces, System Limits Increase, APT Language Skip, Entropy Generation haveged with Legacy badge, Kernel Panic Configuration, Logrotate Optimization, Network Optimizations, Subscription Banner Removal, VFIO IOMMU Passthrough — 14 active total",
|
||||
"imageCaption": "The card lists every active optimization with its name, version, a coloured dot and an orange <em>14 active</em> counter at the top right. Tools whose source is reachable are clickable.",
|
||||
"dotsTitle": "What the dots mean",
|
||||
"dotsItems": [
|
||||
"<green/> <strong>Green dot</strong> — current optimization, registered by the active version of ProxMenux. Source code is reachable: click the row to open it.",
|
||||
"<amber/> <strong>Amber dot + <em>legacy</em> badge</strong> — applied by an older ProxMenux version whose script has since been renamed or replaced. Still active on the host; the source opens in \"legacy\" mode (with an amber accent) so you can audit what was actually run."
|
||||
],
|
||||
"clickTitle": "Click-through to source code",
|
||||
"clickBody": "Clicking a tool opens a modal with the exact bash function that applied the change, plus the script file path it lives in (<code>auto_post_install.sh</code> for the Automated bundle, <code>customizable_post_install.sh</code> for the à-la-carte side). Comments and shell constructs are syntax-highlighted; a Copy button puts the source on your clipboard. This is the \"show your work\" surface — verify what ProxMenux did to your host before any manual change you make on top.",
|
||||
"detailAlt": "Tool source code modal for APT IPv4 Force showing the bash function force_apt_ipv4 from customizable_post_install.sh version 1.0 with syntax-highlighted code that configures /etc/apt/apt.conf.d/99-force-ipv4 with Acquire ForceIPv4 true, registers the tool and emits a translate APT IPv4 configuration completed message",
|
||||
"detailCaption": "Source modal for <em>APT IPv4 Force</em> — exact <code>force_apt_ipv4()</code> function from <code>customizable_post_install.sh v1.0</code>, with syntax highlighting and a one-click Copy.",
|
||||
"whyTitle": "Why this matters",
|
||||
"whyBody": "ProxMenux changes things on your host: kernel parameters, repository configuration, network bits, log rotation, GPU passthrough, etc. Knowing exactly what's active is essential before you start adding manual customization on top — and even more so if a different admin runs the host than the one who set it up. This card is the auditable record of every optimization currently in effect, with the exact code that produced it.",
|
||||
"updatesTitle": "Updates available banner",
|
||||
"updatesBody": "When a post-install optimization gets a newer version on disk than the one currently registered on the host, the card shows an \"Updates available\" banner at the top with the count and an <strong>Apply</strong> button. Clicking <strong>Apply</strong> opens a per-optimization picker (the same one available from the Post-Install menu's <em>Apply available updates</em> entry). Pick which optimizations to update; ProxMenux re-runs the corresponding function and refreshes the version in the registry. When everything is current the banner disappears.",
|
||||
"updatesAlt": "ProxMenux Optimizations card with an Updates available banner at the top — count of pending updates plus an Apply button that opens the per-optimization picker",
|
||||
"updatesCaption": "The banner only renders when at least one optimization has a newer version on disk. See <link>Apply Available Updates</link> for the full update flow and the Path-A equivalent in the shell menu.",
|
||||
"revertTitle": "Reverting an optimization",
|
||||
"revertBody": "The card is read-only — to undo an optimization, run the corresponding <link>Uninstall Optimizations</link> option from the ProxMenux Scripts menu. The uninstall step removes the entry from <code>installed_tools.json</code>, so it disappears from this card on the next refresh."
|
||||
},
|
||||
"dataCollected": {
|
||||
"heading": "How the data is collected",
|
||||
"headerCard": "Card",
|
||||
"headerEndpoint": "Endpoint",
|
||||
"headerSource": "Source",
|
||||
"rows": [
|
||||
{
|
||||
"card": "Network Units",
|
||||
"endpoint": "/api/settings",
|
||||
"source": "Persisted in the dashboard's SQLite settings table."
|
||||
},
|
||||
{
|
||||
"card": "Health Monitor durations",
|
||||
"endpoint": "/api/health/settings",
|
||||
"source": "Per-category suppression durations in the Health DB."
|
||||
},
|
||||
{
|
||||
"card": "Storage / interface exclusions",
|
||||
"endpoint": "/api/storage/exclusions, /api/network/exclusions",
|
||||
"source": "SQLite tables <code>excluded_storages</code> and <code>excluded_interfaces</code>."
|
||||
},
|
||||
{
|
||||
"card": "Notifications & AI panel",
|
||||
"endpoint": "/api/notifications/*",
|
||||
"source": "See the dedicated <notifLink>Notifications</notifLink> / <aiLink>AI Assistant</aiLink> pages."
|
||||
},
|
||||
{
|
||||
"card": "ProxMenux Optimizations list",
|
||||
"endpoint": "/api/proxmenux/installed-tools",
|
||||
"source": "Reads <code>/usr/local/share/proxmenux/installed_tools.json</code>, written by <code>register_tool</code> calls inside the post-install scripts."
|
||||
},
|
||||
{
|
||||
"card": "Optimization source-code modal",
|
||||
"endpoint": "/api/proxmenux/tool-source",
|
||||
"source": "Extracts the matching bash function from <code>auto_post_install.sh</code> or <code>customizable_post_install.sh</code> on the host."
|
||||
}
|
||||
]
|
||||
},
|
||||
"whereNext": {
|
||||
"heading": "Where to next",
|
||||
"items": [
|
||||
{
|
||||
"label": "Notifications",
|
||||
"href": "/docs/monitor/notifications",
|
||||
"tail": " — channels, per-event toggles, channel overrides, history, test-send."
|
||||
},
|
||||
{
|
||||
"label": "AI Assistant",
|
||||
"href": "/docs/monitor/ai-assistant",
|
||||
"tail": " — providers, models, prompt modes, languages, per-channel detail levels."
|
||||
},
|
||||
{
|
||||
"label": "Health Monitor → Dismissing alerts and the Suppression Duration",
|
||||
"href": "/docs/monitor/health-monitor#dismissing-alerts-and-the-suppression-duration",
|
||||
"tail": " — the semantics behind the per-category dropdowns above."
|
||||
},
|
||||
{
|
||||
"label": "ProxMenux Scripts → Automated post-install",
|
||||
"href": "/docs/post-install/automated",
|
||||
"tailRich": " and <customLink>Customizable post-install</customLink> — the actual scripts that register themselves in the optimization list above."
|
||||
},
|
||||
{
|
||||
"label": "Uninstall Optimizations",
|
||||
"href": "/docs/post-install/uninstall",
|
||||
"tail": " — how to revert an optimization registered above."
|
||||
},
|
||||
{
|
||||
"label": "Dashboard index",
|
||||
"href": "/docs/monitor/dashboard",
|
||||
"tail": " — back to the tab overview."
|
||||
}
|
||||
]
|
||||
}
|
||||
}
|
||||
@@ -0,0 +1,268 @@
|
||||
{
|
||||
"meta": {
|
||||
"title": "ProxMenux Monitor — Dashboard: Storage tab | ProxMenux Documentation",
|
||||
"description": "The Storage tab consolidates four views: Proxmox-managed storages with their state, ZFS pools, internal physical disks with SMART data, and external (USB) drives. Each disk drill-in exposes SMART attributes, wear & lifetime and the permanent observation history."
|
||||
},
|
||||
"header": {
|
||||
"title": "Dashboard: Storage tab",
|
||||
"description": "The host's storage state in one screen — Proxmox pools (NFS / CIFS / LVM / ZFS / dir), ZFS pool health, internal SATA / NVMe disks with SMART, and external USB drives. Click any disk to open a drill-in with the full SMART attribute table and the per-disk observation history.",
|
||||
"section": "ProxMenux Monitor · Dashboard"
|
||||
},
|
||||
"intro": {
|
||||
"title": "Backed by three sources",
|
||||
"body": "Proxmox storages come from <code>pvesm status</code>; ZFS state from <code>zpool status</code>; physical disks from <code>lsblk</code> + <code>smartctl</code> (and <code>nvme</code> for NVMe-specific fields). The tab refreshes every ~60 seconds; the per-disk drill-in triggers a fresh SMART read on demand."
|
||||
},
|
||||
"thresholds": {
|
||||
"title": "Status colours and thresholds applied here",
|
||||
"intro": "Every bar, chip, and dot on this tab follows the same three-state classification — <green/> <strong>green</strong> below Warning, <amber/> <strong>amber</strong> from Warning to Critical, <red/> <strong>red</strong> at Critical and above. Recommended defaults shipped with ProxMenux:",
|
||||
"items": [
|
||||
"<strong>Capacity</strong> (host disks, PVE storages, ZFS pools, LXC mounts) — Warning 85 %, Critical 95 %.",
|
||||
"<strong>Disk temperature</strong> — HDD 60/65 °C · SSD 70/75 °C · NVMe 80/85 °C · SAS 55/65 °C (warning / critical)."
|
||||
],
|
||||
"outro": "Every value is configurable per host — <link>Settings → Health Monitor Thresholds</link> is the single source of truth and explains how to tune them."
|
||||
},
|
||||
"topRow": {
|
||||
"heading": "Top row: storage at-a-glance",
|
||||
"intro": "Opening the Storage tab lands you on a four-card summary of the host's storage state — total capacity, what's used locally, what's used on remote storages, and the physical-disk inventory. Each card is a one-line answer to a common question; the cards below the row are where you drill into the detail.",
|
||||
"imageAlt": "Storage tab — top row of four stat cards: Total Storage, Local Used, Remote Used, Physical Disks",
|
||||
"imageCaption": "Top row of the Storage tab — total capacity and disk count, used bytes split into local vs remote storages, and a typed breakdown of physical disks with their health summary.",
|
||||
"headerCard": "Card",
|
||||
"headerWhat": "What it shows",
|
||||
"totalLabel": "Total Storage",
|
||||
"totalWhat": "Combined raw capacity across every physical disk. Footer line shows the count of physical disks discovered.",
|
||||
"localLabel": "Local Used",
|
||||
"localWhat": "Bytes used on local storages (LVM / LVM-thin / ZFS / dir on the host's own disks). Shows the used bytes prominently, with a footer line of <em>X.XX % of Y TB</em> so you see the fill-percentage at the same time.",
|
||||
"remoteLabel": "Remote Used",
|
||||
"remoteWhat": "Same shape as Local Used but for remote storages (NFS / CIFS / PBS / Ceph / iSCSI). Counted separately because remote outages don't affect local data and you typically size and monitor them differently.",
|
||||
"disksLabel": "Physical Disks",
|
||||
"disksIntro": "Two lines of breakdown for the inventory:",
|
||||
"disksItems": [
|
||||
"<strong>By type</strong> — counts of NVMe (purple), SSD (blue) and HDD (blue) discovered. Mixed-disk hosts get all three; an all-NVMe host shows only the NVMe count.",
|
||||
"<strong>By health</strong> — counts of <em>normal</em> (green), <em>warning</em> (yellow) and <em>critical</em> (red) disks. A healthy host usually shows just \"X normal\"; the warning and critical counts only appear when something has escalated."
|
||||
]
|
||||
},
|
||||
"pveStorage": {
|
||||
"heading": "Proxmox Storage card",
|
||||
"intro": "One row per storage configured in <code>/etc/pve/storage.cfg</code>. Each row shows the type badge (<code>nfs</code> / <code>cifs</code> / <code>zfspool</code> / <code>lvm</code> / <code>lvmthin</code> / <code>dir</code> / <code>pbs</code>), the storage name, an active / error / not-monitored badge, the usage percentage and a coloured progress bar:",
|
||||
"items": [
|
||||
"<strong>< 75 %</strong> — blue progress bar, value in blue.",
|
||||
"<strong>75 – 90 %</strong> — yellow progress bar, value in yellow (Health Monitor warns at this point).",
|
||||
"<strong>> 90 %</strong> — red progress bar, value in red (Health Monitor escalates).",
|
||||
"<strong>error</strong> — full row outlined in red, used when the storage is configured but unreachable (NFS server down, CIFS credentials expired).",
|
||||
"<strong>excluded</strong> — purple outline + the badge \"not monitored\". Storages explicitly excluded by the user from health checks (handy for manual / archive volumes that are intentionally offline)."
|
||||
],
|
||||
"calloutTitle": "Excluding a noisy storage",
|
||||
"calloutBody": "From the storage row, the per-storage menu lets you mark it as <em>excluded from monitoring</em>. The flag is stored in the <code>excluded_storages</code> table and respected by both the dashboard view and the Health Monitor cycle — no notifications fire for excluded storages, and they don't bump the header pill."
|
||||
},
|
||||
"zfs": {
|
||||
"heading": "ZFS Pools card",
|
||||
"intro": "Renders only when ZFS is installed and at least one pool exists. One row per pool with a health badge, size / allocated / free, and an icon mirroring the health state:",
|
||||
"items": [
|
||||
"<strong>ONLINE</strong> — green. Everything healthy.",
|
||||
"<strong>DEGRADED</strong> — yellow. Pool is serving data but at least one device is unavailable; replacement window starts.",
|
||||
"<strong>FAULTED</strong> / <strong>UNAVAIL</strong> / <strong>SUSPENDED</strong> — red. Pool not serving data; immediate intervention required."
|
||||
],
|
||||
"outro": "Both ZFS state and the per-disk SMART status feed the <em>Disks & I/O</em> category of the <link>Health Monitor</link>."
|
||||
},
|
||||
"physical": {
|
||||
"heading": "Physical Disks & SMART Status",
|
||||
"intro": "Internal disks (SATA / NVMe). Each row condenses the most useful fields at a glance:",
|
||||
"items": [
|
||||
"<strong>Device path</strong> — <code>/dev/sda</code>, <code>/dev/nvme0n1</code>.",
|
||||
"<strong>Type badge</strong> — SATA / NVMe (and the relevant icon).",
|
||||
"<strong>System badge</strong> — orange tag that marks disks the host's OS is running from. The dashboard derives this from the mountpoints of <code>/</code> and <code>/boot</code>: any physical disk hosting them gets the <em>System</em> tag so you don't accidentally wipe or repurpose it. Disks without the tag are pure data drives.",
|
||||
"<strong>Model</strong> — vendor + model string from <code>smartctl -i</code>.",
|
||||
"<strong>Capacity</strong> — shown in human-readable units.",
|
||||
"<strong>Temperature</strong> — current °C, coloured by the disk-type threshold (NVMe runs warmer than SATA).",
|
||||
"<strong>SMART status</strong> — passed / failed / unknown.",
|
||||
"<strong>Observations badge</strong> — when the permanent <code>disk_observations</code> history has un-dismissed entries for this disk, a blue badge with the count appears (e.g. <em>3 obs.</em>). Click the disk to drill in and review them.",
|
||||
"<strong>Health badge</strong> — Healthy / Warning / Critical, derived from the SMART check + recent observations."
|
||||
],
|
||||
"clickHint": "The whole row is clickable and opens the per-disk drill-in described below.",
|
||||
"warningTitle": "Don't touch System-tagged disks lightly",
|
||||
"warningBody": "Disks with the orange <strong>System</strong> badge host the running OS. The dashboard surfaces the tag as a guard rail — destructive actions launched from <link>ProxMenux → Disk Manager → Format / Wipe</link> explicitly refuse to act on them. If you really need to repurpose the boot disk, do it from a rescue environment, not from inside Proxmox."
|
||||
},
|
||||
"external": {
|
||||
"heading": "External Storage (USB)",
|
||||
"body": "A separate card for USB-attached drives that only renders when at least one is present. Same fields as internal disks plus an orange <strong>USB</strong> tag. USB drives often appear and disappear (cold backups, occasional offload jobs), so the Health Monitor is conservative about them — observations are retained, but I/O errors on a disconnected USB drive don't escalate."
|
||||
},
|
||||
"drillIn": {
|
||||
"heading": "Disk drill-in modal",
|
||||
"intro": "Clicking any disk row opens a four-tab modal: <strong>Overview</strong> · <strong>SMART</strong> · <strong>History</strong> · <strong>Schedule</strong>. The header always shows the device path, the model + capacity and the orange <em>System</em> badge if applicable.",
|
||||
"overviewTitle": "Tab 1 — Overview",
|
||||
"overviewImageAlt": "Disk drill-in modal — Overview tab with health status, Wear & Lifetime ring, and quick SMART attributes",
|
||||
"overviewImageCaption": "Overview tab — identity, health badge, life-remaining ring with current wear and data written, plus a quick block of the most-watched SMART attributes.",
|
||||
"overviewIntro": "The default landing tab — everything you need to answer \"is this disk OK?\" without running a test. Three blocks:",
|
||||
"overviewItems": [
|
||||
"<strong>Identity</strong> — model, serial, capacity, Health badge (Healthy / Warning / Critical).",
|
||||
"<strong>Wear & Lifetime</strong> — large life-remaining ring (97 %, 50 %, …) with the source attribute spelled out (<em>Media Wearout Indicator</em>, <em>Percentage Used</em>, …), a wear bar (current consumption %), an <em>Est. Life</em> projection in years and the total Data Written. NVMe drives also show <em>Available Spare</em>.",
|
||||
"<strong>SMART Attributes</strong> — eight headline fields on a 2-column grid: Temperature, Power On Hours (with humanised duration like <em>3y 116d</em>), Rotation Rate (or <em>SSD</em>), Power Cycles, SMART Status, Reallocated Sectors, Pending Sectors, CRC Errors. The full attribute table lives in the SMART tab."
|
||||
],
|
||||
"smartTitle": "Tab 2 — SMART",
|
||||
"smartImageAlt": "Disk drill-in modal — SMART tab with Run SMART Test buttons (Short / Extended), last-test result and the full SMART attribute table",
|
||||
"smartImageCaption": "SMART tab — run a Short or Extended test, see the last-test outcome, scroll the full SMART attribute table, and generate the full PDF health report.",
|
||||
"smartIntro": "Where the actions live. Three sections:",
|
||||
"smartItems": [
|
||||
"<strong>Run SMART Test</strong> — two buttons. <em>Short Test (~2 min)</em> runs synchronously and shows the result inline. <em>Extended Test (background)</em> can take hours on big drives, runs server-side and fires a notification when it completes.",
|
||||
"<strong>Last Test</strong> — type, status badge (<em>passed</em> / <em>failed</em>) and timestamp of the most recent run.",
|
||||
"<strong>SMART Attributes</strong> — the full attribute table (ID / name / value / worst / status with OK / warning / critical icons). For SATA / SAS, the classical numbered list. For NVMe, the structured fields from <code>nvme smart-log</code> (temperature, available spare, percentage used, data units written / read, host reads / writes, controller busy time, power cycles, unsafe shutdowns, media errors, error-log entries, warning / critical composite temperature time)."
|
||||
],
|
||||
"pdfTitle": "View Full SMART Report (PDF)",
|
||||
"pdfIntro": "At the bottom of the SMART tab, the <strong>View Full SMART Report</strong> button generates a printable, archive-ready PDF — the same structured report you'd send to a vendor for an RMA.",
|
||||
"pdfPreviewAlt": "First page of the generated SMART Health Report PDF — Executive Summary with the PASSED ring + Disk Information block",
|
||||
"pdfPreviewCaption": "First page of the SMART Health Report — Executive Summary with the PASSED ring and the full Disk Information block. The full PDF below has the SSD wear ring, every SMART attribute and the test history.",
|
||||
"pdfDownloadLabel": "Download sample SMART report (PDF)",
|
||||
"pdfSectionsIntro": "The report has five top-level sections:",
|
||||
"pdfSections": [
|
||||
"<strong>Executive Summary</strong> — large PASSED / FAILED verdict, plain-language disk health assessment paragraph (\"your disk is healthy / showing signs of wear / failing\"), and four quick stats (report timestamp, last-test type, test result, attributes checked).",
|
||||
"<strong>Disk Information</strong> — model, serial, capacity, type (HDD / SSD / NVMe), family, form factor, interface (SATA 3.3 · 6.0 Gb/s, …), TRIM support, current temperature with the optimal threshold, power-on time, power cycles, SMART status, plus the headline counters (pending sectors, CRC errors, reallocated sectors).",
|
||||
"<strong>SSD Wear & Lifetime</strong> (SSD / NVMe only) — life-remaining ring, source attribute, current wear level, data written, power-on hours.",
|
||||
"<strong>SMART Attributes (full)</strong> — every attribute the drive reports, with ID, name, value, worst, threshold, raw value and a status pill. The most user-relevant ones (Reallocated Sector Ct, Power On Hours, Reported Uncorrect, UDMA CRC Error Count, Media Wearout Indicator, …) include a one-line plain-language explanation under the row.",
|
||||
"<strong>Last Self-Test Result + Full Self-Test History</strong> — the latest test (type, result, completion message, at which power-on-hours mark) plus a numbered table of every retained test.",
|
||||
"<strong>Recommendations</strong> — action items based on the verdict: <em>Disk is Healthy / Schedule periodic tests / Backup strategy</em> for healthy drives, escalating language with replacement guidance when attributes are out of range."
|
||||
],
|
||||
"pdfOutro": "The PDF is produced server-side and downloaded with a stable filename pattern (<code>SMART-<short-id>.pdf</code>) so multiple snapshots over time can sit side-by-side in your archive. Useful when you're tracking degradation across months or sending evidence to vendor support.",
|
||||
"historyTitle": "Tab 3 — History",
|
||||
"historyImageAlt": "Disk drill-in modal — History tab listing past SMART tests with download and delete actions",
|
||||
"historyImageCaption": "History tab — every retained SMART test for this disk. Per row: type, timestamp, \"X days ago\" tag, latest marker, download (raw <code>smartctl</code> output) and delete actions.",
|
||||
"historyIntro": "The retained pool of SMART tests for this disk — both short and extended runs that completed. Each entry is the raw <code>smartctl</code> output captured at run time, plus the structured fields the Monitor parsed out for the dashboard. Per-row actions:",
|
||||
"historyItems": [
|
||||
"<strong>Download</strong> — saves the raw <code>smartctl -a</code> output as a text file. Identical to what the PDF report parses, useful when you need the exact line a vendor asks for.",
|
||||
"<strong>Delete</strong> — removes the test from history. The retention limit set in the Schedule tab (<em>Last 5 / 10 / 20</em>) deletes oldest-first automatically; this action is the manual override."
|
||||
],
|
||||
"scheduleTitle": "Tab 4 — Schedule",
|
||||
"scheduleImageAlt": "Disk drill-in modal — Schedule tab with the toggle for Automatic SMART Tests, the configured-schedules list and the Add Schedule button",
|
||||
"scheduleImageCaption": "Schedule tab — pick test type, frequency and retention; the Monitor wires it into <code>cron</code> so tests run unattended.",
|
||||
"scheduleIntro": "Cron-driven automatic SMART tests, no shell needed. The page has three areas:",
|
||||
"scheduleItems": [
|
||||
"<strong>Automatic SMART Tests toggle</strong> — global on/off switch for every schedule on this disk. Useful when you want to pause everything during maintenance without losing the schedule definitions.",
|
||||
"<strong>Configured Schedules</strong> — one row per existing schedule with the test type badge (<em>short</em> / <em>long</em>), the cron expression in human form (<em>\"Day 1 of month at 03:00\"</em>, <em>\"Every Sunday at 02:00\"</em>), the disks it covers and the retention setting.",
|
||||
"<strong>Add Schedule / Edit Schedule</strong> — form with: Test Type (<em>Short ~2 min</em> / <em>Long 1-4 h</em>), Frequency (<em>Daily / Weekly / Monthly</em>), Day of Month / Day of Week, Time, Keep Results (<em>Last 5 / 10 / 20</em>)."
|
||||
],
|
||||
"scheduleOutro": "The schedule is materialised as a cron entry on the host that calls back into the Monitor; results are saved to the same SMART history shown in Tab 3, and the retention setting auto-prunes the oldest test when a new one finishes.",
|
||||
"tempTitle": "Temperature history modal",
|
||||
"tempIntro": "Every disk that exposes a temperature sensor has its readings sampled continuously by the Monitor and persisted to a local time-series. The current value appears as one of the six headline SMART attributes in the Overview tab; clicking that block opens a dedicated temperature-history modal with the full picture.",
|
||||
"tempImageAlt": "Disk temperature history modal — header with the disk path and model, a timeframe selector (1 Hour / 24 Hours / 7 Days / 30 Days), a row of four stat cards (Current / Min / Avg / Max), and a line chart of the temperature over the selected range coloured by the per-disk-type thresholds",
|
||||
"tempImageCaption": "Temperature detail — opens from the Overview tab on any disk whose sensor returns a non-zero reading. The chart is coloured against the disk-type threshold (HDD / SSD / NVMe / SAS).",
|
||||
"tempShowsTitle": "What the modal shows",
|
||||
"tempShowsItems": [
|
||||
"<strong>Timeframe selector</strong> with four ranges: <em>1 Hour</em>, <em>24 Hours</em> (default), <em>7 Days</em>, <em>30 Days</em>. Each one queries the same backend with a different downsampling so the chart stays readable at every horizon.",
|
||||
"<strong>Four stat cards</strong> at the top of the modal: <em>Current</em>, <em>Min</em>, <em>Avg</em>, <em>Max</em> for the selected range. The <em>Current</em> card is coloured by the same status thresholds the Storage tab and the notifications use, so you can see at a glance whether the disk is in normal / warm / hot territory.",
|
||||
"<strong>Line chart</strong> of the temperature over time, with the line and shaded area coloured by disk type:"
|
||||
],
|
||||
"tempDiskTypes": [
|
||||
"HDD — typically cooler thresholds.",
|
||||
"SSD — moderate thresholds.",
|
||||
"NVMe — higher thresholds (NVMe runs hotter by design).",
|
||||
"SAS — same defaults as HDD."
|
||||
],
|
||||
"tempConfigurable": "All four are configurable from <em>Settings → Health Monitor Thresholds</em>.",
|
||||
"tempWhyTitle": "Why a history matters here",
|
||||
"tempWhyItems": [
|
||||
"<strong>Drift detection.</strong> Disks that progressively heat up over weeks (failing fan, dust build-up, neighbour disk dying and pushing hot air across) are invisible to a single \"current temperature\" readout. The 7-day and 30-day views surface the drift.",
|
||||
"<strong>Spike correlation.</strong> When a backup window or a rebuild pushed the disk briefly over its threshold, the 1-hour and 24-hour ranges show whether it was a one-off or a recurring pattern.",
|
||||
"<strong>Threshold tuning.</strong> Before raising or lowering a threshold in <em>Settings → Health Monitor Thresholds</em>, the 30-day chart shows the disk's actual operating range so the new value lines up with what the hardware really does rather than a guess."
|
||||
],
|
||||
"obsTitle": "Observation history (across tabs)",
|
||||
"obsIntro": "Modern disks fail gradually. A disk can report SMART <strong>PASSED</strong> and still log occasional read errors in dmesg, drop SATA links, or expose pending sectors that come and go. The standard Proxmox UI shows you the current SMART verdict — it does not keep a history of those <em>signals</em>. ProxMenux does, and surfaces them right inside the disk modal.",
|
||||
"obsImageAlt": "Disk Details modal Overview tab showing a healthy disk with SMART status Passed, 0 reallocated/pending/CRC errors, and an Observations section listing one recorded I/O Error event with the raw kernel message, a human translation of the ATA error code, first and last occurrence timestamps and an occurrence count",
|
||||
"obsImageCaption": "A disk that <strong>SMART says is fine</strong> can still have an observation history. The card is the historical signal layer underneath the SMART verdict.",
|
||||
"obsWhatTitle": "What an observation is",
|
||||
"obsWhatIntro": "Anything ProxMenux catches in the kernel log, dmesg or SMART output that looks like a disk-level event — and that on its own would be too granular for a notification — is recorded as an <strong>observation</strong>. Each row shows:",
|
||||
"obsWhatItems": [
|
||||
"<strong>Type badge</strong> (I/O Error, SMART Error, Filesystem Error, ZFS Pool Error, Connection Error).",
|
||||
"<strong>Raw kernel message</strong> as it appeared in dmesg — useful when copy-pasting into a search engine or a support ticket.",
|
||||
"<strong>A human one-liner</strong> under the raw message for known ATA codes (<code>IDNF</code> → \"Sector address not found — possible bad sector or cable issue\", <code>UNC</code> → \"Uncorrectable read error — bad sector\", and the rest of the standard codes).",
|
||||
"<strong>First and last occurrence timestamps</strong>, plus an <strong>occurrence count</strong> deduplicated by error signature."
|
||||
],
|
||||
"obsWhyTitle": "Why ProxMenux records and shows them",
|
||||
"obsWhyItems": [
|
||||
"<strong>Disk failure is rarely a single event.</strong> It usually starts with sporadic ATA bus errors, the odd UNC sector, or a couple of medium errors weeks before SMART flips to <em>FAILED</em>. Without persistence those early warnings disappear from dmesg on the next boot.",
|
||||
"<strong>SMART can lie.</strong> A drive can show all attributes green and still be on the way out — the observation layer catches the symptoms SMART doesn't expose (especially ICRC, IDNF, link resets at lower SATA speeds).",
|
||||
"<strong>It separates \"is happening now\" from \"happened recently\".</strong> The Health Monitor auto-resolves transient errors as soon as they stop firing, which is great for keeping the active alert list clean — but you still want to see, days later, that this disk had three I/O errors that night. The observation table is the answer.",
|
||||
"<strong>It feeds the tiered notification model.</strong> The disk_io detector reads observation rate from this table to decide silent / WARNING / CRITICAL (the sliding 24h window introduced in 1.2.1.2). The history is what makes that classification possible."
|
||||
],
|
||||
"obsDedupTitle": "How dedup and re-notification work",
|
||||
"obsDedupBody1": "Observations are deduplicated by their <strong>signature</strong> — a stable fingerprint of the error type, device and key fields of the kernel line. The same event repeating bumps the <code>occurrence_count</code> on the existing row rather than creating a new one. A <strong>different signature</strong> on the same disk creates a new observation and is treated as a new event for notification purposes.",
|
||||
"obsDedupBody2": "Notifications follow an anti-cascade rule: the first occurrence of a given (disk, signature, severity) combination pages the operator, and ProxMenux then waits 24 hours before pinging again about the same combination — even if the count keeps climbing. Escalating severity (WARNING → CRITICAL) breaks the cooldown so the operator is told when things get worse, not just when they happen.",
|
||||
"obsDismissTitle": "Dismissing vs resolving",
|
||||
"obsDismissBody1": "Each row has a <strong>dismiss</strong> action. Dismissing an observation tells ProxMenux \"I've seen this, stop notifying me about it\". It does <strong>not</strong> freeze the occurrence counter — if the same fault keeps happening the count keeps climbing in the background, ready to alert again if it ever escalates to a different severity tier or signature. A dismissed observation stays visible on the card with a muted style, so a future operator can still see \"this disk had history here\".",
|
||||
"obsDismissBody2": "Resolving on the active-error side (Health Monitor) is independent of observation dismiss — the observation persists past the active error's auto-resolve. That's the whole point: it survives, so a transient warning from last week is still visible on the disk card today. See <link>Health Monitor</link> for the active-error side of the same picture."
|
||||
},
|
||||
"dataCollected": {
|
||||
"heading": "How the data is collected",
|
||||
"headerSection": "Section of the tab",
|
||||
"headerEndpoint": "Endpoint",
|
||||
"headerSource": "Source",
|
||||
"rows": [
|
||||
{
|
||||
"section": "Top summary cards",
|
||||
"endpoint": "/api/storage/summary",
|
||||
"source": "Aggregated from <code>lsblk</code>, <code>zpool list</code>, <code>vgs</code> / <code>lvs</code>."
|
||||
},
|
||||
{
|
||||
"section": "Per-disk inventory",
|
||||
"endpoint": "/api/storage",
|
||||
"source": "<code>lsblk -O</code> + <code>smartctl -i</code> per device, with stable disk identity cache (cleared on hot-plug events)."
|
||||
},
|
||||
{
|
||||
"section": "Proxmox storages",
|
||||
"endpoint": "/api/proxmox-storage",
|
||||
"source": "<code>pvesh get /nodes/<node>/storage</code> with the active/online state of each."
|
||||
},
|
||||
{
|
||||
"section": "SMART current values",
|
||||
"endpoint": "/api/storage/smart/<disk>",
|
||||
"source": "<code>smartctl -A <dev></code> — refreshed on demand, not cached."
|
||||
},
|
||||
{
|
||||
"section": "SMART self-test history",
|
||||
"endpoint": "/api/storage/smart/<disk>/history",
|
||||
"source": "Stored under <code>/var/lib/proxmenux-monitor/smart/<disk>/</code> as JSON snapshots."
|
||||
},
|
||||
{
|
||||
"section": "Permanent observations",
|
||||
"endpoint": "/api/storage/observations",
|
||||
"source": "SQLite table fed by the Health Monitor every cycle (kept across auto-resolve)."
|
||||
}
|
||||
],
|
||||
"outro": "Verifying the collection chain on the host:",
|
||||
"codeComment1": "# Pull the current snapshot from a script",
|
||||
"codeComment2": "# Cross-check what the dashboard sees against the raw OS view"
|
||||
},
|
||||
"whereNext": {
|
||||
"heading": "Where to next",
|
||||
"items": [
|
||||
{
|
||||
"label": "Health Monitor",
|
||||
"href": "/docs/monitor/health-monitor",
|
||||
"tail": " — the disks-and-I/O category and the suppression model."
|
||||
},
|
||||
{
|
||||
"label": "API Reference",
|
||||
"href": "/docs/monitor/api",
|
||||
"tail": " — the storage and SMART endpoints."
|
||||
},
|
||||
{
|
||||
"label": "Notifications",
|
||||
"href": "/docs/monitor/notifications",
|
||||
"tailRich": " — what <code>disk_io_error</code>, <code>storage_unavailable</code> and <code>smart_test_failed</code> trigger downstream."
|
||||
},
|
||||
{
|
||||
"label": "Dashboard index",
|
||||
"href": "/docs/monitor/dashboard",
|
||||
"tail": " — the other tabs."
|
||||
},
|
||||
{
|
||||
"label": "ProxMenux → Disk Manager",
|
||||
"href": "/docs/disk-manager",
|
||||
"tail": " — the actions side: format / wipe / SMART tests / import disks into VMs and CTs from the TUI."
|
||||
},
|
||||
{
|
||||
"label": "ProxMenux → SMART Disk Health & Test",
|
||||
"href": "/docs/disk-manager/smart-disk-test",
|
||||
"tail": " — the CLI counterpart of this tab: schedule SMART tests, export the JSON the dashboard renders, and the deeper test-type / interpretation reference."
|
||||
}
|
||||
]
|
||||
}
|
||||
}
|
||||
@@ -0,0 +1,120 @@
|
||||
{
|
||||
"meta": {
|
||||
"title": "ProxMenux Monitor — Dashboard: System Logs tab | ProxMenux Documentation",
|
||||
"description": "The System Logs tab consolidates three sources into one screen: live journalctl with filters and download, Proxmox task history (UPIDs), and notification log — all searchable, filterable by severity / time range, and downloadable as text bundles."
|
||||
},
|
||||
"header": {
|
||||
"title": "Dashboard: System Logs tab",
|
||||
"description": "Three sub-tabs under one roof: the system journal (journalctl with filters), Proxmox task history, and the notification log. All three are searchable, filterable and downloadable as text bundles.",
|
||||
"section": "ProxMenux Monitor · Dashboard"
|
||||
},
|
||||
"readOnly": {
|
||||
"title": "Read-only by design",
|
||||
"body": "Nothing on this tab modifies log files. Filters live in the URL / state, downloads are server-side bundles. The dashboard never deletes log entries — for housekeeping use the host's own <code>journalctl --vacuum-time=<N></code> or <code>logrotate</code>."
|
||||
},
|
||||
"topRow": {
|
||||
"heading": "Top row: four counters",
|
||||
"items": [
|
||||
"<strong>Total Entries</strong> — number of log records inside the active filter window.",
|
||||
"<strong>Errors</strong> — count of severity ≤ 3 (<code>err</code> / <code>crit</code> / <code>alert</code> / <code>emerg</code>).",
|
||||
"<strong>Warnings</strong> — count of severity 4 (<code>warning</code>).",
|
||||
"<strong>Backups</strong> — count of vzdump / PBS task entries in the same window."
|
||||
]
|
||||
},
|
||||
"subtabs": {
|
||||
"heading": "Three sub-tabs",
|
||||
"logsTitle": "Logs",
|
||||
"logsIntro": "The system journal, served by <code>journalctl</code> on the backend. Filters available in the toolbar:",
|
||||
"logsFilters": [
|
||||
"<strong>Severity</strong> — emerg / alert / crit / err / warning / notice / info / debug, or any combination.",
|
||||
"<strong>Time range</strong> — last 5 min / 15 min / 1 h / 6 h / 24 h / 7 d / custom.",
|
||||
"<strong>Free-text search</strong> — substring or regex (<code>journalctl --grep</code>).",
|
||||
"<strong>Unit filter</strong> — restrict to a specific systemd unit (<code>pveproxy.service</code>, <code>nginx.service</code>, …)."
|
||||
],
|
||||
"logsRowsAfter": "Each row shows timestamp, severity badge, source unit and the message. Long messages collapse with a \"show more\" toggle. The <strong>Download</strong> action bundles the current filter into a single <code>.txt</code> file via <code>GET /api/logs/download</code> — useful when you want to share a slice of journal with someone.",
|
||||
"logDetailsModalTitle": "Log Details modal",
|
||||
"logDetailsBody": "Clicking any row opens a <strong>Log Details</strong> modal with every structured field journald captured for that single entry — the same view you'd build by hand running <code>journalctl --output=verbose</code> on the host.",
|
||||
"logDetailsImageAlt": "Log Details modal — single journal entry expanded with Level, Service, Timestamp, Source, Systemd Unit, Process ID, Hostname and the full Message",
|
||||
"logDetailsImageCaption": "Log Details modal — every structured field journald carries for this entry, with the full untruncated message at the bottom. Useful for cron and service logs where the executed command line matters.",
|
||||
"fieldsIntro": "Fields shown:",
|
||||
"fields": [
|
||||
"<strong>Level</strong> — coloured severity badge (INFO / WARNING / ERROR / CRITICAL).",
|
||||
"<strong>Service</strong> — short name of the unit / process that emitted the entry.",
|
||||
"<strong>Timestamp</strong> — full date and time of the log line.",
|
||||
"<strong>Source</strong> — origin of the entry (journal, kernel, audit, …).",
|
||||
"<strong>Systemd Unit</strong> — the actual <code>.service</code> / <code>.timer</code> / <code>.socket</code> unit if the entry was associated with one.",
|
||||
"<strong>Process ID</strong> — PID of the emitting process.",
|
||||
"<strong>Hostname</strong> — useful when journals are forwarded across cluster nodes.",
|
||||
"<strong>Message</strong> — the full untruncated message in a monospace block, ready to copy."
|
||||
],
|
||||
"maxLevelStoreTitle": "Journald MaxLevelStore",
|
||||
"maxLevelStoreBody": "On a fresh Proxmox install, journald defaults to <code>MaxLevelStore=warning</code>, which silently drops info-level messages. The Monitor detects this on startup and adds a drop-in (<code>/etc/systemd/journald.conf.d/proxmenux-loglevel.conf</code>) raising the threshold to <code>info</code> so the Logs tab actually has something to show across all severities.",
|
||||
"backupsTitle": "Backups",
|
||||
"backupsBody": "Proxmox task history filtered to backup-related entries. One row per task (<code>vzdump</code>, PBS transfers, Garbage Collect, Verify) with the status (OK / WARNINGS / ERROR), guest involved, source storage, duration and the UPID. Click a row to load the full task log via <code>GET /api/task-log/<upid></code> — the same data Proxmox exposes through <em>Datacenter → Tasks</em>, scoped to backups.",
|
||||
"notificationsTitle": "Notifications",
|
||||
"notificationsBody1": "Every notification dispatched by the Monitor — Telegram, Discord, Email, Gotify, ntfy, Slack, Teams, webhook. Each row: timestamp, channel, event type, severity, the rendered title, the rendered body, and (if AI is enabled) a toggle to view the AI rewrite next to the original.",
|
||||
"notificationsBody2": "Use this tab to verify a channel is actually delivering and to compare what the AI rewrite produced vs the template baseline. Channel configuration lives in the <link>Notifications</link> deep page."
|
||||
},
|
||||
"dataCollected": {
|
||||
"heading": "How the data is collected",
|
||||
"headerSubtab": "Sub-tab",
|
||||
"headerEndpoint": "Endpoint",
|
||||
"headerSource": "Source",
|
||||
"rows": [
|
||||
{
|
||||
"subtab": "Logs (live filter)",
|
||||
"endpoint": "/api/logs",
|
||||
"source": "<code>journalctl --output json --since <range></code> with severity / unit / search filters applied server-side."
|
||||
},
|
||||
{
|
||||
"subtab": "Download",
|
||||
"endpoint": "/api/logs/download",
|
||||
"source": "Same query, returned as plain text for grep / less."
|
||||
},
|
||||
{
|
||||
"subtab": "Backups",
|
||||
"endpoint": "/api/backups",
|
||||
"source": "PVE task history filtered by <code>vzdump</code>, PBS transfers, Garbage Collect, Verify."
|
||||
},
|
||||
{
|
||||
"subtab": "Backup task drill-in",
|
||||
"endpoint": "/api/task-log/<upid>",
|
||||
"source": "Plain-text full task log read from <code>/var/log/pve/tasks/<index>/<upid></code>."
|
||||
},
|
||||
{
|
||||
"subtab": "Notifications history",
|
||||
"endpoint": "/api/notifications/history",
|
||||
"source": "SQLite <code>notification_history</code> table fed by the dispatch loop."
|
||||
}
|
||||
],
|
||||
"apiIntro": "Both the live filter and the downloads are also reachable via the API:",
|
||||
"codeComment1": "# Last hour of errors and worse, with a keyword",
|
||||
"codeComment2": "# Download the full journal of the last 6 hours as plain text",
|
||||
"codeComment3": "# Look up the full output of a specific task by UPID"
|
||||
},
|
||||
"whereNext": {
|
||||
"heading": "Where to next",
|
||||
"items": [
|
||||
{
|
||||
"label": "Health Monitor",
|
||||
"href": "/docs/monitor/health-monitor",
|
||||
"tail": " — the System Logs category that watches for persistent / spike / cascade patterns."
|
||||
},
|
||||
{
|
||||
"label": "Notifications",
|
||||
"href": "/docs/monitor/notifications",
|
||||
"tail": " — the journal watcher reads the same source and turns matches into notifications."
|
||||
},
|
||||
{
|
||||
"label": "API Reference",
|
||||
"href": "/docs/monitor/api",
|
||||
"tail": " — the logs and task-log endpoints with their query parameters."
|
||||
},
|
||||
{
|
||||
"label": "Dashboard index",
|
||||
"href": "/docs/monitor/dashboard",
|
||||
"tail": " — the other tabs."
|
||||
}
|
||||
]
|
||||
}
|
||||
}
|
||||
@@ -0,0 +1,153 @@
|
||||
{
|
||||
"meta": {
|
||||
"title": "ProxMenux Monitor — Dashboard: System Overview tab | ProxMenux Documentation",
|
||||
"description": "The default landing tab of ProxMenux Monitor: four metric cards (CPU, Memory, Active VMs & LXCs, Temperature) with live updates and sparkline, the historical metrics chart, and condensed Storage and Network panels with click-through to their dedicated tabs."
|
||||
},
|
||||
"header": {
|
||||
"title": "Dashboard: System Overview tab",
|
||||
"description": "The first tab the dashboard opens on. Four live metric cards across the top, the historical-metrics chart in the middle, and condensed storage / network panels at the bottom — all derived from the same APIs that drive the dedicated tabs.",
|
||||
"section": "ProxMenux Monitor · Dashboard"
|
||||
},
|
||||
"readOnly": {
|
||||
"title": "A read-only snapshot",
|
||||
"body": "Nothing on this tab is a control surface — every panel is informational. Actions live in the dedicated tabs they link to: drill into Storage to manage disks, into VMs & LXCs to start / stop guests, into the Security tab to configure auth, and so on."
|
||||
},
|
||||
"captureAlt": "System Overview tab — four metric cards (CPU, Memory, Active VMs, Temperature), node metrics chart, and Storage / Network summary cards",
|
||||
"captureCaption": "The System Overview tab — what the dashboard opens on. The four cards are live, the chart below is historical, and the two cards at the bottom summarise Storage and Network.",
|
||||
"topRow": {
|
||||
"heading": "Top row: live metric cards",
|
||||
"intro": "Four cards in a 2×2 grid on mobile, single row on desktop. Each updates from <code>/api/system</code> every few seconds.",
|
||||
"headerCard": "Card",
|
||||
"headerWhat": "What it shows",
|
||||
"headerSource": "Source",
|
||||
"rows": [
|
||||
{
|
||||
"card": "CPU Usage",
|
||||
"what": "Current percentage with a progress bar. Updates ~1 s via the vital-signs sampler.",
|
||||
"source": "psutil.cpu_percent()"
|
||||
},
|
||||
{
|
||||
"card": "Memory Usage",
|
||||
"what": "Used GB, percentage, total GB. Progress bar tracks the percentage.",
|
||||
"source": "psutil.virtual_memory()"
|
||||
},
|
||||
{
|
||||
"card": "Active VM & LXC",
|
||||
"what": "Count of currently running guests, with a Running / Stopped breakdown badge and a footer line for total VMs and LXCs.",
|
||||
"source": "/api/vms (consolidated)"
|
||||
},
|
||||
{
|
||||
"card": "Temperature",
|
||||
"what": "CPU temperature in °C with status badge (cool / warm / hot) and a 5-minute sparkline behind it. Shows <em>N/A</em> when no sensor is detected. Click to open the temperature detail modal.",
|
||||
"source": "sensors / coretemp"
|
||||
}
|
||||
],
|
||||
"thresholdsTitle": "Status colours and thresholds applied here",
|
||||
"thresholdsIntro": "Every ring, bar, and sparkline on the four metric cards follows the same classification — <green/> <strong>green</strong> below Warning, <amber/> <strong>amber</strong> from Warning to Critical, <red/> <strong>red</strong> at Critical and above. Recommended defaults shipped with ProxMenux:",
|
||||
"thresholdsItems": [
|
||||
"<strong>CPU usage</strong> — Warning 85 %, Critical 95 %.",
|
||||
"<strong>Memory</strong> — Warning 85 %, Critical 95 % (Swap also fires Critical at 5 % used — a healthy Proxmox host should rarely touch swap).",
|
||||
"<strong>CPU temperature</strong> — Warning 80 °C, Critical 90 °C."
|
||||
],
|
||||
"thresholdsOutro": "Every value is configurable per host — <link>Settings → Health Monitor Thresholds</link> is the single source of truth and explains how to tune them.",
|
||||
"sparklineTitle": "The sparkline is meaningful",
|
||||
"sparklineBody": "The temperature card draws a 5-minute trace under the value, with the line and gradient colour following the same Warning/Critical pair documented above. It's the fastest way to see whether the host is in a thermal climb without opening the detail modal."
|
||||
},
|
||||
"middle": {
|
||||
"heading": "Middle: node metrics charts",
|
||||
"body1": "Below the top row sits the <code>NodeMetricsCharts</code> component — historical CPU, memory and disk-I/O graphs sourced from Proxmox's own RRD store via <code>/api/node/metrics</code>. A timeframe selector switches between <em>1 hour / 24 hours / 7 days / 30 days / 1 year</em>; data resolution drops as the window grows so the chart stays smooth.",
|
||||
"body2": "These are the same graphs that the Proxmox web UI renders for a node, just consolidated into the Monitor's dark theme and aligned with the other panels."
|
||||
},
|
||||
"bottom": {
|
||||
"heading": "Bottom row: Storage & Network summaries",
|
||||
"storageTitle": "Storage Overview card",
|
||||
"storageIntro": "A condensed view of the host's storage state, broken into three blocks:",
|
||||
"storageItems": [
|
||||
"<strong>Total Node Capacity</strong> — sum of all VM/LXC storages plus the local system storage, with a gradient progress bar of the total used / free split.",
|
||||
"<strong>Total Capacity / Physical Disks</strong> — raw capacity headline and the count of physical disks discovered.",
|
||||
"<strong>VM/LXC Storage</strong> — used / free / percentage for the storages where guests live, plus a counter when more than one is configured.",
|
||||
"<strong>Local Storage (System)</strong> — the host's own root / system mount, separately from the guest pool."
|
||||
],
|
||||
"storageDrillIn": "Drill-in lives in the <link>Storage tab</link> — per-disk SMART, ZFS pool details, observation history, etc.",
|
||||
"networkTitle": "Network Overview card",
|
||||
"networkBody1": "Top line shows the count of active interfaces (physical + bridges combined). Below that, two rows of coloured badges for the interfaces that are <code>up</code> — physical NICs in blue, bridges in a secondary colour. A timeframe selector at the top right (1 hour / 24 hours / 7 days / 30 days / 1 year) controls a small RX / TX traffic chart.",
|
||||
"networkBody2": "Per-interface drill-in (IP/MAC, RRD chart, bridge members, bond mode, etc.) lives in the <link>Network tab</link>."
|
||||
},
|
||||
"refresh": {
|
||||
"heading": "Refresh model",
|
||||
"intro": "Each panel manages its own loading state (<code>loadingStates.cpu</code>, <code>loadingStates.storage</code>, …) so a slow source doesn't block the rest. While a panel is fetching, it shows a pulse-animated skeleton; failed fetches degrade gracefully — for example, a missing temperature sensor renders the card as <em>N/A</em> instead of an error.",
|
||||
"items": [
|
||||
"<strong>Top metric cards</strong> — refresh every ~5 s. The CPU and temperature panels also receive a 1 s push from the vital-signs sampler.",
|
||||
"<strong>Node metrics chart</strong> — refresh every 30 s, or on timeframe change.",
|
||||
"<strong>Storage card</strong> — refresh every 60 s. SMART data is cached longer (the Storage tab triggers a fresh read on demand).",
|
||||
"<strong>Network card</strong> — refresh every 5 s on the active timeframe.",
|
||||
"<strong>Manual refresh</strong> — the Refresh button in the header forces all panels to re-fetch immediately."
|
||||
]
|
||||
},
|
||||
"dataCollected": {
|
||||
"heading": "How the data is collected",
|
||||
"headerCard": "Card",
|
||||
"headerEndpoint": "Endpoint",
|
||||
"headerSource": "Source",
|
||||
"rows": [
|
||||
{
|
||||
"card": "Header status pill",
|
||||
"endpoint": "/api/health",
|
||||
"source": "The cached overall status produced by the Health Monitor each cycle."
|
||||
},
|
||||
{
|
||||
"card": "CPU / RAM / Swap / Uptime",
|
||||
"endpoint": "/api/system",
|
||||
"source": "<code>/proc/stat</code>, <code>/proc/meminfo</code>, <code>/proc/uptime</code> with short-window CPU sampling."
|
||||
},
|
||||
{
|
||||
"card": "Host info (kernel, BIOS, distro)",
|
||||
"endpoint": "/api/info",
|
||||
"source": "<code>uname -a</code>, <code>dmidecode</code>, PVE version. Cached per process."
|
||||
},
|
||||
{
|
||||
"card": "Storage / network / VMs cards",
|
||||
"endpoint": "/api/storage/summary, /api/network/summary, /api/vms",
|
||||
"source": "See the dedicated tabs for each. The header cards show a compacted view from the same endpoints."
|
||||
},
|
||||
{
|
||||
"card": "Refresh cadence",
|
||||
"endpoint": "—",
|
||||
"source": "CPU / network 5 s; storage / VMs 30 s; static info every 5 min. The Refresh button in the header forces an immediate re-fetch on every panel."
|
||||
}
|
||||
],
|
||||
"codeComment1": "# Single call that backs the header pill",
|
||||
"codeComment2": "# public, no token",
|
||||
"codeComment3": "# Authenticated snapshot used by the cards"
|
||||
},
|
||||
"whereNext": {
|
||||
"heading": "Where to next",
|
||||
"items": [
|
||||
{
|
||||
"label": "Health Monitor",
|
||||
"href": "/docs/monitor/health-monitor",
|
||||
"tail": " — the modal behind the header status pill (ten categories, dismissals, suppression)."
|
||||
},
|
||||
{
|
||||
"label": "API Reference",
|
||||
"href": "/docs/monitor/api",
|
||||
"tail": " — the system, info and health endpoints."
|
||||
},
|
||||
{
|
||||
"label": "Notifications",
|
||||
"href": "/docs/monitor/notifications",
|
||||
"tail": " — how the same statuses turn into Telegram / Discord / Email messages."
|
||||
},
|
||||
{
|
||||
"label": "Dashboard index",
|
||||
"href": "/docs/monitor/dashboard",
|
||||
"tail": " — the other eight tabs at a glance."
|
||||
},
|
||||
{
|
||||
"label": "Architecture",
|
||||
"href": "/docs/monitor/architecture",
|
||||
"tail": " — the background threads and APIs that power this view."
|
||||
}
|
||||
]
|
||||
}
|
||||
}
|
||||
@@ -0,0 +1,169 @@
|
||||
{
|
||||
"meta": {
|
||||
"title": "ProxMenux Monitor — Dashboard: Terminal tab | ProxMenux Documentation",
|
||||
"description": "Browser-based shell to the Proxmox host: up to 4 terminals at once with grid view, mobile-friendly keyboard helpers (ESC, TAB, arrows, Ctrl combos), an integrated commands cheatsheet powered by cheat.sh, JWT-protected."
|
||||
},
|
||||
"header": {
|
||||
"title": "Dashboard: Terminal tab",
|
||||
"description": "A real shell session in the browser, on the Proxmox host. Up to four terminals at once, mobile-friendly keyboard helpers, an integrated commands cheatsheet — all in the same theme as the rest of the dashboard.",
|
||||
"section": "ProxMenux Monitor · Dashboard"
|
||||
},
|
||||
"intro": {
|
||||
"title": "A real PTY in the browser",
|
||||
"body": "The terminal allocates a server-side PTY through <code>flask_terminal_routes</code>, pipes it over a WebSocket to <code>xterm.js</code> in the browser, and runs as <code>root</code> (the systemd unit's user). Anything you can do in <code>ssh root@<host></code> works here — including <code>vim</code>, <code>tmux</code>, ncurses tools and Proxmox CLIs (<code>qm</code>, <code>pct</code>, <code>pvesh</code>, <code>pvecm</code>)."
|
||||
},
|
||||
"singleAlt": "ProxMenux Monitor Terminal tab — single terminal session showing Fastfetch system summary on login",
|
||||
"singleCaption": "One host terminal open — the toolbar above shows the count (<em>1 / 4 terminals</em>), <em>+ New</em>, <em>Search</em>, <em>Clear</em> and <em>Close</em>. The mobile keyboard helpers appear under the terminal on touch devices.",
|
||||
"target": {
|
||||
"heading": "Connection target",
|
||||
"body1": "The Terminal tab opens a shell on the <strong>Proxmox host itself</strong> — the same login you would get over SSH. Each tab opens a brand-new host terminal.",
|
||||
"body2": "To reach an <strong>LXC container</strong> from the browser, use the dedicated <em>Console</em> button on every running CT card in the <link>VMs & LXCs tab</link>. It opens a modal that runs <code>pct enter <vmid></code> and reuses the same mobile-friendly toolbar described below."
|
||||
},
|
||||
"fourTerminals": {
|
||||
"heading": "Up to four terminals at once",
|
||||
"intro": "The tab lets you open up to four host terminals simultaneously. Each one gets its own PTY and its own WebSocket — they are fully independent sessions. Two layouts switch with the icons next to the \"New\" button:",
|
||||
"items": [
|
||||
"<strong>Tabs view</strong> — one terminal visible at a time, the others as named tabs at the top (<em>Terminal 1</em>, <em>Terminal 2</em>…). Best for working on one task with the rest as background.",
|
||||
"<strong>Grid view</strong> — all open terminals visible at once in a 2×2 grid. Useful for watching <code>htop</code> on one panel, <code>iftop</code> on another, and editing on a third without switching back and forth."
|
||||
],
|
||||
"outro": "The toolbar shows the current count (<em>1/4 terminals</em>, <em>4/4 terminals</em>). New tabs open with <strong>+ New</strong> and individual ones close from the small <code>×</code> on the tab header. The big red <strong>Close</strong> button at the top tears down all terminals at once."
|
||||
},
|
||||
"gridAlt": "ProxMenux Monitor Terminal tab — grid view with four host terminals running ls, network config, iftop and the ProxMenux main menu side by side",
|
||||
"gridCaption": "Grid view (4 / 4 terminals) — four independent host PTYs running in parallel: directory listing, <code>/etc/network/interfaces</code> on one side, <code>iftop</code> on another, and the ProxMenux main menu on the fourth. Switch between grid and tabs with the layout toggle in the toolbar.",
|
||||
"keyboard": {
|
||||
"heading": "Mobile-friendly keyboard helpers",
|
||||
"intro": "Phone and tablet keyboards usually don't expose ESC, TAB, the arrow keys or modifier combinations. Without them, navigating <code>vim</code>, <code>nano</code>, <code>htop</code> or any TUI menu is impossible. The Terminal tab solves that by rendering a row of touch-friendly buttons under the terminal whenever the device is small enough or touch-capable:",
|
||||
"headerButton": "Button",
|
||||
"headerSends": "Sends",
|
||||
"headerUse": "Typical use",
|
||||
"rows": [
|
||||
{
|
||||
"button": "ESC",
|
||||
"sends": "\\x1b",
|
||||
"use": "Exit insert mode in <code>vim</code>, cancel a TUI dialog, leave a search."
|
||||
},
|
||||
{
|
||||
"button": "TAB",
|
||||
"sends": "\\t",
|
||||
"use": "Path autocompletion, field navigation in dialog/whiptail."
|
||||
},
|
||||
{
|
||||
"button": "↑ ↓ ← →",
|
||||
"sends": "\\x1bO[ABCD]",
|
||||
"use": "Shell history, cursor movement, menu navigation."
|
||||
},
|
||||
{
|
||||
"button": "↵ Enter",
|
||||
"sends": "\\r",
|
||||
"use": "Confirm. Some on-screen keyboards swap Enter for Go/Done — this button is unambiguous."
|
||||
},
|
||||
{
|
||||
"button": "Ctrl ▾",
|
||||
"sends": "Dropdown",
|
||||
"useRich": true
|
||||
}
|
||||
],
|
||||
"ctrlIntro": "Three control sequences:",
|
||||
"ctrlItems": [
|
||||
"<code>Ctrl+C</code> — cancel / interrupt the running command (<code>\\x03</code>).",
|
||||
"<code>Ctrl+X</code> — exit <code>nano</code> (<code>\\x18</code>).",
|
||||
"<code>Ctrl+R</code> — reverse history search in bash (<code>\\x12</code>)."
|
||||
],
|
||||
"modalTitle": "Same toolbar in the LXC console modal",
|
||||
"modalBody": "The container console you launch from <link>VMs & LXCs → Console</link> renders the same keyboard helpers under the modal. The modal also auto-types <code>pct enter <vmid></code> on connect, so you land directly inside the container."
|
||||
},
|
||||
"lxcAlt": "ProxMenux Monitor LXC console modal — Terminal: ubuntu (ID: 103) with the same mobile-friendly toolbar (ESC, TAB, arrows, Enter, Ctrl) under the terminal",
|
||||
"lxcCaption": "The LXC console modal — opened from <em>VMs & LXCs → Console</em>. The header shows the target container (<em>Terminal: ubuntu (ID: 103)</em>) and the same touch-friendly toolbar appears under the terminal.",
|
||||
"search": {
|
||||
"heading": "Search Commands — integrated cheatsheet",
|
||||
"intro": "The blue <strong>Search</strong> button in the toolbar opens a modal with a fuzzy command lookup. Type a few letters of any Linux or Proxmox command (<code>ls</code>, <code>tar</code>, <code>qm</code>, <code>pct</code>, <code>zpool</code>, <code>systemctl</code>…) and the modal lists usage examples with one-tap <em>Send to active terminal</em>. It removes the \"wait, what flag was that\" round-trip to a separate browser tab.",
|
||||
"modalAlt": "ProxMenux Monitor Search Commands modal — fuzzy lookup for Linux and Proxmox commands powered by cheat.sh, showing several ls usage examples",
|
||||
"modalCaption": "The Search Commands modal querying <code>ls</code> — each result shows the command, its description and a small \"send\" arrow that pipes it to the active terminal. Bottom-right corner indicates the data source (<em>Powered by cheat.sh</em>).",
|
||||
"aboutLabel": "About cheat.sh:",
|
||||
"aboutBody": "is a community-curated, open-source unified cheatsheet that aggregates short, practical usage examples for hundreds of Linux commands, sysadmin tools and programming languages. Originally designed to be queried from a terminal with <code>curl cheat.sh/<command></code>, it's also reachable from any browser. ProxMenux Monitor proxies the queries server-side so the modal keeps working under the same origin as the dashboard.",
|
||||
"headerSource": "Source",
|
||||
"headerWhen": "When it's used",
|
||||
"headerWhat": "What you see",
|
||||
"onlineLabel": "(online)",
|
||||
"onlineWhen": "When the host has internet access and the cheat.sh proxy responds.",
|
||||
"onlineWhat": "Several real-world examples per command, typed with their description on top. The status dot in the modal header is <green>green</green>.",
|
||||
"fallbackLabel": "Local fallback",
|
||||
"fallbackWhen": "When cheat.sh is unreachable (offline host, restrictive firewall, cheat.sh outage).",
|
||||
"fallbackWhat": "A bundled list of common Linux + Proxmox commands. Smaller catalogue but always available. The status dot is <red>red</red>.",
|
||||
"sendingNote": "<strong>How sending works</strong>: clicking the small \"send\" arrow next to a result forwards the command text to whichever terminal is currently active (the focused tab, or the one you last clicked in grid view). The modal closes automatically so you can hit Enter immediately."
|
||||
},
|
||||
"auth": {
|
||||
"heading": "Authentication",
|
||||
"items": [
|
||||
"The WebSocket upgrade carries the JWT in the <code>Authorization</code> header. If auth is enabled and the token is missing or expired, the connection is rejected with HTTP 401 before a PTY is allocated.",
|
||||
"If the Monitor sits behind a reverse proxy, the proxy must forward WebSocket upgrades. See the <link>Access & Authentication</link> page for Nginx / Caddy / Traefik snippets."
|
||||
]
|
||||
},
|
||||
"clipboard": {
|
||||
"heading": "Clipboard, scrollback and resize",
|
||||
"items": [
|
||||
"<strong>Copy / paste</strong> — uses the browser's native clipboard. Select text with mouse / trackpad and use the OS shortcut (<code>Cmd+C</code> on macOS, <code>Ctrl+Shift+C</code> on Linux/Windows). Linux desktops also support middle-click paste.",
|
||||
"<strong>Scrollback</strong> — wheel / two-finger scroll. xterm.js keeps the last several thousand lines in memory.",
|
||||
"<strong>Resize</strong> — the terminal re-negotiates the PTY window size when you resize the dashboard pane, so <code>htop</code> and <code>vim</code> render properly.",
|
||||
"<strong>Reconnect on tab focus</strong> — if you switch apps on a phone or tablet (a common iPad behaviour), the WebSocket would normally drop. The Terminal tab detects the visibility change and reconnects automatically when you come back, with a 15-second timeout for slow VPN paths."
|
||||
]
|
||||
},
|
||||
"disconnect": {
|
||||
"heading": "Disconnect causes",
|
||||
"intro": "The most common reasons a session ends and what to do about each:",
|
||||
"headerCause": "Cause",
|
||||
"headerFix": "Fix",
|
||||
"rows": [
|
||||
{
|
||||
"cause": "Session JWT expired (24 h window).",
|
||||
"fix": "Refresh the page and log in again. The terminal isn't designed for unattended sessions, so the JWT lifetime matches the regular dashboard login."
|
||||
},
|
||||
{
|
||||
"cause": "Reverse proxy idle timeout.",
|
||||
"fix": "Bump <code>proxy_read_timeout</code> on Nginx or the equivalent on Caddy / Traefik (snippets in Access & Authentication)."
|
||||
},
|
||||
{
|
||||
"cause": "Phone or tablet sleep.",
|
||||
"fix": "When the device wakes back up the tab auto-reconnects (15 s timeout for VPN paths). If it doesn't, reload the tab."
|
||||
},
|
||||
{
|
||||
"cause": "Service restart on the host.",
|
||||
"fix": "Any restart of <code>proxmenux-monitor.service</code> drops every PTY. Open new terminals after the dashboard finishes reloading."
|
||||
}
|
||||
]
|
||||
},
|
||||
"warning": {
|
||||
"title": "The terminal is a root shell on the host",
|
||||
"body": "The terminal inherits the systemd unit's identity (<code>root</code>) and therefore has full privileges over the Proxmox host. Configure a username, password and 2FA in <authLink>Access & Authentication</authLink> before exposing the dashboard beyond your local network: anyone who reaches port 8008 without authentication would land directly in a root shell — no extra prompts, no SSH credentials. For access from outside the LAN, route the dashboard through <gatewayLink>Secure Gateway</gatewayLink> (Tailscale) or a reverse proxy with HTTPS, instead of opening the port to the public internet."
|
||||
},
|
||||
"whereNext": {
|
||||
"heading": "Where to next",
|
||||
"items": [
|
||||
{
|
||||
"label": "Access & Authentication",
|
||||
"href": "/docs/monitor/access-auth",
|
||||
"tail": " — reverse-proxy snippets including the WebSocket upgrade lines required for the terminal."
|
||||
},
|
||||
{
|
||||
"label": "Architecture",
|
||||
"href": "/docs/monitor/architecture",
|
||||
"tail": " — the WebSocket transport (HTTP via flask-sock vs HTTPS / WSS via gevent)."
|
||||
},
|
||||
{
|
||||
"label": "API Reference",
|
||||
"href": "/docs/monitor/api",
|
||||
"tail": " — the /ws/terminal and /ws/script/<sid> WebSocket endpoints alongside the rest of the API."
|
||||
},
|
||||
{
|
||||
"label": "Integrations → Secure Gateway",
|
||||
"href": "/docs/monitor/integrations",
|
||||
"tail": " — when you want terminal access from outside the LAN without exposing port 8008."
|
||||
},
|
||||
{
|
||||
"label": "Dashboard index",
|
||||
"href": "/docs/monitor/dashboard",
|
||||
"tail": " — the other tabs."
|
||||
}
|
||||
]
|
||||
}
|
||||
}
|
||||
@@ -0,0 +1,248 @@
|
||||
{
|
||||
"meta": {
|
||||
"title": "ProxMenux Monitor — Dashboard: VMs & LXCs tab | ProxMenux Documentation",
|
||||
"description": "The VMs & LXCs tab inventories every guest on the host with live CPU / memory / disk usage. Per-guest drill-in shows configuration, resources, backups, full guest logs, notes, and Start / Shutdown / Reboot / Stop controls."
|
||||
},
|
||||
"header": {
|
||||
"title": "Dashboard: VMs & LXCs tab",
|
||||
"description": "The full inventory of guests on the node. Four headline metrics across the top, a sortable list of every VM and LXC below, and a drill-in per guest with config, resources, backups, logs and the four lifecycle controls (Start / Shutdown / Reboot / Stop).",
|
||||
"section": "ProxMenux Monitor · Dashboard"
|
||||
},
|
||||
"intro": {
|
||||
"title": "The control surface for guests",
|
||||
"body": "Other tabs are read-only; this is the one you act from. Anything that changes guest state goes through <code>POST /api/vms/<vmid>/control</code> with an explicit confirmation and the response is reflected back in the guest's row. There is no force-shutdown without going through the dedicated Stop button."
|
||||
},
|
||||
"topRow": {
|
||||
"heading": "Top row: four stat cards",
|
||||
"intro": "Opening the VMs & LXCs tab lands you on a four-card summary of guest state — totals, CPU utilisation, memory commitment vs host capacity, and disk allocation.",
|
||||
"imageAlt": "VMs & LXCs tab — top row of four stat cards: Total VMs & LXCs, Total CPU, Total Memory, Total Disk",
|
||||
"imageCaption": "Top row of the VMs & LXCs tab — totals + Running / Stopped badges, current CPU utilisation, memory broken down into used / running-allocated / total-allocated (with a Within Limits badge), and allocated disk space.",
|
||||
"headerCard": "Card",
|
||||
"headerWhat": "What it shows",
|
||||
"totalLabel": "Total VMs & LXCs",
|
||||
"totalWhat": "Total count with two badges — <em>X Running</em> (green) and <em>Y Stopped</em> (red, only when > 0). The number you watch when something didn't come back up after a reboot.",
|
||||
"cpuLabel": "Total CPU",
|
||||
"cpuWhat": "Aggregate live CPU utilisation across all guests as a percentage of the host's physical CPU, with a footer line <em>\"Allocated CPU usage\"</em>.",
|
||||
"memoryLabel": "Total Memory",
|
||||
"memoryIntro": "Three readings stacked vertically:",
|
||||
"memoryItems": [
|
||||
"<strong>Currently used</strong> — large value (e.g. <em>15.4 GB</em>) plus <em>X.X % of Y GB</em> against the host's total RAM. A blue progress bar tracks the percentage.",
|
||||
"<strong>Running allocated</strong> + <strong>Total allocated</strong> — sum of <code>maxmem</code> across guests that are <em>currently up</em> next to the same sum across <em>every</em> guest including stopped ones. The first matters today; the second matters when you start everything at once.",
|
||||
"<strong>Within Limits</strong> badge (green) — flips to <em>Over-committed</em> if total allocated exceeds the host's RAM. Healthy memory over-commit is fine on hosts with KSM, but the badge is the early warning when it's no longer comfortable."
|
||||
],
|
||||
"diskLabel": "Total Disk",
|
||||
"diskWhat": "Sum of disk space allocated across all guests, in the appropriate unit (GB / TB), with the footer line <em>\"Allocated disk space\"</em>."
|
||||
},
|
||||
"inventory": {
|
||||
"heading": "Virtual Machines & Containers list",
|
||||
"intro": "One row per guest. The list is single-sourced from <code>/api/vms</code>, which consolidates <code>qm list</code> + <code>pct list</code> + <code>pvesh /cluster/resources</code> on the host.",
|
||||
"imageAlt": "Virtual Machines & Containers list — one row per guest with status, type badge, name, ID and inline CPU / memory / disk percentages",
|
||||
"imageCaption": "The mobile-optimized layout of the inventory — the same data the desktop view shows, restacked into a single column with the percentages and status indicators kept compact.",
|
||||
"rowsIntro": "Each row shows:",
|
||||
"rows": [
|
||||
"<strong>Status icon</strong> — green play (running) or red square (stopped). For stopped guests, the rest of the row dims so you instantly see what's offline.",
|
||||
"<strong>Type badge</strong> — <em>LXC</em> (cyan) for containers, <em>VM</em> (purple) for virtual machines.",
|
||||
"<strong>Name</strong> — guest hostname / display name.",
|
||||
"<strong>VMID</strong> — the Proxmox numeric ID below the name.",
|
||||
"<strong>Inline metrics</strong> — three percentages with their icon (CPU %, Memory %, Disk %). Each icon turns orange when the metric crosses an attention threshold (e.g. memory above 90 %), so a quick scan tells you which guest is under pressure without opening it."
|
||||
],
|
||||
"clickHint": "Clicking any row — running or stopped — opens the drill-in modal described below.",
|
||||
"mobileTitle": "The list is built mobile-first",
|
||||
"mobileBody": "On phones and narrow windows the inventory reflows into a single column with type badge, name, ID and the three metric percentages on one line each — exactly the screenshot above. On wider viewports the same data spreads horizontally with extra room for the percentages. Either way, every row is the same full target: tap to drill in."
|
||||
},
|
||||
"drillIn": {
|
||||
"heading": "Per-guest drill-in modal",
|
||||
"intro": "The modal opens with a header showing the guest name, VMID, type badge (LXC / VM), state badge (RUNNING / STOPPED / …) and current uptime. Below the header are <strong>two tabs</strong> — <em>Status</em> and <em>Backups</em> — and a fixed action bar at the bottom of the modal with the four lifecycle controls (Start / Shutdown / Reboot / Force Stop) and, on running LXC containers, a Console button.",
|
||||
"statusTitle": "Tab 1 — Status",
|
||||
"statusImageAlt": "Per-guest drill-in modal — Status tab with CPU / Memory / Disk live cards, Disk and Network I/O totals, the OS distro logo, and the Resources / IP Addresses block",
|
||||
"statusImageCaption": "Status tab — live CPU / Memory / Disk with progress bars at the top, accumulated I/O totals (disk read/write, network down/up) below, then the static Resources block with Notes and + Info expansions and the IP Addresses pill list.",
|
||||
"statusIntro": "The default tab — the \"is this guest behaving?\" view. Three blocks:",
|
||||
"liveTitle": "1. Live metrics row",
|
||||
"liveItems": [
|
||||
"<strong>CPU Usage (X cores)</strong> — current percentage with a progress bar. The header shows the configured core count so you know what 100 % would mean.",
|
||||
"<strong>Memory</strong> — <em>used / max</em> in GB with a progress bar.",
|
||||
"<strong>Disk</strong> — <em>used / max</em> across the guest's primary disk image, same shape."
|
||||
],
|
||||
"ioTitle": "2. I/O totals + OS logo",
|
||||
"ioItems": [
|
||||
"<strong>Disk I/O</strong> — accumulated read (↓) and write (↑) totals since boot. Useful to spot a guest that's suddenly become I/O-heavy compared to its baseline.",
|
||||
"<strong>Network I/O</strong> — accumulated download (↓) and upload (↑). Same idea on the network side.",
|
||||
"<strong>OS distro logo</strong> — the Debian / Ubuntu / Alpine / Windows / etc. icon detected from the guest's OS type. A quick visual cue when scrolling several modals open."
|
||||
],
|
||||
"resourcesTitle": "3. Resources block",
|
||||
"resourcesIntro": "The configuration of the guest as Proxmox sees it — CPU Cores, Memory (configured <code>maxmem</code>), Swap. Two collapsible buttons in the block header:",
|
||||
"resourcesItems": [
|
||||
"<strong>Notes</strong> — the guest's description field. Editable: typing here and saving calls <code>PUT /api/vms/<vmid>/config</code> and writes back to <code>/etc/pve/qemu-server/<vmid>.conf</code> or <code>/etc/pve/lxc/<vmid>.conf</code>.",
|
||||
"<strong>+ Info</strong> — extra fields that are too verbose for the default view: bios mode, machine type, agent state, hostpci passthrough entries, mount points (CT), boot order."
|
||||
],
|
||||
"ipsTitle": "4. IP Addresses",
|
||||
"ipsBody": "Pill list of every IPv4 / IPv6 address the guest currently exposes — green pill per address. Empty when the guest is stopped or when the QEMU agent isn't installed in a VM (LXCs always report addresses directly).",
|
||||
"mountsTitle": "Tab 2 — Mounts (LXC only)",
|
||||
"mountsImageAlt": "LXC drill-in modal — Mounts tab listing every mount point the container is using: PVE volumes, host binds, binds from PVE storage and ad-hoc NFS/CIFS mounts the operator mounted from inside the CT. Each card carries a type badge, capacity bar, used/total bytes, mount options, and a colour-coded state dot (green healthy, amber readonly/divergent, red stale)",
|
||||
"mountsImageCaption": "Mounts tab — only renders for LXC containers, and only when at least one mount point or ad-hoc remote mount is present. A CT without mounts gets no tab.",
|
||||
"mountsIntro": "Proxmox's own UI shows the mount-point entries defined in the container config (<code>mpX</code>) but stops there — anything you mount from inside the CT later (<code>mount.cifs</code>, NFS via <code>autofs</code>, …) is invisible. This tab merges <strong>both views</strong>: the configured mounts <strong>and</strong> the runtime mounts ProxMenux probes from inside the container, with a per-mount health status and a capacity bar wherever the backend can resolve one.",
|
||||
"mountTypesTitle": "Types of mount detected",
|
||||
"mountTypesItems": [
|
||||
"<strong>PVE volume</strong> — backed by a Proxmox-managed storage (a ZFS subvol, a directory entry, a Ceph RBD, …). Capacity comes from the PVE storage stats so the bar matches what Proxmox itself shows.",
|
||||
"<strong>Bind from PVE storage</strong> — <code>mpX</code> entry pointing at a path on a PVE-known storage.",
|
||||
"<strong>Bind from host</strong> — <code>mpX</code> entry pointing at an arbitrary host path (<code>/mnt/something</code>). Capacity is the <code>df</code> of that host path.",
|
||||
"<strong>Ad-hoc inside CT</strong> — mount that <em>only</em> exists in the container's mount namespace (e.g. an NFS share that the CT mounts on its own). Capacity is read via <code>pct exec <vmid> df</code>, which is the only way to see it — <code>/proc/<pid>/root</code> from the host doesn't expose the remote mount's real stats."
|
||||
],
|
||||
"mountStateTitle": "Per-card state dot and warnings",
|
||||
"mountStateItems": [
|
||||
"<green/> <strong>Green</strong> — mount is healthy and reachable.",
|
||||
"<amber/> <strong>Amber</strong> — divergent (configured but not actually mounted), read-only, or <em>zombie bind</em> (the host source was removed but the CT still sees the bind as mounted — typical when a USB drive was unplugged or a manual <code>umount</code> happened on the host).",
|
||||
"<red/> <strong>Red</strong> — stale: the runtime probe couldn't reach the mount (common with NFS exports whose server is down)."
|
||||
],
|
||||
"mountsCalloutTitle": "What this gives you over the native UI",
|
||||
"mountsCalloutBody": "A truthful, capacity-aware view of every place the container reads or writes. NFS or CIFS shares mounted from inside the CT — invisible to the Proxmox web UI — appear here with the same look and the same health probe as any configured mount point. Stale remote mounts and zombie binds are flagged before they bite during a backup.",
|
||||
"backupsTitle": "Tab 3 — Backups",
|
||||
"backupsImageAlt": "Per-guest drill-in modal — Backups tab with the available backups list, destination tag, sizes and the Create Backup button",
|
||||
"backupsImageCaption": "Backups tab — every backup stored on configured Proxmox storages for this guest, sorted newest first. The tab header carries the count badge.",
|
||||
"backupsIntro": "Lists every backup stored across configured Proxmox storages for this guest, sorted newest first. The tab title carries a count badge so you see at a glance whether the guest is backed up. Per row:",
|
||||
"backupsItems": [
|
||||
"<strong>Timestamp</strong> — date and time of the run.",
|
||||
"<strong>Destination tag</strong> — the storage where it lives (PBS-Cloud, PBS-Local, NFS-Backup, …) coloured by status.",
|
||||
"<strong>Size</strong> — final on-disk size of the backup."
|
||||
],
|
||||
"backupsOutro": "The <strong>+ Create Backup</strong> button at the top right kicks off a new run on the storage marked as \"Backup target\" in the Proxmox storage config. Restore lives in the Proxmox web UI — the Monitor exposes the \"is this guest backed up recently?\" view, not the recovery flow.",
|
||||
"updatesTitle": "Updates badge (LXC only)",
|
||||
"updatesImageAlt": "LXC drill-in modal — clickable violet 'updates available' badge in the header of a container that has pending apt or apk updates. Clicking it expands a panel listing every upgradable package with its current and target versions, plus a security-only counter when the underlying repo flags any of them as security",
|
||||
"updatesImageCaption": "The badge only appears on running LXC containers that have at least one upgradable package. Click it to open the package list inside the modal — no separate tab in the nav strip.",
|
||||
"updatesIntro": "ProxMenux probes every running container on the host once a day and counts the upgradable packages. Currently supported in this phase: <strong>Debian / Ubuntu</strong> via <code>apt list --upgradable</code> and <strong>Alpine</strong> via <code>apk list -u</code>. Containers running other distributions (CentOS, Arch, …) are skipped for now — they show no badge instead of a misleading zero.",
|
||||
"updatesPanelTitle": "What the panel shows",
|
||||
"updatesPanelItems": [
|
||||
"<strong>Total upgradable count</strong> at the top, plus a separate <strong>security</strong> counter when the underlying repository flags any of the packages as security (Debian/Ubuntu \"-security\" suite). Alpine doesn't expose a separate security suite via apk metadata, so security is always 0 on Alpine containers.",
|
||||
"<strong>Per-package list</strong> with name, current version and target version. Use this to decide whether to run the upgrade now or wait for a maintenance window."
|
||||
],
|
||||
"updatesScopeTitle": "What the system tracks vs what the script counts",
|
||||
"updatesScopeBody": "This update detector follows whatever is already installed inside the container — it does <strong>not</strong> install anything new and does <strong>not</strong> know about applications that were deployed outside apt / apk (a Docker container running inside the LXC, a Vaultwarden installed from source, a binary dropped into <code>/usr/local/bin</code>). It is a <em>package-manager</em> view, not an <em>application</em> view. Future phases of this work will integrate community-script application metadata so per-app upstream tracking (Vaultwarden, Jellyfin, …) becomes possible.",
|
||||
"updatesToggleTitle": "Detection vs notification — toggle semantics",
|
||||
"updatesToggleCalloutTitle": "Detection is always on; the toggle only controls the notification",
|
||||
"updatesToggleCalloutBody": "The package-update detection on running containers runs unconditionally — the badge appears in this modal whenever there are updates pending, regardless of any other setting. The <code>lxc_updates_available</code> notification toggle in <strong>Settings → Notifications</strong> only controls whether a grouped \"N CT(s) have pending updates\" message is delivered to your channels. This keeps the toggle semantics consistent with every other update stream (NVIDIA driver, Coral driver, ProxMenux optimizations): turning notifications off never hides the information in the dashboard.",
|
||||
"updatesApplyTitle": "Applying the updates",
|
||||
"updatesApplyBody": "Open the container shell from the bottom action bar, or use <code>pct exec <vmid> -- apt full-upgrade -y</code> / <code>pct exec <vmid> -- apk upgrade -y</code> from the host. The dashboard re-scans on its 24h cycle (or after the next manual refresh) and the badge updates.",
|
||||
"firewallTitle": "Tab 5 — Firewall",
|
||||
"firewallIntro": "Reads the per-guest Proxmox firewall log straight from the host (no extra service, no polling). The tab is always present in the navigation strip; the panel decides what to render depending on whether the firewall is enabled for that guest and whether any rule is actually logging:",
|
||||
"firewallItems": [
|
||||
"<strong>Firewall disabled</strong> — an amber notice explains exactly where to enable it in the Proxmox UI (<em><Container|VM> → Firewall → Options</em>) and reminds you that at least one rule needs <code>log: info</code> (or higher) before packets show up.",
|
||||
"<strong>Firewall enabled, no events yet</strong> — empty-state hint with the same logging requirement, useful when you just turned the firewall on.",
|
||||
"<strong>Events present</strong> — a scrollable monospace pane with the raw entries coloured by action: <green>ACCEPT</green> (green), <orange>REJECT</orange> (orange), <red>DROP</red> (red). A count badge in the header shows how many entries are currently loaded."
|
||||
],
|
||||
"firewallRefresh": "A <em>Refresh</em> button at the top right of the panel pulls the latest entries on demand — there is no auto-refresh inside the modal, so the list is a snapshot of the moment you opened the tab or pressed refresh. The data comes from the per-guest log file that Proxmox writes under <code>/var/log/pve-firewall.log</code> filtered by VMID, exposed via <code>GET /api/vms/<vmid>/firewall/log</code>.",
|
||||
"firewallCalloutTitle": "Why have it here when the Proxmox UI already shows it?",
|
||||
"firewallCalloutBody": "Two reasons: it removes the round-trip through the Proxmox web UI when you're already inspecting a guest from the dashboard, and it keeps the same VMID-scoped view the rest of the modal uses — start the guest, check its mounts, look at recent firewall hits and stop it again without leaving the panel. The Monitor never edits firewall rules; rule editing stays in the native Proxmox interface where it belongs.",
|
||||
"actionBarTitle": "Bottom action bar",
|
||||
"actionBarIntro": "Always visible at the foot of the modal regardless of which tab is active:",
|
||||
"consoleItem": "<strong>Console</strong> (LXC only, running) — opens a modal that runs <code>pct enter <vmid></code> and lands you inside the container. Same xterm.js + WebSocket plumbing as the standalone <link>Terminal tab</link>, including the <strong>mobile-friendly toolbar</strong> with ESC, TAB, arrow keys, Enter and the Ctrl combos (Ctrl+C / Ctrl+X / Ctrl+R) under the terminal — making the modal usable from a phone or tablet keyboard. VMs do not expose a Console button here; use the Proxmox web console (noVNC) for guest access.",
|
||||
"lifecycleIntro": "Below it, four lifecycle buttons in a 2×2 grid. Each fires <code>POST /api/vms/<vmid>/control</code> with the matching <code>action</code>; enabled state depends on whether the guest is currently running:",
|
||||
"headerButton": "Button",
|
||||
"headerEnabled": "Enabled when",
|
||||
"headerAction": "Action sent to host",
|
||||
"lifecycleRows": [
|
||||
{
|
||||
"button": "Start",
|
||||
"color": "green",
|
||||
"enabled": "Guest is stopped.",
|
||||
"action": "qm start / pct start"
|
||||
},
|
||||
{
|
||||
"button": "Shutdown",
|
||||
"color": "blue",
|
||||
"enabled": "Guest is running.",
|
||||
"action": "qm shutdown / pct shutdown — graceful, ACPI"
|
||||
},
|
||||
{
|
||||
"button": "Reboot",
|
||||
"color": "blue",
|
||||
"enabled": "Guest is running.",
|
||||
"action": "qm reboot / pct reboot — graceful restart"
|
||||
},
|
||||
{
|
||||
"button": "Force Stop",
|
||||
"color": "red",
|
||||
"enabled": "Guest is running.",
|
||||
"action": "qm stop / pct stop — hard power-off"
|
||||
}
|
||||
],
|
||||
"forceStopTitle": "Force Stop is the kill switch, not the polite option",
|
||||
"forceStopBody": "<strong>Force Stop</strong> bypasses the guest's shutdown sequence — equivalent to pulling the power cable. Use <strong>Shutdown</strong> when the guest is responsive; reach for Force Stop only when Shutdown hangs and you accept the data-loss risk of an uncoordinated power-off. The button is red and labelled deliberately so you don't click it by reflex."
|
||||
},
|
||||
"dataCollected": {
|
||||
"heading": "How the data is collected",
|
||||
"headerSection": "Section of the tab",
|
||||
"headerEndpoint": "Endpoint",
|
||||
"headerSource": "Source",
|
||||
"rows": [
|
||||
{
|
||||
"section": "Inventory list",
|
||||
"endpoint": "/api/vms",
|
||||
"source": "<code>pvesh get /cluster/resources --type vm</code> for VMs and CTs."
|
||||
},
|
||||
{
|
||||
"section": "Detail panel (config, network, disks)",
|
||||
"endpoint": "/api/vms/<vmid>",
|
||||
"source": "<code>qm config <id></code> for VMs / <code>pct config <id></code> for CTs."
|
||||
},
|
||||
{
|
||||
"section": "Per-guest metrics chart",
|
||||
"endpoint": "/api/vms/<vmid>/metrics",
|
||||
"source": "PVE RRD data (<code>pvesh get /nodes/<node>/qemu/<id>/rrddata</code>) condensed to a chart-friendly shape."
|
||||
},
|
||||
{
|
||||
"section": "Recent task logs (modal)",
|
||||
"endpoint": "/api/vms/<vmid>/logs",
|
||||
"source": "Tasks for that <code>vmid</code> from <code>/var/log/pve/tasks/index</code>."
|
||||
},
|
||||
{
|
||||
"section": "Backups available for guest",
|
||||
"endpoint": "/api/vms/<vmid>/backups",
|
||||
"source": "<code>pvesm list <storage></code> filtered by VMID."
|
||||
},
|
||||
{
|
||||
"section": "Per-guest firewall log (Firewall tab)",
|
||||
"endpoint": "/api/vms/<vmid>/firewall/log",
|
||||
"source": "<code>/var/log/pve-firewall.log</code> filtered by VMID."
|
||||
},
|
||||
{
|
||||
"section": "Power buttons (Start / Stop / Reboot / Shutdown)",
|
||||
"endpoint": "/api/vms/<vmid>/control",
|
||||
"source": "<code>qm start|stop|reboot|shutdown</code> or <code>pct</code> equivalents."
|
||||
}
|
||||
],
|
||||
"codeComment1": "# Cross-check what the dashboard sees against PVE",
|
||||
"codeComment2": "# Inspect a specific guest's config exactly as the modal sees it",
|
||||
"codeComment3": "# VM",
|
||||
"codeComment4": "# CT"
|
||||
},
|
||||
"whereNext": {
|
||||
"heading": "Where to next",
|
||||
"items": [
|
||||
{
|
||||
"label": "Health Monitor",
|
||||
"href": "/docs/monitor/health-monitor",
|
||||
"tailRich": " — the VMs & Containers category (failed boot, QMP timeouts, CT shutdown failures)."
|
||||
},
|
||||
{
|
||||
"label": "Notifications",
|
||||
"href": "/docs/monitor/notifications",
|
||||
"tailRich": " — what the <code>vm_*</code>, <code>ct_*</code>, <code>migration_*</code> and <code>backup_*</code> events trigger downstream."
|
||||
},
|
||||
{
|
||||
"label": "API Reference",
|
||||
"href": "/docs/monitor/api",
|
||||
"tailRich": " — the VM and backup endpoints."
|
||||
},
|
||||
{
|
||||
"label": "Dashboard index",
|
||||
"href": "/docs/monitor/dashboard",
|
||||
"tailRich": " — the other tabs."
|
||||
},
|
||||
{
|
||||
"label": "ProxMenux → Create VM",
|
||||
"href": "/docs/create-vm",
|
||||
"tailRich": " — provisioning side: System NAS templates (Synology and others), Linux / Windows VMs, defaults tailored for Proxmox."
|
||||
}
|
||||
]
|
||||
}
|
||||
}
|
||||
@@ -0,0 +1,369 @@
|
||||
{
|
||||
"meta": {
|
||||
"title": "Proxmox Health Monitor — CPU, Memory, Storage, SMART, ZFS, Logs | ProxMenux",
|
||||
"description": "Proactive Proxmox VE health monitoring: ten categories scanned every five minutes (CPU & temperature, memory & swap, storage, disks/SMART, network, VMs, services, logs, updates, security), four severity levels, per-category suppression durations, automatic cleanup of resolved errors, a permanent disk observation history and the path from a raw event to a Telegram, Discord, Gotify or email notification.",
|
||||
"ogTitle": "Proxmox Health Monitor — CPU, Memory, Storage, SMART, ZFS, Logs",
|
||||
"ogDescription": "Proactive Proxmox VE health monitoring across ten categories with severity levels, suppression durations and event-driven notifications.",
|
||||
"twitterTitle": "Proxmox Health Monitor | ProxMenux",
|
||||
"twitterDescription": "Proactive Proxmox VE health monitoring across ten categories with severity levels and notifications."
|
||||
},
|
||||
"header": {
|
||||
"title": "Health Monitor",
|
||||
"description": "The continuous self-check that scans ten categories of host state on a five-minute cycle, samples vital signs continuously between cycles, deduplicates findings into a structured event stream, and feeds the dashboard, the notification engine and the optional AI rewriter from one source of truth.",
|
||||
"section": "ProxMenux Monitor"
|
||||
},
|
||||
"intro": {
|
||||
"title": "One scanner, three consumers",
|
||||
"body": "A background thread runs the full health cycle every 5 minutes, persists each finding into SQLite under a stable <code>error_key</code>, and lets <strong>(1)</strong> the dashboard render the current state, <strong>(2)</strong> the notification engine fan out new events to the configured channels, and <strong>(3)</strong> the optional AI assistant rewrite alerts in plain language. You configure the scanner once; everything downstream stays in sync."
|
||||
},
|
||||
"howItWorks": {
|
||||
"heading": "How it works",
|
||||
"intro": "The Health Monitor runs on two parallel lanes inside the Monitor process. A lightweight <strong>vital signs sampler</strong> reads CPU, memory and temperature every few seconds so that sustained-threshold conditions are detected fast; in parallel, the <strong>full health cycle</strong> runs every five minutes and exercises every category from end to end. Both lanes converge into the same SQLite tables — and from there, three consumers read the state independently.",
|
||||
"scannerTitle": "From sample to stored finding",
|
||||
"scannerCaption": "The scanner. Vital signs are sampled fast so sustained-CPU / sustained-memory pressure can be detected before the next 5-min cycle. The full cycle reads those buffers and runs the heavier checks (SMART, ZFS pool state, journal scanning, service health, etc.) before writing the structured findings to SQLite.",
|
||||
"scannerArrowLabel": "step",
|
||||
"scannerNodes": {
|
||||
"samplerLabel": "Vital signs sampler",
|
||||
"samplerDetail": "CPU usage 30 s\nMemory 30 s\nTemperature 15 s\n→ history buffers",
|
||||
"cycleLabel": "Full health cycle",
|
||||
"cycleDetail": "Every 5 min\nReads buffers\n+ live probes\n(SMART, ZFS,\nservices, journal…)",
|
||||
"checksLabel": "Per-category checks",
|
||||
"checksDetail": "Ten categories\n(CPU, memory,\nstorage, disks,\nnetwork, VMs,\nservices, logs,\nupdates, security)",
|
||||
"sqliteLabel": "SQLite",
|
||||
"sqliteDetail": "errors table\n(active +\ndismissed)\n+ disk_observations\n(permanent\nper-disk history)"
|
||||
},
|
||||
"notifTitle": "From stored finding to user",
|
||||
"notifCaption": "The notification path. The same errors table also drives the dashboard view (Active / Dismissed lists rendered live) and is consumed by the cleanup routine at the end of each cycle to auto-resolve stale entries — both run from the same data without going through the dispatcher.",
|
||||
"notifArrowLabel": "event",
|
||||
"notifNodes": {
|
||||
"errorsLabel": "errors table",
|
||||
"errorsDetail": "Active +\nDismissed rows\nkeyed by\nerror_key",
|
||||
"dispatcherLabel": "Notification dispatcher",
|
||||
"dispatcherDetail": "New + escalated\nevents queued\nThrough toggles\n+ cooldown",
|
||||
"templatesLabel": "Templates + AI rewrite",
|
||||
"templatesDetail": "Per-event\ntemplate\n→ optional AI\nplain-language\nrewrite",
|
||||
"channelsLabel": "Channels",
|
||||
"channelsDetail": "Telegram\nDiscord\nGotify\nEmail (SMTP)"
|
||||
}
|
||||
},
|
||||
"categories": {
|
||||
"heading": "The ten categories",
|
||||
"imageAlt": "Health Monitor view showing the ten categories with their current statuses (CPU, Memory, Storage, Disks, Network, VMs, Services, Logs, Updates, Security)",
|
||||
"imageCaption": "Health Monitor view — the ten categories with their current status. Categories on a healthy host all show OK; warnings and critical events appear inline with the rows that produced them.",
|
||||
"intro": "Every cycle exercises ten independent checkers. Each produces one of four statuses (<strong>OK</strong>, <strong>INFO</strong>, <strong>WARNING</strong>, <strong>CRITICAL</strong>) plus a structured payload — device names, sample log lines, exact thresholds — that surface in the dashboard and travel through to the notification body.",
|
||||
"headerCategory": "Category",
|
||||
"headerChecks": "Sub-checks",
|
||||
"headerEvents": "Typical events",
|
||||
"rows": [
|
||||
{
|
||||
"category": "CPU & Temperature",
|
||||
"checks": "CPU usage with hysteresis, sensor temperature",
|
||||
"events": "High sustained load; CPU temperature crossing the vendor warning / critical thresholds."
|
||||
},
|
||||
{
|
||||
"category": "Memory & Swap",
|
||||
"checks": "RAM usage, swap usage",
|
||||
"events": "Sustained memory pressure; OOM-killer activity; swap exhaustion."
|
||||
},
|
||||
{
|
||||
"category": "Storage",
|
||||
"checks": "Proxmox storages, root filesystem",
|
||||
"events": "Storage offline (NFS server unreachable, CIFS expired creds); root mount > 90 %; LVM thin pool nearing full."
|
||||
},
|
||||
{
|
||||
"category": "Disks & SMART",
|
||||
"checks": "SMART, dmesg I/O errors, ZFS pools, LVM, filesystem errors",
|
||||
"events": "SMART health failed; reallocated / pending sectors; ATA I/O errors; ZFS pool DEGRADED / FAULTED; ext4 read-only remount."
|
||||
},
|
||||
{
|
||||
"category": "Network",
|
||||
"checks": "Connectivity, link state, gateway latency",
|
||||
"events": "Bridge or bond down; gateway unreachable; persistent latency spikes."
|
||||
},
|
||||
{
|
||||
"category": "VMs & Containers",
|
||||
"checks": "QMP communication, VM startup, container startup",
|
||||
"events": "Failed VM boot; CT shutdown failure; QMP socket timeout; missing config / disk after a clone."
|
||||
},
|
||||
{
|
||||
"category": "PVE Services",
|
||||
"checks": "<code>pveproxy</code>, <code>pvedaemon</code>, <code>pvestatd</code>, <code>pve-cluster</code>, cluster mode",
|
||||
"events": "Service crashed; cluster quorum lost; <code>pmxcfs</code> stuck."
|
||||
},
|
||||
{
|
||||
"category": "System Logs",
|
||||
"checks": "Persistent errors, error spikes, error cascades, critical kernel messages",
|
||||
"events": "Repeated identical errors; sudden burst of warnings (cascade pattern); <code>BUG:</code> / <code>OOPS:</code> / <code>oom-killer</code> in dmesg."
|
||||
},
|
||||
{
|
||||
"category": "System Updates",
|
||||
"checks": "Pending updates, security updates, kernel / PVE version, system age",
|
||||
"events": "Security updates available; pinned kernel several minor versions behind; host uptime > 90 days."
|
||||
},
|
||||
{
|
||||
"category": "Security & Certificates",
|
||||
"checks": "Login attempts, certificates expiring, optional Fail2Ban jail status",
|
||||
"events": "Repeated SSH / web auth failures; PVE certificate < 30 days from expiring; Fail2Ban active bans."
|
||||
}
|
||||
]
|
||||
},
|
||||
"severity": {
|
||||
"heading": "Severity model",
|
||||
"headerStatus": "Status",
|
||||
"headerColour": "Colour",
|
||||
"headerMeaning": "Meaning",
|
||||
"headerNotification": "Notification",
|
||||
"rows": [
|
||||
{
|
||||
"status": "OK",
|
||||
"colour": "Green",
|
||||
"meaning": "Healthy. No findings in this category.",
|
||||
"notification": "Silent."
|
||||
},
|
||||
{
|
||||
"status": "INFO",
|
||||
"colour": "Blue",
|
||||
"meaning": "Transient or already-resolved condition worth noting once. Also used for categories that have <em>only</em> dismissed items left.",
|
||||
"notification": "Optional. Each event type can be opted in or out per channel."
|
||||
},
|
||||
{
|
||||
"status": "WARNING",
|
||||
"colour": "Yellow",
|
||||
"meaning": "Attention is needed but the host is still functional. Cause is non-trivial — read the details.",
|
||||
"notification": "Sent when the per-event toggle is on for the channel."
|
||||
},
|
||||
{
|
||||
"status": "CRITICAL",
|
||||
"colour": "Red",
|
||||
"meaning": "Functionality broken or data loss possible. Action required.",
|
||||
"notification": "Sent when the per-event toggle is on for the channel. CPU temperature CRITICAL is treated as a safety alert that re-fires even if previously dismissed."
|
||||
}
|
||||
],
|
||||
"infoNote": "A category that is <strong>OK</strong> but has dismissed events still inside their suppression window is rendered as <strong>INFO</strong> — to remind you that something is being silenced rather than that nothing was ever wrong.",
|
||||
"unknownTitle": "UNKNOWN, when a check can't complete",
|
||||
"unknownBody": "A check that fails to produce a verdict for three cycles in a row (a probe that times out, a sensor that disappeared, a tool that exits with an error) is recorded internally as <code>UNKNOWN</code>. The dashboard surfaces this as a yellow status — the overall view caps <code>UNKNOWN</code> at <strong>WARNING</strong> so it never escalates a healthy host to CRITICAL on its own."
|
||||
},
|
||||
"dashboardView": {
|
||||
"heading": "The dashboard view",
|
||||
"intro": "The Health Monitor lives inside the <strong>Overview</strong> tab. The header status pill (Healthy / Warning / Critical) opens a modal that splits findings into two lists:",
|
||||
"items": [
|
||||
"<strong>Active</strong> — every category with an unresolved finding. Each row expands to show the individual checks that produced the status, the raw <code>reason</code> string, the device or VM ID involved, and (for categories that link to a tab) a click-through into Storage / Network / VMs / Logs / Hardware to investigate.",
|
||||
"<strong>Dismissed</strong> — items previously acknowledged by the user that are still inside their suppression window. Each row shows how much of the suppression remains and the configured duration. When the window expires, the item disappears from this list; if the underlying condition is still present and the category supports re-firing, it re-appears in <em>Active</em>."
|
||||
],
|
||||
"pillTitle": "The pill mirrors the worst category",
|
||||
"pillBody": "The dashboard header colour is the highest severity across the ten categories: any CRITICAL → red, else any WARNING → yellow, else any INFO → blue, else green. The same logic drives the favicon dot and the PWA badge."
|
||||
},
|
||||
"dismiss": {
|
||||
"heading": "Dismissing alerts and the Suppression Duration",
|
||||
"intro": "Some events are noisy by nature — a <em>System Updates: pending updates available</em> stays true until you patch the host, and you don't want a notification every five minutes for a week. The Health Monitor solves this with two coupled mechanisms:",
|
||||
"step1": "<strong>Per-event Dismiss action</strong> in the modal. The Dismiss button opens a small dropdown with three options — <strong>24 hours</strong>, <strong>7 days</strong> or <strong>Permanently</strong> — letting you choose how long this specific alert stays silenced regardless of the category's default. Picking one calls <code>POST /api/health/acknowledge</code> with the <code>error_key</code> and the chosen <code>suppression_hours</code> (<code>-1</code> for permanent). The event moves to the Dismissed list with a timestamped <code>acknowledged_at</code>.",
|
||||
"dropdownImageAlt": "Dismiss dropdown on a Health Monitor alert — 24 hours, 7 days or Permanently",
|
||||
"dropdownImageCaption": "Per-event Dismiss dropdown. The chosen window applies to this single alert; if no per-event window is selected the category's default is used. Permanent dismisses are tagged with a distinct amber <em>Permanent</em> badge in the Dismissed list and never re-fire.",
|
||||
"step2": "<strong>Per-category Suppression Duration setting</strong>. From the Settings → Health Monitor card (or <code>POST /api/health/settings</code>), each of the ten categories has its own default window applied when a Dismiss is fired without a per-event choice:",
|
||||
"imageAlt": "Per-category Suppression Duration settings card in Settings → Health Monitor",
|
||||
"imageCaption": "Suppression Duration card — one dropdown per category. Pick a longer window for noisy events (e.g. pending updates) and shorter for ones you want to re-evaluate quickly. Active Suppressions are listed underneath (see below).",
|
||||
"outro": "While an event is suppressed, the scanner still runs and updates the row's <code>last_seen</code> timestamp, but no new notification is dispatched and the dashboard stays calm. When the window expires, the next cycle re-evaluates the condition and either re-fires fresh or, if the condition has cleared on its own, drops the row from the lists.",
|
||||
"activeSuppressionsTitle": "Reviewing and reverting dismisses — the Active Suppressions panel",
|
||||
"activeSuppressionsBody": "Every currently-silenced alert (time-limited and permanent) is listed under <strong>Settings → Health Monitor → Active Suppressions</strong>. Each row shows the alert identifier, category, severity, when it was dismissed and how much time is left, plus a <strong>Re-enable</strong> button that clears the acknowledgment so the alert can fire again on the next scan. Permanent dismisses can only be reverted from here; time-limited ones can also be force-revived without waiting for the countdown. The Re-enable action is gated by the Health Monitor <em>Edit</em> mode at the top of that card — toggle Edit, click Re-enable on each row you want to revive (queued rows show a green border and a strike-through), then click Save to commit. Cancel discards the queue.",
|
||||
"autoTitle": "Auto-suppression when you change the Duration",
|
||||
"autoBody": "Setting a category's Suppression Duration to anything other than the default 24 h has a second effect beyond user-initiated dismissals: <strong>future findings in that category enter the table already acknowledged</strong> with that duration. This is by design — if you've told the Monitor that you want disk-related events silenced for a week, brand-new disk findings honour that intent without you having to dismiss each one by hand. They appear directly in the Dismissed list with the configured remaining time. Categories left at 24 h are unaffected and behave the classic way (new findings land in Active until you act).",
|
||||
"tempTitle": "CPU temperature CRITICAL is the safety override",
|
||||
"tempBody": "One specific finding bypasses the suppression entirely: <strong>CPU temperature CRITICAL</strong>. If the sensor crosses the critical threshold, the alert re-fires regardless of any prior dismissal — a cooked CPU is a cooked CPU. This is the only built-in override of the dismiss model.",
|
||||
"nonDismissableTitle": "Findings that cannot be dismissed",
|
||||
"nonDismissableBody": "A handful of findings are flagged non-dismissable on purpose — they signal a condition where silencing the alert could cost data, hardware or connectivity. The Dismiss button is hidden for these rows; the alert clears only when the underlying condition recovers and the auto-resolve cleanup picks it up. Other findings (transient I/O events on a healthy disk, recovered states) are also marked non-dismissable but for the opposite reason: there's nothing to silence because the row is already informational and self-clearing.",
|
||||
"headerFinding": "Finding",
|
||||
"headerWhy": "Why it can't be dismissed",
|
||||
"rows": [
|
||||
{
|
||||
"finding": "CPU temperature warning / critical",
|
||||
"why": "Hardware risk — sustained over-temperature damages silicon. Silencing would let a cooking CPU run unnoticed."
|
||||
},
|
||||
{
|
||||
"finding": "Filesystem space critical (root mount)",
|
||||
"why": "Data loss risk — a full root prevents writes and corrupts state. The alert must remain visible until you free space."
|
||||
},
|
||||
{
|
||||
"finding": "ZFS pool DEGRADED / FAULTED",
|
||||
"why": "Data integrity risk — pool failure threatens every dataset on it. Silencing while the pool is unhealthy is never the right answer."
|
||||
},
|
||||
{
|
||||
"finding": "Disk I/O errors with SMART FAILED",
|
||||
"why": "Drive failure confirmed by SMART — masking hides real hardware dying. The alert stays until the device is replaced (or removed from the host)."
|
||||
},
|
||||
{
|
||||
"finding": "Network interface DOWN",
|
||||
"why": "Connectivity loss — bridges, bonds and physical interfaces with active traffic must stay visible. Silencing them would mask a remote-management outage."
|
||||
},
|
||||
{
|
||||
"finding": "I/O events on healthy disks (INFO)",
|
||||
"why": "Transient ATA / dmesg events on a disk whose SMART says OK — flagged INFO and self-clearing. Nothing to dismiss because the next cycle already removes them."
|
||||
}
|
||||
],
|
||||
"principle": "Everything else can be dismissed. The principle is: alerts that indicate \"real damage in progress\" or that have already self-resolved are kept off the dismiss path; alerts about sustained conditions you may want to acknowledge and re-check later (high CPU usage, pending updates, certificate near expiry, log warnings, VM startup hiccups, etc.) all expose the Dismiss button."
|
||||
},
|
||||
"autoresolve": {
|
||||
"heading": "Auto-resolution and cleanup",
|
||||
"intro": "Many alerts should clear themselves when the condition goes away — a VM that was failing to start and is now running, a disk that's no longer in the system, a temperature that dropped back to normal. A cleanup routine runs at the end of each five-minute cycle and applies these rules:",
|
||||
"headerTrigger": "Trigger",
|
||||
"headerAction": "Action",
|
||||
"rows": [
|
||||
{
|
||||
"trigger": "CPU usage back to normal range after a CPU-related warning.",
|
||||
"action": "Marked resolved. Drops out of the Active list."
|
||||
},
|
||||
{
|
||||
"trigger": "Memory pressure back below the warning threshold after an OOM / memory warning.",
|
||||
"action": "Marked resolved."
|
||||
},
|
||||
{
|
||||
"trigger": "VM / CT referenced by the error no longer exists (<code>qm status</code> / <code>pct status</code> non-zero).",
|
||||
"action": "Marked resolved as resource removed."
|
||||
},
|
||||
{
|
||||
"trigger": "Disk referenced by the error no longer present in <code>/dev/</code>.",
|
||||
"action": "Marked resolved as device removed. The permanent observation history is preserved (see next section)."
|
||||
},
|
||||
{
|
||||
"trigger": "Findings sourced from the journal (<code>logs</code> category, SMART entries, ATA / I/O errors) when their suppression window expires.",
|
||||
"action": "Removed cleanly. Each scan inspects fresh journal entries from that point forward; the same historic line in the journal is not re-emitted."
|
||||
},
|
||||
{
|
||||
"trigger": "Resolved errors older than seven days.",
|
||||
"action": "Deleted from the database to keep the table small. Notification history is independent and kept longer."
|
||||
}
|
||||
],
|
||||
"permanentTitle": "Permanent suppression is not the same as resolved",
|
||||
"permanentBody": "Setting a category's Suppression Duration to <code>-1</code> (<em>permanent</em>) silences future alerts for items you dismiss in that category — but it does not skip the auto-resolve check above. If the underlying condition disappears (resource deleted, threshold no longer breached), the item is still cleaned up automatically."
|
||||
},
|
||||
"observations": {
|
||||
"heading": "Disk observations — the permanent history",
|
||||
"intro": "Disk events are special. A SMART warning on <code>/dev/sdh</code> at 02:14 AM is something you want to remember even after the I/O storm subsided and the error auto-resolved — the disk has a track record now. For that purpose, the Health Monitor keeps a separate <strong>permanent</strong> table: <code>disk_observations</code>.",
|
||||
"headerProperty": "Property",
|
||||
"headerErrors": "<code>errors</code> table (Active)",
|
||||
"headerObs": "<code>disk_observations</code> table",
|
||||
"rows": [
|
||||
{
|
||||
"property": "Purpose",
|
||||
"errors": "Drives the <em>current</em> health view + notification dispatch.",
|
||||
"obs": "Permanent per-disk audit trail."
|
||||
},
|
||||
{
|
||||
"property": "Auto-resolve",
|
||||
"errors": "Yes — rows are cleared when the condition disappears.",
|
||||
"obs": "No — entries persist forever unless the user explicitly dismisses them."
|
||||
},
|
||||
{
|
||||
"property": "Dedup key",
|
||||
"errors": "<code>error_key</code> (e.g. <code>smart_sdh</code>).",
|
||||
"obs": "<code>(disk_registry_id, error_type, error_signature)</code> with stable signatures stripped of volatile data."
|
||||
},
|
||||
{
|
||||
"property": "Where shown",
|
||||
"errors": "Health Monitor modal (Active / Dismissed lists).",
|
||||
"obs": "Disk detail card in the <strong>Storage</strong> tab, with an \"X obs.\" badge per disk."
|
||||
},
|
||||
{
|
||||
"property": "What it records",
|
||||
"errors": "Whatever is currently failing.",
|
||||
"obs": "SMART warnings (sector issues / temperature / CRC / failed self-tests), I/O errors (ATA / NVMe / dm), filesystem errors, ZFS pool events."
|
||||
}
|
||||
],
|
||||
"outro": "Practical consequence: an alert can clear from the dashboard while the same incident is still recorded in the disk's history. When you click into a disk under Storage, the card shows the count of outstanding observations and a list with timestamps, severity and the original raw message — useful when you're deciding whether a drive needs replacement.",
|
||||
"renameTitle": "Cross-device renames are merged automatically",
|
||||
"renameBody": "Disks sometimes appear under transient names (<code>ata8</code>, <code>nvme0n1p3</code>) before getting a stable block-device name. The observation layer consolidates entries by serial number when known: if an event was first recorded as <code>ata8</code> and the same disk is later identified as <code>sdh</code>, the historic observations are reattached to <code>sdh</code> on the next cycle so the history isn't fragmented."
|
||||
},
|
||||
"notification": {
|
||||
"heading": "From a finding to a notification",
|
||||
"intro": "Every active error is also a candidate for the notification engine. The flow:",
|
||||
"items": [
|
||||
"The scanner records the finding with category + severity + structured details.",
|
||||
"If the event type is <strong>enabled</strong> in the global notification settings, and the channel hasn't silenced this category, an event is queued.",
|
||||
"The template engine renders a (title, body) pair from the structured details. If the AI rewriter is enabled, the same pair is also passed through the configured provider for a plain-language version.",
|
||||
"The channel implementation ships it: Telegram message, Discord embed, Gotify push or email. The dispatch outcome is stored in <code>notification_history</code>.",
|
||||
"If a dismiss arrives later, the suppression window kicks in and any further re-fires of the same <code>error_key</code> stay queue-side until the window closes."
|
||||
],
|
||||
"outro": "Channel configuration (Telegram bot token, webhook URLs, AI provider keys, per-event toggles, channel overrides) is documented in <notifLink>Notifications</notifLink> and <aiLink>AI Assistant</aiLink>."
|
||||
},
|
||||
"rest": {
|
||||
"heading": "REST endpoints",
|
||||
"intro": "Everything the modal does is callable from the API — handy for scripts, custom dashboards or your own chat-bot integration.",
|
||||
"headerEndpoint": "Endpoint",
|
||||
"headerMethod": "Method",
|
||||
"headerUse": "Use",
|
||||
"rows": [
|
||||
{
|
||||
"endpoint": "/api/health",
|
||||
"method": "GET",
|
||||
"use": "Small health probe — returns JSON with <code>status</code>, <code>timestamp</code> and <code>version</code>. Suitable for Uptime Kuma keyword checks; the receiver must send the bearer header."
|
||||
},
|
||||
{
|
||||
"endpoint": "/api/health/status",
|
||||
"method": "GET",
|
||||
"use": "Overall health verdict — single severity + summary string. Authenticated."
|
||||
},
|
||||
{
|
||||
"endpoint": "/api/health/details",
|
||||
"method": "GET",
|
||||
"use": "All ten categories with their per-category statuses and the structured payload that produced each one."
|
||||
},
|
||||
{
|
||||
"endpoint": "/api/health/full",
|
||||
"method": "GET",
|
||||
"use": "Full snapshot — categories + active errors + dismissed list + custom suppression settings. Backs the modal in one round-trip and uses a 6-min background cache for instant response."
|
||||
},
|
||||
{
|
||||
"endpoint": "/api/health/active-errors",
|
||||
"method": "GET",
|
||||
"use": "Just the Active list. Filterable by <code>?category=<name></code>."
|
||||
},
|
||||
{
|
||||
"endpoint": "/api/health/dismissed",
|
||||
"method": "GET",
|
||||
"use": "Just the Dismissed list, with remaining suppression hours."
|
||||
},
|
||||
{
|
||||
"endpoint": "/api/health/acknowledge",
|
||||
"method": "POST",
|
||||
"use": "Body: <code>'{'\"error_key\":\"smart_sdh\"'}'</code>. Dismiss an alert with the category's configured window."
|
||||
},
|
||||
{
|
||||
"endpoint": "/api/health/settings",
|
||||
"method": "GET / POST",
|
||||
"use": "Read or write the per-category Suppression Duration values."
|
||||
},
|
||||
{
|
||||
"endpoint": "/api/health/cleanup-orphans",
|
||||
"method": "POST",
|
||||
"use": "Manual cleanup of errors whose underlying device / VM is gone. Idempotent."
|
||||
}
|
||||
],
|
||||
"codeComment1": "# Snapshot the current health for a script",
|
||||
"codeComment2": "# Dismiss a specific error",
|
||||
"codeComment3": "# Set the disks-category suppression to a week"
|
||||
},
|
||||
"whereNext": {
|
||||
"heading": "Where to next",
|
||||
"items": [
|
||||
{
|
||||
"label": "Dashboard",
|
||||
"href": "/docs/monitor/dashboard",
|
||||
"tail": " — where the Health Monitor modal is opened from in the UI."
|
||||
},
|
||||
{
|
||||
"label": "Notifications",
|
||||
"href": "/docs/monitor/notifications",
|
||||
"tail": " — channels, per-event toggles, the AI rewrite hook, history."
|
||||
},
|
||||
{
|
||||
"label": "AI Assistant",
|
||||
"href": "/docs/monitor/ai-assistant",
|
||||
"tail": " — provider configuration (OpenAI / Anthropic / Gemini / Groq / Ollama / OpenRouter), prompt mode, per-channel detail level, language."
|
||||
},
|
||||
{
|
||||
"label": "Architecture",
|
||||
"href": "/docs/monitor/architecture",
|
||||
"tailRich": " — the SQLite schema (<code>errors</code>, <code>disk_observations</code>, <code>events</code>) and the background-thread cadence."
|
||||
}
|
||||
]
|
||||
}
|
||||
}
|
||||
@@ -0,0 +1,151 @@
|
||||
{
|
||||
"meta": {
|
||||
"title": "ProxMenux Monitor — Self-hosted Web Dashboard for Proxmox VE | ProxMenux",
|
||||
"description": "ProxMenux Monitor is a self-hosted web dashboard for Proxmox VE: real-time host metrics, storage and SMART data, network, VMs and containers, hardware, logs, an integrated web terminal, a proactive Health Monitor, notifications to Telegram / Discord / Email, an optional AI assistant, a REST API and integrations with tools like Homepage and Home Assistant.",
|
||||
"ogTitle": "ProxMenux Monitor — Self-hosted Web Dashboard for Proxmox VE",
|
||||
"ogDescription": "Real-time Proxmox VE dashboard: host metrics, storage SMART, network, VMs and containers, hardware, logs, web terminal, Health Monitor, notifications, AI assistant, REST API.",
|
||||
"twitterTitle": "ProxMenux Monitor | ProxMenux",
|
||||
"twitterDescription": "Self-hosted Proxmox VE dashboard with Health Monitor, notifications, AI assistant and REST API."
|
||||
},
|
||||
"header": {
|
||||
"title": "ProxMenux Monitor",
|
||||
"description": "A self-hosted web dashboard for Proxmox VE shipped as an AppImage. It runs on the host as a single systemd service, listens on TCP 8008, and serves both the API and the UI from one process.",
|
||||
"section": "ProxMenux Monitor"
|
||||
},
|
||||
"atGlance": {
|
||||
"title": "At a glance",
|
||||
"body": "Single AppImage on the Proxmox host → Flask backend (port 8008) collecting live data via <code>psutil</code>, <code>pvesh</code>, <code>smartctl</code>, <code>journalctl</code> → Next.js dashboard served from the same process. Optional auth (password + 2FA), optional AI assistant, optional notifications, REST API for integrations."
|
||||
},
|
||||
"hero": {
|
||||
"alt": "ProxMenux Monitor dashboard — system overview screen with CPU, memory, temperature and uptime widgets",
|
||||
"caption": "Default landing screen — host-level metrics and health state at a glance."
|
||||
},
|
||||
"coverage": {
|
||||
"heading": "What the dashboard covers",
|
||||
"intro": "Eight first-class sections, each backed by its own API endpoints:",
|
||||
"tableSection": "Section",
|
||||
"tableWhat": "What it shows",
|
||||
"sections": [
|
||||
{
|
||||
"name": "Health Monitor",
|
||||
"description": "Active and dismissed alerts across CPU, memory, storage, disks, network, services, logs, VMs, updates and security. Drives the notification engine."
|
||||
},
|
||||
{
|
||||
"name": "Storage",
|
||||
"description": "Proxmox storage pools, physical disks (SATA / NVMe / USB), SMART attributes, ZFS pool status, wear & lifetime, I/O activity."
|
||||
},
|
||||
{
|
||||
"name": "Network",
|
||||
"description": "All interfaces (physical, bonds, bridges, OVS), IP/MAC/state, real-time RX/TX graphs, historical RRD data per interface."
|
||||
},
|
||||
{
|
||||
"name": "VMs & Containers",
|
||||
"description": "Inventory of all VMs and LXCs with status, resources and uptime. Drill-in shows config, historical metrics, full guest logs and start/stop/reboot/shutdown actions."
|
||||
},
|
||||
{
|
||||
"name": "Hardware",
|
||||
"description": "CPU model and topology, memory layout, PCIe devices, GPU list with driver and per-slot real-time monitoring (NVIDIA / Intel iGPU)."
|
||||
},
|
||||
{
|
||||
"name": "Logs & Events",
|
||||
"description": "Live <code>journalctl</code> with severity / time-range / keyword filters, Proxmox task history, notification log, downloadable log bundles."
|
||||
},
|
||||
{
|
||||
"name": "Terminal",
|
||||
"description": "Browser shell to the host or to any VM/CT, powered by <code>xterm.js</code> over WebSockets. Authenticated and audited like the rest of the API."
|
||||
},
|
||||
{
|
||||
"name": "Security",
|
||||
"description": "Authentication failures, Fail2Ban jail status, recent ban events, integration with the host's <code>[proxmenux]</code> jail."
|
||||
}
|
||||
],
|
||||
"footer": "Every section has a dedicated documentation page under <link>Dashboard</link> in the sidebar."
|
||||
},
|
||||
"howItRuns": {
|
||||
"heading": "How it runs",
|
||||
"intro": "ProxMenux Monitor ships as a self-contained AppImage. A single systemd unit (<code>proxmenux-monitor.service</code>) starts a Flask process that:",
|
||||
"bullets": [
|
||||
"Listens on <strong>TCP 8008</strong> on the host (HTTP).",
|
||||
"Serves the Next.js dashboard as static assets under <code>/</code> and the API under <code>/api/*</code> from the same process.",
|
||||
"Pulls live data with standard host tools: <code>psutil</code>, <code>pvesh</code>, <code>smartctl</code>, <code>journalctl</code>, <code>zpool</code>, <code>ip</code>, <code>nvidia-smi</code>, etc.",
|
||||
"Persists its own state in a local SQLite database (<code>/usr/local/share/proxmenux/health_monitor.db</code>): dismissed alerts, disk observations, notification config, AI config. Authentication state lives separately in <code>/root/.config/proxmenux-monitor/auth.json</code>."
|
||||
],
|
||||
"footer": "The full request flow, file layout and the systemd integration are described in <link>Architecture</link>."
|
||||
},
|
||||
"noAgent": {
|
||||
"title": "No agent on the guests",
|
||||
"body": "The Monitor reads everything from the host. VMs and CTs do not need any agent installed — guest data comes from the Proxmox API and from the host's own kernel-level visibility into the running guests."
|
||||
},
|
||||
"access": {
|
||||
"heading": "Accessing the dashboard",
|
||||
"intro": "Two access patterns are supported and the application detects which one is in use:",
|
||||
"codeComment1": "# 1) Direct access on the host",
|
||||
"codeComment2": "# 2) Via reverse proxy (Nginx / Caddy / Traefik)",
|
||||
"afterCode": "When fronted by a reverse proxy, the Monitor honours <code>X-Forwarded-For</code>, <code>X-Forwarded-Proto</code> and <code>X-Forwarded-Host</code> so URLs and CORS behave correctly without manual configuration.",
|
||||
"footer": "First-launch setup, password + TOTP 2FA, and reverse-proxy snippets are covered in <link>Access & Authentication</link>."
|
||||
},
|
||||
"mobile": {
|
||||
"heading": "Mobile use and home-screen install",
|
||||
"intro": "The dashboard is responsive and ships as a Progressive Web App. The packaged <code>public/manifest.json</code> declares <code>display: standalone</code> with an app name, icon and theme colour, so adding the URL to the home screen produces a real standalone launcher — no browser address bar, custom splash, dark theme matched to the dashboard.",
|
||||
"phoneAlt": "ProxMenux Monitor running on a phone — main dashboard view",
|
||||
"phoneCaption": "Main dashboard on a phone — the layout reflows for small viewports.",
|
||||
"addHeading": "Add to home screen",
|
||||
"iosLabel": "iOS Safari:",
|
||||
"iosBody": "share button → <em>Add to Home Screen</em>. The icon comes from <code>/apple-touch-icon.png</code> shipped in the AppImage.",
|
||||
"androidLabel": "Android Chrome / Edge:",
|
||||
"androidBody": "three-dot menu → <em>Install app</em> (or <em>Add to Home screen</em> on older versions).",
|
||||
"afterInstall": "Once installed, opening the icon launches the dashboard in standalone mode with its own task switcher entry.",
|
||||
"onlineOnlyTitle": "Online-only",
|
||||
"onlineOnlyBody": "The PWA is installable but it is <strong>not</strong> offline-capable — there is no service worker. The launcher behaves like a native app, but the device still needs to reach the host on TCP 8008 (LAN, VPN or reverse-proxied HTTPS) for the dashboard to load."
|
||||
},
|
||||
"health": {
|
||||
"heading": "The Health Monitor and notifications",
|
||||
"alt": "Health Monitor screen showing the 10 categories tracked (CPU, memory, storage, disks, network, services, logs, VMs, updates, security) with current status",
|
||||
"caption": "Health Monitor view — the 10 categories tracked, with their current status. Active and dismissed alerts appear here when the system raises any.",
|
||||
"body1": "Inside the dashboard, the <strong>Health Monitor</strong> runs continuously in the background and produces a structured stream of events: high CPU temperature, disk SMART warnings, ZFS pool degradation, OOM kills, VM/CT failures, security incidents, and so on. Each event has a category, a severity (INFO / WARNING / CRITICAL) and a stable <code>error_key</code> so duplicates collapse instead of flooding the screen.",
|
||||
"feedsIntro": "Events feed three things at the same time:",
|
||||
"feedsHealth": "The <strong>Health Monitor view</strong> in the dashboard (active + dismissed lists).",
|
||||
"feedsChannels": "The <strong>notification engine</strong> — Telegram, Discord, Email, Gotify and Apprise (multi-channel). Each channel is configured independently and per-event categories can be silenced.",
|
||||
"feedsAI": "The optional <strong>AI assistant</strong> — when enabled, the configured provider (OpenAI, Anthropic, Gemini, Groq, Ollama or OpenRouter) explains incoming events in plain language and, if enabled in the AI settings, proposes next steps.",
|
||||
"suppressionTitle": "Suppression instead of mute-all",
|
||||
"suppressionBody": "Each category has its own <em>Suppression Duration</em>: once you dismiss an alert, the same alert is silenced for that window (default 24 hours, configurable per category up to permanent). Real escalations — e.g. CPU temperature crossing the critical threshold — always re-trigger regardless of suppression."
|
||||
},
|
||||
"api": {
|
||||
"heading": "REST API and integrations",
|
||||
"intro": "Everything the UI shows is available as JSON over HTTP/HTTPS. The same endpoints power Homepage widgets, Home Assistant sensors, Grafana dashboards (via the Prometheus exporter at <code>/api/prometheus</code>), Uptime Kuma probes and any custom script that speaks <code>curl</code>.",
|
||||
"tokens": "Long-lived API tokens (365 days) are generated from <strong>Settings → API Access Tokens</strong> or via <code>POST /api/auth/generate-api-token</code>.",
|
||||
"bearer": "Tokens travel as <code>Authorization: Bearer …</code>. Public endpoints (<code>/api/health</code>, <code>/api/auth/*</code>) work without a token so external uptime probes can hit the host without handing out credentials.",
|
||||
"catalog": "The full endpoint catalog, token rotation guidance and security best-practices live in <linkApi>API Reference</linkApi>; ready-made examples for Homepage, Home Assistant, Grafana, Uptime Kuma and a generic cURL pattern are in <linkIntegrations>Integrations</linkIntegrations>."
|
||||
},
|
||||
"serviceControl": {
|
||||
"heading": "Service control",
|
||||
"intro": "Day-to-day, the Monitor is managed exactly like any other systemd service. It is also exposed as two entries inside the ProxMenux TUI under <em>Settings</em>:",
|
||||
"codeComment": "# Manual control",
|
||||
"footer": "See <link>Settings → ProxMenux Monitor</link> for the in-menu toggle and status verification flow."
|
||||
},
|
||||
"nextSteps": {
|
||||
"heading": "Where to next",
|
||||
"items": [
|
||||
{
|
||||
"label": "Architecture",
|
||||
"description": "— Flask backend, systemd unit, SQLite schema, AI providers, notification channels."
|
||||
},
|
||||
{
|
||||
"label": "Access & Authentication",
|
||||
"description": "— first launch, password setup, TOTP 2FA, reverse-proxy configuration, Fail2Ban integration."
|
||||
},
|
||||
{
|
||||
"label": "Dashboard",
|
||||
"description": "— every section of the UI, one page each."
|
||||
},
|
||||
{
|
||||
"label": "API Reference",
|
||||
"description": "— every endpoint, request / response shape and token management."
|
||||
},
|
||||
{
|
||||
"label": "Integrations",
|
||||
"description": "— Homepage, Home Assistant, Grafana / Prometheus, Uptime Kuma, generic cURL pattern."
|
||||
}
|
||||
]
|
||||
}
|
||||
}
|
||||
@@ -0,0 +1,255 @@
|
||||
{
|
||||
"meta": {
|
||||
"title": "Proxmox Integrations — Homepage, Home Assistant, Grafana, Prometheus | ProxMenux Monitor",
|
||||
"description": "Copy-paste recipes for connecting ProxMenux Monitor to your homelab dashboards: Homepage, Home Assistant, Grafana via Prometheus, Uptime Kuma. Each recipe with the exact config the dashboards expect, the API endpoints used, and the auth header pattern.",
|
||||
"ogTitle": "Proxmox Integrations — Homepage, Home Assistant, Grafana, Prometheus",
|
||||
"ogDescription": "Cookbook for connecting ProxMenux Monitor to Homepage, Home Assistant, Grafana, Prometheus and Uptime Kuma.",
|
||||
"twitterTitle": "Proxmox Integrations | ProxMenux Monitor",
|
||||
"twitterDescription": "Recipes for Homepage, Home Assistant, Grafana, Prometheus and Uptime Kuma."
|
||||
},
|
||||
"header": {
|
||||
"title": "Integrations",
|
||||
"description": "Copy-paste recipes for plugging ProxMenux Monitor into the dashboards and tools your homelab already uses — Homepage, Home Assistant, Grafana via Prometheus, Uptime Kuma. Each recipe shows the exact config the receiving tool expects, the Monitor endpoint it talks to, and the auth header pattern that holds it together.",
|
||||
"section": "ProxMenux Monitor"
|
||||
},
|
||||
"intro": {
|
||||
"title": "What you can build from this page",
|
||||
"body": "Every recipe below is ready to copy into your tool of choice. The API endpoints used here are documented in the <link>API Reference</link> — this page is the \"here's how to actually use them in Homepage / Home Assistant / Grafana\" companion. Screenshots show the real output you'll see once the recipe is in place."
|
||||
},
|
||||
"auth": {
|
||||
"heading": "Authentication: the one thing every recipe needs",
|
||||
"intro": "Most endpoints used by these integrations are authenticated. You have two ways to satisfy that requirement.",
|
||||
"optAtitle": "Option A — API token (recommended for integrations)",
|
||||
"optAbody1": "Open the dashboard and go to <strong>Settings → Security → API Tokens</strong>. Click <em>Generate token</em>, give it a name (e.g. <em>homepage</em>, <em>home-assistant</em>, <em>prometheus</em>), and copy the token — it's shown <em>once</em>. Long-lived (one-year expiry by default), individually revocable, and what you should be using for any non-browser client.",
|
||||
"optAbody2": "From that point every request is just:",
|
||||
"optBtitle": "Option B — login flow (username + password)",
|
||||
"optBbody": "Useful for scripts that authenticate as a human user. The returned token is short-lived; most integrations should prefer Option A.",
|
||||
"outro": "The TOTP field is only required when 2FA is enabled on the account. Token rotation, revocation, password policy and the audit log live in <link>Access & Authentication</link>.",
|
||||
"httpsTitle": "Using HTTPS instead of HTTP",
|
||||
"httpsIntro": "Every recipe below uses <code>http://</code> in the URLs to keep the examples short. If you've enabled TLS on the Monitor (<strong>Settings → Security → SSL/HTTPS</strong>), swap <code>http://</code> for <code>https://</code> in every URL — that's the only change. Two notes specific to certain tools:",
|
||||
"httpsItems": [
|
||||
"<strong>Self-signed certificates.</strong> Home Assistant's <code>rest:</code> integration verifies TLS by default. If the Monitor is using its own self-signed cert, add <code>verify_ssl: false</code> to each REST block (alongside <code>scan_interval:</code>), or import the Monitor's CA into HA's trust store. Same for any tool that refuses untrusted certs.",
|
||||
"<strong>Prometheus</strong> already has <code>scheme: https</code> ready in the scrape config below; uncomment / leave it as <code>https</code> if TLS is enabled on the Monitor."
|
||||
]
|
||||
},
|
||||
"homepage": {
|
||||
"heading": "Homepage",
|
||||
"headingHref": "https://gethomepage.dev",
|
||||
"intro": "Homepage is a fully static, customizable application dashboard. ProxMenux Monitor plugs into it via the built-in <code>customapi</code> widget — paste a service entry into <code>services.yaml</code>, restart Homepage, and the card appears with live numbers.",
|
||||
"iconCalloutTitle": "The official ProxMenux logo is on dashboardicons.com",
|
||||
"iconCalloutBody": "The recipes below use <code>icon: proxmenux.png</code>. Homepage automatically resolves bare filenames against <a1>dashboardicons.com</a1> — a curated icon library for self-hosted dashboards. The ProxMenux entry lives at <a2>dashboardicons.com/icons/external/proxmenux</a2> and Homepage pulls it on first render. Same lookup works for thousands of other tools (Telegram, Discord, Grafana, Tailscale, etc.) — just write <code>icon: <name>.png</code> in any service entry.",
|
||||
"imageAlt": "Homepage dashboard showing three ProxMenux Monitor cards (EDGE, VOID, DREAM) with uptime, CPU, RAM and temperature for each Proxmox host",
|
||||
"imageCaption": "Three ProxMenux Monitor instances rendered as Homepage cards — uptime, CPU, RAM and CPU temperature read live from <code>/api/system</code> on each host every 10 s.",
|
||||
"basicTitle": "Basic widget — no authentication",
|
||||
"basicIntro": "Use this when ProxMenux Monitor is on a trusted network and you haven't enabled authentication on the Monitor side yet. The simplest possible <code>services.yaml</code> entry:",
|
||||
"authedTitle": "Authenticated widget",
|
||||
"authedIntro": "Generate an API token in <strong>Settings → Security → API Tokens</strong> on the Monitor, copy it, and paste it into the <code>Authorization</code> header below — replace the example token shown after <code>Bearer</code> with the one you just copied:",
|
||||
"authedOutro": "Restart Homepage and the card lights up with live values. Reuse the same token across all Homepage widgets pointing at the same ProxMenux Monitor host.",
|
||||
"multiTitle": "Multi-widget setup — system, storage, network",
|
||||
"multiIntro": "For a richer view, render three separate cards backed by different endpoints — one for system metrics, one for storage, one for network. Use the same token in every card; it's the same Monitor instance.",
|
||||
"multiCalloutTitle": "Multiple Proxmox hosts",
|
||||
"multiCalloutBody": "Repeat the entry block per host to get the multi-card layout in the screenshot above — each entry points at the <code>http://<host>:8008</code> URL of its own ProxMenux Monitor instance. The token can be different per host (one secret entry per host) or shared, depending on how you generate them."
|
||||
},
|
||||
"homeAssistant": {
|
||||
"heading": "Home Assistant",
|
||||
"headingHref": "https://www.home-assistant.io",
|
||||
"intro": "There is no native HACS integration for ProxMenux Monitor (yet) — but you don't need one. The built-in <code>rest</code> integration in Home Assistant can pull every endpoint documented in the <link>API Reference</link> and turn the responses into sensors, attributes and triggers. The complete reference build below exposes ~25 sensors covering system resources, the Health Monitor, VMs / CTs, storage, network, gateway latency and ProxMenux update status — drop the YAML into <code>configuration.yaml</code>, restart, and you have a full Proxmox observability layer inside HA.",
|
||||
"imageAlt": "Home Assistant dashboard showing ProxMenux Monitor entities — health status badge, CPU / RAM / Temp gauges, VM count, storage usage and active errors counter",
|
||||
"imageCaption": "ProxMenux Monitor as a first-class Home Assistant integration — sensors built from the YAML recipe below.",
|
||||
"step1Title": "1 · Store the API token",
|
||||
"step1Body": "Drop the token into Home Assistant's <code>secrets.yaml</code> so it never leaks into a config dump. The whole bearer prefix goes in one line — that lets the YAML reference it directly as a header value. Filename and location depend on your HA install (typically <code>/config/secrets.yaml</code> for HA OS / Container).",
|
||||
"step2Title": "2 · Drop in the REST configuration",
|
||||
"step2Body": "Six REST blocks cover the full surface — one per major Monitor area. Each block has a sensible <code>scan_interval</code> tuned to how often the underlying data changes (system resources every 30 s, health every 60 s, slowly-changing inventories every 5-10 min). Paste into <code>configuration.yaml</code>:",
|
||||
"step3Title": "3 · Add binary sensors and template helpers",
|
||||
"step3Body": "Two binary sensors and a couple of template sensors round out the integration — they make automations and Lovelace conditional cards much cleaner than chaining Jinja in every place.",
|
||||
"step4Title": "4 · Reload & verify",
|
||||
"step4Body": "From the HA UI: <em>Developer Tools → YAML → Check Configuration</em> first to validate the syntax, then <em>All YAML configuration</em> reload (or full restart). After it comes back, filter <em>Settings → Devices & Services → Entities</em> by <em>proxmenux</em> — you should see all ~25 entities populating within one scan interval.",
|
||||
"replaceTitle": "Replacing an earlier version of this recipe?",
|
||||
"replaceBody": "If you tried a previous version of these YAML blocks, Home Assistant's entity registry may have cached the old entity IDs and stale entities will still appear (with <em>Entidad no encontrada</em> warnings in your Lovelace cards). Clean state in two steps: delete the previous <code>rest:</code> and <code>template:</code> blocks from <code>configuration.yaml</code>, reload, and then under <em>Settings → Devices & Services → Entities</em> filter by <em>proxmenux</em> and remove any entries marked \"Restored\" or showing as unavailable. Then paste the current YAML and reload again — the new entities register cleanly.",
|
||||
"step5Title": "5 · Lovelace dashboard",
|
||||
"step5Body": "The YAML below is a single <strong>vertical-stack</strong> card that combines all the sub-cards in one block — header with logo, quick KPIs, system detail, VMs, storage, network and a conditional health-issues card. To use it: open your dashboard, click the pencil (edit), click <em>Add card</em>, scroll to the bottom and pick <em>Manual</em>, then paste:",
|
||||
"viewTipTitle": "Want it as a full dashboard view instead of a single card?",
|
||||
"viewTipBody": "Open the dashboard's 3-dot menu → <em>Raw configuration editor</em> and add a new view with this header (above the <code>cards:</code> list from the YAML above):",
|
||||
"viewTipOutro": "That gives you a dedicated tab/view in your dashboard with its own icon and title, instead of one long card on an existing view.",
|
||||
"altViewTitle": "Alternative — a dedicated dashboard view",
|
||||
"altViewIntro": "If you'd rather have a full dedicated page (its own tab in the dashboard sidebar) than a single card inside an existing view, Home Assistant lets you create a new view directly with YAML. Steps:",
|
||||
"altViewSteps": [
|
||||
"Open the dashboard where you want the new tab.",
|
||||
"Click the pencil (edit dashboard) at the top right.",
|
||||
"Click the <em>+</em> tab at the end of the existing tabs to create a new view.",
|
||||
"In the dialog that opens, switch to the <em>Code editor</em> tab (top right of the dialog — toggles between visual editor and YAML).",
|
||||
"Paste the YAML below.",
|
||||
"Save. The new <em>ProxMenux Monitor</em> tab appears in the sidebar with all cards rendered."
|
||||
],
|
||||
"twoEditorsTitle": "Two YAML editors in HA — pick the right one",
|
||||
"twoEditorsIntro": "Home Assistant has two YAML editors that look similar but expect different formats:",
|
||||
"twoEditorsItems": [
|
||||
"<strong>Single-view editor</strong> (this recipe) — opened from the <em>+</em> tab or from <em>Edit view → Code editor</em>. Expects the body of one view directly: <code>title:</code>, <code>path:</code>, <code>cards:</code> at the top level, no leading dash.",
|
||||
"<strong>Whole-dashboard Raw editor</strong> — opened from the dashboard's 3-dot menu. Expects the entire <code>views:</code> list, with each view as a list item (leading <code>-</code>)."
|
||||
],
|
||||
"twoEditorsOutro": "Pasting view-body YAML into the whole-dashboard editor (or vice versa) leaves you with a <em>Vista sin nombre</em> and <code>cards: []</code>. The YAML below is for the single-view editor — paste exactly as shown.",
|
||||
"viewImageAlt": "Home Assistant dedicated view rendering ProxMenux Monitor — picture-entity header, glance KPIs, and System / VMs / Storage / Network entity cards laid out automatically by HA across multiple columns",
|
||||
"viewImageCaption": "The dedicated <em>ProxMenux Monitor</em> view as Home Assistant renders it on a wide screen — HA's default layout splits the cards into multiple columns automatically.",
|
||||
"twoColTipTitle": "Want a fixed two-column layout instead of the auto layout?",
|
||||
"twoColTipBody": "Replace any pair of cards (e.g. <em>System</em> + <em>VMs</em>, or <em>Storage</em> + <em>Network</em>) with a single <code>horizontal-stack</code> wrapping both, so they always render side by side regardless of screen width:",
|
||||
"twoColTipOutro": "On mobile the row stays compressed; HA's auto layout (no horizontal-stack) reflows better at narrow widths.",
|
||||
"step6Title": "6 · Automations",
|
||||
"step6Body": "Three automations that cover the most common reactive scenarios — replace <code>notify.mobile_app_<your_phone></code> with whichever notify service you use:",
|
||||
"logoTitle": "About the ProxMenux logo",
|
||||
"logoBody": "The picture-entity card at the top of the Lovelace YAML pulls the official ProxMenux logo from <a1>dashboardicons.com</a1> — a free icon library curated for self-hosted dashboards. The ProxMenux entry lives at <a2>dashboardicons.com/icons/external/proxmenux</a2>. Home Assistant fetches the SVG over HTTPS on first render and caches it.",
|
||||
"logoBrokenTitle": "If the logo card shows a broken image",
|
||||
"logoBrokenIntro": "Some HA installs (firewalled networks, content blockers, hosts without public internet) can't reach jsdelivr.net at render time. The fix is a local copy:",
|
||||
"logoBrokenSteps": [
|
||||
"Download the SVG from <a>cdn.jsdelivr.net/gh/homarr-labs/dashboard-icons/svg/proxmenux.svg</a>.",
|
||||
"Save it to <code>/config/www/icons/proxmenux.svg</code> on your HA host.",
|
||||
"In the Lovelace YAML, replace the <code>image:</code> URL with <code>/local/icons/proxmenux.svg</code>. Save and reload — the image renders from the local file, no internet needed."
|
||||
],
|
||||
"scanTipTitle": "scan_interval rule of thumb",
|
||||
"scanTipBody": "<code>/api/system</code> is cheap to call — 30 s is fine. <code>/api/health/full</code> uses an internal 6-min cache, so polling it more often than ~60 s gains you nothing. <code>/api/storage/summary</code> changes slowly — every 5 min is plenty. <code>/api/proxmenux/update-status</code> only matters once an hour. Tune to your hardware budget if you have many sensors across many hosts."
|
||||
},
|
||||
"grafana": {
|
||||
"heading": "Prometheus + Grafana",
|
||||
"promHref": "https://prometheus.io",
|
||||
"grafanaHref": "https://grafana.com",
|
||||
"intro": "ProxMenux Monitor exposes a Prometheus-format scrape endpoint at <code>GET /api/prometheus</code> (authenticated) returning OpenMetrics text. Wire it into Prometheus, then build a Grafana dashboard on top — same data the dashboard UI shows, in the format your TSDB expects.",
|
||||
"imageAlt": "Grafana dashboard rendering ProxMenux Monitor metrics — CPU usage gauge, memory usage timeseries, running VMs count, network throughput",
|
||||
"imageCaption": "A basic Grafana dashboard built from the ProxMenux Prometheus scrape — CPU, memory, running VMs, network throughput. The full metric catalogue lives in the <link>API Reference → Prometheus metrics</link>.",
|
||||
"step1Title": "1 · Add the scrape job to Prometheus",
|
||||
"step1Body": "Pass the API token via Prometheus' native <code>authorization</code> block (cleaner than custom headers and works with secret stores):",
|
||||
"step1After": "Reload Prometheus (<code>kill -HUP</code> or <code>systemctl reload prometheus</code>, or <code>docker compose restart prometheus</code> if you run it as a container) and check <em>Status → Targets</em> — the proxmenux job should turn green within one scrape interval. Each metric carries a <code>node=\"<hostname>\"</code> label so you can distinguish hosts in queries.",
|
||||
"tokenTipTitle": "Token via file or env, not inline",
|
||||
"tokenTipBody": "For production deployments avoid inlining the token. Prometheus supports <code>credentials_file: /etc/prometheus/secrets/proxmenux.token</code> as an alternative — keep the token in a 0600 file and let Prometheus read it.",
|
||||
"step2Title": "2 · Verify the scrape with a couple of queries",
|
||||
"step2Body": "Before configuring Grafana, confirm Prometheus actually has the data. Open Prometheus' own UI at <code>http://<prometheus-host>:9090</code>, click <em>Query</em> and run any of these — you should get back live numbers from your Proxmox host:",
|
||||
"headerQuery": "Query",
|
||||
"headerConfirms": "What it confirms",
|
||||
"verifyRows": [
|
||||
{
|
||||
"query": "up{job=\"proxmenux\"}",
|
||||
"confirms": "Returns <code>1</code> if Prometheus is successfully scraping the Monitor, <code>0</code> if not. The fastest sanity check."
|
||||
},
|
||||
{
|
||||
"query": "proxmox_cpu_usage",
|
||||
"confirms": "Current CPU usage % of the Proxmox host. Should change if you refresh the query a few seconds apart."
|
||||
},
|
||||
{
|
||||
"query": "proxmox_vms_running",
|
||||
"confirms": "Number of running guests. Compare against what you see in the Proxmox UI."
|
||||
},
|
||||
{
|
||||
"query": "proxmox_uptime_seconds / 86400",
|
||||
"confirms": "Host uptime in days. Should match the value you'd see in <code>uptime</code> on the Proxmox shell."
|
||||
}
|
||||
],
|
||||
"calloutTitle": "The 401 you may see when clicking the endpoint URL is fine",
|
||||
"calloutBody": "On the <em>Status → Targets</em> page, clicking the endpoint link (<code>/api/prometheus</code>) makes your browser fetch it directly — without the bearer header that Prometheus uses for its own scrapes. So you'll see <code>'{'\"error\":\"Authentication required\"'}'</code>. That confirms the API is properly protected; Prometheus itself authenticates correctly because it has the token from the scrape config. Trust the green <em>State: UP</em>, not the click-through.",
|
||||
"step3Title": "3 · Add Prometheus as a Grafana data source",
|
||||
"step3Body": "In Grafana: <em>Connections → Data sources → Add new → Prometheus</em>. Set the URL to your Prometheus instance (e.g. <code>http://prometheus.lan:9090</code>), save and test. No extra auth needed at this layer — Prometheus has already authenticated to ProxMenux.",
|
||||
"step4Title": "4 · Build panels with these PromQL queries",
|
||||
"step4Body": "A starter set that maps directly to what users typically watch on a Proxmox host:",
|
||||
"headerPanel": "Panel idea",
|
||||
"headerPromql": "PromQL query",
|
||||
"panelRows": [
|
||||
{
|
||||
"panel": "CPU usage gauge per host",
|
||||
"promql": "proxmox_cpu_usage"
|
||||
},
|
||||
{
|
||||
"panel": "Memory usage gauge per host",
|
||||
"promql": "proxmox_memory_usage_percent"
|
||||
},
|
||||
{
|
||||
"panel": "Memory used vs total (timeseries)",
|
||||
"promql": "proxmox_memory_used_bytes / 1024 / 1024 / 1024"
|
||||
},
|
||||
{
|
||||
"panel": "Running VMs / CTs per host",
|
||||
"promql": "proxmox_vms_running"
|
||||
},
|
||||
{
|
||||
"panel": "CPU temperature",
|
||||
"promql": "proxmox_cpu_temperature_celsius"
|
||||
},
|
||||
{
|
||||
"panel": "Network throughput RX (bytes/s)",
|
||||
"promql": "rate(proxmox_interface_bytes_received_total[5m])"
|
||||
},
|
||||
{
|
||||
"panel": "Network throughput TX (bytes/s)",
|
||||
"promql": "rate(proxmox_interface_bytes_sent_total[5m])"
|
||||
},
|
||||
{
|
||||
"panel": "Load average (1m)",
|
||||
"promql": "proxmox_load_average{period=\"1m\"}"
|
||||
},
|
||||
{
|
||||
"panel": "Disk space used % per mountpoint",
|
||||
"promql": "proxmox_disk_usage_percent"
|
||||
},
|
||||
{
|
||||
"panel": "UPS battery charge",
|
||||
"promql": "proxmox_ups_battery_charge_percent"
|
||||
},
|
||||
{
|
||||
"panel": "GPU temperature per slot",
|
||||
"promql": "proxmox_gpu_temperature_celsius"
|
||||
}
|
||||
],
|
||||
"outro": "Add each query as a Grafana panel, set the right visualization (<em>Stat</em> for gauges, <em>Time series</em> for trends), and group panels into rows by category. Use the <code>node</code> label as a dashboard variable (<em>Settings → Variables → New → Query</em>, with the query <code>label_values(proxmox_cpu_usage, node)</code>) to filter all panels by host."
|
||||
},
|
||||
"uptimeKuma": {
|
||||
"heading": "Uptime Kuma and other status checkers",
|
||||
"href": "https://github.com/louislam/uptime-kuma",
|
||||
"intro": "For external probes, use <code>GET /api/system-info</code> — it is the one endpoint that works without a token, returning a small JSON payload with hostname, uptime and the overall health status (mapped to <code>healthy</code> / <code>warning</code> / <code>critical</code>). That's exactly what a keyword-based monitor needs.",
|
||||
"kumaTitle": "Uptime Kuma — HTTP keyword monitor",
|
||||
"kumaSteps": [
|
||||
"In Uptime Kuma, click <em>+ Add New Monitor</em>.",
|
||||
"Monitor Type: <em>HTTP(s) - Keyword</em>.",
|
||||
"Friendly Name: <em>ProxMenux Monitor — pve01</em>.",
|
||||
"URL: <code>http://pve01.lan:8008/api/system-info</code>.",
|
||||
"Keyword: <code>healthy</code> (the value of <code>health.status</code> when the host is OK).",
|
||||
"Heartbeat Interval: 60 seconds is enough.",
|
||||
"Save. No headers needed — the endpoint is public."
|
||||
],
|
||||
"healthchecksTitle": "healthchecks.io / cron-style pings",
|
||||
"healthchecksBody": "Same endpoint, same shape — point your cron-style ping at <code>/api/system-info</code> and assert <code>.health.status == \"healthy\"</code>. Most of these services accept a 2xx HTTP status as the \"up\" signal too, in which case even a curl without parsing is enough.",
|
||||
"richTitle": "Want richer health data?",
|
||||
"richBody": "For the full state (the ten Health Monitor categories + active errors + dismissed list), use <code>GET /api/health/full</code> instead — that one needs an API token but gives you everything the dashboard modal renders in a single response."
|
||||
},
|
||||
"workflows": {
|
||||
"heading": "n8n, Zapier and custom scripts",
|
||||
"intro": "For workflow tools and ad-hoc scripts that need to <em>raise</em> notifications through the Monitor (a CI failure, a smart-home sensor, a cron job that ran too long), the recipe is one POST to <code>/api/notifications/send</code>. The event flows through the same dispatch pipeline as anything emitted internally — dedup, cooldown, optional AI rewrite, fan-out to the configured channels.",
|
||||
"n8nBody": "In n8n, the equivalent is an <em>HTTP Request</em> node with method POST, the URL above, an <em>Authorization</em> header set to <code>Bearer '{''{'$credentials.proxmenux.token'}''}'</code> (using n8n credentials), and a JSON body matching the curl payload. Wire any preceding node as the trigger (cron, webhook, condition).",
|
||||
"severityBody": "Severity values are <code>INFO</code>, <code>WARNING</code> or <code>CRITICAL</code> (uppercase). The <code>data</code> payload is free-form JSON — the AI rewriter, when enabled, will pull anything useful from it for the rendered body. Full event-type semantics live in <link>Notifications → Event catalogue</link>."
|
||||
},
|
||||
"pveWebhook": {
|
||||
"heading": "Native Proxmox VE webhook (inbound)",
|
||||
"intro1": "Proxmox VE 8.1+ has its own notification system. ProxMenux Monitor registers itself as a webhook target so that everything PVE emits on its own (HA fencing, replication, vzdump from the GUI, certificate renewal) lands in the same dispatch pipeline as the Monitor's own events. This happens automatically when you press <em>Enable Notifications</em> on the Settings tab — no integration work required on the user side.",
|
||||
"intro2": "Mechanics, the body template PVE sends, the entries written to <code>/etc/pve/notifications.cfg</code>, and behaviour in clusters are documented in <link>Notifications → PVE webhook integration</link>."
|
||||
},
|
||||
"whereNext": {
|
||||
"heading": "Where to next",
|
||||
"items": [
|
||||
{
|
||||
"label": "API Reference",
|
||||
"href": "/docs/monitor/api",
|
||||
"tail": " — every endpoint with method, path and the full Prometheus metric catalogue."
|
||||
},
|
||||
{
|
||||
"label": "Notifications",
|
||||
"href": "/docs/monitor/notifications",
|
||||
"tail": " — event sources, channels, the dispatch pipeline, the PVE webhook integration in detail."
|
||||
},
|
||||
{
|
||||
"label": "AI Assistant",
|
||||
"href": "/docs/monitor/ai-assistant",
|
||||
"tail": " — the optional rewriter that turns templated bodies into plain language before they reach Telegram / Discord / email / Gotify."
|
||||
},
|
||||
{
|
||||
"label": "Access & Authentication",
|
||||
"href": "/docs/monitor/access-auth",
|
||||
"tail": " — minting and revoking the API tokens these recipes consume, audit log, TLS configuration."
|
||||
}
|
||||
]
|
||||
}
|
||||
}
|
||||
@@ -0,0 +1,483 @@
|
||||
{
|
||||
"meta": {
|
||||
"title": "Proxmox Notifications — Telegram, Discord, Email, Gotify, Apprise | ProxMenux Monitor",
|
||||
"description": "Send Proxmox VE notifications to Telegram, Discord, Email, Gotify and ~80 extra services via Apprise. ProxMenux Monitor turns events from the Health Monitor, the journal watcher and the Proxmox VE webhook into rich messages with deduplication, cooldown, burst aggregation, an optional AI rewrite and a complete history.",
|
||||
"ogTitle": "Proxmox Notifications — Telegram, Discord, Email, Gotify, Apprise",
|
||||
"ogDescription": "Send Proxmox VE alerts to Telegram, Discord, Email, Gotify and ~80 extra services via Apprise — with deduplication, cooldown, burst aggregation and an optional AI rewrite.",
|
||||
"twitterTitle": "Proxmox Notifications | ProxMenux Monitor",
|
||||
"twitterDescription": "Send Proxmox VE alerts to Telegram, Discord, Email, Gotify and ~80 extra services via Apprise."
|
||||
},
|
||||
"header": {
|
||||
"title": "Notifications",
|
||||
"description": "The fan-out engine that takes events from every collector inside the Monitor and delivers them to Telegram, Discord, Email, Gotify and ~80 extra services via Apprise — with deduplication, cooldown, burst aggregation, per-event and per-channel toggles, an optional AI rewriter, and a queryable history.",
|
||||
"section": "ProxMenux Monitor"
|
||||
},
|
||||
"intro": {
|
||||
"title": "Where messages come from",
|
||||
"body": "Notifications are not a separate scanner. They are the output side of every collector already running inside the Monitor — the <link>Health Monitor</link>, the journal watcher, the Proxmox task watcher, the PVE webhook hook, the polling collector and the in-process events emitted by ProxMenux scripts. Each event runs through the same dispatch pipeline before reaching a phone or an inbox."
|
||||
},
|
||||
"howItWorks": {
|
||||
"heading": "How it works",
|
||||
"intro": "Every notification follows the same path through the Monitor process. Events are produced by a handful of independent collectors, normalised into a structured payload, passed through a dispatch pipeline that decides whether to send and in what shape, optionally rewritten by an LLM, and finally fanned out to whichever channels the user has configured.",
|
||||
"arrowLabel": "event",
|
||||
"caption": "High-level flow. Every actual dispatch attempt — successful, aggregated or failed — is recorded in the SQLite history table for retrospective inspection. Events suppressed by the cooldown stage are not logged.",
|
||||
"nodes": {
|
||||
"sourcesLabel": "Sources",
|
||||
"sourcesDetail": "Health Monitor\nJournal watcher\nTask watcher\nPVE webhook hook\nPolling collector\nIn-process emitters",
|
||||
"dispatchLabel": "Dispatch pipeline",
|
||||
"dispatchDetail": "Per-event toggle\nFingerprint dedup\nCooldown\nBurst aggregation",
|
||||
"aiLabel": "AI rewrite (opt.)",
|
||||
"aiDetail": "OpenAI / Anthropic\nGemini / Groq\nOpenRouter / Ollama\n(off by default)",
|
||||
"channelsLabel": "Channels",
|
||||
"channelsDetail": "Telegram\nDiscord\nEmail (SMTP)\nGotify\nApprise (~80 services)"
|
||||
}
|
||||
},
|
||||
"enabling": {
|
||||
"heading": "Enabling the panel",
|
||||
"intro": "On a fresh install the Notifications card on the Settings tab shows a <em>Disabled</em> badge and a single <em>Enable Notifications</em> button. Nothing is dispatched and no PVE config is touched until you press it.",
|
||||
"disabledAlt": "Notifications card on a fresh install showing Disabled badge and a single Enable Notifications button",
|
||||
"disabledCaption": "The first state — one click to enable.",
|
||||
"stepsIntro": "Pressing the button does three things in sequence:",
|
||||
"steps": [
|
||||
"Flips the panel to its <em>Active</em> state and unfolds the channel form below.",
|
||||
"Registers a Proxmox VE webhook target in <code>/etc/pve/notifications.cfg</code> pointing at <code>POST http://127.0.0.1:8008/api/notifications/webhook</code>. From this moment on, anything Proxmox VE emits on its own (HA, replication, vzdump from the GUI) flows into the same pipeline as the Monitor's own events. See <pvelink>PVE webhook integration</pvelink> below for the full mechanics.",
|
||||
"Starts the dispatch background thread. The thread polls the event queue and walks every event through the pipeline diagrammed above."
|
||||
],
|
||||
"activeAlt": "Notifications card after enabling — Active badge, channel tabs (Telegram, Gotify, Discord, Email), Display Name field and Advanced AI Enhancement collapsible section",
|
||||
"activeCaption": "Active state — channel tabs at the top (Telegram / Gotify / Discord / Email), the Display Name field, the per-channel category list, and the collapsible <em>Advanced: AI Enhancement</em> section."
|
||||
},
|
||||
"sources": {
|
||||
"heading": "Event sources",
|
||||
"intro": "Six independent collectors feed the notification engine. They run as background threads inside the Monitor process and emit a structured <code>NotificationEvent</code> every time something happens.",
|
||||
"headerCollector": "Collector",
|
||||
"headerWatches": "Watches",
|
||||
"headerEvents": "Typical events",
|
||||
"rows": [
|
||||
{
|
||||
"collector": "Health Monitor",
|
||||
"watches": "Ten categories, every 5 minutes",
|
||||
"events": "<code>new_error</code>, <code>error_resolved</code>, <code>error_escalated</code>, <code>health_degraded</code>, <code>health_persistent</code>."
|
||||
},
|
||||
{
|
||||
"collector": "Journal watcher",
|
||||
"watches": "<code>journalctl --follow</code> with pattern matching for SSH / web auth failures, Fail2Ban bans (when the optional jail is installed), kernel I/O errors, OOM, smartd events.",
|
||||
"events": "<code>auth_fail</code>, <code>ip_block</code>, <code>oom_kill</code>, <code>disk_io_error</code>, <code>service_fail</code>."
|
||||
},
|
||||
{
|
||||
"collector": "Task watcher",
|
||||
"watches": "Polls <code>/var/log/pve/tasks/index</code> for new task UPIDs and follows their per-file logs.",
|
||||
"events": "<code>backup_start</code>, <code>backup_complete</code>, <code>backup_warning</code>, <code>backup_fail</code>, <code>migration_*</code>, <code>snapshot_complete</code>."
|
||||
},
|
||||
{
|
||||
"collector": "Proxmox webhook hook",
|
||||
"watches": "Listens on <code>POST /api/notifications/webhook</code>. Proxmox VE 8.1+ pushes its own notifications here once the integration is set up (see <pvelink>below</pvelink>).",
|
||||
"events": "Anything PVE emits — including events the Monitor would otherwise miss (HA, replication, vzdump from the GUI)."
|
||||
},
|
||||
{
|
||||
"collector": "Polling collector",
|
||||
"watches": "Periodic comparisons (cluster nodes online, certificate expiry, GPU passthrough state, PVE / ProxMenux update availability).",
|
||||
"events": "<code>node_disconnect</code>, <code>node_reconnect</code>, <code>pve_update</code>, <code>proxmenux_update</code>, <code>gpu_mode_switch</code>, <code>pci_passthrough_conflict</code>."
|
||||
},
|
||||
{
|
||||
"collector": "In-process emitters",
|
||||
"watches": "Direct calls from ProxMenux scripts and from the Monitor itself (<code>notification_manager.emit_event(...)</code>).",
|
||||
"events": "<code>system_startup</code>, <code>system_shutdown</code>, <code>system_reboot</code>, <code>ai_model_migrated</code>, custom test events."
|
||||
}
|
||||
],
|
||||
"after1": "Every event carries a stable <code>event_type</code> (the catalogue is below), a <code>severity</code> (<code>INFO</code>, <code>WARNING</code>, <code>CRITICAL</code>), a <code>category</code> (used for emoji enrichment and per-group filters) and a <code>data</code> payload with anything the template needs (<code>vmid</code>, <code>device</code>, <code>source_ip</code>, <code>reason</code>…).",
|
||||
"after2": "Each <code>event_type</code> has a matching template in <code>notification_templates.py</code> that renders the structured event into a plain-text body before anything else happens. That templated body is what travels through the dispatch pipeline, and what the optional AI layer rewrites if enabled. See the <ailink>AI Assistant page</ailink> for how the rewrite layer interacts with this templated body."
|
||||
},
|
||||
"channels": {
|
||||
"heading": "Channel walkthroughs",
|
||||
"intro": "Five channels are currently supported: Telegram, Discord, Gotify, Email (SMTP) and Apprise. The first four are native — each one has its own tab inside the Notifications panel with a <em>+ setup guide</em> link opening an in-app modal. Apprise is a generic hub that adds ~80 additional services (ntfy, Matrix, Pushover, Slack, Teams, Pushbullet, AWS SNS, Mattermost…) through a single URL field. They are all documented step by step below.",
|
||||
"credsTitle": "Where credentials live",
|
||||
"credsBody": "Tokens, webhook URLs and SMTP passwords are stored locally in the Monitor's SQLite database under <code>/usr/local/share/proxmenux/</code>. They never leave the host except to reach their respective services. A backup of that directory is enough to recover the configured channels."
|
||||
},
|
||||
"telegram": {
|
||||
"heading": "Telegram",
|
||||
"intro": "Two pieces of information are required: a <strong>Bot Token</strong> (one per bot, reusable across chats) and a <strong>Chat ID</strong> (where the bot should post — your private chat, a group, or a topic inside a supergroup). The in-app guide below contains the full step-by-step; the rest of this section repeats it as text plus the two shapes the Chat ID can take.",
|
||||
"guideAlt": "Telegram Bot Setup Guide modal with four numbered sections: Create a Bot with BotFather, Get the Bot Token, Get Your Chat ID and For Groups or Channels",
|
||||
"guideCaption": "The <em>+ setup guide</em> link inside the Telegram tab opens this modal — the four numbered steps go from no bot to a working channel in about two minutes.",
|
||||
"step1Title": "1 · Create a bot with BotFather",
|
||||
"step1Items": [
|
||||
"Open Telegram and start a chat with <a>@BotFather</a> (the one with the blue verification tick — copies are common).",
|
||||
"Send <code>/newbot</code>.",
|
||||
"Pick a display name (e.g. <em>ProxMenux Lab</em>). It can be changed later.",
|
||||
"Pick a username ending in <code>bot</code> (e.g. <em>proxmenux_lab_bot</em>). It must be unique across Telegram.",
|
||||
"BotFather replies with a token of the form <code>123456789:ABCdef…</code> — that is the Bot Token. Treat it as a password."
|
||||
],
|
||||
"step2Title": "2 · Get the Chat ID",
|
||||
"step2Intro": "The Chat ID identifies <em>where</em> the bot posts. It takes one of two shapes depending on the target.",
|
||||
"privateLabel": "Private chat (you receive the alerts on your own account):",
|
||||
"privateItems": [
|
||||
"Start a chat with your new bot and send any message (e.g. <code>/start</code>).",
|
||||
"Open a chat with <a1>@userinfobot</a1> (or <a2>@myidbot</a2>) and send <code>/start</code>. It replies with your numeric user ID — that is the Chat ID. It is a positive number."
|
||||
],
|
||||
"privateAlt": "Telegram channel form filled with Bot Token (masked), positive Chat ID for a private chat and an empty optional Topic ID field",
|
||||
"privateCaption": "Private chat with the bot — Chat ID is a positive number (your personal user ID).",
|
||||
"groupLabel": "Group or supergroup with topics:",
|
||||
"groupItems": [
|
||||
"Add the bot to the group as a member (and make it admin if the group requires it to post).",
|
||||
"Send any message in the group.",
|
||||
"Open <code>https://api.telegram.org/bot<YOUR_TOKEN>/getUpdates</code> in a browser. Look for <code>chat.id</code> in the JSON response — for groups it is a negative number, for supergroups it starts with <code>-100</code>.",
|
||||
"For supergroups with <em>Topics</em> enabled, also note the <code>message_thread_id</code> of the topic you want to target — that goes in the optional <em>Topic ID</em> field."
|
||||
],
|
||||
"groupAlt": "Telegram channel form with Bot Token (masked), negative Chat ID prefixed with -100 indicating a supergroup, and Topic ID 3 set to deliver into a specific topic",
|
||||
"groupCaption": "Supergroup — Chat ID starts with <code>-100…</code> and the optional <em>Topic ID</em> targets a specific thread.",
|
||||
"step3Title": "3 · Save and test",
|
||||
"step3Body": "Paste the Bot Token and Chat ID into the Telegram tab, save, and press <em>Send Test</em> at the bottom of the panel. A test message should arrive within a second; if it doesn't, the History section records the failure with the exact reason (invalid token, bot not in group, blocked by user, etc.)."
|
||||
},
|
||||
"discord": {
|
||||
"heading": "Discord",
|
||||
"intro": "Discord channels accept incoming messages through a <em>Webhook URL</em> tied to a single channel. The Monitor needs that URL and nothing else.",
|
||||
"items": [
|
||||
"In Discord, open the server where you want notifications to land and go to <em>Server Settings → Integrations → Webhooks</em>.",
|
||||
"Click <em>New Webhook</em>. Give it a name (e.g. <em>ProxMenux</em>) and pick the channel it should post to. An avatar is optional.",
|
||||
"Click <em>Copy Webhook URL</em> — it looks like <code>https://discord.com/api/webhooks/<id>/<token></code>.",
|
||||
"Paste it in the Webhook URL field of the Discord tab in the Notifications panel and save."
|
||||
],
|
||||
"imageAlt": "Discord channel form with Webhook URL field starting with https://discord.com/api/webhooks/",
|
||||
"imageCaption": "Discord — paste the Webhook URL from <em>Server Settings → Integrations → Webhooks</em>."
|
||||
},
|
||||
"gotify": {
|
||||
"heading": "Gotify",
|
||||
"intro": "Gotify is a self-hosted push server. You need its base URL and an <em>Application Token</em> generated from the Gotify admin UI.",
|
||||
"items": [
|
||||
"If you don't already have a Gotify instance, install one — see the <a>official install guide</a>.",
|
||||
"Open the Gotify web UI, log in as admin, go to <em>Apps</em> → <em>Create Application</em>. Give it a name (e.g. <em>ProxMenux</em>). Gotify generates a token — copy it.",
|
||||
"In the Gotify tab of the Notifications panel, set <em>Server URL</em> to the base URL of your instance (e.g. <code>https://gotify.example.com</code>) and paste the App Token.",
|
||||
"Save and press <em>Send Test</em>."
|
||||
],
|
||||
"imageAlt": "Gotify channel form with Server URL field set to https://gotify.example.com and an App Token field with placeholder A_valid_gotify_token",
|
||||
"imageCaption": "Gotify — server URL of your self-hosted instance plus the App Token from the Gotify admin UI."
|
||||
},
|
||||
"email": {
|
||||
"heading": "Email (SMTP)",
|
||||
"intro": "Email is the most flexible channel — and the one with the most fields. You need an SMTP server, a port, a TLS mode, optionally a username and password, a sender address and at least one recipient.",
|
||||
"imageAlt": "Email channel form with SMTP Host, Port, TLS Mode dropdown, Username, Password, From Address, To Addresses comma-separated and Subject Prefix fields",
|
||||
"imageCaption": "Email — SMTP host / port / TLS mode, optional username + password, sender address, comma-separated recipients and a subject prefix to make alerts easy to filter inbox-side.",
|
||||
"appNote": "If you use a personal Gmail or Microsoft 365 account, the password field cannot be your normal account password — both providers require an <strong>app password</strong> generated specifically for third-party clients. The two flows are below.",
|
||||
"gmailTitle": "Gmail app password",
|
||||
"gmailIntro": "Gmail app passwords require <strong>2-Step Verification</strong> to be active on the Google account. If it isn't, the <em>App passwords</em> page is not available.",
|
||||
"gmailItems": [
|
||||
"Open <a>myaccount.google.com/security</a> and turn on <em>2-Step Verification</em> if it's not already on.",
|
||||
"Go to <a>myaccount.google.com/apppasswords</a>.",
|
||||
"Type a name (e.g. <em>ProxMenux</em>) and click <em>Create</em>. Google shows a 16-character password — copy it.",
|
||||
"Fill the Email tab with: <em>Host</em> <code>smtp.gmail.com</code>, <em>Port</em> <code>587</code>, <em>TLS Mode</em> <code>STARTTLS</code>, <em>Username</em> your Gmail address, <em>Password</em> the 16-character app password."
|
||||
],
|
||||
"outlookTitle": "Microsoft / Outlook app password",
|
||||
"outlookIntro": "Microsoft now requires <strong>two-step verification</strong> on the personal account before an app password can be created. Enterprise tenants where the admin has disabled SMTP basic auth need a different path (OAuth2) which is not currently supported by the Monitor — point those at an SMTP relay you control instead.",
|
||||
"outlookItems": [
|
||||
"Open <a>account.microsoft.com/security</a> and enable two-step verification.",
|
||||
"Open <em>Advanced security options</em>, scroll to <em>App passwords</em> and click <em>Create a new app password</em>.",
|
||||
"Microsoft shows a long random password — copy it.",
|
||||
"Fill the Email tab with: <em>Host</em> <code>smtp-mail.outlook.com</code>, <em>Port</em> <code>587</code>, <em>TLS Mode</em> <code>STARTTLS</code>, <em>Username</em> your Outlook / Microsoft 365 address, <em>Password</em> the generated app password."
|
||||
],
|
||||
"relayTitle": "Self-hosted SMTP relay",
|
||||
"relayBody": "If you run your own SMTP relay (Postfix, msmtp, etc.) on the LAN, point the Monitor at it and skip the app-password dance entirely. The relay handles auth upstream and the Monitor sends in cleartext on a trusted network."
|
||||
},
|
||||
"apprise": {
|
||||
"heading": "Apprise (generic hub for ~80 services)",
|
||||
"intro": "Apprise is an open-source notification library that speaks the protocol of around 80 different services through a single URL format. Adding it as one more channel inside the Monitor means you can deliver alerts to services that don't have a dedicated tab — ntfy, Matrix, Pushover, Slack, Microsoft Teams, Mattermost, Pushbullet, AWS SNS, Pushsafer, Rocket.Chat, Signal API and many others — without ProxMenux having to implement each integration separately.",
|
||||
"listIntro": "The full list of supported services and the exact URL format for each one lives in the official Apprise wiki:",
|
||||
"listItems": [
|
||||
"<a>github.com/caronc/apprise/wiki</a> — full index of supported services.",
|
||||
"<a>URL basics</a> — how Apprise URLs are structured."
|
||||
],
|
||||
"stepsTitle": "Steps",
|
||||
"steps": [
|
||||
"Pick the target service in the <a>Apprise wiki</a> and copy the URL template for it. Each service page shows the exact scheme to use (<code>ntfy://</code>, <code>matrix://</code>, <code>pover://</code>, <code>slack://</code>…) plus any required tokens, channels or hostnames.",
|
||||
"Fill in the placeholders with your own credentials. For example, an ntfy.sh topic looks like <code>ntfy://ntfy.sh/my-topic</code>; a Pushover URL looks like <code>pover://user@token</code>; a Matrix URL looks like <code>matrix://user:pass@host:port/#room</code>.",
|
||||
"Paste the final URL into the <em>Apprise URL</em> field in the Apprise tab of the Notifications panel and save.",
|
||||
"Press <em>Send Test</em> to verify the URL is reachable and the credentials are accepted."
|
||||
],
|
||||
"deliveredTitle": "What gets delivered",
|
||||
"deliveredBody": "Apprise receives the same payload as the other channels — title, body and a severity (info / success / warning / failure). Severity is mapped to whatever the destination service exposes (icon, priority, colour). Rich-message formatting and the AI rewrite layer all run before the URL is invoked, exactly like for Telegram or Email.",
|
||||
"fanoutTitle": "One URL per Apprise channel",
|
||||
"fanoutBody": "The Monitor exposes a single URL slot per Apprise channel. If you need to fan-out to several Apprise services at once (e.g. ntfy.sh plus a Matrix room), the cleanest approach is to host a small <a>Apprise API server</a> with a tagged config and point the Monitor at its endpoint — the server then broadcasts to every URL behind that tag."
|
||||
},
|
||||
"rich": {
|
||||
"heading": "Rich messages, categories and per-channel filtering",
|
||||
"intro": "Below the channel form every channel exposes the same three controls: a <em>Rich messages</em> toggle at the top (highlighted with the arrow in the screenshot), eleven collapsible <em>Notification Categories</em> with per-event toggles, and a <em>Send Test</em> button at the bottom.",
|
||||
"imageAlt": "Notification Categories panel with Rich messages master toggle highlighted at top, collapsible sections for VM/CT, Backups, Resources, Storage, Network, Security, Cluster, Services, Health Monitor, Updates each with toggle and event count, and a Send Test button",
|
||||
"imageCaption": "Top arrow — the per-channel <em>Rich messages</em> toggle. Below — the eleven collapsible categories with per-event toggles. <em>Send Test</em> sits at the bottom of the channel.",
|
||||
"richTitle": "Rich messages",
|
||||
"richIntro": "With <em>Rich messages</em> on, every event header is prefixed with a category emoji and the body is rendered using the channel's native formatting (Telegram HTML, Discord embed with severity colour). With it off, the Monitor sends a plain-text version with the same information minus the visual cues. Same content, different presentation:",
|
||||
"plainHeader": "Plain — Rich messages off",
|
||||
"richHeader": "Rich — Rich messages on",
|
||||
"richOutro": "The toggle is per-channel: leave Email plain for inbox-rule readability while letting Telegram and Discord render the rich version. Channels that don't support inline formatting (plain-text email, Gotify) ignore the formatting and fall back to text either way.",
|
||||
"togglesTitle": "Per-event categories",
|
||||
"togglesIntro": "Around seventy event types are grouped into eleven UI categories. Each event has a master toggle and a per-channel override — two layers that decide whether a given event reaches a given channel:",
|
||||
"togglesItems": [
|
||||
"<strong>Per-event master toggle.</strong> If <code>vm_start</code> is off everywhere, no channel ever sees a <code>vm_start</code>. Toggles persist as <code>event_toggles[event_type] = true | false</code>.",
|
||||
"<strong>Per-channel overrides.</strong> An event type can also be muted for a specific channel (<em>\"send <code>backup_complete</code> to Discord but not to Telegram\"</em>). These live in <code>channel_overrides[channel_name][event_type]</code> and only apply if the event passed the master toggle."
|
||||
],
|
||||
"togglesOutro": "Each category header in the screenshot also shows the count of events <em>currently enabled</em> / <em>total</em> for that group, and a category-level toggle that flips every event inside it on or off in one click. That is the shortcut for muting a whole group (e.g. all <code>INFO</code> backup events, all update-related events) without expanding the section."
|
||||
},
|
||||
"quiet": {
|
||||
"heading": "Quiet Hours",
|
||||
"intro": "Quiet Hours is a per-channel time window during which the dispatcher only lets <strong>CRITICAL</strong> events through. Everything else — INFO, WARNING, action events — is held back, persisted to disk, and delivered as a single grouped summary the moment the window closes. The channel still gets the urgent things in real time; the noise waits until you're likely to want it.",
|
||||
"imageAlt": "Channel settings showing both knobs side-by-side: Quiet Hours card with toggle on, Start 22:00 and End 07:00 plus a live preview of the next transition, and right below it the Daily digest card with its own toggle, a delivery time picker set to 09:00 and the note that CRITICAL and WARNING are never delayed",
|
||||
"imageCaption": "Both knobs live side-by-side inside each channel's settings card — Quiet Hours on top, Daily digest underneath. Independent per channel.",
|
||||
"purposeTitle": "What it is for",
|
||||
"purposeItems": [
|
||||
"<strong>Don't wake me at 03:00 for an update notice.</strong> Backups, app updates, post-install optimisations and other INFO-level events stop pinging your phone at night.",
|
||||
"<strong>But still wake me for a fire.</strong> Disk failures, OOM kills, host shutdowns, Fail2Ban bans and anything else classified as CRITICAL bypass the window and arrive immediately.",
|
||||
"<strong>Don't miss anything either.</strong> Events suppressed during the window aren't silently dropped: they sit in a SQLite buffer and are delivered as a grouped summary when the window closes."
|
||||
],
|
||||
"howTitle": "How it works",
|
||||
"howItems": [
|
||||
"<strong>Per-channel toggle.</strong> Each channel has its own Quiet Hours config — Telegram can be silent 22:00–07:00 while email keeps receiving everything 24/7.",
|
||||
"<strong>Start and end time</strong> in your local timezone, half-open interval (start inclusive, end exclusive). The window can cross midnight (e.g. 22:00–07:00 means tonight until tomorrow morning).",
|
||||
"<strong>Live preview line</strong> right below the inputs shows whether the window is currently active and when the next transition happens. Saves opening a clock.",
|
||||
"<strong>During the window:</strong> CRITICAL events still fire through the normal dispatch pipeline. INFO and WARNING events are routed to a persistent buffer (<code>quiet_pending</code> table in the Monitor's SQLite DB).",
|
||||
"<strong>When the window closes:</strong> a single grouped notification is sent with everything that accumulated — one line per buffered event, in chronological order. The buffer is cleared only after the channel confirms delivery, so a transient Telegram / SMTP outage doesn't lose the night's context.",
|
||||
"<strong>Across restarts.</strong> If the Monitor restarts mid-window, the buffer is intact on disk. If the restart happens just after the window closed, the next dispatch cycle detects the pending rows and flushes them with a single \"recovery\" summary — no notifications are lost to a deploy or a reboot."
|
||||
],
|
||||
"criticalTitle": "What counts as CRITICAL",
|
||||
"criticalBody": "Severity is set at event creation, not at dispatch time. Disk failures, OOM kills, cluster split-brain, host shutdowns and the \"hard\" tier of disk I/O errors ship as CRITICAL by design. Everything else (backups OK, updates available, INFO logs, rate-limit hits) defaults to INFO or WARNING and is therefore quietable. You can verify a given event's default severity in the <link>Event catalogue</link> further down this page."
|
||||
},
|
||||
"digest": {
|
||||
"heading": "Daily digest of INFO events",
|
||||
"intro1": "The Daily Digest is the opposite knob: an <strong>opt-in</strong> setting that says \"don't send me every successful backup or update notice as it happens — collect them and send me one summary per day at 09:00 (or whatever hour I choose)\". Same goal as Quiet Hours (less noise) but a different mechanism (time-based summary instead of a daily window).",
|
||||
"intro2": "It lives in the same channel-settings card as Quiet Hours (see the figure under <link>Quiet Hours</link>), right underneath. You enable each one independently.",
|
||||
"purposeTitle": "What it is for",
|
||||
"purposeItems": [
|
||||
"<strong>The morning \"everything that happened\" recap.</strong> If you check on the host once a day with a coffee, one digest at 09:00 carries the same information as 20 individual pings throughout the previous day, without you reading 20 Telegram bubbles.",
|
||||
"<strong>Separate noise from signal.</strong> INFO events answer \"what happened\"; CRITICAL and WARNING answer \"what do I need to do right now\". The digest handles the first; everything else keeps its live delivery."
|
||||
],
|
||||
"howTitle": "How it works",
|
||||
"howItems": [
|
||||
"<strong>Per-channel opt-in.</strong> Off by default — Telegram doesn't silently batch your alerts. You enable it on the channels where you want a digest, leaving others on live delivery.",
|
||||
"<strong>Delivery time</strong> in your local timezone. Defaults to 09:00 but you can pick any time; the dispatcher fires the digest within ~60 s of that minute.",
|
||||
"<strong>What goes into the digest:</strong> any event the channel would have received live whose severity is <strong>INFO</strong>. Examples — <em>vzdump complete</em>, <em>Tailscale update available</em>, <em>ProxMenux optimisation update available</em>, <em>APT security updates pending</em>, <em>rate-limit hit</em>.",
|
||||
"<strong>What is never delayed:</strong>",
|
||||
"<strong>Persistence.</strong> Pending events sit in a SQLite table (<code>digest_pending</code>) until the configured hour. The Monitor can restart freely without losing what the digest will eventually contain.",
|
||||
"<strong>Empty days are silent.</strong> If nothing INFO-level happened, no digest is sent — the channel stays quiet rather than receiving a \"no events to report\" message."
|
||||
],
|
||||
"neverDelayedSub": [
|
||||
"<strong>CRITICAL</strong> events always go through immediately.",
|
||||
"<strong>WARNING</strong> events always go through immediately.",
|
||||
"Live-action events (VM/CT start / stop / shutdown / restart, vm_fail / ct_fail, backup start / fail, replication start / fail, host shutdown / reboot) bypass the digest even at INFO severity — you opted in to see those live, the digest would defeat that opt-in."
|
||||
],
|
||||
"comboTitle": "Combining Quiet Hours and Daily Digest",
|
||||
"comboBody": "The two work together. A channel can have <em>both</em> active — Quiet Hours from 22:00 to 07:00 plus a Daily Digest at 09:00. INFO events during the quiet window go to the quiet buffer and arrive at 07:00 as the close-of-window summary; INFO events during the day go to the digest buffer and arrive at 09:00 the next morning. CRITICAL and WARNING always cut through both. Choose Quiet Hours when the goal is a <em>window of silence</em>, the Daily Digest when the goal is a <em>fixed-time summary</em>; many setups want both."
|
||||
},
|
||||
"displayName": {
|
||||
"heading": "Display Name",
|
||||
"intro": "Every notification carries a <em>Display Name</em> — the label that identifies which host produced the alert. It is the value you see at the bottom of the rich-messages example above (<code>🏠 home-lab</code>) and inside the email subject prefix.",
|
||||
"imageAlt": "Display Name field with the value amd shown as example, label Name shown in notifications - edit to customize or leave empty to use the system hostname",
|
||||
"imageCaption": "The Display Name field — leave empty to use the system hostname, or override with anything you want.",
|
||||
"outro": "If the field is empty, the Monitor falls back to the system hostname. The override is mostly useful when you run several ProxMenux hosts that send to the same Telegram chat or inbox — a friendlier label (<em>home-lab</em>, <em>office-pve</em>) is easier to read than <code>pve01.lan</code> or <code>pmx-prod-01</code>."
|
||||
},
|
||||
"dispatch": {
|
||||
"heading": "Dispatch pipeline",
|
||||
"intro": "Between an event being raised and a message leaving the host, three stages run in this order:",
|
||||
"headerStage": "Stage",
|
||||
"headerWhat": "What it does",
|
||||
"headerTunable": "Tunable?",
|
||||
"rows": [
|
||||
{
|
||||
"stage": "1. Fingerprint dedup",
|
||||
"what": "Each event is hashed into a fingerprint (<code>event_type + key fields from data</code>). Identical fingerprints inside a short window are considered duplicates of the first one.",
|
||||
"tunable": "No — internal dispatcher logic."
|
||||
},
|
||||
{
|
||||
"stage": "2. Cooldown",
|
||||
"what": "After a fingerprint is sent, the same fingerprint is suppressed for the per-severity cooldown duration. Stored in the <code>notification_last_sent</code> SQLite table so it survives restarts. Defaults: <code>CRITICAL</code> 60 s, <code>WARNING</code> 300 s, <code>INFO</code> 900 s, plus a per-category override on top (e.g. <code>resources</code> 900 s, <code>updates</code> 86 400 s).",
|
||||
"tunable": "No — defaults baked into the dispatcher."
|
||||
},
|
||||
{
|
||||
"stage": "3. Burst aggregation",
|
||||
"what": "When N events of a kind arrive inside a short window (e.g. an SSH brute-force flood), they are merged into a single <code>burst_*</code> message with a count and a sample.",
|
||||
"tunable": "No — window and threshold are hard-coded per event type."
|
||||
}
|
||||
],
|
||||
"calloutTitle": "Dispatch happens in a background thread",
|
||||
"calloutBody": "The dispatch loop runs in its own thread. The HTTP request that emits an event returns as soon as the event is queued — it does not wait for Telegram, SMTP or webhook RTT. Every send result is recorded in the history table for retrospective inspection."
|
||||
},
|
||||
"aiRewrite": {
|
||||
"heading": "Optional AI rewrite",
|
||||
"body1": "Any event can be passed through an LLM that rewrites its body in plain language and (optionally) in the target user's language before fan-out. The AI rewriter is off by default. When enabled it runs in the dispatch thread; if the provider call fails or times out, the original templated body is used instead.",
|
||||
"body2": "Six providers are supported (OpenAI, Anthropic, Google Gemini, Groq, OpenRouter and local Ollama), with per-channel detail level (<code>brief</code>, <code>standard</code>, <code>detailed</code>), output language, prompt mode (<code>default</code> or <code>custom</code>) and an optional custom prompt. Full configuration walk-through, captures and prompt examples live in the dedicated <link>AI Assistant</link> page.",
|
||||
"privacyTitle": "Privacy note",
|
||||
"privacyBody": "AI rewrite sends the event body — which can include hostnames, IP addresses, usernames, error messages and journal lines — to the configured provider. Ollama keeps everything on-host; the other five providers transmit data to their respective endpoints. Disable the rewriter, or use Ollama, if the host runs in an environment where event content cannot leave the network."
|
||||
},
|
||||
"pveWebhook": {
|
||||
"heading": "PVE webhook integration",
|
||||
"intro1": "Proxmox VE 8.1+ has its own notification system with built-in <em>endpoints</em> (sendmail, gotify, SMTP, webhook). When you enable Notifications on the Monitor, it registers itself as one of those endpoints — a <code>webhook</code> target that points back at the Monitor's own API. From that moment on, anything Proxmox itself emits (HA fencing, replication, vzdump from the GUI, certificate renewal, etc.) flows through the same dispatch pipeline as the Monitor's own events.",
|
||||
"intro2": "The target is visible from the Proxmox GUI at <em>Datacenter → Notifications → Notification Targets</em>:",
|
||||
"imageAlt": "Proxmox VE Edit Webhook dialog showing the auto-created proxmenux-webhook target with method POST, URL http://127.0.0.1:8008/api/notifications/webhook and a JSON body template using escape title, escape message, escape severity, escape timestamp and json fields",
|
||||
"imageCaption": "The PVE-side webhook target as Proxmox sees it (the GUI is in the host's configured locale — Spanish in this example). Same fields apply in any language.",
|
||||
"registeredIntro": "What gets registered:",
|
||||
"registeredItems": [
|
||||
"<strong>Method & URL.</strong> <code>POST http://127.0.0.1:8008/api/notifications/webhook</code>. Loopback only — PVE talks to the Monitor process running on the same host.",
|
||||
"<strong>Body template.</strong> A JSON body using PVE's native Handlebars helpers — stored base64-encoded in the config file by PVE, but it expands to:",
|
||||
"<strong>Matcher.</strong> A companion <code>matcher: proxmenux-matcher</code> block with <code>mode all</code> so every PVE notification reaches the target.",
|
||||
"<strong>Companion priv block.</strong> An empty <code>webhook: proxmenux-webhook</code> entry is appended to <code>/etc/pve/priv/notifications.cfg</code>. PVE refuses to instantiate any webhook endpoint without a matching private block, even when no secrets are needed — so the Monitor writes a header-only stub there. No tokens, headers or HMAC are configured on the PVE side."
|
||||
],
|
||||
"securityTitle": "How the receiver is secured",
|
||||
"securityIntro": "The webhook receiver at <code>POST /api/notifications/webhook</code> applies different security layers depending on where the request comes from:",
|
||||
"securityItems": [
|
||||
"<strong>Loopback (<code>127.0.0.1</code> / <code>::1</code>).</strong> Rate-limit only. The endpoint trusts the loopback interface — only processes running on the host can reach it, and PVE itself cannot send custom auth headers in the body it generates. This is the path every PVE-emitted notification travels.",
|
||||
"<strong>Remote callers.</strong> Five layers stack on top of rate-limiting: a shared secret in the <code>X-Webhook-Secret</code> header, a freshness timestamp in <code>X-ProxMenux-Timestamp</code> (rejected if it drifts more than the configured window), a replay-cache lookup, and an optional IP allowlist. The shared secret lives in the Monitor's SQLite settings table — not in <code>/etc/pve/priv/notifications.cfg</code> — and is generated at first setup. This path exists for custom integrations posting from outside the host; the PVE-configured target never exercises it."
|
||||
],
|
||||
"practiceTitle": "In practice",
|
||||
"practiceBody": "The PVE setup writes the target as <code>http://127.0.0.1:8008</code>, so PVE-emitted notifications always go through the loopback path with rate-limit-only security. The remote-caller path with the shared secret is opt-in for custom integrations — point an external service at <code>https://<monitor-host>:<port>/api/notifications/webhook</code> and supply the <code>X-Webhook-Secret</code> header to use it.",
|
||||
"actionsIntro": "The Monitor manages this target through three actions on the Settings tab:",
|
||||
"actionsItems": [
|
||||
"<strong>Setup</strong> — runs automatically when you enable Notifications. Creates the entry in <code>/etc/pve/notifications.cfg</code> after backing up the current file.",
|
||||
"<strong>Cleanup</strong> — removes the entry. The previous backup of the file is kept.",
|
||||
"<strong>Read config</strong> — shows the current targets and matchers as PVE sees them. This is how you confirm the Monitor's entry is the one firing when PVE has multiple notification routes configured."
|
||||
],
|
||||
"clusterTitle": "Cluster nodes",
|
||||
"clusterBody": "<code>/etc/pve/</code> is replicated across cluster members, so the webhook target is visible on every node. Each node, however, posts to its <em>own</em> <code>127.0.0.1:8008</code> — meaning the Monitor running on that node receives the events that PVE generated locally. Run the Monitor on every node you want to see in the Notifications history."
|
||||
},
|
||||
"catalogue": {
|
||||
"heading": "Event catalogue",
|
||||
"intro": "Around seventy event types are grouped into eleven UI categories. The Notifications panel renders one collapsible section per group with a toggle for every event inside it. Each event is on by default unless explicitly marked otherwise.",
|
||||
"headerGroup": "Group",
|
||||
"headerEvents": "Events",
|
||||
"rows": [
|
||||
{
|
||||
"group": "VM / CT",
|
||||
"events": "<code>vm_start</code>, <code>vm_start_warning</code>, <code>vm_stop</code>, <code>vm_shutdown</code>, <code>vm_fail</code>, <code>vm_restart</code>, plus the <code>ct_*</code> equivalents, <code>migration_start</code>, <code>migration_complete</code>, <code>migration_warning</code>, <code>migration_fail</code>, <code>replication_complete</code>, <code>replication_fail</code>."
|
||||
},
|
||||
{
|
||||
"group": "Backups",
|
||||
"events": "<code>backup_start</code>, <code>backup_complete</code>, <code>backup_warning</code>, <code>backup_fail</code>, <code>snapshot_complete</code>, <code>snapshot_fail</code>."
|
||||
},
|
||||
{
|
||||
"group": "Resources",
|
||||
"events": "<code>cpu_high</code>, <code>ram_high</code>, <code>temp_high</code>, <code>load_high</code>."
|
||||
},
|
||||
{
|
||||
"group": "Storage",
|
||||
"events": "<code>disk_space_low</code>, <code>disk_io_error</code>, <code>storage_unavailable</code>, <code>smart_test_complete</code>, <code>smart_test_failed</code>."
|
||||
},
|
||||
{
|
||||
"group": "Network",
|
||||
"events": "<code>network_down</code>, <code>network_latency</code>."
|
||||
},
|
||||
{
|
||||
"group": "Security",
|
||||
"events": "<code>auth_fail</code>, <code>ip_block</code>, <code>firewall_issue</code>, <code>user_permission_change</code>."
|
||||
},
|
||||
{
|
||||
"group": "Cluster",
|
||||
"events": "<code>split_brain</code>, <code>node_disconnect</code>, <code>node_reconnect</code>."
|
||||
},
|
||||
{
|
||||
"group": "Services",
|
||||
"events": "<code>system_startup</code>, <code>system_shutdown</code>, <code>system_reboot</code>, <code>system_problem</code>, <code>service_fail</code>, <code>oom_kill</code>, <code>system_mail</code>."
|
||||
},
|
||||
{
|
||||
"group": "Health Monitor",
|
||||
"events": "<code>new_error</code>, <code>error_resolved</code>, <code>error_escalated</code>, <code>health_degraded</code>, <code>health_persistent</code>, <code>health_issue_new</code>, <code>health_issue_resolved</code>."
|
||||
},
|
||||
{
|
||||
"group": "Updates",
|
||||
"events": "<code>update_summary</code>, <code>update_available</code>, <code>pve_update</code>, <code>update_complete</code>, <code>proxmenux_update</code>."
|
||||
},
|
||||
{
|
||||
"group": "Hardware / GPU",
|
||||
"events": "<code>gpu_mode_switch</code>, <code>gpu_passthrough_blocked</code>, <code>pci_passthrough_conflict</code>, <code>ai_model_migrated</code>."
|
||||
}
|
||||
],
|
||||
"burstNote": "A handful of <code>burst_*</code> aggregation types (<code>burst_auth_fail</code>, <code>burst_ip_block</code>, <code>burst_disk_io</code>, etc.) exist only in the dispatcher — they replace bursts of individual events with a single summary message and are not exposed as toggles in the UI. They inherit the on/off state of their parent event type."
|
||||
},
|
||||
"history": {
|
||||
"heading": "History",
|
||||
"body1": "Every dispatch <em>attempt</em> the dispatcher actually performs is recorded in the <code>notification_history</code> SQLite table. Each row stores the timestamp (<code>sent_at</code>), channel, event type, severity, title, rendered message body, a <code>success</code> flag and — when the send failed — the error returned by the provider in <code>error_message</code>. Burst-aggregated events appear as a single row with the <code>burst_*</code> event type. Events suppressed by the cooldown stage are not logged: they never become a dispatch attempt.",
|
||||
"body2": "The History tab inside Settings → Notifications shows the last 20 entries and has a single <em>Clear</em> button that wipes the table.",
|
||||
"body3": "The same data is exposed at <code>GET /api/notifications/history</code> with optional <code>limit</code>, <code>offset</code>, <code>severity</code> and <code>channel</code> query parameters, and can be cleared with <code>DELETE /api/notifications/history</code>."
|
||||
},
|
||||
"api": {
|
||||
"heading": "API endpoints",
|
||||
"headerEndpoint": "Endpoint",
|
||||
"headerMethod": "Method",
|
||||
"headerUse": "Use",
|
||||
"rows": [
|
||||
{
|
||||
"endpoint": "/api/notifications/settings",
|
||||
"method": "GET / POST",
|
||||
"use": "Read or write the full configuration (channels, per-event toggles, AI rewriter, Display Name)."
|
||||
},
|
||||
{
|
||||
"endpoint": "/api/notifications/test",
|
||||
"method": "POST",
|
||||
"use": "Send a test notification to one channel: <code>'{'\"channel\":\"telegram\"'}'</code>."
|
||||
},
|
||||
{
|
||||
"endpoint": "/api/notifications/test-ai",
|
||||
"method": "POST",
|
||||
"use": "Render and rewrite a sample event without dispatching it."
|
||||
},
|
||||
{
|
||||
"endpoint": "/api/notifications/provider-models",
|
||||
"method": "POST",
|
||||
"use": "List available models for the selected AI provider."
|
||||
},
|
||||
{
|
||||
"endpoint": "/api/notifications/send",
|
||||
"method": "POST",
|
||||
"use": "Emit an event from outside (custom integrations)."
|
||||
},
|
||||
{
|
||||
"endpoint": "/api/notifications/history",
|
||||
"method": "GET / DELETE",
|
||||
"use": "Read history with filters; clear it."
|
||||
},
|
||||
{
|
||||
"endpoint": "/api/notifications/webhook",
|
||||
"method": "POST",
|
||||
"use": "Receives Proxmox VE's own notifications. Loopback callers are rate-limited only; remote callers must additionally pass the <code>X-Webhook-Secret</code> header, <code>X-ProxMenux-Timestamp</code> freshness check, replay cache and optional IP allowlist."
|
||||
},
|
||||
{
|
||||
"endpoint": "/api/notifications/proxmox/setup-webhook",
|
||||
"method": "POST",
|
||||
"use": "Register the Monitor as a target in <code>/etc/pve/notifications.cfg</code>."
|
||||
},
|
||||
{
|
||||
"endpoint": "/api/notifications/proxmox/cleanup-webhook",
|
||||
"method": "POST",
|
||||
"use": "Remove the Monitor target from PVE's notification config."
|
||||
},
|
||||
{
|
||||
"endpoint": "/api/notifications/proxmox/read-cfg",
|
||||
"method": "GET",
|
||||
"use": "Show the current PVE notification config as PVE sees it."
|
||||
}
|
||||
]
|
||||
},
|
||||
"whereNext": {
|
||||
"heading": "Where to next",
|
||||
"items": [
|
||||
{
|
||||
"label": "AI Assistant",
|
||||
"href": "/docs/monitor/ai-assistant",
|
||||
"tail": " — providers, models, prompt modes, languages, per-channel detail levels."
|
||||
},
|
||||
{
|
||||
"label": "Health Monitor",
|
||||
"href": "/docs/monitor/health-monitor",
|
||||
"tail": " — the largest single producer of events, with its own per-category suppression durations."
|
||||
},
|
||||
{
|
||||
"label": "Architecture",
|
||||
"href": "/docs/monitor/architecture",
|
||||
"tailRich": " — where the SQLite tables (<code>notification_last_sent</code>, <code>notification_history</code>) and the dispatch thread fit into the wider Monitor process."
|
||||
},
|
||||
{
|
||||
"label": "Access & Authentication",
|
||||
"href": "/docs/monitor/access-auth",
|
||||
"tailRich": " — how API tokens are minted for scripts that call <code>/api/notifications/send</code>."
|
||||
},
|
||||
{
|
||||
"label": "Dashboard → System Logs",
|
||||
"href": "/docs/monitor/dashboard/system-logs",
|
||||
"tail": " — the live view of the same journal that feeds the journal watcher."
|
||||
}
|
||||
]
|
||||
}
|
||||
}
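As a rough illustration of the Quiet Hours rule described in the strings above (half-open window, start inclusive and end exclusive, possibly crossing midnight, with CRITICAL bypassing the window), a minimal Python sketch follows. The names `in_quiet_window` and `should_buffer` are hypothetical and not taken from ProxMenux, and treating an equal start and end as "never quiet" is an assumption.

```python
from datetime import time


def in_quiet_window(now: time, start: time, end: time) -> bool:
    """Half-open [start, end) check that also handles windows crossing midnight."""
    if start == end:
        # Assumption: an empty window means Quiet Hours never applies.
        return False
    if start < end:
        # Same-day window, e.g. 13:00-15:00.
        return start <= now < end
    # Window crosses midnight, e.g. 22:00-07:00.
    return now >= start or now < end


def should_buffer(severity: str, now: time, start: time, end: time) -> bool:
    """CRITICAL is always sent live; INFO and WARNING are buffered inside the window."""
    return severity != "CRITICAL" and in_quiet_window(now, start, end)
```

With a 22:00 to 07:00 window, 22:00 itself is inside the window and 07:00 is outside it, which is what "start inclusive, end exclusive" implies.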
|
||||
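The remote-caller checks listed for `/api/notifications/webhook` could look roughly like the sketch below. It is not the Monitor's implementation: the two header names come from the text above, while `check_remote`, the 300-second drift window, the epoch-seconds timestamp format and the replay key (a hash of timestamp plus body) are assumptions made for illustration.

```python
import hashlib
import hmac
import time

MAX_DRIFT_S = 300  # assumption: the real freshness window is configurable
_seen = {}  # replay cache: request digest -> expiry time (epoch seconds)


def check_remote(headers, body, remote_ip, secret, allowlist=None, now=None):
    """Return (accepted, reason) for a non-loopback webhook request."""
    now = time.time() if now is None else now

    # Optional IP allowlist.
    if allowlist and remote_ip not in allowlist:
        return False, "ip not allowed"

    # Shared secret, compared in constant time.
    provided = headers.get("X-Webhook-Secret", "")
    if not hmac.compare_digest(provided.encode(), secret.encode()):
        return False, "bad secret"

    # Freshness: assumed to be Unix epoch seconds.
    try:
        ts = float(headers.get("X-ProxMenux-Timestamp", ""))
    except ValueError:
        return False, "bad timestamp"
    if not abs(now - ts) <= MAX_DRIFT_S:  # also rejects NaN
        return False, "stale timestamp"

    # Replay cache: drop expired entries, then reject repeats.
    for key in [k for k, exp in _seen.items() if exp <= now]:
        del _seen[key]
    digest = hashlib.sha256(str(ts).encode() + b"\n" + body).hexdigest()
    if digest in _seen:
        return False, "replay"
    _seen[digest] = now + MAX_DRIFT_S
    return True, "ok"
```

Checking the secret before touching the replay cache keeps unauthenticated requests from filling it; a real receiver would also apply the rate limit first, as the text describes.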