ProxMenux

mirror of https://github.com/MacRimi/ProxMenux.git synced 2026-06-03 13:54:41 +00:00

Author	SHA1	Message	Date
MacRimi	5a116e77b9	Discord channel: split oversized digests across embeds (#220 ) A mass-backup webhook that exceeded ~2 KB used to be silently truncated by `desc = message[:MAX_EMBED_DESC]` with MAX_EMBED_DESC set to 2048 — half of Discord's real description limit and far below what a multi-VM backup digest produces. The trailing jobs just vanished from the channel. Bring the channel up to Discord's actual webhook contract: * description limit raised to the real 4096-char cap * if the body still doesn't fit, split it on line boundaries into one embed per chunk so every backup entry is preserved * keep title + fields on the first embed only; attach the footer and timestamp to the last embed so the rendered card has the normal head/tail framing even when split across many embeds * enforce Discord's 6000-char-per-embed cap (title + description + every field name+value) — only kicks in when many large fields combine with a chunk already near the description ceiling * batch up to 10 embeds per webhook POST (Discord's per-message limit) and POST additional messages sequentially with a 0.4 s gap so a >10-embed digest doesn't trip the 5/2 s webhook rate limit Verified with synthetic mass-backup payloads: * 14 KB / 200 jobs → 4 embeds, 1 POST * 60 KB / 60 lines → 15 embeds, 2 POSTs (10 + 5) New AppImage SHA-256: 16ad59ea63a64e5be460cd73f87315e8b39b756bf1c61f3cb2019e9fa3e76361 Closes #220. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-06-02 17:30:59 +02:00
MacRimi	17cae5d3a4	Refresh AppImage binary + sha256 after NVMe-obs-count fix New build picks up the get_disks_observation_counts NVMe-rename fix. SHA-256: 3b44eb1172b4b1b7e6a36d1c9f1cd5a237ec04d52543bb791358525b0653a402 Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-06-01 23:52:39 +02:00
MacRimi	642bd8ecae	health_persistence: stop leaking obs counts across NVMe device renames `get_disks_observation_counts` maps each serial's count to that serial's "most recent" device_name (so renames like ata8 -> sdh keep the badge attached). When several physical disks have passed through the same kernel name across reboots — common with NVMe, the kernel probes in a different order depending on which slots are populated — disk_registry keeps a row per (device_name, serial) seen and the "most recent" device_name for a serial can now be in use by an entirely different disk. Concrete case from the wild: serial 211716800490 was nvme0n1 during the previous boot and earned a real I/O observation. After removing four of five NVMes, the surviving disk (serial 243332800236) booted into nvme0n1. The badge layer mirrored 211716800490's count onto nvme0n1 — which is now a different physical disk — and showed "1 obs." on the wrong drive, while the modal (which scopes by the current (device_name, serial) registry row) found nothing and rendered an empty history. Only mirror a serial's count onto its device_name when that device_name is currently owned by the same serial, determined from the freshest disk_registry row. The serial-keyed entry stays unconditional so observations remain reachable when the disk is re-plugged under another device name. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-06-01 23:52:11 +02:00
MacRimi	92385f44b0	Drop stale ProxMenux-1.2.0.AppImage binary The v1.2.0 binary lingered in the repo after later releases. Remove it so AppImage/ holds only the current shipping artefact. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-06-01 23:22:13 +02:00
MacRimi	4cd1cb4e39	Refresh AppImage binary + sha256 for v1.2.2 The tracked binary still pointed at the build made before the last two fixes landed (resolution_reason persistence in health_persistence and disk-temp breakdown alignment in storage-overview). Re-build the AppImage so the GitHub-published binary matches what is actually running on the deploy targets. New SHA-256: d043e2f27f21315931ab53d87f02390b1a66b0c1730e8b7699aafb565809efbb Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-06-01 23:21:24 +02:00
MacRimi	7a1fe0b0fc	storage-overview: drive disk-temp breakdown from configurable thresholds `getDiskHealthBreakdown` carried its own hardcoded ladder (HDD ≤45 normal, ≤55 warning) that was much stricter than the configurable defaults consumed by `getTempColor` via `useDiskTempThresholds` (HDD warn 60, hot 65). HDDs at 48 °C therefore rendered a green "Healthy 48°C" badge on the card but were tallied as "warning" in the top-of-page "X normal, Y warning, Z critical" summary, leaving the user with the misleading "6 normal, 5 warning" line. Use the same threshold map as the per-disk badge so the colour and the count are always consistent, and so Settings → Health Monitor Thresholds → Disk temperature actually applies to the breakdown. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-06-01 23:13:55 +02:00
MacRimi	3c5beb0286	Persist resolution_reason on resolve_error so the audit log is useful The UPDATE in `_resolve_error_impl` only touched `resolved_at` — the `reason` argument every caller passes was silently dropped, and the `resolution_reason` / `resolution_type` columns stayed NULL for every auto-resolved error. The columns were added back in a previous sprint for exactly this audit-log purpose, but the writer was never updated to populate them. Fix the SQL to write `resolution_reason = ?` and tag `resolution_type = COALESCE(existing, 'auto')` so admin-cleared errors (whose type is set elsewhere) keep their value while the default auto path correctly labels itself. Verified end-to-end on the lab host: re-injected the `disk_nvme2n1` warning, waited one scan cycle, the row now reads `resolution_type='auto'` and `resolution_reason='Transient I/O cleared, SMART now reports healthy'` — previously these columns stayed NULL even though the resolve_error call passed a descriptive reason. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-06-01 23:02:52 +02:00
MacRimi	9677c5cb19	Health Monitor: reconcile stale disk warnings across reboots When a host gets transient I/O events on a disk while smartctl is momentarily unavailable (the canonical case: late in a noisy shutdown), the disk-scan code records a `disk_<name>` WARNING tagged "SMART: unavailable" exactly once and trusts the next scan to clear it. That trust is misplaced: the clear path only fires when the device shows up in the current dmesg window with zero events. After a reboot, dmesg is empty for that device — so the device never gets iterated, resolve_error is never called, and the dashboard stays orange for a disk whose SMART now reports PASSED. Caught on a lab host where `disk_nvme2n1` had been stuck as WARNING for hours after a reboot. SMART was 100% healthy at the moment of inspection (Critical Warning 0x00, 0 media errors, 100% spare). The error's first_seen and last_seen were identical and pre-dated the current boot, confirming a one-shot record that nothing had cleared. Fix: add a `_reconcile_stale_disk_warnings()` pass at the top of `_check_disks_optimized()`. For every active `disk_` error (skipping `disk_fs_`, which is already reconciled separately): - device gone from /dev/ → resolve "Device no longer present" - device present + SMART PASSED → resolve "Transient I/O cleared, SMART now reports healthy" - device present + SMART UNKNOWN/FAILED → leave active so the main loop can re-classify on the next dmesg window Acknowledged errors are left alone so the user's explicit dismiss intent isn't overridden. Verified end-to-end: re-injected the original `disk_nvme2n1` warning into the persistence DB on the lab host, waited one scan cycle, error was resolved automatically with `resolved_at` set and `resolution_reason = 'Transient I/O cleared, SMART now reports healthy'`. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-06-01 22:54:14 +02:00
MacRimi	d25faedc2b	Rebuild AppImage with actual Next.js 15.1.9 + always reconcile node_modules The previous bump commit (`2f24de25`) shipped a binary that still carried Next.js 15.1.6 in the bundled chunks even though AppImage/package.json was at 15.1.9. Root cause: build_appimage.sh only ran `npm install` when `node_modules` did not exist; on the .50 build host node_modules had been cached since the 1.2.1 build cycle, so the bump was silently ignored and the build re-used the stale tree. Fix the script: always run `npm install --legacy-peer-deps` on every build. npm reconciles against the lockfile in under a second when everything is already in sync, so the change is free on a warm tree and correct on a stale one. Rebuild from a clean node_modules on .50, redeploy to all four hosts (SHA 4602b8d4aa130c6f...), runtime grep confirms the bundle now contains 15.1.9 with no traces of 15.1.6 left. Same architecture and threat model as before — Flask serves the static export on :8008, no Next.js runtime — but the version banner now matches the lockfile. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-06-01 22:37:44 +02:00
MacRimi	2f24de2592	Bump Next.js to 15.1.9 + doc nav handles in-page anchors + help_info_menu Three changes that fold into the v1.2.2 release PR: 1. AppImage: bump Next.js 15.1.6 -> 15.1.9 (CVE-2025-55182) GHSA-9qr9-h5gf-34mp / React2Shell is a pre-auth RCE in React Server Components when Server Functions deserialize attacker payloads. The ProxMenux Monitor ships Next.js in `output: "export"` mode behind Flask on :8008, so there is no runtime Next.js server and no "use server" directive in the source tree — the exploitable path is not reachable. Bumping to 15.1.9 anyway because OpenVAS and similar scanners flag the version string from the JS bundle regardless of architecture; raising the floor removes false-positive noise across every install. Reported by @rost43 in #219. 2. web/components/ui/doc-navigation.tsx: handle sidebar entries that point to in-page anchors. The Storage Share Manager sidebar has entries for `/docs/storage-share#host` and `/docs/storage-share#lxc-net` as section headers, but usePathname() does not include the hash so every visit collapsed to the parent page. As a result Next/Previous on /docs/storage-share stayed stuck at #host, and Next from .../lxc-mount-points/ pointed back at #host instead of #lxc-net. Read window.location.hash on mount (and on hashchange) and try the pathname+hash match before falling back to the pathname-only lookup. SSR hydrates with an empty hash and refreshes once mounted — brief render before hydration is the same as the previous behaviour, so no regression. 3. scripts/help_info_menu.sh: user-side improvement (mirrored from develop). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-06-01 22:31:12 +02:00
MacRimi	4f3750a8ab	Fix CI: add pagefind to web devDeps + portable AppImage cache path Two regressions surfaced after the 1.2.2 release merge to main, both in workflows that auto-trigger on push to main: * Deploy to GitHub Pages — build failed with `pagefind: not found` (exit 127) after Next.js prerendered all 241 routes. pagefind was not declared in web/package.json; the local build only worked because the project root had its own package.json with pagefind as a devDep (the one we just gitignored in the previous commit). Add `pagefind: ^1.5.2` to web/package.json devDependencies and regenerate web/package-lock.json so `npm ci` in CI puts the binary at web/node_modules/.bin/pagefind. * Build ProxMenux Monitor AppImage — failed at the first step with `mkdir: cannot create directory '/var/cache/proxmenux-build': Permission denied`. The cache path was hardcoded to /var/cache/, which is writable when the script runs as root (the .50 host manual build) but not as the unprivileged GitHub Actions runner. Switch to `${XDG_CACHE_HOME:-$HOME/.cache}/proxmenux-build/` — works identically in both environments. Verified locally: `cd web && npm ci && npm run build` produces 2804 files in out/, 231 pages indexed by pagefind, root redirect intact. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-31 13:34:30 +02:00
MacRimi	964f2083b6	Merge main into develop to resolve conflicts before 1.2.2 release PR # Conflicts: # AppImage/ProxMenux-Monitor.AppImage.sha256 # LICENSE # install_proxmenux_beta.sh	2026-05-31 13:24:58 +02:00
MacRimi	9e8434a16b	Release 1.2.2 stable — consolidated v1.2.1.x cycle Promote the v1.2.1.x beta cycle to stable: version markers bumped from 1.2.1.4-beta to 1.2.2 across version.txt, AppImage/package.json, flask_server.py (3 places) and the four UI labels in login, proxmox-dashboard, storage-overview and release-notes-modal. Replace AppImage/ProxMenux-1.2.1.4-beta.AppImage with ProxMenux-1.2.2.AppImage and regenerate the .sha256 sidecar (097e2344675d4b21f1dd18c531c956c299a6507fbc3d0c9695418063581ba2b0). The new binary is verified on all 4 lab hosts (.50 / .55 / .89 / 1.10) — same sha, all services active, runtime version markers report 1.2.2. CHANGELOG["1.2.2"] in release-notes-modal.tsx consolidates every beta in the 1.2.1.x line (12 added / 13 changed / 18 fixed), and CURRENT_VERSION_FEATURES is rewritten with the four stable highlights: Health Monitor Thresholds, granular dismiss control (per-event duration + Active Suppressions panel), Apprise notification channel parity, and LXC update detection.	2026-05-31 13:15:39 +02:00
MacRimi	853bcbde35	Fix AppRise	2026-05-31 11:03:04 +02:00
MacRimi	2442ca63be	Update AppImage 1.2.1.4	2026-05-31 10:36:16 +02:00
MacRimi	91ded0125e	Update AppImage 1.2.1.4	2026-05-30 22:14:51 +02:00
MacRimi	4bf49675d2	Update ProxMenux 1.2.1.4-beta	2026-05-30 21:54:32 +02:00
MacRimi	fe1297936f	Update AppImage 1.2.1.3	2026-05-27 17:55:41 +02:00
MacRimi	a3aa5d9c1a	Update flask_server.py	2026-05-25 18:01:24 +02:00
MacRimi	b299227da2	Update AppImage 1.2.1.3	2026-05-24 17:52:04 +02:00
MacRimi	3286fc315c	Update AppImage 1.2.1.3	2026-05-24 16:42:44 +02:00
MacRimi	105576cf17	Update AppImage	2026-05-24 11:37:20 +02:00
MacRimi	4b934db7db	Update AppImage 1.2.1.3	2026-05-23 21:27:18 +02:00
MacRimi	f2a40b993a	Update AppImage 1.2.1.3	2026-05-22 18:47:30 +02:00
MacRimi	840385272c	Add ProxMenux beta 1.2.1.3	2026-05-22 18:24:03 +02:00
MacRimi	95d0667077	Update AppImage 1.2.1.2	2026-05-21 22:25:29 +02:00
MacRimi	56fac4c34b	Update AppImage 1.2.1.2	2026-05-21 22:00:35 +02:00
MacRimi	2d523b030f	Update AppImage 1.2.1.2	2026-05-21 21:41:27 +02:00
MacRimi	f5b7a0a74b	Update AppImage 1.2.1.2	2026-05-21 21:17:59 +02:00
MacRimi	3e9dd599a6	Update AppImage 1..2.1.2	2026-05-21 19:31:47 +02:00
MacRimi	9545587b67	Update AppImage 1.2.1.2	2026-05-21 17:24:09 +02:00
MacRimi	ef22c88861	Update AppImage 1.2.1.2	2026-05-21 17:18:23 +02:00
MacRimi	298cd2c6d4	Update Beta 1.2.1.2	2026-05-20 19:47:42 +02:00
MacRimi	4112323961	Update AppImage	2026-05-20 18:14:32 +02:00
MacRimi	73389d842a	Reset auth_fail cooldowns on NotificationManager.start() Pedro Rico, 19/05: after reinstalling the Monitor from GitHub a real SSH/web login failure went unnotified. Root cause was the auth_fail cooldown surviving across the service restart — install_proxmenux_beta extracts the new AppImage but leaves the notification_last_sent SQLite table intact (desirable: we don't want to lose legitimate cooldowns on every update). On startup `_load_cooldowns_from_db()` then loaded the stale auth_fail row from the previous run into the in-memory cache, and `_passes_cooldown` blocked the new event. This extends the existing reset-on-start mechanism (already in place for update_summary, proxmenux_update, post_install_update, …) to also clear auth_fail rows. A security-relevant event shouldn't be silenced because the same source IP happened to fail to log in yesterday. - Rename `_UPDATE_EVENT_TYPES_RESET_ON_START` → `_EVENT_TYPES_RESET_ON_START` (the list no longer covers only update-status reports). - Rename `_reset_update_cooldowns_on_start()` → `_reset_cooldowns_on_start()` for the same reason. - Add `'auth_fail'` to the curated list. High-frequency sources (log_critical_*, disk SMART errors, …) are deliberately NOT on this list — they keep their 24h cooldown across restarts to prevent inbox floods if the user toggles the service. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-19 01:18:49 +02:00
MacRimi	4e26c5942f	Reset update-type cooldowns on NotificationManager.start() When the user reinstalls or restarts the Monitor (deploy of a new beta AppImage), they expect to see a fresh "what's available now" summary in Telegram/Gotify/etc. instead of silence — even if the 24h anti-spam cooldown for `update_summary` etc. hasn't expired yet. Without this, the operator had to wait up to 24h after every deploy before the next `update_summary`, `proxmenux_update`, `post_install_update`, `pve_update`, `update_available`, `nvidia_driver_update_available` or `secure_gateway_update_available` notification fired. The 24h cooldown is the right default for steady state (don't pester the user every poll cycle with the same "177 packages pending" reminder), but a service restart is an explicit signal that the user wants a fresh status report. - New _UPDATE_EVENT_TYPES_RESET_ON_START tuple lists the event types to clear (everything in the "_update" + "update_" family). - New _reset_update_cooldowns_on_start() runs at start() right after the running flag flips, before watchers/dispatcher come up. - Patterns match both fingerprint shapes: "<host>:<entity>:<event_type>:" trailing-colon form "<host>:<entity>:<event_type>" no-suffix form (managed installs) - In-memory `_cooldowns` cache is also pruned so the live dispatcher picks up the reset immediately, without waiting for the next `_load_cooldowns_from_db()` cycle. Non-update cooldowns (auth_fail, log_critical_, disk errors, …) are preserved so a restart doesn't unleash a backlog of stale alerts. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-19 01:03:44 +02:00
MacRimi	06e6ae417e	Fix notification dispatch: NameError in _dispatch_to_channels (Quiet Hours) `_dispatch_to_channels` does NOT receive the NotificationEvent object — only the rendered primitives (title, body, severity, event_type, …). The Quiet Hours + Daily Digest merge introduced two references to `event.severity` / `event` inside this function, which raised `NameError: name 'event' is not defined` for every event passing through dispatch. The dispatch loop swallows the exception with a broad `except`, so the visible symptom was "the Test button works but no real event ever arrives" — both for community beta users (multiple reports on Telegram, 9-18 May) and verified live on a test host (id 905 in notification_history confirms the pipeline post-fix). - _dispatch_to_channels: read `severity` / `event_type` directly instead of `event.severity` / `event.event_type`. - _should_buffer_for_digest: take (ch_name, severity, event_type) primitives instead of a NotificationEvent. - _buffer_digest_event: same — take (ch_name, event_type, event_group, severity, title, body). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-19 00:23:29 +02:00
MacRimi	6eb1312c61	1.2.1.1-beta: notification + LXC + post-install fixes - flask_notification_routes: PVE webhook X-Webhook-Secret written in standard base64 so PVE can decode it (GH #198) - notification_channels: Gmail SMTP App Password handling — normalize tls_mode (None/empty → starttls), reject creds without host (false- positive sendmail delivery), surface "AUTH not advertised" hint - notification_events: is_vzdump_active_on_host() reads /var/log/pve/ tasks/active directly so backup_start fallback and vm_shutdown suppression survive a Monitor restart mid-backup - notification_templates: extract --storage flag from vzdump log → "PBS-Cloud: vm/104/…" instead of generic "PBS:" prefix when multiple PBS endpoints exist - health_monitor: pve_storage_capacity + zfs_pool_capacity respect per-item dismiss (don't keep category WARNING/CRITICAL after user dismisses); updates_check cache invalidated when /var/log/apt/ history.log mtime advances - lxc_mount_points: PVE volume size from subvol quota (df via /proc/<host_pid>/root/<target> + lxc.conf size=NNNG fallback); host_source_state detects "host detached" zombie binds; per-mount subprocess work parallelised via ThreadPoolExecutor so a CT with many bind mounts doesn't trip the Caddy 3s reverse-proxy timeout - virtual-machines: "host detached" badge on bind mounts whose host source path disappeared - auto/customizable_post_install: log2ram FUNC_VERSION 1.1 → 1.2; new log2ram-check.sh vacuums journal + truncates non-rotating logs (pveproxy/access.log, pveam.log) instead of only calling `log2ram write` (which leaves the tmpfs full); auto flow gains the missing SystemMaxUse in /etc/systemd/journald.conf Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-19 00:06:49 +02:00
MacRimi	81844fa456	Update AppImage	2026-05-18 18:09:15 +02:00
MacRimi	4bedeb9fcd	Update appimage	2026-05-18 17:47:25 +02:00
jcastro	70ab072c79	Fix webhook loopback detection and update handoff	2026-05-14 14:33:27 +02:00
MacRimi	bd9af49412	Add ProxMenux-Monitor AppImage SHA256 file	2026-05-10 06:10:09 +02:00
MacRimi	ab5c7093eb	Update AppImage	2026-05-10 05:19:36 +02:00
MacRimi	b4e8c5101a	Update AppImage	2026-05-10 05:11:51 +02:00
MacRimi	911886b90c	Update AppImage	2026-05-10 05:00:00 +02:00
MacRimi	c14b72456f	Update AppImage	2026-05-10 04:46:33 +02:00
MacRimi	0288c14a29	Update AppImage	2026-05-09 23:22:45 +02:00
MacRimi	2f919de9e3	update beta ProxMenux 1.2.1.1-beta	2026-05-09 18:59:59 +02:00
MacRimi	5ed1fc44fd	Update 1.2.1.1-beta	2026-05-09 18:31:47 +02:00
github-actions[bot]	b8b49da99e	Update AppImage beta build (2026-04-21 19:30:40)	2026-04-21 19:30:40 +00:00


1878 Commits
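
The embed-splitting behaviour described in commit 5a116e77b9 can be sketched roughly as follows. This is only an illustration under stated assumptions, not the project's code: `MAX_EMBED_DESC` and the 4096 / 10 limits come from the commit message, while `MAX_EMBEDS_PER_POST`, `split_on_lines`, `build_embeds`, and `batch` are hypothetical names. The sketch leaves out the 6000-character per-embed check, the fields handling, and the 0.4 s pause between POSTs, and it sends nothing over the network.

```python
from typing import Dict, List

MAX_EMBED_DESC = 4096      # description limit per embed, per the commit message
MAX_EMBEDS_PER_POST = 10   # embeds per webhook message, per the commit message


def split_on_lines(body: str, limit: int = MAX_EMBED_DESC) -> List[str]:
    """Split body into chunks of at most `limit` chars, breaking on newlines."""
    chunks: List[str] = []
    current = ""
    for line in body.split("\n"):
        # A single line longer than the limit is hard-split as a last resort.
        while len(line) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(line[:limit])
            line = line[limit:]
        candidate = line if not current else current + "\n" + line
        if len(candidate) > limit:
            # Adding this line would overflow: close the chunk, start a new one.
            chunks.append(current)
            current = line
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def build_embeds(title: str, body: str, footer: str) -> List[Dict]:
    """One embed per chunk; title on the first embed, footer on the last."""
    chunks = split_on_lines(body) or [""]
    embeds: List[Dict] = [{"description": chunk} for chunk in chunks]
    embeds[0]["title"] = title
    embeds[-1]["footer"] = {"text": footer}
    return embeds


def batch(embeds: List[Dict], size: int = MAX_EMBEDS_PER_POST) -> List[List[Dict]]:
    """Group embeds into lists of at most `size`, one list per webhook POST."""
    return [embeds[i:i + size] for i in range(0, len(embeds), size)]
```

Each list returned by `batch` would form the `embeds` array of one webhook payload. A real sender would also need to pace successive POSTs, as the commit message notes.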