Commit Graph

115 Commits

Author SHA1 Message Date
MacRimi 9677c5cb19 Health Monitor: reconcile stale disk warnings across reboots
When a host gets transient I/O events on a disk while smartctl is
momentarily unavailable (the canonical case: late in a noisy
shutdown), the disk-scan code records a `disk_<name>` WARNING tagged
"SMART: unavailable" exactly once and trusts the next scan to clear
it. That trust is misplaced: the clear path only fires when the
device shows up in the current dmesg window with zero events. After
a reboot, dmesg is empty for that device — so the device never gets
iterated, resolve_error is never called, and the dashboard stays
orange for a disk whose SMART now reports PASSED.

Caught on a lab host where `disk_nvme2n1` had been stuck as WARNING
for hours after a reboot. SMART was 100% healthy at the moment of
inspection (Critical Warning 0x00, 0 media errors, 100% spare). The
error's first_seen and last_seen were identical and pre-dated the
current boot, confirming a one-shot record that nothing had cleared.

Fix: add a `_reconcile_stale_disk_warnings()` pass at the top of
`_check_disks_optimized()`. For every active `disk_*` error
(skipping `disk_fs_*`, which is already reconciled separately):

  - device gone from /dev/   → resolve "Device no longer present"
  - device present + SMART PASSED → resolve "Transient I/O cleared,
    SMART now reports healthy"
  - device present + SMART UNKNOWN/FAILED → leave active so the
    main loop can re-classify on the next dmesg window

Acknowledged errors are left alone so the user's explicit dismiss
intent isn't overridden.
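
The decision table above can be sketched as a small pure function. This
is an illustrative sketch only, not the code in this commit: the name
`reconcile_decision` and its parameters are hypothetical, and the real
`_reconcile_stale_disk_warnings()` reads its inputs from the persistence
DB, /dev/ and smartctl rather than taking them as arguments.

```python
from typing import Optional


def reconcile_decision(
    error_key: str,
    acknowledged: bool,
    device_present: bool,
    smart_status: str,
) -> Optional[str]:
    """Return a resolution reason, or None to leave the error active.

    Hypothetical helper: all names here are assumptions for illustration.
    """
    # Only plain disk_* errors are handled; disk_fs_* is reconciled elsewhere.
    if not error_key.startswith("disk_") or error_key.startswith("disk_fs_"):
        return None
    # Respect an explicit user dismiss.
    if acknowledged:
        return None
    if not device_present:
        return "Device no longer present"
    if smart_status == "PASSED":
        return "Transient I/O cleared, SMART now reports healthy"
    # UNKNOWN/FAILED: leave active so the main loop can re-classify.
    return None
```

Keeping the decision free of I/O makes each of the three cases, plus
the acknowledged and `disk_fs_*` exclusions, easy to check in isolation.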

Verified end-to-end: re-injected the original `disk_nvme2n1`
warning into the persistence DB on the lab host, waited one scan
cycle, error was resolved automatically with `resolved_at` set and
`resolution_reason = 'Transient I/O cleared, SMART now reports
healthy'`.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-01 22:54:14 +02:00
MacRimi 4bf49675d2 Update ProxMenux 1.2.1.4-beta 2026-05-30 21:54:32 +02:00
MacRimi f2a40b993a Update AppImage 1.2.1.3 2026-05-22 18:47:30 +02:00
MacRimi 6eb1312c61 1.2.1.1-beta: notification + LXC + post-install fixes
- flask_notification_routes: PVE webhook X-Webhook-Secret written in
  standard base64 so PVE can decode it (GH #198)
- notification_channels: Gmail SMTP App Password handling — normalize
  tls_mode (None/empty → starttls), reject creds without host (false-
  positive sendmail delivery), surface "AUTH not advertised" hint
- notification_events: is_vzdump_active_on_host() reads /var/log/pve/
  tasks/active directly so backup_start fallback and vm_shutdown
  suppression survive a Monitor restart mid-backup
- notification_templates: extract --storage flag from vzdump log →
  "PBS-Cloud: vm/104/…" instead of generic "PBS:" prefix when multiple
  PBS endpoints exist
- health_monitor: pve_storage_capacity + zfs_pool_capacity respect
  per-item dismiss (don't keep category WARNING/CRITICAL after user
  dismisses); updates_check cache invalidated when /var/log/apt/
  history.log mtime advances
- lxc_mount_points: PVE volume size from subvol quota (df via
  /proc/<host_pid>/root/<target> + lxc.conf size=NNNG fallback);
  host_source_state detects "host detached" zombie binds; per-mount
  subprocess work parallelised via ThreadPoolExecutor so a CT with
  many bind mounts doesn't trip the Caddy 3s reverse-proxy timeout
- virtual-machines: "host detached" badge on bind mounts whose host
  source path disappeared
- auto/customizable_post_install: log2ram FUNC_VERSION 1.1 → 1.2; new
  log2ram-check.sh vacuums journal + truncates non-rotating logs
  (pveproxy/access.log, pveam.log) instead of only calling
  `log2ram write` (which leaves the tmpfs full); auto flow gains the
  missing SystemMaxUse in /etc/systemd/journald.conf
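
As an illustration of the notification_channels item above, the
tls_mode normalization and the "creds without host" rejection could
look roughly like this. It is a minimal sketch under assumptions: the
function name and the config keys (`host`, `username`, `password`,
`tls_mode`) are hypothetical and are not taken from the project's code.

```python
def normalize_smtp_config(config: dict) -> dict:
    """Return a copy of config with tls_mode normalized.

    Hypothetical helper: key names are assumptions for illustration.
    """
    cfg = dict(config)  # do not mutate the caller's dict

    # None or empty tls_mode falls back to "starttls".
    tls_mode = (cfg.get("tls_mode") or "").strip().lower()
    cfg["tls_mode"] = tls_mode or "starttls"

    # Credentials without a host would otherwise fall through to local
    # sendmail and look like a successful delivery, so reject them early.
    has_creds = bool(cfg.get("username") or cfg.get("password"))
    if has_creds and not cfg.get("host"):
        raise ValueError("SMTP credentials supplied without a host")

    return cfg
```

Raising early turns a silent false-positive delivery into a visible
configuration error.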

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-19 00:06:49 +02:00
MacRimi b4e8c5101a Update AppImage 2026-05-10 05:11:51 +02:00
MacRimi 911886b90c Update AppImage 2026-05-10 05:00:00 +02:00
MacRimi 2f919de9e3 update beta ProxMenux 1.2.1.1-beta 2026-05-09 18:59:59 +02:00
MacRimi 039e35f3c5 update health_monitor.py 2026-04-17 16:39:08 +02:00
MacRimi baa2ff4fa9 update health_persistence.py 2026-04-17 10:38:39 +02:00
MacRimi ee1204c566 update health_monitor.py 2026-04-16 19:10:47 +02:00
MacRimi 2b8caa924f update notification_events.py 2026-04-09 12:34:03 +02:00
MacRimi adde2ce5b9 update health_persistence.py 2026-04-06 12:02:05 +02:00
MacRimi 95e876b37f update health_monitor.py 2026-04-05 12:17:42 +02:00
MacRimi e7dc030304 Update health_monitor.py 2026-04-05 12:02:59 +02:00
MacRimi 4b01ba1d2f update health_monitor.py 2026-04-05 11:58:14 +02:00
MacRimi e9851da12f update virtual-machines.tsx 2026-04-05 11:51:26 +02:00
MacRimi e0e732dd2c update health_persistence.py 2026-04-04 01:31:37 +02:00
MacRimi c2073a5db5 Update health_monitor.py 2026-04-01 15:24:47 +02:00
MacRimi d62396717a update health_persistence.py 2026-04-01 12:03:54 +02:00
MacRimi a734fa5566 Update health_monitor.py 2026-04-01 00:01:12 +02:00
MacRimi 2df55d2839 Update health_monitor.py 2026-03-31 23:14:48 +02:00
MacRimi e00051caa7 update health_monitor.py 2026-03-31 23:00:00 +02:00
MacRimi 80afa789e7 Update notification service 2026-03-30 22:26:20 +02:00
MacRimi c549737ad0 Update HealthMonitor 2026-03-30 20:52:25 +02:00
MacRimi 2fc5e2865d Update notification service 2026-03-30 19:55:19 +02:00
MacRimi d628233982 Update notification service 2026-03-28 15:50:30 +01:00
MacRimi 0edc2cc3af Update notification service 2026-03-27 19:40:17 +01:00
MacRimi 6bb9313b95 Update notification service 2026-03-27 19:15:11 +01:00
MacRimi 839a20df97 Update notification service 2026-03-26 19:05:11 +01:00
MacRimi 8b6755d866 Update notification service 2026-03-25 22:43:42 +01:00
MacRimi bcacd8b98e Update notification service 2026-03-24 17:48:52 +01:00
MacRimi d2c8178772 Update notification service 2026-03-24 17:34:05 +01:00
MacRimi d34cebc90d Update health_monitor.py 2026-03-23 20:25:27 +01:00
MacRimi c7ef51a73c Update notification service 2026-03-23 20:14:25 +01:00
MacRimi ab34fb08c1 Update health_monitor.py 2026-03-23 19:31:21 +01:00
MacRimi 4ac71381da Update health_monitor.py 2026-03-23 18:08:22 +01:00
MacRimi 04564bc9cf Update health_monitor.py 2026-03-22 14:57:46 +01:00
MacRimi d33741a90d Update notification service 2026-03-22 14:20:47 +01:00
MacRimi 7838762a4e Update health_monitor.py 2026-03-21 23:19:41 +01:00
MacRimi 876194cdc8 update storage settings 2026-03-19 19:07:26 +01:00
MacRimi 69e0bfe89a update notification service 2026-03-18 17:48:02 +01:00
MacRimi 6aaaa910af Update health_monitor.py 2026-03-16 09:36:40 +01:00
MacRimi 785d58cb59 Update health monitor 2026-03-15 18:12:42 +01:00
MacRimi af61d145da Update oci manager 2026-03-15 17:59:47 +01:00
MacRimi 9112bcc52f Update health_monitor.py 2026-03-15 10:54:37 +01:00
MacRimi e534cffcf7 Update health_monitor.py 2026-03-15 10:41:34 +01:00
MacRimi a184dcc38f Update health_monitor.py 2026-03-15 10:36:19 +01:00
MacRimi e169200f40 Update health monitor 2026-03-15 10:03:35 +01:00
MacRimi 6d4006fd93 update oci manager 2026-03-12 22:13:56 +01:00
MacRimi b4a2e5ee11 Create oci manager 2026-03-12 21:30:44 +01:00