Commit Graph

84 Commits

Author SHA1 Message Date
MacRimi 642bd8ecae health_persistence: stop leaking obs counts across NVMe device renames
`get_disks_observation_counts` maps each serial's count to that
serial's "most recent" device_name (so renames like ata8 -> sdh keep
the badge attached). When several physical disks have passed through
the same kernel name across reboots (common with NVMe, where the kernel
probes in a different order depending on which slots are populated),
disk_registry keeps a row per (device_name, serial) seen, and the
"most recent" device_name for a serial may now be in use by an
entirely different disk.

Concrete case from the wild: serial 211716800490 was nvme0n1 during
the previous boot and earned a real I/O observation. After removing
four of five NVMes, the surviving disk (serial 243332800236) booted
into nvme0n1. The badge layer mirrored 211716800490's count onto
nvme0n1 — which is now a different physical disk — and showed
"1 obs." on the wrong drive, while the modal (which scopes by the
current (device_name, serial) registry row) found nothing and
rendered an empty history.

Only mirror a serial's count onto its device_name when that
device_name is currently owned by the same serial, determined from
the freshest disk_registry row. The serial-keyed entry stays
unconditional so observations remain reachable when the disk is
re-plugged under another device name.
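
The ownership rule described above can be sketched roughly as follows.
This is an illustration only, not the code in health_persistence.py:
the function names, the `(device_name, serial, last_seen)` row shape and
the use of a single dict for both serial and device_name keys are
assumptions made for the example.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical registry row: (device_name, serial, last_seen).
# last_seen is any comparable value; a larger value means a fresher row.
RegistryRow = Tuple[str, str, int]


def observation_counts(
    registry_rows: Iterable[RegistryRow],
    counts_by_serial: Dict[str, int],
) -> Dict[str, int]:
    """Return counts keyed by serial, plus device_name where it is safe."""
    rows = list(registry_rows)

    # Freshest row per device_name tells us which serial owns it now.
    owner: Dict[str, Tuple[str, int]] = {}
    # Freshest row per serial tells us its "most recent" device_name.
    latest_dev: Dict[str, Tuple[str, int]] = {}
    for device_name, serial, last_seen in rows:
        if device_name not in owner or last_seen > owner[device_name][1]:
            owner[device_name] = (serial, last_seen)
        if serial not in latest_dev or last_seen > latest_dev[serial][1]:
            latest_dev[serial] = (device_name, last_seen)

    result: Dict[str, int] = {}
    for serial, count in counts_by_serial.items():
        # The serial-keyed entry is unconditional, so observations stay
        # reachable if the disk reappears under another device name.
        result[serial] = count

        entry = latest_dev.get(serial)
        if entry is None:
            continue
        device_name = entry[0]
        # Mirror onto device_name only if that name is still owned by
        # this same serial according to the freshest registry row.
        if owner[device_name][0] == serial:
            result[device_name] = count
    return result
```

With the two serials from the case above both registered under nvme0n1
and the survivor holding the fresher row, the sketch returns only the
serial-keyed entry for 211716800490 and leaves nvme0n1 without a count.
A plain rename such as ata8 -> sdh still mirrors the count onto sdh.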

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-01 23:52:11 +02:00
MacRimi 3c5beb0286 Persist resolution_reason on resolve_error so the audit log is useful
The UPDATE in `_resolve_error_impl` only touched `resolved_at` — the
`reason` argument every caller passes was silently dropped, and the
`resolution_reason` / `resolution_type` columns stayed NULL for every
auto-resolved error. The columns were added back in a previous sprint
for exactly this audit-log purpose, but the writer was never updated
to populate them.

Fix the SQL to write `resolution_reason = ?` and tag
`resolution_type = COALESCE(existing, 'auto')` so admin-cleared
errors (whose type is set elsewhere) keep their value while the
default auto path correctly labels itself.

Verified end-to-end on the lab host: re-injected the `disk_nvme2n1`
warning and waited one scan cycle. The row now reads
`resolution_type='auto'` and
`resolution_reason='Transient I/O cleared, SMART now reports healthy'`.
Previously these columns stayed NULL even though the resolve_error
call passed a descriptive reason.
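
A minimal sketch of the corrected UPDATE, assuming a SQLite database.
The `errors` table name, the `error_key` column, the `'admin'` type
value and the `disk_sda` row are hypothetical and chosen only for the
demonstration; only `resolved_at`, `resolution_reason` and
`resolution_type` come from the description above.

```python
import sqlite3


def resolve_error(conn: sqlite3.Connection, error_key: str, reason: str) -> None:
    """Mark an error resolved and record why (illustrative only)."""
    conn.execute(
        """
        UPDATE errors
           SET resolved_at = datetime('now'),
               resolution_reason = ?,
               -- Keep a type set elsewhere; default to 'auto' otherwise.
               resolution_type = COALESCE(resolution_type, 'auto')
         WHERE error_key = ?
        """,
        (reason, error_key),
    )
    conn.commit()


def make_demo_db() -> sqlite3.Connection:
    """Build an in-memory table with one untyped and one admin-typed row."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE errors (
            error_key         TEXT PRIMARY KEY,
            resolved_at       TEXT,
            resolution_reason TEXT,
            resolution_type   TEXT
        )
        """
    )
    conn.execute("INSERT INTO errors (error_key) VALUES ('disk_nvme2n1')")
    conn.execute(
        "INSERT INTO errors (error_key, resolution_type) VALUES ('disk_sda', 'admin')"
    )
    conn.commit()
    return conn
```

Calling `resolve_error` on the untyped row should leave it with
`resolution_type='auto'` and the passed reason, while the row whose
type was already set keeps that type, because COALESCE returns its
first non-NULL argument.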

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-01 23:02:52 +02:00
MacRimi 4bf49675d2 Update ProxMenux 1.2.1.4-beta 2026-05-30 21:54:32 +02:00
MacRimi 4112323961 Update AppImage 2026-05-20 18:14:32 +02:00
MacRimi 2f919de9e3 update beta ProxMenux 1.2.1.1-beta 2026-05-09 18:59:59 +02:00
MacRimi baa2ff4fa9 update health_persistence.py 2026-04-17 10:38:39 +02:00
MacRimi a776d6b746 Update health_persistence.py 2026-04-16 20:01:56 +02:00
MacRimi cf871da880 update health_persistence.py 2026-04-16 19:33:47 +02:00
MacRimi 6660122e69 Update health_persistence.py 2026-04-16 19:18:42 +02:00
MacRimi ee1204c566 update health_monitor.py 2026-04-16 19:10:47 +02:00
MacRimi 435f346d98 update notification_events.py 2026-04-09 14:08:56 +02:00
MacRimi 463769aba9 Update health_persistence.py 2026-04-07 15:34:48 +02:00
MacRimi adde2ce5b9 update health_persistence.py 2026-04-06 12:02:05 +02:00
MacRimi 95e876b37f update health_monitor.py 2026-04-05 12:17:42 +02:00
MacRimi e0e732dd2c update health_persistence.py 2026-04-04 01:31:37 +02:00
MacRimi d1efae37a4 update health_persistence.py 2026-04-04 00:51:20 +02:00
MacRimi 46e0322e6f update health_persistence.py 2026-04-01 13:01:48 +02:00
MacRimi 618538a854 update health_persistence.py 2026-04-01 12:30:19 +02:00
MacRimi d62396717a update health_persistence.py 2026-04-01 12:03:54 +02:00
MacRimi f98b302b94 Update health_persistence.py 2026-03-31 23:47:38 +02:00
MacRimi 215d36900a Update health_persistence.py 2026-03-31 23:20:33 +02:00
MacRimi e00051caa7 update health_monitor.py 2026-03-31 23:00:00 +02:00
MacRimi 5138b2f1d5 update health_persistence.py 2026-03-31 20:55:17 +02:00
MacRimi 80afa789e7 Update notification service 2026-03-30 22:26:20 +02:00
MacRimi 43f2ce52a5 Update notification service 2026-03-30 22:10:40 +02:00
MacRimi 6899650bf8 Update health_persistence.py 2026-03-30 21:19:08 +02:00
MacRimi c549737ad0 Update HealthMonitor 2026-03-30 20:52:25 +02:00
MacRimi 2fc5e2865d Update notification service 2026-03-30 19:55:19 +02:00
MacRimi 54eab9af49 Update notification service 2026-03-30 18:53:03 +02:00
MacRimi 2c363bbb8e Update health_persistence.py 2026-03-28 22:44:52 +01:00
MacRimi e7d3b20295 Update menus 2026-03-28 19:32:15 +01:00
MacRimi 264fa4982f Update notification service 2026-03-28 19:29:16 +01:00
MacRimi 4cc1147579 Update notification service 2026-03-27 20:42:03 +01:00
MacRimi 976f23a90e update notification service 2026-03-27 20:31:21 +01:00
MacRimi 8ed500adf7 Update notification service 2026-03-27 20:17:59 +01:00
MacRimi 6bb9313b95 Update notification service 2026-03-27 19:15:11 +01:00
MacRimi 7c5e7208b9 Update notification service 2026-03-26 20:04:53 +01:00
MacRimi 0b3624dbd5 Update notification service 2026-03-26 00:32:52 +01:00
MacRimi ba4e3c3adb Update notification service 2026-03-25 23:45:34 +01:00
MacRimi 66892f69ce Update notification service 2026-03-25 23:30:00 +01:00
MacRimi cdc2d7bbcb update notification service 2026-03-25 23:13:11 +01:00
MacRimi 68872d0e06 Update notification service 2026-03-25 20:12:08 +01:00
MacRimi d53c1dc402 Update notification service 2026-03-25 19:47:47 +01:00
MacRimi 876194cdc8 update storage settings 2026-03-19 19:07:26 +01:00
MacRimi f74d336072 Update health_persistence.py 2026-03-16 09:48:58 +01:00
MacRimi 7375e306fb Update health_persistence.py 2026-03-15 18:27:55 +01:00
MacRimi 785d58cb59 Update health monitor 2026-03-15 18:12:42 +01:00
MacRimi 2c80223fc4 Update health_persistence.py 2026-03-15 10:28:53 +01:00
MacRimi 59a578fb2d Update health_persistence.py 2026-03-15 10:23:38 +01:00
MacRimi 91c3f3520b Update health_persistence.py 2026-03-15 10:13:44 +01:00