action #162296

Updated by okurz about 2 months ago

## Observation 
Observed on w31 and w32, which upgraded themselves to Leap 15.6 and then crashed multiple times after booting into kernel 6.4, each time roughly 10-20 minutes after boot.

## Acceptance criteria
* **AC1:** OSD openQA workers can run stably with Leap 15.6 (package locks for reported issues are allowed)
* **AC2:** `ssh osd 'sudo salt \* cmd.run "zypper ll | grep \"\(162296\|1227616\)\""'` shows no matching lock on any minion
* **AC3:** `ssh o3 'hosts="openqaworker21 openqaworker22 openqaworker23 openqaworker24 openqaworker25 openqaworker26 openqaworker27 openqaworker28 openqaworker-arm21 openqaworker-arm22 qa-power8-3"; for i in $hosts; do echo "### $i" && ssh root@$i "zypper ll" ; done'` lists no firewall package locks anymore
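Checking AC2 and AC3 by eye across many hosts is error-prone. A minimal sketch, assuming bash and that each lock appears on its own line of `zypper ll` output: a hypothetical helper `no_locks_matching` (not an existing tool) reads the listing on stdin, prints any lines matching an extended regex, and fails if there are any.

```shell
#!/bin/bash
# Hypothetical helper for AC2/AC3: read `zypper ll` output on stdin, print
# every line matching an extended regex and return 1 if there was any,
# 0 if the listing contains no matching lock (criterion met).
# Assumption: each lock is listed on its own line.
no_locks_matching() {
    local pattern=$1
    local matches
    # grep exits 1 on "no match"; '|| true' keeps that from counting as failure
    matches=$(grep -E -- "$pattern" || true)
    if [[ -n $matches ]]; then
        printf '%s\n' "$matches"
        return 1
    fi
    return 0
}
```

For AC3 it could be fed per host, e.g. `ssh root@openqaworker21 "zypper ll" | no_locks_matching firewall`; for AC2 the pattern would be `'162296|1227616'`. Reading stdin keeps the helper independent of whether the listing comes from ssh, salt or a saved file.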

## Suggestions
* Temporarily upgrade selected machines to Leap 15.6 while keeping the old kernel, or vice versa (only upgrade the kernel to 6.4), and try to get the system to run stably
* Optional: look into the crash files on w31 in `/root/crash-2024-06-14/`
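To keep track of which machine is in which variant of the first suggestion, a hypothetical helper `experiment_variant` could combine `VERSION_ID` from an os-release file with the running kernel release. A minimal sketch, assuming bash and that Leap 15.6 kernel releases start with `6.4.`; the labels it prints are illustrative only.

```shell
#!/bin/bash
# Hypothetical helper: classify a host for the Leap 15.6 / kernel 6.4 experiment.
# Arguments (both optional): path to an os-release file, kernel release string.
# Assumption: Leap 15.6 kernel releases start with "6.4.".
experiment_variant() {
    local os_release=${1:-/etc/os-release}
    local kernel=${2:-$(uname -r)}
    local version
    # os-release uses shell-compatible assignments, so source it in a subshell
    # to read VERSION_ID without leaking its variables into the caller
    version=$(. "$os_release" && printf '%s' "${VERSION_ID:-unknown}")
    case "$version:$kernel" in
        15.6:6.4.*) echo "Leap 15.6 + kernel 6.4 (combination that crashed on w31/w32)" ;;
        15.6:*)     echo "Leap 15.6 + kernel $kernel" ;;
        *:6.4.*)    echo "Leap $version + kernel 6.4" ;;
        *)          echo "Leap $version + kernel $kernel" ;;
    esac
}
```

On a worker, calling it without arguments uses `/etc/os-release` and `uname -r`; the arguments exist mainly so the helper can be checked against saved files.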

## Rollback actions
* Remove the alert silence matching `alertname=Failed systemd services alert (except openqa.suse.de)` from https://monitor.qa.suse.de/alerting/silences?alertmanager=grafana
* Add worker39 back to salt
