action #68050
closed
openqaworker3 fails to come up on reboot, openqa_nvme_format.service failed
Start date:
2020-06-14
Due date:
2020-07-07
% Done:
0%
Estimated time:
Updated by okurz over 4 years ago
- Copied to action #68053: powerqaworker-qam-1 fails to come up on reboot (repeatedly) added
Updated by okurz over 4 years ago
- Status changed from New to Workable
I could recover by calling mdadm --stop /dev/md127
and exiting emergency mode, after which the boot continued. We should cross-check the config in /etc/mdadm.conf
Updated by okurz over 4 years ago
- Status changed from Workable to Feedback
- Assignee set to okurz
- Priority changed from Urgent to Normal
I did mdadm --detail --scan >> /etc/mdadm.conf
and adjusted the entries manually so that the / fs raid is preserved.
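A minimal sketch of how the manual adjustment could be reduced, assuming the duplicates come from arrays that are already listed in the config: only ARRAY lines whose UUID is not yet present get appended. The helper name append_new_arrays is hypothetical and not part of the actual fix; it reads the scan output from stdin so it can be tried on a scratch file before touching /etc/mdadm.conf.

```shell
#!/bin/sh
# Hypothetical helper (not from the ticket): read "mdadm --detail --scan"
# style output on stdin and append only those ARRAY lines whose UUID is
# not already present in the config file given as $1.
append_new_arrays() {
    conf="$1"
    while IFS= read -r line; do
        # Only ARRAY definitions are of interest; skip everything else.
        case "$line" in
            ARRAY*) ;;
            *) continue ;;
        esac
        # Extract the value of the UUID= field, if there is one.
        uuid=$(printf '%s\n' "$line" | sed -n 's/.*UUID=\([^ ]*\).*/\1/p')
        [ -n "$uuid" ] || continue
        # Append the line only when this UUID is not yet in the config.
        if ! grep -qF "UUID=$uuid" "$conf" 2>/dev/null; then
            printf '%s\n' "$line" >> "$conf"
        fi
    done
}

# Intended use on a real host (needs root, not executed here):
#   mdadm --detail --scan | append_new_arrays /etc/mdadm.conf
```

Existing entries, such as the one for the / fs raid, are left untouched, so the result still needs a manual review before the next reboot.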
Updated by okurz over 4 years ago
Removed the worker machine's salt key for now to fix osd deployment. I am not yet sure if the above is the right fix, particularly because we don't include that in salt. I will test further with multiple reboots.
Updated by okurz over 4 years ago
Updated by okurz over 4 years ago
- Status changed from Feedback to Resolved
Merged the MR, which was applied to all workers. Brought back openqaworker3 with
sudo systemctl unmask openqa-worker.target salt-minion telegraf
sudo systemctl enable --now openqa-worker.target salt-minion telegraf
on openqaworker3 and on osd
sudo salt-key -y -A openqaworker3\*
sudo salt -l error --state-output=changes -C 'G@roles:worker and openqaworker3*' state.apply
Updated by okurz over 4 years ago
- Status changed from Resolved to In Progress
openqaworker3 is again stuck in openqa_nvme_format.service. Working on it again.
Updated by okurz over 4 years ago
- Due date set to 2020-07-07
- Status changed from In Progress to Feedback
Updated by okurz over 4 years ago
- Status changed from Feedback to Resolved
All workers rebooted fine over the weekend, no failed services.
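A check like this could be scripted roughly as follows. This is only a sketch under assumptions: the helper name failed_units is hypothetical, and the ticket does not say how the failed services were actually checked. The function takes the output of systemctl on stdin and prints just the unit names.

```shell
#!/bin/sh
# Hypothetical helper (not from the ticket): print the unit name, i.e. the
# first column, of every non-empty line read from stdin.
failed_units() {
    awk 'NF { print $1 }'
}

# Intended use on a worker (not executed here):
#   systemctl list-units --state=failed --plain --no-legend | failed_units
```

Empty output means no unit is in the failed state on that host.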
Updated by okurz about 4 years ago
- Related to action #78010: unreliable reboots on openqaworker3, likely due do openqa_nvme_format (was: [alert] PROBLEM Host Alert: openqaworker3.suse.de is DOWN) added