big job queue for ppc as powerqaworker-qam-1.qa and malbec.arch and qa-power8-5-kvm were not active
We have reached 10k scheduled jobs on osd in https://stats.openqa-monitor.qa.suse.de/d/nRDab3Jiz/openqa-jobs-test?orgId=1&from=1603848964424&to=1603970807972 . I don't know if this is good or bad :D
https://stats.openqa-monitor.qa.suse.de/d/nRDab3Jiz/openqa-jobs-test?orgId=1&from=1603963753871&to=1603970807972 shows that the main problem right now is ppc64le, also visible in https://stats.openqa-monitor.qa.suse.de/d/7W06NBWGk/job-age?orgId=1&fullscreen&panelId=4&from=1604107477948&to=1604135288558 .
- Due date set to 2020-11-03
- Status changed from In Progress to Feedback
First I called power reset over IPMI for qa-power8-5-kvm.qa, then called ipmi-fsp1-malbec.arch power reset, waited for malbec.arch to come up, ensured that services were properly started, and monitored until openQA jobs were picked up. For powerqaworker-qam-1 I also proceeded in #68053.
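For reference, the reset-and-wait steps above could be scripted roughly as follows. This is only a sketch, not the exact commands used here: the function names, the IPMI_USER placeholder, the DRY_RUN, PROBE and SLEEP_SECS switches, and the retry defaults are all assumptions for illustration. It relies on ipmitool reading the password from the IPMI_PASSWORD environment variable via -E.

```shell
#!/bin/sh
# Sketch only: helper names, IPMI_USER default and DRY_RUN/PROBE/SLEEP_SECS
# are hypothetical, not taken from the ticket.

# Run (or, with DRY_RUN=1, just print) an IPMI power reset against a BMC host.
ipmi_power_reset() {
    bmc_host=$1
    # -E makes ipmitool take the password from $IPMI_PASSWORD.
    set -- ipmitool -I lanplus -H "$bmc_host" -U "${IPMI_USER:-admin}" -E power reset
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$@"
    else
        "$@"
    fi
}

# Poll a host until the probe command succeeds or the retries run out.
# Returns 0 once the host answers, 1 if it never did.
wait_for_host() {
    host=$1
    tries=${2:-60}                      # assumed default: 60 attempts
    probe=${PROBE:-ping -c 1 -W 5}      # intentionally unquoted below to split words
    while [ "$tries" -gt 0 ]; do
        if $probe "$host" >/dev/null 2>&1; then
            return 0
        fi
        tries=$((tries - 1))
        sleep "${SLEEP_SECS:-10}"       # assumed default: 10 s between attempts
    done
    return 1
}
```

A possible invocation would be `ipmi_power_reset ipmi-fsp1-malbec.arch && wait_for_host malbec.arch`, followed by checking the services and the job queue by hand as described above.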
- Due date deleted (
- Status changed from Feedback to In Progress
- Priority changed from Urgent to High
qa-power8-5-kvm.qa again showed problems; I commented in https://progress.opensuse.org/issues/76792#change-346096 on what I did. The queue of jobs on https://stats.openqa-monitor.qa.suse.de/d/nRDab3Jiz/openqa-jobs-test is near-empty now, but again I can reach neither powerqaworker-qam-1.qa nor malbec.arch right now, which https://stats.openqa-monitor.qa.suse.de/d/4KkGdvvZk/osd-status-overview confirms :(
Had a quick chat with nsinger, thanks for the quick reaction! nsinger confirmed my observation that when a machine is not reachable, sol activate may not show anything. That is likely when the machine has crashed and no longer outputs anything on the serial console.
Triggered a reset of malbec.arch and, after boot, ran sudo systemctl restart var-lib-openqa-share.mount on the machine. The machine is back and working on jobs. Did not look into powerqaworker-qam-1.qa for now.
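The remount step could be wrapped with a quick verification, sketched below. The helper names and the DRY_RUN switch are assumptions for illustration, and unit_to_path is a simplified conversion that does not handle systemd's \x2d escaping (systemd-escape would be the proper tool for that). By systemd's naming rules, var-lib-openqa-share.mount corresponds to /var/lib/openqa/share.

```shell
#!/bin/sh
# Sketch only: helper names and DRY_RUN are hypothetical.

# Simplified mapping from a mount unit name to its mount point:
# drop the ".mount" suffix and turn "-" into "/". Escaped dashes are not handled.
unit_to_path() {
    name=${1%.mount}
    printf '/%s\n' "$(printf '%s' "$name" | tr '-' '/')"
}

# Restart the mount unit, then confirm both that the unit is active
# and that the path really is a mount point.
remount_share() {
    unit=${1:-var-lib-openqa-share.mount}
    path=$(unit_to_path "$unit")
    if [ "${DRY_RUN:-0}" = 1 ]; then
        # Only print what would be run.
        echo "sudo systemctl restart $unit"
        echo "systemctl is-active --quiet $unit"
        echo "mountpoint -q $path"
        return 0
    fi
    sudo systemctl restart "$unit" || return 1
    systemctl is-active --quiet "$unit" || return 1
    mountpoint -q "$path"
}
```

Running `remount_share` on the worker after boot returns non-zero if the share did not come back, which would be a hint to look at the mount before expecting jobs to be picked up.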