action #76828

closed

big job queue for ppc as powerqaworker-qam-1.qa and malbec.arch and qa-power8-5-kvm were not active

Added by okurz about 4 years ago. Updated about 4 years ago.

Status: Resolved
Priority: High
Assignee:
Category: -
Start date: 2020-10-31
Due date:
% Done: 0%
Estimated time:

Related issues 1 (0 open, 1 closed)

Related to openQA Infrastructure (public) - action #73633: OSD partially unresponsive, triggering 500 responses, spotty response visible in monitoring panels but no alert triggered (yet). Status: Resolved. Assignee: nicksinger. Start date: 2020-10-20. Due date: 2020-11-17.

Actions #1

Updated by okurz about 4 years ago

  • Related to action #73633: OSD partially unresponsive, triggering 500 responses, spotty response visible in monitoring panels but no alert triggered (yet) added
Actions #2

Updated by okurz about 4 years ago

  • Due date set to 2020-11-03
  • Status changed from In Progress to Feedback

First I triggered a power reset over IPMI for qa-power8-5-kvm.qa, then called ipmi-fsp1-malbec.arch power reset, waited for malbec.arch to come up, ensured that services were properly started, and monitored until openQA jobs were picked up. For powerqaworker-qam-1 I proceeded in #68053.
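The IPMI step above can be sketched roughly as follows. The ticket does not show the wrapper behind ipmi-fsp1-malbec.arch, so this is only an illustration using plain ipmitool: the BMC hostname, the user name default and the lanplus interface are assumptions, and the helper ipmi_power_cmd is hypothetical. It only prints the command line instead of running it, so it can be reviewed before anything is reset.

```shell
#!/bin/sh
# Sketch only: the BMC hostname, the user name default and the lanplus
# interface are assumptions, not taken from the ticket.

# Print the ipmitool command line for a given BMC host and power action.
# -E tells ipmitool to read the password from the IPMI_PASSWORD environment
# variable, which keeps it out of the command line.
ipmi_power_cmd() {
    bmc_host=$1
    action=$2   # for example "status" or "reset"
    printf 'ipmitool -I lanplus -H %s -U %s -E chassis power %s\n' \
        "$bmc_host" "${IPMI_USER:-ADMIN}" "$action"
}

# Dry run: show what would be executed for a placeholder BMC host.
ipmi_power_cmd "bmc.example.invalid" reset
```

After a reset, one would still wait for the host to answer again and check that the openQA worker services are running before expecting jobs to be picked up.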

Actions #3

Updated by okurz about 4 years ago

  • Due date deleted (2020-11-03)
  • Status changed from Feedback to In Progress
  • Priority changed from Urgent to High

qa-power8-5-kvm.qa showed problems again; I commented in https://progress.opensuse.org/issues/76792#change-346096 on what I did. The job queue on https://stats.openqa-monitor.qa.suse.de/d/nRDab3Jiz/openqa-jobs-test is near-empty now, but again I can reach neither powerqaworker-qam-1.qa nor malbec.arch, which https://stats.openqa-monitor.qa.suse.de/d/4KkGdvvZk/osd-status-overview confirms :(

Had a quick chat with nsinger, thanks for the quick reaction! nsinger confirmed my observation that when a machine is not reachable, sol activate may not show anything. This is likely when the machine has crashed and no longer outputs anything on the serial console.

Triggered a reset of malbec.arch and, after boot, ran sudo systemctl restart var-lib-openqa-share.mount on the machine. The machine is back and working on jobs. I did not look into powerqaworker-qam-1.qa for now.
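For reference, the unit name var-lib-openqa-share.mount follows from the mount path /var/lib/openqa/share. A minimal sketch of that mapping and of a guarded restart is below. The helpers mount_unit_for and ensure_mounted are hypothetical, the mapping is a simplification that assumes the path has no characters needing systemd escaping (systemd-escape handles the general case), and the "restart only if not mounted" check is an addition for illustration, not what was done in the ticket.

```shell
#!/bin/sh
# Sketch only: simplified path-to-unit mapping, valid for paths without
# characters that systemd would escape (such as "-" inside a component).
mount_unit_for() {
    path=${1#/}     # drop the leading slash
    path=${path%/}  # drop a trailing slash, if any
    printf '%s.mount\n' "$(printf '%s' "$path" | tr '/' '-')"
}

# Restart the mount unit only if the path is not currently a mount point.
# Defined here but not called, because it needs sudo and systemd.
ensure_mounted() {
    if mountpoint -q "$1"; then
        echo "$1 is mounted"
    else
        sudo systemctl restart "$(mount_unit_for "$1")"
    fi
}

# Show the unit name derived from the openQA share path.
mount_unit_for /var/lib/openqa/share
```

On the worker itself, ensure_mounted /var/lib/openqa/share would then either report the mount as present or restart the unit.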

Actions #4

Updated by okurz about 4 years ago

  • Status changed from In Progress to Resolved

The job queue has decreased enough that this is no longer a problem. Currently malbec.arch and qa-power8-5-kvm.qa are both still up, and powerqaworker-qam-1.qa has its own ticket anyway.
