action #97502
closed
osd deployment failed due to openqaworker-arm-3 being down, needs to be worked around size:M
Start date:
2021-08-25
Due date:
% Done:
0%
Estimated time:
Added by okurz over 2 years ago. Updated over 2 years ago.
Description
I did the first two suggested steps and now the pipeline passes the previously failing step. https://gitlab.suse.de/openqa/osd-deployment/-/pipelines
At the time of writing this comment openqaworker-arm-3 seems to be down, so the last step cannot be done until the worker comes back to life. I am changing the status of this ticket to blocked and writing a comment in the other ticket #97244
Thanks. This addressed the urgency, hence reducing prio.
I checked today and openqaworker-arm-3 still seems to be down.
I removed myself from this ticket because I won't be working in this group for a while.
Now that #97244 is resolved we can continue here. I triggered a manual explicit pipeline run for openqaworker-arm-3 in https://gitlab.suse.de/openqa/grafana-webhook-actions/-/jobs/589516 which actually recovered the machine. Now I am re-adding the salt key and applying the current high state.
Tests are executed correctly. https://monitor.qa.suse.de/d/WDopenqaworker-arm-3/worker-dashboard-openqaworker-arm-3?orgId=1&refresh=1m&from=now-24h&to=now shows some data now, but https://monitor.qa.suse.de/d/1bNU0StZz/automatic-actions?viewPanel=4&orgId=1 says that there is still no data. I restarted the services "influxdb grafana telegraf" on monitor.qa but that changed nothing. I assume that here as well (same as in the other ticket where I struggled) the culprit is the telegraf service on the webUI host, as that one does the actual ping to each host. So I restarted the service on osd, let's see. If this works then I should ensure that the telegraf service restarts on config file changes, because I assume a missing restart was the problem both for the latest postgresql related changes and here for the host.
EDIT: https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/580 created for the telegraf service. https://monitor.qa.suse.de/d/1bNU0StZz/automatic-actions?editPanel=4&viewPanel=4&orgId=1 shows data again, same as https://monitor.qa.suse.de/d/4KkGdvvZk/osd-status-overview?orgId=1
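For illustration, a minimal sketch of how a "restart telegraf on config file changes" rule could be expressed as a Salt state. This is not the content of the merge request above. The file path, the `salt://` source location and the state layout are assumptions. Only `file.managed`, `service.running` and the `watch` requisite are standard Salt features.

```yaml
# Hypothetical sketch; paths and source location are assumed, not taken from salt-states-openqa.
/etc/telegraf/telegraf.conf:
  file.managed:
    - source: salt://monitoring/telegraf/telegraf.conf  # assumed location
    - template: jinja

telegraf:
  service.running:
    - enable: True
    # "watch" makes Salt restart the service whenever the watched
    # file state reports changes during a state run.
    - watch:
      - file: /etc/telegraf/telegraf.conf
```

With a `watch` requisite like this, a high state run that changes the config file, for example by adding a host to ping, would also restart telegraf, so a manual restart as done above would no longer be needed.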
https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/580 merged and deployed. openqaworker-arm-3 is fully back in production and working on jobs, e.g. https://openqa.suse.de/tests/7160099