action #97502

closed

osd deployment failed due to openqaworker-arm-3 being down, needs to be worked around size:M

Added by okurz over 2 years ago. Updated over 2 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: -
Target version:
Start date: 2021-08-25
Due date:
% Done: 0%
Estimated time:
Description

Observation

#97244 is the real issue, but we need to work around it.

Steps

  • Take openqaworker-arm-3 out of production
  • Retrigger the failed step of the deployment
  • Bring openqaworker-arm-3 back into production manually, or mention this in another ticket as a step still to be done

Related issues 1 (0 open, 1 closed)

Related to openQA Infrastructure - action #97244: openqaworker-arm-3 is offline and EngInfra wants us to create JiraSD tickets instead of infra size:M | Resolved | dheidler | 2021-08-19 | 2021-09-17

Actions #1

Updated by ilausuch over 2 years ago

  • Status changed from Workable to In Progress
  • Assignee set to ilausuch
Actions #2

Updated by okurz over 2 years ago

  • Description updated (diff)
Actions #3

Updated by ilausuch over 2 years ago

  • Status changed from In Progress to Blocked

I did the first two suggested steps, and the pipeline now passes the previously failed step. https://gitlab.suse.de/openqa/osd-deployment/-/pipelines

At the time of writing this comment openqaworker-arm-3 seems to be down, so the last step cannot be done until the worker comes back to life. I am changing the status of this ticket to Blocked and writing a comment in the other ticket, #97244.

Actions #4

Updated by okurz over 2 years ago

  • Related to action #97244: openqaworker-arm-3 is offline and EngInfra wants us to create JiraSD tickets instead of infra size:M added
Actions #5

Updated by okurz over 2 years ago

  • Priority changed from Urgent to Normal

Thanks. This addressed the urgency, hence reducing the priority.

Actions #6

Updated by ilausuch over 2 years ago

I checked today, and openqaworker-arm-3 still seems to be down.

Actions #7

Updated by ilausuch over 2 years ago

  • Assignee deleted (ilausuch)

I removed myself from this ticket because I won't be working in this group for some time.

Actions #8

Updated by okurz over 2 years ago

  • Status changed from Blocked to In Progress
  • Assignee set to okurz

Now that #97244 is resolved we can continue here. I triggered a manual explicit pipeline run for openqaworker-arm-3 in https://gitlab.suse.de/openqa/grafana-webhook-actions/-/jobs/589516 which actually recovered the machine. Now I am re-adding the salt key and applying the current high state.
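
For reference, the salt side of taking the worker out and bringing it back might look roughly like the sketch below. It rests on assumptions: that "taking out of production" meant removing the salt key (as re-adding it suggests), that the minion ID is `openqaworker-arm-3.suse.de`, and the helper names are made up for illustration. The commands are only printed, not executed. It uses the standard `salt-key` options `--delete`, `--accept` and `--yes`, and `state.apply` without arguments, which applies the highstate.

```python
import shlex

# Hypothetical minion ID; the ticket only names the host "openqaworker-arm-3".
MINION_ID = "openqaworker-arm-3.suse.de"


def take_out_commands(minion_id):
    """Command to delete the worker's key on the salt master (assumed meaning of 'take out')."""
    return [["salt-key", "--yes", "--delete", minion_id]]


def bring_back_commands(minion_id):
    """Commands to accept the worker's key again and apply the current highstate."""
    return [
        ["salt-key", "--yes", "--accept", minion_id],
        ["salt", minion_id, "state.apply"],
    ]


if __name__ == "__main__":
    # Print the commands for review instead of running them.
    for argv in take_out_commands(MINION_ID) + bring_back_commands(MINION_ID):
        print(shlex.join(argv))
```

Retriggering the failed deployment step happens in the GitLab pipeline and is not covered by this sketch.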

Actions #9

Updated by okurz over 2 years ago

Tests are executed correctly. https://monitor.qa.suse.de/d/WDopenqaworker-arm-3/worker-dashboard-openqaworker-arm-3?orgId=1&refresh=1m&from=now-24h&to=now shows some data now, but https://monitor.qa.suse.de/d/1bNU0StZz/automatic-actions?viewPanel=4&orgId=1 says that there is still no data. I restarted the services "influxdb grafana telegraf" on monitor.qa, but nothing changed. I assume that here, as in the other ticket where I struggled, the culprit is actually the telegraf service on the webUI host, because that one does the actual ping to each host. So I restarted the service on osd; let's see. If this works, I should ensure that the telegraf service restarts on config file changes, because I assume a missing restart explains both the latest postgresql-related changes and the host data here.

EDIT: https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/580 created for the telegraf service. https://monitor.qa.suse.de/d/1bNU0StZz/automatic-actions?editPanel=4&viewPanel=4&orgId=1 shows data again, as does https://monitor.qa.suse.de/d/4KkGdvvZk/osd-status-overview?orgId=1
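
The content of merge request 580 is not shown in this ticket, so the following is not what the salt state does. It is only a generic illustration of the idea "restart the service when its config file changes": compare a digest of the file with the last known one and produce a restart command only on a difference. The function names and the use of `systemctl restart` are assumptions for the sake of the example.

```python
import hashlib
from pathlib import Path


def config_digest(path):
    """Return the SHA-256 hex digest of the config file's current content."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def restart_command_if_changed(path, last_digest, service="telegraf"):
    """Return (command, new_digest); command is None when the file is unchanged."""
    digest = config_digest(path)
    if digest == last_digest:
        return None, digest
    # The command is returned, not executed, so the caller decides how to run it.
    return ["systemctl", "restart", service], digest
```

A caller would keep the returned digest and pass it in on the next check, so that only a real content change leads to a restart.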

Actions #10

Updated by okurz over 2 years ago

  • Status changed from In Progress to Feedback
Actions #11

Updated by okurz over 2 years ago

  • Status changed from Feedback to Resolved

https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/580 merged and deployed. openqaworker-arm-3 is fully back in production and working on jobs, e.g. https://openqa.suse.de/tests/7160099
