action #97364

openqaworker-arm-2 and openqaworker-arm-3 seem to be offline, alerts had been triggered size:S

Added by okurz 3 months ago. Updated 3 months ago.

Status: Resolved
Priority: Urgent
Assignee:
Target version:
Start date: 2021-08-23
Due date:
% Done: 0%
Estimated time:
Description

Observation

https://monitor.qa.suse.de/alerting/list?state=not_ok
shows

[openqa] openqaworker-arm-2 online (long-time) alert
ALERTING for 4 days
[openqa] openqaworker-arm-3 online (long-time) alert
ALERTING for 5 days

Related issues

Related to openQA Infrastructure - action #97244: openqaworker-arm-3 is offline and EngInfra wants us to create JiraSD tickets instead of infra size:M (Resolved, 2021-08-19, 2021-09-17)

History

#1 Updated by okurz 3 months ago

  • Related to action #97244: openqaworker-arm-3 is offline and EngInfra wants us to create JiraSD tickets instead of infra size:M added

#2 Updated by nicksinger 3 months ago

  • Status changed from New to In Progress
  • Assignee set to nicksinger

#3 Updated by okurz 3 months ago

  • Subject changed from openqaworker-arm-2 and openqaworker-arm-3 seem to be offline, alerts had been triggered to openqaworker-arm-2 and openqaworker-arm-3 seem to be offline, alerts had been triggered size:S

Discussed in the daily. Out of scope: changing automatic ticket creation. In scope: please take a short look into the pipeline to find out why power cycling over GitLab did not work.

#4 Updated by nicksinger 3 months ago

  • Status changed from In Progress to Resolved

Both workers look good again after a manual reboot and show up as "online" in grafana: https://stats.openqa-monitor.qa.suse.de/d/1bNU0StZz/automatic-actions?orgId=1

The pipeline was triggered for both machines. The execution for arm-2 failed due to a CI issue, which I think is something we can't really change:

ERROR: Job failed (system failure): prepare environment: error sending request: Post "https://caasp-master.suse.de:6443/api/v1/namespaces/gitlab/pods/runner-h1wecofv-project-4652-concurrent-0l8dw9/attach?container=helper&stdin=true": dial tcp: lookup caasp-master.suse.de on [2620:113:80c0:8080:10:160:2:88]:53: server misbehaving. Check https://docs.gitlab.com/runner/shells/index.html#shell-profile-loading for more information

(https://gitlab.suse.de/openqa/grafana-webhook-actions/-/jobs/536479#L13)
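The error above is a failed DNS lookup for caasp-master.suse.de ("server misbehaving"), so it may well be transient. As a rough sketch only, and not part of the actual pipeline, a check like the following could tell a one-off resolver hiccup apart from a persistent failure. The function name `resolves`, the retry count and the delay are assumptions made for illustration; port 6443 is taken from the URL in the error message.

```python
import socket
import time


def resolves(hostname, attempts=3, delay=1.0, resolver=socket.getaddrinfo):
    """Return True if hostname resolves within the given number of attempts.

    `resolver` defaults to socket.getaddrinfo and is only a parameter so the
    function can be exercised without real network access (an assumption of
    this sketch, not something the pipeline provides).
    """
    for attempt in range(attempts):
        try:
            # 6443 is the API port seen in the failing request URL.
            resolver(hostname, 6443)
            return True
        except socket.gaierror:
            # Name resolution failed; wait and retry unless this was the last try.
            if attempt < attempts - 1:
                time.sleep(delay)
    return False

# Example (performs a real lookup, so the result depends on the network):
# resolves("caasp-master.suse.de")
```

If the name resolves on a later attempt, simply retrying the failed job is probably enough; if it never resolves, the problem sits with the DNS server or the record and is outside what the pipeline itself can fix.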

For arm-3 I created https://progress.opensuse.org/issues/97382 to fix our pipeline.

