action #97364
closed
openqaworker-arm-2 and openqaworker-arm-3 seem to be offline, alerts had been triggered size:S
Start date:
2021-08-23
Due date:
% Done:
0%
Estimated time:
Description
Observation
https://monitor.qa.suse.de/alerting/list?state=not_ok
shows
[openqa] openqaworker-arm-2 online (long-time) alert
ALERTING for 4 days
[openqa] openqaworker-arm-3 online (long-time) alert
ALERTING for 5 days
Updated by okurz about 3 years ago
- Related to action #97244: openqaworker-arm-3 is offline and EngInfra wants us to create JiraSD tickets instead of infra size:M added
Updated by nicksinger about 3 years ago
- Status changed from New to In Progress
- Assignee set to nicksinger
Updated by okurz about 3 years ago
- Subject changed from openqaworker-arm-2 and openqaworker-arm-3 seem to be offline, alerts had been triggered to openqaworker-arm-2 and openqaworker-arm-3 seem to be offline, alerts had been triggered size:S
Discussed in the daily. Out of scope: changing automatic ticket creation. In scope: please take a short look into the pipeline to see why power cycling over GitLab did not work.
Updated by nicksinger about 3 years ago
- Status changed from In Progress to Resolved
Both workers look good again after a manual reboot and show up as "online" in grafana: https://stats.openqa-monitor.qa.suse.de/d/1bNU0StZz/automatic-actions?orgId=1
The pipeline was triggered for both machines. The execution for arm-2 failed due to a CI issue, which I think is something we can't really change:
ERROR: Job failed (system failure): prepare environment: error sending request: Post "https://caasp-master.suse.de:6443/api/v1/namespaces/gitlab/pods/runner-h1wecofv-project-4652-concurrent-0l8dw9/attach?container=helper&stdin=true": dial tcp: lookup caasp-master.suse.de on [2620:113:80c0:8080:10:160:2:88]:53: server misbehaving. Check https://docs.gitlab.com/runner/shells/index.html#shell-profile-loading for more information
(https://gitlab.suse.de/openqa/grafana-webhook-actions/-/jobs/536479#L13)
For arm-3 I created https://progress.opensuse.org/issues/97382 to fix our pipeline.
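The arm-2 job error above reads like a failed DNS lookup for caasp-master.suse.de on the runner side. As a rough sketch only (not part of the grafana-webhook-actions pipeline), a lookup like this could be retried a few times before giving up, assuming the failure is transient. The function name `resolve_with_retry` and its parameters are hypothetical and chosen for illustration:

```python
import socket
import time


def resolve_with_retry(host, port=443, attempts=3, delay=2.0,
                       resolver=socket.getaddrinfo, sleep=time.sleep):
    """Return the sorted unique addresses for host, retrying on DNS errors.

    Hypothetical helper: resolver and sleep are injectable so the retry
    logic can be exercised without real network access.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            infos = resolver(host, port)
            # Each entry is (family, type, proto, canonname, sockaddr);
            # sockaddr[0] is the IP address for both IPv4 and IPv6.
            return sorted({info[4][0] for info in infos})
        except socket.gaierror as err:
            last_error = err
            # Wait before the next attempt, but not after the last one.
            if attempt < attempts:
                sleep(delay)
    raise last_error


if __name__ == "__main__":
    # Host and port are taken from the error message above.
    print(resolve_with_retry("caasp-master.suse.de", 6443))
```

Whether a retry would have helped here is an open question, since the lookup happens inside the GitLab runner and not in our own job script.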
Updated by jbaier_cz over 2 years ago
- Related to action #113561: failed pipelines for openQABot and bot-ng because of an expired cert added