action #159270

closed

QA - coordination #121720: [saga][epic] Migration to QE setup in PRG2+NUE3 while ensuring availability

QA - coordination #129280: [epic] Move from SUSE NUE1 (Maxtorhof) to new NBG Datacenters

openqaworker-arm-1 is Unreachable size:S

Added by ybonatakis 13 days ago. Updated 1 day ago.

Status: Resolved
Priority: High
Assignee:
Category: Regressions/Crashes
Target version:
Start date: 2024-04-19
Due date:
% Done: 0%
Estimated time:
Tags:

Description

Observation

❯ ping openqaworker-arm-1.qe.nue2.suse.org
PING openqaworker-arm-1.qe.nue2.suse.org (10.168.192.213) 56(84) bytes of data.
From 81.95.8.245 icmp_seq=1 Destination Host Unreachable
From 81.95.8.245 icmp_seq=2 Destination Host Unreachable
From 81.95.8.245 icmp_seq=3 Destination Host Unreachable

The graph shows that it went down at 2024-04-18 15:32:00.
I think the most relevant graph is https://stats.openqa-monitor.qa.suse.de/d/WDopenqaworker-arm-1/worker-dashboard-openqaworker-arm-1?orgId=1&from=now-12h&to=now&viewPanel=65113
The "QA network infrastructure packet loss" panel shows walter1.qe.nue2.suse.org at 100 at 2024-04-18 15:19:00.
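
The ping output above can also be checked mechanically. The following is a minimal sketch and not part of the original report: `count_unreachable` and `host_is_up` are hypothetical helper names, and the `-c`/`-W` options assume the Linux iputils `ping`.

```shell
#!/bin/sh
# Hypothetical helpers, not taken from the ticket.

# Count "Destination Host Unreachable" replies in ping output read from stdin.
count_unreachable() {
    # grep -c prints 0 but exits non-zero when nothing matches, hence "|| true".
    grep -c 'Destination Host Unreachable' || true
}

# Succeed only if the host answers at least one of three echo requests.
# Assumes iputils ping: -c is the packet count, -W the reply timeout in seconds.
host_is_up() {
    ping -c 3 -W 2 "$1" >/dev/null 2>&1
}

# Example (needs network access, so it is left commented out):
# host_is_up openqaworker-arm-1.qe.nue2.suse.org || echo "still unreachable"
```

Feeding the three reply lines quoted in the observation into `count_unreachable` yields 3, which matches what was seen by eye.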

Suggestions

  • Just recover the machine and ensure it's up again as alert mitigation

Out of scope


Related issues 4 (1 open, 3 closed)

Related to openQA Infrastructure - action #159303: [alert] osd-deployment pre-deploy pipeline failed because openqaworker-arm-1.qe.nue2.suse.org was offline size:S (Blocked, nicksinger)

Related to QA - action #157753: Bring back automatic recovery for openqaworker-arm-1 size:M (Resolved, ybonatakis)

Related to openQA Infrastructure - action #159318: openqa-piworker host up alert (Resolved, nicksinger, 2023-08-09)

Related to openQA Infrastructure - action #159555: IPMI access over IPv6 doesn't work on imagetester - try to update BIOS with physical access size:S (Resolved, okurz, 2024-04-24)

Actions #2

Updated by tinita 13 days ago

  • Target version changed from Tools - Next to Ready
Actions #3

Updated by okurz 13 days ago

  • Related to action #159303: [alert] osd-deployment pre-deploy pipeline failed because openqaworker-arm-1.qe.nue2.suse.org was offline size:S added
Actions #4

Updated by okurz 13 days ago

  • Related to action #157753: Bring back automatic recovery for openqaworker-arm-1 size:M added
Actions #5

Updated by okurz 12 days ago

  • Subject changed from openqaworker-arm-1 is Unreachable to openqaworker-arm-1 is Unreachable size:S
  • Description updated (diff)
  • Status changed from New to Workable
  • Assignee set to ybonatakis
Actions #6

Updated by ybonatakis 12 days ago · Edited

  • Status changed from Workable to Feedback
Actions #7

Updated by ybonatakis 12 days ago

  • Status changed from Feedback to Resolved

It is also possible to ssh into it.

Actions #8

Updated by okurz 10 days ago

  • Status changed from Resolved to In Progress

@ybonatakis you leaked the IPMI passwords in #159270-6. I deleted that comment. Now please update all IPMI passwords as documented in https://gitlab.suse.de/openqa/salt-pillars-openqa/#ipmi-passwords . Please use a pronounceable password. I suggest thinking of a good password based on https://github.com/okurz/scripts/blob/master/xkcdpass-two-word
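
As an illustration of the two-word idea, here is a minimal sketch. It is not the linked `xkcdpass-two-word` script, whose contents are not reproduced here. `two_word_password` is a hypothetical helper, the default word list path is an assumption, and the snippet relies on GNU `shuf`.

```shell
#!/bin/sh
# Hypothetical helper: pick two random words from a word list (one word per
# line) and join them with a hyphen. Assumes GNU coreutils shuf is installed.
two_word_password() {
    wordlist=${1:-/usr/share/dict/words}   # assumed default location
    shuf -n 2 --random-source=/dev/urandom "$wordlist" | paste -s -d '-' -
}

# Example: two_word_password /path/to/wordlist
```

The result should still be read by a human before use, since general-purpose word lists contain entries that are hard to pronounce or type.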

Actions #9

Updated by okurz 10 days ago

Actions #11

Updated by livdywan 9 days ago

  • Status changed from Feedback to In Progress

@ybonatakis Please remember you need to address the urgency of the ticket or resolve it immediately.

Actions #13

Updated by livdywan 9 days ago

ybonatakis wrote in #note-12:

but https://gitlab.suse.de/openqa/salt-pillars-openqa/-/merge_requests/789 is still open

Did you see the unanswered questions there?

Actions #14

Updated by openqa_review 8 days ago

  • Due date set to 2024-05-08

Setting due date based on mean cycle time of SUSE QE Tools

Actions #15

Updated by okurz 8 days ago · Edited

I found the following entries with inconsistencies:

  1. malbec, you did not change the password, please revert
  2. imagetester, should be changed
  3. storage, should be changed
  4. kerosene, should be changed
  5. openqaworker{20..28}, should be changed
  6. openqaworker-arm{21..22}, should be changed
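
The `{20..28}` and `{21..22}` ranges in the list above are bash brace-expansion shorthand. Below is a minimal sketch that spells out the hosts still marked "should be changed". `pending_hosts` is a hypothetical helper name, and `seq` is used instead of brace expansion so the snippet also runs under a plain `sh`.

```shell
#!/bin/sh
# Hypothetical helper: print every host from the list that is marked
# "should be changed", one per line (malbec is excluded: it is to be reverted).
pending_hosts() {
    printf '%s\n' imagetester storage kerosene
    for i in $(seq 20 28); do printf 'openqaworker%s\n' "$i"; done
    for i in $(seq 21 22); do printf 'openqaworker-arm%s\n' "$i"; done
}

pending_hosts
```

This prints 14 host names (3 single hosts, 9 x86 workers and 2 ARM workers), which can serve as a checklist while the passwords are being updated.
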
Actions #16

Updated by okurz 8 days ago

  • Related to action #159555: IPMI access over IPv6 doesn't work on imagetester - try to update BIOS with physical access size:S added
Actions #17

Updated by okurz 7 days ago

  • Parent task set to #129280
Actions #18

Updated by ybonatakis 7 days ago

  • Status changed from In Progress to Blocked
Actions #19

Updated by ybonatakis 7 days ago

  • Status changed from Blocked to In Progress

merged

Actions #20

Updated by ybonatakis 7 days ago

  • Due date deleted (2024-05-08)

https://gitlab.suse.de/openqa/salt-pillars-openqa/-/merge_requests/794

Still waiting to get ssh access to update the remaining entries:
openqaworker{20..28}, should be changed
openqaworker-arm{21..22}, should be changed

Actions #21

Updated by nicksinger 7 days ago

I addressed your question in https://suse.slack.com/archives/C02AJ1E568M/p1713963938332169?thread_ts=1713940157.475059&cid=C02AJ1E568M and deleted the stale alerts with:

sqlite3 /var/lib/grafana/grafana.db "$(for RULEID in DzAhcifVk dzA25mfVk Fk0h5iBVk Sk02ciBVk VzA2cif4zz; do echo -n "delete from alert_rule where uid = '$RULEID'; delete from alert_rule_version where rule_uid = '$RULEID'; delete from provenance_type where record_key = '$RULEID'; delete from annotation where text like '%$RULEID%';"; done)"
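
The one-liner above packs four DELETE statements per rule UID into a single argument. A more readable sketch of the same idea follows. `stale_alert_sql` is a hypothetical helper name, the table and column names are copied from the command above, and the UID validation is an added assumption about which characters a rule UID may contain.

```shell
#!/bin/sh
# Hypothetical helper: print the DELETE statements for the given alert rule UIDs.
stale_alert_sql() {
    # Validate every UID first so that no partial SQL is emitted.
    for uid in "$@"; do
        case $uid in
            ''|*[!A-Za-z0-9_-]*)
                echo "refusing suspicious rule UID: $uid" >&2
                return 1 ;;
        esac
    done
    for uid in "$@"; do
        printf "delete from alert_rule where uid = '%s';\n" "$uid"
        printf "delete from alert_rule_version where rule_uid = '%s';\n" "$uid"
        printf "delete from provenance_type where record_key = '%s';\n" "$uid"
        printf "delete from annotation where text like '%%%s%%';\n" "$uid"
    done
}

# Dry run: print the statements for inspection instead of executing them.
stale_alert_sql DzAhcifVk dzA25mfVk Fk0h5iBVk Sk02ciBVk VzA2cif4zz

# To apply on the Grafana host, pipe the output into sqlite3, for example:
# stale_alert_sql DzAhcifVk | sqlite3 /var/lib/grafana/grafana.db
```

Printing the statements first makes it possible to review them, and it is probably prudent to keep a copy of the database file before applying any deletion.
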
Actions #22

Updated by openqa_review 7 days ago

  • Due date set to 2024-05-09

Setting due date based on mean cycle time of SUSE QE Tools

Actions #23

Updated by ybonatakis 6 days ago

  • Status changed from In Progress to Feedback
  • Priority changed from Urgent to High

As the main reported issue has been resolved, I am lowering the priority.
What remains to be done is to update the passwords on
openqaworker{20..28} and openqaworker-arm{21..22}.
My ssh keys are not yet on oqa-jumpy.dmz-prg2.suse.org, so I will have to wait.

Actions #24

Updated by okurz 5 days ago

  • Status changed from Feedback to Workable

Please create an IT ticket about getting your ssh key deployed on oqa-jumpy.

Actions #26

Updated by nicksinger 2 days ago

@ybonatakis could you please share the SD ticket with our "OSD Admins" group so we can see the progress? Also, can you maybe ask someone from the team to make the required changes? I don't think it is useful if you have to wait several days for a response. Especially for a "Workable" ticket with "High" Priority…

Actions #27

Updated by ybonatakis 1 day ago

https://gitlab.suse.de/openqa/salt-pillars-openqa/-/merge_requests/800
The only machine whose password has not been updated is kerosene.

Actions #28

Updated by ybonatakis 1 day ago

  • Status changed from Workable to Resolved
Actions #29

Updated by okurz 1 day ago

  • Due date deleted (2024-05-09)