action #159270
QA - coordination #121720 (closed): [saga][epic] Migration to QE setup in PRG2+NUE3 while ensuring availability
QA - coordination #129280: [epic] Move from SUSE NUE1 (Maxtorhof) to new NBG Datacenters
openqaworker-arm-1 is Unreachable size:S
Description
Observation
❯ ping openqaworker-arm-1.qe.nue2.suse.org
PING openqaworker-arm-1.qe.nue2.suse.org (10.168.192.213) 56(84) bytes of data.
From 81.95.8.245 icmp_seq=1 Destination Host Unreachable
From 81.95.8.245 icmp_seq=2 Destination Host Unreachable
From 81.95.8.245 icmp_seq=3 Destination Host Unreachable
graph shows that it went down at 2024-04-18 15:32:00
I think the most relevant graph is https://stats.openqa-monitor.qa.suse.de/d/WDopenqaworker-arm-1/worker-dashboard-openqaworker-arm-1?orgId=1&from=now-12h&to=now&viewPanel=65113
QA network infrastructure packet loss shows walter1.qe.nue2.suse.org 100 at 2024-04-18 15:19:00
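The ping output above can also be classified by a small helper, which is handy when checking several hosts in a row. This is only a sketch: the function name `classify_ping` is hypothetical, and it assumes the Linux iputils `ping` output format shown in this ticket.

```shell
#!/bin/sh
# Sketch only: classify captured ping output read from stdin.
# Prints "unreachable" if any reply reports "Destination Host Unreachable",
# "reachable" if at least one echo reply ("bytes from") was received,
# and "unknown" otherwise.
classify_ping() {
    output=$(cat)
    case "$output" in
        *"Destination Host Unreachable"*) echo unreachable ;;
        *"bytes from"*) echo reachable ;;
        *) echo unknown ;;
    esac
}

# Example with live traffic (hostname taken from this ticket):
#   ping -c 3 openqaworker-arm-1.qe.nue2.suse.org 2>&1 | classify_ping
```

The helper reads from stdin on purpose, so it can be fed saved output from an alert as well as a live `ping`.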
Suggestions
- Just recover the machine and ensure it's up again as alert mitigation
Out of scope
Updated by ybonatakis 13 days ago
And this also breaks https://gitlab.suse.de/openqa/osd-deployment/-/jobs/2510604
Updated by okurz 13 days ago
- Related to action #159303: [alert] osd-deployment pre-deploy pipeline failed because openqaworker-arm-1.qe.nue2.suse.org was offline size:S added
Updated by okurz 13 days ago
- Related to action #157753: Bring back automatic recovery for openqaworker-arm-1 size:M added
Updated by ybonatakis 12 days ago · Edited
- Status changed from Workable to Feedback
Updated by ybonatakis 12 days ago
- Status changed from Feedback to Resolved
Also possible to ssh into it.
Updated by okurz 10 days ago
- Status changed from Resolved to In Progress
@ybonatakis you leaked the IPMI passwords in #159270-6. I deleted that comment. Now please update all IPMI passwords as documented in https://gitlab.suse.de/openqa/salt-pillars-openqa/#ipmi-passwords . Please use a pronounceable password. I suggest thinking of a good password based on https://github.com/okurz/scripts/blob/master/xkcdpass-two-word
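For illustration, a pronounceable two-word password can be drawn from a word list roughly as follows. This is a minimal sketch and not the linked xkcdpass-two-word script: the function name `two_word_passphrase` and the default word list path `/usr/share/dict/words` are assumptions, and it relies on GNU `shuf`.

```shell
#!/bin/sh
# Sketch only: pick two distinct random words from a word list and join them
# with a hyphen. The default word list path is an assumption; pass another
# file as the first argument if it does not exist on your system.
two_word_passphrase() {
    wordlist=${1:-/usr/share/dict/words}
    # shuf -n 2 selects two lines without replacement;
    # paste -s joins them on one line using "-" as the delimiter.
    shuf --random-source=/dev/urandom -n 2 "$wordlist" | paste -sd- -
}

# Example:
#   two_word_passphrase
#   two_word_passphrase /path/to/my-wordlist.txt
```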
Updated by okurz 10 days ago
- Related to action #159318: openqa-piworker host up alert added
Updated by ybonatakis 9 days ago
- Status changed from In Progress to Feedback
Updated by livdywan 9 days ago
- Status changed from Feedback to In Progress
@ybonatakis Please remember you need to address the urgency of the ticket or resolve it immediately.
Updated by ybonatakis 9 days ago
but https://gitlab.suse.de/openqa/salt-pillars-openqa/-/merge_requests/789 is still open
Updated by openqa_review 8 days ago
- Due date set to 2024-05-08
Setting due date based on mean cycle time of SUSE QE Tools
Updated by okurz 8 days ago · Edited
I found the following entries with inconsistencies:
- malbec, you did not change the password, please revert
- imagetester, should be changed
- storage, should be changed
- kerosene, should be changed
- openqaworker{20..28}, should be changed
- openqaworker-arm{21..22}, should be changed
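The `{20..28}` and `{21..22}` entries above are shell range shorthand for numbered hosts. As a sketch, the full list of affected workers can be expanded like this; the function names `expand_hosts` and `pending_hosts` are hypothetical, and the loop avoids Bash-only brace expansion so it also runs under plain `sh`.

```shell
#!/bin/sh
# Sketch only: print one hostname per line for a prefix and an inclusive
# numeric range, e.g. "expand_hosts openqaworker 20 28".
expand_hosts() {
    prefix=$1
    i=$2
    last=$3
    while [ "$i" -le "$last" ]; do
        printf '%s%s\n' "$prefix" "$i"
        i=$((i + 1))
    done
}

# The two ranges listed in this comment.
pending_hosts() {
    expand_hosts openqaworker 20 28
    expand_hosts openqaworker-arm 21 22
}

# Example:
#   pending_hosts
```

In Bash the same list can be produced directly with `printf '%s\n' openqaworker{20..28} openqaworker-arm{21..22}`.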
Updated by okurz 8 days ago
- Related to action #159555: IPMI access over IPv6 doesn't work on imagetester - try to update BIOS with physical access size:S added
Updated by ybonatakis 7 days ago
- Status changed from In Progress to Blocked
Updated by ybonatakis 7 days ago
- Due date deleted (2024-05-08)
https://gitlab.suse.de/openqa/salt-pillars-openqa/-/merge_requests/794
Still waiting to get ssh access to update the remaining machines:
- openqaworker{20..28}
- openqaworker-arm{21..22}
Updated by nicksinger 7 days ago
I addressed your question in https://suse.slack.com/archives/C02AJ1E568M/p1713963938332169?thread_ts=1713940157.475059&cid=C02AJ1E568M and deleted the stale alerts with:
sqlite3 /var/lib/grafana/grafana.db "$(for RULEID in DzAhcifVk dzA25mfVk Fk0h5iBVk Sk02ciBVk VzA2cif4zz; do echo -n "delete from alert_rule where uid = '$RULEID'; delete from alert_rule_version where rule_uid = '$RULEID'; delete from provenance_type where record_key = '$RULEID'; delete from annotation where text like '%$RULEID%';"; done)"
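The one-liner above builds four DELETE statements per alert rule UID and passes them to `sqlite3` in a single call. To review the statements before touching the database, the same SQL can be printed first. This is a sketch only: the function name `build_alert_cleanup_sql` is hypothetical, while the table and column names are copied from the command above.

```shell
#!/bin/sh
# Sketch only: print the DELETE statements for each given alert rule UID
# instead of executing them, so they can be reviewed first.
build_alert_cleanup_sql() {
    for rule_uid in "$@"; do
        printf "delete from alert_rule where uid = '%s';\n" "$rule_uid"
        printf "delete from alert_rule_version where rule_uid = '%s';\n" "$rule_uid"
        printf "delete from provenance_type where record_key = '%s';\n" "$rule_uid"
        # %% emits a literal percent sign for the SQL LIKE wildcard.
        printf "delete from annotation where text like '%%%s%%';\n" "$rule_uid"
    done
}

# Example (UIDs taken from the command above):
#   build_alert_cleanup_sql DzAhcifVk dzA25mfVk Fk0h5iBVk Sk02ciBVk VzA2cif4zz
# After review, the output can be piped into sqlite3:
#   build_alert_cleanup_sql DzAhcifVk | sqlite3 /var/lib/grafana/grafana.db
```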
Updated by openqa_review 7 days ago
- Due date set to 2024-05-09
Setting due date based on mean cycle time of SUSE QE Tools
Updated by ybonatakis 6 days ago
- Status changed from In Progress to Feedback
- Priority changed from Urgent to High
As the main reported issue has been resolved, I am lowering the priority.
What remains to be done is to update the passwords on:
openqaworker{20..28} and openqaworker-arm{21..22}
My ssh keys are not yet on oqa-jumpy.dmz-prg2.suse.org, so I will have to wait.
Updated by ybonatakis 5 days ago
Updated by nicksinger 2 days ago
@ybonatakis could you please share the SD ticket with our "OSD Admins" group so we can see the progress? Also, can you maybe ask someone from the team to make the required changes? I don't think it is useful if you have to wait several days for a response. Especially for a "Workable" ticket with "High" Priority…
Updated by ybonatakis 1 day ago
https://gitlab.suse.de/openqa/salt-pillars-openqa/-/merge_requests/800
The only machine whose password has not been updated yet is kerosene.
Updated by ybonatakis 1 day ago
- Status changed from Workable to Resolved
kerosene was updated as well.
https://suse.slack.com/archives/C02AJ1E568M/p1714466396069729