action #120025
QA - coordination #121720 (closed): [saga][epic] Migration to QE setup in PRG2+NUE3 while ensuring availability
QA - coordination #116623: [epic] Migration of SUSE Nbg based openQA+QA+QAM systems to new security zones
[openQA][ipmi][worker] Worker host hostname changed and broken networking connection
Description
Observation
All virtualization workers used to have names like grenache-1:x or openqaworker-2:y, but the openQA IPMI worker page currently shows irregular IPMI worker names of the form f83:xxx (see the attached screenshots).
It looks to me as if something is wrong with the worker host itself. I also double-checked the salt pillar repo on GitLab, which does not contain a worker host named f83.
All tests that run on these f83:xxx workers fail, for example failure1 and failure2.
Steps to reproduce
- Navigate to the openQA workers page and filter for the IPMI workers.
- Run a test on such a worker.
Impact
All tests run with these workers will definitely fail.
Problem
It seems there is something wrong with the worker host itself.
Suggestion
- Check worker host networking environment
- Check ipmi workers config on worker host
Workaround
n/a
Updated by waynechen55 almost 2 years ago
We are currently in the Beta1 testing phase, so these workers are crucial.
Updated by dzedro almost 2 years ago
Not sure if the s390x failures are also related to the hostname change. https://openqa.suse.de/tests/9890767
Updated by waynechen55 almost 2 years ago
- File ipmi_worker_01.png added
- File ipmi_worker_02.png added
Updated by okurz almost 2 years ago
- Related to action #119443: Conduct the migration of SUSE openQA systems from Nbg SRV1 to new security zones size:M added
Updated by okurz almost 2 years ago
- Status changed from New to In Progress
- Assignee set to okurz
- Target version set to Ready
- Parent task set to #116623
Working on that, related to #119443. I mentioned this in the Slack channel #discuss-qe-new-security-zones:
@Lazaros Haleplidis https://progress.opensuse.org/issues/120025 mentions problems of openQA tests failing to access our bare metal test hosts, i.e. all hosts mentioned in https://gitlab.suse.de/openqa/salt-pillars-openqa/-/blob/master/openqa/workerconf.sls. Just one example: sp.fozzie.qa.suse.de over IPMI
f83 was a temporary name of openqaworker2 while it was being migrated into a new network security zone and was without a proper hostname for some time.
Updated by okurz almost 2 years ago
- Status changed from In Progress to Resolved
f83 was a temporary name of openqaworker2 while it was being migrated into a new network security zone and was without a proper hostname for some time. f83 will not mess with tests anymore.
dzedro wrote:
Not sure if the s390x failures are also related to the hostname change. https://openqa.suse.de/tests/9890767
Yes, same problem. Thank you for bringing this up and finding the right issue report :)
I fixed the config for WORKER_HOSTNAME on worker2 now and retriggered all affected tests with:
WORKER=worker2 result="result='failed'" failed_since=2022-11-07 host=openqa.suse.de openqa-advanced-retrigger-jobs | tee -a worker2_restart_$(date +%F).log
WORKER=worker2 result="result='incomplete'" failed_since=2022-11-07 host=openqa.suse.de openqa-advanced-retrigger-jobs | tee -a worker2_restart_$(date +%F).log
WORKER=f83 result="result='failed'" failed_since=2022-11-07 host=openqa.suse.de openqa-advanced-retrigger-jobs | tee -a worker2_restart_$(date +%F).log
WORKER=f83 result="result='incomplete'" failed_since=2022-11-07 host=openqa.suse.de openqa-advanced-retrigger-jobs | tee -a worker2_restart_$(date +%F).log
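The four invocations above differ only in the worker name and the result filter, so they could also be written as a loop. The following is a minimal sketch, not the command that was actually run: the function name `retrigger_all` and its optional command argument are hypothetical and exist only so the loop can be dry-run. The environment variables and the script name are taken from the commands above.

```shell
# Sketch only: loop form of the four retrigger invocations above.
# retrigger_all and its optional argument are hypothetical helpers.
retrigger_all() {
    # $1: command to run for each combination; defaults to the script
    # used in the comment above. Pass "env" for a harmless dry run that
    # only prints the environment each invocation would receive.
    cmd=${1:-openqa-advanced-retrigger-jobs}
    for worker in worker2 f83; do
        for state in failed incomplete; do
            WORKER="$worker" result="result='$state'" \
                failed_since=2022-11-07 host=openqa.suse.de "$cmd"
        done
    done
}

# Real run, logging as in the original commands:
#   retrigger_all | tee -a "worker2_restart_$(date +%F).log"
```

Calling `retrigger_all env` first is a cheap way to confirm that each of the four combinations carries the intended WORKER and result values before any job is retriggered.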
I am monitoring tests as part of #119443