action #120025
closed
QA - coordination #121720: [saga][epic] Migration to QE setup in PRG2+NUE3 while ensuring availability
QA - coordination #116623: [epic] Migration of SUSE Nbg based openQA+QA+QAM systems to new security zones
[openQA][ipmi][worker] Worker host hostname changed and broken networking connection
Added by waynechen55 over 1 year ago.
Updated over 1 year ago.
Description
Observation
All virtualization workers used to have names like grenache-1:x or openqaworker-2:y, but the openQA workers page currently shows irregular ipmi worker names of the form f83:xxx.
It looks to me as if something is wrong with the worker host itself. I also double-checked the salt pillar repo on GitLab, which does not contain a worker host named f83.
All tests that run on these f83:xxx workers fail, for example failure1 and failure2.
Steps to reproduce
- Navigate to the openQA workers page and filter for ipmi workers.
- Run a test on such a worker.
Impact
All tests run with these workers will definitely fail.
Problem
There seems to be something wrong with the worker host itself.
Suggestion
- Check the networking environment of the worker host
- Check the ipmi worker config on the worker host
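The config check suggested above could start by comparing the name the host reports with the WORKER_HOSTNAME setting in the worker config. This is only a minimal sketch: the default path /etc/openqa/workers.ini and the helper name configured_worker_hostname are assumptions for illustration, not something taken from this ticket.

```shell
# Hypothetical helper: print the WORKER_HOSTNAME value from a workers.ini-style
# file (prints nothing if the key is not set). The default path is an assumption;
# pass another file as the first argument.
configured_worker_hostname() {
    sed -n 's/^[[:space:]]*WORKER_HOSTNAME[[:space:]]*=[[:space:]]*//p' \
        "${1:-/etc/openqa/workers.ini}" | tail -n 1
}

# Show both values side by side so that a mismatch or a temporary name stands out.
echo "system hostname: $(uname -n)"
echo "WORKER_HOSTNAME: $(configured_worker_hostname 2>/dev/null)"
```

If the second line is empty, or either value is an unexpected temporary name, the worker config is the first place to look.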
Workaround
n/a
Files
We are currently in the Beta1 testing phase, so these workers are crucial.
- Related to action #119443: Conduct the migration of SUSE openQA systems from Nbg SRV1 to new security zones size:M added
- Status changed from New to In Progress
- Assignee set to okurz
- Target version set to Ready
- Parent task set to #116623
- Status changed from In Progress to Resolved
f83 was a temporary name of openqaworker2 while it was being migrated into a new network security zone; during that time the host had no proper hostname. f83 will not interfere with tests anymore.
dzedro wrote:
Not sure if the s390x failures are also related to the hostname change. https://openqa.suse.de/tests/9890767
Yes, same problem. Thank you for bringing this up and finding the right issue report :)
I have now fixed the WORKER_HOSTNAME config on worker2 and retriggered all affected tests with:
WORKER=worker2 result="result='failed'" failed_since=2022-11-07 host=openqa.suse.de openqa-advanced-retrigger-jobs | tee -a worker2_restart_$(date +%F).log
WORKER=worker2 result="result='incomplete'" failed_since=2022-11-07 host=openqa.suse.de openqa-advanced-retrigger-jobs | tee -a worker2_restart_$(date +%F).log
WORKER=f83 result="result='failed'" failed_since=2022-11-07 host=openqa.suse.de openqa-advanced-retrigger-jobs | tee -a worker2_restart_$(date +%F).log
WORKER=f83 result="result='incomplete'" failed_since=2022-11-07 host=openqa.suse.de openqa-advanced-retrigger-jobs | tee -a worker2_restart_$(date +%F).log
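The four invocations above differ only in the worker name and the result filter, so they could be generated from a loop. A minimal dry-run sketch, assuming openqa-advanced-retrigger-jobs reads the same environment variables as shown above; the function name print_retrigger_commands is hypothetical, and it only prints the commands instead of running them:

```shell
# Hypothetical dry-run helper: print one retrigger command per combination of
# worker name (new and temporary) and job result, using the values from above.
print_retrigger_commands() {
    for worker in worker2 f83; do
        for state in failed incomplete; do
            printf "WORKER=%s result=\"result='%s'\" failed_since=%s host=%s openqa-advanced-retrigger-jobs\n" \
                "$worker" "$state" "2022-11-07" "openqa.suse.de"
        done
    done
}

# Review the output before executing any of the printed lines.
print_retrigger_commands
```

Printing first makes it easy to review the worker names and result filters before anything is retriggered on the production instance.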
I am monitoring tests as part of #119443