action #16512
timeout while uploading the logs - test fails in install_and_reboot
Description
Observation
openQA test in scenario sle-12-SP3-Server-DVD-HA-s390x-ha-ftp@s390x-zVM-vswitch-l3 fails in install_and_reboot.
While uploading the logs after installation, the test dies because of a timeout. The console shows that there is a connectivity issue.
Reproducible
Fails since (at least) Build 0064@0234
Expected result
Last good: 0063@0232 (or more recent)
Problem
H1. ACCEPTED: worker specific problem -> E1-1
H2. REJECTED: generic problem in the product failing every time -> E2-1
H3. REJECTED: specific problem in the product failing sometimes under special circumstances
H4. REJECTED: os-autoinst problem
Suggestions
E1-1. check for differences on worker -> O1-1: openqaworker2 ran all jobs that seem to have this problem, and its network seems to have problems -> ACCEPT
E2-1. reproduce locally -> job passed, http://lord.arch/tests/5619#step/install_and_reboot/17 -> REJECT
E3-1. start job of old build in an environment where it fails often -> REJECT
E4-1. check os-autoinst version -> REJECT
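The check behind E1-1 (did every failing job run on the same worker?) can be sketched in a few lines of Python. This is only an illustration: the `(job_id, worker_host, result)` tuple shape, the function names and the sample data are assumptions made for this sketch, not the openQA API or real job results.

```python
from collections import Counter


def failures_by_worker(jobs):
    """Count failed jobs per worker host.

    `jobs` is an iterable of (job_id, worker_host, result) tuples; this
    shape is an assumption for illustration, not the openQA API format.
    """
    return Counter(worker for _job_id, worker, result in jobs if result == "failed")


def single_culprit(jobs):
    """Return the worker host if all failed jobs ran on the same one, else None."""
    counts = failures_by_worker(jobs)
    return next(iter(counts)) if len(counts) == 1 else None


# Hypothetical sample data, not taken from real job results.
sample_jobs = [
    (1, "workerA", "passed"),
    (2, "workerB", "failed"),
    (3, "workerB", "failed"),
    (4, "workerC", "passed"),
]

if __name__ == "__main__":
    print(single_culprit(sample_jobs))
```

If failures are spread over several workers the helper returns `None`, which would argue against a worker-specific hypothesis such as H1.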
Further details
Always latest result in this scenario: latest
Updated by okurz about 7 years ago
- Description updated (diff)
- Assignee set to okurz
- Priority changed from Normal to Urgent
staging tests are also affected:
install_and_reboot (https://v.gd/t760741, https://v.gd/t760742, https://v.gd/t760778, https://v.gd/t761038, https://v.gd/t760800)
A local clone of the latest failing staging job does not fail in the same step:
http://lord.arch/tests/5619#step/install_and_reboot/17
Updated by okurz about 7 years ago
- Description updated (diff)
- Status changed from New to In Progress
originally mentioned job fails: https://openqa.suse.de/tests/760747#previous
and two other jobs of the same build also fail so it is reproducible
-> E3-1: Triggered s390x job https://openqa.suse.de/tests/761579#live of an old build where it was working before. Let's wait for the result and see if it fails as well (supporting H1+H4) or if it passes (supporting H3)
Also, openqaworker2 is not even reachable via ping. mgriessmeier and I are trying https://wiki.microfocus.net/index.php/OpenQA#IPMI.2FSerial_Connections
Updated by okurz about 7 years ago
- Description updated (diff)
Updated my os-autoinst on lord.arch (14dc132..945efee) and rerunning: http://lord.arch/tests/5627#live
Also, openqaworker2 seems to be the culprit here, discussion in #infra ongoing…
Updated by okurz about 7 years ago
- Related to action #12912: [tools]monitoring of o3/osd added
Updated by mgriessmeier about 7 years ago
- Description updated (diff)
- Status changed from In Progress to Resolved
The problem was that infra changed the network config of all workers (removing the bond), but our salt recipes were not adjusted for all of our workers and overwrote the ifcfg to use the bond again. As a result, openqaworker2 got a new DHCP name, was no longer reachable and could not upload logs.
fixed staging test on openqaworker2: https://openqa.suse.de/tests/761627#
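To spot a worker whose ifcfg was overwritten to use a bond again, something like the following Python sketch could help. It assumes SUSE-style sysconfig files with `KEY='value'` lines in which bonding options use keys starting with `BONDING_`; the helper name and the sample file content are hypothetical and not taken from openqaworker2.

```python
def bonding_settings(ifcfg_text):
    """Return the BONDING_* assignments found in ifcfg-style KEY='value' text."""
    settings = {}
    for line in ifcfg_text.splitlines():
        line = line.strip()
        # Skip blank lines, comments and anything that is not an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        if key.startswith("BONDING_"):
            settings[key] = value.strip().strip("'\"")
    return settings


# Hypothetical file content for illustration only.
sample_ifcfg = """\
BOOTPROTO='dhcp'
STARTMODE='auto'
BONDING_MASTER='yes'
BONDING_SLAVE0='eth0'
"""

if __name__ == "__main__":
    found = bonding_settings(sample_ifcfg)
    print("bond configured" if found else "no bond configured", found)
```

An empty result for every interface file would indicate that the bond has been removed as infra intended.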