action #19780
[sle][functional][aarch64][x86_64] test fails in yast2_lan and all upcoming modules, investigate failure, check worker logs
Status: closed
% Done: 0%
Description
Observation
Something went terribly wrong during execution; we need to check whether this was a problem with the worker or a product degradation.
openQA test in scenario sle-12-SP3-Server-DVD-aarch64-sdk+allpatterns@aarch64 fails in
yast2_lan
Reproducible
Fails since (at least) Build 0424 (current job)
Expected result
Last good: 0422 (or more recent)
Further details
Always latest result in this scenario: latest
Updated by riafarov over 7 years ago
- Subject changed from [sle][functional][aarch64] test fails in yast2_lan and all upcoming modules, investigate failure, checl worker logs to [sle][functional][aarch64] test fails in yast2_lan and all upcoming modules, investigate failure, check worker logs
No info in the journal for this period of time. sar is not configured, so we don't have better logs. The worker was restarted at around 1pm on 13.06.
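As a general note for this kind of triage (not something that was done here): the journal for a suspect window can be pulled with journalctl's --since/--until filters. Below is a minimal sketch; the openqa-worker@*.service unit glob and the example times are assumptions, adjust them to the actual worker and job:

#!/usr/bin/env python3
"""Sketch: dump a worker's journal for a suspect time window.

Only wraps `journalctl --since/--until -u <unit>`; the unit glob below is an
assumption (openQA workers usually run as openqa-worker@N.service) and the
time range comes from the command line.
"""
import subprocess
import sys


def dump_journal(since, until, unit="openqa-worker@*.service"):
    # Run this on the worker host itself; reading the journal may need root.
    cmd = ["journalctl", "--no-pager", "-u", unit,
           "--since", since, "--until", until]
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout


if __name__ == "__main__":
    # e.g. ./dump_journal.py "2017-06-13 12:00" "2017-06-13 14:00"
    window_start, window_end = sys.argv[1:3]
    print(dump_journal(window_start, window_end))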
Updated by riafarov over 7 years ago
- Status changed from New to Feedback
Monitoring status of the issue.
Updated by okurz over 7 years ago
- Subject changed from [sle][functional][aarch64] test fails in yast2_lan and all upcoming modules, investigate failure, check worker logs to [sle][functional][aarch64][x86_64] test fails in yast2_lan and all upcoming modules, investigate failure, check worker logs
- Priority changed from Normal to High
https://openqa.suse.de/tests/1019626#step/yast2_lan/2 and the following steps could be the same issue. One module fails to complete a script call and then none of the following modules ever executes successfully. The commands show up in the SUT, so it is not fully unresponsive. Also, https://openqa.suse.de/tests/1019626#step/curl_https/12 shows a "segmentation fault". The worker instance https://openqa.suse.de/admin/workers/369 appears to have been fine for some time, apart from this incomplete job. This could be a severe product issue, but to report anything useful we need to find more related failures or a way to reproduce it.
Updated by SLindoMansilla over 7 years ago
Newer cases of an unresponsive system on aarch64 between snapshots:
Updated by riafarov over 7 years ago
Similar problem on another worker: https://openqa.suse.de/tests/1054061
Updated by riafarov over 7 years ago
Similar issue:
https://openqa.suse.de/tests/1057968
Fishy lines in logs:
09:27:58.7276 3160 EVENT {"timestamp":{"seconds":1499847122,"microseconds":559018},"data":{"server":{"host":"0.0.0.0","auth":"none","family":"ipv4","websocket":false,"service":"5994"},"client":{"host":"127.0.0.1","service":"47128","websocket":false,"family":"ipv4"}},"event":"VNC_INITIALIZED"}
09:27:58.7278 3160 EVENT {"event":"VNC_DISCONNECTED","data":{"server":{"websocket":false,"family":"ipv4","service":"5994","auth":"none","host":"0.0.0.0"},"client":{"host":"127.0.0.1","service":"47128","family":"ipv4","websocket":false}},"timestamp":{"seconds":1499847222,"microseconds":876972}}
09:27:58.7279 3160 EVENT {"event":"VNC_CONNECTED","data":{"client":{"host":"127.0.0.1","websocket":false,"family":"ipv4","service":"47356"},"server":{"auth":"none","host":"0.0.0.0","family":"ipv4","websocket":false,"service":"5994"}},"timestamp":{"microseconds":877318,"seconds":1499847222}}
09:27:58.7280 3160 EVENT {"timestamp":{"seconds":1499847222,"microseconds":881498},"data":{"client":{"family":"ipv4","websocket":false,"service":"47356","host":"127.0.0.1"},"server":{"host":"0.0.0.0","auth":"none","service":"5994","websocket":false,"family":"ipv4"}},"event":"VNC_INITIALIZED"}
09:27:58.7282 3160 EVENT {"event":"VNC_DISCONNECTED","data":{"client":{"host":"127.0.0.1","websocket":false,"family":"ipv4","service":"47356"},"server":{"auth":"none","host":"0.0.0.0","family":"ipv4","websocket":false,"service":"5994"}},"timestamp":{"microseconds":205885,"seconds":1499847269}}
In the original issue we also have similar messages:
17:55:12.8539 23793 Loading a VM snapshot lastgood
17:55:14.6996 23796 EVENT {"event":"NIC_RX_FILTER_CHANGED","data":{"path":"/machine/peripheral-anon/device[1]/virtio-backend"},"timestamp":{"seconds":1497287396,"microseconds":552083}}
17:55:14.7000 23796 EVENT {"timestamp":{"seconds":1497288742,"microseconds":957668},"data":{"client":{"host":"127.0.0.1","service":"58154","websocket":false,"family":"ipv4"},"server":{"websocket":false,"family":"ipv4","auth":"none","service":"5991","host":"0.0.0.0"}},"event":"VNC_DISCONNECTED"}
17:55:14.7001 23796 EVENT {"event":"VNC_CONNECTED","data":{"client":{"service":"59904","host":"127.0.0.1","websocket":false,"family":"ipv4"},"server":{"websocket":false,"family":"ipv4","auth":"none","host":"0.0.0.0","service":"5991"}},"timestamp":{"seconds":1497288742,"microseconds":958379}}
17:55:14.7002 23796 EVENT {"data":{"server":{"websocket":false,"family":"ipv4","auth":"none","host":"0.0.0.0","service":"5991"},"client":{"family":"ipv4","websocket":false,"service":"59904","host":"127.0.0.1"}},"event":"VNC_INITIALIZED","timestamp":{"seconds":1497288742,"microseconds":961827}}
17:55:14.7003 23796 EVENT {"timestamp":{"microseconds":266414,"seconds":1497288789},"event":"VNC_DISCONNECTED","data":{"client":{"host":"127.0.0.1","service":"59904","family":"ipv4","websocket":false},"server":{"auth":"none","host":"0.0.0.0","service":"5991","websocket":false,"family":"ipv4"}}}
17:55:14.7005 23796 EVENT {"timestamp":{"seconds":1497288789,"microseconds":266914},"event":"VNC_CONNECTED","data":{"client":{"host":"127.0.0.1","service":"59962","websocket":false,"family":"ipv4"},"server":{"websocket":false,"family":"ipv4","auth":"none","host":"0.0.0.0","service":"5991"}}}
17:55:14.7006 23796 EVENT {"data":{"client":{"service":"59962","host":"127.0.0.1","family":"ipv4","websocket":false},"server":{"family":"ipv4","websocket":false,"host":"0.0.0.0","service":"5991","auth":"none"}},"event":"VNC_INITIALIZED","timestamp":{"microseconds":269673,"seconds":1497288789}}
17:55:14.7007 23796 EVENT {"event":"VNC_DISCONNECTED","data":{"client":{"host":"127.0.0.1","service":"59962","family":"ipv4","websocket":false},"server":{"host":"0.0.0.0","service":"5991","auth":"none","family":"ipv4","websocket":false}},"timestamp":{"seconds":1497289075,"microseconds":403984}}
17:55:14.7008 23796 EVENT {"timestamp":{"seconds":1497289075,"microseconds":404367},"event":"VNC_CONNECTED","data":{"server":{"family":"ipv4","websocket":false,"host":"0.0.0.0","service":"5991","auth":"none"},"client":{"service":"60192","host":"127.0.0.1","family":"ipv4","websocket":false}}}
17:55:14.7010 23796 EVENT {"data":{"server":{"host":"0.0.0.0","service":"5991","auth":"none","family":"ipv4","websocket":false},"client":{"host":"127.0.0.1","service":"60192","family":"ipv4","websocket":false}},"event":"VNC_INITIALIZED","timestamp":{"seconds":1497289075,"microseconds":406989}}
17:55:14.7011 23796 EVENT {"event":"RESET","timestamp":{"seconds":1497289201,"microseconds":579680}}
17:55:14.7012 23796 EVENT {"event":"VNC_DISCONNECTED","data":{"server":{"websocket":false,"family":"ipv4","auth":"none","service":"5991","host":"0.0.0.0"},"client":{"family":"ipv4","websocket":false,"host":"127.0.0.1","service":"60192"}},"timestamp":{"seconds":1497289250,"microseconds":388669}}
17:55:14.7013 23796 EVENT {"data":{"client":{"host":"127.0.0.1","service":"60384","websocket":false,"family":"ipv4"},"server":{"websocket":false,"family":"ipv4","auth":"none","host":"0.0.0.0","service":"5991"}},"event":"VNC_CONNECTED","timestamp":{"microseconds":389219,"seconds":1497289250}}
17:55:14.7014 23796 EVENT {"timestamp":{"microseconds":391948,"seconds":1497289250},"event":"VNC_INITIALIZED","data":{"server":{"auth":"none","host":"0.0.0.0","service":"5991","websocket":false,"family":"ipv4"},"client":{"family":"ipv4","websocket":false,"service":"60384","host":"127.0.0.1"}}}
17:55:14.7016 23796 EVENT {"timestamp":{"seconds":1497289255,"microseconds":465158},"event":"STOP"}
17:55:14.7017 23796 EVENT {"event":"RESUME","timestamp":{"microseconds":820956,"seconds":1497289258}}
17:55:14.7018 23796 EVENT {"event":"STOP","timestamp":{"seconds":1497289489,"microseconds":812375}}
17:55:14.7020 23796 EVENT {"timestamp":{"microseconds":676941,"seconds":1497289492},"event":"RESUME"}
17:55:14.7021 23796 EVENT {"event":"STOP","timestamp":{"seconds":1497289691,"microseconds":958030}}
17:55:14.7022 23796 EVENT {"timestamp":{"seconds":1497289695,"microseconds":19180},"event":"RESUME"}
17:55:14.7023 23796 EVENT {"event":"STOP","timestamp":{"microseconds":43114,"seconds":1497289937}}
17:55:14.7024 23796 EVENT {"timestamp":{"microseconds":167888,"seconds":1497289940},"event":"RESUME"}
17:55:14.7025 23796 EVENT {"timestamp":{"microseconds":532513,"seconds":1497289952},"event":"STOP"}
17:55:14.7026 23796 EVENT {"event":"RESUME","timestamp":{"seconds":1497289956,"microseconds":9466}}
17:55:14.7027 23796 EVENT {"event":"STOP","timestamp":{"microseconds":855517,"seconds":1497290112}}
17:55:14.7028 23796 EVENT {"event":"RESUME","timestamp":{"microseconds":698171,"seconds":1497290114}}
17:55:14.7035 23796 EVENT {"timestamp":{"seconds":1497290114,"microseconds":703403},"event":"STOP"}
17:55:14.7045 23796 EVENT {"timestamp":{"seconds":1497290114,"microseconds":704374},"event":"RESUME"}
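As a side note for triaging logs like the excerpts above (not part of this investigation): the EVENT lines carry JSON after the "EVENT " marker, so a small script can count the VNC reconnects and measure how long the VM sat between STOP and RESUME. A minimal sketch, assuming the autoinst-log.txt line format quoted above:

#!/usr/bin/env python3
"""Sketch: summarize QMP EVENT lines from an autoinst-log.txt.

Counts VNC (re)connects and measures STOP -> RESUME pauses; assumes the
"HH:MM:SS.xxxx PID EVENT {json}" line format seen in the excerpts above.
"""
import json
import re
import sys

EVENT_RE = re.compile(r'EVENT (\{.*\})\s*$')


def summarize(path):
    vnc_connects = 0
    stop_at = None
    pauses = []
    with open(path) as log:
        for line in log:
            match = EVENT_RE.search(line)
            if not match:
                continue
            event = json.loads(match.group(1))
            ts = event["timestamp"]["seconds"] + event["timestamp"]["microseconds"] / 1e6
            name = event["event"]
            if name == "VNC_CONNECTED":
                vnc_connects += 1
            elif name == "STOP":
                stop_at = ts
            elif name == "RESUME" and stop_at is not None:
                pauses.append(ts - stop_at)
                stop_at = None
    print(f"VNC (re)connects: {vnc_connects}")
    if pauses:
        print(f"STOP->RESUME pauses: {len(pauses)}, longest {max(pauses):.1f}s")


if __name__ == "__main__":
    summarize(sys.argv[1])

Unusually long STOP->RESUME gaps would point at the worker side (e.g. slow snapshot handling or an overloaded host) rather than the product.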
Updated by riafarov over 7 years ago
- Status changed from Feedback to New
- Assignee deleted (riafarov)
Not really possible to investigate this without extensive logging on the workers. There is nothing in the logs that could explain the system unresponsiveness. Unassigning.
Updated by okurz over 7 years ago
- Has duplicate action #18010: test fails in yast2_lan as first module and then all after that after what looks like giant slowdown on worker added
Updated by okurz over 7 years ago
The initial report was in #18010, but I closed that ticket in favor of this one, which has more comments.
Observation
openQA test in scenario sle-12-SP3-Server-DVD-x86_64-iscsi_ibft@64bit fails in
yast2_lan
and then all modules after that one. Commands don't finish, the openqaworker host can't be reached for uploading, etc.
Nothing obvious in autoinst-log.txt: input is typed correctly, and then nothing happens where the password prompt should show up.
Journal from the corresponding worker for that job:
Mar 24 15:17:45 openqaworker3 worker[24677]: [INFO] 21948: WORKING 835537
Mar 24 15:17:47 openqaworker3 qemu-system-x86_64[21974]: looking for plugins in '/usr/lib64/sasl2', failed to op
Mar 24 17:17:45 openqaworker3 worker[24677]: [WARN] max job time exceeded, aborting 00835537-sle-12-SP3-Server-D
Mar 24 17:17:45 openqaworker3 worker[24677]: killed 21948
Mar 24 17:17:50 openqaworker3 worker[24677]: [WARN] job 835537 spent more time than MAX_JOB_TIME
Mar 24 17:17:50 openqaworker3 worker[24677]: can't open /var/lib/openqa/pool/14/testresults/result-yast2_i.json:
Mar 24 17:17:51 openqaworker3 worker[24677]: [INFO] cleaning up 00835537-sle-12-SP3-Server-DVD-x86_64-Build0295-
Then, finally, the job is marked incomplete because of the 2h timeout.
Reproducible
Fails since (at least) Build 0295 (current job)
Expected result
Last good: 0294 (or more recent)
Further details
Always latest result in this scenario: latest
Updated by okurz over 7 years ago
- Target version changed from Milestone 10 to Milestone 11
Discussed in the M10 review meeting; we should take another look after SLE15 has settled a bit.
Updated by okurz over 7 years ago
https://openqa.suse.de/tests/1187031# looks pretty much like the same issue, don't you think?
Updated by okurz about 7 years ago
- Target version changed from Milestone 11 to Milestone 13
Currently not that reproducible for SLE15 because SLE15 has not "settled enough". I think we should take another look later.
Updated by riafarov almost 7 years ago
- Target version changed from Milestone 13 to Milestone 14
Reviewers haven't seen such a failure recently; resolve in the next milestone if there is no update.
Updated by riafarov almost 7 years ago
- Status changed from New to Resolved
As per discussion with Santi:
IIRC, at that point the VM tried to restore the lastgood snapshot in curl_https, since it had failed in yast2_lan,
and there was a bug on aarch64 where restoring from a snapshot was breaking USB.
So resolving; reopen if reproduced.