action #65657
closedall tests passed but still got Result: timeout_exceeded
Added by zluo over 4 years ago. Updated over 4 years ago.
0%
Description
Observation¶
openQA test in scenario sle-15-SP2-Online-x86_64-gnome_sled@laptop_64bit-virtio-vga fails in
shutdown
Test suite description¶
same with testsuite 'gnome' but it is for SLED
Reproducible¶
Fails since (at least) Build 173.6
Expected result¶
Last good: 170.1 (or more recent)
Further details¶
Always latest result in this scenario: latest
Updated by okurz over 4 years ago
- Project changed from openQA Tests to openQA Project
- Subject changed from [tools] all tests passed but still got Result: timeout_exceeded to all tests passed but still got Result: timeout_exceeded
- Category changed from Bugs in existing tests to Regressions/Crashes
- Priority changed from Normal to Low
Interesting case. I guess we can move that to "openQA Project"? I looked at the logs and found:
[2020-04-15T15:39:25.429 UTC] [debug] assert_shutdown_with_soft_timeout(): soft_timeout=60
[2020-04-15T15:39:25.429 UTC] [debug] tests/shutdown/shutdown.pm:28 called power_action_utils::power_action -> lib/power_action_utils.pm:318 called power_action_utils::assert_shutdown_with_soft_timeout -> lib/power_action_utils.pm:406 called testapi::check_shutdown
[2020-04-15T15:39:25.430 UTC] [debug] <<< testapi::check_shutdown(timeout=60)
[2020-04-15T15:39:25.432 UTC] [debug] QEMU status is not 'shutdown', it is 'running'
[2020-04-15T15:39:26.435 UTC] [debug] QEMU status is not 'shutdown', it is 'running'
[2020-04-15T15:39:27.443 UTC] [debug] QEMU status is not 'shutdown', it is 'running'
[2020-04-15T15:39:28.444 UTC] [debug] EVENT {"data":{"guest":true},"event":"SHUTDOWN","timestamp":{"microseconds":108427,"seconds":1586965168}}
[2020-04-15T15:39:28.445 UTC] [debug] EVENT {"event":"STOP","timestamp":{"microseconds":130649,"seconds":1586965168}}
[2020-04-15T15:39:28.463 UTC] [debug] ||| finished shutdown shutdown at 2020-04-15 15:39:28 (45 s)
[2020-04-15T15:39:28.467 UTC] [debug] stopping autotest process 10457
[2020-04-15T15:39:28.486 UTC] [debug] [autotest] process exited: 0
[2020-04-15T15:39:28.487 UTC] [debug] done with autotest process
[2020-04-15T15:39:28.487 UTC] [debug] terminating command server 10293 because test execution ended
[2020-04-15T15:39:28.488 UTC] [debug] isotovideo: informing websocket clients before stopping command server: http://127.0.0.1:20103/OpPKiCuRgi0mkhEu/broadcast
[2020-04-15T15:39:28.524 UTC] [debug] commands process exited: 0
[2020-04-15T15:39:28.524 UTC] [debug] done with command server
[2020-04-15T15:39:28.524 UTC] [debug] isotovideo done
[2020-04-15T15:39:28.525 UTC] [debug] BACKEND SHUTDOWN 1
[2020-04-15T15:39:28.526 UTC] [info] ::: OpenQA::Qemu::Proc::save_state: Saving QEMU state to qemu_state.json
[2020-04-15T15:39:29.574 UTC] [debug] flushing frames
[2020-04-15T15:39:29.614 UTC] [debug] QEMU: qemu-system-x86_64: terminating on signal 15 from pid 10474 (/usr/bin/isotovideo: backen)
[2020-04-15T15:39:29.616 UTC] [debug] sending magic and exit
[2020-04-15T15:39:29.616 UTC] [debug] received magic close
[2020-04-15T15:39:29.653 UTC] [debug] backend process exited: 0
[2020-04-15T15:39:29.655 UTC] [debug] stopping backend process 10474
[2020-04-15T15:39:29.655 UTC] [debug] done with backend process
10285: EXIT 0
[2020-04-15T15:41:34.0824 UTC] [info] [pid:15804] +++ worker notes +++
[2020-04-15T15:41:34.0824 UTC] [info] [pid:15804] End time: 2020-04-15 15:41:34
[2020-04-15T15:41:34.0824 UTC] [info] [pid:15804] Result: timeout
to me it looks like we are hitting a "lucky spot" where actually all is well from os-autoinst scope but then because it takes some time also to process results within the openQA worker it is still declared as "timeout". We should check the code if we look for timeout regardless of the result. Then we should adapt to still "pass" in case of timeout when all individual results are actually fine.
@mkittler as you are assigned I am keeping you assigned as maybe there had been side-communication that you suggested zluo to pick you as assignee directly, otherwise feel free to unassign.
Updated by mkittler over 4 years ago
[2020-04-15T13:31:38.0644 UTC] [info] [pid:15804] +++ setup notes +++
...
[2020-04-15T15:39:28.467 UTC] [debug] stopping autotest process 10457
...
The timeout of the job is 7560 seconds (slightly more than 2 hours). So it makes sense that the result is timeout_exceeded
. However, it seems that the worker didn't really manage to enforce the timeout e.g. by killing isotovideo which makes it look a little bit weird. Therefore I suppose the problem to investigate/fix here is that the worker is not able to enforce the timeout.
By the way, they've now fixed the test itself by increasing the timeout (see #65888).
I am keeping you assigned as maybe there had been side-communication
Not really, but I can have a look anyways.
Updated by okurz over 4 years ago
ok, fine. But keep in mind that this is still a "Low" prio ticket. We might just as well reject it as one can consider "timeout is timeout" the reasonable choice here, regardless of individual module status. Feel free to update the ticket accordingly, I do not want to invest us much time here.
Updated by mkittler over 4 years ago
I also think that there are more important issues so I'm unassigning myself.
@zluo Don't assign me when you create a ticket. I check for new tickets from time to time anyways and setting me as assignee just prevents other people to pick up a ticket because they think I have already taken it.
Updated by okurz over 4 years ago
- Status changed from New to Rejected
- Assignee set to okurz
ok, fine. I have merged the content into #65271