Project

General

Profile

action #65657

all tests passed but still got Result: timeout_exceeded

Added by zluo over 1 year ago. Updated about 1 year ago.

Status:
Rejected
Priority:
Low
Assignee:
Category:
Concrete Bugs
Target version:
Start date:
2020-04-16
Due date:
% Done:

0%

Estimated time:
Difficulty:

Description

Observation

openQA test in scenario sle-15-SP2-Online-x86_64-gnome_sled@laptop_64bit-virtio-vga fails in
shutdown

Test suite description

same with testsuite 'gnome' but it is for SLED

Reproducible

Fails since (at least) Build 173.6

Expected result

Last good: 170.1 (or more recent)

Further details

Always latest result in this scenario: latest

History

#1 Updated by okurz over 1 year ago

  • Project changed from openQA Tests to openQA Project
  • Subject changed from [tools] all tests passed but still got Result: timeout_exceeded to all tests passed but still got Result: timeout_exceeded
  • Category changed from Bugs in existing tests to Concrete Bugs
  • Priority changed from Normal to Low

Interesting case. I guess we can move that to "openQA Project"? I looked at the logs and found:

[2020-04-15T15:39:25.429 UTC] [debug] assert_shutdown_with_soft_timeout(): soft_timeout=60
[2020-04-15T15:39:25.429 UTC] [debug] tests/shutdown/shutdown.pm:28 called power_action_utils::power_action -> lib/power_action_utils.pm:318 called power_action_utils::assert_shutdown_with_soft_timeout -> lib/power_action_utils.pm:406 called testapi::check_shutdown
[2020-04-15T15:39:25.430 UTC] [debug] <<< testapi::check_shutdown(timeout=60)
[2020-04-15T15:39:25.432 UTC] [debug] QEMU status is not 'shutdown', it is 'running'
[2020-04-15T15:39:26.435 UTC] [debug] QEMU status is not 'shutdown', it is 'running'
[2020-04-15T15:39:27.443 UTC] [debug] QEMU status is not 'shutdown', it is 'running'
[2020-04-15T15:39:28.444 UTC] [debug] EVENT {"data":{"guest":true},"event":"SHUTDOWN","timestamp":{"microseconds":108427,"seconds":1586965168}}
[2020-04-15T15:39:28.445 UTC] [debug] EVENT {"event":"STOP","timestamp":{"microseconds":130649,"seconds":1586965168}}
[2020-04-15T15:39:28.463 UTC] [debug] ||| finished shutdown shutdown at 2020-04-15 15:39:28 (45 s)
[2020-04-15T15:39:28.467 UTC] [debug] stopping autotest process 10457
[2020-04-15T15:39:28.486 UTC] [debug] [autotest] process exited: 0
[2020-04-15T15:39:28.487 UTC] [debug] done with autotest process
[2020-04-15T15:39:28.487 UTC] [debug] terminating command server 10293 because test execution ended
[2020-04-15T15:39:28.488 UTC] [debug] isotovideo: informing websocket clients before stopping command server: http://127.0.0.1:20103/OpPKiCuRgi0mkhEu/broadcast
[2020-04-15T15:39:28.524 UTC] [debug] commands process exited: 0
[2020-04-15T15:39:28.524 UTC] [debug] done with command server
[2020-04-15T15:39:28.524 UTC] [debug] isotovideo done
[2020-04-15T15:39:28.525 UTC] [debug] BACKEND SHUTDOWN 1
[2020-04-15T15:39:28.526 UTC] [info] ::: OpenQA::Qemu::Proc::save_state: Saving QEMU state to qemu_state.json
[2020-04-15T15:39:29.574 UTC] [debug] flushing frames
[2020-04-15T15:39:29.614 UTC] [debug] QEMU: qemu-system-x86_64: terminating on signal 15 from pid 10474 (/usr/bin/isotovideo: backen)
[2020-04-15T15:39:29.616 UTC] [debug] sending magic and exit
[2020-04-15T15:39:29.616 UTC] [debug] received magic close
[2020-04-15T15:39:29.653 UTC] [debug] backend process exited: 0
[2020-04-15T15:39:29.655 UTC] [debug] stopping backend process 10474
[2020-04-15T15:39:29.655 UTC] [debug] done with backend process
10285: EXIT 0
[2020-04-15T15:41:34.0824 UTC] [info] [pid:15804] +++ worker notes +++
[2020-04-15T15:41:34.0824 UTC] [info] [pid:15804] End time: 2020-04-15 15:41:34
[2020-04-15T15:41:34.0824 UTC] [info] [pid:15804] Result: timeout

to me it looks like we are hitting a "lucky spot" where actually all is well from os-autoinst scope but then because it takes some time also to process results within the openQA worker it is still declared as "timeout". We should check the code if we look for timeout regardless of the result. Then we should adapt to still "pass" in case of timeout when all individual results are actually fine.

mkittler as you are assigned I am keeping you assigned as maybe there had been side-communication that you suggested zluo to pick you as assignee directly, otherwise feel free to unassign.

#2 Updated by okurz over 1 year ago

  • Target version set to future

#3 Updated by mkittler over 1 year ago

[2020-04-15T13:31:38.0644 UTC] [info] [pid:15804] +++ setup notes +++
...
[2020-04-15T15:39:28.467 UTC] [debug] stopping autotest process 10457
...

The timeout of the job is 7560 seconds (slightly more than 2 hours). So it makes sense that the result is timeout_exceeded. However, it seems that the worker didn't really manage to enforce the timeout e.g. by killing isotovideo which makes it look a little bit weird. Therefore I suppose the problem to investigate/fix here is that the worker is not able to enforce the timeout.

By the way, they've now fixed the test itself by increasing the timeout (see #65888).

I am keeping you assigned as maybe there had been side-communication

Not really, but I can have a look anyways.

#4 Updated by okurz over 1 year ago

ok, fine. But keep in mind that this is still a "Low" prio ticket. We might just as well reject it as one can consider "timeout is timeout" the reasonable choice here, regardless of individual module status. Feel free to update the ticket accordingly, I do not want to invest us much time here.

#5 Updated by mkittler about 1 year ago

I also think that there are more important issues so I'm unassigning myself.

zluo Don't assign me when you create a ticket. I check for new tickets from time to time anyways and setting me as assignee just prevents other people to pick up a ticket because they think I have already taken it.

#6 Updated by mkittler about 1 year ago

  • Assignee deleted (mkittler)

#7 Updated by okurz about 1 year ago

  • Status changed from New to Rejected
  • Assignee set to okurz

ok, fine. I have merged the content into #65271

Also available in: Atom PDF