action #16438
closed
[tools]System stall during test run not detected
Description
Observation
openQA test in scenario sle-12-SP3-Server-DVD-x86_64-RAID5@64bit fails in
install_and_reboot
The test fails when pressing the "alt-s" key combination at the system reboot prompt dialog in "install_and_reboot".
However, this does not look like a product issue. It appears to happen because of the 28-second stall visible in the following log:
11:03:54.1033 35503 MATCH(rebootnow-20150409:0.45)
11:03:54.1135 35503 MATCH(rebootnow-20160504:1.00)
11:04:22.2395 35325 >>> testapi::_check_backend_response: found rebootnow-20160504, similarity 1.00 @ 271/257
11:04:22.2397 Debug: /var/lib/openqa/share/tests/sle/tests/installation/install_and_reboot.pm:108 called testapi::send_key
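The gap in the excerpt above can be found mechanically by comparing the timestamps of consecutive log lines. The following is a minimal sketch, not part of openQA or os-autoinst: the function names and the 5-second threshold are made up for illustration, and it assumes that the digits after the dot are a decimal fraction of a second and that the log does not cross midnight.

```python
import re

# Leading "HH:MM:SS.ffff" timestamp, as seen in the log excerpt above.
TIMESTAMP = re.compile(r"^(\d{2}):(\d{2}):(\d{2})\.(\d+)\s")


def seconds_of_day(line):
    """Return the timestamp of a log line in seconds, or None if it has none."""
    match = TIMESTAMP.match(line)
    if match is None:
        return None
    hours, minutes, seconds, fraction = match.groups()
    # Assumption: the fractional part is a decimal fraction of a second.
    return int(hours) * 3600 + int(minutes) * 60 + int(seconds) + float("0." + fraction)


def find_stalls(lines, threshold=5.0):
    """Return (gap_in_seconds, line_before, line_after) for every gap above threshold."""
    stalls = []
    previous = None
    for line in lines:
        now = seconds_of_day(line)
        if now is None:
            continue  # skip lines without a timestamp
        if previous is not None and now - previous[0] > threshold:
            stalls.append((round(now - previous[0], 3), previous[1], line))
        previous = (now, line)
    return stalls


# The four log lines quoted in this ticket.
LOG = [
    "11:03:54.1033 35503 MATCH(rebootnow-20150409:0.45)",
    "11:03:54.1135 35503 MATCH(rebootnow-20160504:1.00)",
    "11:04:22.2395 35325 >>> testapi::_check_backend_response: found rebootnow-20160504, similarity 1.00 @ 271/257",
    "11:04:22.2397 Debug: /var/lib/openqa/share/tests/sle/tests/installation/install_and_reboot.pm:108 called testapi::send_key",
]

stalls = find_stalls(LOG)
for gap, before, after in stalls:
    print(f"{gap:.3f}s stall between:\n  {before}\n  {after}")
```

Run against the quoted lines, this flags a single gap, between the second MATCH line and the `_check_backend_response` line.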
Reproducible
Fails sporadically in different scenarios.
Expected result
Last good: 0231 (or more recent)
Further details
Always latest result in this scenario: latest
Updated by asmorodskyi about 7 years ago
- Project changed from openQA Tests to openQA Project
- Category changed from Bugs in existing tests to 132
- Priority changed from Normal to High
Updated by asmorodskyi about 7 years ago
Another job that fails with the same symptoms has something interesting in the logs (https://openqa.suse.de/tests/757155):
Couldn't get a file descriptor referring to the console
%@umount: /var/lib/nfs/rpc_pipefs: target is busy
(In some cases useful info about processes that
use the device is found by lsof(8) or fuser(1).)
Couldn't get a file descriptor referring to the console
Couldn't get a file descriptor referring to the console
Updated by coolo about 7 years ago
- Priority changed from High to Immediate
- Target version set to Milestone 5
Checking what we deployed when it started - this is caused by a combination of slow NFS and commit 2f6118c39fa972b4aba94407e267e5e6f47cef0f in needle.pm, which now checks the JSON on NFS (and not in the needle cache). As a result, there is a huge delay after every needle match. The backend is not stalling, but the test thread is. In the end, it doesn't matter - we do not send alt+s.
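To illustrate why reading the needle JSON from a slow share on every match hurts, and how a cache avoids it, here is a minimal sketch. It is not the code in needle.pm (which is Perl) and not the fix that was eventually deployed. The class name, the counter, and the JSON content are hypothetical, and a temporary directory stands in for the NFS share.

```python
import json
import os
import tempfile


class NeedleJsonCache:
    """Hypothetical in-memory cache for parsed needle JSON files."""

    def __init__(self):
        self._parsed = {}
        self.disk_reads = 0  # counts how often the (slow) share is actually read

    def load(self, path):
        # Only the first request for a path touches the file system;
        # later requests are served from memory. There is no invalidation,
        # so changes to the file after the first read are not picked up.
        if path not in self._parsed:
            with open(path, encoding="utf-8") as handle:
                self._parsed[path] = json.load(handle)
            self.disk_reads += 1
        return self._parsed[path]


# A temporary directory stands in for the slow NFS share.
with tempfile.TemporaryDirectory() as share:
    needle_path = os.path.join(share, "rebootnow-20160504.json")
    with open(needle_path, "w", encoding="utf-8") as handle:
        json.dump({"tags": ["rebootnow"]}, handle)  # made-up needle content

    cache = NeedleJsonCache()
    first = cache.load(needle_path)
    second = cache.load(needle_path)

print(cache.disk_reads)
```

With a cache like this, the cost of the slow share is paid once per needle instead of once per match, at the price of not noticing later changes to the file.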
Updated by okurz about 7 years ago
[16 Feb 2017 10:18:38] <okurz> any news on https://progress.opensuse.org/issues/16438 "System stall during test run not detected" about NFS performance?
[16 Feb 2017 10:19:04] * coolo is mainly trying to workaround https://openqa.suse.de/tests/778533#step/scc_registration/42
[16 Feb 2017 10:19:48] <coolo> okurz: the news is: rudi loadbalanced the netapps and I rebooted the VM - the combination made the problem less severe
[16 Feb 2017 10:20:04] <coolo> but we need to a) keep monitoring it and b) get rid of NFS and use proper caching
[16 Feb 2017 10:20:30] <coolo> because we will have more and more worker CPUs - all pulling on the poor virtual NFSd
Updated by okurz about 7 years ago
- Status changed from New to Feedback
- Assignee set to okurz
- Priority changed from Immediate to High
- Target version changed from Milestone 5 to Milestone 6
as per #1648#note-4
Updated by okurz about 7 years ago
- Status changed from Feedback to In Progress
- Assignee changed from okurz to szarate
- Priority changed from High to Normal
https://openqa.suse.de/tests/781515 is an instance of this working fine again. I guess this is very much related to stall detection, needle and asset caching, so assigning to szarate.
Updated by RBrownSUSE about 7 years ago
- Subject changed from System stall during test run not detected to [tools]System stall during test run not detected
Updated by mgriessmeier about 7 years ago
https://openqa.suse.de/tests/812947
stall was 'only' 9 seconds but it looks like the same issue to me
23:18:20.0989 11234 MATCH(rebootnow-20160504:1.00)
23:18:29.1150 11220 >>> testapi::_handle_found_needle: found rebootnow-20160504, similarity 1.00 @ 271/257
23:18:29.1152 Debug: /var/lib/openqa/share/tests/sle/tests/installation/install_and_reboot.pm:110 called testapi::send_key
23:18:29.1152 11220 <<< testapi::send_key(key='alt-s')
23:18:29.3188 Debug: /var/lib/openqa/share/tests/sle/tests/installation/install_and_reboot.pm:110 called testapi::wait_still_screen
23:18:29.3190 11220 <<< testapi::wait_still_screen(stilltime=3, timeout=4, simlvl=47)
23:18:33.4918 11220 >>> testapi::wait_still_screen: wait_still_screen timed out after 4
Updated by RBrownSUSE about 7 years ago
- Related to action #17594: [tools]Missing characters in the middle of type_string/assert_script_run (Ninja Keys) added
Updated by SLindoMansilla about 7 years ago
sle-12-SP3-Server-DVD-ppc64le-Build0313-cryptlvm_minimal_x@ppc64le
osd#856166#step/grub_test/6
sle-12-SP3-Server-DVD-ppc64le-Build0313-gnome@ppc64le
osd#856402#step/grub_test/6
Updated by nicksinger about 7 years ago
@SLindoMansilla: Your references are wrong. This ticket is about "stall not detected". But the jobs you referenced clearly show "stall detected". That would be #13276
Updated by okurz almost 7 years ago
M6 is long gone and so is M7. Care to update? I assume currently it's not planned to continue with this as we did not see it more often. Maybe it does not even apply anymore?
Updated by RBrownSUSE almost 7 years ago
- Target version changed from Milestone 6 to Milestone 8
Updated by szarate almost 7 years ago
- Status changed from In Progress to Feedback
- Assignee deleted (szarate)
- Target version deleted (Milestone 8)
I have not seen this for a while, and haven't worked on this either. Removing from my queue, and removing the milestone too.
Updated by coolo over 6 years ago
- Status changed from Feedback to Resolved
Not using NFS anymore clearly helped - even when okurz didn't believe it.