action #34012 (closed)
[kernel] too generic test failure in "execute_test_run" for stress tests, was previously something more specific like "acceptance_fs_stress"
Added by okurz over 6 years ago. Updated about 5 years ago.
0%
Description
Observation
openQA test in scenario sle-15-Installer-DVD-x86_64-fs_stress@64bit fails in execute_test_run, which is a very generic name for a test failure and makes test review more difficult because the test module name does not relate to what was actually executed.
Expected result
Some months ago the feedback looked better, e.g. in https://openqa.suse.de/tests/1209225#step/acceptance_fs_stress/14 pointing to a test module "acceptance_fs_stress" which was more helpful.
Further details
Always latest result in this scenario: latest
Updated by okurz over 6 years ago
- Assignee set to yosun
@yosun do you have an idea how we can improve the test feedback again? Before, we had a more helpful test module name "acceptance_fs_stress" failing, now it is "execute_test_run".
Updated by yosun over 6 years ago
In this failure case you sometimes can't get a useful log from the snapshot, but by looking into the console you may find that a call trace happened.
https://openqa.suse.de/tests/1561334/file/serial0.txt
Most kernel test failures need to be checked via the console and the tarball (if there is one) when you see a failure.
BTW, those three stress tests are most likely kernel acceptance tests. If you think they most likely fail in a kernel way and are better debugged by the kernel-qa team, feel free to transfer them into the kernel job group for better classification.
Updated by okurz over 6 years ago
yosun wrote:
In this failure case you sometimes can't get a useful log from the snapshot, but by looking into the console you may find that a call trace happened.
https://openqa.suse.de/tests/1561334/file/serial0.txt
Well, sorry, that does not help. I am mainly asking because the label carry-over, the test overview page, and openqa-review mainly rely on the name of the first test module failing in a scenario.
Most kernel test failures need to be checked via the console and the tarball (if there is one) when you see a failure.
Well, actually there is a better way. I implemented a simple y2log parser for installation and yast failures already and slindomansilla has adopted it in a cool way for the systemd-testsuite in https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/4795/files so I am thinking that a similar way could be applied for the kernel test failures as well.
BTW, those three stress tests are most likely kernel acceptance tests. If you think they most likely fail in a kernel way and are better debugged by the kernel-qa team, feel free to transfer them into the kernel job group for better classification.
That's actually a good idea. I will discuss with marita as PM and sebchlad as kernel&network PO.
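To illustrate the log-parser idea for kernel failures, here is a minimal sketch in Python that scans the text of a serial console log (such as serial0.txt) for lines that usually indicate kernel trouble. The pattern list, the function name `find_kernel_issues`, and the sample log are assumptions made up for illustration; this is not the existing y2log parser nor the approach from the pull request linked above.

```python
import re

# Patterns that commonly mark kernel trouble in a console log.
# This list is an illustrative assumption, not the project's actual parser.
KERNEL_PATTERNS = [
    re.compile(r"Call Trace:"),
    re.compile(r"kernel BUG at"),
    re.compile(r"Kernel panic"),
]


def find_kernel_issues(serial_log, context=2):
    """Return (line_number, excerpt) for every line matching a pattern.

    line_number is 1-based; excerpt is the matching line plus up to
    `context` following lines.
    """
    lines = serial_log.splitlines()
    findings = []
    for index, line in enumerate(lines):
        if any(pattern.search(line) for pattern in KERNEL_PATTERNS):
            excerpt = lines[index:index + 1 + context]
            findings.append((index + 1, excerpt))
    return findings


# Hypothetical log excerpt, invented for this example.
sample = """\
[  10.0] fs_stress: starting
[  42.1] kernel BUG at fs/xfs/example.c:123!
[  42.2] Call Trace:
[  42.3]  example_function+0x10/0x20
[  50.0] fs_stress: done
"""

for line_number, excerpt in find_kernel_issues(sample):
    print(line_number, excerpt[0])
```

A test module could report such findings under a more specific name than "execute_test_run", but how that would be wired into the test distribution is left open here.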
Updated by yosun over 6 years ago
- Status changed from New to In Progress
okurz wrote:
yosun wrote:
In this failure case you sometimes can't get a useful log from the snapshot, but by looking into the console you may find that a call trace happened.
https://openqa.suse.de/tests/1561334/file/serial0.txt
Well, sorry, that does not help. I am mainly asking because the label carry-over, the test overview page, and openqa-review mainly rely on the name of the first test module failing in a scenario.
Most kernel test failures need to be checked via the console and the tarball (if there is one) when you see a failure.
Well, actually there is a better way. I implemented a simple y2log parser for installation and yast failures already and slindomansilla has adopted it in a cool way for the systemd-testsuite in https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/4795/files so I am thinking that a similar way could be applied for the kernel test failures as well.
Nice suggestion, it will reduce the work needed to analyze logs. In that case, this ticket needs to be split into several tickets for particular tests in different sets of scripts. Kernel failures are a little more complicated than userspace ones: you need to check the test log to find minor issues, check /var/log/messages or the journal log for other issues, and check kdump files for crash bugs. I suggest creating subtasks to follow up.
For the kernel tests part, split into:
- ctcs2 based testcase
- xfstests
- ltp
- network ...
BTW, those three stress tests are most likely kernel acceptance tests. If you think they most likely fail in a kernel way and are better debugged by the kernel-qa team, feel free to transfer them into the kernel job group for better classification.
That's actually a good idea. I will discuss with marita as PM and sebchlad as kernel&network PO.
Any update on this part?
Updated by okurz over 6 years ago
- Due date set to 2018-06-19
I just need to talk to sebchlad and mawerner about this but I keep forgetting :(
Updated by okurz over 6 years ago
- Target version changed from Milestone 17 to Milestone 17
Updated by okurz over 6 years ago
- Due date changed from 2018-06-19 to 2018-07-17
- Status changed from In Progress to Feedback
- Assignee changed from okurz to sebchlad
@sebchlad as discussed. IIUC QSK is fine to take over the "…_stress" test suites and schedule the corresponding scenarios within SLE12 and SLE15 tests as well as improve them over time and hopefully also schedule them for relevant openSUSE tests. So I propose that you remove the corresponding scenarios from the SLE15 and SLE12 schedule and add them in the Kernel job group. OK? If you expect QSF to do something feel free to reassign to me, otherwise remove the "[functional]" tag and add "[kernel]" after moving the test scenarios.
Updated by okurz over 6 years ago
- Related to action #37782: [kernel][functional][u][medium] test fails in execute_test_run because it cannot handle broken pipes added
Updated by mgriessmeier over 6 years ago
- Subject changed from [functional][u] too generic test failure in "execute_test_run" for stress tests, was previously something more specific like "acceptance_fs_stress" to [kernel] too generic test failure in "execute_test_run" for stress tests, was previously something more specific like "acceptance_fs_stress"
- Due date deleted (2018-07-17)
- Target version deleted (Milestone 17)
Updated by sebchlad over 6 years ago
- Status changed from Feedback to Workable
- Assignee deleted (sebchlad)
- Target version set to 445
Updated by yosun over 6 years ago
qa_test_* and xfstests use different test scripts.
qa_test_* uses the test scripts in tests/qa_automation
xfstests uses the test scripts in tests/xfstests
So I removed this related issue.
Updated by yosun over 6 years ago
No problem. BTW, I just moved sched_stress, fs_stress, process_stress into the kernel job group as discussed in previous comments.
Updated by okurz over 6 years ago
great. Can you do the same for the SLE15 codestream please?
Updated by yosun over 6 years ago
The SLE15 code base is used by QAM now, so I just removed them from the functional job group in SLE15. Whether to add them or not can be decided by QAM.
Updated by yosun over 6 years ago
I guess you mean SLE15SP1. I also removed them from there and will add them to the kernel job group when needed.
Updated by okurz over 6 years ago
I stated SLE15 codestream. Of course I mean whatever current service pack is in development. Please make sure to only remove it when you also add it. As we use the same job group for the whole SLE15 code stream I suggest to just add it to the kernel tests already now.
Updated by sebchlad almost 6 years ago
- Target version changed from 445 to future
Updated by jlausuch about 5 years ago
- Status changed from Workable to Resolved
sched_stress, fs_stress, process_stress are not run in kernel job group any more.
Updated by okurz about 5 years ago
- Status changed from Resolved to Workable
but they are! E.g. https://openqa.suse.de/tests/3527726 in Kernel for SLE15SP2. Also the ticket is not about "tests should not be run in kernel group" but about a too generic test failure in "execute_test_run". Please see the initial ticket description for the actual issue. Btw, mau-qa_acceptance_fs_stress seems to handle this better already with QA_TESTSET=acceptance_fs_stress instead of QA_TESTSUITE=fs_stress. Maybe that's the easy fix.
Updated by jlausuch about 5 years ago
- Status changed from Workable to Resolved
okurz wrote:
but they are! E.g. https://openqa.suse.de/tests/3527726 in Kernel for SLE15SP2.
Our bad. We decided to not run stress tests in kernel group and removed it for SLE12-SP5 but forgot about SLE15 group. Now they are removed from that group as well.
Also the ticket is not about "tests should not be run in kernel group" but about a too generic test failure in "execute_test_run". Please see the initial ticket description for the actual issue. Btw, mau-qa_acceptance_fs_stress seems to handle this better already with QA_TESTSET=acceptance_fs_stress instead of QA_TESTSUITE=fs_stress. Maybe that's the easy fix.
I know this ticket is not about that. Anyway, I have removed QA_TESTSUITE=fs_stress and added QA_TESTSET=acceptance_fs_stress in the test settings in case someone wants to enable that test again for whatever purpose.
Updated by okurz about 5 years ago
Yes, thanks. I think this should solve it for good! Without testing we wouldn't know if the "testset" approach works but let's take the chance ;)
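For reference, the settings change described in the last comments can be sketched in Python as a plain dictionary transformation. The dictionary representation, the helper name `switch_to_testset`, and the extra placeholder setting are assumptions for illustration only; the sketch does not call any openQA API and, as noted above, whether the "testset" approach yields the more specific module name remains untested.

```python
def switch_to_testset(settings, testset):
    """Return a copy of the settings using QA_TESTSET instead of QA_TESTSUITE."""
    updated = dict(settings)            # leave the caller's dict untouched
    updated.pop("QA_TESTSUITE", None)   # drop the generic suite setting if present
    updated["QA_TESTSET"] = testset     # select the specific test set instead
    return updated


# Hypothetical settings; EXAMPLE_OTHER_SETTING is a placeholder, not a real key.
before = {"QA_TESTSUITE": "fs_stress", "EXAMPLE_OTHER_SETTING": "1"}
after = switch_to_testset(before, "acceptance_fs_stress")
print(after)
```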