action #34012

[kernel] too generic test failure in "execute_test_run" for stress tests, was previously something more specific like "acceptance_fs_stress"

Added by okurz about 2 years ago. Updated 5 months ago.

Status: Resolved
Start date: 29/03/2018
Priority: Normal
Due date:
Assignee: -
% Done: 0%
Category: Enhancement to existing tests
Estimated time: 5.00 hours
Target version: QA - future
Difficulty:
Duration:

Description

Observation

openQA test in scenario sle-15-Installer-DVD-x86_64-fs_stress@64bit fails in execute_test_run, which is a very generic name for a test failure and makes test review more difficult because the test module name does not relate to what was actually executed.

Expected result

Some months ago the feedback looked better, e.g. in https://openqa.suse.de/tests/1209225#step/acceptance_fs_stress/14 pointing to a test module "acceptance_fs_stress" which was more helpful.

Further details

Always latest result in this scenario: latest


Related issues

Related to openQA Tests - action #37782: [kernel][functional][u][medium] test fails in execute_tes... Resolved 25/06/2018

History

#1 Updated by okurz about 2 years ago

  • Assignee set to yosun

@yosun do you have an idea how we can improve the test feedback again? Before, we had a more helpful test module name "acceptance_fs_stress" failing, now it is "execute_test_run".

#2 Updated by yosun almost 2 years ago

In this failure case you sometimes can't get a useful log from the snapshot, but by looking at the console you may find that a call trace happened.
https://openqa.suse.de/tests/1561334/file/serial0.txt

Most kernel test failures need to be checked via the console and the tarball (if there is one) when you see a failure.
BTW, those three stress tests are most likely kernel acceptance tests. If you think they most likely fail for kernel reasons and are better debugged by the kernel-qa team, feel free to transfer them into the kernel job group for better classification.

#3 Updated by okurz almost 2 years ago

yosun wrote:

In this failure case you sometimes can't get a useful log from the snapshot, but by looking at the console you may find that a call trace happened.
https://openqa.suse.de/tests/1561334/file/serial0.txt

Well, sorry, that does not help. I am mainly asking because the label carry-over, the test overview page, as well as openqa-review mainly rely on the name of the first test module failing in a scenario.

Most kernel test failures need to be checked via the console and the tarball (if there is one) when you see a failure.

Well, actually there is a better way. I implemented a simple y2log parser for installation and yast failures already and slindomansilla has adopted it in a cool way for the systemd-testsuite in https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/4795/files so I am thinking that a similar way could be applied for the kernel test failures as well.
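A rough sketch of what such a parser could look like for kernel tests, written in Python purely for illustration. The marker strings and the `classify_serial_log` helper are assumptions and are not taken from the linked pull request or from os-autoinst:

```python
import re

# Hypothetical markers; real kernel logs would likely need a longer, tuned list.
PATTERNS = [
    ("kernel_bug", re.compile(r"kernel BUG at")),
    ("kernel_oops", re.compile(r"\bOops\b")),
    ("call_trace", re.compile(r"Call Trace:", re.IGNORECASE)),
]


def classify_serial_log(text):
    """Return (label, line) for the first line matching a marker, or None."""
    for line in text.splitlines():
        for label, pattern in PATTERNS:
            if pattern.search(line):
                return label, line.strip()
    return None


if __name__ == "__main__":
    sample = (
        "[  12.3] fs_stress started\n"
        "[  45.6] Call Trace:\n"
        "[  45.7]  dump_stack+0x5c/0x80\n"
    )
    print(classify_serial_log(sample))
```

The returned label could then be shown in the failing step so that the reviewer sees more than just the generic module name.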

BTW, those three stress tests are most likely kernel acceptance tests. If you think they most likely fail for kernel reasons and are better debugged by the kernel-qa team, feel free to transfer them into the kernel job group for better classification.

That's actually a good idea. I will discuss with marita as PM and sebchlad as kernel&network PO.

#4 Updated by yosun almost 2 years ago

  • Status changed from New to In Progress

okurz wrote:

yosun wrote:

In this failure case you sometimes can't get a useful log from the snapshot, but by looking at the console you may find that a call trace happened.
https://openqa.suse.de/tests/1561334/file/serial0.txt


Well, sorry, that does not help. I am mainly asking because the label carry-over, the test overview page, as well as openqa-review mainly rely on the name of the first test module failing in a scenario.

Most kernel test failures need to be checked via the console and the tarball (if there is one) when you see a failure.


Well, actually there is a better way. I implemented a simple y2log parser for installation and yast failures already and slindomansilla has adopted it in a cool way for the systemd-testsuite in https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/4795/files so I am thinking that a similar way could be applied for the kernel test failures as well.

Nice suggestion, it will reduce the effort needed to analyze logs. In that case, this ticket needs to be split into several tickets for the particular tests, which use different sets of scripts. Kernel failures are a bit more complicated than userspace ones: you need to check the test log to find minor issues, /var/log/messages or the journal for other issues, and kdump files for crash bugs. I suggest creating subtasks to follow up.
For the kernel tests, split into:
- ctcs2 based testcase
- xfstests
- ltp
- network
...

BTW, those three stress tests are most likely kernel acceptance tests. If you think they most likely fail for kernel reasons and are better debugged by the kernel-qa team, feel free to transfer them into the kernel job group for better classification.


That's actually a good idea. I will discuss with marita as PM and sebchlad as kernel&network PO.

Any update in this part?

#5 Updated by yosun almost 2 years ago

  • Assignee deleted (yosun)

No response, removing assignee.

#6 Updated by okurz almost 2 years ago

  • Assignee set to okurz

#7 Updated by okurz almost 2 years ago

  • Due date set to 19/06/2018

I just need to talk to sebchlad and mawerner about this but I keep forgetting :(

#8 Updated by okurz almost 2 years ago

  • Target version changed from Milestone 17 to Milestone 17

#9 Updated by okurz almost 2 years ago

  • Due date changed from 19/06/2018 to 17/07/2018
  • Status changed from In Progress to Feedback
  • Assignee changed from okurz to sebchlad

@sebchlad as discussed. IIUC QSK is fine to take over the "…_stress" test suites and schedule the corresponding scenarios within SLE12 and SLE15 tests, as well as improve them over time and hopefully also schedule them for relevant openSUSE tests. So I propose that you remove the corresponding scenarios from the SLE15 and SLE12 schedule and add them to the Kernel job group. OK? If you expect QSF to do something feel free to reassign to me, otherwise remove the "[functional]" tag and add "[kernel]" after moving the test scenarios.

#10 Updated by okurz almost 2 years ago

  • Related to action #37782: [kernel][functional][u][medium] test fails in execute_test_run because it cannot handle broken pipes added

#11 Updated by riafarov almost 2 years ago

  • Estimated time set to 5.00

#12 Updated by mgriessmeier over 1 year ago

  • Subject changed from [functional][u] too generic test failure in "execute_test_run" for stress tests, was previously something more specific like "acceptance_fs_stress" to [kernel] too generic test failure in "execute_test_run" for stress tests, was previously something more specific like "acceptance_fs_stress"
  • Due date deleted (17/07/2018)
  • Target version deleted (Milestone 17)

#13 Updated by sebchlad over 1 year ago

  • Status changed from Feedback to Workable
  • Assignee deleted (sebchlad)
  • Target version set to Current Sprint - kernel

#15 Updated by yosun over 1 year ago

qa_test_* and xfstests use different test scripts.
qa_test_* uses the test script in tests/qa_automation.
xfstests uses the test script in tests/xfstests.

So I am removing this related issue.

#17 Updated by okurz over 1 year ago

ok, sorry

#18 Updated by yosun over 1 year ago

No problem. BTW, I just moved sched_stress, fs_stress, process_stress into the kernel job group as mentioned in previous comments.

#19 Updated by okurz over 1 year ago

great. Can you do the same for the SLE15 codestream please?

#20 Updated by yosun over 1 year ago

The SLE15 code base is used by QAM now. So I just removed them from the functional job group in SLE15; whether to add them or not can be decided by QAM.

#21 Updated by yosun over 1 year ago

I guess you mean SLE15SP1. I also removed them from there and will add them to the kernel job group when needed.

#22 Updated by okurz over 1 year ago

I stated SLE15 codestream. Of course I mean whatever current service pack is in development. Please make sure to only remove it when you also add it. As we use the same job group for the whole SLE15 code stream I suggest to just add it to the kernel tests already now.

#23 Updated by yosun over 1 year ago

OK, done

#24 Updated by okurz over 1 year ago

@yosun the team QSF thanks you :)

#25 Updated by sebchlad about 1 year ago

  • Target version changed from Current Sprint - kernel to future

#26 Updated by jlausuch 5 months ago

  • Status changed from Workable to Resolved

sched_stress, fs_stress, process_stress are not run in kernel job group any more.

#27 Updated by okurz 5 months ago

  • Status changed from Resolved to Workable

but they are! E.g. https://openqa.suse.de/tests/3527726 in Kernel for SLE15SP2. Also the ticket is not about "tests should not be run in kernel group" but about a too generic test failure in "execute_test_run". Please see the initial ticket description for the actual issue. Btw, mau-qa_acceptance_fs_stress seems to handle this better already with QA_TESTSET=acceptance_fs_stress instead of QA_TESTSUITE=fs_stress. Maybe that's the easy fix.

#28 Updated by jlausuch 5 months ago

  • Status changed from Workable to Resolved

okurz wrote:

but they are! E.g. https://openqa.suse.de/tests/3527726 in Kernel for SLE15SP2.

Our bad. We decided to not run stress tests in the kernel group and removed them for SLE12-SP5 but forgot about the SLE15 group. Now they are removed from that group as well.

Also the ticket is not about "tests should not be run in kernel group" but about a too generic test failure in "execute_test_run". Please see the initial ticket description for the actual issue. Btw, mau-qa_acceptance_fs_stress seems to handle this better already with QA_TESTSET=acceptance_fs_stress instead of QA_TESTSUITE=fs_stress. Maybe that's the easy fix.

I know this ticket is not about that. Anyway, I have removed QA_TESTSUITE=fs_stress and added QA_TESTSET=acceptance_fs_stress in the test settings in case someone wants to enable that test again for whatever purpose.
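To illustrate why this settings change could affect the reported failure name, here is a small Python model. It assumes, without confirmation from this ticket, that a job with QA_TESTSET schedules a test module named after the test set, while a job with only QA_TESTSUITE runs through the generic execute_test_run module. The function `first_module_name` is hypothetical:

```python
GENERIC_MODULE = "execute_test_run"


def first_module_name(settings):
    """Hypothetical model of which module name a reviewer would see."""
    testset = settings.get("QA_TESTSET")
    if testset:
        # Assumption: the module is named after the test set.
        return testset
    if settings.get("QA_TESTSUITE"):
        # Assumption: every suite runs through the same generic module.
        return GENERIC_MODULE
    return None


if __name__ == "__main__":
    print(first_module_name({"QA_TESTSUITE": "fs_stress"}))
    print(first_module_name({"QA_TESTSET": "acceptance_fs_stress"}))
```

If the model holds, the label carry-over and openqa-review would key on "acceptance_fs_stress" again instead of "execute_test_run".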

#29 Updated by okurz 5 months ago

Yes, thanks. I think this should solve it for good! Without testing we wouldn't know if the "testset" approach works but let's take the chance ;)
