action #30478

closed

[sles][functional][u][sporadic][hard] test fails in snapper_cleanup: btrfs quota rescan times out, maybe extend timeout?

Added by mgriessmeier over 6 years ago. Updated about 6 years ago.

Status: Resolved
Priority: High
Assignee:
Category: Bugs in existing tests
Start date: 2018-01-18
Due date: 2018-04-24
% Done: 0%
Estimated time:
Difficulty:

Description

Observation

openQA test in scenario sle-15-Installer-DVD-x86_64-extra_tests_filesystem@64bit fails in snapper_cleanup

Reproducible

Fails since (at least) Build 419.1

Expected result

Last good: 414.13 (or more recent)

Tasks

  • Investigate issue and find the root cause
  • Apply solution to make it stable, or if not trivial - create a new ticket for that
  • Perform many runs to get statistics

Further details

Always latest result in this scenario: latest

Actions #1

Updated by okurz about 6 years ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: extra_tests_filesystem
https://openqa.suse.de/tests/1467662

Actions #2

Updated by okurz about 6 years ago

  • Due date set to 2018-03-13
  • Priority changed from Normal to High

Actions #3

Updated by riafarov about 6 years ago

  • Description updated (diff)
  • Status changed from New to Workable

Actions #4

Updated by mgriessmeier about 6 years ago

  • Subject changed from [sles][functional][sporadic] test fails in snapper_cleanup: btrfs quota rescan times out, maybe extend timeout? to [sles][functional][sporadic][hard] test fails in snapper_cleanup: btrfs quota rescan times out, maybe extend timeout?

setting to hard because it's sporadic

Actions #5

Updated by JERiveraMoya about 6 years ago

It has not failed in 20 days. Do we still need to increase the timeout?

Actions #6

Updated by JERiveraMoya about 6 years ago

It probably failed right after my last comment :) : https://openqa.suse.de/tests/1520851#step/snapper_cleanup/79 (7 days ago). At the moment I can only think of some memory issue related to the "Illegal snapshot" error, but I don't know why it is sporadic...

Actions #7

Updated by JERiveraMoya about 6 years ago

  • Status changed from Workable to Blocked

I think we need to give it some time to see whether the timeout issues appear again. What we see now is the "illegal snapshot" issue. We can block this ticket because of these two bugs:
https://bugzilla.suse.com/show_bug.cgi?id=1051920
https://bugzilla.suse.com/show_bug.cgi?id=1051920
As far as I understand from these two tickets, it could be due to a memory issue, so I added HDDSIZEGB=40 to OSD -> Test Suites -> extra_tests_filesystem.

Actions #8

Updated by JERiveraMoya about 6 years ago

  • Assignee set to JERiveraMoya

Actions #9

Updated by okurz about 6 years ago

Please add a comment in the test suite describing why we need the 40GB HDDSIZE. I do not think we should do this by default. Moreover I doubt this will work because the test uses a generated HDD. How should the disk be made bigger magically?

The two bugs you mentioned are actually the same.

Actions #10

Updated by okurz about 6 years ago

  • Due date changed from 2018-03-13 to 2018-04-10
  • Target version changed from Milestone 14 to Milestone 15

Actions #11

Updated by JERiveraMoya about 6 years ago

Yes, good catch! I saw that you deleted the 40GB setting from the child job (as no magic was applied). Both bugs are practically the same and mention a problem with low memory, but I pasted the same link twice. I cannot find the other one now, but this one is the most closely related: https://bugzilla.suse.com/show_bug.cgi?id=1051920
I increased the size for the parent job that creates the image, create_hdd_textmode, and added a comment to that test suite.

Actions #12

Updated by okurz about 6 years ago

hm, ok. I would prefer though to not change the test suites when a hypothesis has not been confirmed. Better either run it locally or just call clone-job with individual test changes, e.g. clone-job <job_id> HDDSIZEGB=40 to check.
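
As an illustration of the clone-job suggestion above, here is a minimal sketch that only builds the command line for a one-off clone with an overridden setting. It does not run anything. It assumes the script is installed as `openqa-clone-job` and accepts `--from <host>`; the helper name `build_clone_command` is hypothetical, and the job ID used is the failing job linked in comment #6.

```python
import shlex


def build_clone_command(job_id, host="https://openqa.suse.de", **settings):
    """Return the argv list for cloning one openQA job with overridden settings.

    Assumption: the CLI is available as `openqa-clone-job` and takes
    `--from <host> <job_id> KEY=VALUE ...`. Nothing is executed here.
    """
    args = ["openqa-clone-job", "--from", host, str(job_id)]
    # Append each override as KEY=VALUE, sorted for a stable command line.
    args += [f"{key}={value}" for key, value in sorted(settings.items())]
    return args


if __name__ == "__main__":
    # 1520851 is the failing job referenced earlier in this ticket.
    command = build_clone_command(1520851, HDDSIZEGB=40)
    # Print a shell-quoted form so it can be reviewed before running it by hand.
    print(shlex.join(command))
```

Building the command as a list keeps the override explicit and leaves the shared test suite configuration untouched until the hypothesis is confirmed.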

Actions #13

Updated by JERiveraMoya about 6 years ago

  • Status changed from Blocked to In Progress

No actual work was done on this task during sprint #14.
The failure is still sporadic. The bug is still open; developers consider it low priority because it seems to be a memory issue.
I created a parent image with 40GB locally, and 10 child jobs are currently running (most of them passing at the moment).
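
To judge how much a batch of passing runs says about a sporadic failure, a rough sketch like the following can help. It assumes each run fails independently with the same probability. The failure rate of 10% used in the example is purely hypothetical, since this ticket does not state a measured rate, and the function names are illustrative.

```python
import math


def chance_all_pass(failure_rate, runs):
    """Probability that `runs` independent runs all pass although the bug is still present."""
    return (1.0 - failure_rate) ** runs


def runs_needed(failure_rate, confidence):
    """Smallest number of runs so that at least one failure shows up with the given confidence."""
    # Solve (1 - failure_rate) ** n <= 1 - confidence for n.
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - failure_rate))


if __name__ == "__main__":
    # Hypothetical failure rate, not taken from the ticket.
    rate = 0.10
    print(f"Chance that 10 runs all pass by luck: {chance_all_pass(rate, 10):.2f}")
    print(f"Runs needed for 95% confidence: {runs_needed(rate, 0.95)}")
```

Under that assumed rate, a small batch of green runs is weak evidence, which is why a larger number of runs or a longer observation period is useful before resolving.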

Actions #14

Updated by okurz about 6 years ago

  • Subject changed from [sles][functional][sporadic][hard] test fails in snapper_cleanup: btrfs quota rescan times out, maybe extend timeout? to [sles][functional][u][sporadic][hard] test fails in snapper_cleanup: btrfs quota rescan times out, maybe extend timeout?

Actions #15

Updated by mgriessmeier about 6 years ago

  • Due date changed from 2018-04-10 to 2018-04-24
  • Status changed from In Progress to Blocked

Actions #16

Updated by JERiveraMoya about 6 years ago

Currently the parent job on OSD is configured with 40GB. I changed it back to 20GB on my local instance and then ran the same list of child jobs: http://dhcp254.suse.cz/tests/overview?distri=sle&version=15&build=555.1%40test_poo_jeriveramoya_HDDSIZEGB_20&groupid=1 I already see some errors, so it seems that the current 40GB setting is helping.

Actions #17

Updated by mgriessmeier about 6 years ago

  • Status changed from Blocked to Resolved

It has worked for the last 18 days, so let's resolve this.
