action #98682

open

jobs run powerqaworker-qam-1 fail with auto_review:"(?s)powerqaworker-qam-1.*Can't write to file (.*): No space left on device at .*":retry size:M

Added by okurz over 2 years ago. Updated about 1 year ago.

Status: Workable
Priority: Low
Assignee: -
Category: -
Target version:
Start date: 2021-09-15
Due date:
% Done: 0%
Estimated time:
Tags:

Description

Observation

See https://openqa.suse.de/tests/7118230
reported by mdoucha in
https://suse.slack.com/archives/C02CANHLANP/p1631721593278400?thread_ts=1631721593.278400&cid=C02CANHLANP

libpng error: Write Error
[2021-09-15T17:46:56.030 CEST] [info] ::: basetest::runtest: # Test died: 
Can't write to file "testresults/vfat_gf11-6.txt": No space left on device at /usr/lib/perl5/vendor_perl/5.26.1/Mojo/File.pm line 143.

Steps to reproduce

Find jobs referencing this ticket with the help of
https://raw.githubusercontent.com/os-autoinst/scripts/master/openqa-query-for-job-label
by calling: openqa-query-for-job-label poo#98682

Suggestion

  • DONE: As a first mitigation, try to use auto-review
  • Check #97139 and see whether it is the same issue
  • See if the mitigation is the same or different for this machine, e.g. reduce the number of worker instances
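
Before deciding whether to reduce the number of worker instances, it can help to measure how much space is actually free on the filesystem the workers write to. The following is a minimal sketch, not part of the ticket: the function names and the 10% threshold are hypothetical, and the pool path in the usage note is only assumed from the /var/lib/openqa/pool/4 entry visible in the job logs.

```python
import shutil


def free_fraction(path: str) -> float:
    """Return the fraction of free space (0.0 to 1.0) on the filesystem holding `path`."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        # Some pseudo filesystems report a total of 0; treat them as having no free space.
        return 0.0
    return usage.free / usage.total


def is_low_on_space(path: str, min_free_fraction: float = 0.1) -> bool:
    """True if less than `min_free_fraction` of the filesystem is free.

    The default of 0.1 (10%) is an arbitrary, hypothetical threshold.
    """
    return free_fraction(path) < min_free_fraction
```

On the worker host this could be called as is_low_on_space("/var/lib/openqa/pool"), assuming that is where the pool directories live on this machine.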

Related issues 1 (0 open, 1 closed)

Related to openQA Infrastructure - action #97139: [alert] multiple unhandled alerts about "malbec: Memory usage alert" size:M (Resolved, mkittler, 2021-08-18, 2021-09-09)

Actions #1

Updated by okurz over 2 years ago

  • Related to action #97139: [alert] multiple unhandled alerts about "malbec: Memory usage alert" size:M added
Actions #2

Updated by livdywan over 2 years ago

  • Subject changed from powerqaworker-qam-1 has been producing "No space left on device" errors again today to powerqaworker-qam-1 has been producing "No space left on device" errors again today size:M
  • Description updated (diff)
  • Status changed from New to Workable
Actions #3

Updated by okurz over 2 years ago

  • Description updated (diff)
Actions #4

Updated by Xiaojing_liu over 2 years ago

  • Status changed from Workable to In Progress
  • Assignee set to Xiaojing_liu
Actions #5

Updated by openqa_review over 2 years ago

  • Due date set to 2021-10-02

Setting due date based on mean cycle time of SUSE QE Tools

Actions #6

Updated by Xiaojing_liu over 2 years ago

  • Due date deleted (2021-10-02)

Prepared a PR to re-run the jobs: https://github.com/os-autoinst/scripts/pull/111
Prepared an MR to reduce the number of workers on powerqaworker-qam-1: https://gitlab.suse.de/openqa/salt-pillars-openqa/-/merge_requests/356

Actions #7

Updated by Xiaojing_liu over 2 years ago

  • Due date set to 2021-10-02
Actions #8

Updated by okurz over 2 years ago

As discussed in the weekly unblock, please just adjust the ticket subject line to have a single regex according to the format of https://github.com/os-autoinst/scripts#auto-review---automatically-detect-known-issues-in-openqa-jobs-label-openqa-jobs-with-ticket-references-and-optionally-retrigger and put the ticket to "Feedback" with lower priority.

Actions #9

Updated by Xiaojing_liu over 2 years ago

  • Subject changed from powerqaworker-qam-1 has been producing "No space left on device" errors again today size:M to powerqaworker-qam-1 has been producing "No space left on device" errors with auto_review:"Can't write to file (.*): No space left on device at .*":retry
  • Due date deleted (2021-10-02)
  • Status changed from In Progress to Feedback
  • Assignee deleted (Xiaojing_liu)
  • Priority changed from High to Low
Actions #10

Updated by Xiaojing_liu over 2 years ago

  • Subject changed from powerqaworker-qam-1 has been producing "No space left on device" errors with auto_review:"Can't write to file (.*): No space left on device at .*":retry to jobs run powerqaworker-qam-1 fail with auto_review:"Can't write to file (.*): No space left on device at .*":retry
Actions #11

Updated by okurz over 2 years ago

  • Subject changed from jobs run powerqaworker-qam-1 fail with auto_review:"Can't write to file (.*): No space left on device at .*":retry to jobs run powerqaworker-qam-1 fail with auto_review:"(?s)powerqaworker-qam-1.*Can't write to file (.*): No space left on device at .*":retry size:M
  • Description updated (diff)
  • Status changed from Feedback to Workable

Please, no ticket without an assignee. Someone should track it with a reasonable due date. There are also more tasks to follow. Also specifying the regex to match only on powerqaworker-qam-1, not on all "No space left" problems.
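
To illustrate why the subject regex now starts with (?s) and the worker name, here is a small Python sketch. The pattern is copied from the ticket subject; the log excerpts are hypothetical, and the sketch assumes plain "search" semantics over the whole log text, which may differ from how the auto-review script actually applies the pattern.

```python
import re

# Pattern copied from the ticket subject. The inline flag (?s) makes "."
# also match newlines, so the worker name and the error message may
# appear on different lines of the log.
PATTERN = r"(?s)powerqaworker-qam-1.*Can't write to file (.*): No space left on device at .*"
AUTO_REVIEW = re.compile(PATTERN)

# Same pattern without (?s), for comparison only.
SINGLE_LINE = re.compile(PATTERN.replace("(?s)", "", 1))


def matches_ticket(log_text: str) -> bool:
    """Return True if the log text matches the ticket's auto_review pattern."""
    return AUTO_REVIEW.search(log_text) is not None


# Hypothetical log excerpt: the worker name is on an earlier line than the error.
log_on_qam1 = (
    "[info] worker host: powerqaworker-qam-1\n"
    "libpng error: Write Error\n"
    "Can't write to file \"testresults/vfat_gf11-6.txt\": "
    "No space left on device at /usr/lib/perl5/vendor_perl/5.26.1/Mojo/File.pm line 143.\n"
)
# The same error on a different (hypothetical) worker must not match.
log_elsewhere = log_on_qam1.replace("powerqaworker-qam-1", "some-other-worker")

print(matches_ticket(log_on_qam1))
print(matches_ticket(log_elsewhere))
```

Without (?s) the pattern would only match if the worker name and the error message were on the same line, which is why SINGLE_LINE does not match the first excerpt.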

Actions #12

Updated by okurz over 2 years ago

  • Target version changed from Ready to future
Actions #13

Updated by openqa_review over 2 years ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: offline_sles12sp4_ltss_pscc_sdk-lp-asmm-contm-lgm-tcm-wsm_all_full
https://openqa.suse.de/tests/7363105

To prevent further reminder comments one of the following options should be followed:

  1. The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
  2. The openQA job group is moved to "Released" or "EOL" (End-of-Life)
  3. The bugref in the openQA scenario is removed or replaced, e.g. label:wontfix:boo1234
Actions #14

Updated by okurz over 2 years ago

Hm, I thought that if jobs are retried correctly we should not receive reminder comments as above. https://openqa.suse.de/tests/7369120 failed due to the problem reported in this ticket but there is a clone which was fine. Looking into the history of jobs on this worker https://openqa.suse.de/admin/workers/832 I see most is good. Let's see when this happens again.

Actions #15

Updated by openqa_review almost 2 years ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: offline_sles15sp3_pscc_lp-basesys-srv-desk-dev-contm-lgm-py2-tsm-wsm-pcm_all_full_console
https://openqa.suse.de/tests/8758753

To prevent further reminder comments one of the following options should be followed:

  1. The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
  2. The openQA job group is moved to "Released" or "EOL" (End-of-Life)
  3. The bugref in the openQA scenario is removed or replaced, e.g. label:wontfix:boo1234

Expect the next reminder at the earliest in 28 days if nothing changes in this ticket.

Actions #16

Updated by okurz about 1 year ago

  • Tags set to infra
$ openqa-query-for-job-label poo#98682
10568085|2023-02-23 16:11:08|done|incomplete|install_ltp+sle+Server-DVD-Incidents-Kernel|terminated prematurely: Encountered corrupted state file: No space left on device, see log output for details|powerqaworker-qam-1
10548108|2023-02-21 08:43:58|done|failed|create_hdd_gnome_qr_sap|done: terminated with corrupted state file: No space left on device|powerqaworker-qam-1
10548089|2023-02-21 08:40:23|done|failed|create_hdd_gnome_qr_sap|isotovideo done: Can't locate auto/NetAddr/IP/InetBase/AF_INET6.al in @INC (@INC contains: . sle/lib /var/lib/openqa/pool/4/blib/arch /var/lib/openqa/pool/4/blib/lib /usr/lib/os-autoinst /usr/lib/perl5/site_perl/5.26.1/ppc64le-linux-thread-multi /usr/lib/perl5/site_perl/5.26.1 /usr/lib/perl5/vendor_…|powerqaworker-qam-1
10548083|2023-02-21 08:36:42|done|failed|create_hdd_gnome_qr_sap|isotovideo done: Can't locate auto/NetAddr/IP/InetBase/AF_INET6.al in @INC (@INC contains: . sle/lib /var/lib/openqa/pool/4/blib/arch /var/lib/openqa/pool/4/blib/lib /usr/lib/os-autoinst /usr/lib/perl5/site_perl/5.26.1/ppc64le-linux-thread-multi /usr/lib/perl5/site_perl/5.26.1 /usr/lib/perl5/vendor_…|powerqaworker-qam-1
10548074|2023-02-21 08:31:39|done|failed|create_hdd_gnome_qr_sap|done: terminated with corrupted state file: No space left on device|powerqaworker-qam-1
10548057|2023-02-21 08:26:32|done|failed|create_hdd_gnome_qr_sap|done: terminated with corrupted state file: No space left on device|powerqaworker-qam-1

still a thing
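
The pipe-separated output above can be summarized with a few lines of Python. This is only a sketch: the field names (job_id, timestamp, state, result, test, reason, worker) are my own labels for the columns as they appear in the output, not names defined by openqa-query-for-job-label, and only three of the rows above are used as sample data.

```python
from collections import Counter
from typing import NamedTuple


class JobRow(NamedTuple):
    # Field names are assumptions based on the column order in the output.
    job_id: int
    timestamp: str
    state: str
    result: str
    test: str
    reason: str
    worker: str


def parse_row(line: str) -> JobRow:
    """Parse one output line; the reason column is allowed to contain '|'."""
    head, worker = line.rsplit("|", 1)
    job_id, timestamp, state, result, test, reason = head.split("|", 5)
    return JobRow(int(job_id), timestamp, state, result, test, reason, worker)


SAMPLE = """
10568085|2023-02-23 16:11:08|done|incomplete|install_ltp+sle+Server-DVD-Incidents-Kernel|terminated prematurely: Encountered corrupted state file: No space left on device, see log output for details|powerqaworker-qam-1
10548108|2023-02-21 08:43:58|done|failed|create_hdd_gnome_qr_sap|done: terminated with corrupted state file: No space left on device|powerqaworker-qam-1
10548074|2023-02-21 08:31:39|done|failed|create_hdd_gnome_qr_sap|done: terminated with corrupted state file: No space left on device|powerqaworker-qam-1
"""

rows = [parse_row(line) for line in SAMPLE.strip().splitlines()]

# Count results per worker and the number of rows mentioning the disk-full error.
by_worker_result = Counter((row.worker, row.result) for row in rows)
no_space = [row for row in rows if "No space left on device" in row.reason]

for (worker, result), count in sorted(by_worker_result.items()):
    print(f"{worker} {result}: {count}")
print(f"rows mentioning 'No space left on device': {len(no_space)}")
```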
