action #73339

closed

coordination #39719: [saga][epic] Detection of "known failures" for stable tests, easy test results review and easy tracking of known issues

coordination #62420: [epic] Distinguish all types of incompletes

auto_review:"setup failure: Cache service status error from API: Minion job.* failed: Can't use an undefined value as a HASH reference at.*"

Added by Xiaojing_liu over 3 years ago. Updated about 3 years ago.

Status: Resolved
Priority: Low
Assignee:
Category: Regressions/Crashes
Target version:
Start date: 2020-10-14
Due date:
% Done: 0%
Estimated time:

Description

Observation

https://openqa.suse.de/tests/4820229 shows

Reason: setup failure: Cache service status error from API: Minion job #45813 failed: Can't use an undefined value as a HASH reference at /usr/share/openqa/script/../lib/OpenQA/CacheService/Task/Asset.pm line 30.

For more details, see https://openqa.suse.de/tests/4820229/file/worker-log.txt

Steps to reproduce

Find jobs referencing this ticket with the help of
https://raw.githubusercontent.com/os-autoinst/scripts/master/openqa-query-for-job-label by calling:

for host in o3 osd; do echo "### $host" && openqa-query-for-job-label poo#73339; done

Acceptance criteria

  • AC1: No Perl warning in case of errors

Suggestions

  • Look into the code at lib/OpenQA/CacheService/Task/Asset.pm line 30 and try to prevent the warning, potentially by adding a proper error message for this condition

Related issues 1 (0 open, 1 closed)

Related to openQA Project - action #67000: Job incompletes due to malformed worker cache database disk image with auto_review:"Cache service status error.*(database disk image is malformed|Specified job ID is invalid).*":retry (Resolved, mkittler, 2020-05-18)

Actions #1

Updated by Xiaojing_liu over 3 years ago

  • Subject changed from auto_review:"setup failure: Cache service status error from API: Minion job #45813 failed: Can't use an undefined value as a HASH reference at.*" to auto_review:"setup failure: Cache service status error from API: Minion job.* failed: Can't use an undefined value as a HASH reference at.*"
Actions #2

Updated by Xiaojing_liu over 3 years ago

  • Subject changed from auto_review:"setup failure: Cache service status error from API: Minion job.* failed: Can't use an undefined value as a HASH reference at.*" to auto_review:"setup failure: Cache service status error from API: Minion job .* failed: Can't use an undefined value as a HASH reference at.*"
Actions #3

Updated by Xiaojing_liu over 3 years ago

  • Subject changed from auto_review:"setup failure: Cache service status error from API: Minion job .* failed: Can't use an undefined value as a HASH reference at.*" to auto_review:"setup failure: Cache service status error from API: Minion job.* failed: Can't use an undefined value as a HASH reference at.*"
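
The subject changes above generalize the regular expression so that it no longer depends on one specific Minion job ID. As a minimal sketch of the effect, the following compares the original and the generalized pattern against two reasons recorded in this ticket. It uses Python's re module purely for illustration; how the auto_review tooling actually applies the pattern is an assumption here, and the helper name matches is hypothetical.

```python
import re

# Regex from the final ticket subject, i.e. the text inside auto_review:"...".
GENERAL = (
    r"setup failure: Cache service status error from API: "
    r"Minion job.* failed: Can't use an undefined value as a HASH reference at.*"
)
# The first version of the subject hard-coded one Minion job ID.
SPECIFIC = GENERAL.replace("Minion job.*", "Minion job #45813")

PREFIX = "setup failure: Cache service status error from API: Minion job "
TAIL = (
    " failed: Can't use an undefined value as a HASH reference at "
    "/usr/share/openqa/script/../lib/OpenQA/CacheService/Task/Asset.pm line 30."
)
# Reasons as quoted in this ticket (job IDs #45813 and #31656).
reason_a = PREFIX + "#45813" + TAIL
reason_b = PREFIX + "#31656" + TAIL


def matches(pattern: str, reason: str) -> bool:
    """Hypothetical helper: True if the pattern is found anywhere in the reason."""
    return re.search(pattern, reason) is not None


for name, pattern in (("specific", SPECIFIC), ("general", GENERAL)):
    print(name, matches(pattern, reason_a), matches(pattern, reason_b))
```

Only the generalized pattern matches both reasons, which is what allows later occurrences with other job IDs to be linked to this ticket.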
Actions #4

Updated by okurz over 3 years ago

  • Target version set to Ready
Actions #5

Updated by okurz over 3 years ago

  • Category set to Regressions/Crashes
Actions #6

Updated by okurz over 3 years ago

  • Tags set to cache, worker, minion
  • Description updated (diff)
  • Status changed from New to Workable
  • Priority changed from Normal to Low
Actions #7

Updated by kraih over 3 years ago

  • Assignee set to kraih

Curious, this is not something that should be possible. I'll have a closer look.

Actions #8

Updated by kraih over 3 years ago

  • Status changed from Workable to Feedback

The only condition where I could see this error happening would be if the SQLite database gets deleted right after the job started. Unfortunately I was too late with the investigation: by the time I looked, a few days after the error occurred, the database had already been deleted again. To be sure, I've also double-checked the Minion::Backend::SQLite code, and it looks fine. This was just bad timing: the SQLite file was deleted before the cache service was stopped.

Actions #9

Updated by okurz over 3 years ago

Ok, understood. Would it be possible to just avoid the Perl warning in this case? Something that is a bit more explicit than "Can't use an undefined value"?

Actions #10

Updated by kraih over 3 years ago

It's not a warning but an exception that got thrown when an unexpected condition occurred in the Minion job process. It's not the best error message, but appropriate enough for what happened. Have we actually seen this more than once? Otherwise I'd just say good enough and leave it as is.
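
To illustrate the distinction between a generic exception and a more explicit one, here is a rough Python analogy, not the actual Perl code. What Asset.pm line 30 contains is not shown in this ticket, so the lookup function, the "asset" key and the error text below are all hypothetical. The Python counterpart of dereferencing an undefined value as a hash is subscripting None, which raises a TypeError.

```python
def lookup_job_info(db, job_id):
    """Hypothetical lookup: returns None when the row (or the database) is gone."""
    return db.get(job_id)


def asset_unchecked(db, job_id):
    info = lookup_job_info(db, job_id)
    # Raises a generic TypeError if info is None, similar in spirit to
    # Perl's "Can't use an undefined value as a HASH reference".
    return info["asset"]


def asset_checked(db, job_id):
    info = lookup_job_info(db, job_id)
    if info is None:
        # Still an exception, but with a message naming the likely cause.
        raise RuntimeError(
            f"No job info found for Minion job #{job_id}; "
            "was the cache database deleted?"
        )
    return info["asset"]


empty_db = {}  # stands in for a database that vanished after the job started
for func in (asset_unchecked, asset_checked):
    try:
        func(empty_db, 45813)
    except (TypeError, RuntimeError) as err:
        print(f"{func.__name__}: {type(err).__name__}: {err}")
```

Both variants abort the job with an exception; the checked one only changes how much the message says about the cause.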

Actions #11

Updated by kraih over 3 years ago

Pretty sure the underlying cause for this is our ongoing fight with SQLite corruption, since that's the only case where we delete the SQLite file. Finding a solution for that will probably make this condition impossible.

Actions #12

Updated by okurz over 3 years ago

  • Parent task set to #62420
Actions #13

Updated by okurz over 3 years ago

  • Related to action #67000: Job incompletes due to malformed worker cache database disk image with auto_review:"Cache service status error.*(database disk image is malformed|Specified job ID is invalid).*":retry added
Actions #14

Updated by okurz over 3 years ago

  • Description updated (diff)

kraih wrote:

Have we actually seen this more than once?

Good question. I have added "Steps to reproduce" to find any other cases where we linked openQA jobs to this ticket, as we can do with all "auto_review" tickets. I ran for host in o3 osd; do echo "### $host" && openqa-query-for-job-label poo#73339; done and found:

### o3
1469527|2020-11-13 06:04:23|done|incomplete|gnome|setup failure: Cache service status error from API: Minion job #31656 failed: Can't use an undefined value as a HASH reference at /usr/share/openqa/script/../lib/OpenQA/CacheService/Task/Asset.pm line 30.
|openqaworker1
1469499|2020-11-13 05:51:32|done|incomplete|krypton-live-wayland|setup failure: Cache service status error from API: Minion job #31621 failed: Can't use an undefined value as a HASH reference at /usr/share/openqa/script/../lib/OpenQA/CacheService/Task/Asset.pm line 30.
|openqaworker1
1469435|2020-11-13 04:32:22|done|incomplete|upgrade_staging|setup failure: Cache service status error from API: Minion job #31546 failed: Can't use an undefined value as a HASH reference at /usr/share/openqa/script/../lib/OpenQA/CacheService/Task/Asset.pm line 30.
|openqaworker1
1469409|2020-11-13 03:34:19|done|incomplete|minimalx|setup failure: Cache service status error from API: Minion job #31529 failed: Can't use an undefined value as a HASH reference at /usr/share/openqa/script/../lib/OpenQA/CacheService/Task/Asset.pm line 30.
|openqaworker1
### osd
1469527|2020-11-13 06:04:23|done|incomplete|gnome|setup failure: Cache service status error from API: Minion job #31656 failed: Can't use an undefined value as a HASH reference at /usr/share/openqa/script/../lib/OpenQA/CacheService/Task/Asset.pm line 30.
|openqaworker1
1469499|2020-11-13 05:51:32|done|incomplete|krypton-live-wayland|setup failure: Cache service status error from API: Minion job #31621 failed: Can't use an undefined value as a HASH reference at /usr/share/openqa/script/../lib/OpenQA/CacheService/Task/Asset.pm line 30.
|openqaworker1
1469435|2020-11-13 04:32:22|done|incomplete|upgrade_staging|setup failure: Cache service status error from API: Minion job #31546 failed: Can't use an undefined value as a HASH reference at /usr/share/openqa/script/../lib/OpenQA/CacheService/Task/Asset.pm line 30.
|openqaworker1
1469409|2020-11-13 03:34:19|done|incomplete|minimalx|setup failure: Cache service status error from API: Minion job #31529 failed: Can't use an undefined value as a HASH reference at /usr/share/openqa/script/../lib/OpenQA/CacheService/Task/Asset.pm line 30.
|openqaworker1

kraih wrote:

Pretty sure the underlying cause for this is our ongoing fight with SQLite corruption, since that's the only case where we delete the SQLite file. Finding a solution for that will probably make this condition impossible.

Definitely a good idea. I have linked #67000 here. Unless you plan to work on this ticket in particular, I recommend setting the status to "Blocked" and checking the situation again as soon as we have #67000 resolved.

Actions #15

Updated by kraih over 3 years ago

  • Status changed from Feedback to Blocked
Actions #16

Updated by okurz over 3 years ago

@kraih as we consider the underlying issue #67000 solved, I checked https://progress.opensuse.org/issues/73339#Steps-to-reproduce and found no reference to the issue more recent than 2020-11-13. Do you have plans to improve the error handling and e.g. prevent the Perl warnings mentioned in the initial ticket description? If you see this as not feasible or useful, then you can set the ticket to "Resolved".
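
A check like this can be sketched in a few lines. The following assumes the output format shown earlier in this ticket, where fields are separated by "|" and the worker host ends up on a continuation line starting with "|"; the field names and the function parse_jobs are made up for illustration and are not part of openqa-query-for-job-label.

```python
from datetime import datetime


def reason(minion_id):
    """Rebuild the reason text quoted in this ticket for a given Minion job ID."""
    return (
        "setup failure: Cache service status error from API: "
        f"Minion job #{minion_id} failed: Can't use an undefined value as a HASH "
        "reference at /usr/share/openqa/script/../lib/OpenQA/CacheService/Task/Asset.pm line 30."
    )


# Two records copied from the output above, including the wrapped host line.
SAMPLE = "\n".join([
    "### o3",
    f"1469527|2020-11-13 06:04:23|done|incomplete|gnome|{reason(31656)}",
    "|openqaworker1",
    f"1469409|2020-11-13 03:34:19|done|incomplete|minimalx|{reason(31529)}",
    "|openqaworker1",
])


def parse_jobs(text):
    """Parse the pipe-separated output into dicts (field names are assumptions)."""
    records = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("###"):
            continue  # skip blank lines and host headers
        if line.startswith("|") and records:
            records[-1] += line  # continuation line carrying the worker host
        else:
            records.append(line)
    jobs = []
    for rec in records:
        job_id, finished, state, result, test, rest = rec.split("|", 5)
        text_reason, sep, host = rest.rpartition("|")
        if not sep:  # no host column present
            text_reason, host = rest, ""
        jobs.append({
            "id": int(job_id),
            "finished": datetime.strptime(finished, "%Y-%m-%d %H:%M:%S"),
            "state": state, "result": result, "test": test,
            "reason": text_reason, "host": host,
        })
    return jobs


jobs = parse_jobs(SAMPLE)
latest = max(job["finished"] for job in jobs)
print("most recent occurrence:", latest)
```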

Actions #17

Updated by kraih about 3 years ago

  • Status changed from Blocked to Feedback

okurz wrote:

@kraih as we consider the underlying issue #67000 solved, I checked https://progress.opensuse.org/issues/73339#Steps-to-reproduce and found no reference to the issue more recent than 2020-11-13. Do you have plans to improve the error handling and e.g. prevent the Perl warnings mentioned in the initial ticket description? If you see this as not feasible or useful, then you can set the ticket to "Resolved".

Thanks, that suggests we might have resolved the issue together with the SQLite corruption (as expected). It's an exception, not a warning, and I don't think there is any need for further changes. It's a very unusual error and it was properly shown in the logs. I believe we can consider this issue resolved.

Actions #18

Updated by okurz about 3 years ago

  • Status changed from Feedback to Resolved

agreed
