action #73339

coordination #39719: [saga][epic] Detect "known failures" and mark jobs as such to make tests more stable, reviewing test results and tracking known issues easier

coordination #62420: [epic] Distinguish all types of incompletes

auto_review:"setup failure: Cache service status error from API: Minion job.* failed: Can't use an undefined value as a HASH reference at.*"

Added by Xiaojing_liu 3 months ago. Updated about 2 hours ago.

Status: Resolved
Priority: Low
Assignee:
Category: Concrete Bugs
Target version:
Start date: 2020-10-14
Due date:
% Done: 0%
Estimated time:
Difficulty:

Description

Observation

https://openqa.suse.de/tests/4820229 shows

Reason: setup failure: Cache service status error from API: Minion job #45813 failed: Can't use an undefined value as a HASH reference at /usr/share/openqa/script/../lib/OpenQA/CacheService/Task/Asset.pm line 30.

For more details, see https://openqa.suse.de/tests/4820229/file/worker-log.txt
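The expression in the ticket subject can be sanity-checked against the reason shown above. The following is only a minimal sketch: the helper name `matches_auto_review` is hypothetical, and using `grep -E` is an assumption, since the actual auto_review tooling may apply a different regex dialect or match against other text.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the text in $2 matches the expression in $1.
# Assumption: the expression behaves like an extended regular expression.
matches_auto_review() {
    printf '%s\n' "$2" | grep -Eq -- "$1"
}

# Expression taken from the ticket subject.
pattern="setup failure: Cache service status error from API: Minion job.* failed: Can't use an undefined value as a HASH reference at.*"

# Reason taken from the observed job.
reason="setup failure: Cache service status error from API: Minion job #45813 failed: Can't use an undefined value as a HASH reference at /usr/share/openqa/script/../lib/OpenQA/CacheService/Task/Asset.pm line 30."

if matches_auto_review "$pattern" "$reason"; then
    echo "match"
else
    echo "no match"
fi
```

Because `Minion job.*` does not hard-code the Minion job number, the same expression should also cover later occurrences with different job IDs.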

Steps to reproduce

Find jobs referencing this ticket with the help of
https://raw.githubusercontent.com/os-autoinst/scripts/master/openqa-query-for-job-label , call

for host in o3 osd; do echo "### $host" && openqa-query-for-job-label poo#73339; done

Acceptance criteria

  • AC1: No perl warning in case of errors

Suggestions

  • Look into the code of lib/OpenQA/CacheService/Task/Asset.pm line 30 and try to prevent the warning, potentially add a proper error message in this condition

Related issues

Related to openQA Project - action #67000: Job incompletes due to malformed worker cache database disk image with auto_review:"Cache service status error.*(database disk image is malformed|Specified job ID is invalid).*":retry (Resolved, 2020-05-18)

History

#1 Updated by Xiaojing_liu 3 months ago

  • Subject changed from auto_review:"setup failure: Cache service status error from API: Minion job #45813 failed: Can't use an undefined value as a HASH reference at.*" to auto_review:"setup failure: Cache service status error from API: Minion job.* failed: Can't use an undefined value as a HASH reference at.*"

#2 Updated by Xiaojing_liu 3 months ago

  • Subject changed from auto_review:"setup failure: Cache service status error from API: Minion job.* failed: Can't use an undefined value as a HASH reference at.*" to auto_review:"setup failure: Cache service status error from API: Minion job .* failed: Can't use an undefined value as a HASH reference at.*"

#3 Updated by Xiaojing_liu 3 months ago

  • Subject changed from auto_review:"setup failure: Cache service status error from API: Minion job .* failed: Can't use an undefined value as a HASH reference at.*" to auto_review:"setup failure: Cache service status error from API: Minion job.* failed: Can't use an undefined value as a HASH reference at.*"

#4 Updated by okurz 3 months ago

  • Target version set to Ready

#5 Updated by okurz 3 months ago

  • Category set to Concrete Bugs

#6 Updated by okurz 3 months ago

  • Tags set to cache, worker, minion
  • Description updated (diff)
  • Status changed from New to Workable
  • Priority changed from Normal to Low

#7 Updated by kraih 3 months ago

  • Assignee set to kraih

Curious, this is not something that should be possible. I'll have a closer look.

#8 Updated by kraih 3 months ago

  • Status changed from Workable to Feedback

The only condition where I could see this error happening would be if the SQLite database got deleted right after the job started. Unfortunately I was too late with the investigation: the database had already been deleted again a few days after the error occurred. To be sure, I've also double-checked the Minion::Backend::SQLite code, and it looks fine. This was just bad timing: the SQLite file was deleted before the cache service was stopped.

#9 Updated by okurz 3 months ago

Ok, understood. Would it be possible to just avoid the Perl warning in this case? Something that is a bit more explicit than "Can't use an undefined value"?

#10 Updated by kraih about 2 months ago

It's not a warning but an exception that got thrown when an unexpected condition occurred in the Minion job process. It's not the best error message, but appropriate enough for what happened. Have we actually seen this more than once? Otherwise I'd just say good enough and leave it as is.

#11 Updated by kraih about 2 months ago

Pretty sure the underlying cause for this is our ongoing fight with SQLite corruption, since that's the only case where we delete the SQLite file. Finding a solution for that will probably make this condition impossible.

#12 Updated by okurz about 2 months ago

  • Parent task set to #62420

#13 Updated by okurz about 2 months ago

  • Related to action #67000: Job incompletes due to malformed worker cache database disk image with auto_review:"Cache service status error.*(database disk image is malformed|Specified job ID is invalid).*":retry added

#14 Updated by okurz about 2 months ago

  • Description updated (diff)

kraih wrote:

Have we actually seen this more than once?

Good question. I have added "steps to reproduce" to find any other cases where we linked openQA jobs to this ticket, as we can do with all "auto_review" tickets. I ran

for host in o3 osd; do echo "### $host" && openqa-query-for-job-label poo#73339; done

and found:

### o3
1469527|2020-11-13 06:04:23|done|incomplete|gnome|setup failure: Cache service status error from API: Minion job #31656 failed: Can't use an undefined value as a HASH reference at /usr/share/openqa/script/../lib/OpenQA/CacheService/Task/Asset.pm line 30.
|openqaworker1
1469499|2020-11-13 05:51:32|done|incomplete|krypton-live-wayland|setup failure: Cache service status error from API: Minion job #31621 failed: Can't use an undefined value as a HASH reference at /usr/share/openqa/script/../lib/OpenQA/CacheService/Task/Asset.pm line 30.
|openqaworker1
1469435|2020-11-13 04:32:22|done|incomplete|upgrade_staging|setup failure: Cache service status error from API: Minion job #31546 failed: Can't use an undefined value as a HASH reference at /usr/share/openqa/script/../lib/OpenQA/CacheService/Task/Asset.pm line 30.
|openqaworker1
1469409|2020-11-13 03:34:19|done|incomplete|minimalx|setup failure: Cache service status error from API: Minion job #31529 failed: Can't use an undefined value as a HASH reference at /usr/share/openqa/script/../lib/OpenQA/CacheService/Task/Asset.pm line 30.
|openqaworker1
### osd
1469527|2020-11-13 06:04:23|done|incomplete|gnome|setup failure: Cache service status error from API: Minion job #31656 failed: Can't use an undefined value as a HASH reference at /usr/share/openqa/script/../lib/OpenQA/CacheService/Task/Asset.pm line 30.
|openqaworker1
1469499|2020-11-13 05:51:32|done|incomplete|krypton-live-wayland|setup failure: Cache service status error from API: Minion job #31621 failed: Can't use an undefined value as a HASH reference at /usr/share/openqa/script/../lib/OpenQA/CacheService/Task/Asset.pm line 30.
|openqaworker1
1469435|2020-11-13 04:32:22|done|incomplete|upgrade_staging|setup failure: Cache service status error from API: Minion job #31546 failed: Can't use an undefined value as a HASH reference at /usr/share/openqa/script/../lib/OpenQA/CacheService/Task/Asset.pm line 30.
|openqaworker1
1469409|2020-11-13 03:34:19|done|incomplete|minimalx|setup failure: Cache service status error from API: Minion job #31529 failed: Can't use an undefined value as a HASH reference at /usr/share/openqa/script/../lib/OpenQA/CacheService/Task/Asset.pm line 30.
|openqaworker1
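Each record in the output above is pipe-separated, and the worker name ends up on its own line starting with "|". As a minimal sketch under that assumption, the hypothetical helper `summarize_jobs` below re-joins such records and prints job ID, date and worker, which makes it easier to see when the issue last occurred. The field order is inferred from the output shown here, not from the script's documentation.

```shell
#!/bin/sh
# Hypothetical helper: reads query output on stdin, re-joins records whose last
# field landed on a separate line starting with "|", and prints
# "job_id date worker" for each record.
summarize_jobs() {
    awk '
        substr($0, 1, 1) == "|" { rec = rec $0; next }   # continuation: append to record
        substr($0, 1, 1) == "#" { next }                 # skip "### host" headers
        { if (rec != "") print rec; rec = $0 }           # new record: flush the previous one
        END { if (rec != "") print rec }
    ' | awk -F'|' '{ split($2, ts, " "); print $1, ts[1], $NF }'
}

# Two records copied from the output above.
summarize_jobs <<'EOF'
### o3
1469527|2020-11-13 06:04:23|done|incomplete|gnome|setup failure: Cache service status error from API: Minion job #31656 failed: Can't use an undefined value as a HASH reference at /usr/share/openqa/script/../lib/OpenQA/CacheService/Task/Asset.pm line 30.
|openqaworker1
1469499|2020-11-13 05:51:32|done|incomplete|krypton-live-wayland|setup failure: Cache service status error from API: Minion job #31621 failed: Can't use an undefined value as a HASH reference at /usr/share/openqa/script/../lib/OpenQA/CacheService/Task/Asset.pm line 30.
|openqaworker1
EOF
```

In practice the helper would be fed from the query command via a pipe instead of a here-document.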

kraih wrote:

Pretty sure the underlying cause for this is our ongoing fight with SQLite corruption, since that's the only case where we delete the SQLite file. Finding a solution for that will probably make this condition impossible.

Definitely a good idea. I have linked #67000 here. Unless you plan work for this ticket in particular I recommend you set the status to "Blocked" and check the situation again as soon as we have #67000 resolved.

#15 Updated by kraih about 2 months ago

  • Status changed from Feedback to Blocked

#16 Updated by okurz about 2 months ago

kraih, as we consider the underlying issue #67000 solved, I checked https://progress.opensuse.org/issues/73339#Steps-to-reproduce and found no reference to the issue more recent than 2020-11-13. Do you have plans to improve the error handling and e.g. prevent the Perl warnings mentioned in the initial ticket description? If you see this is not feasible or useful then you can set the ticket to "Resolved".

#17 Updated by kraih about 3 hours ago

  • Status changed from Blocked to Feedback

okurz wrote:

kraih, as we consider the underlying issue #67000 solved, I checked https://progress.opensuse.org/issues/73339#Steps-to-reproduce and found no reference to the issue more recent than 2020-11-13. Do you have plans to improve the error handling and e.g. prevent the Perl warnings mentioned in the initial ticket description? If you see this is not feasible or useful then you can set the ticket to "Resolved".

Thanks, that suggests we might have resolved the issue together with the SQLite corruption (as expected). It's an exception, not a warning, and I don't think there is any need for further changes. It's a very unusual error and it was properly shown in the logs. I believe we can consider this issue resolved.

#18 Updated by okurz about 2 hours ago

  • Status changed from Feedback to Resolved

agreed
