action #73339
coordination #39719: [saga][epic] Detect "known failures" and mark jobs as such to make tests more stable, reviewing test results and tracking known issues easier
coordination #62420: [epic] Distinguish all types of incompletes
auto_review:"setup failure: Cache service status error from API: Minion job.* failed: Can't use an undefined value as a HASH reference at.*"
Description
Observation
https://openqa.suse.de/tests/4820229 shows
Reason: setup failure: Cache service status error from API: Minion job #45813 failed: Can't use an undefined value as a HASH reference at /usr/share/openqa/script/../lib/OpenQA/CacheService/Task/Asset.pm line 30.
please see more details on https://openqa.suse.de/tests/4820229/file/worker-log.txt
Steps to reproduce
Find jobs referencing this ticket with the help of
https://raw.githubusercontent.com/os-autoinst/scripts/master/openqa-query-for-job-label by calling
for host in o3 osd; do echo "### $host" && openqa-query-for-job-label poo#73339; done
Acceptance criteria
- AC1: No Perl warning in case of errors
Suggestions
- Look into the code of lib/OpenQA/CacheService/Task/Asset.pm at line 30 and try to prevent the warning, potentially by adding a proper error message for this condition
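The suggestion above amounts to checking for a missing value before dereferencing it and failing with a descriptive message instead. The following is only a minimal sketch of that general pattern in Python, not the Perl code in Asset.pm. The table `jobs`, the column `args`, and the names `MissingJobError`, `make_demo_db` and `job_args` are all hypothetical and exist only for this illustration.

```python
import sqlite3


class MissingJobError(RuntimeError):
    """Hypothetical error type for a job row that is unexpectedly absent."""


def make_demo_db():
    # In-memory database with a hypothetical "jobs" table, for illustration only.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE jobs (id INTEGER PRIMARY KEY, args TEXT)")
    conn.execute("INSERT INTO jobs (id, args) VALUES (?, ?)", (1, "asset.iso"))
    return conn


def job_args(conn, job_id):
    # fetchone() returns None when no row matches, for example if the
    # database file was recreated after the job had been enqueued.
    row = conn.execute("SELECT args FROM jobs WHERE id = ?", (job_id,)).fetchone()
    if row is None:
        # Fail with an explicit message instead of dereferencing a missing value.
        raise MissingJobError(
            f"Minion job #{job_id} not found; was the cache database recreated?"
        )
    return row[0]
```

With such a guard the failure reason would name the missing job directly rather than reporting a dereference error from inside the task.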
Related issues
History
#1
Updated by Xiaojing_liu 3 months ago
- Subject changed from auto_review:"setup failure: Cache service status error from API: Minion job #45813 failed: Can't use an undefined value as a HASH reference at.*" to auto_review:"setup failure: Cache service status error from API: Minion job.* failed: Can't use an undefined value as a HASH reference at.*"
#2
Updated by Xiaojing_liu 3 months ago
- Subject changed from auto_review:"setup failure: Cache service status error from API: Minion job.* failed: Can't use an undefined value as a HASH reference at.*" to auto_review:"setup failure: Cache service status error from API: Minion job .* failed: Can't use an undefined value as a HASH reference at.*"
#3
Updated by Xiaojing_liu 3 months ago
- Subject changed from auto_review:"setup failure: Cache service status error from API: Minion job .* failed: Can't use an undefined value as a HASH reference at.*" to auto_review:"setup failure: Cache service status error from API: Minion job.* failed: Can't use an undefined value as a HASH reference at.*"
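The subject changes above replace the concrete Minion job number with `.*` so that the expression also covers other jobs failing the same way. A small sketch of the effect, assuming the auto_review expression is applied as a regular-expression search against the job reason (an assumption, as this ticket does not show how the matching is implemented). The reason strings are taken from this ticket; the helper name `matches` is hypothetical.

```python
import re

# Reason text as reported in this ticket, with the Minion job ID left open.
REASON_TEMPLATE = (
    "setup failure: Cache service status error from API: Minion job #{} failed: "
    "Can't use an undefined value as a HASH reference at "
    "/usr/share/openqa/script/../lib/OpenQA/CacheService/Task/Asset.pm line 30."
)
REASON = REASON_TEMPLATE.format(45813)        # from the Observation section
OTHER_REASON = REASON_TEMPLATE.format(31656)  # from a later query result

PREFIX = "setup failure: Cache service status error from API: Minion job"
SUFFIX = " failed: Can't use an undefined value as a HASH reference at.*"

SPECIFIC = PREFIX + " #45813" + SUFFIX  # original subject
GENERIC = PREFIX + ".*" + SUFFIX        # subject after #1 and #3
GENERIC_SPACE = PREFIX + " .*" + SUFFIX  # subject after #2


def matches(pattern, reason):
    # Assumption: the pattern may match anywhere in the reason (search, not fullmatch).
    return re.search(pattern, reason) is not None


for name, pattern in [("specific", SPECIFIC), ("generic", GENERIC), ("generic+space", GENERIC_SPACE)]:
    print(name, matches(pattern, REASON), matches(pattern, OTHER_REASON))
```

Only the original subject is tied to job #45813. Both generalized variants match either reason, so under this assumption the changes in #2 and #3 make no difference for these two strings.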
#8
Updated by kraih 3 months ago
- Status changed from Workable to Feedback
The only condition where I could see this error happening is if the SQLite database gets deleted right after the job started. Unfortunately I was too late with the investigation: the database had already been deleted again a few days after the error occurred. To be sure I've also double-checked the Minion::Backend::SQLite code, and it looks fine. This was just bad timing: the SQLite file was deleted before the cache service was stopped.
#10
Updated by kraih about 2 months ago
It's not a warning but an exception that got thrown when an unexpected condition occurred in the Minion job process. It's not the best error message, but appropriate enough for what happened. Have we actually seen this more than once? Otherwise I'd just say good enough and leave it as is.
#11
Updated by kraih about 2 months ago
Pretty sure the underlying cause for this is our ongoing fight with SQLite corruption, since that's the only case where we delete the SQLite file. Finding a solution for that will probably make this condition impossible.
#12
Updated by okurz about 2 months ago
- Parent task set to #62420
#13
Updated by okurz about 2 months ago
- Related to action #67000: Job incompletes due to malformed worker cache database disk image with auto_review:"Cache service status error.*(database disk image is malformed|Specified job ID is invalid).*":retry added
#14
Updated by okurz about 2 months ago
- Description updated (diff)
kraih wrote:
Have we actually seen this more than once?
Good question. I have added "steps to reproduce" to find any other cases where we linked openQA jobs to this ticket, as we can do with all "auto_review" tickets. I ran
for host in o3 osd; do echo "### $host" && openqa-query-for-job-label poo#73339; done
and found:
### o3
1469527|2020-11-13 06:04:23|done|incomplete|gnome|setup failure: Cache service status error from API: Minion job #31656 failed: Can't use an undefined value as a HASH reference at /usr/share/openqa/script/../lib/OpenQA/CacheService/Task/Asset.pm line 30. |openqaworker1
1469499|2020-11-13 05:51:32|done|incomplete|krypton-live-wayland|setup failure: Cache service status error from API: Minion job #31621 failed: Can't use an undefined value as a HASH reference at /usr/share/openqa/script/../lib/OpenQA/CacheService/Task/Asset.pm line 30. |openqaworker1
1469435|2020-11-13 04:32:22|done|incomplete|upgrade_staging|setup failure: Cache service status error from API: Minion job #31546 failed: Can't use an undefined value as a HASH reference at /usr/share/openqa/script/../lib/OpenQA/CacheService/Task/Asset.pm line 30. |openqaworker1
1469409|2020-11-13 03:34:19|done|incomplete|minimalx|setup failure: Cache service status error from API: Minion job #31529 failed: Can't use an undefined value as a HASH reference at /usr/share/openqa/script/../lib/OpenQA/CacheService/Task/Asset.pm line 30. |openqaworker1
### osd
1469527|2020-11-13 06:04:23|done|incomplete|gnome|setup failure: Cache service status error from API: Minion job #31656 failed: Can't use an undefined value as a HASH reference at /usr/share/openqa/script/../lib/OpenQA/CacheService/Task/Asset.pm line 30. |openqaworker1
1469499|2020-11-13 05:51:32|done|incomplete|krypton-live-wayland|setup failure: Cache service status error from API: Minion job #31621 failed: Can't use an undefined value as a HASH reference at /usr/share/openqa/script/../lib/OpenQA/CacheService/Task/Asset.pm line 30. |openqaworker1
1469435|2020-11-13 04:32:22|done|incomplete|upgrade_staging|setup failure: Cache service status error from API: Minion job #31546 failed: Can't use an undefined value as a HASH reference at /usr/share/openqa/script/../lib/OpenQA/CacheService/Task/Asset.pm line 30. |openqaworker1
1469409|2020-11-13 03:34:19|done|incomplete|minimalx|setup failure: Cache service status error from API: Minion job #31529 failed: Can't use an undefined value as a HASH reference at /usr/share/openqa/script/../lib/OpenQA/CacheService/Task/Asset.pm line 30. |openqaworker1
kraih wrote:
Pretty sure the underlying cause for this is our ongoing fight with SQLite corruption, since that's the only case where we delete the SQLite file. Finding a solution for that will probably make this condition impossible.
Definitely a good idea. I have linked #67000 here. Unless you plan work for this ticket in particular I recommend you set the status to "Blocked" and check the situation again as soon as we have #67000 resolved.
#15
Updated by kraih about 2 months ago
- Status changed from Feedback to Blocked
#16
Updated by okurz about 2 months ago
kraih, as we consider the underlying issue #67000 solved, I checked https://progress.opensuse.org/issues/73339#Steps-to-reproduce and found no reference to the issue more recent than 2020-11-13. Do you have plans to improve the error handling and e.g. prevent the Perl warnings mentioned in the initial ticket description? If you see this as not feasible or useful then you can set the ticket to "Resolved".
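The check for the most recent occurrence can be sketched as follows, using rows from the query output earlier in this ticket. The field names (`job_id`, `timestamp`, `state`, `result`, `test`, `reason`, `worker`) are assumptions inferred from the values, and the sketch assumes the reason text itself contains no `|` character. The helper names `parse_row` and `latest_occurrence` are hypothetical.

```python
from datetime import datetime

# Reason text as it appears in the query output (including the trailing space).
REASON = (
    "setup failure: Cache service status error from API: Minion job #{} failed: "
    "Can't use an undefined value as a HASH reference at "
    "/usr/share/openqa/script/../lib/OpenQA/CacheService/Task/Asset.pm line 30. "
)

# Two of the rows reported in this ticket.
ROWS = [
    "1469527|2020-11-13 06:04:23|done|incomplete|gnome|" + REASON.format(31656) + "|openqaworker1",
    "1469409|2020-11-13 03:34:19|done|incomplete|minimalx|" + REASON.format(31529) + "|openqaworker1",
]


def parse_row(row):
    # Assumed column order; unpacking fails loudly if the field count differs.
    job_id, timestamp, state, result, test, reason, worker = row.split("|")
    return {
        "job_id": int(job_id),
        "timestamp": datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S"),
        "state": state,
        "result": result,
        "test": test,
        "reason": reason.strip(),
        "worker": worker,
    }


def latest_occurrence(rows):
    # Newest timestamp among all rows that reference the ticket.
    return max(parse_row(row)["timestamp"] for row in rows)


print(latest_occurrence(ROWS))
```

Comparing this value against the date on which #67000 was considered solved is one way to judge whether the issue has reappeared since.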
#17
Updated by kraih about 3 hours ago
- Status changed from Blocked to Feedback
okurz wrote:
kraih, as we consider the underlying issue #67000 solved, I checked https://progress.opensuse.org/issues/73339#Steps-to-reproduce and found no reference to the issue more recent than 2020-11-13. Do you have plans to improve the error handling and e.g. prevent the Perl warnings mentioned in the initial ticket description? If you see this as not feasible or useful then you can set the ticket to "Resolved".
Thanks, that suggests we might have resolved the issue together with the SQLite corruption (as expected). It's an exception, not a warning, and I don't think there is any need for further changes. It's a very unusual error and it was properly shown in the logs. I believe we can consider this issue resolved.