action #73339
closed
coordination #39719: [saga][epic] Detection of "known failures" for stable tests, easy test results review and easy tracking of known issues
coordination #62420: [epic] Distinguish all types of incompletes
auto_review:"setup failure: Cache service status error from API: Minion job.* failed: Can't use an undefined value as a HASH reference at.*"
Added by Xiaojing_liu about 4 years ago.
Updated almost 4 years ago.
Category: Regressions/Crashes
- Subject changed from auto_review:"setup failure: Cache service status error from API: Minion job #45813 failed: Can't use an undefined value as a HASH reference at.*" to auto_review:"setup failure: Cache service status error from API: Minion job.* failed: Can't use an undefined value as a HASH reference at.*"
- Subject changed from auto_review:"setup failure: Cache service status error from API: Minion job.* failed: Can't use an undefined value as a HASH reference at.*" to auto_review:"setup failure: Cache service status error from API: Minion job .* failed: Can't use an undefined value as a HASH reference at.*"
- Subject changed from auto_review:"setup failure: Cache service status error from API: Minion job .* failed: Can't use an undefined value as a HASH reference at.*" to auto_review:"setup failure: Cache service status error from API: Minion job.* failed: Can't use an undefined value as a HASH reference at.*"
- Target version set to Ready
- Category set to Regressions/Crashes
- Tags set to cache, worker, minion
- Description updated (diff)
- Status changed from New to Workable
- Priority changed from Normal to Low
Curious, this is not something that should be possible. I'll have a closer look.
- Status changed from Workable to Feedback
The only condition where I could see this error happening would be if the SQLite database gets deleted right after the job started. Unfortunately I was too late with the investigation: the database had already been deleted again a few days after the error occurred. To be sure I've also double checked the Minion::Backend::SQLite code, and it looks fine. This was just bad timing: the SQLite file was deleted before the cache service was stopped.
Ok, understood. Would it be possible to just avoid the Perl warning in this case? Something that is a bit more explicit than "Can't use an undefined value"?
It's not a warning but an exception that got thrown when an unexpected condition occurred in the Minion job process. It's not the best error message, but appropriate enough for what happened. Have we actually seen this more than once? Otherwise I'd just say good enough and leave it as is.
Pretty sure the underlying cause for this is our ongoing fight with SQLite corruption, since that's the only case where we delete the SQLite file. Finding a solution for that will probably make this condition impossible.
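For illustration only, here is a minimal Python sketch of the difference between the implicit exception and the "more explicit" message asked about above. It is not the openQA Perl code: the `jobs` table, the function names and the `JobStateMissing` class are all hypothetical, and a missing row merely stands in for the state that is lost when the SQLite file is deleted under a running job.

```python
import sqlite3


class JobStateMissing(RuntimeError):
    """Hypothetical error type for a job whose database row has vanished."""


def open_db():
    # In-memory database with a hypothetical, simplified jobs table.
    db = sqlite3.connect(":memory:")
    db.row_factory = sqlite3.Row
    db.execute("CREATE TABLE jobs (id INTEGER PRIMARY KEY, asset TEXT)")
    return db


def asset_unguarded(db, job_id):
    row = db.execute("SELECT asset FROM jobs WHERE id = ?", (job_id,)).fetchone()
    # fetchone() returns None when no row matches, so subscripting it raises
    # a generic TypeError, roughly comparable to Perl's "Can't use an
    # undefined value as a HASH reference".
    return row["asset"]


def asset_guarded(db, job_id):
    row = db.execute("SELECT asset FROM jobs WHERE id = ?", (job_id,)).fetchone()
    if row is None:
        # Same failure, but the message names the unexpected condition.
        raise JobStateMissing(
            f"job {job_id} not found in cache database; was the SQLite file replaced?"
        )
    return row["asset"]


if __name__ == "__main__":
    db = open_db()
    for lookup in (asset_unguarded, asset_guarded):
        try:
            lookup(db, 31656)
        except Exception as err:
            print(f"{lookup.__name__}: {type(err).__name__}: {err}")
```

Both variants still fail the job; the guarded one only changes what ends up in the log, which is the trade-off being weighed in this comment.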
- Parent task set to #62420
- Related to action #67000: Job incompletes due to malformed worker cache database disk image with auto_review:"Cache service status error.*(database disk image is malformed|Specified job ID is invalid).*":retry added
- Description updated (diff)
kraih wrote:
Have we actually seen this more than once?
Good question. I have added "steps to reproduce" to find any other cases where we linked openQA jobs to this ticket, as we can do with all "auto_review" tickets. I ran
for host in o3 osd; do echo "### $host" && openqa-query-for-job-label poo#73339; done
and found:
### o3
1469527|2020-11-13 06:04:23|done|incomplete|gnome|setup failure: Cache service status error from API: Minion job #31656 failed: Can't use an undefined value as a HASH reference at /usr/share/openqa/script/../lib/OpenQA/CacheService/Task/Asset.pm line 30.
|openqaworker1
1469499|2020-11-13 05:51:32|done|incomplete|krypton-live-wayland|setup failure: Cache service status error from API: Minion job #31621 failed: Can't use an undefined value as a HASH reference at /usr/share/openqa/script/../lib/OpenQA/CacheService/Task/Asset.pm line 30.
|openqaworker1
1469435|2020-11-13 04:32:22|done|incomplete|upgrade_staging|setup failure: Cache service status error from API: Minion job #31546 failed: Can't use an undefined value as a HASH reference at /usr/share/openqa/script/../lib/OpenQA/CacheService/Task/Asset.pm line 30.
|openqaworker1
1469409|2020-11-13 03:34:19|done|incomplete|minimalx|setup failure: Cache service status error from API: Minion job #31529 failed: Can't use an undefined value as a HASH reference at /usr/share/openqa/script/../lib/OpenQA/CacheService/Task/Asset.pm line 30.
|openqaworker1
### osd
1469527|2020-11-13 06:04:23|done|incomplete|gnome|setup failure: Cache service status error from API: Minion job #31656 failed: Can't use an undefined value as a HASH reference at /usr/share/openqa/script/../lib/OpenQA/CacheService/Task/Asset.pm line 30.
|openqaworker1
1469499|2020-11-13 05:51:32|done|incomplete|krypton-live-wayland|setup failure: Cache service status error from API: Minion job #31621 failed: Can't use an undefined value as a HASH reference at /usr/share/openqa/script/../lib/OpenQA/CacheService/Task/Asset.pm line 30.
|openqaworker1
1469435|2020-11-13 04:32:22|done|incomplete|upgrade_staging|setup failure: Cache service status error from API: Minion job #31546 failed: Can't use an undefined value as a HASH reference at /usr/share/openqa/script/../lib/OpenQA/CacheService/Task/Asset.pm line 30.
|openqaworker1
1469409|2020-11-13 03:34:19|done|incomplete|minimalx|setup failure: Cache service status error from API: Minion job #31529 failed: Can't use an undefined value as a HASH reference at /usr/share/openqa/script/../lib/OpenQA/CacheService/Task/Asset.pm line 30.
|openqaworker1
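As a quick cross-check of the listing above, the following Python sketch tests the reasons against the regex from the ticket subject. It assumes the auto_review pattern is applied as a plain regex search against the job's reason text, which is an assumption about the tooling rather than something stated in this ticket; the helper name `matching_jobs` is made up for the example. It also shows why the very first subject, which pinned Minion job #45813, had to be generalized.

```python
import re

# Pattern copied from the current ticket subject (inside auto_review:"...").
AUTO_REVIEW = re.compile(
    r"setup failure: Cache service status error from API: "
    r"Minion job.* failed: Can't use an undefined value as a HASH reference at.*"
)

# The earliest subject named one specific Minion job ID.
FIRST_SUBJECT = re.compile(
    r"setup failure: Cache service status error from API: "
    r"Minion job #45813 failed: Can't use an undefined value as a HASH reference at.*"
)

# openQA job ID -> Minion job ID, taken from the query output above.
MINION_JOBS = {1469527: 31656, 1469499: 31621, 1469435: 31546, 1469409: 31529}

# Rebuild the reason text reported for each openQA job.
REASONS = {
    job: (
        "setup failure: Cache service status error from API: "
        f"Minion job #{minion} failed: Can't use an undefined value as a HASH "
        "reference at /usr/share/openqa/script/../lib/OpenQA/CacheService/"
        "Task/Asset.pm line 30."
    )
    for job, minion in MINION_JOBS.items()
}


def matching_jobs(pattern, reasons):
    """Return the sorted openQA job IDs whose reason matches the pattern."""
    return sorted(job for job, reason in reasons.items() if pattern.search(reason))


if __name__ == "__main__":
    print(matching_jobs(AUTO_REVIEW, REASONS))    # all four jobs
    print(matching_jobs(FIRST_SUBJECT, REASONS))  # no jobs
```

The generalized pattern matches all four jobs from 2020-11-13, while the original subject matches none of them because each failure carries a different Minion job ID.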
kraih wrote:
Pretty sure the underlying cause for this is our ongoing fight with SQLite corruption, since that's the only case where we delete the SQLite file. Finding a solution for that will probably make this condition impossible.
Definitely a good idea. I have linked #67000 here. Unless you plan work for this ticket in particular I recommend you set the status to "Blocked" and check the situation again as soon as we have #67000 resolved.
- Status changed from Feedback to Blocked
@kraih as we consider the underlying issue #67000 solved, I checked https://progress.opensuse.org/issues/73339#Steps-to-reproduce and found no reference to the issue more recent than 2020-11-13. Do you have plans to improve the error handling and e.g. prevent the Perl warnings mentioned in the initial ticket description? If you see this is not feasible or useful then you can set the ticket to "Resolved".
- Status changed from Blocked to Feedback
okurz wrote:
@kraih as we consider the underlying issue #67000 solved, I checked https://progress.opensuse.org/issues/73339#Steps-to-reproduce and found no reference to the issue more recent than 2020-11-13. Do you have plans to improve the error handling and e.g. prevent the Perl warnings mentioned in the initial ticket description? If you see this is not feasible or useful then you can set the ticket to "Resolved".
Thanks, that suggests we might have resolved the issue together with the SQLite corruption (as expected). It's an exception, not a warning, and I don't think there is any need for further changes. It's a very unusual error and it was properly shown in the logs. I believe we can consider this issue resolved.
- Status changed from Feedback to Resolved