action #78165
closed
infrastructure task: After osd deployment 2020-11-18 many jobs incomplete with auto_review:"Cache service (status error from API|.*error 500: Internal Server Error)":retry
Added by okurz about 4 years ago.
Updated about 4 years ago.
Description
Observation
Many incompletes with "Cache service status error 500: Internal Server Error"
An example:
https://openqa.suse.de/tests/5020263
The worker-log.txt only shows:
[2020-11-18T06:56:31.0997 CET] [debug] [pid:32153] REST-API call: POST http://openqa.suse.de/api/v1/jobs/5019106/status
[2020-11-18T06:56:32.0037 CET] [error] [pid:32153] Unable to setup job 5019106: Cache service status error 500: Internal Server Error
[2020-11-18T06:56:32.0037 CET] [debug] [pid:32153] Stopping job 5019106 from openqa.suse.de: 05019106-sle-15-SP3-Online-x86_64-Build81.1-xfstests_btrfs-generic-001-100@64bit-smp - reason: setup failure
- Copied from action #78163: After OSD upgrade, many jobs incomplete with "Cache service status error 500: Internal Server Error" added
I suggest stopping all workers, cleaning the cache directory, and restarting them. On osd that is:
salt -C 'G@roles:worker' cmd.run 'systemctl stop openqa-worker.target openqa-worker-cacheservice openqa-worker-cacheservice-minion && rm -rf /var/lib/openqa/cache/* && systemctl start openqa-worker.target openqa-worker-cacheservice openqa-worker-cacheservice-minion'
I have done that now on osd.
- Copied to action #78169: after osd-deploy 2020-11-18 incompletes with auto_review:"Cache service (status error from API|.*error 500: Internal Server Error)":retry added
- Subject changed from infrastructure task: After osd deployment 2020-11-18 many jobs incomplete with auto_review:"Cache service status error 500: Internal Server Error" to infrastructure task: After osd deployment 2020-11-18 many jobs incomplete with auto_review:"Cache service.*error 500: Internal Server Error":retry
- Subject changed from infrastructure task: After osd deployment 2020-11-18 many jobs incomplete with auto_review:"Cache service.*error 500: Internal Server Error":retry to infrastructure task: After osd deployment 2020-11-18 many jobs incomplete with auto_review:"Cache service (status error from API|.*error 500: Internal Server Error)":retry
- Status changed from In Progress to Resolved
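The effect of the subject change can be checked against the failure reasons quoted in this ticket. The following is a minimal sketch, assuming the auto_review pattern is applied as an unanchored regular-expression search; Python's `re` module is used only for illustration, and the actual auto-review tooling may use a different regex engine. The helper name `matches` is hypothetical.

```python
import re

# Pattern copied from the final ticket subject.
AUTO_REVIEW = re.compile(
    r"Cache service (status error from API|.*error 500: Internal Server Error)"
)
# Narrower pattern from the original subject, kept for comparison.
ORIGINAL = re.compile(r"Cache service status error 500: Internal Server Error")

# Failure reasons as they appear in this ticket.
reasons = [
    "setup failure: Cache service status error 500: Internal Server Error",
    "setup failure: Cache service info error 500: Internal Server Error",
]


def matches(pattern, reason):
    """Return True if the pattern occurs anywhere in the reason (assumed semantics)."""
    return pattern.search(reason) is not None


for reason in reasons:
    print(matches(ORIGINAL, reason), matches(AUTO_REVIEW, reason), reason)
```

Under these assumptions, the original pattern only covers the "status" variant, while the final pattern also covers the "info" variant.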
I have re-enabled the two alerts about "incompletes from last 24h", and auto-review from today looks fine.
host=osd openqa-query-for-job-label 78165
still shows reports from after my change, but none of them for openqaworker8:
5032329|2020-11-19 05:33:24|done|incomplete|qam-minimal-full|setup failure: Cache service info error 500: Internal Server Error|QA-Power8-5-kvm
5032151|2020-11-19 05:33:18|done|incomplete|offline_sles12sp4_ltss_media_sdk-lp-asmm-contm-lgm-tcm-wsm_all_full|setup failure: Cache service info error 500: Internal Server Error|QA-Power8-5-kvm
5032150|2020-11-19 05:33:05|done|incomplete|offline_sles12sp3_ltss_media_sdk-lp-asmm-contm-lgm-tcm-wsm_all_full|setup failure: Cache service info error 500: Internal Server Error|QA-Power8-5-kvm
5025118|2020-11-18 08:21:22|done|incomplete|home_encrypted|setup failure: Cache service status error 500: Internal Server Error|openqaworker8
5025241|2020-11-18 08:21:19|done|incomplete|migration_zypper_sle15sp1_ha_alpha_node02|setup failure: Cache service status error 500: Internal Server Error|openqaworker8
5025208|2020-11-18 08:21:17|done|incomplete|migration_online_zypper_sles4sap15sp2|setup failure: Cache service status error 500: Internal Server Error|openqaworker8
5025231|2020-11-18 08:21:14|done|incomplete|migration_media+scc_sle12sp5_ha_alpha_node01|setup failure: Cache service status error 500: Internal Server Error|openqaworker8
5025233|2020-11-18 08:21:14|done|incomplete|autoyast_sles4sap_hana|setup failure: Cache service status error 500: Internal Server Error|openqaworker8
5025120|2020-11-18 08:21:13|done|incomplete|migration_online_zypper_sles4sap15|setup failure: Cache service status error 500: Internal Server Error|openqaworker8
5025230|2020-11-18 08:21:12|done|incomplete|ha_textmode_extended|setup failure: Cache service status error 500: Internal Server Error|openqaworker8
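To see at a glance which workers are affected on which day, the pipe-separated listing above can be grouped by worker and date. This is a rough sketch, assuming the seven columns are job id, timestamp, state, result, test name, reason, and worker; those column names are guesses from the output and are not documented in this ticket. The sample contains three rows copied from the listing.

```python
from collections import Counter

# Three rows copied from the listing above.
SAMPLE = """\
5032329|2020-11-19 05:33:24|done|incomplete|qam-minimal-full|setup failure: Cache service info error 500: Internal Server Error|QA-Power8-5-kvm
5025118|2020-11-18 08:21:22|done|incomplete|home_encrypted|setup failure: Cache service status error 500: Internal Server Error|openqaworker8
5025230|2020-11-18 08:21:12|done|incomplete|ha_textmode_extended|setup failure: Cache service status error 500: Internal Server Error|openqaworker8
"""

# Hypothetical column names, inferred from the output format.
FIELDS = ("job_id", "timestamp", "state", "result", "test", "reason", "worker")


def parse_rows(text):
    """Split each line on '|' and skip lines that do not have seven fields."""
    rows = []
    for line in text.splitlines():
        parts = line.strip().split("|")
        if len(parts) != len(FIELDS):
            continue
        rows.append(dict(zip(FIELDS, parts)))
    return rows


def count_by_worker_and_day(rows):
    """Count rows per (worker, YYYY-MM-DD) pair."""
    return Counter((row["worker"], row["timestamp"][:10]) for row in rows)


rows = parse_rows(SAMPLE)
counts = count_by_worker_and_day(rows)
for (worker, day), count in sorted(counts.items()):
    print(worker, day, count)
```

Grouping by day makes it easy to tell failures from before the cache cleanup apart from those that occurred afterwards.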
Moving the "auto_review" keyword back to #78169.