action #78169

coordination #39719: [saga][epic] Detection of "known failures" for stable tests, easy test results review and easy tracking of known issues

coordination #62420: [epic] Distinguish all types of incompletes

after osd-deploy 2020-11-18 incompletes with auto_review:"Cache service (status error from API|.*error 500: Internal Server Error)":retry

Added by okurz 7 months ago. Updated 7 months ago.

Status: Resolved
Priority: High
Assignee:
Category: Concrete Bugs
Target version:
Start date: 2020-11-18
Due date:
% Done: 0%
Estimated time:
Difficulty:

Description

Observation

For example https://openqa.suse.de/tests/5020506 showing
"Reason: setup failure: No workers active in the cache service"

with autoinst-log.txt:

[2020-11-18T06:40:55.0106 CET] [info] [pid:12060] +++ setup notes +++
[2020-11-18T06:40:55.0106 CET] [info] [pid:12060] Running on openqaworker9:17 (Linux 4.12.14-lp151.28.79-default #1 SMP Wed Nov 11 08:17:16 UTC 2020 (472d149) x86_64)
[2020-11-18T06:40:55.0147 CET] [info] [pid:12060] +++ worker notes +++
[2020-11-18T06:40:55.0147 CET] [info] [pid:12060] End time: 2020-11-18 05:40:55
[2020-11-18T06:40:55.0147 CET] [info] [pid:12060] Result: setup failure
[2020-11-18T06:40:55.0152 CET] [info] [pid:13158] Uploading autoinst-log.txt

and worker-log.txt:

[2020-11-18T06:40:55.0106 CET] [debug] [pid:12060] Preparing Mojo::IOLoop::ReadWriteProcess::Session
…
[2020-11-18T06:40:55.0111 CET] [error] [pid:12060] Unable to setup job 5020506: No workers active in the cache service
[2020-11-18T06:40:55.0111 CET] [debug] [pid:12060] Stopping job 5020506 from openqa.suse.de: 05020506-sle-12-SP2-Server-DVD-Incidents-Kernel-KOTD-x86_64-Build4.4.121-261.1.g13f6b6d-ltp_syscalls_pre12sp4@64bit - reason: setup failure
[2020-11-18T06:40:55.0112 CET] [debug] [pid:12060] REST-API call: POST http://openqa.suse.de/api/v1/jobs/5020506/status
[2020-11-18T06:40:55.0152 CET] [info] [pid:13158] Uploading autoinst-log.txt
[2020-11-18T06:40:55.0207 CET] [info] [pid:13158] Uploading worker-log.txt
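
To pull the actual failure message out of a worker log like the one above, a small filter on the log level is enough. The following is only a sketch: the line pattern is inferred from the excerpt in this ticket and is not an official log format specification, and the function name error_messages is made up for illustration.

```python
import re

# Line layout inferred from the excerpt above (an assumption, not a spec):
# [timestamp TZ] [level] [pid:N] message
LINE = re.compile(
    r"^\[(?P<ts>[^\]]+)\] \[(?P<level>\w+)\] \[pid:(?P<pid>\d+)\] (?P<msg>.*)$"
)

# Three lines copied from the worker-log.txt excerpt in this ticket.
WORKER_LOG = """\
[2020-11-18T06:40:55.0106 CET] [debug] [pid:12060] Preparing Mojo::IOLoop::ReadWriteProcess::Session
[2020-11-18T06:40:55.0111 CET] [error] [pid:12060] Unable to setup job 5020506: No workers active in the cache service
[2020-11-18T06:40:55.0112 CET] [debug] [pid:12060] REST-API call: POST http://openqa.suse.de/api/v1/jobs/5020506/status
"""


def error_messages(log_text):
    """Return the message part of every line logged at level 'error'."""
    messages = []
    for line in log_text.splitlines():
        match = LINE.match(line)
        if match and match.group("level") == "error":
            messages.append(match.group("msg"))
    return messages


print(error_messages(WORKER_LOG))
```

Lines that do not fit the assumed layout (for example the "…" elision) are skipped silently.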

Steps to reproduce

Find jobs referencing this ticket with the help of
https://raw.githubusercontent.com/os-autoinst/scripts/master/openqa-query-for-job-label ,
for example, to look up jobs for this ticket #78169, call openqa-query-for-job-label poo#78169

Suggestions

  • Cross-check which changes the osd deployment of 2020-11-18 brought in that could explain the problems
  • Look up in the source code what this message could mean

Related issues

Copied from openQA Infrastructure - action #78165: infrastructure task: After osd deployment 2020-11-18 many jobs incomplete with auto_review:"Cache service (status error from API|.*error 500: Internal Server Error)":retry | Resolved | 2020-11-18

Copied to openQA Project - action #80202: jobs incomplete with auto_review:"setup failure: No workers active in the cache service":retry | Resolved | 2020-11-18 | 2020-12-19

Copied to openQA Project - action #80356: incompletes with auto_review:"Cache service.*error: Connection refused":retry | Workable | 2020-11-18
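
The subjects of this ticket and its two copies each carry a different auto_review expression, so a given incomplete reason is picked up by only some of them. The sketch below illustrates this with the reason quoted in the Observation. It assumes the subject layout auto_review:"<regex>":<action> as seen in these tickets and a plain case-sensitive regex search; the real auto-review tooling may match differently, and the helper names are hypothetical.

```python
import re

# Layout assumed from the ticket subjects: auto_review:"<regex>":<action>
AUTO_REVIEW = re.compile(r'auto_review:"(?P<regex>.+)":(?P<action>\w+)')

# Ticket subjects copied from this page.
SUBJECTS = {
    78169: 'after osd-deploy 2020-11-18 incompletes with auto_review:'
           '"Cache service (status error from API|.*error 500: Internal Server Error)":retry',
    80202: 'jobs incomplete with auto_review:'
           '"setup failure: No workers active in the cache service":retry',
    80356: 'incompletes with auto_review:'
           '"Cache service.*error: Connection refused":retry',
}


def parse_auto_review(subject):
    """Return (compiled regex, action) from a subject, or None if absent."""
    match = AUTO_REVIEW.search(subject)
    if match is None:
        return None
    return re.compile(match.group("regex")), match.group("action")


def matching_tickets(text):
    """Return the ticket numbers whose auto_review regex is found in text."""
    hits = []
    for ticket, subject in SUBJECTS.items():
        parsed = parse_auto_review(subject)
        if parsed and parsed[0].search(text):
            hits.append(ticket)
    return hits


# Reason quoted in the Observation section of this ticket.
REASON = "setup failure: No workers active in the cache service"
print(matching_tickets(REASON))
```

Under these assumptions the quoted reason is matched only by the expression of #80202, not by the current subject of #78169, which is consistent with the subject change in history entry #3 and the later copy to #80202.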

History

#1 Updated by okurz 7 months ago

  • Copied from action #78165: infrastructure task: After osd deployment 2020-11-18 many jobs incomplete with auto_review:"Cache service (status error from API|.*error 500: Internal Server Error)":retry added

#2 Updated by okurz 7 months ago

  • Status changed from Workable to New

#3 Updated by okurz 7 months ago

  • Subject changed from after osd-deploy 2020-11-18 incompletes with auto_review:"setup failure: No workers active in the cache service":retry to after osd-deploy 2020-11-18 incompletes with auto_review:"Cache service (status error from API|.*error 500: Internal Server Error)":retry

#4 Updated by okurz 7 months ago

  • Description updated (diff)
  • Status changed from New to Workable

#5 Updated by okurz 7 months ago

  • Description updated (diff)

#6 Updated by mkittler 7 months ago

Looks like yet another symptom of #67000 and #78165 (SQLite database corruption). In the "No workers active in the cache service" case, openqa-worker-cacheservice was likely ok again but openqa-worker-cacheservice-minion was still in an error state.

#7 Updated by okurz 7 months ago

  • Status changed from Workable to Blocked
  • Assignee set to mkittler

Ok, could you please track this as blocked by #67000 and check, after deployment of all relevant changes, that all corresponding errors are gone, e.g. by looking into the "summary" steps of the "auto-review" gitlab CI pipelines?

#8 Updated by okurz 7 months ago

  • Copied to action #80202: jobs incomplete with auto_review:"setup failure: No workers active in the cache service":retry added

#9 Updated by okurz 7 months ago

  • Copied to action #80356: incompletes with auto_review:"Cache service.*error: Connection refused":retry added

#10 Updated by okurz 7 months ago

  • Parent task set to #62420

#11 Updated by mkittler 7 months ago

  • Status changed from Blocked to Resolved

Closing because #67000 has been resolved. I don't see any workarounds mentioned in this ticket which needed to be reverted.
