action #159348

closed

s390x kvm jobs incomplete with auto_review:"cache failure: Failed to send asset request for SLE-Micro-.*Cache service enqueue error 500: Internal Server Error" size:M

Added by okurz 12 days ago. Updated 1 day ago.

Status:
Resolved
Priority:
Normal
Assignee:
Category:
Regressions/Crashes
Target version:
Start date:
2024-04-21
Due date:
% Done:

0%

Estimated time:

Description

Observation

https://openqa.suse.de/tests/14103039 is incomplete with auto_review:"cache failure: Failed to send asset request for SLE-Micro-.*Cache service enqueue error 500: Internal Server Error". Multiple other jobs fail the same way, at least on the worker instance worker40:4. So there seems to be a problem in how the cache service handles these asset requests.
https://openqa.suse.de/admin/workers/3090 shows dozens of incomplete jobs with the same reason.
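
To illustrate which jobs the auto_review expression above is meant to catch, here is a minimal sketch. It assumes the expression is applied as a regular expression search against the job's reason text. The helper name matches_ticket and both sample reason strings are hypothetical and only serve the example.

```python
import re

# The auto_review expression quoted in this ticket. Assumption: it is
# treated as a regular expression and searched within the reason text.
AUTO_REVIEW = (
    r"cache failure: Failed to send asset request for SLE-Micro-.*"
    r"Cache service enqueue error 500: Internal Server Error"
)


def matches_ticket(reason: str) -> bool:
    """Return True if the reason text matches the ticket's expression."""
    return re.search(AUTO_REVIEW, reason) is not None


# Hypothetical reason strings, made up for illustration only.
affected = (
    "cache failure: Failed to send asset request for "
    "SLE-Micro-EXAMPLE.qcow2: Cache service enqueue error 500: "
    "Internal Server Error"
)
unrelated = "cache failure: Failed to download EXAMPLE.iso"

print(matches_ticket(affected), matches_ticket(unrelated))
```

The ".*" in the middle lets the expression match any SLE-Micro asset name, so it stays specific to the product while ignoring the exact file.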

Steps to reproduce

Find jobs referencing this ticket with the help of
https://raw.githubusercontent.com/os-autoinst/scripts/master/openqa-query-for-job-label,
i.e. call openqa-query-for-job-label 159348

Acceptance criteria

  • AC1: No more references to this ticket from openqa-query-for-job-label

Suggestions

  • Find out if the issues are specific to the arch or product
  • Maybe related to recent changes with regard to git
Actions #1

Updated by nicksinger 12 days ago · Edited

This ticket is about the error 500 in the cache service, right? Because repairing the instances will be done in #158170

Actions #2

Updated by okurz 9 days ago

  • Subject changed from s390x kvm jobs incomplete with auto_review:"cache failure: Failed to send asset request for SLE-Micro-.*Cache service enqueue error 500: Internal Server Error" to s390x kvm jobs incomplete with auto_review:"cache failure: Failed to send asset request for SLE-Micro-.*Cache service enqueue error 500: Internal Server Error" size:M
  • Description updated (diff)
  • Status changed from New to Workable
Actions #3

Updated by okurz 9 days ago

  • Priority changed from High to Normal
Actions #4

Updated by mkittler 1 day ago

  • Status changed from Workable to In Progress
  • Assignee set to mkittler
Actions #5

Updated by mkittler 1 day ago · Edited

  • Description updated (diff)

openqa-query-for-job-label 159348 only shows the job already mentioned in the ticket description, so I used the following SQL query instead:

select id, t_finished, result, (select host from workers where workers.id = jobs.assigned_worker_id) as host, reason from jobs where reason ilike '%Cache service enqueue error 500: Internal Server Error%' order by t_finished desc;

Notably, all of those jobs ran on worker40. The most recent job is 14103490 from 2024-04-20 23:44:39 and the oldest still relevant one is 14083560 from 2024-04-20 05:00:56. So the problem persisted for many hours and was maybe only resolved by the next reboot on 2024-04-21 03:34. Unfortunately the logs from that timeframe are gone, so I can't tell what was going on. The Minion dashboard also doesn't show any relevant jobs anymore (although the problem was probably not with the job execution anyway but with the Minion web application).
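
For a rough sense of how long the problem persisted, the timestamps from this comment can be subtracted directly. This is only a sketch: the variable names are made up for the example, and the reboot time is given to the minute only, so its seconds are assumed to be 00.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"

# Timestamps taken from the comment above.
oldest_failure = datetime.strptime("2024-04-20 05:00:56", FMT)  # job 14083560
newest_failure = datetime.strptime("2024-04-20 23:44:39", FMT)  # job 14103490
# Assumption: seconds set to 00, the comment only states 03:34.
reboot = datetime.strptime("2024-04-21 03:34:00", FMT)

failure_window = newest_failure - oldest_failure
gap_to_reboot = reboot - newest_failure

print(failure_window)  # 18:43:43
print(gap_to_reboot)   # 3:49:21
```

So the affected jobs span close to 19 hours, and the reboot followed the last known failure by a little under 4 hours. That timing fits the guess that the reboot cleared the problem, but it does not prove it.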

Actions #6

Updated by mkittler 1 day ago

  • Status changed from In Progress to Resolved

Considering that the job history looks good on https://openqa.suse.de/admin/workers/3090 and AC1 is fulfilled, I'm resolving this ticket. If this happens again we have to be a bit faster (or at least add relevant logs when creating the ticket).
