action #159348: s390x kvm jobs incomplete with auto_review:"cache failure: Failed to send asset request for SLE-Micro-.*Cache service enqueue error 500: Internal Server Error" size:M - openQA Project (public) - openSUSE Project Management Tool

closed

Added by okurz about 1 year ago. Updated about 1 year ago.

Status: Resolved
Priority: Normal
Assignee: mkittler
Category: Regressions/Crashes
Target version: Ready
Start date: 2024-04-21
Due date:
% Done:
Estimated time:
Tags: incomplete, reactive work, asset handling, worker cache

Description

Observation

https://openqa.suse.de/tests/14103039 is incomplete with auto_review:"cache failure: Failed to send asset request for SLE-Micro-.*Cache service enqueue error 500: Internal Server Error". Multiple other jobs show the same failure, at least on the instance worker40:4, so there seems to be a problem with how the cache service handles these asset requests.
https://openqa.suse.de/admin/workers/3090 shows dozens of incomplete jobs with the same reason.
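
To illustrate which job reasons the auto_review label in the ticket subject is meant to catch, here is a minimal sketch. It assumes the pattern is applied as an unanchored regular expression search against the job's reason text. The function name and both sample reason strings are hypothetical and not taken from real jobs.

```python
import re

# Pattern copied verbatim from the auto_review label in the ticket subject.
AUTO_REVIEW = (
    r"cache failure: Failed to send asset request for SLE-Micro-.*"
    r"Cache service enqueue error 500: Internal Server Error"
)


def matches_ticket(reason: str) -> bool:
    """Return True if the reason text contains a match for the pattern.

    Assumption: matching is an unanchored search over the whole text.
    """
    return re.search(AUTO_REVIEW, reason, flags=re.DOTALL) is not None


# Hypothetical reason strings; the asset name is invented for illustration.
example_hit = (
    "cache failure: Failed to send asset request for "
    "SLE-Micro-EXAMPLE.qcow2: Cache service enqueue error 500: "
    "Internal Server Error"
)
example_miss = "cache failure: Failed to download SLE-Micro-EXAMPLE.qcow2"

print(matches_ticket(example_hit))
print(matches_ticket(example_miss))
```

Because the pattern names SLE-Micro explicitly, the same error 500 for an asset of a different product would not be labelled with this ticket under this sketch.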

Steps to reproduce

Find jobs referencing this ticket with the help of
https://raw.githubusercontent.com/os-autoinst/scripts/master/openqa-query-for-job-label
by calling openqa-query-for-job-label 159348

Acceptance criteria

AC1: No more references to this ticket from openqa-query-for-job-label

Suggestions

Find out if the issues are specific to the arch or product
Maybe related to recent changes with regard to git


Updated by nicksinger about 1 year ago · Edited

This ticket is about the error 500 in the cache service, right? Because repairing the instances will be done in #158170


Updated by okurz about 1 year ago

Subject changed from s390x kvm jobs incomplete with auto_review:"cache failure: Failed to send asset request for SLE-Micro-.*Cache service enqueue error 500: Internal Server Error" to s390x kvm jobs incomplete with auto_review:"cache failure: Failed to send asset request for SLE-Micro-.*Cache service enqueue error 500: Internal Server Error" size:M
Description updated (diff)
Status changed from New to Workable


Updated by okurz about 1 year ago

Priority changed from High to Normal


Updated by mkittler about 1 year ago

Status changed from Workable to In Progress
Assignee set to mkittler


Updated by mkittler about 1 year ago · Edited

Description updated (diff)

openqa-query-for-job-label 159348 only shows the job already mentioned in the ticket description, so I used the following query instead:

select id, t_finished, result, (select host from workers where workers.id = jobs.assigned_worker_id) as host, reason from jobs where reason ilike '%Cache service enqueue error 500: Internal Server Error%' order by t_finished desc;

It is notable that all those jobs ran on worker40. The most recent job is 14103490 from 2024-04-20 23:44:39 and the oldest still relevant one is 14083560 from 2024-04-20 05:00:56. So the problem persisted for many hours and was possibly only resolved by the next reboot on 2024-04-21 03:34. Unfortunately, logs from that timeframe are gone, so I can't tell what was going on. The Minion dashboard also no longer shows any relevant jobs (although the problem was probably not with the job execution anyway but with the Minion web application).
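
The shape of that query can be tried out locally. The sketch below runs it against an in-memory SQLite database rather than the real openQA PostgreSQL database, so it is only an approximation: the two-table schema is heavily reduced, the worker ids and the third job row are invented, the reason text is shortened, and `ilike` is replaced by SQLite's `LIKE`, which is case-insensitive for ASCII by default. Only the two job ids and their finish times come from the comment above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Heavily reduced, hypothetical schema: only the columns the query touches.
conn.executescript(
    """
    CREATE TABLE workers (id INTEGER PRIMARY KEY, host TEXT);
    CREATE TABLE jobs (
        id INTEGER PRIMARY KEY,
        t_finished TEXT,
        result TEXT,
        assigned_worker_id INTEGER,
        reason TEXT
    );
    """
)

# Worker ids are invented for this sketch.
conn.executemany(
    "INSERT INTO workers VALUES (?, ?)",
    [(1, "worker40"), (2, "otherhost")],
)

# Shortened reason text; the real reasons also name the requested asset.
reason = "cache failure: Cache service enqueue error 500: Internal Server Error"
conn.executemany(
    "INSERT INTO jobs VALUES (?, ?, ?, ?, ?)",
    [
        (14083560, "2024-04-20 05:00:56", "incomplete", 1, reason),
        (14103490, "2024-04-20 23:44:39", "incomplete", 1, reason),
        # Invented unrelated job that must not show up in the result.
        (1, "2024-04-20 12:00:00", "passed", 2, None),
    ],
)

rows = conn.execute(
    """
    SELECT id, t_finished, result,
           (SELECT host FROM workers
             WHERE workers.id = jobs.assigned_worker_id) AS host,
           reason
      FROM jobs
     WHERE reason LIKE '%Cache service enqueue error 500: Internal Server Error%'
     ORDER BY t_finished DESC
    """
).fetchall()

for job_id, t_finished, result, host, _ in rows:
    print(job_id, t_finished, result, host)
```

Searching the reason column directly, as in the comment, finds affected jobs regardless of whether they were labelled with the ticket, which is why it returns more than openqa-query-for-job-label did.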


Updated by mkittler about 1 year ago

Status changed from In Progress to Resolved

Considering the job history looks good on https://openqa.suse.de/admin/workers/3090 and AC1 is fulfilled, I'm resolving this ticket. If this happens again, we have to be a bit faster (or at least add relevant logs when creating the ticket).
