action #125276

closed

coordination #103944: [saga][epic] Scale up: More robust handling of diverse infrastructure with varying performance

coordination #98463: [epic] Avoid too slow asset downloads leading to jobs exceeding the timeout with or run into auto_review:"(timeout: setup exceeded MAX_SETUP_TIME|Cache service queue already full)":retry

Ensure that the incomplete jobs with "cache service full" are properly restarted size:M

Added by okurz about 1 year ago. Updated about 1 year ago.

Status: Resolved
Priority: High
Assignee:
Category: Regressions/Crashes
Target version:
Start date: 2023-03-02
Due date:
% Done: 0%
Estimated time:

Description

Motivation

We have received reports that incomplete jobs with "cache service full" have to be restarted manually, which causes considerable extra work for the people affected.

Acceptance criteria

  • AC1: Users don't see incomplete jobs with "cache service full" on their build status page, e.g. because such jobs are automatically restarted

Acceptance tests

  • AC1: Redo queries from #125276#note-3 to see that all "cache service full" jobs have been cloned and none had to be cloned by users

Suggestion

  • Make errors from the automatic invocation of auto_duplicate visible somewhere, e.g. by doing the restart within a Minion job (which might then fail and carry the error message)
  • Fix the actual restart issue once it has been pinned down
  • Make sure the new Minion jobs don't fail later in production once this problem has been resolved
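
The first suggestion could be sketched roughly as follows. This is not openQA or Minion code (Minion is a Perl job queue); it is a minimal Python illustration of the idea that a restart run as a background job leaves a failed job record with the error message instead of failing silently. All names here (`BackgroundJob`, `JobQueue`, `restart_incomplete_job`, the `clonable` flag) are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class BackgroundJob:
    """Hypothetical stand-in for a background job record (not the Minion API)."""
    task: str
    args: tuple
    state: str = "inactive"       # inactive -> finished | failed
    result: Optional[str] = None  # return value or error message


@dataclass
class JobQueue:
    """Tiny in-memory queue that keeps failed jobs around for inspection."""
    tasks: Dict[str, Callable[..., str]] = field(default_factory=dict)
    jobs: List[BackgroundJob] = field(default_factory=list)

    def enqueue(self, task: str, *args) -> BackgroundJob:
        job = BackgroundJob(task, args)
        self.jobs.append(job)
        return job

    def perform_jobs(self) -> None:
        for job in self.jobs:
            if job.state != "inactive":
                continue
            try:
                job.result = self.tasks[job.task](*job.args)
                job.state = "finished"
            except Exception as exc:
                # The error is stored on the job instead of being lost.
                job.state = "failed"
                job.result = f"{type(exc).__name__}: {exc}"

    def failed(self) -> List[BackgroundJob]:
        return [job for job in self.jobs if job.state == "failed"]


def restart_incomplete_job(job_id: int, clonable: bool) -> str:
    """Hypothetical restart; raises when the job cannot be duplicated."""
    if not clonable:
        raise RuntimeError(f"unable to duplicate job {job_id}")
    return f"job {job_id} restarted"


if __name__ == "__main__":
    queue = JobQueue(tasks={"restart_job": restart_incomplete_job})
    queue.enqueue("restart_job", 1, True)
    queue.enqueue("restart_job", 2, False)
    queue.perform_jobs()
    for job in queue.failed():
        print(job.task, job.args, job.result)
```

With this shape, a restart that cannot be performed shows up as a failed background job carrying its error message, which is the visibility the suggestion asks for. The third suggestion then amounts to checking that the list of failed jobs stays empty once the underlying problem is fixed.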