action #157726

closed

openQA Project (public) - coordination #110833: [saga][epic] Scale up: openQA can handle a schedule of 100k jobs with 1k worker instances

openQA Project (public) - coordination #108209: [epic] Reduce load on OSD

osd-deployment | Failed pipeline for master (worker3[6-9].oqa.prg2.suse.org)

Added by livdywan 9 months ago. Updated 2 months ago.

Status: Resolved
Priority: Normal
Assignee:
Category: Regressions/Crashes
Start date: 2024-03-18
Due date:
% Done: 0%
Estimated time:
Description

Observation

https://gitlab.suse.de/openqa/osd-deployment/-/jobs/2415705

worker37.oqa.prg2.suse.org:
    Minion did not return. [Not connected]
worker36.oqa.prg2.suse.org:
    Minion did not return. [Not connected]
worker38.oqa.prg2.suse.org:
    Minion did not return. [Not connected]
worker39.oqa.prg2.suse.org:
    Minion did not return. [Not connected]
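
To turn output like the above into a list of hosts to check, a minimal sketch could look as follows. It assumes the plain-text layout quoted in this ticket (a host name ending in a colon, followed by an indented result line); the function name `unreachable_minions` is hypothetical and not part of salt or the osd-deployment pipeline.

```python
# Sample copied from the failed job output quoted in this ticket.
SALT_OUTPUT = """\
worker37.oqa.prg2.suse.org:
    Minion did not return. [Not connected]
worker36.oqa.prg2.suse.org:
    Minion did not return. [Not connected]
worker38.oqa.prg2.suse.org:
    Minion did not return. [Not connected]
worker39.oqa.prg2.suse.org:
    Minion did not return. [Not connected]
"""


def unreachable_minions(output):
    """Return the sorted host names whose result says the minion did not return.

    Hypothetical helper: assumes each host appears on an unindented line
    ending in ':' and its result on the indented line(s) that follow.
    """
    hosts = []
    current = None
    for line in output.splitlines():
        if line and not line[0].isspace() and line.rstrip().endswith(":"):
            # Unindented "host:" line starts a new result block.
            current = line.rstrip()[:-1]
        elif current and "Minion did not return" in line:
            hosts.append(current)
            current = None
    return sorted(hosts)


if __name__ == "__main__":
    for host in unreachable_minions(SALT_OUTPUT):
        print(host)
```

Run against the output above, this yields the four affected hosts, worker36 through worker39.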

Acceptance criteria

  • AC1: osd-deployment passes again
  • AC2: All w37-w39 run OSD production jobs

Suggestions

Rollback steps


Related issues 4 (2 open, 2 closed)

Related to openQA Infrastructure (public) - action #157666: OSD unresponsive and then not starting any more jobs on 2024-03-21 | Resolved | okurz | 2024-03-12
Related to openQA Project (public) - coordination #157669: websockets+scheduler improvements to support more online worker instances | New | 2023-08-31
Related to openQA Infrastructure (public) - action #166802: Recover worker37, worker38, worker39 size:S | Blocked | okurz
Related to openQA Infrastructure (public) - action #139103: Long OSD ppc64le job queue - Decrease number of x86_64 worker slots on osd to give ppc64le jobs a better chance to be assigned jobs size:M | Resolved | okurz | 2023-11-04