action #167057

closed

coordination #167054: [epic] Run more workloads in CC-compliant PRG2 to be less affected by CC related network changes

Run more standard, qemu OSD openQA jobs in CC-compliant PRG2 and none in NUE2 size:S

Added by okurz 2 months ago. Updated about 2 months ago.

Status: Resolved
Priority: Normal
Assignee:
Category: Feature requests
Target version:
Start date: 2024-09-19
Due date:
% Done: 0%
Estimated time:
Tags:

Description

Motivation

Non-compliant NUE2-based OSD workers might become problematic due to #165282, but we cannot simply connect more PRG2 OSD workers because that would overload the web UI (see #166802). So we need to disable some worker slots in NUE2.

Acceptance criteria

  • AC1: No standard, qemu OSD jobs are executed in NUE2 anymore
  • AC2: All jobs commonly scheduled on OSD are still executed

Suggestions


Related issues 2 (2 open, 0 closed)

Related to openQA Infrastructure - action #166802: Recover worker37, worker38, worker39 size:S (Blocked, okurz)

Copied to openQA Infrastructure - action #168177: Migrate critical VM based services needing access to CC-services to CC areas size:M (Blocked, mkittler, 2024-09-19)

#1

Updated by okurz 2 months ago

  • Related to action #166802: Recover worker37, worker38, worker39 size:S added

#2

Updated by okurz about 2 months ago

  • Subject changed from Run more standard qemu OSD openQA jobs in CC-compliant PRG2 and none in NUE2 to Run more standard, qemu OSD openQA jobs in CC-compliant PRG2 and none in NUE2 size:S
  • Description updated (diff)
  • Status changed from New to Workable

#3

Updated by okurz about 2 months ago

  • Assignee set to okurz

#5

Updated by okurz about 2 months ago

  • Status changed from Feedback to In Progress

https://gitlab.suse.de/openqa/salt-pillars-openqa/-/merge_requests/901 was merged and is effective. I then pulled two more commits out into https://gitlab.suse.de/openqa/salt-pillars-openqa/-/merge_requests/903 and merged that first. Then I merged https://gitlab.suse.de/openqa/salt-pillars-openqa/-/merge_requests/902 on top. After that was deployed, I took the following machines out of production, powered them off and updated racktables accordingly.

For openqaworker-arm-1, I additionally muted the corresponding notification policy in https://monitor.qa.suse.de/alerting/routes with "All times" and noted that in racktables.

#7

Updated by okurz about 2 months ago

The long-standing openqaworker-arm-1 alert was still complaining about "no data". To stop receiving notifications, I added a nested notification policy with no contact point and an all-time mute. However, the corresponding alert(s) still show up on https://monitor.qa.suse.de/alerting/list?search=health:nodata, but I would prefer not to delete the alert definitions completely.

Actions #8

Updated by okurz about 2 months ago

  • Due date deleted (2024-10-09)
  • Status changed from Feedback to Resolved

https://gitlab.suse.de/openqa/salt-pillars-openqa/-/merge_requests/906 was merged and deployed. I took sapworker2 and sapworker3 out of production, powered them off, marked them accordingly in racktables and verified that the machines are actually off. https://openqa.suse.de/admin/workers currently shows 947 connected worker instances.

#9

Updated by okurz about 1 month ago

  • Copied to action #168177: Migrate critical VM based services needing access to CC-services to CC areas size:M added