action #178204

open

Reduce test start time on openqa.suse.de

Added by gpuliti about 17 hours ago. Updated 17 minutes ago.

Status: In Progress
Priority: High
Assignee:
Category: Regressions/Crashes
Start date:
Due date:
% Done: 0%
Estimated time:

Description

Observation

https://monitor.qa.suse.de/d/7W06NBWGk/job-age?orgId=1&from=2025-03-03T02:45:29.209Z&to=2025-03-03T06:58:26.736Z&timezone=UTC

Relevant panel: https://monitor.qa.suse.de/d/7W06NBWGk/job-age?viewPanel=panel-5&orgId=1&from=2025-03-01T19%3A35%3A43.674Z&to=2025-03-04T06%3A19%3A31.656Z&timezone=utc

Based on observations there are recurring alerts indicating long wait times before execution.

gpuliti preferred not to silence the alert since it has not been that common yet, at least in the last week, but we should try to optimize test scheduling to reduce waiting times.

The main offender seems to be jobs with a worker class config that can never be picked up, as there are no workers for "qemu_x86_64,intel,tap". These jobs were scheduled by "QE Security".

Suggestions

  • are there any bottlenecks?
  • #73174

Rollback actions


Related issues 1 (0 open, 1 closed)

Related to openQA Infrastructure (public) - action #73174: [osd][alert] Job age (scheduled) (median) alert | Resolved | okurz | 2020-10-09

Actions #1

Updated by gpuliti about 17 hours ago

  • Copied from action #174235: Cover code of os-autoinst path script/os-autoinst-openvswitch fully (statement coverage) size:S added
Actions #2

Updated by gpuliti about 17 hours ago

  • Copied from deleted (action #174235: Cover code of os-autoinst path script/os-autoinst-openvswitch fully (statement coverage) size:S)
Actions #3

Updated by okurz about 15 hours ago

  • Tags set to osd, infra, administration, openqa, tests
  • Project changed from openQA Project (public) to openQA Infrastructure (public)
  • Category changed from Regressions/Crashes to Regressions/Crashes
  • Priority changed from Normal to Urgent

Made urgent as this is related to a recent alert that is not silenced and no mitigation has been applied yet.

Actions #4

Updated by mkittler about 2 hours ago

I mentioned the problematic old jobs on #eng-testing:

There are jobs scheduled on OSD with the worker class qemu_x86_64,intel,tap. Those cannot be scheduled because the combination intel,tap doesn't exist at the moment. I suppose qesapworker-prgX workers would in theory provide that but the tap worker class is disabled there as tap_secondary. Not sure what the best solution is.

There is also an s390-kvm,tap job, which is another combination that doesn't exist.
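
For illustration, here is a minimal sketch of how such never-schedulable jobs could be detected. It assumes a job can only be picked up when a single worker offers every class listed in the job's WORKER_CLASS. The worker and job names and the sample data are hypothetical and not taken from OSD, and the helper functions are not part of openQA.

```python
from __future__ import annotations


def parse_worker_class(value: str) -> frozenset[str]:
    """Split a comma-separated WORKER_CLASS string into a set of class names."""
    return frozenset(part.strip() for part in value.split(",") if part.strip())


def unschedulable_jobs(jobs: dict[str, str], workers: dict[str, str]) -> list[str]:
    """Return the jobs whose required classes are not all offered by any single worker.

    Assumption: a job matches a worker only if the job's classes are a
    subset of the classes offered by that worker.
    """
    offered = [parse_worker_class(value) for value in workers.values()]
    return [
        name
        for name, required in jobs.items()
        if not any(parse_worker_class(required) <= classes for classes in offered)
    ]


# Hypothetical sample data mirroring the situation described above:
# no worker offers "intel" together with "tap", and none offers "s390-kvm".
workers = {
    "worker-a": "qemu_x86_64,tap",
    "worker-b": "qemu_x86_64,intel,tap_secondary",
}
jobs = {
    "job-1": "qemu_x86_64,intel,tap",
    "job-2": "qemu_x86_64,tap",
    "job-3": "s390-kvm,tap",
}

print(unschedulable_jobs(jobs, workers))  # prints ['job-1', 'job-3']
```

Running a check like this against the real list of scheduled jobs and registered workers would show which jobs keep inflating the job age metric without ever having a chance to start.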

Actions #5

Updated by mkittler about 2 hours ago

  • Description updated (diff)
Actions #6

Updated by okurz 27 minutes ago

  • Status changed from New to In Progress
  • Assignee set to okurz
Actions #7

Updated by okurz 17 minutes ago

  • Description updated (diff)
  • Priority changed from Urgent to High
Actions #8

Updated by okurz 6 minutes ago

  • Related to action #73174: [osd][alert] Job age (scheduled) (median) alert added