action #178204
Reduce test start time on openqa.suse.de
0%
Description
Observation
Based on observations there are recurring alerts indicating long wait times before execution.
gpuliti preferred not to silence the alert since it has not been that common yet, at least in the last week, but we should try to optimize test scheduling to reduce waiting times.
The main offender seems to be jobs with a worker class configuration that can never be picked up, as there are no workers for "qemu_x86_64,intel,tap". These jobs are scheduled by "QE Security".
Suggestions
- are there any bottlenecks?
- #73174
Rollback actions
- Remove silence from https://monitor.qa.suse.de/alerting/silences?alertmanager=grafana
alertname=Job age (scheduled) (median) alert
Updated by gpuliti about 17 hours ago
- Copied from action #174235: Cover code of os-autoinst path script/os-autoinst-openvswitch fully (statement coverage) size:S added
Updated by gpuliti about 17 hours ago
- Copied from deleted (action #174235: Cover code of os-autoinst path script/os-autoinst-openvswitch fully (statement coverage) size:S)
Updated by okurz about 15 hours ago
- Tags set to osd, infra, administration, openqa, tests
- Project changed from openQA Project (public) to openQA Infrastructure (public)
- Category changed from Regressions/Crashes to Regressions/Crashes
- Priority changed from Normal to Urgent
Made urgent as this is related to a recent alert that is not silenced and has no mitigation applied yet
Updated by mkittler about 2 hours ago
I mentioned the problematic old jobs on #eng-testing:
There are jobs scheduled on OSD with the worker class qemu_x86_64,intel,tap. Those cannot be scheduled because the combination intel,tap doesn't exist at the moment. I suppose qesapworker-prgX workers would in theory provide that but the tap worker class is disabled there as tap_secondary. Not sure what the best solution is.
There is also an s390-kvm,tap job, which is another combination that doesn't exist.
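To illustrate why such jobs stay scheduled forever, here is a minimal Python sketch. It assumes a job can only be picked up by a worker that provides every class listed in the job's worker class setting. The function name, worker names and job IDs are made up for illustration and are not taken from OSD or from the openQA code base.

```python
def unschedulable(required_by_job, worker_classes):
    """Return the IDs of jobs whose required classes no single worker fully provides.

    required_by_job: job ID -> comma-separated worker class string
    worker_classes:  worker name -> set of classes that worker offers
    """
    stuck = []
    for job_id, required in required_by_job.items():
        # Split "a,b,c" into a set, ignoring stray whitespace and empty parts.
        needed = {c.strip() for c in required.split(",") if c.strip()}
        # A job is schedulable only if one worker offers all needed classes.
        if not any(needed <= provided for provided in worker_classes.values()):
            stuck.append(job_id)
    return stuck


# Hypothetical data mirroring the situation described above.
workers = {
    "worker-a:1": {"qemu_x86_64", "tap"},
    "worker-b:1": {"qemu_x86_64", "intel", "tap_secondary"},
    "worker-c:1": {"s390-kvm"},
}
jobs = {
    1: "qemu_x86_64,intel,tap",
    2: "s390-kvm,tap",
    3: "qemu_x86_64,tap",
}

print(unschedulable(jobs, workers))  # [1, 2]
```

In this made-up data set, jobs 1 and 2 are reported because no single worker offers all of their classes, while job 3 matches worker-a. A check along these lines could flag impossible combinations before they inflate the median job age.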
Updated by okurz 27 minutes ago
- Status changed from New to In Progress
- Assignee set to okurz
Updated by okurz 17 minutes ago
- Description updated (diff)
- Priority changed from Urgent to High
Updated by okurz 6 minutes ago
- Related to action #73174: [osd][alert] Job age (scheduled) (median) alert added