action #105804
closed
Job age (scheduled) (median) alert size:S
Description
Scheduled jobs are piling up:
https://stats.openqa-monitor.qa.suse.de/d/7W06NBWGk/job-age?from=now-7d&to=now
The oldest jobs all seem to be ppc64le + ppc64le-2g
Acceptance criteria
- AC1: Alerts for Job age (scheduled) (max) and Job age (scheduled) (median) are active
- AC2: All worker slots for ppc64le are active
Updated by okurz almost 3 years ago
- Related to coordination #102882: [epic] All OSD PPC64LE workers except malbec appear to have horribly broken cache service added
Updated by mkittler almost 3 years ago
- Status changed from New to Blocked
- Assignee set to mkittler
- Priority changed from Urgent to High
Since I've already checked the performance of the power workers today, I'll track this as blocked. I'm also lowering the priority because we have had this issue for quite a while now and the figures from the performance test don't suggest it has gotten worse. Considering the alert turned off again after 2 hours, I assume the impact is still not that high.
Updated by tinita almost 3 years ago
Although this ticket is assigned to mkittler, okurz asked me directly to pause the alerts, so I did that now and set both Job age (scheduled) (max) and Job age (scheduled) (median) to paused.
Please remember this when resolving the ticket; I will also try to.
Updated by mkittler almost 3 years ago
Due to #102882#note-65 I dared to resume the alerts.
Updated by livdywan almost 3 years ago
- Subject changed from Job age (scheduled) (median) alert to Job age (scheduled) (median) alert size:S
- Description updated (diff)
Updated by mkittler almost 3 years ago
- Status changed from Blocked to Resolved
It looks good so far, so I'm resolving the ticket.
Updated by okurz over 1 year ago
- Related to action #135008: Max job age graphs use mean aggregation when max would make more sense added