action #162521
open coordination #162524: [epic] Optimized o3 infrastructure
Reconsider the global job limit on o3, try higher than 170
Description
Motivation
In #151807-10 the global job limit on o3 was set to 170. The previous limit wasn't mentioned, so I don't know what it was, but I assume it was much higher. 170 is rather low, especially considering that we have so many worker instances available. #151807 saw multiple changes, so I assume we can actually use a much higher job limit again.
Acceptance criteria
- AC1: The global job limit on o3 is significantly higher than 170, or blocking improvement tasks are planned
Suggestions
- Understand why the original limit of 170 jobs was chosen
- Carefully increase the job limit and monitor over at least 10 days
- Try to find a hard upper limit and select a job limit below that with a sane buffer
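The last two suggestions could be sketched roughly as follows. This is only an illustration under assumptions: the hard upper limit of 300, the 20% buffer and the step size of 30 are made-up example values, not measurements from o3, and the function names are hypothetical.

```python
def pick_job_limit(hard_upper_limit: int, buffer_percent: int = 20) -> int:
    """Return a job limit below the observed hard upper limit.

    hard_upper_limit is whatever value monitoring shows o3 can no longer
    handle; buffer_percent is the safety margin (both are assumptions).
    """
    if hard_upper_limit <= 0:
        raise ValueError("hard_upper_limit must be positive")
    if not 0 <= buffer_percent < 100:
        raise ValueError("buffer_percent must be in [0, 100)")
    # Integer arithmetic avoids floating-point rounding surprises.
    return hard_upper_limit * (100 - buffer_percent) // 100


def ramp_schedule(current: int, target: int, step: int) -> list[int]:
    """List the intermediate limits to try, ending exactly at target.

    Each value would be kept for the monitoring period (at least 10 days,
    per the suggestion above) before moving on to the next one.
    """
    if step <= 0:
        raise ValueError("step must be positive")
    if target <= current:
        raise ValueError("target must be higher than the current limit")
    limits = list(range(current + step, target, step))
    limits.append(target)
    return limits


if __name__ == "__main__":
    target = pick_job_limit(300, buffer_percent=20)
    print(target)
    print(ramp_schedule(170, target, step=30))
```

Raising the limit in steps rather than in one jump keeps each monitoring window attributable to a single change, which should make it easier to tell which limit first causes load problems.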
Updated by okurz 8 months ago
- Related to action #151807: [alert] o3 zabbix: Problem: /var/lib/snapshot-changes: Disk space is critically low (used > 94%) size:M added
Updated by tinita 8 months ago · Edited
https://progress.opensuse.org/issues/151807#note-10 says:
Also I set back max_running_jobs to 170. We lowered it to make sure the load is not too high so the cleanup job can finally finish.
That means "I set it back up to 170 from a lower limit".
170 is the limit we settled on; higher limits lead to more load and problems.
Also see https://progress.opensuse.org/issues/138545#note-20
Updated by tinita 8 months ago
- Related to action #138545: Munin - minion hook failed - opensuse.org :: openqa.opensuse.org size:S added
Updated by tinita 22 days ago
- Related to action #175503: Increase job limit on o3 added