action #158104
openQA Project (public) - coordination #110833: [saga][epic] Scale up: openQA can handle a schedule of 100k jobs with 1k worker instances
openQA Project (public) - coordination #158110: [epic] Prevent worker overload
typing issue on ppc64 worker size:S
Description
Observation
openQA test in scenario sle-15-SP6-Online-ppc64le-ha_beta_supportserver@ppc64le-2g fails in setup:
https://openqa.suse.de/tests/13885455#step/setup/84 (see attachment p1.png)
https://openqa.suse.de/tests/13885471#step/setup/30 (see attachment p2.png): the "$" before "?" was not typed.
https://openqa.suse.de/tests/13885404#step/setup/12 (see attachment p3.png)
https://openqa.suse.de/tests/13885407#step/setup/9 (see attachment p4.png)
I think this may be related to the high load on the underlying ppc64 worker.
All of these jobs ran on "mania".
Test suite description
The base test suite is used for job templates defined in YAML documents. It has no settings of its own.
Reproducible
Fails since (at least) Build 73.1 (current job)
Expected result
Last good: 67.1 (or more recent)
Suggestions
- Identify the affected machines and workers and apply mitigations, e.g. reducing CPU load, to prevent recurring typing issues
- Restart related failed jobs
- Identify follow-up tasks
- Reduce the number of worker instances as a first mitigation measure. https://gitlab.suse.de/openqa/salt-pillars-openqa/-/merge_requests/759 (merged)
- Make the alert for CPU load more strict - #158113
- Evaluate the impact of video encoding, in particular on ppc64le; ffmpeg on Power8 KVM may be inefficient - #158116
- Check existing ffmpeg processes on mania which take a lot of CPU time - #158116
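The CPU-load checks suggested above could be prototyped locally with a small script like the following. This is only a sketch: it assumes a Unix host (for `os.getloadavg()`) and uses a hypothetical per-CPU threshold `MAX_LOAD_PER_CPU`; it is not the alert implemented in #158113.

```python
import os

# Hypothetical threshold: 1-minute load average per CPU above which the
# worker host is treated as overloaded. Not taken from #158113.
MAX_LOAD_PER_CPU = 1.0


def load_per_cpu(load: float, cpus: int) -> float:
    """Return the load average normalised by the number of CPUs."""
    if cpus <= 0:
        raise ValueError("cpus must be positive")
    return load / cpus


def is_overloaded(load: float, cpus: int, limit: float = MAX_LOAD_PER_CPU) -> bool:
    """Return True if the normalised load exceeds the given limit."""
    return load_per_cpu(load, cpus) > limit


if __name__ == "__main__":
    one_minute, _, _ = os.getloadavg()  # Unix only
    cpus = os.cpu_count() or 1
    print(f"load/CPU: {load_per_cpu(one_minute, cpus):.2f}, "
          f"overloaded: {is_overloaded(one_minute, cpus)}")
```

Normalising by the CPU count keeps one threshold usable across worker hosts of different sizes.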
Out of scope
Further details
Always latest result in this scenario: latest