action #158104

closed

openQA Project (public) - coordination #110833: [saga][epic] Scale up: openQA can handle a schedule of 100k jobs with 1k worker instances

openQA Project (public) - coordination #158110: [epic] Prevent worker overload

typing issue on ppc64 worker size:S

Added by zcjia 9 months ago. Updated 7 months ago.

Status:
Resolved
Priority:
High
Assignee:
Category:
Regressions/Crashes
Start date:
2024-03-27
Due date:
% Done:

0%

Estimated time:

Description

Observation

openQA test in scenario sle-15-SP6-Online-ppc64le-ha_beta_supportserver@ppc64le-2g fails in setup

https://openqa.suse.de/tests/13885455#step/setup/84 (see attachment p1.png)

https://openqa.suse.de/tests/13885471#step/setup/30 (see attachment p2.png). The typed text is missing the "$" before "?".

https://openqa.suse.de/tests/13885404#step/setup/12 (see attachment p3.png)

https://openqa.suse.de/tests/13885407#step/setup/9 (see attachment p4.png)

I think this may be related to the high workload of the underlying ppc64 worker.

All of these jobs ran on "mania".

Test suite description

The base test suite is used for job templates defined in YAML documents. It has no settings of its own.

Reproducible

Fails since (at least) Build 73.1 (current job)

Expected result

Last good: 67.1 (or more recent)

Suggestions

  • Identify the affected machines and workers, apply mitigations to prevent recurring typing issues, e.g. reducing CPU load
  • Restart related failed jobs
  • Identify follow-up tasks
  • Reduce the number of worker instances as a first mitigation measure. https://gitlab.suse.de/openqa/salt-pillars-openqa/-/merge_requests/759 (merged)
  • Make the alert for CPU load more strict - #158113
  • Evaluate the impact of video encoding, in particular on ppc64le; ffmpeg on Power8 KVM may be inefficient - #158116
  • Check existing ffmpeg processes on mania which take a lot of CPU time - #158116
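
The load-based mitigation suggested above, and the related follow-up #158125 (only pick up new jobs if the CPU load is below a configured threshold), can be illustrated with a minimal sketch. This is not the openQA worker implementation; the function names, the per-CPU threshold and its default value are hypothetical and only show the general idea of gating new jobs on the system load average.

```python
import os


def load_below_threshold(threshold, loadavg=None):
    """Return True if the 1-minute load average is below `threshold`.

    `loadavg` is a (1min, 5min, 15min) tuple; if omitted it is read from
    the operating system via os.getloadavg() (Unix only).
    """
    if loadavg is None:
        loadavg = os.getloadavg()
    return loadavg[0] < threshold


def should_accept_job(threshold_per_cpu=1.0, cpu_count=None, loadavg=None):
    """Hypothetical gate: accept a new job only if the host is not overloaded.

    The absolute threshold is scaled by the number of CPUs so that the same
    setting can be used on hosts of different sizes (an assumption made for
    this sketch, not a documented openQA behaviour).
    """
    cpus = cpu_count or os.cpu_count() or 1
    return load_below_threshold(threshold_per_cpu * cpus, loadavg)


if __name__ == "__main__":
    # Print whether this host would currently accept a job under the sketch.
    print(should_accept_job())
```

Scaling by CPU count is one possible design choice; a fixed absolute threshold per host would work just as well if the value is tuned per machine.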

Out of scope

Further details

Always latest result in this scenario: latest


Files

p2.png (53.3 KB) zcjia, 2024-03-27 06:52
p3.png (33.5 KB) zcjia, 2024-03-27 06:56
p4.png (31 KB) zcjia, 2024-03-27 06:57
p5.png (58.9 KB) zcjia, 2024-03-27 07:04
p6.png (28.7 KB) zcjia, 2024-03-27 07:07
p7.png (28.8 KB) zcjia, 2024-03-27 07:09
Screenshot from 2024-03-28 14-37-54.png (151 KB) llzhao, 2024-03-28 06:38
Screenshot from 2024-03-28 14-37-43.png (109 KB) llzhao, 2024-03-28 06:38

Related issues 4 (2 open, 2 closed)

Related to openQA Infrastructure (public) - action #157636: remove NOVIDEO=1 from ppc64le workers (New, zcjia, 2024-03-21)

Copied to openQA Infrastructure (public) - action #158113: typing issue on ppc64 worker - make CPU load alert more strict size:M (Resolved, okurz, 2024-03-27)

Copied to openQA Infrastructure (public) - action #158116: typing issue on ppc64 worker - crosscheck performance impact of ffmpeg on ppc64le (Power8 kvm) size:M (Workable, 2024-03-27)

Copied to openQA Project (public) - action #158125: typing issue on ppc64 worker - only pick up (or start) new jobs if CPU load is below configured threshold size:M (Resolved, mkittler)
