action #158116

open

openQA Project (public) - coordination #110833: [saga][epic] Scale up: openQA can handle a schedule of 100k jobs with 1k worker instances

openQA Project (public) - coordination #158110: [epic] Prevent worker overload

typing issue on ppc64 worker - crosscheck performance impact of ffmpeg on ppc64le (Power8 kvm) size:M

Added by okurz 9 months ago. Updated 8 months ago.

Status: Workable
Priority: Normal
Assignee: -
Category: Regressions/Crashes
Target version: QA (public, currently private due to #173521) - Tools - Next
Start date: 2024-03-27
Due date:
% Done: 0%
Estimated time:

Description

Motivation

In #158104, system overload was found on ppc64le machines, likely triggered by #157636. A snapshot of the current process list from htop looks like this:

   PID USER       PRI  NI  VIRT   RES   SHR S    DISK R/W  CPU% MEM%   TIME+ ▽Command
  1541 root        20   0  320M  194M  182M S    0.00 B/s   0.0  0.0  2h29:59 /usr/lib/systemd/systemd-j
 96369 root        20   0  623M 98880 14336 S    0.00 B/s   0.0  0.0 54:05.86 /usr/bin/python3 /usr/bin/
     1 root        20   0  178M 25024 11776 S    0.00 B/s   0.0  0.0 48:46.08 /usr/lib/systemd/systemd n
  2000 root        20   0  9728  6208  2176 S    0.00 B/s   0.0  0.0 40:44.69 /usr/sbin/haveged -w 1024
157105 _openqa-wo  20   0  427M  189M 23808 R    0.00 B/s  68.4  0.0 32:22.39 ffmpeg -y -hide_banner -no
157062 _openqa-wo  20   0  427M  193M 23808 R    0.00 B/s  42.1  0.0 32:07.83 ffmpeg -y -hide_banner -no
157107 _openqa-wo  20   0  427M  189M 23808 R    0.00 B/s  68.4  0.0 30:29.03 ffmpeg -y -hide_banner -no
157063 _openqa-wo  20   0  427M  193M 23808 R    0.00 B/s   5.3  0.0 29:30.58 ffmpeg -y -hide_banner -no
  6267 _openqa-wo  20   0  427M  193M 23808 R    0.00 B/s  63.2  0.0 25:54.22 ffmpeg -y -hide_banner -no
157108 _openqa-wo  20   0  427M  189M 23808 R    0.00 B/s  63.2  0.0 25:03.79 ffmpeg -y -hide_banner -no
157064 _openqa-wo  20   0  427M  193M 23808 R    0.00 B/s   2.6  0.0 23:50.53 ffmpeg -y -hide_banner -no
156485 _openqa-wo  20   0  427M  189M 23808 R    0.00 B/s  34.2  0.0 22:18.78 ffmpeg -y -hide_banner -no
  6268 _openqa-wo  20   0  427M  193M 23808 R    0.00 B/s  57.9  0.0 21:48.92 ffmpeg -y -hide_banner -no
156601 _openqa-wo  20   0  427M  193M 23808 R    0.00 B/s  10.5  0.0 20:19.58 ffmpeg -y -hide_banner -no
  6269 _openqa-wo  20   0  427M  193M 23808 R    0.00 B/s  55.3  0.0 16:33.02 ffmpeg -y -hide_banner -no
  5898 _openqa-wo  20   0  427M  193M 23808 R    0.00 B/s  86.8  0.0 14:48.15 ffmpeg -y -hide_banner -no
 31080 _openqa-wo  20   0 5720M  758M 28416 R    0.00 B/s  57.9  0.1 12:58.63 /usr/bin/qemu-system-ppc64
 15778 _openqa-wo  20   0 6767M 1779M 28480 R    0.00 B/s  81.6  0.2 12:50.94 /usr/bin/qemu-system-ppc64
 15781 _openqa-wo  20   0 6767M 1779M 28480 S    0.00 B/s   0.0  0.2 10:13.25 /usr/bin/qemu-system-ppc64
156709 _openqa-wo  20   0 6762M 1766M 28288 S    0.00 B/s  13.2  0.2 10:08.67 /usr/bin/qemu-system-ppc64
 33559 _openqa-wo  20   0 6756M 1724M 28416 R    0.00 B/s  86.8  0.2 10:05.56 /usr/bin/qemu-system-ppc64
 35017 _openqa-wo  20   0 3946M  753M 28416 R    0.00 B/s  84.2  0.1  9:30.77 /usr/bin/qemu-system-ppc64
 24085 _openqa-wo  20   0 6901M 1781M 28480 S    0.00 B/s   0.0  0.2  9:13.94 /usr/bin/qemu-system-ppc64
 24092 _openqa-wo  20   0 6901M 1781M 28480 R    0.00 B/s  78.9  0.2  8:40.60 /usr/bin/qemu-system-ppc64
 28718 _openqa-wo  20   0 7135M 1787M 28480 S    0.00 B/s  50.0  0.2  8:17.91 /usr/bin/qemu-system-ppc64
 28720 _openqa-wo  20   0 7135M 1787M 28480 R    0.00 B/s  13.2  0.2  6:51.75 /usr/bin/qemu-system-ppc64
 39280 _openqa-wo  20   0 5712M  755M 28416 R    0.00 B/s  65.8  0.1  6:41.38 /usr/bin/qemu-system-ppc64
 39683 _openqa-wo  20   0 6731M 1549M 28416 R    0.00 B/s  65.8  0.2  6:24.06 /usr/bin/qemu-system-ppc64
  3699 root        20   0  3968  3200  2368 S    0.00 B/s   0.0  0.0  6:04.21 /sbin/agetty -o -p -- \u -
 34903 _openqa-wo  20   0 6334M 1483M 28416 R    0.00 B/s  50.0  0.2  5:29.90 /usr/bin/qemu-system-ppc64
 34902 _openqa-wo  20   0 6334M 1483M 28416 S    0.00 B/s   0.0  0.2  4:40.00 /usr/bin/qemu-system-ppc64
 38988 _openqa-wo  20   0 6790M 1376M 28480 R    0.00 B/s 107.9  0.2  3:52.33 /usr/bin/qemu-system-ppc64
 38599 _openqa-wo  20   0 8040M 4187M 28480 R    0.00 B/s  47.4  0.5  3:41.13 /usr/bin/qemu-system-ppc64
 45395 _openqa-wo  20   0 3732M  757M 28416 R    0.00 B/s  71.1  0.1  3:38.90 /usr/bin/qemu-system-ppc64
 38600 _openqa-wo  20   0 8040M 4187M 28480 S    0.00 B/s   0.0  0.5  3:18.94 /usr/bin/qemu-system-ppc64
 43853 _openqa-wo  20   0 5641M 1696M 28480 R    0.00 B/s  63.2  0.2  3:12.66 /usr/bin/qemu-system-ppc64
 38456 _openqa-wo  20   0 9087M 4195M 28480 R    0.00 B/s  78.9  0.5  3:08.68 /usr/bin/qemu-system-ppc64
 38986 _openqa-wo  20   0 6790M 1376M 28480 R    0.00 B/s  86.8  0.2  3:06.34 /usr/bin/qemu-system-ppc64

So ffmpeg shows significantly higher accumulated CPU time than the corresponding qemu processes. We should investigate whether ffmpeg has too high an impact on machine performance, whether it should run with a nice level to prevent typing issues, whether ffmpeg parameters can be tweaked, or whether ffmpeg should be avoided altogether on ppc64le.
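To put numbers on the comparison, the TIME+ column of the htop output can be summed per command. The following is only a minimal sketch: the helper names `parse_htop_time` and `cpu_time_by_command` are made up for illustration, and it assumes the column layout shown in the snapshot (the `DISK R/W` value occupies two whitespace-separated tokens). Rows may also be threads of the same process, so the totals can double count and should be read as a rough indication only.

```python
from collections import defaultdict

# A few rows copied from the htop snapshot in this ticket.
SAMPLE_ROWS = [
    "  1541 root        20   0  320M  194M  182M S    0.00 B/s   0.0  0.0  2h29:59 /usr/lib/systemd/systemd-j",
    "157105 _openqa-wo  20   0  427M  189M 23808 R    0.00 B/s  68.4  0.0 32:22.39 ffmpeg -y -hide_banner -no",
    "157062 _openqa-wo  20   0  427M  193M 23808 R    0.00 B/s  42.1  0.0 32:07.83 ffmpeg -y -hide_banner -no",
    " 31080 _openqa-wo  20   0 5720M  758M 28416 R    0.00 B/s  57.9  0.1 12:58.63 /usr/bin/qemu-system-ppc64",
]


def parse_htop_time(text):
    """Convert a TIME+ value such as '32:22.39' or '2h29:59' to seconds."""
    hours = 0
    if "h" in text:
        hour_part, text = text.split("h", 1)
        hours = int(hour_part)
    minutes, seconds = text.split(":")
    return hours * 3600 + int(minutes) * 60 + float(seconds)


def cpu_time_by_command(rows):
    """Sum accumulated CPU seconds per command basename."""
    totals = defaultdict(float)
    for row in rows:
        # 13 leading columns (DISK R/W counts as two), then the command.
        fields = row.split(None, 13)
        if len(fields) < 14 or not fields[0].isdigit():
            continue  # skip the header and anything that is not a process row
        command = fields[13].split()[0].rsplit("/", 1)[-1]
        totals[command] += parse_htop_time(fields[12])
    return dict(totals)


if __name__ == "__main__":
    for command, seconds in sorted(cpu_time_by_command(SAMPLE_ROWS).items()):
        print(f"{command}: {seconds:.2f} s")
```

Running the same script over a snapshot from an x86_64 worker would give a first, coarse comparison between the architectures.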

Acceptance criteria

  • AC1: openQA test video compression is ensured to not significantly impact system performance causing typing issues
  • AC2: openQA tests pass consistently without typing issues due to video encoding
  • AC3: openQA tests can still provide useful videos with exceptions (e.g. keep videos completely disabled as last resort)

Suggestions

  • Be aware that as of 2024-04-04 NOVIDEO=1 was again set for ppc64le openQA machine definitions, see #157636
  • Check if ffmpeg CPU usage as visible in the above htop output is considered expected or something unusual
  • Try and compare ffmpeg manually on x86_64 and ppc64le to see if ppc64le is maybe much less efficient
  • Consider introducing a nice level for calling ffmpeg in os-autoinst, although this might be counter-productive as the video encoder works on a queue and shouldn't be delayed; maybe in combination with bigger buffers or a bigger "pipe size"?
  • Crosscheck if ffmpeg can be tweaked, in particular for ppc64le qemu workers
  • We still have the alternative to not use the external ffmpeg encoder but use the internal OGV encoder
  • Decide if ffmpeg or even complete video encoding should be completely forbidden on ppc64le, see #157636
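For the manual comparison and the nice-level idea from the suggestions above, a small timing harness might help. This is a sketch under assumptions, not how os-autoinst invokes the encoder: `timed_run` is a hypothetical helper, and the command to benchmark is left to the caller because the full ffmpeg command line is truncated in the htop output. It runs a command with an optional nice increment and reports wall-clock time and the CPU time consumed by the child.

```python
import os
import resource
import subprocess
import sys
import time


def timed_run(cmd, niceness=0):
    """Run cmd with a nice increment; return (wall_seconds, cpu_seconds)."""
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.monotonic()
    subprocess.run(
        cmd,
        check=True,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        # Raise the niceness in the child just before it executes cmd.
        preexec_fn=lambda: os.nice(niceness),
    )
    wall = time.monotonic() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    cpu = (after.ru_utime - before.ru_utime) + (after.ru_stime - before.ru_stime)
    return wall, cpu


if __name__ == "__main__":
    # Placeholder workload; replace with the real encoder command to test.
    workload = [sys.executable, "-c", "sum(range(10**7))"]
    for niceness in (0, 10):
        wall, cpu = timed_run(workload, niceness)
        print(f"nice +{niceness}: wall {wall:.2f} s, cpu {cpu:.2f} s")
```

Running the same workload on an x86_64 and a ppc64le worker gives comparable CPU-time figures. On an idle machine the nice level should barely change the result; its effect only shows up when the machine is loaded, which is the situation described in this ticket.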

Out of scope

  • Actually enabling/disabling ffmpeg in production is handled as part of #157636

Related issues 1 (0 open, 1 closed)

Copied from openQA Infrastructure (public) - action #158104: typing issue on ppc64 worker size:S | Resolved | okurz | 2024-03-27

Actions #1

Updated by okurz 9 months ago

  • Copied from action #158104: typing issue on ppc64 worker size:S added
Actions #2

Updated by okurz 8 months ago

  • Subject changed from typing issue on ppc64 worker - crosscheck performance impact of ffmpeg on ppc64le (Power8 kvm) to typing issue on ppc64 worker - crosscheck performance impact of ffmpeg on ppc64le (Power8 kvm) size:M
  • Description updated (diff)
  • Status changed from New to Workable
Actions #3

Updated by okurz 8 months ago

  • Target version changed from Ready to Tools - Next