action #158116

Updated by okurz about 1 month ago

## Motivation 
 In #158104, system overload was found on ppc64le machines, likely triggered by #157636. A snapshot of the current process list from htop looks like this: 

 ``` 
    PID USER         PRI    NI    VIRT     RES     SHR S      DISK R/W    CPU% MEM%     TIME+ ▽Command 
   1541 root          20     0    320M    194M    182M S      0.00 B/s     0.0    0.0    2h29:59 /usr/lib/systemd/systemd-j 
  96369 root          20     0    623M 98880 14336 S      0.00 B/s     0.0    0.0 54:05.86 /usr/bin/python3 /usr/bin/ 
      1 root          20     0    178M 25024 11776 S      0.00 B/s     0.0    0.0 48:46.08 /usr/lib/systemd/systemd n 
   2000 root          20     0    9728    6208    2176 S      0.00 B/s     0.0    0.0 40:44.69 /usr/sbin/haveged -w 1024 
 157105 _openqa-wo    20     0    427M    189M 23808 R      0.00 B/s    68.4    0.0 32:22.39 ffmpeg -y -hide_banner -no 
 157062 _openqa-wo    20     0    427M    193M 23808 R      0.00 B/s    42.1    0.0 32:07.83 ffmpeg -y -hide_banner -no 
 157107 _openqa-wo    20     0    427M    189M 23808 R      0.00 B/s    68.4    0.0 30:29.03 ffmpeg -y -hide_banner -no 
 157063 _openqa-wo    20     0    427M    193M 23808 R      0.00 B/s     5.3    0.0 29:30.58 ffmpeg -y -hide_banner -no 
   6267 _openqa-wo    20     0    427M    193M 23808 R      0.00 B/s    63.2    0.0 25:54.22 ffmpeg -y -hide_banner -no 
 157108 _openqa-wo    20     0    427M    189M 23808 R      0.00 B/s    63.2    0.0 25:03.79 ffmpeg -y -hide_banner -no 
 157064 _openqa-wo    20     0    427M    193M 23808 R      0.00 B/s     2.6    0.0 23:50.53 ffmpeg -y -hide_banner -no 
 156485 _openqa-wo    20     0    427M    189M 23808 R      0.00 B/s    34.2    0.0 22:18.78 ffmpeg -y -hide_banner -no 
   6268 _openqa-wo    20     0    427M    193M 23808 R      0.00 B/s    57.9    0.0 21:48.92 ffmpeg -y -hide_banner -no 
 156601 _openqa-wo    20     0    427M    193M 23808 R      0.00 B/s    10.5    0.0 20:19.58 ffmpeg -y -hide_banner -no 
   6269 _openqa-wo    20     0    427M    193M 23808 R      0.00 B/s    55.3    0.0 16:33.02 ffmpeg -y -hide_banner -no 
   5898 _openqa-wo    20     0    427M    193M 23808 R      0.00 B/s    86.8    0.0 14:48.15 ffmpeg -y -hide_banner -no 
  31080 _openqa-wo    20     0 5720M    758M 28416 R      0.00 B/s    57.9    0.1 12:58.63 /usr/bin/qemu-system-ppc64 
  15778 _openqa-wo    20     0 6767M 1779M 28480 R      0.00 B/s    81.6    0.2 12:50.94 /usr/bin/qemu-system-ppc64 
  15781 _openqa-wo    20     0 6767M 1779M 28480 S      0.00 B/s     0.0    0.2 10:13.25 /usr/bin/qemu-system-ppc64 
 156709 _openqa-wo    20     0 6762M 1766M 28288 S      0.00 B/s    13.2    0.2 10:08.67 /usr/bin/qemu-system-ppc64 
  33559 _openqa-wo    20     0 6756M 1724M 28416 R      0.00 B/s    86.8    0.2 10:05.56 /usr/bin/qemu-system-ppc64 
  35017 _openqa-wo    20     0 3946M    753M 28416 R      0.00 B/s    84.2    0.1    9:30.77 /usr/bin/qemu-system-ppc64 
  24085 _openqa-wo    20     0 6901M 1781M 28480 S      0.00 B/s     0.0    0.2    9:13.94 /usr/bin/qemu-system-ppc64 
  24092 _openqa-wo    20     0 6901M 1781M 28480 R      0.00 B/s    78.9    0.2    8:40.60 /usr/bin/qemu-system-ppc64 
  28718 _openqa-wo    20     0 7135M 1787M 28480 S      0.00 B/s    50.0    0.2    8:17.91 /usr/bin/qemu-system-ppc64 
  28720 _openqa-wo    20     0 7135M 1787M 28480 R      0.00 B/s    13.2    0.2    6:51.75 /usr/bin/qemu-system-ppc64 
  39280 _openqa-wo    20     0 5712M    755M 28416 R      0.00 B/s    65.8    0.1    6:41.38 /usr/bin/qemu-system-ppc64 
  39683 _openqa-wo    20     0 6731M 1549M 28416 R      0.00 B/s    65.8    0.2    6:24.06 /usr/bin/qemu-system-ppc64 
   3699 root          20     0    3968    3200    2368 S      0.00 B/s     0.0    0.0    6:04.21 /sbin/agetty -o -p -- \u - 
  34903 _openqa-wo    20     0 6334M 1483M 28416 R      0.00 B/s    50.0    0.2    5:29.90 /usr/bin/qemu-system-ppc64 
  34902 _openqa-wo    20     0 6334M 1483M 28416 S      0.00 B/s     0.0    0.2    4:40.00 /usr/bin/qemu-system-ppc64 
  38988 _openqa-wo    20     0 6790M 1376M 28480 R      0.00 B/s 107.9    0.2    3:52.33 /usr/bin/qemu-system-ppc64 
  38599 _openqa-wo    20     0 8040M 4187M 28480 R      0.00 B/s    47.4    0.5    3:41.13 /usr/bin/qemu-system-ppc64 
  45395 _openqa-wo    20     0 3732M    757M 28416 R      0.00 B/s    71.1    0.1    3:38.90 /usr/bin/qemu-system-ppc64 
  38600 _openqa-wo    20     0 8040M 4187M 28480 S      0.00 B/s     0.0    0.5    3:18.94 /usr/bin/qemu-system-ppc64 
  43853 _openqa-wo    20     0 5641M 1696M 28480 R      0.00 B/s    63.2    0.2    3:12.66 /usr/bin/qemu-system-ppc64 
  38456 _openqa-wo    20     0 9087M 4195M 28480 R      0.00 B/s    78.9    0.5    3:08.68 /usr/bin/qemu-system-ppc64 
  38986 _openqa-wo    20     0 6790M 1376M 28480 R      0.00 B/s    86.8    0.2    3:06.34 /usr/bin/qemu-system-ppc64 
 ``` 

 So ffmpeg shows significantly higher accumulated CPU time than the corresponding qemu processes. We should investigate whether ffmpeg has too high an impact on machine performance, whether it should run with a nice level to prevent typing issues, whether ffmpeg parameters can be tweaked, or whether ffmpeg should be avoided altogether on ppc64le. 
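 To put a number on that comparison, the TIME+ column can be summed per command. The following is only a rough sketch: the helper names are made up for illustration, the column positions are assumed from the htop layout pasted above (with `DISK R/W` splitting into two whitespace-separated fields), and `SAMPLE` is just a four-row excerpt of that listing. htop may also list threads as separate rows, so the totals are an indication rather than an exact per-process figure.

```python
import os
import re
from collections import defaultdict

# htop TIME+ values in the listing look like "54:05.86" or "2h29:59".
_TIME_RE = re.compile(r"^(?:(\d+)h)?(\d+):(\d+(?:\.\d+)?)$")


def time_plus_to_seconds(value):
    """Convert an htop TIME+ string into seconds."""
    match = _TIME_RE.match(value)
    if match is None:
        raise ValueError(f"unrecognized TIME+ value: {value!r}")
    hours, minutes, seconds = match.groups()
    return int(hours or 0) * 3600 + int(minutes) * 60 + float(seconds)


def cpu_seconds_by_command(lines):
    """Sum TIME+ per command basename for htop rows in the layout above."""
    totals = defaultdict(float)
    for line in lines:
        fields = line.split()
        # Skip the header and anything that is not a process row.
        if len(fields) < 14 or not fields[0].isdigit():
            continue
        # Assumed layout: field 12 is TIME+, field 13 starts the command.
        command = os.path.basename(fields[13])
        totals[command] += time_plus_to_seconds(fields[12])
    return dict(totals)


# Excerpt of the listing above, not the full snapshot.
SAMPLE = """\
157105 _openqa-wo 20 0 427M 189M 23808 R 0.00 B/s 68.4 0.0 32:22.39 ffmpeg -y -hide_banner -no
157062 _openqa-wo 20 0 427M 193M 23808 R 0.00 B/s 42.1 0.0 32:07.83 ffmpeg -y -hide_banner -no
 31080 _openqa-wo 20 0 5720M 758M 28416 R 0.00 B/s 57.9 0.1 12:58.63 /usr/bin/qemu-system-ppc64
 15778 _openqa-wo 20 0 6767M 1779M 28480 R 0.00 B/s 81.6 0.2 12:50.94 /usr/bin/qemu-system-ppc64
"""

totals = cpu_seconds_by_command(SAMPLE.splitlines())
for command, seconds in sorted(totals.items(), key=lambda item: -item[1]):
    print(f"{command}: {seconds / 60:.1f} min")
```

 Feeding the complete htop output into `cpu_seconds_by_command` instead of the excerpt would give the totals for the whole snapshot.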

 ## Acceptance criteria 
 * **AC1:** openQA test video compression is ensured not to significantly impact system performance and cause typing issues 
 * **AC2:** openQA tests pass consistently without typing issues due to video encoding 
 * **AC3:** openQA tests can still provide useful videos with exceptions (e.g. keep videos completely disabled as last resort) 

 ## Suggestions 
 * Be aware that as of 2024-04-04 NOVIDEO=1 was again set for ppc64le openQA machine definitions, see #157636 
 * Check if ffmpeg CPU usage as visible in the above htop output is considered expected or something unusual 
 * Try and compare ffmpeg manually on x86_64 and ppc64le to see if ppc64le is maybe much less efficient 
 * Consider introducing a nice level for calling ffmpeg in os-autoinst, although this might be counter-productive as the video encoder works on a queue and shouldn't be delayed, maybe in combination with some bigger buffers or a bigger "pipe size"? 
 * Crosscheck if ffmpeg can be tweaked, in particular for ppc64le qemu workers 
 * We still have the alternative to not use the external ffmpeg encoder but use the internal OGV encoder 
 * Decide if ffmpeg or even complete video encoding should be completely forbidden on ppc64le, see #157636 
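 For the manual comparison between x86_64 and ppc64le and for trying out a nice level, a small harness like the one below could be run on both architectures. This is a sketch only, not how os-autoinst starts its encoder: `measure` and its parameters are hypothetical names, the priority is lowered with `os.nice` in the child before exec, and CPU time comes from `resource.getrusage(RUSAGE_CHILDREN)`, so it assumes a POSIX system and that no other child processes finish during the measurement.

```python
import os
import resource
import subprocess
import time


def measure(cmd, nice_increment=0):
    """Run cmd to completion, optionally at lower priority.

    Returns wall-clock seconds, CPU seconds (user + system) consumed by
    the child, and its return code.
    """

    def lower_priority():
        # Runs in the child between fork and exec.
        os.nice(nice_increment)

    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.monotonic()
    proc = subprocess.run(
        cmd,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        preexec_fn=lower_priority if nice_increment else None,
    )
    wall = time.monotonic() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    cpu = (after.ru_utime - before.ru_utime) + (after.ru_stime - before.ru_stime)
    return {"wall_s": wall, "cpu_s": cpu, "returncode": proc.returncode}
```

 Calling `measure(cmd)` and `measure(cmd, nice_increment=10)` with the same encoder command line and the same input on an x86_64 and a ppc64le worker would give comparable numbers. The actual ffmpeg arguments are truncated in the htop output above, so they would need to be taken from os-autoinst itself. The sketch does not cover buffer or pipe sizes.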


 ## Out of scope 
 * Actually enabling/disabling ffmpeg in production is handled as part of #157636
