coordination #101048: [epic] Investigate and fix higher instability of openqaworker-arm-4/5 vs. arm-1/2/3 - openQA Project - openSUSE Project Management Tool

## Observation
According to https://monitor.qa.suse.de/d/nRDab3Jiz/openqa-jobs-test?viewPanel=27&orgId=1&from=now-30d&to=now (sort by "avg" in the table on the right-hand side), openqaworker-arm-4/5 have a fail ratio of 33-36%, compared to 15-17% for openqaworker-arm-1/2/3.

## Acceptance criteria
* **AC1:** openqaworker-arm-4/5 have a fail ratio less than or equal to that of arm-1/2/3

## Additional information and ideas from the hardware comparison between arm-1/2/3 and arm-4/5
* The CPU of arm-4/5 (this specific model and version, Cavium ThunderX2) is known to perform badly for our use case, and that is the relevant difference from the older arm workers, which have the previous version of that CPU model installed.
* Disabling CPU control and CPU frequency scaling in the firmware environment did not make a difference.
    * Before that, we had already tried reducing the number of worker slots considerably, and that did not help either.
    * There are still a few ideas to consider (see #109232#note-5).
    * There are also more variables in the firmware environment that we can experiment with (see #109232#note-20).
* Next time we should buy different hardware (see private comment #109232#note-11).
* See the full ticket #109232 for more context about these findings.

## Suggestions
- Confirm whether typing issues cause the failures (look for timeouts and for additional or missing characters in typed commands)
- Upgrade arm3 to Leap 15.3 and compare the failure rate -> #101265 => Leap 15.3 behaves similarly to Leap 15.2
- Consider switching to kernel-stable or kernel-head -> #101271 => "kernel-default" from Kernel:stable behaves the same as the openSUSE:Leap:15.3 one
- ~~Consider downgrading the kernel to the version used in 15.2~~ -> same upstream version is running on most
- Bring back arm-4 and arm-5 after verifying stability
- Run [typing.pm](https://github.com/os-autoinst/os-autoinst/blob/master/t/data/tests/tests/typing.pm) from os-autoinst as a test in production -> #101262
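For the first suggestion, a crude way to spot additional or missing characters is to diff the string a test module intended to type against the text that actually arrived (for example as captured from a serial log). The sketch below uses Python's standard `difflib.SequenceMatcher`; how the observed text is obtained is an assumption, and `typing_errors` is a hypothetical helper, not an os-autoinst API.

```python
import difflib


def typing_errors(expected: str, observed: str) -> dict[str, list[str]]:
    """Compare the text a test meant to type with the text that appeared.

    Returns the chunks missing from the observed text and the chunks
    that were additionally present in it.
    """
    missing: list[str] = []
    extra: list[str] = []
    # autojunk=False so frequent characters (e.g. spaces) are never treated as junk.
    matcher = difflib.SequenceMatcher(None, expected, observed, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # "replace" means both a missing and an additional chunk at this spot.
        if tag in ("delete", "replace"):
            missing.append(expected[i1:i2])
        if tag in ("insert", "replace"):
            extra.append(observed[j1:j2])
    return {"missing": missing, "extra": extra}
```

For example, `typing_errors("zypper -n in vim", "zypperr -n in vm")` reports the missing `i` and the additional `r`. Counting such hits per worker host over many jobs would show whether typing errors are more frequent on arm-4/5.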
