action #110545
Updated by okurz about 2 years ago
## Motivation

See parent #101048. In #109232#note-5 ggardet_arm gave some additional hints that we could try. We should try all of them and run tests as mkittler did in #109232.

## Acceptance criteria

* **AC1:** All concrete ideas have been tried and openQA tests have been executed with a statement regarding stability

## Suggestions

* Remind mkittler that he should always write down the commands he used in tickets, as otherwise his colleagues will ask him anyway what he did in #109232 to run openQA tests ;)
  * See my notes on exporting job IDs via `psql`: https://github.com/Martchus/openQA-helper#useful-sql-queries=
* Change the parameters on the systems as written in #109232#note-5, one by one or in combination, rerun the tests and gather stability figures
* Come up with a final assessment

## Concrete ideas to try out

* DONE Ask Guillaume if we can trade the machine for another one -> nope
* DONE (does not help, see #110545#note-4): Disable mitigations (KPTI, etc.)
  * ~Use kernel parameter `mitigations=off` (see https://www.kernel.org/doc/html/v5.15-rc1/admin-guide/kernel-parameters.html)~
* DONE (does not help, see #110545#note-4): Enable/disable huge pages
* DONE (at least `progdevfreq`, see #110545#note-8): Disable hardware threading in firmware (it will lower the number of CPUs seen by the kernel)
  * Also tried disabling `progdevfreq` but haven't done any testing after that, as the machines broke after that.
* Check actual CPU frequency
* Check temperature (CPU throttling could lower the CPU frequency and reduce performance)
* Use single socket instead of dual sockets (may be configurable in the firmware)
  * Which firmware option (see paste in #110545#note-8 for options) would this correspond to?
* Use a distribution without LSE-atomics (known to be slow on TX2)
  * Not sure whether there's a firmware option to disable that support.
  * Not sure whether it is enabled in Leap anyway (https://en.opensuse.org/Arm_architecture_support#ARMv8.1_-_LSE_(Large_System_Extension)_atomics only mentions Tumbleweed).
* `export GLIBC_TUNABLES="glibc.mem.tagging=X"` where X defaults to 0, and we could confirm if 1 has an effect
  * see https://www.gnu.org/software/libc/manual/html_node/Memory-Related-Tunables.html for documentation
* You can also run `sudo perf stat` while the system is busy with openQA tests
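
For the "Check actual CPU frequency" and "Check temperature" ideas, a minimal Python sketch that samples the standard Linux sysfs files could look like the following. It assumes the kernel on these machines exposes `cpufreq` and thermal-zone entries under `/sys`, which has not been verified here; the function names and the `sysfs_root` parameter are hypothetical and only exist for illustration.

```python
from pathlib import Path


def read_cpu_freqs_khz(sysfs_root="/sys"):
    """Return {cpu_name: current frequency in kHz} from the cpufreq sysfs files."""
    freqs = {}
    cpu_dir = Path(sysfs_root) / "devices/system/cpu"
    for path in sorted(cpu_dir.glob("cpu[0-9]*/cpufreq/scaling_cur_freq")):
        try:
            # path is .../cpuN/cpufreq/scaling_cur_freq, so two parents up is cpuN
            freqs[path.parent.parent.name] = int(path.read_text().strip())
        except (OSError, ValueError):
            continue  # core offline or file not readable
    return freqs


def read_temperatures_c(sysfs_root="/sys"):
    """Return {zone_name: temperature in degrees Celsius} from the thermal zones."""
    temps = {}
    thermal_dir = Path(sysfs_root) / "class/thermal"
    for path in sorted(thermal_dir.glob("thermal_zone*/temp")):
        try:
            # The kernel reports millidegrees Celsius in these files.
            temps[path.parent.name] = int(path.read_text().strip()) / 1000.0
        except (OSError, ValueError):
            continue
    return temps


if __name__ == "__main__":
    freqs = read_cpu_freqs_khz()
    if freqs:
        print(f"CPUs reporting a frequency: {len(freqs)}")
        print(f"min/max current frequency: {min(freqs.values())}/{max(freqs.values())} kHz")
    else:
        print("no cpufreq data exposed on this machine")
    for zone, temp in read_temperatures_c().items():
        print(f"{zone}: {temp:.1f} C")
```

Running this a few times while openQA jobs are active and once on an idle machine would show whether the frequency drops or the temperature climbs under load. If neither sysfs tree is populated, the firmware or BMC would have to be queried instead.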