action #110545

Updated by okurz over 1 year ago

## Motivation 
 See parent #101048. In #109232#note-5 ggardet_arm gave some additional hints that we could try. We should try them all and run tests as mkittler did in #109232.

 ## Acceptance criteria 
 * **AC1:** All concrete ideas have been tried and openQA tests have been executed with a statement regarding stability 

 ## Suggestions 
 * Remind mkittler that he should always write down the commands he used in tickets, as otherwise his colleagues will ask him anyway what he did in #109232 to run openQA tests ;) 
     * See my notes on exporting job IDs via `psql`: https://github.com/Martchus/openQA-helper#useful-sql-queries= 
 * Change the parameters on the systems as written in #109232#note-5, one by one or in combination, rerun the tests and gather stability figures 
 * Come up with a final assessment 
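
 A minimal sketch of how stability figures could be gathered once the job results of a test run are available. It assumes the results have already been exported as a list of openQA result strings; the helper name `stability_summary` and the choice to count `softfailed` as acceptable are assumptions for illustration, not something decided in this ticket.

```python
from collections import Counter


def stability_summary(results):
    """Summarize a list of openQA job result strings.

    Assumption: "passed" and "softfailed" count as OK, everything else
    (e.g. "failed", "incomplete") counts against stability.
    """
    counts = Counter(results)
    total = sum(counts.values())
    ok = counts["passed"] + counts["softfailed"]
    fail_ratio = (total - ok) / total if total else None
    return {"total": total, "ok": ok, "fail_ratio": fail_ratio}


if __name__ == "__main__":
    # Hypothetical results, only to show the output shape
    print(stability_summary(["passed", "failed", "passed", "incomplete"]))
```

 Running the same summary before and after each parameter change keeps the figures comparable between runs.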

 ## Concrete ideas to try out 
 * Ask Guillaume if we can trade the machine for another one 
 * DONE (does not help, see #110545#note-4): Disable mitigation (KPTI, etc.) 
     * ~Use kernel parameter `mitigations=off` (see https://www.kernel.org/doc/html/v5.15-rc1/admin-guide/kernel-parameters.html)~ 
 * DONE (does not help, see #110545#note-4): Enable/disable huge pages 
 * DONE (at least `progdevfreq`, see #110545#note-8): Disable hardware threading in firmware (it will lower the number of CPUs seen by the kernel) 
     * Also tried disabling `progdevfreq` but have not done any testing since, as the machines broke afterwards. 
 * Check actual CPU frequency 
 * Check temperature (CPU throttling could lower the CPU frequency and thus reduce performance) 
 * Use single socket instead of dual sockets (may be configurable in the firmware) 
     * Which firmware option (see paste in #110545#note-8 for options) would this correspond to? 
 * Use a distribution without LSE-atomics (known to be slow on TX2) 
     * Not sure whether there's a firmware option to disable that support. 
     * Not sure whether it is enabled in Leap anyways (https://en.opensuse.org/Arm_architecture_support#ARMv8.1_-_LSE_(Large_System_Extension)_atomics only mentions Tumbleweed). 
     * `export GLIBC_TUNABLES="glibc.mem.tagging=X"` where X defaults to 0; we could check whether 1 has an effect 
         * see https://www.gnu.org/software/libc/manual/html_node/Memory-Related-Tunables.html for documentation 
 * You can also run `sudo perf stat` while the system is busy with openQA tests 
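
 For the two open "check" items above (actual CPU frequency and temperature), a small sketch of how both could be sampled from sysfs while openQA tests are running. It assumes the worker exposes the standard Linux `cpufreq` and `thermal` sysfs interfaces, which may not be the case on these machines; the function names and the `sysfs_root` parameter are made up for illustration.

```python
from pathlib import Path


def read_cpu_freqs_mhz(sysfs_root="/sys"):
    """Return {cpu_name: MHz}; scaling_cur_freq is reported in kHz."""
    freqs = {}
    cpu_dir = Path(sysfs_root) / "devices/system/cpu"
    for f in sorted(cpu_dir.glob("cpu[0-9]*/cpufreq/scaling_cur_freq")):
        try:
            freqs[f.parent.parent.name] = int(f.read_text().strip()) / 1000
        except (OSError, ValueError):
            continue  # skip CPUs that are offline or not readable
    return freqs


def read_temperatures_c(sysfs_root="/sys"):
    """Return {zone_name: degrees Celsius}; temp is in millidegrees."""
    temps = {}
    thermal_dir = Path(sysfs_root) / "class/thermal"
    for f in sorted(thermal_dir.glob("thermal_zone*/temp")):
        try:
            temps[f.parent.name] = int(f.read_text().strip()) / 1000
        except (OSError, ValueError):
            continue  # some zones refuse reads
    return temps


if __name__ == "__main__":
    for name, mhz in read_cpu_freqs_mhz().items():
        print(f"{name}: {mhz:.0f} MHz")
    for name, celsius in read_temperatures_c().items():
        print(f"{name}: {celsius:.1f} C")
```

 If both functions return empty results, the interfaces are probably not exposed and the values would have to come from elsewhere (e.g. the BMC).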
