action #111908
Updated by dzedro over 2 years ago
## Observation

There are "random unexpected" multi-machine (MM) failures that appear to be caused by an issue between multiple workers. Below is a list of support_server jobs from MM HA/SAP jobs that failed in the last two weeks. When I restarted these jobs on the same openQA worker, they did not fail. I see the same on my local HA/SAP instance: with one worker there are nearly no "random unexpected" failures, but with two physical workers the rate of such failures increases.

- https://openqa.suse.de/tests/8804890#dependencies
- https://openqa.suse.de/tests/8804876#dependencies
- https://openqa.suse.de/tests/8804944#dependencies
- https://openqa.suse.de/tests/8796653#dependencies
- https://openqa.suse.de/tests/8806626#dependencies
- https://openqa.suse.de/tests/8813734#dependencies
- https://openqa.suse.de/tests/8819834#dependencies
- https://openqa.suse.de/tests/8818172#dependencies
- https://openqa.suse.de/tests/8818165#dependencies
- https://openqa.suse.de/tests/8825849#dependencies
- https://openqa.suse.de/tests/8842164#dependencies
- https://openqa.suse.de/tests/8844261#dependencies
- https://openqa.suse.de/tests/8855774#dependencies
- https://openqa.suse.de/tests/8856411#dependencies

## Steps to reproduce

The failures are random. I could reproduce them on a local instance with multiple physical workers.

## Problem

I assume it is a network/openvswitch/GRE issue between the servers.

## Workaround

Run the jobs on one physical worker via WORKER_CLASS, e.g. WORKER_CLASS=qemu_x86_64,tap,openqaworker8
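
As a rough illustration of the workaround, the WORKER_CLASS value can be composed by appending the worker host to the classes the job already requires. This is only a sketch: `pin_worker_class` is a hypothetical helper, not part of openQA, and it assumes the worker host advertises a worker class equal to its hostname, as openqaworker8 does in the example above.

```python
def pin_worker_class(base_classes, host):
    """Build a WORKER_CLASS value that also requires the given worker host.

    Hypothetical helper for illustration only. It assumes the host exposes
    a worker class named after itself (e.g. "openqaworker8").
    """
    # Drop empty entries and surrounding whitespace from the base classes.
    classes = [c.strip() for c in base_classes if c.strip()]
    # Add the host class once, keeping the original order.
    if host not in classes:
        classes.append(host)
    return ",".join(classes)


if __name__ == "__main__":
    value = pin_worker_class(["qemu_x86_64", "tap"], "openqaworker8")
    print("WORKER_CLASS=" + value)
```

All jobs of one MM cluster would need the same value so that they end up on the same physical worker.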