action #111908

Updated by dzedro over 2 years ago

## Observation 
 Multi-machine (MM) jobs show "random unexpected" failures, apparently caused by a problem between multiple workers. 
 Below is a list of the support_server jobs of failed MM HA/SAP jobs from the last two weeks. 
 I restarted these jobs on the same openQA worker and they did not fail. 

 I see the same on my local HA/SAP instance: with one worker there are nearly no "random unexpected" failures. 
 With two physical workers, the rate of "random unexpected" failures increases. 

 https://openqa.suse.de/tests/8804890#dependencies 
 https://openqa.suse.de/tests/8804876#dependencies 
 https://openqa.suse.de/tests/8804944#dependencies 
 https://openqa.suse.de/tests/8796653#dependencies 
 https://openqa.suse.de/tests/8806626#dependencies 
 https://openqa.suse.de/tests/8813734#dependencies 
 https://openqa.suse.de/tests/8819834#dependencies 
 https://openqa.suse.de/tests/8818172#dependencies 
 https://openqa.suse.de/tests/8818165#dependencies 
 https://openqa.suse.de/tests/8825849#dependencies 
 https://openqa.suse.de/tests/8842164#dependencies 
 https://openqa.suse.de/tests/8844261#dependencies 
 https://openqa.suse.de/tests/8855774#dependencies 
 https://openqa.suse.de/tests/8856411#dependencies 

 ## Steps to reproduce 
 The failures are random, but I could reproduce them on a local instance with multiple physical workers. 

 ## Problem 
 I assume it is a network/openvswitch/GRE issue between the servers. 

 ## Workaround 
 Run all jobs of the MM cluster on one physical worker by setting WORKER_CLASS, e.g. WORKER_CLASS=qemu_x86_64,tap,openqaworker8
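
 A minimal sketch of how the pinned value can be built from an existing WORKER_CLASS. The helper name `pin_to_worker` is hypothetical and not part of openQA; it only assembles the string. It assumes the target host advertises its own name as a worker class, as openqaworker8 does in the example above.

```python
def pin_to_worker(worker_class: str, host_class: str) -> str:
    """Append host_class to a comma-separated WORKER_CLASS value.

    Hypothetical helper: existing classes keep their order, and
    host_class is not added a second time if it is already present.
    """
    classes = [c.strip() for c in worker_class.split(",") if c.strip()]
    if host_class not in classes:
        classes.append(host_class)
    return ",".join(classes)


if __name__ == "__main__":
    # Values taken from the workaround above.
    print("WORKER_CLASS=" + pin_to_worker("qemu_x86_64,tap", "openqaworker8"))
```

 The same value has to be set on every job of the cluster (support_server and nodes), otherwise the jobs can still be scheduled on different physical workers.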
