action #160646

Updated by okurz about 2 months ago

## Observation 
 Originally reported in https://suse.slack.com/archives/C02CANHLANP/p1716169544132569 
 > (Richard Fan) Hello experts, many Multi-machine tests are failed like MM failed jobs on qe-core 

 The number of parallel_failed jobs increases starting 2024-05-19 22:30, as visible in https://monitor.qa.suse.de/d/nRDab3Jiz/openqa-jobs-test?orgId=1&from=1716124476616&to=1716212983933&viewPanel=24 

 For example, the openQA test in scenario sle-15-SP3-Server-DVD-Updates-x86_64-qam_kernel_multipath@64bit fails in 
 [multipath_iscsi](https://openqa.suse.de/tests/14377269/modules/multipath_iscsi/steps/15). 
 The error message "connect: Network is unreachable" shows that this is not a problem with MTU. 

 `ssh worker29.oqa.prg2.suse.org "cat /etc/wicked/scripts/gre_tunnel_preup.sh"` shows a problem 

 ``` 
 #!/bin/sh 
 action="$1" 
 bridge="$2" 
 # enable STP for the multihost bridges 
 ovs-vsctl set bridge $bridge stp_enable=false 
 ovs-vsctl set bridge $bridge rstp_enable=true 
 for gre_port in $(ovs-vsctl list-ifaces $bridge | grep gre) ; do ovs-vsctl --if-exists del-port $bridge $gre_port ; done 
 ``` 

 After the last line there should be a list of calls setting up GRE tunnel interfaces between those machines, but no GRE tunnels are set up at all. 
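
To spot this condition on other workers, a check along the following lines might help. It is only a sketch: the function names `count_gre_setup_calls` and `check_gre_script` are hypothetical, and it assumes that tunnels are created with `ovs-vsctl ... add-port ... type=gre` lines, as described in the openQA multi-machine setup documentation. The actual generated lines may look different.

```shell
#!/bin/sh
# Sketch: detect a generated pre-up script that sets up no GRE tunnels.
# Assumption: each tunnel is created by a line of the form
#   ovs-vsctl ... add-port <bridge> <name> -- set interface <name> type=gre ...

# Print the number of lines that look like GRE tunnel setup calls.
count_gre_setup_calls() {
    # grep -c prints 0 and exits non-zero when nothing matches,
    # so mask the exit status to keep callers simple.
    grep -c -E 'ovs-vsctl .*add-port .*type=gre' "$1" || true
}

# Return 0 if the script sets up at least one tunnel, 1 if it sets up none,
# 2 if the file cannot be read.
check_gre_script() {
    script="$1"
    if [ ! -r "$script" ]; then
        echo "ERROR: cannot read $script" >&2
        return 2
    fi
    count=$(count_gre_setup_calls "$script")
    if [ "$count" -eq 0 ]; then
        echo "ERROR: $script sets up no GRE tunnels" >&2
        return 1
    fi
    echo "$script: $count GRE tunnel setup call(s)"
}
```

On a worker this could be called as `check_gre_script /etc/wicked/scripts/gre_tunnel_preup.sh`. The non-zero exit status makes it usable as a guard in whatever generates or deploys the script.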

 ## Reproducible 

 Fails since 
 https://openqa.suse.de/tests/overview?result=parallel_failed&distri=sle&version=15-SP4&build=20240519-1 


 ## Expected result 

 Last good: [20240519-1](https://openqa.suse.de/tests/14362720) (or more recent) 

 ## Suggestions 
 * *DONE* Mitigate -> https://gitlab.suse.de/openqa/salt-pillars-openqa/-/merge_requests/814 
 * Apply workarounds 
 * Retrigger affected jobs 
 * Investigate error source 
 * Fix the problem, or at least abort the generation with an error if the section would be completely empty? 
 * Prevent the same and similar problems in the future 
 * Apply rollback steps 
 * Monitor effect carefully 
 * Look into https://stats.openqa-monitor.qa.suse.de/alerting/grafana/0XohcmfVk/view?orgId=1 

 ## Rollback steps 
 * Revert https://gitlab.suse.de/openqa/salt-pillars-openqa/-/merge_requests/814 


 ## Further details 

 Always latest result in the originally mentioned scenario: [latest](https://openqa.suse.de/tests/latest?arch=x86_64&distri=sle&flavor=Server-DVD-Updates&machine=64bit&test=qam_kernel_multipath&version=15-SP3)
