action #152578

Updated by tinita 5 months ago

## Observation 

 See also #152569 / #152560 

 There seems to have been a problem connecting to unreal6.qe.nue2.suse.org for several weeks now. 

 https://openqa.suse.de/tests/13062217 
 ``` 
 Reason: backend died: Error connecting to VNC server <unreal6.qe.nue2.suse.org:5904>: IO::Socket::INET: connect: Connection refused 
 ``` 
 Query to count incomplete jobs finished since 2023-11-01, grouped by the start of their reason: 
 ``` 
 select count(id), substring(reason from 0 for 70) as reason_substr from jobs where t_finished >= '2023-11-01T00:00:00' and result = 'incomplete' group by reason_substr order by count(id) desc; 
 ``` 
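 The grouping done by the query above can also be mirrored offline, for example on a list of reasons exported from the database. The following is only a minimal sketch in Python: the function name `count_by_reason_prefix` and the plain list-of-strings input are assumptions for illustration, not part of openQA. In PostgreSQL, `substring(reason from 0 for 70)` yields the first 69 characters because string positions start at 1, so the sketch defaults to 69.

```python
from collections import Counter


def count_by_reason_prefix(reasons, prefix_len=69):
    """Group reasons by their first `prefix_len` characters and count them.

    Returns a list of (prefix, count) tuples, most frequent first,
    similar to `group by reason_substr order by count(id) desc`.
    Assumption: missing reasons (None) are grouped under an empty string.
    """
    counts = Counter((reason or "")[:prefix_len] for reason in reasons)
    return counts.most_common()
```

 Truncating the reason before grouping merges messages that differ only in their tail, such as a port number or timestamp, which appears to be the intent of the original query.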

 ## Suggestions 
 * Take unreal6 out of prod by disabling the slot(s) on all relevant worker hosts? 
   * But that might not be the best idea because the worker slots don't seem generally broken 
       * e.g. https://openqa.suse.de/tests/13064403 and https://openqa.suse.de/tests/13064408 pass even though they use unreal6 (https://openqa.suse.de/tests/latest?arch=x86_64&distri=sle&flavor=JeOS-for-kvm-and-xen-Updates&machine=svirt-xen-pv&test=jeos-containers-docker&version=15-SP5, https://openqa.suse.de/tests/latest?arch=x86_64&distri=sle&flavor=JeOS-for-kvm-and-xen-Updates&machine=svirt-xen-hvm&test=jeos-containers-podman&version=15-SP5) 
 * Find out what's wrong on unreal6 by investigating that jump host or asking test maintainers 
 * Confirm whether the test itself is broken or whether this is a general issue with the VNC backend 
 * Maybe this is a product issue - it's all SLE15SP6 by the looks of it? 
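 As a first step when investigating the jump host, it may help to check from a worker host whether the VNC port accepts TCP connections at all. The following is a minimal sketch in Python using only the standard library; the function name `tcp_port_open` is hypothetical, and the host and port in the usage note are taken from the error message in the observation.

```python
import socket


def tcp_port_open(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port can be established.

    A refused connection, a timeout, or a name resolution failure all
    raise OSError (or a subclass) and are reported as False.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

 For example, `tcp_port_open("unreal6.qe.nue2.suse.org", 5904)` returning `False` would be consistent with the "Connection refused" error. Note that this only tests whether the port accepts connections; it does not perform a VNC handshake, so a `True` result does not prove the VNC server is healthy.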

 ## Problem 
 * **H1** *REJECTED* The product has changed *-> It happened for several different builds, and there were also successful tests with the same builds* 
  * **H1.1** product changed slightly but in an acceptable way without the need for communication with DEV+RM --> adapt test 
  * **H1.2** product changed slightly but in an acceptable way found after feedback from RM --> adapt test 
  * **H1.3** product changed significantly --> after approval by RM adapt test 

 * **H2** Fails because of changes in test setup 
  * **H2.1** Our test hardware equipment behaves differently 
  * **H2.2** The network behaves differently 

 * **H3** Fails because of changes in test infrastructure software, e.g. os-autoinst, openQA 
 * **H4** Fails because of changes in test management configuration, e.g. openQA database settings 
 * **H5** Fails because of changes in the test software itself (the test plan in source code as well as needles) 
 * **H6** Sporadic issue, i.e. the root problem has been hidden in the system for a long time but does not show symptoms every time
