action #134282

Updated by livdywan 9 months ago

## Observation 

 Multiple iSCSI tests are failing on the multi-machine setup. 
 So far, almost all of them fail at the "iscsi_client" step, for example: 
 12SP5: 
 https://openqa.suse.de/tests/11821503 
 15SP1: 
 https://openqa.suse.de/tests/11822477 
 15SP2: 
 https://openqa.suse.de/tests/11827371 
 15SP3: 
 https://openqa.suse.de/tests/11821798 
 15SP4: 
 https://openqa.suse.de/tests/11820612 
 15SP5: 
 https://openqa.suse.de/tests/11821882 

 So far, I have been unable to pinpoint an update that could be the root cause, since the failure occurs on all supported SLES versions. 
 The serial0.txt log from one test node suggests that the node intermittently lost communication with the iSCSI server: 
 [    445.225255][ T3182] sd 3:0:0:0: [sda] Optimal transfer size 42949672 logical blocks > dev_max (65535 logical blocks) 
 [    455.449746][      C3]    connection1:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4295003573, last ping 4295004824, now 4295006080 
 [    455.452820][      C3]    connection1:0: detected conn error (1022) 
 [    455.281284] iscsid[9644]: iscsid: Kernel reported iSCSI connection 1:0 error (1022 - ISCSI_ERR_NOP_TIMEDOUT: A NOP has timed out) state (3) 
 [    458.309464] iscsid[9644]: iscsid: connection1:0 is operational after recovery (1 attempts) 
 [    458.513865][ T9694] sd 3:0:0:2: Attached scsi generic sg2 type 0 
 [    468.761789][      C3]    connection1:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4295006845, last ping 4295008128, now 4295009408 
 [    468.765772][      C3]    connection1:0: detected conn error (1022) 
 [    468.594268] iscsid[9644]: iscsid: Kernel reported iSCSI connection 1:0 error (1022 - ISCSI_ERR_NOP_TIMEDOUT: A NOP has timed out) state (3) 
 [    471.621874] iscsid[9644]: iscsid: connection1:0 is operational after recovery (1 attempts) 
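
To compare how often the timeouts occur across jobs, the excerpt above could be summarized with a small script. This is only a sketch: the regular expressions are written against the exact line shapes shown here and may need adjusting for other kernel or iscsid versions, and the names `TIMEOUT_RE`, `RECOVERY_RE` and `parse_events` are made up for illustration. The jiffies deltas are reported raw, since the kernel tick rate of the test node is not known from the log.

```python
import re

# Two timeout/recovery cycles copied from the serial0.txt excerpt above.
SAMPLE = """\
[    455.449746][      C3]    connection1:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4295003573, last ping 4295004824, now 4295006080
[    455.452820][      C3]    connection1:0: detected conn error (1022)
[    458.309464] iscsid[9644]: iscsid: connection1:0 is operational after recovery (1 attempts)
[    468.761789][      C3]    connection1:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4295006845, last ping 4295008128, now 4295009408
[    468.765772][      C3]    connection1:0: detected conn error (1022)
[    471.621874] iscsid[9644]: iscsid: connection1:0 is operational after recovery (1 attempts)
"""

# Kernel "ping timeout" line: timestamp, context tag, connection id, jiffies counters.
TIMEOUT_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\[[^\]]*\]\s+connection(?P<conn>\d+:\d+): "
    r"ping timeout of (?P<secs>\d+) secs expired, recv timeout (?P<recv>\d+), "
    r"last rx (?P<rx>\d+), last ping (?P<ping>\d+), now (?P<now>\d+)"
)

# iscsid line reporting that the connection came back.
RECOVERY_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\] iscsid\[\d+\]: iscsid: "
    r"connection(?P<conn>\d+:\d+) is operational after recovery"
)


def parse_events(text):
    """Return (timeouts, recoveries) as lists of dicts, in log order."""
    timeouts, recoveries = [], []
    for line in text.splitlines():
        m = TIMEOUT_RE.search(line)
        if m:
            timeouts.append({
                "ts": float(m["ts"]),
                "conn": m["conn"],
                # Raw jiffies deltas; converting to seconds needs the node's HZ.
                "jiffies_since_rx": int(m["now"]) - int(m["rx"]),
                "jiffies_since_ping": int(m["now"]) - int(m["ping"]),
            })
            continue
        m = RECOVERY_RE.search(line)
        if m:
            recoveries.append({"ts": float(m["ts"]), "conn": m["conn"]})
    return timeouts, recoveries


if __name__ == "__main__":
    timeouts, recoveries = parse_events(SAMPLE)
    for event in timeouts:
        print(event)
    # Gap between consecutive timeouts, using the kernel timestamps.
    for prev, cur in zip(timeouts, timeouts[1:]):
        print(f"next timeout after {cur['ts'] - prev['ts']:.3f} s")
    print(f"{len(recoveries)} recoveries reported by iscsid")
```

Running `parse_events` over a complete serial0.txt from each failing job would show whether the timeouts recur at a similar cadence on every SLES version.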

 I'm struggling to run the tests in debug mode because osd seems to be overloaded at the moment. The last time I debugged this, I was able to ping and communicate with the "support server" normally, but the issue was not occurring very often at that time. 
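
Because a single successful ping can easily miss an intermittent drop, a repeated probe run from the client node during a debug session might be more telling. The sketch below is an assumption about how that could be done, not part of the test suite: `probe` is a made-up helper, the host has to be filled in by whoever runs it, and 3260 is only the standard iSCSI target port, which the support server may or may not use.

```python
import socket
import time


def probe(host, port=3260, attempts=5, interval=1.0, timeout=5.0):
    """Open a TCP connection `attempts` times; return connect times in seconds.

    A failed or timed-out attempt is recorded as None, so gaps stay visible.
    """
    results = []
    for i in range(attempts):
        start = time.monotonic()
        try:
            with socket.create_connection((host, port), timeout=timeout):
                results.append(time.monotonic() - start)
        except OSError:
            # Covers refused connections, unreachable hosts and timeouts.
            results.append(None)
        if i < attempts - 1:
            time.sleep(interval)
    return results


def summarize(results):
    """Return (number of failed attempts, slowest successful connect or None)."""
    ok = [r for r in results if r is not None]
    return len(results) - len(ok), (max(ok) if ok else None)
```

For example, `summarize(probe(host, attempts=60))` with `host` set to the support server's address would cover about a minute; any failures, or connect times approaching the 5 second NOP timeout seen in the log, would point toward the network rather than the iSCSI stack.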

 ## Acceptance criteria 
 * **AC1:** Multi-machine tests work with different physical hosts 

 ## Suggestions 
 - File SD-INFRA ticket for network issue 
 - Confirm how #111908 is related
