action #12410

Updated by okurz almost 8 years ago

## observation 
 https://openqa.suse.de/tests/448619: dasdfmt seems to have finished, but wait_serial still gave up. Maybe the formatting took unusually long?

 ## steps to reproduce 
 TBC 

 ## problem 
 happened sporadically during the last week 
 H1. worker/s390-host specific 
 H2. can happen everywhere 
 H3. our recent changes in bootloader_s390 introduced some behaviour change 
 H4. serial output gets lost 
 H4.1. REJECTED: output to serial gets lost randomly -> 1000/1000 runs of `assert_script_run("echo $_", 10, 'failed');` succeeded, see http://opeth.suse.de/tests/2825 
 H4.2. REJECTED: long timeouts cause serial output loss -> 10/10 runs of `assert_script_run("sleep 900 && echo $_", 1200, 'failed');` succeeded, see http://opeth.suse.de/tests/2825 and http://opeth.suse.de/tests/2866 
 H4.3. UNCLEAR: serial output only gets lost when dasdfmt is called with assert_script_run -> not reproducible at all on lord.arch, so E3-1 and E4-1 may be invalid 

 ## suggestion 
 * <del>check logfiles, e.g. for exact timing sequence</del> -> wait_serial times out after 20 minutes on both occasions. From the video we can see that the actual formatting had already finished before that 
 * E1-1. DONE: reproduce by calling dasdfmt repeatedly on another host (okurz's own host) -> could not reproduce in 17/17 runs of full dasdfmt on a personal instance, see http://lord.arch/tests/1582 
 * E2-1. find out if problem only occurs on some or a single host 
 * E3-1. @mgriessmeier: find old test run before we deployed new backend that shows this error 
 * E4-1. DONE: @mgriessmeier: find s390x host with small disk (to save time) and format many times, i.e. call for-loop with the assert_script_run on dasdfmt -> could not reproduce 
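
 The repeated-run experiments (E1-1, E4-1) could also be cross-checked directly on the host, outside openQA. The following is only a rough sketch under assumptions: `repeat_runs` is a hypothetical helper, not part of os-autoinst, and it relies on `timeout` from GNU coreutils. It bypasses the serial console entirely, so it can only show whether the command itself finishes within the limit, not whether wait_serial would see its output.

```shell
#!/bin/sh
# Hypothetical helper: run a command repeatedly with a per-run time limit
# and print the tally as "passed/total", the form used in this ticket.
# Usage: repeat_runs RUNS TIMEOUT_SECONDS COMMAND [ARGS...]
repeat_runs() {
    runs=$1
    limit=$2
    shift 2
    passed=0
    i=1
    while [ "$i" -le "$runs" ]; do
        rc=0
        # timeout(1) exits with status 124 when the time limit is hit
        timeout "$limit" "$@" >/dev/null 2>&1 || rc=$?
        if [ "$rc" -eq 0 ]; then
            passed=$((passed + 1))
        else
            echo "run $i exited with status $rc" >&2
        fi
        i=$((i + 1))
    done
    echo "$passed/$runs"
    # Succeed only if every single run passed
    [ "$passed" -eq "$runs" ]
}

# Example (assumption: device name and dasdfmt options are placeholders
# and must be adapted to the actual small test disk):
#   repeat_runs 17 1200 dasdfmt -b 4096 -y /dev/dasdX
```

 A run reported with status 124 would point towards dasdfmt really exceeding the timeout, whereas a clean tally here combined with failures in openQA would point back towards the serial handling (H4).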


 ## workaround 
 sporadic issue, restart the job
