action #12410 (closed)
s390 dasdfmt fails even though command looks complete in screenshot
Description
observation
In https://openqa.suse.de/tests/448619 dasdfmt seems to have finished, but wait_serial failed anyway. Maybe the formatting took unusually long?
steps to reproduce
TBC
problem
Happened sporadically during the last week.
H1. REJECTED: worker/s390-host specific
H2. can happen everywhere
H3. our recent changes in bootloader_s390 introduced some behaviour change
H4. serial output gets lost
H4.1. REJECTED: output to serial gets lost randomly -> 1000/1000 runs of assert_script_run("echo $_", 10, 'failed'); succeeded, see http://opeth.suse.de/tests/2825
H4.2. REJECTED: long timeouts cause serial output loss -> 10/10 runs of assert_script_run("sleep 900 && echo $_", 1200, 'failed'); succeeded, see http://opeth.suse.de/tests/2825 and http://opeth.suse.de/tests/2866
H4.3. UNCLEAR: serial output only gets lost when dasdfmt is called with assert_script_run -> not reproducible at all on lord.arch, so E3-1 and E4-1 may be invalid
H4.4. iucvconn and agetty processes are not running <- most likely, since we can see this in our debug output
suggestion
- check logfiles, e.g. for the exact timing sequence -> wait_serial times out after 20 minutes on both occasions. From the video we can see that the actual formatting process had already finished before that
- E1-1. DONE: reproduce by calling dasdfmt repeatedly on another host (my host (okurz)) -> done, could not reproduce in http://lord.arch/tests/1582 in 17/17 runs of full dasdfmt on personal instance
- E2-1. DONE: find out if the problem only occurs on some hosts or a single host -> found 3 different hosts with this issue
- E3-1. DONE: @mgriessmeier: find an old test run from before we deployed the new backend that shows this error -> none found
- E4-1. DONE: @mgriessmeier: find an s390x host with a small disk (to save time) and format many times, i.e. call a for-loop with assert_script_run on dasdfmt -> could not reproduce
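For reference, the for-loop from E4-1 could look roughly like the sketch below. This is not the code that was actually run: format_repeatedly and dasdfmt_command are hypothetical helpers, and the dasdfmt options (-b 4096 -y -p) and the device /dev/dasda are assumptions. Inside a test module one would pass \&assert_script_run as the runner; the dry-run callback here only prints the commands, so the sketch also runs outside os-autoinst.

```perl
use strict;
use warnings;

# Build the dasdfmt command line (assumed options): -b sets the block size,
# -y skips the confirmation prompt, -p prints a progress bar.
sub dasdfmt_command {
    my ($device) = @_;
    return "dasdfmt -b 4096 -y -p $device";
}

# Call $run_cmd->($command, $timeout, $fail_message) $runs times in a row.
# In a test module, pass \&assert_script_run (assumption: the three-argument
# form quoted in this ticket). Returns the number of runs attempted.
sub format_repeatedly {
    my ($run_cmd, $device, $runs, $timeout) = @_;
    for my $i (1 .. $runs) {
        $run_cmd->(dasdfmt_command($device), $timeout, "dasdfmt run $i/$runs failed");
    }
    return $runs;
}

# Dry run: print each command instead of executing it.
format_repeatedly(sub { print "$_[0] (timeout $_[1]s)\n" }, '/dev/dasda', 3, 1200);
```

Passing the runner as a code reference keeps the loop independent of the test API, so the same loop can be tried with script_run or a logging wrapper.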
workaround
Sporadic failure; restart the job.
Updated by okurz over 8 years ago
also happened in https://openqa.suse.de/tests/448532
Updated by okurz over 8 years ago
- Description updated (diff)
Started a special test run to check the hypotheses:
http://opeth.suse.de/tests/2825
It uses:
# Test whether output to serial gets lost randomly
for (1..1_000) {
    assert_script_run("echo $_", 10, 'failed');
}
# Test whether long timeouts cause serial output loss
for (1..10) {
    assert_script_run("sleep 900 && echo $_", 1200, 'failed');
}
Updated by okurz over 8 years ago
- Description updated (diff)
- Assignee changed from okurz to mgriessmeier
Experiment finished in http://lord.arch/tests/1582: build timed out after 2h, 17/17 succeeded.
@mgriessmeier, please conduct the remaining experiments.
Updated by okurz over 8 years ago
Recent example on Build1636: https://openqa.suse.de/tests/454616 failed for the same reason, on openqaw1:1. Previous tests triggered from lord.arch couldn't reproduce this, but I have the free resource so I am making use of it: http://lord.arch/tests/1593 and following, with test "crosscheck_poo#12410@s390x-zVM-vswitch-l3"
… but it does not work: there are always problems connecting over ssh, and I don't know why.
Updated by okurz over 8 years ago
I am trying once more to reproduce this, this time based on the most recent failing example in Build 1648.
Triggered as http://lord.arch/tests/1825 and 20 following.
Updated by okurz over 8 years ago
- Related to action #12596: s390: wait serial output in "logpackages" and "consoletest_setup" is lost added
Updated by okurz over 8 years ago
- Related to action #12452: [s390x] mysql_srv: wait_serial expects regexp, serial0.log shows the right match, but test fails (timeout too short sometimes) added
Updated by okurz over 8 years ago
- Related to action #12300: [s390] can fail during formatting/wait_serial added
Updated by okurz over 8 years ago
- Related to deleted (action #12452: [s390x] mysql_srv: wait_serial expects regexp, serial0.log shows the right match, but test fails (timeout too short sometimes))
Updated by okurz over 8 years ago
- Blocks action #12452: [s390x] mysql_srv: wait_serial expects regexp, serial0.log shows the right match, but test fails (timeout too short sometimes) added
Updated by okurz over 8 years ago
- Related to deleted (action #12596: s390: wait serial output in "logpackages" and "consoletest_setup" is lost)
Updated by okurz over 8 years ago
- Blocked by action #12596: s390: wait serial output in "logpackages" and "consoletest_setup" is lost added
Updated by mgriessmeier about 8 years ago
- Description updated (diff)
Latest failing example:
https://openqa.suse.de/tests/539995
If you compare this step, https://openqa.suse.de/tests/539995#step/bootloader_s390/8,
to the same step of a passed job, e.g. https://openqa.suse.de/tests/538665#step/bootloader_s390/8,
you can see that iucvconn and agetty were somehow killed or not yet established, which results in the wait_serial issue.
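A comparison like this could be automated against the process listing from the debug output. The following is only a minimal sketch, assuming the listing is available as plain text: missing_processes is a hypothetical helper, and the sample listing is made up for illustration, not taken from either job.

```perl
use strict;
use warnings;

# Return the names from @$required that do not show up as a command in
# $ps_output. A name counts as present when it appears as a whole word,
# optionally preceded by a path (e.g. /sbin/agetty).
sub missing_processes {
    my ($ps_output, $required) = @_;
    my @missing;
    for my $name (@$required) {
        push @missing, $name
            unless $ps_output =~ m{(?:^|[\s/])\Q$name\E(?:\s|$)}m;
    }
    return @missing;
}

# Made-up sample listing (assumption): agetty is there, iucvconn is not.
my $sample = <<'END';
root   812  /sbin/agetty -L 9600 hvc0 linux
root   901  sshd: root@pts/0
END

my @missing = missing_processes($sample, ['iucvconn', 'agetty']);
print "missing: @missing\n" if @missing;
```

Note that a listing containing its own "grep iucvconn" line would count as a match, so the listing should be taken without such a filter.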
Updated by okurz about 8 years ago
- Blocks deleted (action #12452: [s390x] mysql_srv: wait_serial expects regexp, serial0.log shows the right match, but test fails (timeout too short sometimes))
Updated by mgriessmeier about 8 years ago
- Status changed from New to Feedback
Not seen for a long time, so considering this fixed.
Updated by mgriessmeier about 8 years ago
- Status changed from Feedback to Resolved