action #12410

closed

s390 dasdfmt fails even though command looks complete in screenshot

Added by okurz over 7 years ago. Updated over 7 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: Bugs in existing tests
Target version: -
Start date: 2016-06-20
Due date:
% Done: 0%
Estimated time:
Difficulty:

Description

observation

https://openqa.suse.de/tests/448619: dasdfmt appears to have finished, but wait_serial still timed out. Maybe the formatting took unusually long?

steps to reproduce

TBC

problem

Happened sporadically during the last week. Hypotheses:
H1. REJECTED: worker/s390-host specific
H2. can happen everywhere
H3. our recent changes in bootloader_s390 introduced some behaviour change
H4. serial output gets lost
H4.1. REJECTED: output to serial gets lost randomly -> 1000/1000 runs of assert_script_run("echo $_", 10, 'failed'); succeeded, see http://opeth.suse.de/tests/2825
H4.2. REJECTED: long timeouts cause serial output loss -> 10/10 runs of assert_script_run("sleep 900 && echo $_", 1200, 'failed'); succeeded, see http://opeth.suse.de/tests/2825 and http://opeth.suse.de/tests/2866
H4.3. UNCLEAR: serial output only gets lost when dasdfmt is called with assert_script_run -> not reproducible at all on lord.arch, so E3-1 and E4-1 may be invalid
H4.4. iucvconn and agetty processes are not running <- most likely, since we can see this in our debug output
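The H4 family of hypotheses can be pictured with a simplified model of a wait_serial-style check. This is a rough sketch in Python, not openQA's actual implementation: `wait_for_marker`, the marker string and the two fake log readers are hypothetical names made up for illustration. It shows why a command that has finished on the system under test still looks like a timeout when its marker never reaches the serial log.

```python
import time
from typing import Callable


def wait_for_marker(read_serial: Callable[[], str], marker: str,
                    timeout: float, interval: float = 0.1) -> bool:
    """Poll the serial log until `marker` appears or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        if marker in read_serial():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Hypothetical serial log readers (assumptions, not real openQA output).
# Case 1: the command finished and its marker reached the serial log.
def healthy_log() -> str:
    return "some command output\nMARKER-0-\n"


# Case 2: the command finished as well, but nothing arrived on serial,
# e.g. because the process feeding the serial console is not running.
def lost_log() -> str:
    return ""


ok = wait_for_marker(healthy_log, "MARKER-0-", timeout=0.5)
lost = wait_for_marker(lost_log, "MARKER-0-", timeout=0.5)
print(ok, lost)  # prints "True False"
```

In the second case the caller cannot distinguish "command still running" from "output lost", which matches the observation that the screenshot shows a completed dasdfmt while the test fails.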

suggestion

  • check logfiles, e.g. for exact timing sequence -> wait_serial times out after 20 minutes on both occasions. From the video we can see that the actual formatting process had already finished before that
  • E1-1. DONE: reproduce by calling dasdfmt repeatedly on another host (my host (okurz)) -> done, could not reproduce in http://lord.arch/tests/1582 in 17/17 runs of full dasdfmt on personal instance
  • E2-1. DONE: find out if problem only occurs on some or a single host -> found 3 different hosts with this issue
  • E3-1. DONE: @mgriessmeier: find old test run before we deployed new backend that shows this error -> none found
  • E4-1. DONE: @mgriessmeier: find s390x host with small disk (to save time) and format many times, i.e. call for-loop with the assert_script_run on dasdfmt -> could not reproduce
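A further check that follows from H4.4 is to verify, before waiting on serial output, that the processes carrying the serial connection are alive. The following is a minimal sketch under stated assumptions: `missing_processes` is a hypothetical helper, and its input is assumed to be a process listing with one command name per line (for example what `ps -e -o comm=` prints). The sample listing is invented for illustration and is not taken from the failing jobs.

```python
from typing import Iterable, List


def missing_processes(ps_output: str,
                      required: Iterable[str] = ("iucvconn", "agetty")) -> List[str]:
    """Return the required process names that do not appear in the listing."""
    running = set()
    for line in ps_output.splitlines():
        name = line.strip()
        if name:
            # Keep only the basename in case a full path was listed.
            running.add(name.rsplit("/", 1)[-1])
    return [name for name in required if name not in running]


# Invented sample listing (assumption): agetty is present, iucvconn is not.
sample = "systemd\nagetty\nsshd\n"
print(missing_processes(sample))  # prints "['iucvconn']"
```

A non-empty result would point towards H4.4 rather than towards dasdfmt itself taking too long.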

workaround

The failure is sporadic; restart the test.


Related issues 2 (0 open, 2 closed)

Related to openQA Tests - action #12300: [s390] can fail during formatting/wait_serial (Resolved, okurz, 2016-05-24)
Blocked by openQA Tests - action #12596: s390: wait serial output in "logpackages" and "consoletest_setup" is lost (Resolved, mgriessmeier, 2016-07-04)
