action #37782: [kernel][functional][u][medium] test fails in execute_test_run because it cannot handle broken pipes - openQA Tests (public) - openSUSE Project Management Tool

action #37782

closed

[kernel][functional][u][medium] test fails in execute_test_run because it cannot handle broken pipes

Added by nicksinger almost 7 years ago. Updated over 6 years ago.

Status: Resolved
Priority: Normal
Assignee: yosun
Category: Bugs in existing tests
Target version: QA (public) - future
Start date: 2018-06-25
Due date:
% Done:
Estimated time:
Difficulty:

Description

Observation¶

openQA test in scenario sle-12-SP4-Server-DVD-s390x-fs_stress@s390x-kvm-sle12 fails in
execute_test_run due to a broken pipe.

Suggestions to improve this test¶

To me this looks like a timeout after copying many files around. IMHO this can always happen if we rely on long-lived open TCP sessions.
Without having looked into test_fs_stress-run, I'd assume it uses SSH. If so, one could try increasing the SSH timeout value (https://askubuntu.com/questions/127369/how-to-prevent-write-failed-broken-pipe-on-ssh-connection).
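
As a rough illustration of the timeout idea, the sketch below builds an ssh command line that enables client-side keepalive probes. ServerAliveInterval and ServerAliveCountMax are standard OpenSSH client options; the helper name build_ssh_command, the default values, and the host string are assumptions made up for this example, and it has not been checked whether the test drives SSH this way at all.

```python
from typing import List


def build_ssh_command(host: str, remote_cmd: str,
                      alive_interval: int = 60,
                      alive_count_max: int = 10) -> List[str]:
    """Return an ssh argv that sends keepalive probes on an otherwise idle session.

    The defaults are illustrative assumptions: with 60 s probes and 10 allowed
    misses, the client gives up only after roughly 10 minutes without a reply.
    """
    return [
        "ssh",
        "-o", f"ServerAliveInterval={alive_interval}",
        "-o", f"ServerAliveCountMax={alive_count_max}",
        host,
        remote_cmd,
    ]


if __name__ == "__main__":
    # Hypothetical host name, for demonstration only.
    print(" ".join(build_ssh_command("root@sut.example", "uptime")))
```

Keepalives only help if the session dies from idleness on the network path; they would not help if the remote sshd itself is killed or starved.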

Another idea would be to implement retries (e.g. only fail after 3 retries).
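
A minimal sketch of the retry idea, assuming the failing step can be expressed as a single command that is safe to run again. The function name run_with_retries, the delay, and the injectable runner and sleep parameters are hypothetical and exist only to keep the example testable; this is not how execute_test_run is implemented.

```python
import subprocess
import time
from typing import Callable, Sequence


def run_with_retries(cmd: Sequence[str], attempts: int = 3, delay: float = 5.0,
                     runner: Callable[..., subprocess.CompletedProcess] = subprocess.run,
                     sleep: Callable[[float], None] = time.sleep
                     ) -> subprocess.CompletedProcess:
    """Run cmd and retry on a non-zero exit status; fail only after `attempts` tries."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        result = runner(cmd, capture_output=True, text=True)
        if result.returncode == 0:
            return result
        if attempt < attempts:
            sleep(delay)  # give a flaky connection time to recover
    raise RuntimeError(
        f"command failed after {attempts} attempts: {result.stderr.strip()}")
```

Note that a retry re-runs the whole command, so for a long stress run it can hide a real problem or multiply the runtime; it fits short, idempotent steps better.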

Reproducible¶

So far this has failed only in build 0263, so it should be sporadic.

Related issues 1 (0 open — 1 closed)

Updated by nicksinger almost 7 years ago

Subject changed from [functional][s390x][medium] test fails in execute_test_run because it cannot handle broken pipes to [functional][medium][u] test fails in execute_test_run because it cannot handle broken pipes

Updated by okurz almost 7 years ago

Related to action #34012: [kernel] too generic test failure in "execute_test_run" for stress tests, was previously something more specific like "acceptance_fs_stress" added

Updated by okurz almost 7 years ago

Subject changed from [functional][medium][u] test fails in execute_test_run because it cannot handle broken pipes to [kernel][functional][u][medium] test fails in execute_test_run because it cannot handle broken pipes
Assignee set to yosun

Hi @yosun, as discussed in #34012 I assume you want to pick it up?

Updated by yosun almost 7 years ago

Assignee changed from yosun to okurz

Thanks for the info.
This failed with "packet_write_wait: Connection to 10.161.145.16 port 22: Broken pipe". I checked the serial log and it contains no oops or crash info. The failure happens while running "/usr/share/qa/tools/file_copy -j 4 -i 5 -s 5000", which means 4 parallel jobs and 5 iterations, each copying 5000 MB (5 GB) of files. Under such high fs stress, and with this test KVM having only 912 MB RAM, it is easy to trigger an OOM condition, after which the system kills processes at random, possibly including the ssh service or a related process.
The log only shows that the ssh connection was lost, so this is neither a kernel nor a test issue but a random one. The solution is to give this poor s390x KVM more RAM or disk space to avoid or reduce this kind of issue. We have reported a related issue as a bug to the Lab team, but the Lab's developer didn't take it because resources are too limited.
All in all, I suggest the tools team solve this by giving more resources to this KVM, or we just reject this kind of ticket.

Updated by okurz almost 7 years ago

Status changed from New to Feedback

yosun wrote:

We have reported a related issue as a bug to the Lab team, but the Lab's developer didn't take it because resources are too limited.

Which "Lab's developer" are you referring to? Do you have a ticket for that?

All in all, I suggest the tools team solve this by giving more resources to this KVM, or we just reject this kind of ticket.

I don't think this is related to the tools team because when we talk about KVM we just configure the machine accordingly. We could do that but I want to wait for your response first.

Updated by okurz almost 7 years ago

Target version set to future

Updated by yosun almost 7 years ago

I was wrong, the failure is not caused by an OOM issue. I checked the test code and the log at https://openqa.suse.de/tests/1777070/file/autoinst-log.txt again and found that this test randomly fails because the following line times out after 90 seconds:
assert_script_run("tar cjf $tarball -C /var/log/qa/ctcs2 ls /var/log/qa/ctcs2/");

It only fails after the fs_stress test, and this line runs right after the test finishes. I guess that after fs stress the system needs more time to get enough free space to create the log tarball. I think the solution is to add the following lines before the failing part:
# string comparison needs eq; == would compare numerically
if (get_var("QA_TESTSUITE") eq "fs_stress") {
    sleep 120;
}

Updated by okurz almost 7 years ago

Rather than such a long sleep, I guess we should wait for what we really need, e.g. check the free space and wait until enough is available again. Or just save to a different location, e.g. the RAM disk /dev/shm.
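
The "wait for what we really need" idea could look roughly like the sketch below: poll the free space on the target filesystem and continue as soon as a threshold is met, giving up after a deadline. The function name wait_for_free_space, the threshold, the timeout, and the polling interval are all assumptions for illustration; the real test is written in Perl and would need an equivalent loop there.

```python
import shutil
import time
from typing import Callable


def wait_for_free_space(path: str, min_free_bytes: int,
                        timeout: float = 120.0, interval: float = 5.0,
                        free_bytes: Callable[[str], int] = lambda p: shutil.disk_usage(p).free,
                        sleep: Callable[[float], None] = time.sleep,
                        clock: Callable[[], float] = time.monotonic) -> bool:
    """Poll until `path` has at least min_free_bytes free.

    Returns True as soon as the threshold is met and False once `timeout`
    seconds have passed without reaching it.
    """
    deadline = clock() + timeout
    while True:
        if free_bytes(path) >= min_free_bytes:
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


if __name__ == "__main__":
    # Example: wait up to two minutes for 1 GiB free on the root filesystem.
    print(wait_for_free_space("/", 1024 ** 3))
```

Compared with a fixed sleep 120, this returns immediately when space is already available and still bounds the worst-case wait.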

Updated by okurz almost 7 years ago

Assignee changed from okurz to yosun

@yosun anything else I could help with here?

#10

Updated by yosun almost 7 years ago

The suggestion in #8 is helpful, thanks! I still haven't found time to work on it, though. Maybe I can work on it during the SLE12SP4 RC period.

#11

Updated by yosun over 6 years ago

Status changed from Feedback to Resolved

Fixed by adding a sync before tarring the logs. I ran some tests and could not reproduce the issue. Feel free to reopen the ticket if it happens again.
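
The actual patch is not shown in this ticket. As a hedged illustration of the "sync before tar" pattern, the sketch below flushes pending writes and then packs the contents of a log directory into a bzip2 tarball, similar in spirit to the tar cjf call quoted earlier. The function name archive_logs and the paths in the usage example are assumptions.

```python
import os
import tarfile


def archive_logs(log_dir: str, tarball: str) -> str:
    """Flush pending writes, then pack the entries of log_dir into a bzip2 tarball."""
    # After heavy filesystem stress many dirty pages may still be queued;
    # syncing first keeps the archive step from stalling behind that writeback.
    os.sync()
    with tarfile.open(tarball, "w:bz2") as tar:
        for name in sorted(os.listdir(log_dir)):
            # Store entries relative to log_dir, like `tar -C log_dir`.
            tar.add(os.path.join(log_dir, name), arcname=name)
    return tarball


if __name__ == "__main__":
    # Hypothetical output path; the log directory is the one named in the ticket.
    archive_logs("/var/log/qa/ctcs2", "/tmp/ctcs2-logs.tar.bz2")
```

The sync itself may take a while after a stress run, so the surrounding step still needs a timeout generous enough to cover it.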
