action #89614

coordination #39719: [saga][epic] Detection of "known failures" for stable tests, easy test results review and easy tracking of known issues

coordination #62420: [epic] Distinguish all types of incompletes

openQA workers on `ip-172-25-5-39` fail with no clue about the failure

Added by ggardet_arm 3 months ago. Updated 3 months ago.

Status: Resolved
Priority: High
Assignee:
Category: Support
Target version:
Start date: 2021-03-08
Due date:
% Done: 0%
Estimated time:
Difficulty:

Description

openQA workers on ip-172-25-5-39 fail without any obvious clue about the reason for the failure. This host is an aarch64 AWS A1 instance with 3 workers running on SLE15-SP1.

One occurrence of the failure: https://openqa.opensuse.org/tests/1660849

GOT GO

[2021-03-08T10:18:37.703 UTC] [debug] THERE IS NOTHING TO READ 4 5 4
myjsonrpc: remote end terminated connection, stopping at /usr/lib/os-autoinst/myjsonrpc.pm line 57, <$fh> line 78.
[2021-03-08T10:18:37.703 UTC] [debug] stopping backend process 2994
[2021-03-08T10:18:37.705 UTC] [debug] backend got TERM
[2021-03-08T10:18:37.705 UTC] [info] ::: OpenQA::Qemu::Proc::save_state: Saving QEMU state to qemu_state.json
[2021-03-08T10:18:38.714 UTC] [debug] flushing frames
[2021-03-08T10:18:38.836 UTC] [debug] QEMU: QEMU emulator version 3.1.1.1 (SUSE Linux Enterprise 15)
[2021-03-08T10:18:38.836 UTC] [debug] QEMU: Copyright (c) 2003-2018 Fabrice Bellard and the QEMU Project developers
[2021-03-08T10:18:38.836 UTC] [debug] QEMU: qemu-system-aarch64: terminating on signal 15 from pid 2994 (/usr/bin/isotovideo: backen)
[2021-03-08T10:18:38.838 UTC] [debug] sending magic and exit
[2021-03-08T10:18:39.005 UTC] [debug] done with backend process
[2021-03-08T10:18:39.005 UTC] [debug] stopping autotest process 2978
[2021-03-08T10:18:39.206 UTC] [debug] done with autotest process
2968: EXIT 0

For now I disabled the workers on this machine to avoid lots of test failures.

History

#1 Updated by okurz 3 months ago

  • Due date set to 2021-03-16
  • Category set to Support
  • Status changed from New to Feedback
  • Assignee set to okurz
  • Target version set to Ready
  • Parent task set to #62420

Judging by https://openqa.opensuse.org/admin/workers/270, this looks reproducible, right? Could you try to run a test locally with just isotovideo and a vars.json file?
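A local reproduction along these lines could be sketched as follows. isotovideo reads its settings from a vars.json file in the current working directory. The helper names and the setting values below (paths in particular) are placeholders and assumptions, not taken from the failing job; a real run would need the settings copied from that job's own vars.json.

```python
import json
import subprocess
from pathlib import Path


def write_vars(pool_dir, settings):
    """Write the settings as vars.json into pool_dir and return the file path."""
    pool = Path(pool_dir)
    pool.mkdir(parents=True, exist_ok=True)
    vars_file = pool / "vars.json"
    vars_file.write_text(json.dumps(settings, indent=2, sort_keys=True))
    return vars_file


def run_isotovideo(pool_dir, binary="isotovideo"):
    """Run isotovideo with pool_dir as working directory and return its exit code."""
    return subprocess.run([binary], cwd=pool_dir).returncode


# Hypothetical settings; copy the real ones from the failing job's vars.json.
EXAMPLE_SETTINGS = {
    "ARCH": "aarch64",
    "BACKEND": "qemu",
    "CASEDIR": "/path/to/test/distribution",  # placeholder path
    "ISO": "/path/to/image.iso",  # placeholder path
}
```

Calling `write_vars("/tmp/pool", EXAMPLE_SETTINGS)` followed by `run_isotovideo("/tmp/pool")` as a regular user would then exercise the backend without the openQA worker in between.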

We also have some other cases in #62420 regarding "we do not understand from logs what is going on".

Please also keep in mind that packages on SLE15-SP1 - if they are even up-to-date - are not properly tested and likely to have some problems.

#2 Updated by ggardet_arm 3 months ago

Yes, this is reproducible.

A manual call to isotovideo (as a regular user) fails the same way.

#3 Updated by okurz 3 months ago

  • Assignee changed from okurz to ggardet_arm
  • Target version changed from Ready to future

That's a step forward. I hope you understand that we rely heavily on the special environment where this happens, because so far I am unaware of any way to reproduce the same issue elsewhere.

I think what could be improved on the backend side, though, is the logging and the tracking of subprocesses, so that people can better understand the rather unclear text flow in this section:

GOT GO

[2021-03-08T10:18:37.703 UTC] [debug] THERE IS NOTHING TO READ 4 5 4
myjsonrpc: remote end terminated connection, stopping at /usr/lib/os-autoinst/myjsonrpc.pm line 57, <$fh> line 78.
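To illustrate why this section is hard to follow, the sketch below separates timestamped log lines from bare ones and reports each bare line together with the last timestamp seen before it. It is not part of os-autoinst; the function name and the regular expression are assumptions based only on the excerpt above.

```python
import re

# Matches lines such as "[2021-03-08T10:18:37.703 UTC] [debug] message"
TIMESTAMPED = re.compile(r"^\[(?P<time>[^\]]+)\] \[(?P<level>\w+)\] (?P<message>.*)$")


def unstructured_lines(log_text):
    """Return (last_seen_timestamp, line) for each non-empty line lacking a timestamp prefix."""
    found = []
    last_time = None
    for line in log_text.splitlines():
        match = TIMESTAMPED.match(line)
        if match:
            # Remember the timestamp so later bare lines can be placed in time.
            last_time = match.group("time")
        elif line.strip():
            found.append((last_time, line.strip()))
    return found
```

Run on the excerpt above, this would flag "GOT GO" and the myjsonrpc message as the lines that carry no timestamp or log level, which are the ones a reader cannot easily attribute to a subprocess.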

But again, the main problem I see here is that you are using SLE15-SP1, which is unfortunately unsupported. Could you switch to openSUSE Leap 15.2 for the complete system, or run the worker in a container of either openSUSE Leap 15.2 or openSUSE Tumbleweed? Alternatively, if you are interested in improving the support for SLE15-SP1 yourself, look into the unresolvables on https://build.opensuse.org/project/monitor/devel:openQA?arch_aarch64=1&blocked=1&broken=1&building=1&defaults=0&deleting=1&dispatching=1&failed=1&finished=1&locked=1&repo_SLE_15_SP1=1&scheduled=1&signing=1&succeeded=1&unresolvable=1

#4 Updated by ggardet_arm 3 months ago

Unfortunately, this hardware is pre-installed with SLE15-SP1. I may try a zypper dup to Leap 15.2.
Anyway, I created https://github.com/os-autoinst/openQA/pull/3777 to build openQA again for non-openSUSE targets.

#5 Updated by okurz 3 months ago

Both are good ideas :)

#6 Updated by okurz 3 months ago

  • Due date deleted (2021-03-16)

#7 Updated by ggardet_arm 3 months ago

  • Status changed from Feedback to Resolved

I upgraded to SLE15-SP2 and it fixed the problem: https://openqa.opensuse.org/tests/1678067
