Project

General

Profile

action #119713

Leap tests are failing because of failed log file uploading in multiple tests on s390x size:M

Added by AdaLovelace 3 months ago. Updated 2 months ago.

Status:
Resolved
Priority:
Urgent
Assignee:
Category:
Concrete Bugs
Target version:
Start date:
2022-11-01
Due date:
% Done:

0%

Estimated time:
Difficulty:

Description

Motivation

openQA tests are failing for openSUSE Leap 15.5 on s390x. The curiosity is, that log files can not be uploaded in multiple tests.
https://openqa.opensuse.org/tests/2842380#step/logs_from_installation_system/13

textinfo is affected with textinfo-logs.tar.bz2.
logs_from_installation_system is affected with y2logs.tar.bz2.

the test tries to reach 192.168.112.6:21013. IPv4 which might not work as the IPv4 address is from openqaworker1 but the openQA worker instance runs within a container. But https://openqa.opensuse.org/tests/2833511#step/logs_from_installation_system/12 on Tumbleweed seems to work from 2022-10-25, maybe broke after that?

Acceptance criteria

Suggestions


Related issues

Related to openQA Infrastructure - action #117631: Failed systemd service transactional-update on openqaworker1 - system is no longer reachable after reboot size:MResolved2022-10-052022-11-04

Related to openQA Infrastructure - action #116782: o3 s390 workers are offlineResolved2022-09-192022-10-04

Related to openQA Infrastructure - action #97751: replacement setup for o3 s390x openQA workers size:MResolved2021-09-17

Related to openQA Project - action #120405: Failed log file tests are transferred from Leap to Tumbleweed for s390x size:MResolved2022-11-14

History

#1 Updated by okurz 3 months ago

  • Priority changed from Normal to Urgent
  • Target version set to Ready

#2 Updated by okurz 3 months ago

  • Related to action #117631: Failed systemd service transactional-update on openqaworker1 - system is no longer reachable after reboot size:M added

#3 Updated by okurz 3 months ago

#4 Updated by okurz 3 months ago

  • Related to action #97751: replacement setup for o3 s390x openQA workers size:M added

#5 Updated by okurz 3 months ago

  • Subject changed from Leap tests are failing because of failed log file uploading in multiple tests on s390x to Leap tests are failing because of failed log file uploading in multiple tests on s390x size:M
  • Description updated (diff)
  • Status changed from New to Workable

#6 Updated by okurz 3 months ago

  • Status changed from Workable to In Progress
  • Assignee set to okurz

working on that within the mob session.

Triggered a rerun of the "last good" to see if it still works:

openqa-clone-job --within-instance https://openqa.opensuse.org/tests/2833511 BUILD= TEST=textmode-okurz _GROUP=0

-> Created job #2851630: opensuse-Tumbleweed-DVD-s390x-Build20221024-textmode@s390x-zVM-vswitch-l2 -> https://openqa.opensuse.org/t2851630

https://openqa.opensuse.org/tests/2842380#step/textinfo/10 shows that the curl command does not finish within a reasonable time. But later curl commands for smaller log files do succeed. Using the SSH config that is visible in https://openqa.opensuse.org/tests/2842380#step/bootloader_s390/17 we interactively logged into the test https://openqa.opensuse.org/tests/2851630# while it was running. The test by the way reproduced the problem so very likely the problem was introduced in the past days when dheidler and mkittler worked on forwarding the cache service into the s390x worker containers. Following the autoinst-log.txt file while we tried uploads we found:

Mojo::Reactor::Poll: I/O watcher failed: Error in tempfile() using template /var/lib/openqa/cache/tmp/mojo.tmp.XXXXXXXXXXXXXXXX: Could not create temp file /var/lib/openqa/cache/tmp/mojo.tmp.mbmTdPv6NyxTOBTy: Permission denied at /usr/lib/perl5/5.26.1/File/Temp.pm line 1102.
        File::Temp::tempfile("mojo.tmp.XXXXXXXXXXXXXXXX", "DIR", "/var/lib/openqa/cache/tmp") called at /usr/lib/perl5/5.26.1/File/Temp.pm l
ine 898
        File::Temp::new("File::Temp", "DIR", "/var/lib/openqa/cache/tmp", "TEMPLATE", "mojo.tmp.XXXXXXXXXXXXXXXX", "UNLINK", 0) called at /u
sr/lib/perl5/vendor_perl/5.26.1/Mojo/File.pm line 153
        Mojo::File::tempfile("DIR", "/var/lib/openqa/cache/tmp", "TEMPLATE", "mojo.tmp.XXXXXXXXXXXXXXXX", "UNLINK", 0) called at /usr/lib/pe
rl5/vendor_perl/5.26.1/Mojo/Asset/File.pm line 23
        Mojo::Asset::File::__ANON__(Mojo::Asset::File=HASH(0x562be3256808)) called at /usr/lib/perl5/vendor_perl/5.26.1/Mojo/Base.pm line 78
        Mojo::Asset::File::handle(Mojo::Asset::File=HASH(0x562be3256808)) called at /usr/lib/perl5/vendor_perl/5.26.1/Mojo/Asset/File.pm line 41

so the problem is unable to create temp files. Taking a look into the original job's log file we find
https://openqa.opensuse.org/tests/2842380/logfile?filename=autoinst-log.txt

Mojo::Reactor::Poll: I/O watcher failed: Error in tempfile() using template /var/lib/openqa/cache/tmp/mojo.tmp.XXXXXXXXXXXXXXXX: Could not create temp file /var/lib/openqa/cache/tmp/mojo.tmp.56k5Q5xGI4tiAW9g: Permission denied at /usr/lib/perl5/5.26.1/File/Temp.pm line 1102.
    File::Temp::tempfile("mojo.tmp.XXXXXXXXXXXXXXXX", "DIR", "/var/lib/openqa/cache/tmp") called at /usr/lib/perl5/5.26.1/File/Temp.pm line 898
    File::Temp::new("File::Temp", "DIR", "/var/lib/openqa/cache/tmp", "TEMPLATE", "mojo.tmp.XXXXXXXXXXXXXXXX", "UNLINK", 0) called at /usr/lib/perl5/vendor_perl/5.26.1/Mojo/File.pm line 153
    Mojo::File::tempfile("DIR", "/var/lib/openqa/cache/tmp", "TEMPLATE", "mojo.tmp.XXXXXXXXXXXXXXXX", "UNLINK", 0) called at /usr/lib/perl5/vendor_perl/5.26.1/Mojo/Asset/File.pm line 23

so the exact same error

#7 Updated by mkittler 3 months ago

  • Assignee deleted (okurz)

Error message we see on the worker:

openqaworker1_container:/var/lib/openqa/pool/104 # tail -f autoinst-log.txt                                                              

        autotest::__ANON__(Mojo::IOLoop::ReadWriteProcess=HASH(0x562be31010a8)) called at /usr/lib/perl5/vendor_perl/5.26.1/Mojo/IOLoop/R
eadWriteProcess.pm line 326
        eval {...} called at /usr/lib/perl5/vendor_perl/5.26.1/Mojo/IOLoop/ReadWriteProcess.pm line 326
        Mojo::IOLoop::ReadWriteProcess::_fork(Mojo::IOLoop::ReadWriteProcess=HASH(0x562be31010a8), CODE(0x562be31d27c8)) called at /usr/l
ib/perl5/vendor_perl/5.26.1/Mojo/IOLoop/ReadWriteProcess.pm line 488
        Mojo::IOLoop::ReadWriteProcess::start(Mojo::IOLoop::ReadWriteProcess=HASH(0x562be31010a8)) called at /usr/lib/os-autoinst/autotes
t.pm line 296
        autotest::start_process() called at /usr/bin/isotovideo line 273

[2022-11-03T13:31:36.497328Z] [debug] ||| finished logs_from_installation_system installation (runtime: 450 s)
[2022-11-03T13:31:36.497458Z] [debug] ||| post fail hooks runtime: 326 s
[2022-11-03T13:31:36.502084Z] [debug] ||| starting reboot_after_installation tests/installation/reboot_after_installation.pm
[2022-11-03T13:31:36.504036Z] [debug] isotovideo: pausing test execution of installation-reboot_after_installation because we're supposed
 to pause at this test module
Mojo::Reactor::Poll: I/O watcher failed: Error in tempfile() using template /var/lib/openqa/cache/tmp/mojo.tmp.XXXXXXXXXXXXXXXX: Could no
t create temp file /var/lib/openqa/cache/tmp/mojo.tmp.mbmTdPv6NyxTOBTy: Permission denied at /usr/lib/perl5/5.26.1/File/Temp.pm line 1102
.
        File::Temp::tempfile("mojo.tmp.XXXXXXXXXXXXXXXX", "DIR", "/var/lib/openqa/cache/tmp") called at /usr/lib/perl5/5.26.1/File/Temp.pm line 898
        File::Temp::new("File::Temp", "DIR", "/var/lib/openqa/cache/tmp", "TEMPLATE", "mojo.tmp.XXXXXXXXXXXXXXXX", "UNLINK", 0) called at /usr/lib/perl5/vendor_perl/5.26.1/Mojo/File.pm line 153
        Mojo::File::tempfile("DIR", "/var/lib/openqa/cache/tmp", "TEMPLATE", "mojo.tmp.XXXXXXXXXXXXXXXX", "UNLINK", 0) called at /usr/lib/perl5/vendor_perl/5.26.1/Mojo/Asset/File.pm line 23
        Mojo::Asset::File::__ANON__(Mojo::Asset::File=HASH(0x562be3256808)) called at /usr/lib/perl5/vendor_perl/5.26.1/Mojo/Base.pm line 78
        Mojo::Asset::File::handle(Mojo::Asset::File=HASH(0x562be3256808)) called at /usr/lib/perl5/vendor_perl/5.26.1/Mojo/Asset/File.pm line 41
        Mojo::Asset::File::add_chunk(Mojo::Asset::File=HASH(0x562be3256808), "BZh91AY\x{26}SY!\x{ac}1\"\x{3}\x{19}\x{14}\x{7f}\x{ff}\x{ff
}\x{ff}\x{ff}\x{ff}\x{ff}\x{ff}\x{ff}\x{ff}\x{ff}\x{ff}\x{ff}\x{ff}\x{ff}\x{ff}\x{ff}\x{ff}\x{ff}\x{ff}\x{ff}\x{ff}\x{ff}\x{ff}\x{ff}\x{f
f}\x{ff}\x{ff}\x{ff}\x{ff}\x{ff}\x{ff}\x{ff}\x{ff}\x{e6}\x{ed}\x{df}\x{2}\x{dd}\x{b7}{}\x{cf}3"...) called at /usr/lib/perl5/vendor_perl/
5.26.1/Mojo/Asset/Memory.pm line 18
        Mojo::Asset::Memory::add_chunk(Mojo::Asset::Memory=HASH(0x562be325cb18), "I\x{81}\x{b0}-\x{1c}\x{fd}\x{a7}r\x{f2}*\x{88}\x{be}\x{
a1}\x{dc}\x{8f}\x{e3}m\x{aa}t\x{cd}\x{81}\x{bc}V\x{c}\x{97}\x{c5}\x{f4}u\x{9b}0\x{a2}o\"\x{d4}G)\x{1b}\x{8a}J\x{ea}N\x{fa}cj\x{88}8a\x{df
}V{\x{1c}\x{83},\x{ed}[\x{9c}h\x{8e}\x{85}\x{a4}\x{be}"...) called at /usr/lib/perl5/vendor_perl/5.26.1/Mojo/Content/Single.pm line 32
        Mojo::Content::Single::__ANON__(Mojo::Content::Single=HASH(0x562be2eb3b80), "I\x{81}\x{b0}-\x{1c}\x{fd}\x{a7}r\x{f2}*\x{88}\x{be}
\x{a1}\x{dc}\x{8f}\x{e3}m\x{aa}t\x{cd}\x{81}\x{bc}V\x{c}\x{97}\x{c5}\x{f4}u\x{9b}0\x{a2}o\"\x{d4}G)\x{1b}\x{8a}J\x{ea}N\x{fa}cj\x{88}8a\x
{df}V{\x{1c}\x{83},\x{ed}[\x{9c}h\x{8e}\x{85}\x{a4}\x{be}"...) called at /usr/lib/perl5/vendor_perl/5.26.1/Mojo/EventEmitter.pm line 15
        Mojo::EventEmitter::emit(Mojo::Content::Single=HASH(0x562be2eb3b80), "read", "I\x{81}\x{b0}-\x{1c}\x{fd}\x{a7}r\x{f2}*\x{88}\x{be

#8 Updated by okurz 3 months ago

  • Due date set to 2022-11-17
  • Status changed from In Progress to Feedback
  • Assignee set to okurz

https://github.com/os-autoinst/openQA/pull/4884

https://openqa.opensuse.org/tests/2851985#live is a verification run to show the code in hot-patched production.

https://openqa.opensuse.org/tests/2851985#step/logs_from_installation_system/12 shows the test passing the upload attempt.

We should still ensure that the current cache service implementation does the efficient approach using a temporary directory on the same filesystem where final assets end up.

Actually we verified that https://github.com/os-autoinst/openQA/blob/master/lib/OpenQA/CacheService/Model/Cache.pm#L113 does that with

my $downloader = $self->downloader->log($log)->tmpdir($self->_realpath->child('tmp')->to_string);

so all good.

#10 Updated by okurz 3 months ago

  • Due date deleted (2022-11-17)
  • Status changed from Feedback to Resolved

#11 Updated by okurz 2 months ago

  • Related to action #120405: Failed log file tests are transferred from Leap to Tumbleweed for s390x size:M added

#12 Updated by jstehlik 2 months ago

Lubos Kocman state sin his e-mail, that the issues is fixed only for single test running. When multiple tests run, the issue persists. Also tumbleweed seems to be affected:
https://progress.opensuse.org/issues/120405

Also available in: Atom PDF