action #119713
closedLeap tests are failing because of failed log file uploading in multiple tests on s390x size:M
Description
Motivation¶
openQA tests are failing for openSUSE Leap 15.5 on s390x. The curiosity is, that log files can not be uploaded in multiple tests.
https://openqa.opensuse.org/tests/2842380#step/logs_from_installation_system/13
textinfo is affected with textinfo-logs.tar.bz2.
logs_from_installation_system is affected with y2logs.tar.bz2.
the test tries to reach 192.168.112.6:21013. IPv4 which might not work as the IPv4 address is from openqaworker1 but the openQA worker instance runs within a container. But https://openqa.opensuse.org/tests/2833511#step/logs_from_installation_system/12 on Tumbleweed seems to work from 2022-10-25, maybe broke after that?
Acceptance criteria¶
- AC1: The scenario https://openqa.opensuse.org/tests/latest?arch=s390x&distri=opensuse&flavor=DVD&machine=s390x-zVM-vswitch-l2&test=textmode&version=15.5 passes again
Suggestions¶
- Investigate the difference between the mentioned bad and https://openqa.opensuse.org/tests/2833511#step/logs_from_installation_system/12 on Tumbleweed which still works fine
- Look into related issues for background stories
Updated by okurz about 2 years ago
- Priority changed from Normal to Urgent
- Target version set to Ready
Updated by okurz about 2 years ago
- Related to action #117631: Failed systemd service transactional-update on openqaworker1 - system is no longer reachable after reboot size:M added
Updated by okurz about 2 years ago
- Related to action #116782: o3 s390 workers are offline added
Updated by okurz about 2 years ago
- Related to action #97751: replacement setup for o3 s390x openQA workers size:M added
Updated by okurz about 2 years ago
- Subject changed from Leap tests are failing because of failed log file uploading in multiple tests on s390x to Leap tests are failing because of failed log file uploading in multiple tests on s390x size:M
- Description updated (diff)
- Status changed from New to Workable
Updated by okurz about 2 years ago
- Status changed from Workable to In Progress
- Assignee set to okurz
working on that within the mob session.
Triggered a rerun of the "last good" to see if it still works:
openqa-clone-job --within-instance https://openqa.opensuse.org/tests/2833511 BUILD= TEST=textmode-okurz _GROUP=0
-> Created job #2851630: opensuse-Tumbleweed-DVD-s390x-Build20221024-textmode@s390x-zVM-vswitch-l2 -> https://openqa.opensuse.org/t2851630
https://openqa.opensuse.org/tests/2842380#step/textinfo/10 shows that the curl command does not finish within a reasonable time. But later curl commands for smaller log files do succeed. Using the SSH config that is visible in https://openqa.opensuse.org/tests/2842380#step/bootloader_s390/17 we interactively logged into the test https://openqa.opensuse.org/tests/2851630# while it was running. The test by the way reproduced the problem so very likely the problem was introduced in the past days when dheidler and mkittler worked on forwarding the cache service into the s390x worker containers. Following the autoinst-log.txt file while we tried uploads we found:
Mojo::Reactor::Poll: I/O watcher failed: Error in tempfile() using template /var/lib/openqa/cache/tmp/mojo.tmp.XXXXXXXXXXXXXXXX: Could not create temp file /var/lib/openqa/cache/tmp/mojo.tmp.mbmTdPv6NyxTOBTy: Permission denied at /usr/lib/perl5/5.26.1/File/Temp.pm line 1102.
File::Temp::tempfile("mojo.tmp.XXXXXXXXXXXXXXXX", "DIR", "/var/lib/openqa/cache/tmp") called at /usr/lib/perl5/5.26.1/File/Temp.pm l
ine 898
File::Temp::new("File::Temp", "DIR", "/var/lib/openqa/cache/tmp", "TEMPLATE", "mojo.tmp.XXXXXXXXXXXXXXXX", "UNLINK", 0) called at /u
sr/lib/perl5/vendor_perl/5.26.1/Mojo/File.pm line 153
Mojo::File::tempfile("DIR", "/var/lib/openqa/cache/tmp", "TEMPLATE", "mojo.tmp.XXXXXXXXXXXXXXXX", "UNLINK", 0) called at /usr/lib/pe
rl5/vendor_perl/5.26.1/Mojo/Asset/File.pm line 23
Mojo::Asset::File::__ANON__(Mojo::Asset::File=HASH(0x562be3256808)) called at /usr/lib/perl5/vendor_perl/5.26.1/Mojo/Base.pm line 78
Mojo::Asset::File::handle(Mojo::Asset::File=HASH(0x562be3256808)) called at /usr/lib/perl5/vendor_perl/5.26.1/Mojo/Asset/File.pm line 41
so the problem is unable to create temp files. Taking a look into the original job's log file we find
https://openqa.opensuse.org/tests/2842380/logfile?filename=autoinst-log.txt
Mojo::Reactor::Poll: I/O watcher failed: Error in tempfile() using template /var/lib/openqa/cache/tmp/mojo.tmp.XXXXXXXXXXXXXXXX: Could not create temp file /var/lib/openqa/cache/tmp/mojo.tmp.56k5Q5xGI4tiAW9g: Permission denied at /usr/lib/perl5/5.26.1/File/Temp.pm line 1102.
File::Temp::tempfile("mojo.tmp.XXXXXXXXXXXXXXXX", "DIR", "/var/lib/openqa/cache/tmp") called at /usr/lib/perl5/5.26.1/File/Temp.pm line 898
File::Temp::new("File::Temp", "DIR", "/var/lib/openqa/cache/tmp", "TEMPLATE", "mojo.tmp.XXXXXXXXXXXXXXXX", "UNLINK", 0) called at /usr/lib/perl5/vendor_perl/5.26.1/Mojo/File.pm line 153
Mojo::File::tempfile("DIR", "/var/lib/openqa/cache/tmp", "TEMPLATE", "mojo.tmp.XXXXXXXXXXXXXXXX", "UNLINK", 0) called at /usr/lib/perl5/vendor_perl/5.26.1/Mojo/Asset/File.pm line 23
so the exact same error
Updated by mkittler about 2 years ago
- Assignee deleted (
okurz)
Error message we see on the worker:
openqaworker1_container:/var/lib/openqa/pool/104 # tail -f autoinst-log.txt
autotest::__ANON__(Mojo::IOLoop::ReadWriteProcess=HASH(0x562be31010a8)) called at /usr/lib/perl5/vendor_perl/5.26.1/Mojo/IOLoop/R
eadWriteProcess.pm line 326
eval {...} called at /usr/lib/perl5/vendor_perl/5.26.1/Mojo/IOLoop/ReadWriteProcess.pm line 326
Mojo::IOLoop::ReadWriteProcess::_fork(Mojo::IOLoop::ReadWriteProcess=HASH(0x562be31010a8), CODE(0x562be31d27c8)) called at /usr/l
ib/perl5/vendor_perl/5.26.1/Mojo/IOLoop/ReadWriteProcess.pm line 488
Mojo::IOLoop::ReadWriteProcess::start(Mojo::IOLoop::ReadWriteProcess=HASH(0x562be31010a8)) called at /usr/lib/os-autoinst/autotes
t.pm line 296
autotest::start_process() called at /usr/bin/isotovideo line 273
[2022-11-03T13:31:36.497328Z] [debug] ||| finished logs_from_installation_system installation (runtime: 450 s)
[2022-11-03T13:31:36.497458Z] [debug] ||| post fail hooks runtime: 326 s
[2022-11-03T13:31:36.502084Z] [debug] ||| starting reboot_after_installation tests/installation/reboot_after_installation.pm
[2022-11-03T13:31:36.504036Z] [debug] isotovideo: pausing test execution of installation-reboot_after_installation because we're supposed
to pause at this test module
Mojo::Reactor::Poll: I/O watcher failed: Error in tempfile() using template /var/lib/openqa/cache/tmp/mojo.tmp.XXXXXXXXXXXXXXXX: Could no
t create temp file /var/lib/openqa/cache/tmp/mojo.tmp.mbmTdPv6NyxTOBTy: Permission denied at /usr/lib/perl5/5.26.1/File/Temp.pm line 1102
.
File::Temp::tempfile("mojo.tmp.XXXXXXXXXXXXXXXX", "DIR", "/var/lib/openqa/cache/tmp") called at /usr/lib/perl5/5.26.1/File/Temp.pm line 898
File::Temp::new("File::Temp", "DIR", "/var/lib/openqa/cache/tmp", "TEMPLATE", "mojo.tmp.XXXXXXXXXXXXXXXX", "UNLINK", 0) called at /usr/lib/perl5/vendor_perl/5.26.1/Mojo/File.pm line 153
Mojo::File::tempfile("DIR", "/var/lib/openqa/cache/tmp", "TEMPLATE", "mojo.tmp.XXXXXXXXXXXXXXXX", "UNLINK", 0) called at /usr/lib/perl5/vendor_perl/5.26.1/Mojo/Asset/File.pm line 23
Mojo::Asset::File::__ANON__(Mojo::Asset::File=HASH(0x562be3256808)) called at /usr/lib/perl5/vendor_perl/5.26.1/Mojo/Base.pm line 78
Mojo::Asset::File::handle(Mojo::Asset::File=HASH(0x562be3256808)) called at /usr/lib/perl5/vendor_perl/5.26.1/Mojo/Asset/File.pm line 41
Mojo::Asset::File::add_chunk(Mojo::Asset::File=HASH(0x562be3256808), "BZh91AY\x{26}SY!\x{ac}1\"\x{3}\x{19}\x{14}\x{7f}\x{ff}\x{ff
}\x{ff}\x{ff}\x{ff}\x{ff}\x{ff}\x{ff}\x{ff}\x{ff}\x{ff}\x{ff}\x{ff}\x{ff}\x{ff}\x{ff}\x{ff}\x{ff}\x{ff}\x{ff}\x{ff}\x{ff}\x{ff}\x{ff}\x{f
f}\x{ff}\x{ff}\x{ff}\x{ff}\x{ff}\x{ff}\x{ff}\x{ff}\x{e6}\x{ed}\x{df}\x{2}\x{dd}\x{b7}{}\x{cf}3"...) called at /usr/lib/perl5/vendor_perl/
5.26.1/Mojo/Asset/Memory.pm line 18
Mojo::Asset::Memory::add_chunk(Mojo::Asset::Memory=HASH(0x562be325cb18), "I\x{81}\x{b0}-\x{1c}\x{fd}\x{a7}r\x{f2}*\x{88}\x{be}\x{
a1}\x{dc}\x{8f}\x{e3}m\x{aa}t\x{cd}\x{81}\x{bc}V\x{c}\x{97}\x{c5}\x{f4}u\x{9b}0\x{a2}o\"\x{d4}G)\x{1b}\x{8a}J\x{ea}N\x{fa}cj\x{88}8a\x{df
}V{\x{1c}\x{83},\x{ed}[\x{9c}h\x{8e}\x{85}\x{a4}\x{be}"...) called at /usr/lib/perl5/vendor_perl/5.26.1/Mojo/Content/Single.pm line 32
Mojo::Content::Single::__ANON__(Mojo::Content::Single=HASH(0x562be2eb3b80), "I\x{81}\x{b0}-\x{1c}\x{fd}\x{a7}r\x{f2}*\x{88}\x{be}
\x{a1}\x{dc}\x{8f}\x{e3}m\x{aa}t\x{cd}\x{81}\x{bc}V\x{c}\x{97}\x{c5}\x{f4}u\x{9b}0\x{a2}o\"\x{d4}G)\x{1b}\x{8a}J\x{ea}N\x{fa}cj\x{88}8a\x
{df}V{\x{1c}\x{83},\x{ed}[\x{9c}h\x{8e}\x{85}\x{a4}\x{be}"...) called at /usr/lib/perl5/vendor_perl/5.26.1/Mojo/EventEmitter.pm line 15
Mojo::EventEmitter::emit(Mojo::Content::Single=HASH(0x562be2eb3b80), "read", "I\x{81}\x{b0}-\x{1c}\x{fd}\x{a7}r\x{f2}*\x{88}\x{be
Updated by okurz about 2 years ago
- Due date set to 2022-11-17
- Status changed from In Progress to Feedback
- Assignee set to okurz
https://github.com/os-autoinst/openQA/pull/4884
https://openqa.opensuse.org/tests/2851985#live is a verification run to show the code in hot-patched production.
https://openqa.opensuse.org/tests/2851985#step/logs_from_installation_system/12 shows the test passing the upload attempt.
We should still ensure that the current cache service implementation does the efficient approach using a temporary directory on the same filesystem where final assets end up.
Actually we verified that https://github.com/os-autoinst/openQA/blob/master/lib/OpenQA/CacheService/Model/Cache.pm#L113 does that with
my $downloader = $self->downloader->log($log)->tmpdir($self->_realpath->child('tmp')->to_string);
so all good.
Updated by okurz about 2 years ago
Updated by okurz almost 2 years ago
- Due date deleted (
2022-11-17) - Status changed from Feedback to Resolved
PR merged. https://openqa.opensuse.org/tests/2864552 is passed
Updated by okurz almost 2 years ago
- Related to action #120405: Failed log file tests are transferred from Leap to Tumbleweed for s390x size:M added
Updated by jstehlik almost 2 years ago
Lubos Kocman state sin his e-mail, that the issues is fixed only for single test running. When multiple tests run, the issue persists. Also tumbleweed seems to be affected:
https://progress.opensuse.org/issues/120405