action #63373

[o3][kernel][scheduler][x86_64] Dependent (child) jobs should start after uploading all of parent assets

Added by pvorel over 1 year ago. Updated 11 months ago.

Status: Resolved
Priority: High
Assignee:
Category: Bugs in existing tests
Target version: QE Kernel - QE Kernel Done
Start date: 2020-02-11
Due date:
% Done: 0%
Estimated time:
Difficulty: medium

Description

LTP tests depend on install_ltp. On o3, child jobs start after the parent's tests have finished, but before the parent has uploaded the needed dependencies. The child job does not wait until ltp-cpuhotplug-opensuse-Tumbleweed-x86_64-20200209-DVD@64bit-with-ltp-qcow2.txt has been uploaded, and this file cannot be expressed as a dependency in vars.json. But opensuse-Tumbleweed-x86_64-20200209-DVD@64bit-with-ltp.qcow2, which is HDD_1 for the child test (PUBLISH_HDD_1 for the parent), has not been uploaded yet either. The parent uploaded the needed asset at 22:45:53, but the test started at 22:10:24. Is this a setup problem or a bug in the scheduler? A similar setup on osd looks OK.

install_ltp.1169614.autoinst-log.txt (https://openqa.opensuse.org/tests/1169614/file/autoinst-log.txt)

[2020-02-10T22:40:43.0499 CET] [info] +++ setup notes +++
[2020-02-10T22:40:43.0499 CET] [info] Start time: 2020-02-10 21:40:43
...
[2020-02-10T22:45:52.0451 CET] [info] Isotovideo exit status: 0
[2020-02-10T22:45:52.0478 CET] [info] +++ worker notes +++
[2020-02-10T22:45:52.0478 CET] [info] End time: 2020-02-10 21:45:52
...
[2020-02-10T22:45:53.0303 CET] [info] Uploading ltp-cpuhotplug-opensuse-Tumbleweed-x86_64-20200209-DVD@64bit-with-ltp-qcow2.txt
...
[2020-02-10T22:46:01.0131 CET] [info] Uploading opensuse-Tumbleweed-x86_64-20200209-DVD@64bit-with-ltp.qcow2

install_ltp.1169614.worker-log.txt (https://openqa.opensuse.org/tests/1169614/file/worker-log.txt)

[2020-02-10T22:45:53.0303 CET] [info] Uploading ltp-cpuhotplug-opensuse-Tumbleweed-x86_64-20200209-DVD@64bit-with-ltp-qcow2.txt
[2020-02-10T22:45:53.0303 CET] [info] Uploading ltp-cpuhotplug-opensuse-Tumbleweed-x86_64-20200209-DVD@64bit-with-ltp-qcow2.txt using multiple chunks
[2020-02-10T22:45:53.0304 CET] [info] ltp-cpuhotplug-opensuse-Tumbleweed-x86_64-20200209-DVD@64bit-with-ltp-qcow2.txt: 1 chunks
[2020-02-10T22:45:53.0304 CET] [info] ltp-cpuhotplug-opensuse-Tumbleweed-x86_64-20200209-DVD@64bit-with-ltp-qcow2.txt: chunks of 1000000 bytes each
[2020-02-10T22:45:53.0383 CET] [info] ltp-cpuhotplug-opensuse-Tumbleweed-x86_64-20200209-DVD@64bit-with-ltp-qcow2.txt: Processing chunk 1/1 avg speed ~0.336KB/s
...
[2020-02-10T22:48:23.0775 CET] [info] opensuse-Tumbleweed-x86_64-20200209-DVD@64bit-with-ltp.qcow2: Processing chunk 1128/1128 avg speed ~342.062KB/s

ltp_cpuhotplug.1169623.autoinst-log.txt (https://openqa.opensuse.org/tests/1169623/file/autoinst-log.txt)

[2020-02-10T22:10:24.0956 UTC] [info] Start time: 2020-02-10 22:10:24
...
[2020-02-10T22:10:35.0098 UTC] [debug] Found ISO, caching openSUSE-Tumbleweed-DVD-x86_64-Snapshot20200209-Media.iso
[2020-02-10T22:10:35.0102 UTC] [info] Downloading openSUSE-Tumbleweed-DVD-x86_64-Snapshot20200209-Media.iso, request #27 sent to Cache Service
[2020-02-10T22:10:45.0203 UTC] [info] Download of openSUSE-Tumbleweed-DVD-x86_64-Snapshot20200209-Media.iso processed:
[info] [#27] Cache size of "/var/lib/openqa/cache" is 12GiB, with limit 50GiB
[info] [#27] Downloading "openSUSE-Tumbleweed-DVD-x86_64-Snapshot20200209-Media.iso" from "http://openqa1-opensuse/tests/1169623/asset/iso/openSUSE-Tumbleweed-DVD-x86_64-Snapshot20200209-Media.iso"
...
[2020-02-10T22:10:50.896 UTC] [debug] scheduling boot_ltp tests/kernel/boot_ltp.pm
Can not open runtest asset /var/lib/openqa/share/factory/other/ltp-cpuhotplug-opensuse-Tumbleweed-x86_64-20200209-DVD@64bit-with-ltp-qcow2.txt: No such file or directory at /var/lib/openqa/cache/openqa1-opensuse/tests/opensuse/lib/main_ltp.pm line 64.
Compilation failed in require at /usr/bin/isotovideo line 288.
[2020-02-10T22:10:50.896 UTC] [debug] terminating command server 4156 because test execution ended through exception
[2020-02-10T22:10:51.897 UTC] [debug] done with command server
4153: EXIT 1

ltp_cpuhotplug.1169623.worker-log.txt (https://openqa.opensuse.org/tests/1169623/file/worker-log.txt)

[2020-02-10T22:10:50.0261 UTC] [info] Preparing cgroup to start isotovideo

install_ltp.1169614.autoinst-log.txt (417 KB): parent's autoinst-log.txt (pvorel, 2020-02-11 07:19)
install_ltp.1169614.worker-log.txt (221 KB): parent's worker-log.txt (pvorel, 2020-02-11 07:19)
ltp_cpuhotplug.1169623.autoinst-log.txt (5.35 KB): child's autoinst-log.txt (pvorel, 2020-02-11 07:20)
ltp_cpuhotplug.1169623.worker-log.txt (606 Bytes): child's worker-log.txt (pvorel, 2020-02-11 07:20)
install_ltp.1169614.vars.json (3.81 KB): parent's vars (pvorel, 2020-02-11 07:32)
ltp_cpuhotplug.1169623.vars.json (3.03 KB): child's vars (pvorel, 2020-02-11 07:33)
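The parent's log lines are stamped in CET, while the child's are stamped in UTC (the parent's own "Start time"/"End time" fields, which are UTC, sit one hour behind its CET log stamps). Below is a minimal Python sketch that puts the quoted timestamps on one clock before comparing them. It assumes CET is a fixed UTC+1 offset on this date; the variable names are illustrative only.

```python
from datetime import datetime, timedelta, timezone

# Assumption: CET is a fixed UTC+1 offset here (no daylight saving in February).
CET = timezone(timedelta(hours=1), "CET")
UTC = timezone.utc

# Timestamps copied from the log excerpts in the description.
parent_upload_start = datetime(2020, 2, 10, 22, 45, 53, tzinfo=CET)  # first "Uploading" line
parent_upload_end = datetime(2020, 2, 10, 22, 48, 23, tzinfo=CET)    # last qcow2 chunk
child_start = datetime(2020, 2, 10, 22, 10, 24, tzinfo=UTC)          # child "Start time"

for label, ts in [
    ("parent upload start", parent_upload_start),
    ("parent upload end", parent_upload_end),
    ("child start", child_start),
]:
    # Normalise every timestamp to UTC before comparing.
    print(f"{label:<20} {ts.astimezone(UTC):%Y-%m-%d %H:%M:%S} UTC")

# Aware datetimes in different zones can be subtracted directly.
print("child started", child_start - parent_upload_end, "after the parent's last upload chunk")
```

Under that assumption, the parent's uploads ran from 21:45:53 to 21:48:23 UTC, and the child started 0:22:01 after the last qcow2 chunk. The ordering alone therefore does not explain why the runtest file was missing on the child's worker.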

Related issues

Related to openQA Tests - action #51743: [openqa] All LTP tests are failing on boot_ltp for openSUSE (o3) on [x86_64] (Resolved, 2019-05-21)

History

#1 Updated by pvorel over 1 year ago

  • Related to action #51743: [openqa] All LTP tests are failing on boot_ltp for openSUSE (o3) on [x86_64] added

#2 Updated by pvorel over 1 year ago

  • Description updated (diff)

#3 Updated by pvorel over 1 year ago

  • Subject changed from [o3][scheduler] Dependent (child) jobs should start after uploading all of parent assets to [o3][scheduler][x86_64] Dependent (child) jobs should start after uploading all of parent assets

#4 Updated by rpalethorpe over 1 year ago

IIRC the problem here is that os-autoinst generates the list of assets from the SUT (specifically from the LTP package/source). openQA therefore does not know which assets to expect ahead of time.

Probably the easiest solution is to create a runtest archive; then openQA only needs to expect one asset and can wait for it (this is also easier on openQA's database).

Another problem might be that openQA only waits for VM image type assets, so some extra work may be required in the asset handling code.
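The runtest-archive idea could look roughly like the following Python sketch. It is illustrative only: the function names, the tar.gz format and the sample runtest names and contents are hypothetical, not what os-autoinst or os-autoinst-distri-opensuse actually implement. The point it shows is that many per-runtest files collapse into a single blob whose name can be fixed in advance, so there is only one asset to wait for.

```python
from __future__ import annotations

import io
import tarfile


def pack_runtest_files(files: dict[str, bytes]) -> bytes:
    """Bundle many runtest files into a single gzip-compressed tar blob."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, data in sorted(files.items()):
            info = tarfile.TarInfo(name=name)
            info.size = len(data)  # addfile() reads exactly this many bytes
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def read_runtest_file(archive: bytes, name: str) -> bytes:
    """Return one runtest file from the archive; raises KeyError if it is absent."""
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r:gz") as tar:
        member = tar.extractfile(name)  # raises KeyError if no such member
        if member is None:  # member exists but is not a regular file
            raise KeyError(name)
        return member.read()


if __name__ == "__main__":
    # Hypothetical runtest names and contents, for illustration only.
    archive = pack_runtest_files({
        "cpuhotplug": b"test_a cmd_a\n",
        "syscalls": b"test_b cmd_b\n",
    })
    print(read_runtest_file(archive, "cpuhotplug").decode(), end="")
```

A missing runtest then surfaces as a clear KeyError when reading from the one archive, rather than as one of many separately uploaded files that the scheduler never knew about.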

#5 Updated by okurz over 1 year ago

  • Project changed from openQA Project to openQA Tests
  • Subject changed from [o3][scheduler][x86_64] Dependent (child) jobs should start after uploading all of parent assets to [o3][kernel][scheduler][x86_64] Dependent (child) jobs should start after uploading all of parent assets
  • Category changed from Concrete Bugs to Bugs in existing tests

rpalethorpe pretty much nailed it. I don't know what openQA could do better when it does not know about the assets from the beginning. I'm not sure you can even call them "parent assets" in the current implementation. So I see this as an issue for "openQA tests" rather than for openQA itself.

#6 Updated by pvorel over 1 year ago

  • Priority changed from Normal to High
  • Difficulty set to medium

We will have to implement it to get testing on o3 on Intel back. BTW, I wonder what is different on o3 (all LTP run problems occur only on o3 on Intel).

BTW, it would be nice if the LTP way of testing were better integrated into openQA. I consider the LTP-related changes improvements, and other tests might benefit from them as well, but I understand that neither the tools team nor kernel-qa has the resources for this better integration.

#8 Updated by pvorel over 1 year ago

Although https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/9747 is a general improvement, it will probably not fix o3.
MDoucha noticed that the problem occurs only on openqaworker7, and indeed openqaworker1 and openqaworker4 are OK.

I wanted to check NFS on openqaworker7, but password authentication is not allowed (Permission denied (publickey)). That suggests something is different on openqaworker7.

#9 Updated by pvorel over 1 year ago

  • Status changed from New to Feedback

openqaworker7 was fixed by okurz (thanks!):

MDoucha: I have fixed the NFS mount on openqaworker7. This was an oversight on my part when setting up the machine for o3 some days ago. The machine is within the o3 VLAN and hence not reachable from internal SUSE, same as openqaworker1, openqaworker4 and others.

So it should be fixed in the next build.
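The child's error points at /var/lib/openqa/share/factory/other/, which lives on the shared NFS mount, so a missing mount looks to the test exactly like an asset that was never uploaded. Below is a minimal Python sketch of a worker-side check that tells the two cases apart. It assumes the share is mounted directly at /var/lib/openqa/share, and the helper name is hypothetical. Note that os.path.ismount only says that something is mounted there; it does not verify that the mount is NFS.

```python
import os

# Path layout taken from the error message in the child's log:
# /var/lib/openqa/share/factory/other/<runtest asset>
SHARE_ROOT = "/var/lib/openqa/share"


def diagnose_runtest_asset(share_root: str, asset_name: str) -> str:
    """Return a one-line diagnosis for a runtest asset the test cannot open."""
    if not os.path.ismount(share_root):
        # Only says nothing is mounted here; it does not check the filesystem type.
        return f"{share_root} is not a mount point (missing NFS mount?)"
    path = os.path.join(share_root, "factory", "other", asset_name)
    if not os.path.isfile(path):
        return f"{path} is missing (not uploaded yet?)"
    return f"{path} is present"


if __name__ == "__main__":
    asset = "ltp-cpuhotplug-opensuse-Tumbleweed-x86_64-20200209-DVD@64bit-with-ltp-qcow2.txt"
    print(diagnose_runtest_asset(SHARE_ROOT, asset))
```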

#10 Updated by pvorel over 1 year ago

  • Assignee set to pvorel
  • Target version set to 445

#11 Updated by pvorel over 1 year ago

  • Status changed from Feedback to Resolved

Worker fixed: https://openqa.opensuse.org/tests/1200301
This also verifies that the new code works.

#12 Updated by metan over 1 year ago

  • Target version changed from 445 to 457

#13 Updated by pcervinka 11 months ago

  • Target version changed from 457 to QE Kernel Done
