[o3][kernel][scheduler][x86_64] Dependent (child) jobs should start after uploading all of parent assets
LTP tests depend on install_ltp. On o3, child jobs start once the parent's tests have finished, but that is before the parent has uploaded the needed dependencies. It simply does not wait until ltp-cpuhotplug-opensuse-Tumbleweed-x86_64-20200209-DVD@64bit-with-ltp-qcow2.txt has been uploaded. This file cannot be expressed as a dependency in vars.json. But opensuse-Tumbleweed-x86_64-20200209-DVD@64bit-with-ltp.qcow2, which is HDD_1 for the child test (PUBLISH_HDD_1 for the parent), has not been uploaded yet either. The parent uploaded the needed asset at 22:45:53, but the child test started at 22:10:24. Is it a setup problem or a bug in the scheduler? A similar setup exists on osd, where it looks OK.
[2020-02-10T22:40:43.0499 CET] [info] +++ setup notes +++
[2020-02-10T22:40:43.0499 CET] [info] Start time: 2020-02-10 21:40:43
...
[2020-02-10T22:45:52.0451 CET] [info] Isotovideo exit status: 0
[2020-02-10T22:45:52.0478 CET] [info] +++ worker notes +++
[2020-02-10T22:45:52.0478 CET] [info] End time: 2020-02-10 21:45:52
...
[2020-02-10T22:45:53.0303 CET] [info] Uploading ltp-cpuhotplug-opensuse-Tumbleweed-x86_64-20200209-DVD@64bit-with-ltp-qcow2.txt
...
[2020-02-10T22:46:01.0131 CET] [info] Uploading opensuse-Tumbleweed-x86_64-20200209-DVD@64bit-with-ltp.qcow2
[2020-02-10T22:45:53.0303 CET] [info] Uploading ltp-cpuhotplug-opensuse-Tumbleweed-x86_64-20200209-DVD@64bit-with-ltp-qcow2.txt
[2020-02-10T22:45:53.0303 CET] [info] Uploading ltp-cpuhotplug-opensuse-Tumbleweed-x86_64-20200209-DVD@64bit-with-ltp-qcow2.txt using multiple chunks
[2020-02-10T22:45:53.0304 CET] [info] ltp-cpuhotplug-opensuse-Tumbleweed-x86_64-20200209-DVD@64bit-with-ltp-qcow2.txt: 1 chunks
[2020-02-10T22:45:53.0304 CET] [info] ltp-cpuhotplug-opensuse-Tumbleweed-x86_64-20200209-DVD@64bit-with-ltp-qcow2.txt: chunks of 1000000 bytes each
[2020-02-10T22:45:53.0383 CET] [info] ltp-cpuhotplug-opensuse-Tumbleweed-x86_64-20200209-DVD@64bit-with-ltp-qcow2.txt: Processing chunk 1/1 avg speed ~0.336KB/s
...
[2020-02-10T22:48:23.0775 CET] [info] opensuse-Tumbleweed-x86_64-20200209-DVD@64bit-with-ltp.qcow2: Processing chunk 1128/1128 avg speed ~342.062KB/s
[2020-02-10T22:10:24.0956 UTC] [info] Start time: 2020-02-10 22:10:24
...
[2020-02-10T22:10:35.0098 UTC] [debug] Found ISO, caching openSUSE-Tumbleweed-DVD-x86_64-Snapshot20200209-Media.iso
[2020-02-10T22:10:35.0102 UTC] [info] Downloading openSUSE-Tumbleweed-DVD-x86_64-Snapshot20200209-Media.iso, request #27 sent to Cache Service
[2020-02-10T22:10:45.0203 UTC] [info] Download of openSUSE-Tumbleweed-DVD-x86_64-Snapshot20200209-Media.iso processed:
[info] [#27] Cache size of "/var/lib/openqa/cache" is 12GiB, with limit 50GiB
[info] [#27] Downloading "openSUSE-Tumbleweed-DVD-x86_64-Snapshot20200209-Media.iso" from "http://openqa1-opensuse/tests/1169623/asset/iso/openSUSE-Tumbleweed-DVD-x86_64-Snapshot20200209-Media.iso"
...
[2020-02-10T22:10:50.896 UTC] [debug] scheduling boot_ltp tests/kernel/boot_ltp.pm
Can not open runtest asset /var/lib/openqa/share/factory/other/ltp-cpuhotplug-opensuse-Tumbleweed-x86_64-20200209-DVD@64bit-with-ltp-qcow2.txt: No such file or directory at /var/lib/openqa/cache/openqa1-opensuse/tests/opensuse/lib/main_ltp.pm line 64.
Compilation failed in require at /usr/bin/isotovideo line 288.
[2020-02-10T22:10:50.896 UTC] [debug] terminating command server 4156 because test execution ended through exception
[2020-02-10T22:10:51.897 UTC] [debug] done with command server 4153: EXIT 1
[2020-02-10T22:10:50.0261 UTC] [info] Preparing cgroup to start isotovideo
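Note for anyone comparing the excerpts above: the parent's worker log is stamped CET while the child's is stamped UTC, so the wall-clock times cannot be compared directly. A minimal sketch of normalizing both to UTC first, assuming CET is a fixed UTC+1 offset (no DST in February); the helper name to_utc is made up for illustration:

```python
from datetime import datetime, timedelta, timezone

# Assumption: the zone labels in the logs map to these fixed offsets.
ZONES = {"CET": timezone(timedelta(hours=1)), "UTC": timezone.utc}


def to_utc(stamp: str, zone: str) -> datetime:
    """Parse 'YYYY-MM-DDTHH:MM:SS' as local time in `zone` and convert to UTC."""
    local = datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%S").replace(tzinfo=ZONES[zone])
    return local.astimezone(timezone.utc)


# Timestamps copied from the log excerpts (fractional seconds dropped).
parent_upload = to_utc("2020-02-10T22:45:53", "CET")
child_start = to_utc("2020-02-10T22:10:24", "UTC")

print("parent upload (UTC):", parent_upload.isoformat())
print("child start   (UTC):", child_start.isoformat())
print("child started after upload began:", child_start > parent_upload)
```

Read this way, the ordering of the two events may differ from what the raw clock times suggest, so it is worth checking before concluding that the scheduler started the child too early.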
#4 Updated by rpalethorpe over 1 year ago
IIRC the problem here is that os-autoinst generates the list of assets from the SUT (specifically from the LTP package/source). OpenQA therefore does not know which assets to expect ahead of time.
Probably the easiest solution is to create a runtest archive, then OpenQA only needs to expect one asset and can wait for it. (this is also easier for OpenQA's database).
Another problem might be that OpenQA only waits for VM image type assets. So some extra work may be required in the asset handling code.
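To illustrate the runtest archive idea: a rough sketch, not what the test code actually does, of bundling all generated runtest files into a single tarball so that only one asset name has to be known in advance. The function names and the archive layout are assumptions made for this example.

```python
import tarfile
from pathlib import Path
from typing import List


def build_runtest_archive(runtest_dir: Path, archive_path: Path) -> List[str]:
    """Pack every regular file in runtest_dir into one gzip-compressed tarball.

    Returns the member names that were added (hypothetical helper).
    """
    names = []
    with tarfile.open(archive_path, "w:gz") as tar:
        for entry in sorted(runtest_dir.iterdir()):
            if entry.is_file():
                # Store files flat, without the directory prefix.
                tar.add(entry, arcname=entry.name)
                names.append(entry.name)
    return names


def read_runtest(archive_path: Path, name: str) -> str:
    """Return the text of one runtest file from the archive.

    tarfile raises KeyError if the member does not exist.
    """
    with tarfile.open(archive_path, "r:gz") as tar:
        member = tar.extractfile(name)
        if member is None:  # not a regular file
            raise FileNotFoundError(name)
        return member.read().decode()
```

The parent would publish the single archive and each child would pull the one runtest file it needs out of it, so the asset list no longer depends on what the LTP package happens to ship.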
#5 Updated by okurz over 1 year ago
- Project changed from openQA Project to openQA Tests
- Subject changed from [o3][scheduler][x86_64] Dependent (child) jobs should start after uploading all of parent assets to [o3][kernel][scheduler][x86_64] Dependent (child) jobs should start after uploading all of parent assets
- Category changed from Concrete Bugs to Bugs in existing tests
rpalethorpe pretty much nailed it. I don't see what openQA could do better when it doesn't know about the assets from the beginning. I'm not sure you can even call them "parent assets" with the current implementation. All the more, I see this as an issue for "openQA tests" rather than for openQA itself.
#6 Updated by pvorel over 1 year ago
- Priority changed from Normal to High
- Difficulty set to medium
We will have to implement this to get testing on o3 on Intel working again. BTW I wonder what's different on o3 (all LTP run problems occur only on o3 on Intel).
BTW it would be nice if the LTP way of testing were better integrated into openQA. I consider LTP-related changes an improvement, and some other tests might benefit from them as well, but I understand that the tools team doesn't have resources for this better integration, and neither does kernel-qa.
#7 Updated by pvorel over 1 year ago
Implemented fix: https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/9747
#8 Updated by pvorel over 1 year ago
Although https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/9747 is a general improvement, it will probably not fix o3.
Mdoucha noticed that the problem occurs only on openqaworker7; indeed, openqaworker1 and openqaworker4 are OK.
I wanted to check NFS on openqaworker7, but there is no password authentication (Permission denied (publickey)). That suggests there is something different on openqaworker7.
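For whoever does have shell access to the worker, a minimal sketch of the NFS check: scan the mount table for an NFS entry at the share path. The path /var/lib/openqa/share is taken from the failing log line above; the function names are made up for this example, and reading /proc/mounts assumes a Linux host.

```python
def is_nfs_mounted(mounts_text: str, mount_point: str) -> bool:
    """Return True if mounts_text (in /proc/mounts format) lists an NFS mount at mount_point.

    Each line has the fields: device, mount point, filesystem type, options, dump, pass.
    """
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[1] == mount_point and fields[2].startswith("nfs"):
            return True
    return False


def check_worker(mount_point: str = "/var/lib/openqa/share",
                 mounts_file: str = "/proc/mounts") -> bool:
    """Run the check against the local mount table (Linux only)."""
    with open(mounts_file) as handle:
        return is_nfs_mounted(handle.read(), mount_point)
```

Calling check_worker() on a worker returns False when the share directory is just a local directory, which would match the "No such file or directory" error for the runtest asset.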
#9 Updated by pvorel over 1 year ago
- Status changed from New to Feedback
openqaworker7 was fixed by okurz (thanks!):
MDoucha I have fixed the NFS mount on openqaworker7. This was an oversight by me when setting up the machine for o3 some days ago. The machine is within the o3 VLAN, hence not reachable from internal SUSE, same as openqaworker1 and openqaworker4 and others.
So it should be fixed in the next build.