action #69475
closed [tools] openQA child task fails to download asset created by parent job
Added by dimstar over 4 years ago. Updated over 4 years ago.
Description
Observation
openQA test in scenario opensuse-Tumbleweed-DVD-x86_64-autoyast_reinstall_gnome@64bit fails in installation.
The job clone_system uploads an autoyast profile, which should then be consumed by the autoyast_reinstall_gnome test. The file seems to be there as part of the assets, but the worker fails to download it.
Test suite description
Reproducible
Fails since (at least) Build 20200730
Expected result
Last good: 20200528 (or more recent)
Further details
Always latest result in this scenario: latest
Updated by SLindoMansilla over 4 years ago
- Subject changed from openQA child task fails to download asset created by parent job to [y] openQA child task fails to download asset created by parent job
Updated by riafarov over 4 years ago
- Project changed from openQA Tests (public) to openQA Project (public)
- Subject changed from [y] openQA child task fails to download asset created by parent job to openQA child task fails to download asset created by parent job
- Category deleted (Bugs in existing tests)
We have this sporadic issue on o3 once in a while, whereas generally it works. Could someone from the tools team take a look?
Updated by SLindoMansilla over 4 years ago
- Subject changed from openQA child task fails to download asset created by parent job to [tools] openQA child task fails to download asset created by parent job
Without a [label], this ticket would appear again in the incoming pool.
Updated by mkittler over 4 years ago
- Project changed from openQA Project (public) to openQA Infrastructure (public)
- Status changed from New to In Progress
- Assignee set to mkittler
- Target version set to Ready
I've been investigating the latest job which still shows the issue: https://openqa.opensuse.org/tests/1348359
- The asset is present in the asset database and has not been cleaned up. It can be downloaded from https://openqa.opensuse.org/assets/other/01348026-autoinst.xml.
- The URL from the Yast error message seems correct (JOBTOKEN matches the one from vars.json, asset type matches, asset name matches, IP and port are likely correct as well considering there was a response).
I'm not familiar with the relevant code. It looks like this kind of asset download is provided by os-autoinst's command server. This HTTP server simply searches for the asset in the local filesystem (local = worker machine). This is how the path is constructed: my $path = join '/', $bmwqemu::vars{ASSETDIR}, $self->param('assettype'), $self->param('assetname');
In this case it would likely be /var/lib/openqa/share/factory/other/01348026-autoinst.xml. Apparently /var/lib/openqa/share is an empty directory on openqaworker7, which is the worker producing the failing jobs. If I remember correctly, /var/lib/openqa/share is actually supposed to be an NFS mount.
So this is likely a configuration issue on one or more o3 workers. I'll see how other workers are configured and try to restore the configuration.
It would actually be useful if the command server would log files missing on the filesystem so the exact path where the file is supposed to be located would be clear. Maybe that's also worth improving.
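The logging idea above can be sketched roughly as follows. This is a shell illustration, not the os-autoinst Perl code: `resolve_asset` is a hypothetical name, and the three arguments stand in for ASSETDIR, assettype and assetname from the snippet quoted earlier. The point is only that the full path is reported when the file is missing.

```shell
#!/bin/sh
# Hypothetical helper, not part of os-autoinst: build the local asset path
# by joining ASSETDIR, asset type and asset name with '/', and report the
# full path on stderr when no such file exists on the worker.
resolve_asset() {
    assetdir=$1 assettype=$2 assetname=$3
    path="$assetdir/$assettype/$assetname"
    if [ ! -f "$path" ]; then
        echo "asset not found on local filesystem: $path" >&2
        return 1
    fi
    printf '%s\n' "$path"
}

# Example call (paths taken from this ticket):
#   resolve_asset /var/lib/openqa/share/factory other 01348026-autoinst.xml
```

With a message like this in the worker log, an empty /var/lib/openqa/share would have been visible from the failing job itself.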
Updated by mkittler over 4 years ago
- Status changed from In Progress to Feedback
The line for /var/lib/openqa/share was commented out in /etc/fstab. I'm wondering who did this and why. I've added the line back and immediately restored the mount point via mount --types nfs4 --options ro,fsc --source openqa1-opensuse:/ --target /var/lib/openqa/share. The asset now exists under /var/lib/openqa/share/factory/other/01348026-autoinst.xml and the YaST download should work again.
I've also checked other workers but all had the NFS mount present.
Updated by mkittler over 4 years ago
- Status changed from Feedback to Resolved
I guess the issue has been fixed. Just reopen the issue if the problem still occurs.
Updated by mkittler over 4 years ago
- Related to action #70723: Fix tests not to rely on `/var/lib/openqa/share` mountpoint added
Updated by dimstar over 4 years ago
- Status changed from Resolved to New
This seems to be a recurring issue, last seen in:
https://openqa.opensuse.org/tests/1377258#next_previous
The tests failing at 'installation' are usually due to this. So there are cases where it passes.
Updated by mkittler over 4 years ago
Once again it wasn't mounted anymore, even though the line is no longer commented out in /etc/fstab.
Not sure what's wrong. Isn't
openqa1-opensuse:/ /var/lib/openqa/share nfs4 ro,fsc 0 0
in /etc/fstab equivalent to mount --types nfs4 --options ro,fsc --source openqa1-opensuse:/ --target /var/lib/openqa/share? And on the other workers the line in /etc/fstab works as well.
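To tell apart "the line is missing from /etc/fstab" and "the line is there but the share is not mounted", a small check along these lines might help. It is only a sketch: `share_state` is a hypothetical helper, and it assumes the usual Linux layout where field 2 of both /etc/fstab and /proc/mounts is the mount point. It does not explain why the mount is lost, it only reports which of the two situations a worker is in.

```shell
#!/bin/sh
# Hypothetical diagnostic: compare what fstab configures with what is
# actually mounted.
# $1 = mount point, $2 = fstab file, $3 = mounts table
# (the defaults are the usual Linux locations).
share_state() {
    target=$1 fstab=${2:-/etc/fstab} mounts=${3:-/proc/mounts}
    if awk -v t="$target" '$2 == t { f = 1 } END { exit !f }' "$mounts"; then
        echo mounted
    # Lines whose first field starts with '#' are commented out.
    elif awk -v t="$target" '$1 !~ /^#/ && $2 == t { f = 1 } END { exit !f }' "$fstab"; then
        echo configured-but-not-mounted
    else
        echo not-configured
    fi
}

# Example call on a worker:
#   share_state /var/lib/openqa/share
```

The situation described in this comment would show up as `configured-but-not-mounted`.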
Updated by okurz over 4 years ago
- Related to action #64941: after every reboot openqaworker7 is missing var-lib-openqa-share.mount , check dependencies of service with openqaworker1 added
Updated by okurz over 4 years ago
#64941 is about openqaworker7 having an unreliable mount point configuration, which is causing this. #70723 is certainly a good approach to fix this in general. The specific issue about autoyast was introduced 4 months ago. Before that, we had no tests running on o3 that relied on any NFS mount. I thought I had even disabled the NFS mount on all workers as well.
@mkittler as the mount configuration on openqaworker7 is unreliable and lost on every reboot anyway, I suggest you revert your change and keep the mount point disabled in /etc/fstab unless you find a quick fix to keep it enabled temporarily.
The commit
commit 8ea13b8c3
Author: Rodion Iafarov <riafarov@suse.com>
Date: Wed Apr 8 19:39:41 2020 +0200
Fix assets handling for AY tests without modifications
We have fixed problem with assets for the jobs where we inject something
to the profile, but still have issue when we simply use profile from the
parent job.
See [poo#64961](https://progress.opensuse.org/issues/64961).
within os-autoinst-distri-opensuse brought back the reference to the local path in 2020-04, so I suggest that this specific ticket goes to the "[y]" team again and that the code be changed to not rely on any local filesystem access to assets from an NFS mount point, which is not available on all workers anyway.
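As an illustration of accessing an asset without the NFS mount: the asset in this ticket is reachable over HTTP from the web UI at https://openqa.opensuse.org/assets/other/01348026-autoinst.xml. The sketch below builds such a URL and fetches it with curl. `asset_url` and `fetch_asset` are hypothetical helper names, the URL layout is inferred from that single example, and this is not how the test distribution or YaST currently obtains the profile.

```shell
#!/bin/sh
# Hypothetical helpers, assuming the URL layout <host>/assets/<type>/<name>
# seen in this ticket.
asset_url() {
    # $1 = base URL of the web UI, $2 = asset type, $3 = asset name
    printf '%s/assets/%s/%s\n' "${1%/}" "$2" "$3"
}

fetch_asset() {
    # $1..$3 as above, $4 = destination file; fails on HTTP errors
    curl --fail --silent --show-error --location \
        --output "$4" "$(asset_url "$1" "$2" "$3")"
}

# Example call:
#   fetch_asset https://openqa.opensuse.org other 01348026-autoinst.xml /tmp/autoinst.xml
```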
Updated by mkittler over 4 years ago
Knowing about #64941 would have saved me some time. I would actually tend to mark this ticket as a duplicate of #64941.
I thought I had even disabled the NFS mount on all workers as well.
Well, I've actually checked each and every worker and they all still had the NFS mount point enabled (and hence I considered openqaworker7 to be the odd one).
I suggest for this specific ticket to go to the "[y]"-team again and code be changed to not rely on any local filesystem access […]
As mentioned in #70723 I'm not sure whether it will be particularly easy for them so maybe it is better to fix this from our side.
Updated by mkittler over 4 years ago
- Is duplicate of action #64941: after every reboot openqaworker7 is missing var-lib-openqa-share.mount , check dependencies of service with openqaworker1 added