
action #69475

[tools] openQA child task fails to download asset created by parent job

Added by dimstar 12 months ago. Updated 11 months ago.

Status: Resolved
Priority: Normal
Assignee:
Target version:
Start date: 2020-07-31
Due date:
% Done: 0%
Estimated time:

Description

Observation

openQA test in scenario opensuse-Tumbleweed-DVD-x86_64-autoyast_reinstall_gnome@64bit fails in installation

The job clone_system uploads an autoyast profile, which should then be consumed by the autoyast_reinstall_gnome test. The file seems to be present among the assets, but the worker fails to download it.

Test suite description

Reproducible

Fails since (at least) Build 20200730

Expected result

Last good: 20200528 (or more recent)

Further details

Always latest result in this scenario: latest


Related issues

Related to openQA Project - action #70723: Fix tests not to rely on `/var/lib/openqa/share` mountpoint (Resolved, 2020-08-31)

Related to openQA Infrastructure - action #64941: after every reboot openqaworker7 is missing var-lib-openqa-share.mount , check dependencies of service with openqaworker1 (Resolved, 2020-03-27 to 2021-06-11)

Is duplicate of openQA Infrastructure - action #64941: after every reboot openqaworker7 is missing var-lib-openqa-share.mount , check dependencies of service with openqaworker1 (Resolved, 2020-03-27 to 2021-06-11)

History

#1 Updated by SLindoMansilla 12 months ago

  • Subject changed from openQA child task fails to download asset created by parent job to [y] openQA child task fails to download asset created by parent job

#2 Updated by riafarov 11 months ago

  • Project changed from openQA Tests to openQA Project
  • Subject changed from [y] openQA child task fails to download asset created by parent job to openQA child task fails to download asset created by parent job
  • Category deleted (Bugs in existing tests)

We see this issue sporadically on o3, although it generally works. Could someone from the tools team take a look?

#3 Updated by SLindoMansilla 11 months ago

  • Subject changed from openQA child task fails to download asset created by parent job to [tools] openQA child task fails to download asset created by parent job

Without a [label], this ticket would appear in the incoming pool again.

#4 Updated by mkittler 11 months ago

  • Project changed from openQA Project to openQA Infrastructure
  • Status changed from New to In Progress
  • Assignee set to mkittler
  • Target version set to Ready

I've been investigating the latest job which still shows the issue: https://openqa.opensuse.org/tests/1348359

  • The asset is present in the asset database and has not been cleaned up. It can be downloaded from https://openqa.opensuse.org/assets/other/01348026-autoinst.xml.
  • The URL from the Yast error message seems correct (JOBTOKEN matches the one from vars.json, asset type matches, asset name matches, IP and port are likely correct as well considering there was a response).

I'm not familiar with the relevant code. It looks like this kind of asset download is served by os-autoinst's command server. This HTTP server would simply search for the asset in the local filesystem (local = worker machine). This is how the path is constructed:

    my $path = join '/', $bmwqemu::vars{ASSETDIR}, $self->param('assettype'), $self->param('assetname');

In this case the path would likely be /var/lib/openqa/share/factory/other/01348026-autoinst.xml. Apparently /var/lib/openqa/share is an empty directory on openqaworker7, which is the worker producing the failing jobs. If I remember correctly, /var/lib/openqa/share is actually supposed to be an NFS mount.
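To make the lookup concrete, here is a minimal Python sketch of the same path construction as the Perl `join` quoted above. The `ASSETDIR` value is an assumption inferred from the resolved path, not read from the worker's configuration. Because the result lies under /var/lib/openqa/share, an unmounted share leaves the lookup pointing into an empty directory.

```python
# Mirrors the Perl expression from the command server:
#   join '/', $bmwqemu::vars{ASSETDIR}, assettype, assetname
def asset_path(assetdir: str, assettype: str, assetname: str) -> str:
    """Join the three components with '/', exactly like Perl's join."""
    return "/".join([assetdir, assettype, assetname])


# Assumption: ASSETDIR on the worker is /var/lib/openqa/share/factory,
# inferred from the resolved path mentioned in this comment.
ASSETDIR = "/var/lib/openqa/share/factory"

if __name__ == "__main__":
    print(asset_path(ASSETDIR, "other", "01348026-autoinst.xml"))
    # prints /var/lib/openqa/share/factory/other/01348026-autoinst.xml
```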

So this is likely a configuration issue of one or more o3 workers. I'll check how the other workers are configured and try to restore the configuration.


It would actually be useful if the command server logged missing files, so that the exact path where a file is expected to be located would be clear. Maybe that's also worth improving.
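A hypothetical sketch of that logging improvement, written in Python rather than the command server's Perl. `serve_asset` and the logger name are illustrative only, not part of os-autoinst's API; the point is that the 404 path logs the full filesystem location the worker tried.

```python
import logging
import os

# Hypothetical logger name; os-autoinst has its own logging setup.
log = logging.getLogger("command_server")


def serve_asset(assetdir, assettype, assetname):
    """Return an (HTTP status, body bytes) tuple for an asset request.

    The full filesystem path is logged when the file is missing, so a
    failed download immediately shows where the worker looked.
    """
    path = "/".join([assetdir, assettype, assetname])
    if not os.path.isfile(path):
        log.warning("asset not found on worker filesystem: %s", path)
        return 404, f"file not found: {path}".encode()
    with open(path, "rb") as fh:
        return 200, fh.read()
```

On a worker with an unmounted share, a request for `("/var/lib/openqa/share/factory", "other", "01348026-autoinst.xml")` would then log the complete path instead of failing silently.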

#5 Updated by mkittler 11 months ago

  • Status changed from In Progress to Feedback

The line for /var/lib/openqa/share was commented out in /etc/fstab. I'm wondering who did this and why. I've added the line back and immediately restored the mount point via mount --types nfs4 --options ro,fsc --source openqa1-opensuse:/ --target /var/lib/openqa/share. The asset now exists under /var/lib/openqa/share/factory/other/01348026-autoinst.xml, so the YaST download should work again.
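A quick check along these lines could confirm that the share is really mounted and the asset is visible, as opposed to the empty-directory state seen before. This is a Python sketch for illustration, not an existing openQA tool; `os.path.ismount` is what distinguishes the NFS mount from the bare directory underneath it.

```python
import os
import sys

SHARE = "/var/lib/openqa/share"
ASSET = "factory/other/01348026-autoinst.xml"


def check_share(share, asset_relpath):
    """Return a list of human-readable problems (empty if all is well)."""
    problems = []
    if not os.path.ismount(share):
        problems.append(f"{share} is not a mount point")
        # An empty directory is the symptom seen on openqaworker7.
        if os.path.isdir(share) and not os.listdir(share):
            problems.append(f"{share} is an empty directory")
    asset = os.path.join(share, asset_relpath)
    if not os.path.isfile(asset):
        problems.append(f"asset missing: {asset}")
    return problems


if __name__ == "__main__":
    found = check_share(SHARE, ASSET)
    for problem in found:
        print(problem)
    sys.exit(1 if found else 0)
```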

I've also checked the other workers; all of them had the NFS mount present.

#6 Updated by mkittler 11 months ago

  • Status changed from Feedback to Resolved

I guess the issue has been fixed. Just reopen the issue if the problem still occurs.

#7 Updated by mkittler 11 months ago

  • Related to action #70723: Fix tests not to rely on `/var/lib/openqa/share` mountpoint added

#8 Updated by dimstar 11 months ago

  • Status changed from Resolved to New

This seems to be a recurring issue; last seen in:
https://openqa.opensuse.org/tests/1377258#next_previous

Runs failing at 'installation' are usually due to this; there are also runs in which it passes.

#9 Updated by mkittler 11 months ago

It was no longer mounted, even though the line is no longer commented out in /etc/fstab.

Not sure what's wrong. Isn't

openqa1-opensuse:/                         /var/lib/openqa/share   nfs4     ro,fsc                                      0  0

in /etc/fstab equivalent to mount --types nfs4 --options ro,fsc --source openqa1-opensuse:/ --target /var/lib/openqa/share? And on the other workers the line in /etc/fstab works as well.
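For the mount operation itself the two are equivalent: the fstab columns map one-to-one onto the mount options, as the Python sketch below illustrates (the helper name is hypothetical). What a manual mount does not exercise is the boot path. At boot, systemd's fstab generator turns each entry into a mount unit with its own ordering and dependencies, so a line that works interactively can still fail after a reboot.

```python
import shlex

FSTAB_LINE = "openqa1-opensuse:/  /var/lib/openqa/share  nfs4  ro,fsc  0  0"


def fstab_to_mount_args(line):
    """Translate one fstab(5) entry into an equivalent mount(8) command.

    fstab fields: fs_spec, fs_file, fs_vfstype, fs_mntops, fs_freq, fs_passno.
    fs_freq and fs_passno only matter for dump/fsck, not for mounting.
    """
    spec, target, vfstype, options = line.split()[:4]
    return ["mount", "--types", vfstype, "--options", options,
            "--source", spec, "--target", target]


if __name__ == "__main__":
    print(shlex.join(fstab_to_mount_args(FSTAB_LINE)))
    # prints mount --types nfs4 --options ro,fsc --source openqa1-opensuse:/ --target /var/lib/openqa/share
```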

#10 Updated by okurz 11 months ago

  • Related to action #64941: after every reboot openqaworker7 is missing var-lib-openqa-share.mount , check dependencies of service with openqaworker1 added

#11 Updated by okurz 11 months ago

#64941 is a problem about openqaworker7 having an unreliable mount point configuration, which is causing this. #70723 is certainly a good approach to fix this in general. The specific issue about autoyast was introduced 4 months ago. Before that, we had no tests running on o3 that relied on any NFS mount. I thought I had even disabled the NFS mount on all workers as well.
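The var-lib-openqa-share.mount name in the title of #64941 is the systemd mount unit derived from the mount point path. Below is a simplified Python sketch of that name derivation; in the simple case it matches `systemd-escape --path --suffix=mount /var/lib/openqa/share`. Knowing the unit name makes it easy to inspect after a reboot, e.g. with `systemctl status var-lib-openqa-share.mount`. The sketch skips systemd's validation of "." and ".." components.

```python
import string

# Characters systemd leaves unescaped in unit names derived from paths.
_SAFE = set(string.ascii_letters + string.digits + ":_.")


def mount_unit_name(path):
    """Simplified sketch of `systemd-escape --path --suffix=mount`.

    Empty components (leading/trailing/duplicate slashes) are dropped,
    "/" becomes "-", and a leading "." plus every other unsafe byte
    (including "-") become \\xNN.
    """
    trimmed = "/".join(part for part in path.split("/") if part)
    if not trimmed:
        return "-.mount"  # the root directory
    out = []
    for i, ch in enumerate(trimmed):
        if ch == "/":
            out.append("-")
        elif ch in _SAFE and not (i == 0 and ch == "."):
            out.append(ch)
        else:
            out.extend(f"\\x{byte:02x}" for byte in ch.encode("utf-8"))
    return "".join(out) + ".mount"
```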

mkittler, as the mount configuration on openqaworker7 is unreliable and lost on every reboot anyway, I suggest you revert your change and keep the mount point disabled in /etc/fstab unless you find a quick fix to keep it enabled temporarily.

The commit

commit 8ea13b8c3
Author: Rodion Iafarov <riafarov@suse.com>
Date:   Wed Apr 8 19:39:41 2020 +0200

    Fix assets handling for AY tests without modifications

    We have fixed problem with assets for the jobs where we inject something
    to the profile, but still have issue when we simply use profile from the
    parent job.

    See [poo#64961](https://progress.opensuse.org/issues/64961).

within os-autoinst-distri-opensuse brought back the reference to the local path in 2020-04, so I suggest that this specific ticket go back to the "[y]" team and that the code be changed not to rely on any local filesystem access to assets from an NFS mount point, which is not available on all workers anyway.
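One way to drop the NFS dependency, sketched in Python under the assumption that the consumer can reach the web UI: resolve the asset via the /assets/<type>/<name> route seen in the download link from comment #4. The helper names are hypothetical; this is neither taken from os-autoinst-distri-opensuse nor from the fix in #70723.

```python
from urllib.parse import quote
from urllib.request import urlopen


def asset_url(base_url, assettype, assetname):
    """Build the web UI download URL for an asset, e.g. .../assets/other/<name>."""
    return f"{base_url.rstrip('/')}/assets/{quote(assettype)}/{quote(assetname)}"


def fetch_asset(base_url, assettype, assetname, timeout=30):
    """Download the asset over HTTP instead of reading it from the NFS share."""
    with urlopen(asset_url(base_url, assettype, assetname), timeout=timeout) as resp:
        return resp.read()
```

For example, `asset_url("https://openqa.opensuse.org", "other", "01348026-autoinst.xml")` yields the same download link that worked in comment #4, independent of whether a given worker has the share mounted.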

#12 Updated by mkittler 11 months ago

Knowing about #64941 would have saved me some time. I would actually tend to mark this ticket as a duplicate of #64941.

> I thought I had even disabled the NFS mount on all workers as well.

Well, I've actually checked each and every worker, and they all still had the NFS mount point enabled (hence I considered openqaworker7 to be the odd one out).

> I suggest for this specific ticket to go to the "[y]"-team again and code be changed to not rely on any local filesystem access […]

As mentioned in #70723, I'm not sure whether this will be particularly easy for them, so it may be better to fix this on our side.

#13 Updated by mkittler 11 months ago

  • Is duplicate of action #64941: after every reboot openqaworker7 is missing var-lib-openqa-share.mount , check dependencies of service with openqaworker1 added

#14 Updated by mkittler 11 months ago

  • Status changed from New to Resolved

Marked as duplicate of #64941 (which has target version future). I'll pick up #70723 (which has target version ready) instead.
