action #152377
openQA Tests (public) - action #134438 (closed): [qe-sap] missing 15-SP2 installation media on https://openqa.suse.de/assets/repo/
[tools] SLE-15-SP2 and SLE-15-SP3 install target medias for x86_64 in the ...../assets/repo are purged
Added by jkohoutek about 1 year ago. Updated 11 months ago.
Description
The gnome_hana_nvdimm tests sporadically fail at boot_from_pxe because they cannot find the SLE-15-SP2 and SLE-15-SP3 install media. However, those have existed for a long time in assets/repo/fixed and should always be accessible.
Expected Behaviour:
- The boot_from_pxe module must always be able to reach the install media.
Updated by maritawerner about 1 year ago
- Subject changed from Include SLE-15-SP2 SLE-15-SP3 install target medias for x86_64 in the OSD assets repo to [qe-sap] Include SLE-15-SP2 SLE-15-SP3 install target medias for x86_64 in the OSD assets repo
Updated by jkohoutek about 1 year ago
- Subject changed from [qe-sap] Include SLE-15-SP2 SLE-15-SP3 install target medias for x86_64 in the OSD assets repo to Include SLE-15-SP2 SLE-15-SP3 install target medias for x86_64 in the OSD assets repo
Updated by jkohoutek 12 months ago
More details:
The targets are in https://openqa.suse.de/assets/repo/fixed:
https://openqa.suse.de/assets/repo/fixed/SLE-15-SP2-Online-x86_64-GM-Media1
https://openqa.suse.de/assets/repo/fixed/SLE-15-SP3-Online-x86_64-GM-Media1
However, when or if symlinks from .../fixed to .../assets are created, they get cleaned up at some point.
Possible cause: isn't there some whitelist of things that are not supposed to be automatically cleaned up from the assets?
Updated by okurz 12 months ago
- Assignee set to okurz
- Target version set to Ready
jkohoutek wrote in #note-4:
More details:
The targets are in the https://openqa.suse.de/assets/repo/fixed:
https://openqa.suse.de/assets/repo/fixed/SLE-15-SP2-Online-x86_64-GM-Media1
https://openqa.suse.de/assets/repo/fixed/SLE-15-SP3-Online-x86_64-GM-Media1
However, when or if symlinks from .../fixed to .../assets are created, they get cleaned up at some point.
Why do you need to symlink to …/assets when the repos are already in …/fixed? When openQA looks up assets, it normally checks both paths.
Possible cause: isn't there some whitelist of things that are not supposed to be automatically cleaned up from the assets?
Well, the assets below …/fixed are exactly the ones that are not cleaned up; see http://open.qa/docs/#_asset_handling. So can you reference openQA jobs that show problems when trying to access the assets?
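To illustrate why a symlink in `repo/` should not be needed for an asset that already sits below `repo/fixed/`, here is a minimal Python sketch of such a two-path lookup. It assumes a simplified layout (`<factory>/<type>/<name>` and `<factory>/<type>/fixed/<name>`); the helper name `locate_asset` and the lookup order are illustrative assumptions, not openQA's actual (Perl) implementation.

```python
import os
import tempfile
from typing import Optional


def locate_asset(factory_dir: str, asset_type: str, name: str) -> Optional[str]:
    """Return the first existing path for an asset, or None if it is missing.

    Hypothetical helper: it checks the regular asset directory first and the
    'fixed' subdirectory second; the order is an assumption of this sketch.
    """
    candidates = [
        os.path.join(factory_dir, asset_type, name),
        os.path.join(factory_dir, asset_type, "fixed", name),
    ]
    for path in candidates:
        if os.path.exists(path):
            return path
    return None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as factory:
        name = "SLE-15-SP2-Online-x86_64-GM-Media1"
        # The repo exists only below repo/fixed/, with no symlink in repo/.
        os.makedirs(os.path.join(factory, "repo", "fixed", name))
        print(locate_asset(factory, "repo", name))  # prints the path below repo/fixed/
```

Under this model the fixed copy is found directly, so creating an extra symlink only adds an entry that the regular cleanup may later consider.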
Updated by apappas 12 months ago
- Subject changed from Include SLE-15-SP2 SLE-15-SP3 install target medias for x86_64 in the OSD assets repo to [tools] SLE-15-SP2 SLE-15-SP3 install media for x86_64 in the assets/repo/fixed directory go in and out of existence
- Description updated (diff)
- Assignee deleted (okurz)
- Target version deleted (Ready)
Updated by okurz 11 months ago
- Due date set to 2024-01-23
- Status changed from New to Feedback
ok, thx. I am not sure if booting iPXE from an asset in "fixed/" was actually ever properly supported. I am asking in https://suse.slack.com/archives/C02CANHLANP/p1704812941555379
who has an idea about https://openqa.suse.de/tests/13198459#step/boot_from_pxe/6 https://progress.opensuse.org/issues/152377 trying to boot assets in the "fixed/" directory over iPXE?
maybe somebody else has a clue.
Updated by okurz 11 months ago
(Antonios Pappas) 1. Having the installation media hosted on OSD is not a hard requirement. The requirement is that the ipmi workers with nvdimm memory are able to reach the installation media.
2. Using REPO_0=https://download.suse.de/install/SLP/SLE-15-SP2-Full-GM/x86_64/DVD1/ allows the test to run problem-free.
Would you think that switching all jobs to this solution is better? (All NVDIMM jobs, not every single job.)
(Oliver Kurz) Use 2., definitely. openQA assets are meant to ensure that content does not change, e.g. builds of SLE15-SP6, but something like SLE-15-SP2-Full-GM must never change on download.suse.de, so yeah, use that.
Updated by okurz 11 months ago
- Due date set to 2024-01-23
- Status changed from In Progress to Feedback
jkohoutek wrote in #note-12:
No, it's not solved until we find the root cause and what is happening there.
- Why are all the other images linked in assets?
What do you mean by that? Do you mean why, e.g., https://openqa.suse.de/tests/13198459#downloads mentions https://openqa.suse.de/tests/13198459/asset/iso/SLE-15-SP2-Installer-DVD-x86_64-GM-DVD1.iso? That would be because this file is simply referenced in the job settings with "ISO=SLE-15-SP2-Installer-DVD-x86_64-GM-DVD1.iso".
- Why did these two still get purged from there?
Repositories are never listed as assets, only individual files.
- Where is the purge process documented?
As explained in #152377-5, that's in http://open.qa/docs/#_asset_handling
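As a rough illustration of how a setting such as "ISO=..." ties an asset to a job, the Python sketch below derives (asset type, name) pairs from job settings. It assumes a simplified mapping (ISO and ISO_<n> to iso, HDD_<n> to hdd, REPO_<n> to repo); the pattern table and function names are illustrative, not openQA's code, and the real list of recognised settings is longer. Whether a type is displayed on the job details page is a separate question: repositories are tracked but not shown there.

```python
import re
from typing import Dict, List, Optional, Tuple

# Assumed, simplified mapping from job setting keys to asset types; the
# real list of recognised settings in openQA is longer.
SETTING_PATTERNS = [
    (re.compile(r"^ISO(_\d+)?$"), "iso"),
    (re.compile(r"^HDD_\d+$"), "hdd"),
    (re.compile(r"^REPO_\d+$"), "repo"),
]


def asset_type_from_setting(key: str) -> Optional[str]:
    """Return the asset type a setting key refers to, or None if it is no asset."""
    for pattern, asset_type in SETTING_PATTERNS:
        if pattern.match(key):
            return asset_type
    return None


def assets_from_settings(settings: Dict[str, str]) -> List[Tuple[str, str]]:
    """Collect (asset type, asset name) pairs from a job's settings, sorted by key."""
    assets = []
    for key in sorted(settings):
        asset_type = asset_type_from_setting(key)
        if asset_type is not None:
            assets.append((asset_type, settings[key]))
    return assets


if __name__ == "__main__":
    settings = {
        "ISO": "SLE-15-SP2-Installer-DVD-x86_64-GM-DVD1.iso",
        "REPO_0": "SLE-15-SP2-Online-x86_64-GM-Media1",  # hypothetical value
        "TEST": "gnome_hana_nvdimm",
    }
    print(assets_from_settings(settings))
```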
Updated by jkohoutek 11 months ago
okurz wrote in #note-14:
jkohoutek wrote in #note-12:
No, it's not solved until we find the root cause and what is happening there.
- Why are all the other images linked in assets?
What do you mean by that? Do you mean why, e.g., https://openqa.suse.de/tests/13198459#downloads mentions https://openqa.suse.de/tests/13198459/asset/iso/SLE-15-SP2-Installer-DVD-x86_64-GM-DVD1.iso? That would be because this file is simply referenced in the job settings with "ISO=SLE-15-SP2-Installer-DVD-x86_64-GM-DVD1.iso".
We are not talking about ISOs here, but about the install targets.
More specific question: why do those persist (besides a huge list of others), e.g. https://openqa.suse.de/assets/repo/SLE-15-SP2-Full-ppc64le-GM-Media1/ and https://openqa.suse.de/assets/repo/SLE-15-SP2-Full-s390x-GM-Media1/, while x86_64 gets purged when it is linked there the same way, as https://openqa.suse.de/assets/repo/SLE-15-SP2-Online-x86_64-GM-Media1/?
- Why did these two still get purged from there?
Repositories are never listed as assets, only individual files.
I don't follow you here; I see a huge number of install targets there: https://openqa.suse.de/assets/repo/
- Where is the purge process documented?
As explained in #152377-5, that's in http://open.qa/docs/#_asset_handling
Updated by okurz 11 months ago · Edited
jkohoutek wrote in #note-15:
okurz wrote in #note-14:
jkohoutek wrote in #note-12:
No, it's not solved until we find the root cause and what is happening there.
- Why are all the other images linked in assets?
What do you mean by that? Do you mean why, e.g., https://openqa.suse.de/tests/13198459#downloads mentions https://openqa.suse.de/tests/13198459/asset/iso/SLE-15-SP2-Installer-DVD-x86_64-GM-DVD1.iso? That would be because this file is simply referenced in the job settings with "ISO=SLE-15-SP2-Installer-DVD-x86_64-GM-DVD1.iso".
We are not talking about ISOs here, but about the install targets.
More specific question: why do those persist (besides a huge list of others), e.g. https://openqa.suse.de/assets/repo/SLE-15-SP2-Full-ppc64le-GM-Media1/ and https://openqa.suse.de/assets/repo/SLE-15-SP2-Full-s390x-GM-Media1/, while x86_64 gets purged when it is linked there the same way, as https://openqa.suse.de/assets/repo/SLE-15-SP2-Online-x86_64-GM-Media1/?
To answer that I looked into logs on OSD and found
openqa:/var/log # zgrep 'SLE-15-SP2-Full-s390x-GM-Media1' openqa_gru*
openqa_gru:[2024-01-16T16:00:02.598510+01:00] [info] [pid:21626] Registering asset repo/SLE-15-SP2-Full-s390x-GM-Media1
openqa_gru: name => "repo/SLE-15-SP2-Full-s390x-GM-Media1",
openqa_gru:[2024-01-16T16:00:58.627253+01:00] [info] [pid:21626] Removing asset repo/SLE-15-SP2-Full-s390x-GM-Media1 (not in any group, age (1219 days) exceeds limit (7 days)
openqa_gru:[2024-01-16T16:00:58.630012+01:00] [info] [pid:21626] GRU: removed /var/lib/openqa/share/factory/repo/SLE-15-SP2-Full-s390x-GM-Media1
…
openqa_gru.9.xz:[2024-01-16T07:00:02.026605+01:00] [info] [pid:14452] Registering asset repo/SLE-15-SP2-Full-s390x-GM-Media1
openqa_gru.9.xz: name => "repo/SLE-15-SP2-Full-s390x-GM-Media1",
openqa_gru.9.xz:[2024-01-16T07:00:44.303850+01:00] [info] [pid:14452] Removing asset repo/SLE-15-SP2-Full-s390x-GM-Media1 (not in any group, age (1219 days) exceeds limit (7 days)
openqa_gru.9.xz:[2024-01-16T07:00:44.306424+01:00] [info] [pid:14452] GRU: removed /var/lib/openqa/share/factory/repo/SLE-15-SP2-Full-s390x-GM-Media1
So openQA at least states that it removed SLE-15-SP2-Full-s390x-GM-Media1, although the symlink seems to persist:
openqa:/var/log # ls -l /var/lib/openqa/share/factory/repo/ | grep SLE-15-SP2-Full
lrwxrwxrwx 1 geekotest nogroup 47 Aug 19 2020 SLE-15-SP2-Full-ppc64le-GM-Media1 -> fixed/SLE-15-SP2-Full-ppc64le-Build209.2-Media1
lrwxrwxrwx 1 geekotest nogroup 38 Jan 16 16:45 SLE-15-SP2-Full-s390x-GM-Media1 -> fixed/SLE-15-SP2-Full-s390x-GM-Media1/
For ppc64le it looks different, though: the log files do not mention that symlink at all, possibly because openQA jobs only reference it implicitly, so openQA might never learn about that symlink and hence never remove it?
For x86_64 I see no reference to removing any repo links:
openqa:/var/log # zgrep 'SLE-15-SP2-Online-x86_64-GM-Media1' openqa_gru* | grep -v '\<iso\>'
openqa_gru: name => "repo/fixed/SLE-15-SP2-Online-x86_64-GM-Media1",
openqa_gru-20240110.xz: name => "repo/fixed/SLE-15-SP2-Online-x86_64-GM-Media1",
openqa_gru-20240111.xz: name => "repo/fixed/SLE-15-SP2-Online-x86_64-GM-Media1",
openqa_gru-20240112.xz: name => "repo/fixed/SLE-15-SP2-Online-x86_64-GM-Media1",
openqa_gru-20240113.xz: name => "repo/fixed/SLE-15-SP2-Online-x86_64-GM-Media1",
openqa_gru-20240114.xz: name => "repo/fixed/SLE-15-SP2-Online-x86_64-GM-Media1",
openqa_gru-20240115.xz: name => "repo/fixed/SLE-15-SP2-Online-x86_64-GM-Media1",
openqa_gru-20240116.xz: name => "repo/fixed/SLE-15-SP2-Online-x86_64-GM-Media1",
openqa_gru.1.xz: name => "repo/fixed/SLE-15-SP2-Online-x86_64-GM-Media1",
openqa_gru.10.xz: name => "repo/fixed/SLE-15-SP2-Online-x86_64-GM-Media1",
openqa_gru.11.xz: name => "repo/fixed/SLE-15-SP2-Online-x86_64-GM-Media1",
openqa_gru.12.xz: name => "repo/fixed/SLE-15-SP2-Online-x86_64-GM-Media1",
openqa_gru.13.xz: name => "repo/fixed/SLE-15-SP2-Online-x86_64-GM-Media1",
openqa_gru.14.xz: name => "repo/fixed/SLE-15-SP2-Online-x86_64-GM-Media1",
openqa_gru.15.xz: name => "repo/fixed/SLE-15-SP2-Online-x86_64-GM-Media1",
openqa_gru.16.xz: name => "repo/fixed/SLE-15-SP2-Online-x86_64-GM-Media1",
openqa_gru.17.xz: name => "repo/fixed/SLE-15-SP2-Online-x86_64-GM-Media1",
openqa_gru.18.xz: name => "repo/fixed/SLE-15-SP2-Online-x86_64-GM-Media1",
openqa_gru.19.xz: name => "repo/fixed/SLE-15-SP2-Online-x86_64-GM-Media1",
openqa_gru.2.xz: name => "repo/fixed/SLE-15-SP2-Online-x86_64-GM-Media1",
openqa_gru.2.xz: name => "repo/fixed/SLE-15-SP2-Online-x86_64-GM-Media1",
openqa_gru.20.xz: name => "repo/fixed/SLE-15-SP2-Online-x86_64-GM-Media1",
openqa_gru.3.xz: name => "repo/fixed/SLE-15-SP2-Online-x86_64-GM-Media1",
openqa_gru.4.xz: name => "repo/fixed/SLE-15-SP2-Online-x86_64-GM-Media1",
openqa_gru.5.xz: name => "repo/fixed/SLE-15-SP2-Online-x86_64-GM-Media1",
openqa_gru.6.xz: name => "repo/fixed/SLE-15-SP2-Online-x86_64-GM-Media1",
openqa_gru.7.xz: name => "repo/fixed/SLE-15-SP2-Online-x86_64-GM-Media1",
openqa_gru.8.xz: name => "repo/fixed/SLE-15-SP2-Online-x86_64-GM-Media1",
openqa_gru.9.xz: name => "repo/fixed/SLE-15-SP2-Online-x86_64-GM-Media1",
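The zgrep searches above can also be scripted, for example to collect register/remove events per asset across rotated logs. The Python sketch below assumes the log line format quoted above and the /var/log/openqa_gru* file names; the regular expressions and helper names are illustrative. It matches the asset name exactly, so "repo/fixed/..." entries are not confused with "repo/..." ones.

```python
import glob
import lzma
import re
from typing import Iterable, Iterator, List, Tuple

# Patterns modelled on the GRU log lines quoted above.
REGISTER = re.compile(r"Registering asset (\S+)")
REMOVE = re.compile(r"Removing asset (\S+) \((.*)")


def read_log_lines(path: str) -> Iterator[str]:
    """Yield lines of a plain or .xz-compressed (rotated) log file."""
    opener = lzma.open if path.endswith(".xz") else open
    with opener(path, "rt", encoding="utf-8", errors="replace") as handle:
        yield from handle


def asset_events(lines: Iterable[str], asset: str) -> List[Tuple[str, str]]:
    """Return ("register", "") and ("remove", reason) events for exactly this asset."""
    events = []
    for line in lines:
        match = REGISTER.search(line)
        if match and match.group(1) == asset:
            events.append(("register", ""))
            continue
        match = REMOVE.search(line)
        if match and match.group(1) == asset:
            events.append(("remove", match.group(2).strip()))
    return events


if __name__ == "__main__":
    asset = "repo/SLE-15-SP2-Full-s390x-GM-Media1"
    for path in sorted(glob.glob("/var/log/openqa_gru*")):
        for kind, reason in asset_events(read_log_lines(path), asset):
            print(path, kind, reason)
```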
jkohoutek, do you have any further ideas on how to proceed?
- Why did these two still get purged from there?
Repositories are never listed as assets, only individual files.
I don't follow you here; I see a huge number of install targets there: https://openqa.suse.de/assets/repo/
Sorry, I meant they are never listed as assets on the job details page, but openQA properly tracks them as assets.
- Where is the purge process documented?
As explained in #152377-5, that's in http://open.qa/docs/#_asset_handling
Updated by jkohoutek 11 months ago
okurz wrote in #note-16:
So openQA at least states that it removed SLE-15-SP2-Full-s390x-GM-Media1, although the symlink seems to persist:
openqa:/var/log # ls -l /var/lib/openqa/share/factory/repo/ | grep SLE-15-SP2-Full
lrwxrwxrwx 1 geekotest nogroup 47 Aug 19 2020 SLE-15-SP2-Full-ppc64le-GM-Media1 -> fixed/SLE-15-SP2-Full-ppc64le-Build209.2-Media1
lrwxrwxrwx 1 geekotest nogroup 38 Jan 16 16:45 SLE-15-SP2-Full-s390x-GM-Media1 -> fixed/SLE-15-SP2-Full-s390x-GM-Media1/
For ppc64le it looks different, though: the log files do not mention that symlink at all, possibly because openQA jobs only reference it implicitly, so openQA might never learn about that symlink and hence never remove it?
For x86_64 I see no reference to removing any repo links:
Because we stopped the futile creation of them.
jkohoutek, do you have any further ideas on how to proceed?
The behavior you found in the logs convinces me even more that there is a whitelist somewhere of what shouldn't be removed, and that x86_64 for 15-SP2 and SP3 should be added to it.
Also, what concerns me in your findings is the behavior of the s390 and ppc64 ones:
s390 is not deleted, even though it should be, and the log even states it was removed; maybe the whitelist I mentioned before?
ppc64 is even stranger: it just lies there.
Updated by mkittler 11 months ago · Edited
I've just briefly read over comments so maybe I'm missing something.
s390 is not deleted, even though it should be, and the log even states it was removed; maybe the whitelist I mentioned before?
What white-list? I don't think we have a white-list for preserving certain assets. Note that the deletion of assets can fail due to missing permissions/ownership or someone might have put it back again. Additionally, the conditions under which assets are removed are not totally trivial. So maybe the asset just hasn't expired yet. Or the cleanup task hasn't run (or hasn't reached the point where it would delete that asset). If you want me to investigate this nevertheless, please state the exact name of the asset and why you think it should have been removed.
When assets are symlinked from the fixed directory (or basically from anywhere) into the regular assets directory, the symlinked asset is subject to the normal cleanup, which is not a bug. In the case of directories, symlinks are apparently followed. Maybe that's something we should not do. It would likely be trivial to implement not following symlinks. Of course that would only fix half of the problem. If I understand your use case correctly, you want even the symlink to stay. So I guess we would need to preserve the symlink if it points to the fixed directory and is not dangling.
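A Python sketch of the "do not follow symlinks" idea, assuming assets are plain files, directories or symlinks below share/factory. It only illustrates the approach discussed here; it is not the code of the openQA change that followed, and `remove_asset` is a hypothetical name.

```python
import os
import shutil


def remove_asset(path: str) -> None:
    """Delete an asset without following symlinks.

    If the asset is a symlink (to a file or a directory), only the link is
    removed, so its target, e.g. a repository below repo/fixed/, stays intact.
    Regular files and directories are deleted as usual.
    """
    if os.path.islink(path):
        os.unlink(path)  # removes the link itself, never the target
    elif os.path.isdir(path):
        shutil.rmtree(path)
    elif os.path.exists(path):
        os.remove(path)
```

The `islink` check has to come first: `os.path.isdir` follows symlinks and returns True for a link to a directory, and `shutil.rmtree` refuses to operate on a symlink.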
By the way, on OSD the following assets are affected:
martchus@openqa:/assets/factory> find repo/ -maxdepth 1 -type l -exec readlink {} \; | grep fixed/
fixed/SLE-12-SP3-Server-DVD-s390x-GM-DVD1/
fixed/SLE-12-SP4-Server-DVD-s390x-GM-DVD1/
fixed/SLE-12-SP?-SDK-POOL-aarch64-BuildGM-Media1/
fixed/SLE-12-SP?-SDK-POOL-ppc64le-BuildGM-Media1/
fixed/SLE-12-SP?-SDK-POOL-s390x-BuildGM-Media1/
fixed/SLE-12-SP?-SDK-POOL-x86_64-BuildGM-Media1/
fixed/SLE-12-Server-DVD-s390x-GM-DVD1/
fixed/SLE-15-Installer-DVD-s390x-GM-DVD1/
fixed/SLE-15-SP1-Installer-DVD-s390x-GM-DVD1/
fixed/SLE-15-SP1-Installer-DVD-s390x-QU4-Media1
fixed/SLE-15-SP2-Full-ppc64le-Build209.2-Media1
fixed/SLE-15-SP2-Installer-DVD-s390x-GM-DVD1/
fixed/SLE-15-SP2-Online-ppc64le-GM-Media1/
fixed/SLE-15-SP3-Full-s390x-GM-Media1/
fixed/SLE-15-SP3-Full-s390x-GM-Media1/
fixed/SLE-15-SP4-Full-s390x-GM-Media1/
fixed/SLE-15-SP4-Online-x86_64-GM-Media1/
fixed/SLE-15-SP5-Online-x86_64-GM-Media1/
fixed/SLE-15-SP3-Online-aarch64-GM-Media1/
fixed/SLE-15-SP5-Full-ppc64le-GM-Media1/
fixed/SLE-15-SP4-Online-aarch64-GM-Media1/
fixed/SLE-12-SP5-Server-DVD-s390x-GM-DVD1/
fixed/SLE-15-SP4-Online-ppc64le-GM-Media1/
fixed/SLE-15-SP3-Online-x86_64-GM-Media1/
fixed/SLE-15-SP4-Online-s390x-GM-Media1/
fixed/SLE-15-SP2-Full-s390x-GM-Media1/
fixed/SLE-15-SP2-Online-s390x-GM-Media1/
fixed/SLE-15-SP3-Online-s390x-GM-Media1/
fixed/SLE-15-SP5-Online-s390x-GM-Media1/
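The find command above prints only link targets, so it does not show which link points where (the target fixed/SLE-15-SP3-Full-s390x-GM-Media1/ appears twice, presumably from two different links). A Python sketch that also reports link names, assuming the OSD repo path from earlier in this ticket as the default; the helper name is hypothetical.

```python
import os
import sys
from typing import List, Tuple


# Rough Python counterpart of:
#   find repo/ -maxdepth 1 -type l -exec readlink {} \; | grep fixed/
# that also reports the name of each link, not only its target.
def symlinks_into_fixed(repo_dir: str) -> List[Tuple[str, str]]:
    """Return sorted (link name, target) pairs for top-level symlinks into fixed/."""
    pairs = []
    with os.scandir(repo_dir) as entries:
        for entry in entries:
            if entry.is_symlink():
                target = os.readlink(entry.path)
                if "fixed/" in target:
                    pairs.append((entry.name, target))
    return sorted(pairs)


if __name__ == "__main__":
    repo = sys.argv[1] if len(sys.argv) > 1 else "/var/lib/openqa/share/factory/repo"
    if os.path.isdir(repo):
        for name, target in symlinks_into_fixed(repo):
            print(f"{name} -> {target}")
```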
Updated by mkittler 11 months ago · Edited
Would something like that help? https://github.com/os-autoinst/openQA/pull/5428
(Of course, that change would only work for relative symlinks as-is and still needs testing.)
It also looks like our cleanup code would preserve dangling symlinks forever (as it skips assets which don't exist). I haven't tested that but it might also explain why some assets are not deleted.
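The dangling-symlink effect can be reproduced in Python, assuming the cleanup's existence check behaves like `os.path.exists`, which follows links. The helper below is hypothetical and only lists candidates; it does not touch openQA.

```python
import os
from typing import List


def dangling_symlinks(directory: str) -> List[str]:
    """Return the sorted names of top-level symlinks whose target does not exist.

    os.path.exists() follows symlinks, so it returns False for a dangling link,
    while os.path.islink() still returns True. A cleanup that skips every entry
    for which exists() is False would therefore keep such links forever.
    """
    names = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if os.path.islink(path) and not os.path.exists(path):
            names.append(name)
    return names
```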
Updated by okurz 11 months ago
- Status changed from Feedback to In Progress
mkittler wrote in #note-19:
In the case of directories, symlinks are apparently followed. Maybe that's something we should not do. It would likely be trivial to implement not following symlinks.
Yes, I guess that's a safe choice so that openQA only ever attempts to delete content in share/factory/repo directly and not any other repository or the fixed sub-directories. Of course, given proper permissions, openQA should not be able to delete any assets pointed to outside the openQA project directory anyway.
So, mkittler, can you make the change to only remove symlinks but not follow them?
Of course that would only fix half of the problem. If I understand your use case correctly, you want even the symlink to stay. So I guess we would need to preserve the symlink if it points to the fixed directory and is not dangling.
That sounds like something we shouldn't need to change if using fixed assets directly works. As the original use case was resolved by using assets from download.suse.de, which IMHO is how it should be done, I assume this change would not be needed here.
Updated by okurz 11 months ago
- Assignee changed from okurz to mkittler
ok, so assigning to mkittler for now to work on https://github.com/os-autoinst/openQA/pull/5428 and other potentially related improvements in our code.
Updated by mkittler 11 months ago
I've just updated https://github.com/os-autoinst/openQA/pull/5428 so have a look and give feedback whether it'll help in your case.
Updated by apappas 11 months ago
- Status changed from Feedback to In Progress
This is still failing https://openqa.suse.de/tests/13311194#step/boot_from_pxe/7
The output is a bit clobbered but AFAICT from the video, everything was entered correctly before the SUT responded.
Updated by okurz 11 months ago
apappas wrote in #note-25:
This is still failing https://openqa.suse.de/tests/13311194#step/boot_from_pxe/7
The output is a bit clobbered but AFAICT from the video, everything was entered correctly before the SUT responded.
The problem here is that "SLE-15-SP1-Installer-DVD-x86_64-GM-Media1" does not exist on OSD.
openqa:/var/log # zgrep 'SLE-15-SP1-Installer-DVD-x86_64-GM-Media1' openqa_gru*
shows no match so I assume no one ever recreated it there in place. And to remind what was discussed in https://suse.slack.com/archives/C02CANHLANP/p1704812941555379
(Oliver Kurz) who has an idea about https://openqa.suse.de/tests/13198459#step/boot_from_pxe/6 https://progress.opensuse.org/issues/152377 trying to boot assets in the "fixed/" directory over iPXE?
(Antonios Pappas) Since this is not officially supported, I tried using the download.suse.de sources and it works.
1. Having the installation media hosted on OSD is not a hard requirement. The requirement is that the ipmi workers with nvdimm memory are able to reach the installation media.
2. Using REPO_0=https://download.suse.de/install/SLP/SLE-15-SP2-Full-GM/x86_64/DVD1/ allows the test to run problem-free.
Would you think that switching all jobs to this solution is better? (All NVDIMM jobs, not every single job.)
(Oliver Kurz) Use 2., definitely. openQA assets are meant to ensure that content does not change, e.g. builds of SLE15-SP6, but something like SLE-15-SP2-Full-GM must never change on download.suse.de, so yeah, use that.
So your step 2. needs to be followed. I suggest you create a separate ticket within the scope of qe-sap for that and follow up there. Here we can then focus on the inconsistencies in asset handling and cleanup.
Updated by okurz 11 months ago
- Due date changed from 2024-01-23 to 2024-02-02
@mkittler, as discussed in the unblock, please extend the documentation regarding the handling of symlinks, incorporating your latest changes. Also consider reading the complete documentation parts covering cleanup, starting with http://open.qa/docs/#basic_cleanup. After that we can crosscheck whether OSD behaves as we expect.
Updated by mkittler 11 months ago · Edited
- Status changed from In Progress to Feedback
PR for extending the documentation: https://github.com/os-autoinst/openQA/pull/5436
… so I assume no one ever recreated it there in place
That's very likely, so I suggest you recreate the asset and its symlink. Then we'll see whether it is now preserved (it should now be preserved, as my latest changes have been deployed).
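Recreating the symlink as a relative link keeps it in the same form as the existing entries listed earlier (e.g. `SLE-15-SP2-Full-s390x-GM-Media1 -> fixed/SLE-15-SP2-Full-s390x-GM-Media1/`), which matters because the proposed change was described as handling relative symlinks as-is. A minimal Python sketch, assuming the repository already sits below `repo/fixed/`; the helper name `link_fixed_asset` is hypothetical.

```python
import os


def link_fixed_asset(repo_dir: str, name: str) -> str:
    """Create <repo_dir>/<name> as a relative symlink to fixed/<name>.

    Hypothetical helper: it refuses to create a dangling link, and
    os.symlink() raises FileExistsError if <name> already exists.
    """
    target = os.path.join("fixed", name)  # resolved relative to repo_dir
    if not os.path.isdir(os.path.join(repo_dir, target)):
        raise FileNotFoundError(f"{target} does not exist below {repo_dir}")
    link = os.path.join(repo_dir, name)
    os.symlink(target, link)
    return link
```

For example, `link_fixed_asset("/var/lib/openqa/share/factory/repo", "SLE-15-SP1-Installer-DVD-x86_64-GM-Media1")`, run as the user owning the asset directory once the repository has been placed below `fixed/`.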
Updated by okurz 11 months ago
- Due date deleted (2024-02-02)
- Status changed from Feedback to Resolved
https://github.com/os-autoinst/openQA/pull/5436 merged.
With that, asset cleanup is now more consistent. We do not plan further changes right now. And as explained in #152377-26, we consider re-creating any assets that should/could be used from dist.suse.de directly as out of scope for us anyway.
If you still see open points where we can help or which we overlooked, feel welcome to reopen and tell us explicitly what you expect.