action #151138

closed

[openQA][aarch64][media] 15-SP6 Build39.1 media for aarch64 cleaned up while suse.asia instance was still using it size:S

Added by waynechen55 6 months ago. Updated 5 months ago.

Status:
Resolved
Priority:
Normal
Assignee:
Category:
Support
Target version:
Start date:
2023-11-20
Due date:
2023-12-05
% Done:

0%

Estimated time:

Description

Observation

The test run failed because the media cannot be found at https://openqa.suse.de/assets/repo/SLE-15-SP6-Full-aarch64-Build39.1-Media1/.

Steps to reproduce

  • Check the latest 15-SP6 media Build39.1

Impact

All aarch64 test runs failed, for example:
test1
test2
test3

Problem

15-SP6 Build39.1 aarch64 full media does not exist on OSD:
https://openqa.suse.de/assets/repo/SLE-15-SP6-Full-aarch64-Build39.1-Media1/

Suggestions

  • Check media availability from https://openqa.suse.de/admin/assets as well as locally under /var/lib/openqa/share/factory/repo/SLE-15-SP6-Full-aarch64-Build39.1-Media1/ and other builds for the same product
  • Check journal of openqa-gru service for cleanup of "SLE-15-SP6-Full-aarch64-Build39.1-Media1"
  • Check available storage on OSD. If more space is available, maybe the issue is just a symptom of job group size limits being configured too small?
  • Ensure the reporter and we understand that relying on openQA assets from any external service is a bad idea
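
The local check from the first suggestion could be scripted roughly as follows. This is a minimal sketch, assuming it runs on the OSD host itself. The helper name `repo_is_populated` is hypothetical and not part of openQA; only the path comes from this ticket.

```python
from pathlib import Path

# Path taken from this ticket; adjust it for other builds of the same product.
REPO_DIR = Path(
    "/var/lib/openqa/share/factory/repo/"
    "SLE-15-SP6-Full-aarch64-Build39.1-Media1"
)


def repo_is_populated(path: Path) -> bool:
    """Return True if `path` is a directory with at least one entry.

    Hypothetical helper for illustration; it is not an openQA API.
    """
    if not path.is_dir():
        return False
    # any() stops at the first entry, so large repos are not fully listed.
    return any(path.iterdir())


if __name__ == "__main__":
    state = "populated" if repo_is_populated(REPO_DIR) else "missing or empty"
    print(f"{REPO_DIR}: {state}")
```

A "missing or empty" result only confirms the symptom. The journal of the openqa-gru service would still be needed to see when and why the directory was cleaned up.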

Workaround

n/a

Out of scope

Fix openqa.qa2.suse.asia to always have access to such assets from OSD

Actions #1

Updated by waynechen55 6 months ago

  • Priority changed from Normal to Urgent
Actions #2

Updated by waynechen55 6 months ago

aarch64 Beta1 testing cannot be done without this media.

Actions #3

Updated by okurz 6 months ago

  • Project changed from openQA Infrastructure to openQA Project
  • Category set to Support
  • Target version set to Ready
Actions #4

Updated by tinita 6 months ago

@waynechen55 your links are pointing to openqa.qa2.suse.asia, e.g. http://openqa.qa2.suse.asia/tests/65261#step/guest_installation_run/21
And your links to the assets are pointing to openqa.suse.de.
We're not sure how that is connected to openqa.suse.de.
Btw, http://openqa.qa2.suse.asia/changelog was last updated on Sep 11.

Actions #5

Updated by livdywan 6 months ago

  • Status changed from New to In Progress
  • Assignee set to livdywan

I'll take the ticket and try and confirm what's expected here

Actions #6

Updated by waynechen55 6 months ago

The test suite runs on openqa.qa2.suse.asia, but it uses the full media on OSD for each new build. So for Build39.1, the media https://openqa.suse.de/assets/repo/SLE-15-SP6-Full-aarch64-Build39.1-Media1/ is needed. I do not think this is news, because it has always been like this, dating back to earlier 15-SPx releases.

Actions #7

Updated by waynechen55 6 months ago

tinita wrote in #note-4:

@waynechen55 your links are pointing to openqa.qa2.suse.asia, e.g. http://openqa.qa2.suse.asia/tests/65261#step/guest_installation_run/21
And your links to the assets are pointing to openqa.suse.de.
We're not sure how that is connected to openqa.suse.de.
Btw, http://openqa.qa2.suse.asia/changelog was last updated on Sep 11.

The test suite runs on openqa.qa2.suse.asia, but it uses the full media on OSD for each new build. So for Build39.1, the media https://openqa.suse.de/assets/repo/SLE-15-SP6-Full-aarch64-Build39.1-Media1/ is needed. I do not think this is news, because it has always been like this, dating back to earlier 15-SPx releases.

Actions #8

Updated by livdywan 6 months ago

waynechen55 wrote in #note-7:

The test suite runs on openqa.qa2.suse.asia, but it uses the full media on OSD for each new build. So for Build39.1, the media https://openqa.suse.de/assets/repo/SLE-15-SP6-Full-aarch64-Build39.1-Media1/ is needed. I do not think this is news, because it has always been like this, dating back to earlier 15-SPx releases.

I'll re-phrase this a little. We as a team aren't aware of this setup and it looks like just another instance we don't maintain. So if you need help we need to know how it's supposed to work.

So I suggest

  1. Please make sure the instance is up to date so that it's not affected by known bugs
  2. Clarify how assets are synced from OSD

Then we confirm if there's an openQA bug here, or what the issue with the asset handling is.

Actions #10

Updated by waynechen55 6 months ago · Edited

livdywan wrote in #note-8:

I'll re-phrase this a little. We as a team aren't aware of this setup and it looks like just another instance we don't maintain. So if you need help we need to know how it's supposed to work.

So I suggest

  1. Please make sure the instance is up to date so that it's not affected by known bugs
  2. Clarify how assets are synced from OSD

Then we confirm if there's an openQA bug here, or what the issue with the asset handling is.

  1. There is an openQA instance in Beijing, http://openqa.qa2.suse.asia/, which is maintained locally, so it is updated regularly.
  2. Take this test http://openqa.qa2.suse.asia/tests/65261 as an example. It runs on openqa.qa2.suse.asia and installs a virtual machine in the final step. The installation uses the full media for Build39.1, namely https://openqa.suse.de/assets/repo/SLE-15-SP6-Full-aarch64-Build39.1-Media1 on OSD. But the media is empty, which leads to the test run failure. So my request is to make sure the mounted media is not empty.
  3. Actually, the test run with the previous Build37.1 went very well. The corresponding media https://openqa.suse.de/assets/repo/SLE-15-SP6-Full-aarch64-Build37.1-Media1/ exists on OSD. But the corresponding media for Build39.1 is empty.
  4. So if the Build39.1 media is empty, it may indicate an OSD problem with regard to storage or syncing, which needs to be solved by you or someone else.
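
The comparison in items 2 and 3 could be repeated from any machine that can reach OSD. Below is a minimal sketch using only the Python standard library. It assumes the repo URL pattern seen in this ticket also holds for other builds, and `repo_url` and `http_status` are hypothetical helper names. A status code only tells whether the server answers for the URL, not whether the directory listing has any content.

```python
import urllib.error
import urllib.request

OSD = "https://openqa.suse.de"


def repo_url(build: str, arch: str = "aarch64", version: str = "15-SP6") -> str:
    """Build the repo asset URL following the pattern seen in this ticket."""
    return f"{OSD}/assets/repo/SLE-{version}-Full-{arch}-Build{build}-Media1/"


def http_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code of a HEAD request to `url`.

    Connection failures (urllib.error.URLError) are deliberately not handled.
    """
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urlopen raises for 4xx/5xx responses; the code is still informative.
        return err.code


if __name__ == "__main__":
    for build in ("37.1", "39.1"):
        url = repo_url(build)
        print(build, http_status(url), url)
```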
Actions #11

Updated by openqa_review 6 months ago

  • Due date set to 2023-12-05

Setting due date based on mean cycle time of SUSE QE Tools

Actions #12

Updated by waynechen55 6 months ago

@livdywan Can you help fix this ASAP? If not, what prevents you from proceeding?

Actions #13

Updated by livdywan 6 months ago

waynechen55 wrote in #note-12:

@livdywan Can you help fix this ASAP? If not, what prevents you from proceeding?

Maybe you overlooked my question. I'm not clear how the sync happens. If OSD is not aware of those jobs you're running it might be cleaning up the assets because I don't think we have any logic that checks what is being used on other instances.

A possible work-around could be to increase storage limits in affected groups. Or maybe you need to ensure the assets are always present on the same instance. I don't know why or if that's not already the case.

Actions #14

Updated by waynechen55 6 months ago · Edited

livdywan wrote in #note-13:

Maybe you overlooked my question. I'm not clear how the sync happens. If OSD is not aware of those jobs you're running it might be cleaning up the assets because I don't think we have any logic that checks what is being used on other instances.

A possible work-around could be to increase storage limits in affected groups. Or maybe you need to ensure the assets are always present on the same instance. I don't know why or if that's not already the case.

  • Build39.1 came out only recently, and the test suites on the Beijing openQA instance were triggered just hours after its delivery, so I think OSD cleaned it up for some other reason, for example a storage limit.
  • Whether or not OSD is aware of the Beijing openQA instance should not matter here, because the time between the Build39.1 delivery and my spotting the issue was too short. I do not think it was long enough for OSD to clean up the asset as unused. It looks more like another cause, for example a storage limit. Maybe you can tell me how long OSD waits before cleaning up an asset that is not being used.
  • I am not sure about the "sync" between OSD and the Beijing openQA instance. If what I said in the second item is true, then it is not a "sync" issue anyway. If it is not, I will try to confirm whether there is a "sync", which may indicate a bug in OSD. So, coming back to the first item, do you think it is true?
  • Additionally, Build37.1 still exists and was not cleaned up by OSD, although the test runs with Build37.1 finished almost a week ago. But the newer Build39.1 was cleaned up, so it looks even more obvious that this has nothing to do with "sync". OSD must have cleaned up the newer Build39.1 for some other reason.
Actions #15

Updated by okurz 6 months ago

  • Subject changed from [openQA][aarch64][media] 15-SP6 Build39.1 full media does not exist for aarch64 to [openQA][aarch64][media] 15-SP6 Build39.1 full media does not exist for aarch64 size:S
  • Description updated (diff)
Actions #16

Updated by waynechen55 6 months ago · Edited

I recall that I once opened a similar issue, #119215, which was fixed and resolved by @okurz. It was also about full media for aarch64. @livdywan, you can have a look as well; it might be helpful.

Actions #17

Updated by livdywan 6 months ago

waynechen55 wrote in #note-16:

I recall that I once opened a similar issue, #119215, which was fixed and resolved by @okurz. It was also about full media for aarch64. @livdywan, you can have a look as well; it might be helpful.

So I just took a look at the product log and it shows 4 days ago geekotest scheduled SLE 15-SP6 Full aarch64 39.1 SLE-15-SP6-Full-aarch64-Build39.1-Media1.iso. The first scheduled job is https://openqa.suse.de/tests/12836935, which ran successfully and shows https://openqa.suse.de/tests/12836935/asset/iso/SLE-15-SP6-Full-aarch64-Build39.1-Media1.iso ~10 hours earlier than the failing jobs you linked. The SLE 15 job group currently has no size limit, the YAST job group has 240GB. The used size looks to be identical to the limit so I would assume things have been cleaned up too fast. As I mentioned before, short of having any sort of sync you can increase the limit.

I hope that's clearer. I don't know that I can suggest much else at this point.

Actions #18

Updated by waynechen55 5 months ago

@livdywan When do you think we can have the Build39.1 full media back? Tomorrow or next week?

Actions #19

Updated by livdywan 5 months ago

waynechen55 wrote in #note-18:

@livdywan When do you think we can have the Build39.1 full media back? Tomorrow or next week?

Publishing images again is outside of the Tools team's scope. I'm happy to be part of a public Slack thread, but otherwise don't know what to offer besides the points we discussed.

Actions #20

Updated by waynechen55 5 months ago

I hope there is a more robust, persistent solution than increasing the storage limit every time and praying for the next build.

Actions #21

Updated by waynechen55 5 months ago

By the way, the ISO media is still there: https://openqa.suse.de/assets/iso/SLE-15-SP6-Full-aarch64-Build39.1-Media1.iso. What got cleaned up is the mounted repo https://openqa.suse.de/assets/repo/SLE-15-SP6-Full-aarch64-Build39.1-Media1/. I was wondering whether you could just mount the ISO to that folder again to help solve the problem. @livdywan

Actions #22

Updated by okurz 5 months ago

waynechen55 wrote in #note-20:

I hope there is a more robust, persistent solution than increasing the storage limit every time and praying for the next build.

Yes. A more robust solution would be to not rely on assets from OSD except for tests that run exclusively on OSD. openQA needs to clean up assets because we have to work with the space we have. We can't keep assets for a potential external system, as openQA does not know when those assets are no longer needed. The best approach would be to sync assets over from IBS to any openQA instance that uses them.

waynechen55 wrote in #note-21:

By the way, the ISO media is still there: https://openqa.suse.de/assets/iso/SLE-15-SP6-Full-aarch64-Build39.1-Media1.iso. What got cleaned up is the mounted repo https://openqa.suse.de/assets/repo/SLE-15-SP6-Full-aarch64-Build39.1-Media1/. I was wondering whether you could just mount the ISO to that folder again to help solve the problem. @livdywan

That's not a "mounted repo", it's what was synced over from IBS or directly extracted from the ISO. If 39.1 is still the latest build published on IBS then you could re-execute the sync calls that are also visible on the openQA webUI. Compare to https://openqa.opensuse.org/admin/obs_rsync

Actions #23

Updated by waynechen55 5 months ago

To my understanding, cleanup should remove the oldest assets first rather than the latest one. But in this case, it removed the latest Build39.1 instead of any older ones.

Actions #24

Updated by livdywan 5 months ago

  • Priority changed from Urgent to Normal

waynechen55 wrote in #note-23:

To my understanding, cleanup should remove the oldest assets first rather than the latest one. But in this case, it removed the latest Build39.1 instead of any older ones.

That can happen if older ones are still used by jobs.

I'm lowering the priority since we've discussed how openQA handles assets and it works as expected.
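
How a newer asset can be removed while older ones stay can be illustrated with a toy model. This is a deliberately simplified sketch, not openQA's actual cleanup code. It assumes a single job group with one size limit, that assets needed by pending jobs are always kept, and that the remaining assets are kept newest-first for as long as they fit. All names, sizes and job IDs below are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Asset:
    name: str
    size_gb: int
    last_job_id: int       # higher means more recently used by a local job
    pending: bool = False  # still needed by a scheduled or running local job


def plan_cleanup(assets, limit_gb):
    """Split assets into (keep, remove) under a group size limit (toy model)."""
    keep, remove = [], []
    used = 0
    # Assets pinned by pending jobs are kept unconditionally.
    for asset in assets:
        if asset.pending:
            keep.append(asset)
            used += asset.size_gb
    # The rest is considered newest-first and kept only while it still fits.
    unpinned = sorted(
        (a for a in assets if not a.pending),
        key=lambda a: a.last_job_id,
        reverse=True,
    )
    for asset in unpinned:
        if used + asset.size_gb <= limit_gb:
            keep.append(asset)
            used += asset.size_gb
        else:
            remove.append(asset)
    return keep, remove


if __name__ == "__main__":
    assets = [
        Asset("Build37.1 repo", 150, last_job_id=100, pending=True),
        Asset("other repo", 90, last_job_id=90, pending=True),
        # Jobs on an external instance are invisible here, so nothing pins this.
        Asset("Build39.1 repo", 20, last_job_id=110),
    ]
    keep, remove = plan_cleanup(assets, limit_gb=240)
    print("keep:  ", [a.name for a in keep])
    print("remove:", [a.name for a in remove])
```

In this model the newest asset is the only one removed, because the older assets are pinned by local jobs and already fill the limit. Raising the limit makes the new asset fit again.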

@waynechen55 Do you find that our docs on asset clean-up are missing anything we discussed here?

Actions #25

Updated by livdywan 5 months ago

  • Subject changed from [openQA][aarch64][media] 15-SP6 Build39.1 full media does not exist for aarch64 size:S to [openQA][aarch64][media] 15-SP6 Build39.1 media for aarch64 cleaned up while suse.asia instance was still using it size:S
  • Status changed from In Progress to Feedback

I'm also clarifying the title, since the file was not missing; it was cleaned up.

Actions #26

Updated by livdywan 5 months ago

  • Status changed from Feedback to Resolved

I will consider this done on our end. Feel free to file a follow-up ticket, for example "Extend docs regarding asset handling" or simply a draft PR to extend the docs yourself.

Actions #27

Updated by waynechen55 5 months ago
