action #156394
[tools] some Automatic investigation jobs for job 13642310 only run part of the test modules NOT ALL: [is it by design?] size:M
Status: closed
Description
Observation
Hello tools team experts, see https://openqa.suse.de/tests/13642310#comments
According to the test comments, the automatic investigation jobs for job 13642310 passed, but NOT all test modules were scheduled.
The original job failed at system_prepare, but the automatic investigation jobs stop at first_boot.
Can you please help check? Thanks!
Acceptance criteria
- AC1: openqa-investigate jobs run more or less the original test module schedule w/o publishing assets
- AC2: os-autoinst+openQA must continue to be ignorant of os-autoinst/scripts specifics
Suggestions
- Use a different value for the publish variables, e.g. "none", and adapt the handling in os-autoinst+openQA to accept it, so that the schedule is not affected but nothing is published, and/or adapt the test code accordingly
- Ensure that still nothing is published
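The first suggestion can be sketched as follows, assuming the investigation script has the original job's settings available as a hash. This is a hypothetical Perl illustration (openqa-investigate lives in os-autoinst/scripts and may do this differently); the helper name publish_overrides and the example asset names are made up. Each PUBLISH_* setting is overridden with "none" instead of being emptied, producing KEY=VALUE arguments of the kind openqa-clone-job accepts.

```perl
use strict;
use warnings;

# Hypothetical sketch: instead of emptying every PUBLISH_* setting of the
# original job, override each one with a sentinel value ("none" by default).
# Returns a sorted list of KEY=VALUE strings suitable as clone-job arguments.
sub publish_overrides {
    my ($settings, $sentinel) = @_;
    $sentinel //= 'none';
    return map { "$_=$sentinel" } sort grep { /^PUBLISH_/ } keys %$settings;
}

# Example settings of an original job (asset names are made up).
my %settings = (
    PUBLISH_HDD_1       => 'sle.qcow2',
    PUBLISH_PFLASH_VARS => 'vars.qcow2',
    HDD_1               => 'base.qcow2',
);
print "$_\n" for publish_overrides(\%settings);
# PUBLISH_HDD_1=none
# PUBLISH_PFLASH_VARS=none
```

Because "none" is a non-empty value, conditions such as get_var("PUBLISH_HDD_1") still evaluate to true and the schedule stays the same; preventing the actual upload then has to happen on the os-autoinst/openQA side.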
Updated by tinita 9 months ago
- Status changed from New to In Progress
- Assignee set to tinita
I did a normal openqa-clone-job on that job on Friday, and in that case the full job was run.
I don't have the job URL anymore, as the OSD crash got in the way.
So I will check if/what openqa-investigate does differently.
Updated by tinita 9 months ago · Edited
- Status changed from In Progress to Feedback
This happens because we empty vars that start with PUBLISH_ in the investigate jobs. See also #89281.
The corresponding test modules after first_boot are only scheduled when PUBLISH_HDD_* is set.
We have to discuss what to do here.
Compare https://openqa.suse.de/tests/13708860#details (with PUBLISH_HDD_1) and https://openqa.suse.de/tests/13708906# (without PUBLISH_HDD_1).
Updated by tinita 9 months ago
- Related to action #89281: Prevent investigation jobs to do any asset uploads to prevent overriding production assets added
Updated by tinita 9 months ago
The line where this happens is:
https://github.com/os-autoinst/os-autoinst-distri-opensuse/blob/master/lib/main_common.pm#L2575
load_create_hdd_tests if (get_var("STORE_HDD_1") || get_var("PUBLISH_HDD_1")) && !get_var('PUBLIC_CLOUD');
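To illustrate why emptying the variable changes the schedule while a value like "none" would not, here is a minimal Perl sketch that re-evaluates the condition above against plain hashes instead of the real job settings. The helper name schedules_create_hdd and the asset name are hypothetical; only the boolean expression is taken from lib/main_common.pm. In Perl an empty string is false while "none" is true, so only the emptied variant drops the create-HDD modules.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the condition in lib/main_common.pm:
#   load_create_hdd_tests if (get_var("STORE_HDD_1") || get_var("PUBLISH_HDD_1")) && !get_var('PUBLIC_CLOUD');
# The hash passed in plays the role of the job settings read by get_var().
sub schedules_create_hdd {
    my ($vars) = @_;
    my $get_var = sub { $vars->{$_[0]} };
    return (($get_var->('STORE_HDD_1') || $get_var->('PUBLISH_HDD_1'))
          && !$get_var->('PUBLIC_CLOUD')) ? 1 : 0;
}

my %original    = (PUBLISH_HDD_1 => 'sle-15-x86_64.qcow2');  # made-up asset name
my %emptied     = (PUBLISH_HDD_1 => '');                      # what investigation jobs did
my %set_to_none = (PUBLISH_HDD_1 => 'none');                  # suggested sentinel value

# Prints 1 for "original" and "none", 0 for "emptied".
for my $case (['original', \%original], ['emptied', \%emptied], ['none', \%set_to_none]) {
    printf "%-9s => %d\n", $case->[0], schedules_create_hdd($case->[1]);
}
```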
@rfan1 It is a bit unexpected that a test schedules certain modules depending on a PUBLISH_ variable. Do you think it would make sense to use a different variable for that?
However, there are more occurrences where something like this happens.
@okurz we thought that we could use the presence of the OPENQA_INVESTIGATE_ORIGIN variable for disabling all publishing
Updated by okurz 9 months ago
tinita wrote in #note-7:
@okurz we thought that we could use the presence of the OPENQA_INVESTIGATE_ORIGIN variable for disabling all publishing
Hm, so where is the removal of publish variables currently done, and where would you read OPENQA_INVESTIGATE_ORIGIN to disable publishing? So far I assume only https://github.com/os-autoinst/scripts/ knows about "openqa-investigate", right? What we should preserve is that os-autoinst and openQA do not know about os-autoinst/scripts and shouldn't need to know about it.
Updated by rfan1 9 months ago
tinita wrote in #note-7:
The line where this happens is:
https://github.com/os-autoinst/os-autoinst-distri-opensuse/blob/master/lib/main_common.pm#L2575
load_create_hdd_tests if (get_var("STORE_HDD_1") || get_var("PUBLISH_HDD_1")) && !get_var('PUBLIC_CLOUD');
@rfan1 It is a bit unexpected that a test schedules certain modules depending on a PUBLISH_ variable. Do you think it would make sense to use a different variable for that?
Thanks Tina!
IMO, it would be great if we could run the investigation jobs with "PUBLISH_*" set to "none".
However, there are more occurrences where something like this happens.
@okurz we thought that we could use the presence of the OPENQA_INVESTIGATE_ORIGIN variable for disabling all publishing
Updated by okurz 9 months ago
@rfan1 your quoting style is a bit broken, making it hard to distinguish who wrote what.
rfan1 wrote in #note-9:
[…]
IMO, it would be great if we could run the investigation jobs with "PUBLISH_*" set to "none".
Good idea, we can take a look into that.
However, there are more occurrences where something like this happens.
You mean other test scenarios that also have an incomplete schedule but not related to PUBLISH_*? If yes then please reference those scenarios as well.
@okurz we thought that we could use the presence of the OPENQA_INVESTIGATE_ORIGIN variable for disabling all publishing
Yes, but as I stated, what we should preserve is that os-autoinst and openQA do not know about os-autoinst/scripts and shouldn't need to know about it. This means that os-autoinst code must not use the variable OPENQA_INVESTIGATE_ORIGIN either.
Updated by tinita 9 months ago
okurz wrote in #note-10:
However, there are more occurrences where something like this happens.
You mean other test scenarios that also have an incomplete schedule but not related to PUBLISH_*? If yes then please reference those scenarios as well.
That's a quote from me.
I mean that there are other occurrences in the code like loadtest(...) if get_var("PUBLISH_HDD_1")
Updated by tinita 9 months ago
I ran a job with both vars set to none, and at least it failed in the same way as the original job: https://openqa.suse.de/tests/13725407#details
Updated by okurz 9 months ago
- Subject changed from [tools] some Automatic investigation jobs for job 13642310 only run part of the test modules NOT ALL: [is it by design?] to [tools] some Automatic investigation jobs for job 13642310 only run part of the test modules NOT ALL: [is it by design?] size:M
- Description updated (diff)
Updated by ybonatakis 9 months ago
- Status changed from Workable to In Progress
- Assignee set to ybonatakis
Updated by ybonatakis 9 months ago
I took a look and read the comments above. Is this about modifying https://github.com/os-autoinst/scripts/pull/69/files#diff-f73cf39a07f6cf8cdb453862496919d06df16d07e58b274e68ea148dd1f7dae5R32 in the script to set "none" or similar? From the meeting I thought it had to do with openQA+os-autoinst.
Updated by openqa_review 9 months ago
- Due date set to 2024-03-23
Setting due date based on mean cycle time of SUSE QE Tools
Updated by ybonatakis 9 months ago
https://github.com/os-autoinst/scripts/pull/299
I guess this is one part of solving the scheduling problem.
What remains is something in openQA regarding uploading the assets.
Updated by ybonatakis 9 months ago
I see that at https://github.com/os-autoinst/openQA/blob/master/lib/OpenQA/Worker/Job.pm#L436 there is a check whether the file exists. If it does not, I think the upload is skipped. Can someone with more experience confirm that?
Updated by okurz 9 months ago
ybonatakis wrote in #note-21:
I see that at https://github.com/os-autoinst/openQA/blob/master/lib/OpenQA/Worker/Job.pm#L436 there is a check whether the file exists. If it does not, I think the upload is skipped. Can someone with more experience confirm that?
I can confirm that the code silently skips specified assets that cannot be uploaded. But I suggest you try it out by simply triggering an openQA job with PUBLISH_HDD_1=none or similar and see what happens.
Updated by livdywan 9 months ago
Action points to try discussed in conversation:
- Clone a job relying on PUBLISH_HDD_1 as-is
- Also clone the same job with PUBLISH_HDD_1=none
- Make a one-line change in OpenQA/Isotovideo/Utils.pm in os-autoinst to skip any files named "none" when compiling the list of assets
- See how 14-isotovideo.t behaves with different asset filenames and trivial changes to the logic
Updated by ybonatakis 9 months ago
- Status changed from In Progress to Feedback
https://github.com/os-autoinst/os-autoinst/pull/2470: with this one-line change, the job does not seem to upload anything when none or None is used in PUBLISH_HDD_*.
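A minimal sketch of the kind of filter described here, assuming the publishable asset names are read from PUBLISH_HDD_<n> settings. This is a hypothetical Perl illustration, not the code from the pull request; the helper name assets_to_publish and the asset names are made up. Values that are empty or equal to "none" in any letter case are skipped, so nothing is uploaded for them.

```perl
use strict;
use warnings;

# Hypothetical sketch: collect asset names from PUBLISH_HDD_<n> settings,
# skipping empty values and the sentinel "none" (case-insensitive, so "None"
# is skipped as well). Keys are sorted as strings.
sub assets_to_publish {
    my ($vars) = @_;
    my @assets;
    for my $key (sort grep { /^PUBLISH_HDD_\d+$/ } keys %$vars) {
        my $name = $vars->{$key} // '';
        next if $name eq '' || $name =~ /^none$/i;
        push @assets, $name;
    }
    return @assets;
}

my %vars = (
    PUBLISH_HDD_1 => 'none',
    PUBLISH_HDD_2 => 'None',
    PUBLISH_HDD_3 => 'base.qcow2',   # made-up asset name
    STORE_HDD_1   => 'x.qcow2',      # not a PUBLISH_HDD_<n> key, ignored here
);
print join(',', assets_to_publish(\%vars)), "\n";   # prints "base.qcow2"
```

Because the keys are sorted as strings, PUBLISH_HDD_10 would come before PUBLISH_HDD_2; this only affects the order of the returned list, not which assets are skipped.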
Updated by ybonatakis 9 months ago
The changes to os-autoinst are merged. https://github.com/os-autoinst/scripts/pull/299 is still open.
Updated by ybonatakis 9 months ago
- Status changed from Feedback to Resolved