action #156394

closed

[tools] some Automatic investigation jobs for job 13642310 only run part of the test modules NOT ALL: [is it by design?] size:M

Added by rfan1 about 2 months ago. Updated about 1 month ago.

Status: Resolved
Priority: High
Assignee:
Category: Regressions/Crashes
Target version:
Start date: 2024-03-01
Due date: 2024-03-23
% Done: 0%
Estimated time:

Description

Observation

Hello tools team experts, https://openqa.suse.de/tests/13642310#comments

Based on the test comments, the automatic investigation jobs for job 13642310 passed, but not all test modules were scheduled.

The original job failed at system_prepare, whereas the automatic investigation jobs stop at first_boot.

Can you please help check? thanks!

Acceptance criteria

  • AC1: openqa-investigate jobs run more or less the original test module schedule w/o publishing assets
  • AC2: os-autoinst+openQA must continue to be ignorant of os-autoinst/scripts specifics

Suggestions

  • Use a different value for the publish variables, e.g. "none", and adapt the handling in os-autoinst+openQA to accept it, so that the schedule is not impacted but still nothing is published, and/or adapt the test code accordingly
  • Ensure that still nothing is published
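
The first suggestion could look roughly like this on the investigation side. This is only a sketch, written in Python for illustration: the function name `neutralize_publish_settings`, the plain dictionary standing in for job settings, and the example asset name are assumptions, and the actual openqa-investigate script is implemented differently.

```python
PUBLISH_SENTINEL = "none"  # sentinel value proposed in the suggestion above


def neutralize_publish_settings(settings, sentinel=PUBLISH_SENTINEL):
    """Return a copy of the job settings with every PUBLISH_* value replaced.

    Hypothetical helper: keeping the values non-empty means schedule
    conditions that test for PUBLISH_* stay true, while the consumer can
    treat the sentinel as "do not publish anything".
    """
    return {
        key: (sentinel if key.startswith("PUBLISH_") else value)
        for key, value in settings.items()
    }


# Example settings; the asset name is made up for illustration.
original = {"PUBLISH_HDD_1": "example.qcow2", "DESKTOP": "gnome"}
print(neutralize_publish_settings(original))
```

The input dictionary is left untouched, so the original job settings remain available for comparison.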

Related issues 1 (0 open, 1 closed)

Related to openQA Project - action #89281: Prevent investigation jobs to do any asset uploads to prevent overriding production assets (Resolved, mkittler, 2021-03-01)

Actions #1

Updated by okurz about 2 months ago

  • Category set to Bugs in existing tests
  • Priority changed from Normal to High
  • Target version set to Ready
Actions #2

Updated by szarate about 2 months ago

  • Project changed from openQA Tests to openQA Project
  • Category changed from Bugs in existing tests to Regressions/Crashes
Actions #3

Updated by rfan1 about 2 months ago

  • Description updated (diff)
Actions #4

Updated by tinita about 2 months ago

  • Status changed from New to In Progress
  • Assignee set to tinita

I did a normal openqa-clone-job on that job on Friday, and in that case the full job was run.
I don't have the job URL anymore, as the OSD crash got in the way.
So I will check if/what openqa-investigate does differently.

Actions #5

Updated by tinita about 2 months ago · Edited

  • Status changed from In Progress to Feedback

This happens because we empty variables that start with PUBLISH_ in the investigate jobs. See also #89281.
The corresponding test modules after first_boot are only scheduled when PUBLISH_HDD_* is set.
We have to discuss what to do here.
Compare
https://openqa.suse.de/tests/13708860#details (with PUBLISH_HDD_1)
and https://openqa.suse.de/tests/13708906# (without PUBLISH_HDD_1)
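
A rough model of the behavior described in this comment, in Python for illustration only. `clear_publish_settings` and `publish_hdd_requested` are hypothetical names, the dictionary stands in for the cloned job's settings, and the asset name is made up; the real openqa-investigate script differs in detail.

```python
def clear_publish_settings(settings):
    """Empty every PUBLISH_* setting, modeling what the investigate jobs do."""
    return {
        key: ("" if key.startswith("PUBLISH_") else value)
        for key, value in settings.items()
    }


def publish_hdd_requested(settings):
    """True only if PUBLISH_HDD_1 is present with a non-empty value."""
    return bool(settings.get("PUBLISH_HDD_1"))


# Example settings; the asset name is an assumption.
original = {"PUBLISH_HDD_1": "example.qcow2"}
cleared = clear_publish_settings(original)
print(publish_hdd_requested(original), publish_hdd_requested(cleared))
```

Once the value is empty, any schedule condition that tests PUBLISH_HDD_1 for truthiness no longer fires, which matches the two jobs compared above.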

Actions #6

Updated by tinita about 2 months ago

  • Related to action #89281: Prevent investigation jobs to do any asset uploads to prevent overriding production assets added
Actions #7

Updated by tinita about 2 months ago

The line where this happens is:
https://github.com/os-autoinst/os-autoinst-distri-opensuse/blob/master/lib/main_common.pm#L2575

    load_create_hdd_tests if (get_var("STORE_HDD_1") || get_var("PUBLISH_HDD_1")) && !get_var('PUBLIC_CLOUD');

@rfan1 It is a bit unexpected that a test schedules certain modules depending on a PUBLISH_ variable. Do you think it would make sense to use a different variable for that?
However, there are more occurrences where something like this happens.

@okurz we thought that we could use the presence of the OPENQA_INVESTIGATE_ORIGIN variable for disabling all publishing
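
The scheduling condition quoted in this comment can be modeled as follows. This is a Python sketch, not the Perl test code: `loads_create_hdd_tests` is a hypothetical name and the dictionary stands in for the job variables. One known difference: Perl treats the string "0" as false, Python does not, so the sketch only covers empty and non-empty values.

```python
def loads_create_hdd_tests(job_vars):
    """Model of: (STORE_HDD_1 || PUBLISH_HDD_1) && !PUBLIC_CLOUD."""
    wants_hdd = bool(job_vars.get("STORE_HDD_1") or job_vars.get("PUBLISH_HDD_1"))
    return wants_hdd and not job_vars.get("PUBLIC_CLOUD")


# The asset name is made up for illustration.
for job_vars in (
    {"PUBLISH_HDD_1": "example.qcow2"},  # regular job
    {"PUBLISH_HDD_1": ""},               # variable emptied
    {"PUBLISH_HDD_1": "none"},           # non-empty placeholder value
):
    print(job_vars, loads_create_hdd_tests(job_vars))
```

Any non-empty value keeps the condition true, so a placeholder value would preserve the schedule as long as nothing downstream tries to publish it.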

Actions #8

Updated by okurz about 2 months ago

tinita wrote in #note-7:

@okurz we thought that we could use the presence of the OPENQA_INVESTIGATE_ORIGIN variable for disabling all publishing

Hm, so where is the removal of publish variables currently done and where would you read OPENQA_INVESTIGATE_ORIGIN to disable publishing? So far I assume only https://github.com/os-autoinst/scripts/ knows about "openqa-investigate", right? What we should preserve is that os-autoinst and openQA do not know about os-autoinst/scripts and shouldn't need to know about it.

Actions #9

Updated by rfan1 about 2 months ago

tinita wrote in #note-7:

The line where this happens is:
https://github.com/os-autoinst/os-autoinst-distri-opensuse/blob/master/lib/main_common.pm#L2575

    load_create_hdd_tests if (get_var("STORE_HDD_1") || get_var("PUBLISH_HDD_1")) && !get_var('PUBLIC_CLOUD');

@rfan1 It is a bit unexpected that a test schedules certain modules depending on a PUBLISH_ variable. Do you think it would make sense to use a different variable for that?
Thanks Tina!
IMO, it would be great if we could run the investigation jobs with "PUBLISH_*" set to "none".
However, there are more occurrences where something like this happens.

@okurz we thought that we could use the presence of the OPENQA_INVESTIGATE_ORIGIN variable for disabling all publishing

Actions #10

Updated by okurz about 2 months ago

@rfan1 your quoting style is a bit broken, making it hard to distinguish who wrote what.

rfan1 wrote in #note-9:

[…]
IMO, it would be great if we could run the investigation jobs with "PUBLISH_*" set to "none".

good idea, we can take a look into that.

However, there are more occurrences where something like this happens.

You mean other test scenarios that also have an incomplete schedule but not related to PUBLISH_*? If yes then please reference those scenarios as well.

@okurz we thought that we could use the presence of the OPENQA_INVESTIGATE_ORIGIN variable for disabling all publishing

Yes, but as I stated, what we should preserve is that os-autoinst and openQA do not know about os-autoinst/scripts and shouldn't need to know about it. That also means os-autoinst code must not use the variable OPENQA_INVESTIGATE_ORIGIN.

Actions #11

Updated by tinita about 2 months ago

okurz wrote in #note-10:

However, there are more occurrences where something like this happens.

You mean other test scenarios that also have an incomplete schedule but not related to PUBLISH_*? If yes then please reference those scenarios as well.

That's a quote from me.
I mean that there are other occurrences in the code like loadtest(...) if get_var("PUBLISH_HDD_1")
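
To get an overview of such occurrences, a scan along these lines might help. It is a Python sketch under assumptions: `find_publish_references` is a hypothetical helper, the regular expression only catches literal `get_var("PUBLISH_...")` calls, and the `loadtest("example")` line in the sample is invented.

```python
import re

# Matches get_var("PUBLISH_...") or get_var('PUBLISH_...') with a literal name.
PUBLISH_GET_VAR = re.compile(r"""get_var\(\s*["'](PUBLISH_\w+)["']""")


def find_publish_references(source):
    """Return (line number, variable name) for each PUBLISH_* lookup."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for match in PUBLISH_GET_VAR.finditer(line):
            hits.append((lineno, match.group(1)))
    return hits


# Sample input; the second statement is a made-up example of the pattern.
sample = '''
load_create_hdd_tests if (get_var("STORE_HDD_1") || get_var("PUBLISH_HDD_1")) && !get_var('PUBLIC_CLOUD');
loadtest("example") if get_var("PUBLISH_HDD_1");
'''
for lineno, name in find_publish_references(sample):
    print(lineno, name)
```

Lookups that build the variable name dynamically would not be found this way and would need manual review.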

Actions #12

Updated by tinita about 2 months ago · Edited

@rfan1 take care to always add a blank line between quotes and your own text. Otherwise Redmine reads everything as a quote.

Actions #13

Updated by rfan1 about 2 months ago

tinita wrote in #note-12:

@rfan1 take care to always add a blank line between quotes and your own text. Otherwise Redmine reads everything as a quote.

Got it!

Actions #14

Updated by tinita about 2 months ago

I ran a job with both vars set to none and at least it failed in the same way as the original job: https://openqa.suse.de/tests/13725407#details

Actions #15

Updated by okurz about 2 months ago

  • Subject changed from [tools] some Automatic investigation jobs for job 13642310 only run part of the test modules NOT ALL: [is it by design?] to [tools] some Automatic investigation jobs for job 13642310 only run part of the test modules NOT ALL: [is it by design?] size:M
  • Description updated (diff)
Actions #16

Updated by tinita about 2 months ago

  • Status changed from Feedback to Workable
  • Assignee deleted (tinita)

This can also be done by someone else.

Actions #17

Updated by ybonatakis about 2 months ago

  • Status changed from Workable to In Progress
  • Assignee set to ybonatakis
Actions #18

Updated by ybonatakis about 2 months ago

I took a look and read the comments above. Is this about modifying https://github.com/os-autoinst/scripts/pull/69/files#diff-f73cf39a07f6cf8cdb453862496919d06df16d07e58b274e68ea148dd1f7dae5R32 so that the script sets the variables to none or some other value? From the meeting I thought it had something to do with openQA+os-autoinst.

Actions #19

Updated by openqa_review about 2 months ago

  • Due date set to 2024-03-23

Setting due date based on mean cycle time of SUSE QE Tools

Actions #20

Updated by ybonatakis about 2 months ago

https://github.com/os-autoinst/scripts/pull/299
I guess this is one of the things needed to solve the scheduling problem.
The remaining part should be something in openQA regarding the asset uploads.

Actions #21

Updated by ybonatakis about 2 months ago

I see that at https://github.com/os-autoinst/openQA/blob/master/lib/OpenQA/Worker/Job.pm#L436 there is a file check. If the file is not there, I think the upload is skipped. Can someone with more experience confirm that?

Actions #22

Updated by okurz about 2 months ago

ybonatakis wrote in #note-21:

I see that at https://github.com/os-autoinst/openQA/blob/master/lib/OpenQA/Worker/Job.pm#L436 there is a file check. If the file is not there, I think the upload is skipped. Can someone with more experience confirm that?

I can confirm that the code silently skips specified assets that cannot be uploaded. But I suggest you try it out by simply triggering an openQA job with PUBLISH_HDD_1=none or similar and see what happens.
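
As a simplified model of "silently skip what cannot be uploaded", consider the sketch below. It is written in Python and is not the logic in Job.pm: `split_uploadable` is a hypothetical helper, and "cannot be uploaded" is reduced here to "no such file on disk", which is an assumption.

```python
import os


def split_uploadable(asset_paths):
    """Partition paths into existing files and everything else.

    Nothing is raised for missing files; they simply end up in the
    skipped list, mirroring a silent skip.
    """
    uploadable, skipped = [], []
    for path in asset_paths:
        if os.path.isfile(path):
            uploadable.append(path)
        else:
            skipped.append(path)
    return uploadable, skipped
```

Under this model a placeholder such as none would be skipped only because no file with that name exists, which is why trying it on a real job is still worthwhile.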

Actions #23

Updated by livdywan about 2 months ago

Action points to try discussed in conversation:

  • Clone a job relying on PUBLISH_HDD_1 as-is
    • Also clone the same job with PUBLISH_HDD_1=none
  • Make a one-line change in OpenQA/Isotovideo/Utils.pm in os-autoinst to skip any files called none when compiling the list of assets
  • See how 14-isotovideo.t behaves with different asset filenames and trivial changes to the logic
Actions #24

Updated by ybonatakis about 2 months ago

  • Status changed from In Progress to Feedback

https://github.com/os-autoinst/os-autoinst/pull/2470: with this one-line change the test seems not to upload anything when none or None is used in PUBLISH_HDD_*.
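
The idea of skipping placeholder values when compiling the asset list can be sketched like this. The Python below is an illustration and not the code from the pull request: `assets_to_publish` and `SKIP_VALUES` are hypothetical names, and treating exactly "none" and "None" as placeholders is taken from the comment above.

```python
SKIP_VALUES = ("none", "None")  # placeholder spellings mentioned in the comment


def assets_to_publish(job_vars):
    """Collect PUBLISH_HDD_* values, ignoring empty and placeholder values."""
    assets = []
    for name in sorted(job_vars):
        value = job_vars[name]
        if not name.startswith("PUBLISH_HDD_"):
            continue
        if not value or value in SKIP_VALUES:
            continue
        assets.append(value)
    return assets


# Example variables; the asset name is made up.
print(assets_to_publish({"PUBLISH_HDD_1": "none", "PUBLISH_HDD_2": "example.qcow2"}))
```

With such a filter the variable can stay set, so schedule conditions still see it, while no file is queued for upload.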

Actions #25

Updated by ybonatakis about 1 month ago

The changes in os-autoinst are merged. https://github.com/os-autoinst/scripts/pull/299 is still open.

Actions #26

Updated by ybonatakis about 1 month ago

  • Status changed from Feedback to Resolved