action #167395

closed

QA (public) - coordination #162890: [saga][epic] feature discoverability

coordination #162896: [epic] Job triggering on jobless openQA instances

Ensure only the tested revision of devel:openQA packages are submitted to openSUSE:Factory size:M

Added by okurz 3 months ago. Updated 9 days ago.

Status: Resolved
Priority: Normal
Assignee:
Category: Feature requests
Target version:
Start date: 2024-09-25
Due date: 2024-12-12
% Done: 0%
Estimated time:

Description

Motivation

A problem related to #166658 showed up in Tumbleweed openQA-in-openQA tests, reported in https://bugzilla.suse.com/show_bug.cgi?id=1230953, but not in the devel:openQA-based openQA-in-openQA tests in https://openqa.opensuse.org/group_overview/24. We conducted a deeper analysis in #167335, which identified a point for improvement:

The state of the OBS repository can change after tests have been triggered and while they are being monitored. We should copy a specific revision, e.g. with osc -r $rev co, into devel:openQA:tested to make sure we only submit a version for which we actually triggered tests. However, even then devel:openQA can receive an update while the openQA-in-openQA tests are running but have not yet installed the packages. This becomes a problem if the revision we copied is faulty: the tests pick up the newer version and pass, but we then submit the faulty packages. We may need to combine copying a fixed revision with disabling OBS services or package builds until the pipeline completes.

Acceptance criteria

  • AC1: Given devel:openQA contains an os-autoinst package revision N When trigger+monitor+submit on jenkins.qa.suse.de is triggered Then revision N is submitted to openSUSE:Factory
  • AC2: Given devel:openQA contains an os-autoinst package revision N And trigger+monitor+submit on jenkins.qa.suse.de is triggered When a new pull request is merged in either https://github.com/os-autoinst/openQA/ or https://github.com/os-autoinst/os-autoinst/ And trigger+monitor+submit is still running Then trigger+monitor+submit does Not submit package revision N+1 And a new trigger+monitor+submit workflow is triggered for package revision N+1
  • AC3: Given devel:openQA contains an os-autoinst package revision N And trigger+monitor+submit on jenkins.qa.suse.de is triggered When a new pull request is merged in either https://github.com/os-autoinst/openQA/ or https://github.com/os-autoinst/os-autoinst/ And trigger+monitor+submit is still running Then still revision N is submitted to openSUSE:Factory

Suggestions

Further details

Alternatives to the idea mentioned in the motivation are:

  1. When submitting, we need to make sure the OBS repository hasn't changed in the meantime to submit only what we have tested. This means we would not be able to submit anything if we frequently update the OBS repo.
  2. An alternative to avoid this would be to save the OBS repo upfront (e.g. make a branch) so we can later always submit the exact version we have tested.
  3. Disable the build or services while we are testing in our pipelines until we have copied the packages into devel:openQA:tested.
  4. Or, following https://en.opensuse.org/openSUSE:Build_Service_Tips_and_Tricks ("Disable build of packages"): osc api -X POST "/source/PROJECT/PACKAGE?cmd=set_flag&flag=build&status=disable" # and later ...
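Alternative 4 could be wrapped into a small helper so the pipeline can freeze builds of all relevant packages at once. The following is a minimal sketch, assuming hypothetical obs_set_flag and disable_builds functions around the osc api call quoted above; only status=disable is taken from the ticket.

```shell
#!/usr/bin/env bash
# Sketch only: obs_set_flag and disable_builds are hypothetical helper names.
# $osc defaults to plain "osc" and is left unquoted on purpose so that it may
# carry extra options, similar to the "$osc ls" snippet in comment #6.
osc=${osc:-osc}

# Set an OBS flag (e.g. "build") on one package via the API call from the ticket
obs_set_flag() {
    local project=$1 package=$2 flag=$3 status=$4
    $osc api -X POST "/source/$project/$package?cmd=set_flag&flag=$flag&status=$status"
}

# Disable builds of all given packages, stopping at the first failing call
disable_builds() {
    local project=$1 package
    shift
    for package in "$@"; do
        obs_set_flag "$project" "$package" build disable || return
    done
}
```

For example, disable_builds devel:openQA os-autoinst openQA would be called before triggering tests. The matching call to re-enable builds once the pipeline has finished is deliberately left out, as the ticket does not spell it out.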

Related issues: 2 (1 open, 1 closed)

Related to openQA Project (public) - action #174451: openQA-in-openQA tests can get stuck with an inconsistent repository (New, 2024-12-16)
Copied from openQA Project (public) - action #167335: Conduct "lessons learned" with Five Why analysis for GRU git cloning related errors (Resolved, okurz, 2024-09-25)

Actions #1

Updated by okurz 3 months ago

  • Copied from action #167335: Conduct "lessons learned" with Five Why analysis for GRU git cloning related errors added
Actions #2

Updated by livdywan about 2 months ago

  • Subject changed from Ensure only the tested revision of devel:openQA packages are submitted to openSUSE:Factory to Ensure only the tested revision of devel:openQA packages are submitted to openSUSE:Factory size:M
  • Description updated (diff)
  • Status changed from New to Workable
Actions #3

Updated by mkittler about 2 months ago

  • Target version changed from Tools - Next to Ready
Actions #4

Updated by okurz about 1 month ago

  • Parent task set to #162896
Actions #5

Updated by mkittler about 1 month ago

  • Status changed from Workable to In Progress
  • Assignee set to mkittler
Actions #6

Updated by mkittler about 1 month ago · Edited

We could do the following:

  1. Save the current revision of the set of packages (auto_submit_packages=${packages:-$($osc ls "$dst_project" | grep -v '\-test$')}) via e.g. osc log devel:openQA openQA | grep/sed/… in trigger-openqa_in_openqa like we already save job_post_response.
  2. Submit that revision via osc -r $rev co in os-autoinst-obs-auto-submit. We need to use the same approach for copying artifacts as in monitor-openQA_in_openQA-TW.

The problem with this is that the revision saved in 1. might not be the revision we have actually tested: by the time the test is scheduled and reaches the point where packages are installed from OBS, a new version might have been published on OBS.

It would probably make more sense to do the following:

  1. Keep the triggering as-is.
  2. Upload the exact package versions that were installed during the test run.
  3. Download the log from 2. in scripts/os-autoinst-obs-auto-submit. For this we need to know the job IDs, but we have them in job_post_response and just need to restore them as in monitor-openQA_in_openQA-TW.
  4. Check whether that version matches what we still have on OBS. If not, either skip the submission or try to figure out what revision had this version and submit that via osc -r $rev co.

Note that step 2 only makes sense for the scenarios that install packages from OBS, so only some of the jobs we start will contain information about the package versions. This is presumably OK. Since multiple scenarios install from OBS, it can actually happen that those tests did not all run on the same version of every package. In that case we should probably simply reject the whole submission in step 4 (hopefully this will not happen very often).
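The consistency check for step 4 could be sketched as follows. This assumes, hypothetically, that step 2 produces one file per job with "<package> <version>" lines; neither the file format nor the helper name tested_versions_consistent comes from the ticket.

```shell
#!/usr/bin/env bash
# Sketch only: tested_versions_consistent is a hypothetical helper. Each
# argument is assumed to be a file with "<package> <version>" lines, one file
# per openQA job that installed packages from OBS.
tested_versions_consistent() {
    if (( $# == 0 )); then
        echo "No package version logs given" >&2
        return 2
    fi
    awk '
        NF >= 2 {
            key = $1 SUBSEP $2
            if (!(key in seen)) {
                seen[key] = 1
                versions[$1]++   # count distinct versions per package
            }
        }
        END {
            rc = 0
            for (pkg in versions) {
                if (versions[pkg] > 1) {
                    print "Inconsistent versions tested for " pkg > "/dev/stderr"
                    rc = 1
                }
            }
            exit rc
        }
    ' "$@"
}
```

A non-zero exit status would then mean rejecting the whole submission, as suggested above.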

Actions #7

Updated by openqa_review about 1 month ago

  • Due date set to 2024-11-28

Setting due date based on mean cycle time of SUSE QE Tools

Actions #8

Updated by mkittler about 1 month ago · Edited

We discussed that it would be better to look into the "branching" alternative (mentioned as 2nd suggestion in the ticket description).

So I ran the following to create a "fixed" copy of some packages from our devel:openQA project in devel:openQA:testing¹:

osc linkpac --current --disable-build devel:openQA os-autoinst devel:openQA:testing
osc linkpac --current --disable-build devel:openQA openQA devel:openQA:testing

It looks like --disable-build simply sets the build flag in the config of the package to false. The publish flag is still true, though.

Unfortunately this kind of setup doesn't work. OBS says "Repository has been published" but https://download.opensuse.org/repositories/devel:/openQA:/testing/openSUSE_Tumbleweed has not been populated so far.

So I also tried:

osc branch --force --disable-build devel:openQA openQA devel:openQA:testing

But the repo is still not populated.

I guess I'll have to research how this can be done or ask for help. We could of course also download the repo manually and serve it via some HTTP server outside the scope of OBS.

EDIT: Asked on Slack: https://suse.slack.com/archives/C02BXKBMXNV/p1731605574673919


¹ We should probably create and delete this project on the fly. The trigger script would create it if it does not exist yet and abort otherwise. The monitoring/submission script would delete the project when done. This way the presence of the project would act as a flag indicating whether testing/submissions are still pending (so we can avoid triggering new tests/submissions while one is still pending).
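Using the staging project itself as a "pending" flag could look roughly like this. It is a minimal sketch assuming the trigger job can run osc; instead of aborting, this variant records the reason in a marker file, which is closer to the behavior described in later comments. The marker name job_post_skip_submission is borrowed from the job log in #27, while the helper names and messages are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: staging_pending and prepare_staging are hypothetical helpers.
osc=${osc:-osc}
staging_project=${staging_project:-devel:openQA:testing}

# Succeed if the staging project still lists packages, i.e. tests or a
# submission based on it are still pending. A failing "osc ls" (e.g. because
# the project does not exist) counts as "not pending"; a real script would
# need to tell this apart from network errors.
staging_pending() {
    local packages
    packages=$($osc ls "$staging_project" 2>/dev/null) || return 1
    [[ -n $packages ]]
}

# Trigger-side decision: leave a marker with the reason if the staging project
# is busy, otherwise remove any stale marker so the staging project can be used.
prepare_staging() {
    if staging_pending; then
        echo "Not overriding $staging_project, tests or a submission are still pending" \
            | tee job_post_skip_submission
        return 0
    fi
    rm -f job_post_skip_submission
    echo "Using $staging_project for this run"
}
```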

Actions #9

Updated by mkittler about 1 month ago

It turns out that one needs to use osc release for this (thanks to @bmwiedemann pointing this out to me).

So the following does the trick:

for package in os-autoinst openQA; do osc release --target-project devel:openQA:testing --no-delay --target-repository=openSUSE_Tumbleweed -r openSUSE_Tumbleweed -a x86_64 devel:openQA "$package"; done

This command populated https://download.opensuse.org/repositories/devel:/openQA:/testing/openSUSE_Tumbleweed/x86_64/ with no delay (so I suppose --no-delay works as advertised).
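The loop above could be turned into a reusable function, e.g. for the trigger script. This is a sketch with a hypothetical name (snapshot_packages); the osc release options are exactly the ones from the command above, and the only change is that the loop stops at the first failing package so a partial snapshot does not go unnoticed.

```shell
#!/usr/bin/env bash
# Sketch only: snapshot_packages is a hypothetical wrapper around the
# "osc release" invocation from the comment above.
osc=${osc:-osc}

# Release the current binaries of the given devel:openQA packages into the
# target project, stopping at the first failure
snapshot_packages() {
    local target=$1 package
    shift
    for package in "$@"; do
        $osc release --target-project "$target" --no-delay \
            --target-repository=openSUSE_Tumbleweed -r openSUSE_Tumbleweed -a x86_64 \
            devel:openQA "$package" || return
    done
}
```

Usage would be snapshot_packages devel:openQA:testing os-autoinst openQA.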

Actions #10

Updated by mkittler about 1 month ago

PR for being able to use the snapshot project in openQA-in-openQA tests: https://github.com/os-autoinst/os-autoinst-distri-openQA/pull/215

Actions #12

Updated by mkittler about 1 month ago

  • Status changed from In Progress to Feedback
Actions #13

Updated by okurz 22 days ago

  • Due date changed from 2024-11-28 to 2024-12-05

hack-week due-date bump :)

Actions #14

Updated by mkittler 22 days ago

  • Status changed from Feedback to In Progress

Looks like tests are still triggered successfully and now use the new intermediate project as expected: https://openqa.opensuse.org/tests/4667473#step/openqa_webui/1

The monitoring also worked.

The submission didn't work: http://jenkins.qa.suse.de/job/submit-openQA-TW-to-oS_Fctry/lastBuild/console
The log doesn't show the updated contents of https://raw.githubusercontent.com/os-autoinst/scripts/master/os-autoinst-obs-auto-submit (the staging_project variable is missing). Therefore https://build.opensuse.org/project/show/devel:openQA:testing also wasn't cleaned up. The cleanup wouldn't have worked anyway because the submission ran into an early return (because the latest version was already in Factory). This PR should fix that: https://github.com/os-autoinst/scripts/pull/357.

I triggered the submission again. Now it had the new version, but it failed with:

++ dirname bash
+ . ./_common
bash: line 18: ./_common: No such file or directory
Build step 'Execute shell' marked build as failure

I suppose we should use a proper checkout of the repo here.
Actions #15

Updated by mkittler 22 days ago

I configured the submission step on Jenkins so that it'll run the scripts from Git via https://github.com/os-autoinst/scripts/pull/357.

I also made it invoke a final cleanup of the staging repo so we never end up in a situation where a missing cleanup prevents further triggering.

Actions #16

Updated by mkittler 22 days ago

The 2nd PR was merged and submissions still work, e.g. https://build.opensuse.org/requests/1226911.

Actions #17

Updated by mkittler 22 days ago

I've just noticed that with my changes the throttling we put in place on http://jenkins.qa.suse.de/job/submit-openQA-TW-to-oS_Fctry/configure will now effectively bring all those jobs to a halt. Only when the submission job runs (successfully or not) do we clean up devel:openQA:testing, and only then are new tests triggered. I'm not sure how to resolve this because the whole point of this improvement is to synchronize these jobs (to avoid the problem mentioned in the ticket description). We could still let openQA-in-openQA jobs run without overriding/using devel:openQA:testing and without submitting anything (though I'm not sure how to prevent the submission job).

Actions #18

Updated by mkittler 21 days ago

  • Status changed from In Progress to Feedback

PR to fix issue mentioned in previous comment: https://github.com/os-autoinst/scripts/pull/358

I have also noticed that https://build.opensuse.org/project/show/devel:openQA:testing wasn't cleaned up yesterday after the submission - even though the logs suggest it was:

Latest revision already in openSUSE:Factory
+ rc=0
+ cleanup-obs-project devel:openQA:testing 'I am sure'
+ exit 0
…
Finished: SUCCESS

So maybe it actually was cleaned up, but then the next trigger job populated it again and now - due to the throttling - it simply hasn't been cleaned up yet. That would mean everything is fine. I'll have a look at it (to verify that theory) in the next few days.

Actions #19

Updated by mkittler 20 days ago

The PR has been merged and new openQA-in-openQA tests have been triggered - without overriding devel:openQA:testing. Let's see how it behaves when the throttle for the next submission expires. (It'll only expire this evening.)

Actions #20

Updated by mkittler 16 days ago · Edited

  • Status changed from Feedback to In Progress

It looks like the SR was created after the throttle expired: https://build.opensuse.org/requests/1227377

devel:openQA:testing has also been cleaned up, judging by the fact that the packages it now contains are only one day old (created by http://jenkins.qa.suse.de/job/trigger-openQA_in_openQA-TW/33121/).

Yesterday there was no new submission because http://jenkins.qa.suse.de/job/submit-openQA-TW-to-oS_Fctry/1125/ ended early with:

Skipping submission, reason: 
Only triggering tests from devel:openQA (not overriding devel:openQA:testing and doing a submission) because openQA-in-openQA tests or a submission is still pending (as devel:openQA:testing still contains packages)
+ rc=0
+ cleanup-obs-project devel:openQA:testing 'I am sure'
+ exit 0

So that part works as well, although I'm wondering whether skipping submissions that way is a good idea. We only run a submission job every 3 days and now some of these will not do anything. I'm not sure how the throttle actually behaves. If a throttled submission job always gets replaced by a new one whenever a preceding triggering/monitoring job finishes (and thereby triggers a new submission job), we have a problem, because then we would always end up in the "Skipping submission …" case.

It is also wrong that the job still did the cleanup. This should be fixed now, though.
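The intended control flow of the submission job could be sketched as follows, loosely mirroring the trace shown in #27 (run os-autoinst-obs-auto-submit, check for job_post_skip_submission, then call cleanup-obs-project). This is an assumption about the structure, not the actual Jenkins configuration; run_submission, submit_cmd and cleanup_cmd are hypothetical names.

```shell
#!/usr/bin/env bash
# Sketch only: run_submission, submit_cmd and cleanup_cmd are hypothetical
# names; the defaults are the commands that appear in the job logs.
submit_cmd=${submit_cmd:-os-autoinst-obs-auto-submit}
cleanup_cmd=${cleanup_cmd:-cleanup-obs-project}
staging_project=${staging_project:-devel:openQA:testing}

# Run the submission, then clean up the staging project unless the trigger job
# left the job_post_skip_submission marker (meaning the staging project still
# belongs to another pending run). The submission's exit code is preserved.
run_submission() {
    local rc=0
    $submit_cmd || rc=$?
    if [[ ! -e job_post_skip_submission ]]; then
        $cleanup_cmd "$staging_project" 'I am sure'
    fi
    return "$rc"
}
```

Keeping the cleanup outside the submission script means an early return inside that script cannot skip it, which was the original bug described in #14.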

Actions #21

Updated by mkittler 16 days ago

The only way to solve this that I can currently think of is to remove the throttle at the Jenkins level. Instead, the submission job would check the date of the last active SR and skip the submission if that SR is too recent. This way the skipped submissions will not count towards the throttle.
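A minimal sketch of such an in-script throttle, assuming it is built on the osc request list options used for the manual check in #24. The helper name pending_sr_is_recent is hypothetical, and recognizing requests by a "State:" field in the output is an assumption about osc's output format.

```shell
#!/usr/bin/env bash
# Sketch only: pending_sr_is_recent is a hypothetical helper. The osc options
# are the ones from the manual check in comment #24; recognizing requests by a
# "State:" field in the output is an assumption about osc's output format.
osc=${osc:-osc}
min_days=${min_days:-2}

# Exit status: 0 = a recent pending SR exists (skip), 1 = none, 2 = osc failed
pending_sr_is_recent() {
    local project=$1 requests
    requests=$($osc request list --project "$project" --type submit \
        --state new,review --mine --days "$min_days") || return 2
    # Search the captured output instead of piping osc into "grep -q"
    if grep -q 'State:' <<< "$requests"; then
        echo "Skipping submission, there is still a pending SR younger than $min_days days."
        return 0
    fi
    return 1
}
```

Returning a distinct status when osc itself fails lets the caller avoid submitting just because the request list could not be fetched. Capturing the output before searching it also means osc never writes into a pipe that grep -q may already have closed; whether that relates to the BrokenPipeError seen in #24 is unverified.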

Actions #22

Updated by mkittler 16 days ago

  • Status changed from In Progress to Feedback
Actions #23

Updated by mkittler 15 days ago

The PR was merged and I added rm -f job_post_* before restoring artifacts to avoid leftover files in the working directory. I think it works and we got e.g. https://build.opensuse.org/requests/1228300 now (based on devel:openQA:testing).

Actions #24

Updated by mkittler 15 days ago

Looks like the throttle within the script works (only " days" is missing in the output):

[submit-openQA-TW-to-oS_Fctry] $ /bin/sh -xe /tmp/jenkins13543617303560206947.sh
+ os-autoinst-obs-auto-submit
Skipping submission, there is still a pending SR younger than 2.
+ rc=0

Except when it doesn't work:

[submit-openQA-TW-to-oS_Fctry] $ /bin/sh -xe /tmp/jenkins484091042308331852.sh
+ os-autoinst-obs-auto-submit
Exception ignored in: <osc.util.safewriter.SafeWriter object at 0x7ff503457978>
BrokenPipeError: [Errno 32] Broken pipe
Retrying up to 3 more times after sleeping 3s …
Exception ignored in: <osc.util.safewriter.SafeWriter object at 0x7fe5cfb98978>
BrokenPipeError: [Errno 32] Broken pipe
Retrying up to 2 more times after sleeping 6s …
Exception ignored in: <osc.util.safewriter.SafeWriter object at 0x7f3d3176f978>
BrokenPipeError: [Errno 32] Broken pipe
Retrying up to 1 more times after sleeping 12s …
Exception ignored in: <osc.util.safewriter.SafeWriter object at 0x7f678aaaa978>
BrokenPipeError: [Errno 32] Broken pipe
ok
A    devel:openQA:testing
A    devel:openQA:testing/openQA
…

This error happened two times.

I couldn't reproduce the problem outside of Jenkins so far, e.g. martchus@jenkins:~> sudo -u jenkins osc request list --project devel:openQA:tested --type submit --state new,review --mine --days 2 works just fine.
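The retry pattern visible in the log above (up to 3 retries, sleeping 3s, 6s and 12s) amounts to exponential backoff. The following is a minimal sketch of such a helper; retry, retries, delay and sleep_cmd are hypothetical names and not necessarily what the job actually uses.

```shell
#!/usr/bin/env bash
# Sketch only: a hypothetical retry helper with exponential backoff that
# prints messages in the same shape as the log above.
retries=${retries:-3}
delay=${delay:-3}
sleep_cmd=${sleep_cmd:-sleep}

# Run "$@"; on failure retry up to $retries more times, doubling the pause.
# Returns 0 on success, otherwise the exit code of the last attempt.
retry() {
    local remaining=$retries pause=$delay rc
    while true; do
        "$@" && return 0
        rc=$?
        if (( remaining <= 0 )); then
            return "$rc"
        fi
        echo "Retrying up to $remaining more times after sleeping ${pause}s …" >&2
        $sleep_cmd "$pause"
        remaining=$((remaining - 1))
        pause=$((pause * 2))
    done
}
```

With the defaults, a command that keeps failing is attempted four times in total, matching the four BrokenPipeError messages in the log.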

Actions #26

Updated by okurz 13 days ago

  • Due date changed from 2024-12-05 to 2024-12-12

merged. Looks good to me

Actions #27

Updated by mkittler 9 days ago

  • Status changed from Feedback to Resolved

It looks like this is now working, indeed - including the case when there's already a SR:

+ os-autoinst-obs-auto-submit
Skipping submission, there is still a pending SR younger than 2 days.
+ rc=0
+ [[ -e job_post_skip_submission ]]
+ cleanup-obs-project devel:openQA:testing 'I am sure'
+ exit 0

(http://jenkins.qa.suse.de/job/submit-openQA-TW-to-oS_Fctry/1206/console)

Actions #28

Updated by mkittler 3 days ago

  • Related to action #174451: openQA-in-openQA tests can get stuck with an inconsistent repository added