action #181526
osd-deployment fails during 'check openQA-in-openQA tests' size:S
Status: closed
Description
Observation
The last three pipeline runs (for example, April 28, 2025 at 7:04:00 AM GMT+2) failed during the 'check openQA-in-openQA tests' step:
...
Using docker image sha256:7b3d93d7ed077116fe2913df717a9ae08fabcb595e6b19683a499513d3d72a85 for registry.opensuse.org/home/okurz/container/ca/containers/tumbleweed:curl-jq-ssh-retry with digest registry.opensuse.org/home/okurz/container/ca/containers/tumbleweed@sha256:13781db5e15f86d1f58ea29ea2cf0546fe72b6d78fe6e9a3f0d0e39b979d4a90 ...
$ retry --retries 7 --sleep 300 sh -c 'curl -S -s http://jenkins.qa.suse.de/job/monitor-openQA_in_openQA-TW/lastCompletedBuild/api/json | jq -e "select(.result==\"SUCCESS\")" >/dev/null'
Retrying up to 7 more times after sleeping 300s …
Retrying up to 6 more times after sleeping 300s …
Retrying up to 5 more times after sleeping 300s …
Retrying up to 4 more times after sleeping 300s …
Retrying up to 3 more times after sleeping 300s …
Retrying up to 2 more times after sleeping 300s …
Retrying up to 1 more times after sleeping 300s …
Cleaning up project directory and file based variables 00:00
ERROR: Job failed: exit code 1
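To make the failing step easier to reason about, here is a minimal sketch that moves its pass/fail decision into a function so it can be checked against a saved Jenkins payload. The function name `jenkins_build_succeeded` is hypothetical and not part of osd-deployment. The jq filter is the same one the pipeline uses.

```shell
# Hypothetical helper (not part of osd-deployment): reads a Jenkins build JSON
# document, such as the response of .../lastCompletedBuild/api/json, on stdin
# and exits 0 only if its "result" field is "SUCCESS".
jenkins_build_succeeded() {
    # For any other result select() emits nothing; with -e, jq exits non-zero
    # when nothing is emitted, which is what makes retry try again.
    jq -e 'select(.result == "SUCCESS")' >/dev/null
}

# Live equivalent of the pipeline step:
# curl -S -s http://jenkins.qa.suse.de/job/monitor-openQA_in_openQA-TW/lastCompletedBuild/api/json |
#     jenkins_build_succeeded
```

With `--retries 7 --sleep 300`, the step makes up to eight attempts with seven 300 s pauses, so it waits at least 35 minutes. Retrying only helps if a new Jenkins build completes successfully within that window. Otherwise `lastCompletedBuild` keeps returning the same failed build, and every attempt fails exactly as in the log.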
Last good pipeline run was two days ago:
https://gitlab.suse.de/openqa/osd-deployment/-/pipelines/1695209
Acceptance Criteria
- AC1: osd-deployment does not fail on any openQA-in-openQA check as long as all openQA-in-openQA tests actually passed
- AC2: At least one ticket is planned to ensure that Jenkins pipelines don't get stuck indefinitely
Suggestions
- Relevant jobs were not failing here
- Packages in OBS that Jenkins monitors were missing, though
- The openQA-in-openQA tests are actually fine, so look into the check against Jenkins itself. It would be better to check openQA results directly, see #181526-5
Updated by livdywan 10 days ago · Edited
- Priority changed from Urgent to High
Analysis of what's going on
- https://gitlab.suse.de/openqa/osd-deployment/-/jobs/4231800 fails on:
  retry --retries 7 --sleep 300 sh -c 'curl -S -s http://jenkins.qa.suse.de/job/monitor-openQA_in_openQA-TW/lastCompletedBuild/api/json | jq -e "select(.result==\"SUCCESS\")" >/dev/null'
- http://jenkins.qa.suse.de/job/monitor-openQA_in_openQA-TW/lastCompletedBuild/api/json says FAILURE
- The Jenkins pipeline is stuck in a loop: http://jenkins.qa.suse.de/job/trigger-openQA_in_openQA-TW/36283/console
  + echo 'Wait until all packages are published under devel:openQA:testing'
  Wait until all packages are published under devel:openQA:testing
  + osc prjresults --watch --xml -r openSUSE_Tumbleweed -a x86_64 devel:openQA:testing
  <resultlist state="16c665b6c06c94aff179d88ca33cc6e6">
    <result project="devel:openQA:testing" repository="openSUSE_Tumbleweed" arch="x86_64" code="blocked" state="blocked" details="Releasing package openQA" dirty="true"/>
  </resultlist>
- https://build.opensuse.org/project/repository_state/devel:openQA:testing/openSUSE_Tumbleweed shows no (published) packages in the OBS project
- Packages are copied, not built
- Manually copying packages worked before
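The pipeline sat in `osc prjresults --watch` while the repository was reported as `blocked`. Here is a sketch toward AC2. The hypothetical helper `obs_result_states` extracts the `state` attribute from `--xml` output like the one above using plain `sed`. The commented command shows one way to bound the wait with coreutils `timeout`. The two-hour limit is an arbitrary assumption, not an agreed value.

```shell
# Hypothetical helper: prints the state attribute of the <result> element on
# each input line of `osc prjresults --xml` output (the last one if a line
# holds several), e.g. "blocked". A regex suffices for this flat,
# attribute-only format; a real XML parser would be more robust.
obs_result_states() {
    sed -n 's/.*<result [^>]*state="\([^"]*\)".*/\1/p'
}

# Hypothetical bounded variant of the Jenkins step: stop watching after two
# hours instead of looping forever; timeout exits with status 124 if it fires.
# timeout 2h osc prjresults --watch --xml -r openSUSE_Tumbleweed -a x86_64 devel:openQA:testing
```

A non-zero exit from the bounded command would then fail the Jenkins build visibly instead of leaving it running indefinitely.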
Steps
- Should we delete packages? https://build.opensuse.org/project/show/devel:openQA:testing
- Ask someone who knows OBS to help interpret the state of the repository before we act on it
Mitigations
- DONE Disable https://gitlab.suse.de/openqa/osd-deployment
Possible improvements (out of scope for this ticket)
- Will we get Gitea features such as Gitea Actions (https://docs.gitea.com/usage/actions/quickstart) once OBS enables Git hosting? If so, maybe we can drop Jenkins
Updated by livdywan 10 days ago · Edited
Looking more closely at the passing Jenkins job http://jenkins.qa.suse.de/job/monitor-openQA_in_openQA-TW/30760/console:
++ jq -r '.ids[]' job_post_response
jq: error (at job_post_response:2): Cannot index number with string "ids"
parse error: Invalid numeric literal at line 2, column 7
[...]
+ echo 'Result of job 5026991: failed
Result of job 5026991: failed
So an openQA job failed (https://openqa.opensuse.org/tests/5026991), but we never check the next build/job, even though there is already a newer, passing job: https://openqa.opensuse.org/tests/5027063
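The jq errors in the log above mean `job_post_response` did not contain the JSON object the script expects, yet the Jenkins job still passed. Here is a defensive sketch. It assumes the file is meant to hold an object with an `ids` array, as the `.ids[]` filter implies. `job_ids_from_response` is a hypothetical name, not the Jenkins job's actual code.

```shell
# Hypothetical helper: prints one job ID per line from a response file, or
# fails loudly when the file is not a JSON object with an "ids" array
# (the case behind 'Cannot index number with string "ids"').
job_ids_from_response() {
    # jq -e exits non-zero on false output and on parse errors alike.
    if ! jq -e 'type == "object" and (.ids | type == "array")' "$1" >/dev/null 2>&1; then
        echo "unexpected content in $1:" >&2
        cat "$1" >&2
        return 1
    fi
    jq -r '.ids[]' "$1"
}
```

Failing here would surface a broken job submission instead of silently continuing with an empty or partial list of job IDs.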
Steps
- Adjust openQA job group config @nicksinger @mkittler
- Prepare a new "full" job group
- https://openqa.opensuse.org/parent_group_overview/12#grouped_by_group
- Maybe we want to check https://openqa.opensuse.org/group_overview/24.json in osd-deployment (instead of OBS) @livdywan
- https://gitlab.suse.de/openqa/osd-deployment/-/blob/master/.gitlab-ci.yml?ref_type=heads#L157
- It would be good to check the relevant git history for the original reasoning
- We have an MR but no linked ticket or further discussion: https://gitlab.suse.de/openqa/osd-deployment/-/merge_requests/9
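Here is a sketch of checking openQA results directly instead of going through Jenkins. It assumes the openQA REST route `/api/v1/jobs` with `groupid=24` and `latest=1`, which should keep only the newest job per scenario. That way a failed job that has already been superseded by a passing one would not block the deployment. The helper name `openqa_jobs_all_ok` and the decision to accept `softfailed` are assumptions. This is not necessarily what the merged MR does.

```shell
# Hypothetical helper: reads an openQA job list (the {"jobs": [...]} JSON
# returned by /api/v1/jobs) on stdin and exits 0 only if the list is
# non-empty and every job passed or softfailed. Jobs still running
# (result "none") therefore count as not OK.
openqa_jobs_all_ok() {
    jq -e '.jobs | length > 0 and all(.[]; .result == "passed" or .result == "softfailed")' >/dev/null
}

# Assumed live usage against the job group linked above:
# curl -sS 'https://openqa.opensuse.org/api/v1/jobs?groupid=24&latest=1' | openqa_jobs_all_ok
```

Requiring a non-empty list avoids treating an empty or unexpected response as a pass. The function could be wrapped in the same `retry` call the pipeline already uses.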
Updated by livdywan 10 days ago
- Copied to action #181574: [spike][timeboxed:11h] Re-consider openQA-in-openQA OBS/Jenkins setup for staged packages size:S added
Updated by okurz 10 days ago
- Related to action #181580: jenkins doesn't notify us on broken builds/runs - should there be e-mails or such? size:S added
Updated by openqa_review 9 days ago
- Due date set to 2025-05-14
Setting due date based on mean cycle time of SUSE QE Tools
Updated by livdywan 9 days ago
https://gitlab.suse.de/openqa/osd-deployment/-/merge_requests/67
Not sure this will work as-is, but since I've spent some time trying to verify the behavior locally, I pushed what I have anyway.
Updated by okurz 7 days ago
https://gitlab.suse.de/openqa/osd-deployment/-/merge_requests/67 merged. I triggered a new pipeline and it's all good.
Updated by livdywan 7 days ago
- Status changed from In Progress to Feedback
I re-enabled the pipeline and triggered another run just to be sure.