action #181526

closed

osd-deployment fails during 'check openQA-in-openQA tests' size:S

Added by robert.richardson 11 days ago. Updated 7 days ago.

Status:
Resolved
Priority:
High
Assignee:
Category:
Regressions/Crashes
Start date:
2025-04-28
Due date:
2025-05-14
% Done:

0%

Estimated time:

Description

Observation

The last three pipeline runs (example: April 28, 2025 at 7:04:00 AM GMT+2) failed during the 'check openQA-in-openQA tests' step:

...
Using docker image sha256:7b3d93d7ed077116fe2913df717a9ae08fabcb595e6b19683a499513d3d72a85 for registry.opensuse.org/home/okurz/container/ca/containers/tumbleweed:curl-jq-ssh-retry with digest registry.opensuse.org/home/okurz/container/ca/containers/tumbleweed@sha256:13781db5e15f86d1f58ea29ea2cf0546fe72b6d78fe6e9a3f0d0e39b979d4a90 ...
$ retry --retries 7 --sleep 300 sh -c 'curl -S -s http://jenkins.qa.suse.de/job/monitor-openQA_in_openQA-TW/lastCompletedBuild/api/json | jq -e "select(.result==\"SUCCESS\")" >/dev/null'
Retrying up to 7 more times after sleeping 300s …
Retrying up to 6 more times after sleeping 300s …
Retrying up to 5 more times after sleeping 300s …
Retrying up to 4 more times after sleeping 300s …
Retrying up to 3 more times after sleeping 300s …
Retrying up to 2 more times after sleeping 300s …
Retrying up to 1 more times after sleeping 300s …
Cleaning up project directory and file based variables 00:00
ERROR: Job failed: exit code 1

Last good pipeline run was two days ago:
https://gitlab.suse.de/openqa/osd-deployment/-/pipelines/1695209
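
For readers unfamiliar with the check: the log above suggests that the command is attempted once and then retried up to 7 times with a 300 s sleep in between, so the job only fails after roughly 35 minutes of waiting. A minimal sketch of such a loop follows. It is not the actual `retry` script from the container image, and `retry_sketch` and its positional arguments are hypothetical names chosen for illustration:

```shell
#!/bin/sh
# Hypothetical re-implementation of the retry behavior seen in the log;
# NOT the real `retry` script shipped in the container image.
# Usage: retry_sketch RETRIES SLEEP_SECONDS COMMAND [ARGS...]
retry_sketch() {
    retries=$1
    sleep_s=$2
    shift 2
    while true; do
        # Stop as soon as the command succeeds
        if "$@"; then
            return 0
        fi
        # Give up once all retries are used up
        if [ "$retries" -le 0 ]; then
            return 1
        fi
        echo "Retrying up to $retries more times after sleeping ${sleep_s}s …"
        sleep "$sleep_s"
        retries=$((retries - 1))
    done
}
```

Called as `retry_sketch 7 300 sh -c '...'`, this would print the same seven "Retrying" lines as in the log before returning a non-zero exit code.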

Acceptance Criteria

  • AC1: osd-deployment does not fail on any openQA-in-openQA checks as long as all openQA-in-openQA tests have actually passed
  • AC2: We have ensured that there is at least a ticket planned for ensuring jenkins pipelines don't get infinitely stuck

Suggestions

  • Relevant jobs were not failing here
    • Packages in OBS that Jenkins monitors were missing, though
  • openQA-in-openQA checks are actually fine, so look into the actual check against Jenkins. Better to check openQA results directly, see #181526-5
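
A direct check of an openQA result, as suggested in the last point, could look roughly like the following sketch. It assumes that the openQA REST route `/api/v1/jobs/<id>` returns a JSON object with a `job.result` field, and that `passed` and `softfailed` both count as acceptable; `job_result_ok` is a hypothetical helper name and this is not what the merge request for this ticket implements:

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the job JSON on stdin reports an
# acceptable result. Assumes the response shape {"job": {"result": "..."}}.
job_result_ok() {
    # jq -e exits non-zero when the last output is false or null
    jq -e '.job.result == "passed" or .job.result == "softfailed"' >/dev/null
}

# Possible usage against a single job (requires curl and jq):
#   curl -sS https://openqa.opensuse.org/api/v1/jobs/5027063 | job_result_ok
```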

Related issues 2 (2 open, 0 closed)

Related to openQA Infrastructure (public) - action #181580: jenkins doesn't notify us on broken builds/runs - should there be e-mails or such? size:S · Workable · 2025-04-29

Copied to openQA Project (public) - action #181574: [spike][timeboxed:11h] Re-consider openQA-in-openQA OBS/Jenkins setup for staged packages size:S · Workable

Actions #1

Updated by okurz 10 days ago · Edited

  • Priority changed from High to Urgent

Still failing today, and there was another email alert, this time about monitor-pre-deploy.

Actions #2

Updated by livdywan 10 days ago

  • Status changed from New to In Progress
  • Assignee set to livdywan

Taking a look. No idea yet, but I will at least check mitigations, e.g. whether gitlab/jenkins needs to be temporarily disabled to handle the alerts.

Actions #3

Updated by livdywan 10 days ago · Edited

  • Priority changed from Urgent to High

Analysis for what's going on

Steps

Mitigations

Can we improve (out of scope for this ticket)

Actions #4

Updated by livdywan 10 days ago · Edited

Looking more into the passing jenkins job http://jenkins.qa.suse.de/job/monitor-openQA_in_openQA-TW/30760/console

++ jq -r '.ids[]' job_post_response
jq: error (at job_post_response:2): Cannot index number with string "ids"
parse error: Invalid numeric literal at line 2, column 7
[...]
+ echo 'Result of job 5026991: failed'
Result of job 5026991: failed

So an openQA job failed (https://openqa.opensuse.org/tests/5026991), but we never check the next build/job, i.e. there is already a newer, passing job: https://openqa.opensuse.org/tests/5027063
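
The jq errors in the console log also indicate that `job_post_response` contained more than the expected JSON object (jq reports a bare number on line 2). A defensive variant could validate the whole response before extracting job IDs. A minimal sketch, assuming the expected response is a single JSON object with an `ids` array (inferred from the `jq -r '.ids[]'` call in the log); `extract_job_ids` is a hypothetical helper and not part of the actual Jenkins job:

```shell
#!/bin/sh
# Hypothetical helper: print the job IDs from a response file, but only if
# the file holds exactly one JSON object with an "ids" array.
extract_job_ids() {
    response_file=$1
    # -s slurps all JSON values into one array, so trailing junk is detected
    # before anything is printed; -e makes an empty result a failure as well
    jq -e -r -s '
        if length == 1
           and (.[0] | type) == "object"
           and (.[0].ids | type) == "array"
        then .[0].ids[]
        else error("unexpected response")
        end' "$response_file"
}
```

With such a guard the job would stop on a malformed response instead of continuing with partially parsed output.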

Steps

Actions #5

Updated by livdywan 10 days ago

  • Copied to action #181574: [spike][timeboxed:11h] Re-consider openQA-in-openQA OBS/Jenkins setup for staged packages size:S added
Actions #6

Updated by okurz 10 days ago

  • Subject changed from osd-deployment fails during 'check openQA-in-openQA tests' to osd-deployment fails during 'check openQA-in-openQA tests' size:S
  • Description updated (diff)
Actions #7

Updated by okurz 10 days ago

  • Related to action #181580: jenkins doesn't notify us on broken builds/runs - should there be e-mails or such? size:S added
Actions #8

Updated by openqa_review 9 days ago

  • Due date set to 2025-05-14

Setting due date based on mean cycle time of SUSE QE Tools

Actions #9

Updated by livdywan 9 days ago

https://gitlab.suse.de/openqa/osd-deployment/-/merge_requests/67

Not sure this will work as-is. But as I've spent some time trying to figure out if I get the right behavior locally, I pushed what I have anyway.

Actions #10

Updated by okurz 7 days ago

https://gitlab.suse.de/openqa/osd-deployment/-/merge_requests/67 merged. I triggered a new pipeline and it's all good.

Actions #11

Updated by livdywan 7 days ago

  • Status changed from In Progress to Feedback

I re-enabled the pipeline and triggered another run just to be sure.

Actions #12

Updated by livdywan 7 days ago

  • Status changed from Feedback to Resolved

Ah, I see we already have a passing one from a moment ago. So I'd say we're good.
