action #157018

closed

[sporadic] Build failed in Jenkins: submit-openQA-TW-to-oS_Fctry - Error 503: Service Unavailable size:S

Added by tinita about 2 months ago. Updated about 1 month ago.

Status: Resolved
Priority: High
Assignee:
Category: Regressions/Crashes
Target version:
Start date: 2024-03-11
Due date:
% Done: 0%
Estimated time:

Description

Observation

Date: Sat, 9 Mar 2024 03:49:48 +0100 (CET)

See <http://jenkins.qa.suse.de/job/submit-openQA-TW-to-oS_Fctry/1001/display/redirect>

Changes:


------------------------------------------
[...truncated 4.20 MiB...]
  <result project="devel:openQA:tested" repository="openSUSE_Factory" arch="x86_64" code="blocked" state="blocked">
    <status package="openQA" code="blocked">
+ echo 'Waiting while openQA is in progress'
Waiting while openQA is in progress

...
4.6.1709822711.90519fe6
4.6.1709822711.90519fe6
4.6.1709822711.90519fe6
4.6.1709822711.90519fe6' openSUSE:Factory
Server returned an error: HTTP Error 503: Service Unavailable

Acceptance criteria

  • AC1: Short unavailabilities of OBS are covered with retry

Suggestions

Actions #1

Updated by tinita about 2 months ago

  • Category set to Regressions/Crashes

Actions #2

Updated by okurz about 2 months ago

  • Target version set to Ready
Actions #3

Updated by okurz about 2 months ago

  • Tags set to reactive work, sporadic
Actions #4

Updated by okurz about 1 month ago

  • Subject changed from Build failed in Jenkins: submit-openQA-TW-to-oS_Fctry - Error 503: Service Unavailable to [sporadic] Build failed in Jenkins: submit-openQA-TW-to-oS_Fctry - Error 503: Service Unavailable size:S
  • Description updated (diff)
  • Status changed from New to Workable
Actions #5

Updated by livdywan about 1 month ago

  • Status changed from Workable to In Progress
  • Assignee set to livdywan

Suggestions

I assume os-autoinst-obs-auto-submit:38 is the relevant code. Let's see if I can propose a trivial fix.

Actions #6

Updated by livdywan about 1 month ago

livdywan wrote in #note-5:

Suggestions

I assume os-autoinst-obs-auto-submit:38 is the relevant code. Let's see if I can propose a trivial fix.

https://github.com/os-autoinst/scripts/pull/300 correction, it is the osc call

Actions #7

Updated by openqa_review about 1 month ago

  • Due date set to 2024-03-29

Setting due date based on mean cycle time of SUSE QE Tools

Actions #8

Updated by livdywan about 1 month ago

  • Status changed from In Progress to Feedback

https://github.com/os-autoinst/scripts/pull/300 correction, it is the osc call

Merged. Let's see if it works fine. I would probably resolve the ticket soon since we likely won't see another outage.

Actions #10

Updated by livdywan about 1 month ago

tinita wrote in #note-9:

Had to revert it: https://github.com/os-autoinst/scripts/pull/301

Apparently I was looking at the dependencies but did not include the change. So hopefully that should sort it :-D

Actions #11

Updated by jbaier_cz about 1 month ago

  • Status changed from Feedback to Workable
  • Priority changed from Normal to High

This still needs an update, see the error:

+ retry -e osc co --server-side-source-service-files devel:openQA/openQA
retry: unrecognized option '--server-side-source-service-files'
usage: /usr/bin/retry [options] [cmd...]
options:
    -h,--help                   Show this help
    -r,--retries=RETRIES        How many retries to do on command failure after
                                the initial try. Defaults to 3.
    -s,--sleep=SLEEP            How many seconds to sleep between retries.
                                Defaults to 3 seconds.
    -e,--exponential[=FACTOR]   Enable simple exponential back-off algorithm.
                                Disabled by default, factor defaults to 2
                                (binary exponential back-off). 
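
The error above is consistent with retry treating --server-side-source-service-files as one of its own options; the later invocations in this ticket put -- between retry's options and the wrapped command. A minimal sketch of that behaviour, under stated assumptions: retry_sketch is a hypothetical stand-in written for illustration, not the real /usr/bin/retry, and it only mimics the -r, -s and -e options from the usage text with a simplified parser.

```bash
#!/usr/bin/env bash
# Hypothetical stand-in for illustration only; the real retry script
# may parse options and report errors differently.
retry_sketch() {
    local retries=3 sleep_s=3 factor=1 rc=0
    local -a cmd=()
    # Scan ALL arguments for options (as a permuting option parser would),
    # until "--" ends option parsing.
    while [[ $# -gt 0 ]]; do
        case "$1" in
            -r?*) retries=${1#-r}; shift ;;
            -s?*) sleep_s=${1#-s}; shift ;;
            -e)   factor=2; shift ;;
            --)   shift; cmd+=("$@"); break ;;
            -*)   echo "retry_sketch: unrecognized option '$1'" >&2; return 2 ;;
            *)    cmd+=("$1"); shift ;;
        esac
    done
    while true; do
        rc=0
        "${cmd[@]}" || rc=$?
        [[ $rc -eq 0 ]] && return 0
        [[ $retries -le 0 ]] && return "$rc"
        echo "Retrying up to $retries more times after sleeping ${sleep_s}s …" >&2
        sleep "$sleep_s"
        retries=$((retries - 1))
        sleep_s=$((sleep_s * factor))   # exponential back-off when -e is given
    done
}

# Without "--" the option of the wrapped command is parsed by the wrapper.
retry_sketch -s0 -e echo hello --world || echo "exit status: $?"
# With "--" everything after it is passed through untouched.
retry_sketch -s0 -e -- echo hello --world
```

In this sketch the first call fails on the unknown option before the wrapped command ever runs, while the second call prints the arguments unchanged.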
Actions #12

Updated by okurz about 1 month ago

  • Status changed from Workable to In Progress

New PR was created and merged and verified in jenkins. There is still one problematic entry:

+++ retry -e -- osc cat openSUSE:Factory/openQA/openQA.changes
+++ grep 'Update to version'
+++ head -n1
Retrying up to 3 more times after sleeping 3s …
Retrying up to 2 more times after sleeping 6s …
Retrying up to 1 more times after sleeping 12s …

There shouldn't have been a retry here. Apparently head -n1 closes the pipe early and causes a SIGPIPE. Try to reproduce with

retry -r0 -e -- osc cat openSUSE:Factory/openQA/openQA.changes | grep 'Update to version' | head -n1; echo "${PIPESTATUS[@]}"
- Update to version 4.6.1710762624.7d0dd225:
1 141 0

The grep 'Update to version' | head -n1 can actually be simplified to grep -m1 'Update to version' but that does not yet fix the original problem:

retry -r0 -e -- osc cat openSUSE:Factory/openQA/openQA.changes | grep -m1 'Update to version'; echo "${PIPESTATUS[@]}"
- Update to version 4.6.1710762624.7d0dd225:
1 0

This gets rid of the SIGPIPE in grep but keeps the failure of osc cat. Then I found one other possibility, using process substitution:

grep -m1 'Update to version' <(retry -r0 -e -- osc cat openSUSE:Factory/openQA/openQA.changes); echo "${PIPESTATUS[@]}"
- Update to version 4.6.1710762624.7d0dd225:
0

created https://github.com/os-autoinst/scripts/pull/306
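
The PIPESTATUS effect described in this note can be reproduced without OBS access. The following is a small sketch in which yes is an assumed stand-in for any producer that writes more than the consumer reads; the variable names are made up for the example.

```bash
#!/usr/bin/env bash
# "yes" keeps writing forever, so it is still writing when the reader quits.
yes 'Update to version' | head -n1
head_case=("${PIPESTATUS[@]}")
# The producer status is non-zero, typically 141 = 128 + 13 (SIGPIPE).
echo "producer=${head_case[0]} head=${head_case[1]}"

# grep -m1 also stops reading after the first match, so the producer
# is hit by the closed pipe in exactly the same way.
yes 'Update to version' | grep -m1 'Update to version'
grep_case=("${PIPESTATUS[@]}")
echo "producer=${grep_case[0]} grep=${grep_case[1]}"
```

A wrapper that retries on any non-zero exit status cannot tell such an early-closed pipe apart from a real failure, which would explain the unexpected "Retrying" lines.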

Actions #13

Updated by livdywan about 1 month ago

okurz wrote in #note-12:

created https://github.com/os-autoinst/scripts/pull/306

bash: line 158: prefix: unbound variable

Apparently this now broke something elsewhere.

Actions #15

Updated by okurz about 1 month ago

  • Assignee changed from livdywan to okurz

I wonder why http://jenkins.qa.suse.de/job/submit-openQA-TW-to-oS_Fctry/1010/console still mentions multiple "Retrying" lines; looking into that. The "Retrying up to 2 more times after sleeping 6s …" line is also doubled. There seems to be a retry process running in the background, as lines like "Retrying up to 1 more times after sleeping 12s …" show up intermixed with other content. OK, the process substitution seems to be a bad idea.

Actions #16

Updated by okurz about 1 month ago

This reproduces the problem

0 $ grep -m1 'Update to version' <(retry -s0 -e -- osc cat openSUSE:Factory/openQA/openQA.changes)
- Update to version 4.6.1710845353.23e79984:
0 $ Retrying up to 3 more times after sleeping 0s …
Retrying up to 2 more times after sleeping 0s …
Retrying up to 1 more times after sleeping 0s …

So grep returns fine, but the retry output still piles up in the background afterwards. Before, I used retry -r0, so no retries would have been executed. With -s0 the retries are still executed, but with no sleep time in between.
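
The background behaviour can be shown in isolation: bash does not wait for a process substitution to finish once the reading command has exited. The sketch below uses a temporary marker file (an assumption made only to observe the timing) and then shows one possible alternative, capturing the output before searching it. This is not necessarily what the linked pull requests implemented.

```bash
#!/usr/bin/env bash
marker=$(mktemp)   # hypothetical marker file, only used to observe timing

# The substituted process prints one matching line, then keeps working,
# similar to retry re-running a command after grep has already exited.
grep -m1 'Update' <(echo '- Update to version x:'; sleep 1; echo late > "$marker")
echo "grep exit status: $?"

early=$(cat "$marker")   # read immediately after grep returned
sleep 2
late=$(cat "$marker")    # read after the background process has finished
echo "right after grep: '${early}', two seconds later: '${late}'"
rm -f "$marker"

# Possible alternative: let the producer finish first, then search the
# captured text, so the producer never sees a closed pipe.
changes=$(printf '%s\n' '- Update to version 2:' '- Update to version 1:')
first=$(grep -m1 'Update to version' <<< "$changes")
echo "$first"
```

With the captured variant the producer's exit status is also available directly from the assignment, so a retry wrapper around it only reacts to real failures.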

Actions #17

Updated by okurz about 1 month ago

  • Status changed from In Progress to Feedback
Actions #18

Updated by okurz about 1 month ago · Edited

  • Due date deleted (2024-03-29)
  • Status changed from Feedback to In Progress
Actions #19

Updated by openqa_review about 1 month ago

  • Due date set to 2024-04-04

Setting due date based on mean cycle time of SUSE QE Tools

Actions #21

Updated by okurz about 1 month ago

  • Due date deleted (2024-04-04)
  • Status changed from In Progress to Resolved