action #163781

closed

Jobs randomly fail with unspecified "api failure", there should be more details in the error message size:S

Added by MDoucha 5 months ago. Updated 2 months ago.

Status: Resolved
Priority: Low
Assignee:
Category: Feature requests
Target version:
Start date: 2024-07-11
Due date:
% Done: 0%
Estimated time:

Description

https://progress.opensuse.org/issues/163781

Observation

A few kernel jobs have failed during the upload phase with a rather nondescript reason: "api failure". As a result, there is neither an autoinst-log.txt nor a worker-log.txt.
https://openqa.suse.de/tests/14897579
https://openqa.suse.de/tests/14897580
https://openqa.suse.de/tests/14895759

Acceptance criteria

  • AC1: No jobs can fail with unspecified reason "api failure" without more details
  • AC2: API failures are still handled and shown via the reason field
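
A minimal sketch of what AC1 and AC2 ask for, written in Python purely for illustration. The function name `build_reason`, its parameters, and the fallback wording are hypothetical and not taken from the openQA code base; the sketch only shows the idea that the reason field keeps the "api failure" category (AC2) but is never emitted without some accompanying detail (AC1).

```python
from typing import Optional


def build_reason(category: str,
                 detail: Optional[str] = None,
                 status: Optional[int] = None) -> str:
    """Compose a job reason that never consists of the bare category alone.

    All names here are hypothetical; this is not openQA's actual API.
    """
    parts = []
    if status is not None:
        # Include the HTTP status code when the API call produced one.
        parts.append(f"HTTP status {status}")
    if detail and detail.strip():
        # Include the underlying error message when one is available.
        parts.append(detail.strip())
    if not parts:
        # Even with nothing to report, say so explicitly instead of
        # returning the bare category (AC1).
        parts.append("no further details available")
    # The category stays at the front so existing matching on the
    # reason field keeps working (AC2).
    return f"{category}: {' - '.join(parts)}"
```

For example, `build_reason("api failure", "connection timed out", 503)` yields `api failure: HTTP status 503 - connection timed out`, while a call without any detail still produces an explicit statement that no details were available.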

Suggestions


Related issues 2 (1 open, 1 closed)

  • Related to openQA Infrastructure - action #162038: No HTTP Response on OSD on 10-06-2024 - auto_review:".*timestamp mismatch - check whether clocks on the local host and the web UI host are in sync":retry size:S | Resolved | nicksinger | 2024-06-10
  • Related to openQA Project - action #164418: Distinguish "timestamp mismatch" from cases where webUI is slow or where clocks are really differing | New | 2024-06-10

Actions #1

Updated by livdywan 5 months ago

  • Is duplicate of action #162038: No HTTP Response on OSD on 10-06-2024 - auto_review:".*timestamp mismatch - check whether clocks on the local host and the web UI host are in sync":retry size:S added
Actions #2

Updated by okurz 5 months ago

  • Category set to Feature requests
  • Target version set to Tools - Next
Actions #3

Updated by okurz 5 months ago

  • Subject changed from Jobs randomly fail with unspecified "api failure" to Jobs randomly fail with unspecified "api failure", there should be more details in the error message
Actions #4

Updated by nicksinger 4 months ago

  • Status changed from New to Resolved

I validated that the openQA changes are deployed and applied my config change manually (including restarting services) for now until our pipelines work again. Until now we don't see the new error message which is expected and good. We discussed that this should be sufficient for now and other alerts (e.g. number of new incomplete jobs) should alert us if the situation gets worse.

Actions #5

Updated by okurz 4 months ago

  • Assignee set to nicksinger
  • Target version changed from Tools - Next to Ready
Actions #6

Updated by nicksinger 4 months ago

  • Is duplicate of deleted (action #162038: No HTTP Response on OSD on 10-06-2024 - auto_review:".*timestamp mismatch - check whether clocks on the local host and the web UI host are in sync":retry size:S)
Actions #7

Updated by nicksinger 4 months ago

  • Related to action #162038: No HTTP Response on OSD on 10-06-2024 - auto_review:".*timestamp mismatch - check whether clocks on the local host and the web UI host are in sync":retry size:S added
Actions #8

Updated by nicksinger 4 months ago

  • Status changed from Resolved to New

nicksinger wrote in #note-4:

I validated that the openQA changes are deployed and applied my config change manually (including restarting services) for now until our pipelines work again. Until now we don't see the new error message which is expected and good. We discussed that this should be sufficient for now and other alerts (e.g. number of new incomplete jobs) should alert us if the situation gets worse.

It seems progress/Redmine just took my last comment from the other ticket (https://progress.opensuse.org/issues/162038) and applied it here as well, which obviously does not change anything here -> reopening

Actions #9

Updated by nicksinger 4 months ago

  • Assignee deleted (nicksinger)
Actions #10

Updated by nicksinger 4 months ago

  • Related to action #164418: Distinguish "timestamp mismatch" from cases where webUI is slow or where clocks are really differing added
Actions #11

Updated by okurz 4 months ago

  • Target version changed from Ready to Tools - Next
Actions #12

Updated by tinita 4 months ago

  • Subject changed from Jobs randomly fail with unspecified "api failure", there should be more details in the error message to Jobs randomly fail with unspecified "api failure", there should be more details in the error message size:S
  • Description updated (diff)
  • Status changed from New to Workable
Actions #13

Updated by tinita 3 months ago

  • Target version changed from Tools - Next to Ready
Actions #14

Updated by okurz 3 months ago

  • Priority changed from Normal to Low
Actions #15

Updated by mkittler 3 months ago

  • Status changed from Workable to In Progress
  • Assignee set to mkittler
Actions #16

Updated by mkittler 3 months ago

  • Status changed from In Progress to Feedback
Actions #17

Updated by mkittler 2 months ago

  • Status changed from Feedback to Resolved

With the PR merged I don't think we'll see jobs with just "api failure" anymore. If I missed cases we can reopen the ticket. I cannot check the cases of the jobs mentioned in the ticket description specifically because they're 404.
