action #133583
closed
qem-bot approve incidents failed in gitlab CI, reason unknown size:M
Added by okurz over 1 year ago.
Updated over 1 year ago.
Description
Observation¶
https://gitlab.suse.de/qa-maintenance/bot-ng/-/jobs/1724368 says
2023-07-31 10:36:47 INFO * SUSE:Maintenance:30033:304153
2023-07-31 10:36:47 INFO Accepting review for SUSE:Maintenance:29819:304034
2023-07-31 10:36:47 INFO Accepting review for SUSE:Maintenance:29993:304113
2023-07-31 10:36:47 INFO Received 'Not Found'. Request 304113 removed or problem on OBS side, ignoring
2023-07-31 10:36:47 INFO Accepting review for SUSE:Maintenance:29994:304114
…
2023-07-31 10:36:48 INFO Received 'Not Found'. Request 304153 removed or problem on OBS side, ignoring
2023-07-31 10:36:48 INFO End of bot run
++ let 'sleep=BACKOFF_FACTOR*2**count'
++ let count+=1
++ (( count > MAX_RETRIES ))
++ exit 100
Uploading artifacts for failed job 00:01
Uploading artifacts...
bot_*.log: found 3 matching artifact files and directories
Uploading artifacts as "archive" to coordinator... 201 Created id=1724368 responseStatus=201 Created token=64_LuS46
Cleaning up project directory and file based variables 00:01
ERROR: Job failed: exit code 100
but I could not identify the underlying cause
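The shell trace above suggests a retry loop with exponential backoff that gives up with exit code 100 once `count` exceeds `MAX_RETRIES`. Below is a minimal Python sketch of that schedule, assuming the delay is `BACKOFF_FACTOR * 2**count` seconds as in the traced `let` expression and that the loop exits before sleeping on the last iteration. The function name and the example values are hypothetical and not taken from the CI configuration.

```python
def backoff_delays(backoff_factor, max_retries):
    """Return the delays (in seconds) slept before the loop gives up.

    Mirrors the order of the traced shell statements: compute
    sleep = factor * 2**count, increment count, stop once count > max_retries.
    """
    delays = []
    count = 0
    while True:
        sleep = backoff_factor * 2 ** count
        count += 1
        if count > max_retries:
            break  # the CI script runs "exit 100" at this point
        delays.append(sleep)
    return delays


# Example values are made up for illustration.
schedule = backoff_delays(backoff_factor=2, max_retries=3)
print(schedule)
```

Printing the schedule makes it easy to check whether the total wait is long enough to ride out a short outage on the OBS side.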
Acceptance criteria¶
- AC1: Those CI jobs no longer run into the issue mentioned under observation, or at least retry a reasonable number of times, or ignore the error for good
- AC2: The logs make it clear whether an error is fatal or has been ignored or when retries happened
Suggestions¶
Rollback steps¶
I asked in #help-obs.
Looking at the qem-bot code, the API request for 304153 and others must have returned 404, although they are older and must have existed already.
- Subject changed from qem-bot approve incidents failed in gitlab CI, reason unknown to qem-bot approve incidents failed in gitlab CI, reason unknown size:M
- Description updated (diff)
- Status changed from New to Workable
- Description updated (diff)
- Priority changed from Normal to Urgent
- Status changed from Workable to In Progress
- Assignee set to livdywan
Ack. I'm starting by improving the error handling and will then see which way to go.
Looking at the docs of POST /request/id, it's possible that they changed the 403 to a 404, so it could be really just a "request is not in review state" which means it's likely already approved.
edit: but if I try it out for https://build.suse.de/request/show/304153 , I get a "403 The request is neither in state review nor new", so maybe that's just wrong documentation.
Looking at the docs of POST /request/id, it's possible that they changed the 403 to a 404
The "Not found" example in the docs says "Couldn't find request with id '120'", though?
We can at least get the headers from the HTTPError. Interestingly, the unit test for 404 is using the wrong strings, but it's expecting it to fail. So for now I'm assuming that this is the behavior that was desired:
https://github.com/openSUSE/qem-bot/pull/129
Getting the response body would be even better I guess, but not sure how to get that.
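A minimal sketch of how headers and body could be read from the error, assuming the exception is a standard `urllib.error.HTTPError` (which is also a file-like response object, so `read()` returns the body). The helper name `describe_http_error` is hypothetical, and the error below is built by hand from the values seen in this ticket rather than coming from a real OBS call.

```python
import io
import urllib.error
from email.message import Message


def describe_http_error(err):
    """Collect status, reason, OBS error code header and body from an HTTPError."""
    body = err.read().decode("utf-8", errors="replace")
    return {
        "code": err.code,
        "reason": err.reason,
        "errorcode": err.headers.get("x-opensuse-errorcode"),
        "body": body,
    }


# Hand-built error standing in for a real OBS response (illustration only).
headers = Message()
headers["x-opensuse-errorcode"] = "review_change_state_no_permission"
example = urllib.error.HTTPError(
    url="https://build.suse.de/request/304153",
    code=403,
    msg="Forbidden",
    hdrs=headers,
    fp=io.BytesIO(
        b"<status code=\"review_change_state_no_permission\">"
        b"<summary>The request is neither in state review nor new</summary>"
        b"</status>"
    ),
)
info = describe_http_error(example)
print(info)
```

Note that the body can only be read once, so it should be captured before the exception is logged or re-raised.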
livdywan wrote:
Looking at the docs of POST /request/id, it's possible that they changed the 403 to a 404
The "Not found" example in the docs says "Couldn't find request with id '120'", though?
Are you looking at the actual documentation for POST /request/id? It looks like what you are quoting is for the GET request.
tinita wrote:
livdywan wrote:
Looking at the docs of POST /request/id, it's possible that they changed the 403 to a 404
The "Not found" example in the docs says "Couldn't find request with id '120'", though?
Are you looking at the actual documentation for POST /request/id? It looks like what you are quoting is for the GET request.
Yes. You can select the example in the combo box. GET only has one possible outcome.
For GET /request/id we see this documentation:
404 Not Found
Media type: application/xml; charset=utf-8
Example value:
<?xml version="1.0" encoding="UTF-8"?>
<status code="not_found">
<summary>Couldn't find request with id '5'</summary>
</status>
For POST /request/id we see this documentation:
404 Not Found
Media type: application/xml; charset=utf-8
Example "Request Not Modifiable":
<?xml version="1.0" encoding="UTF-8"?>
<status code="request_not_modifiable">
<summary>request is not in review state</summary>
</status>
And since in qem-bot we are doing a POST request, this should be the relevant one.
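To tell the two documented 404 variants apart, the `code` attribute and the `<summary>` text of the `<status>` document can be extracted. This is a minimal sketch using the standard library XML parser; the function name `parse_obs_status` is hypothetical, and the sample document is the POST example quoted above.

```python
import xml.etree.ElementTree as ET


def parse_obs_status(body):
    """Return (code, summary) from an OBS <status> XML document."""
    root = ET.fromstring(body)
    summary = root.findtext("summary", default="")
    return root.get("code"), summary.strip()


# The documented "Request Not Modifiable" example for POST /request/id.
DOC = b"""<?xml version="1.0" encoding="UTF-8"?>
<status code="request_not_modifiable">
<summary>request is not in review state</summary>
</status>"""

code, summary = parse_obs_status(DOC)
print(code, "-", summary)
```

With the code available, `not_found` (request really missing) and `request_not_modifiable` (request no longer in review) no longer look the same in the logs.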
When I try it out for 304153, I get:
# Headers
status: 403 Forbidden
x-opensuse-errorcode: review_change_state_no_permission
...
# Body
<status code="review_change_state_no_permission">
<summary>The request is neither in state review nor new</summary>
</status>
So I'm assuming we're fine because the request was accepted before. The documentation talks about a 404, and apparently that's what we're getting, whereas I get a 403 with a similar error. To be sure, we could just log the body for now.
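One way to make the logs say whether an error was ignored or fatal (AC2) is to decide based on the status and the OBS error code, and to log the body either way. This is only a sketch under the assumption that the two error codes seen in this ticket mean "already handled". It is not how qem-bot currently decides, and the logger name and function name are hypothetical.

```python
import logging

log = logging.getLogger("bot.approver")  # hypothetical logger name

# Error codes quoted in this ticket that indicate the request is no longer
# reviewable. Treating them as ignorable is an assumption, not confirmed by OBS.
IGNORABLE_CODES = {"request_not_modifiable", "review_change_state_no_permission"}


def classify_error(status, errorcode, body, request_id):
    """Return "ignored" or "fatal" and log the decision together with the body."""
    if status in (403, 404) and errorcode in IGNORABLE_CODES:
        log.info(
            "Request %s: got %s (%s), ignoring. Body: %s",
            request_id, status, errorcode, body,
        )
        return "ignored"
    log.error(
        "Request %s: got %s (%s), treating as fatal. Body: %s",
        request_id, status, errorcode, body,
    )
    return "fatal"


already_done = classify_error(
    403, "review_change_state_no_permission",
    "The request is neither in state review nor new", 304153,
)
print(already_done)
```

Anything outside the allow-list, for example a 401, stays fatal so that authentication problems are not silently swallowed.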
- Due date set to 2023-08-19
Setting due date based on mean cycle time of SUSE QE Tools
- Status changed from In Progress to Feedback
livdywan wrote:
https://github.com/openSUSE/qem-bot/pull/129
Getting the response body would be even better I guess, but not sure how to get that.
FYI we do log the body now, and from here on we can hopefully disambiguate the errors we're getting
- Priority changed from Urgent to High
2023-08-04 18:15:24 ERROR Received error 401, reason: 'Unauthorized' for Request 304387 - problem on OBS side
The most recent failures, from 4 days ago, look like the log line above. Otherwise there are no failures at all right now. Maybe it's fair to say it's High but not Urgent. Unfortunately we don't know what changed in the meantime.
- Status changed from Feedback to Resolved
We can probably resolve this. There's a minimum feasible improvement which should help us if this or a similar issue happens again.