action #113087

[qa-tools][qem-bot] malformed data in smelt incident causes smelt sync fail

Added by osukup about 2 months ago. Updated about 1 month ago.

Status: Resolved
Priority: High
Assignee:
Target version:
Start date: 2022-06-27
Due date:
% Done: 0%
Estimated time:

Description

https://gitlab.suse.de/qa-maintenance/bot-ng/-/jobs/1032830

ERROR: Expecting value: line 1 column 1 (char 0)
Traceback (most recent call last):
  File "/builds/qa-maintenance/bot-ng/qem-bot/openqabot/loader/smelt.py", line 64, in get_incident
    inc_result = requests.get(SMELT, params={"query": query}, verify=False).json()
  File "/usr/lib/python3.6/site-packages/requests/models.py", line 898, in json
    return complexjson.loads(self.text, **kwargs)
  File "/usr/lib64/python3.6/site-packages/simplejson/__init__.py", line 525, in loads
    return _default_decoder.decode(s)
  File "/usr/lib64/python3.6/site-packages/simplejson/decoder.py", line 370, in decode
    obj, end = self.raw_decode(s)
  File "/usr/lib64/python3.6/site-packages/simplejson/decoder.py", line 400, in raw_decode
    return self.scan_once(s, idx=_w(s, idx).end())
simplejson.errors.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
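
The message in the traceback is what a JSON decoder reports when the body does not start with a JSON value, for example an empty body or an HTML error page. A minimal sketch of that failure mode, using the standard-library `json` module rather than the `simplejson` module seen in the job (both report this message); `parse_body` is a hypothetical helper for illustration, not part of qem-bot:

```python
import json


def parse_body(text):
    """Return the decoded JSON document, or None if the body is not JSON."""
    try:
        return json.loads(text)
    except json.JSONDecodeError as exc:
        # An empty body or an HTML error page already fails at the first character.
        print(f"ERROR: {exc}")
        return None


print(parse_body('{"data": {"incidents": {"edges": []}}}'))
print(parse_body(""))
print(parse_body("<html>503 Service Unavailable</html>"))
```

Whether skipping the incident, retrying, or aborting the sync is the right reaction to a `None` result is a separate decision that this sketch does not make.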

History

#1 Updated by okurz about 2 months ago

  • Tags set to reactive work
  • Priority changed from Normal to High
  • Target version set to Ready

As a training exercise, it would be great if osukup only helped others to resolve the problem :)

#2 Updated by okurz about 2 months ago

Not sure how to reproduce this. Could it be a sporadic error? When I call ./bot-ng.py -c metadata -t dummy smelt-sync locally, I don't see such errors. I see many lines like:

INFO: Getting info about incident 24742 from SMELT
INFO: Getting info about incident 24743 from SMELT
INFO: Getting info about incident 24744 from SMELT
…
INFO: Getting info about incident 24748 from SMELT

#3 Updated by okurz about 2 months ago

  • Assignee set to okurz
  • Priority changed from High to Urgent

#4 Updated by okurz about 2 months ago

  • Status changed from New to In Progress

#5 Updated by osukup about 2 months ago

PR#47 isn't a fix for the issue, and it probably caused a problem in the repohash code by changing how connection errors from requests.get are processed. That is fixed in PR#48.

To reproduce, use responses to mock a request that returns broken JSON. Basically, write a test for this scenario first.
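
A test along those lines could look roughly like the sketch below. It uses the standard library's `unittest.mock` in place of the `responses` package so that it runs without extra dependencies. `get_incident`, its `http_get` parameter, and the `smelt.example` URL are hypothetical stand-ins, not the real loader code:

```python
import json
from unittest import mock


def get_incident(http_get, incident_id):
    """Hypothetical stand-in for the loader: fetch one incident, tolerate bad JSON."""
    response = http_get(
        "https://smelt.example/graphql",  # placeholder URL, not the real endpoint
        params={"query": str(incident_id)},
    )
    try:
        return response.json()
    except ValueError:
        # json.JSONDecodeError and simplejson's JSONDecodeError both subclass ValueError.
        return None


def test_broken_json_is_tolerated():
    # The fake response raises on .json(), like a body that is not valid JSON.
    broken = mock.Mock()
    broken.json.side_effect = json.JSONDecodeError("Expecting value", "", 0)
    http_get = mock.Mock(return_value=broken)

    assert get_incident(http_get, 24742) is None
    http_get.assert_called_once()


test_broken_json_is_tolerated()
```

Injecting `http_get` keeps the test independent of the network; with `responses`, the same scenario would be set up by registering a mocked reply with a non-JSON body for the URL.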

#6 Updated by openqa_review about 2 months ago

  • Due date set to 2022-07-16

Setting due date based on mean cycle time of SUSE QE Tools

#7 Updated by okurz about 1 month ago

  • Priority changed from Urgent to High

osukup wrote:

PR#47 isn't a fix for the issue, and it probably caused a problem in the repohash code by changing how connection errors from requests.get are processed. That is fixed in PR#48.

To reproduce, use responses to mock a request that returns broken JSON. Basically, write a test for this scenario first.

Yes, PR#47 did not provide a fix; it only ensured error handling and retries. Thank you for fixing the missing connection error exception. Not sure what you mean by your second sentence, though.

#8 Updated by okurz about 1 month ago

https://gitlab.suse.de/qa-maintenance/bot-ng/-/jobs/1041351#L9260 shows a recent failure. Now we have much better details:

ERROR: 503 Server Error: Service Unavailable for url: https://smelt.suse.de/graphql?query=%7Bincidents%28incidentId%3A+24853%29+%7B+edges+%7B+node+%7Bemu+project+repositories+%7B+edges+%7B+node+%7B+name+%7D+%7D+%7D+requestSet%28kind%3A+%22RR%22%29+%7B+edges+%7B+node+%7B+requestId+status+%7B+name+%7D+reviewSet+%7B+edges+%7B+node+%7B+assignedByGroup+%7B+name+%7D+status+%7B+name+%7D+%7D+%7D+%7D+%7D+%7D+%7D+packages+%7B+edges+%7B+node+%7B+name+%7D+%7D+%7D+%7D+%7D+%7D+%7D
Traceback (most recent call last):
  File "/builds/qa-maintenance/bot-ng/qem-bot/openqabot/loader/smelt.py", line 32, in get_json
    return requests.get(SMELT, params={"query": query}, verify=False).json()
  File "/builds/qa-maintenance/bot-ng/qem-bot/openqabot/requests.py", line 18, in get
    return s.get(url, **kwargs)
  File "/usr/lib/python3.6/site-packages/requests/sessions.py", line 543, in get
    return self.request('GET', url, **kwargs)
  File "/usr/lib/python3.6/site-packages/requests/sessions.py", line 530, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/lib/python3.6/site-packages/requests/sessions.py", line 650, in send
    r = dispatch_hook('response', hooks, r, **kwargs)
  File "/usr/lib/python3.6/site-packages/requests/hooks.py", line 31, in dispatch_hook
    _hook_data = hook(hook_data, **kwargs)
  File "/builds/qa-maintenance/bot-ng/qem-bot/openqabot/requests.py", line 12, in <lambda>
    lambda response, *args, **kwargs: response.raise_for_status()
  File "/usr/lib/python3.6/site-packages/requests/models.py", line 941, in raise_for_status
    raise HTTPError(http_error_msg, response=self)

OK, so if SMELT is just unresponsive, then we know what to do: retry harder and more often :)
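
"Retry harder" could be sketched as a small helper with exponential backoff. This is only an illustration of the idea under assumed defaults, not the change that was eventually merged; the name `retry` and all of its parameters are hypothetical:

```python
import time


def retry(call, attempts=5, delay=1.0, backoff=2.0,
          retry_on=(Exception,), sleep=time.sleep):
    """Call `call()` until it succeeds, waiting longer after each failure."""
    for attempt in range(1, attempts + 1):
        try:
            return call()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: let the caller see the last error
            sleep(delay)
            delay *= backoff  # 1s, 2s, 4s, ... with the assumed defaults


# Usage: a call that fails twice (as with a temporary 503) and then succeeds.
outcomes = iter([ConnectionError("503"), ConnectionError("503"), "ok"])


def flaky():
    outcome = next(outcomes)
    if isinstance(outcome, Exception):
        raise outcome
    return outcome


print(retry(flaky, delay=0.01, retry_on=(ConnectionError,)))
```

Passing `sleep` as a parameter makes the helper testable without real waiting, and limiting `retry_on` to connection-type errors avoids retrying on genuine bugs.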

#10 Updated by okurz about 1 month ago

  • Status changed from In Progress to Feedback

https://github.com/openSUSE/qem-bot/pull/49 merged; monitoring over the next few days.

#11 Updated by okurz about 1 month ago

  • Due date deleted (2022-07-16)
  • Status changed from Feedback to Resolved

No more problems for now, should be good.
