action #152470
openqa-service fetch_openqa_bugs "requests.exceptions.ConnectTimeout: HTTPSConnectionPool(host='bugzilla.suse.com', port=443)"
Description
Observation
Daily email with subject
"Cron root@openqa-service (date; fetch_openqa_bugs)> /tmp/fetch_openqa_bugs_osd.log"
and content
Exception occured while fetching bsc#1212271
Traceback (most recent call last):
File "/usr/lib/python3.6/site-packages/urllib3/connection.py", line 160, in _new_conn
(self._dns_host, self.port), self.timeout, **extra_kw
File "/usr/lib/python3.6/site-packages/urllib3/util/connection.py", line 84, in create_connection
raise err
File "/usr/lib/python3.6/site-packages/urllib3/util/connection.py", line 74, in create_connection
sock.connect(sa)
socket.timeout: timed out
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/lib/python3.6/site-packages/urllib3/connectionpool.py", line 677, in urlopen
chunked=chunked,
File "/usr/lib/python3.6/site-packages/urllib3/connectionpool.py", line 381, in _make_request
self._validate_conn(conn)
File "/usr/lib/python3.6/site-packages/urllib3/connectionpool.py", line 978, in _validate_conn
conn.connect()
File "/usr/lib/python3.6/site-packages/urllib3/connection.py", line 309, in connect
conn = self._new_conn()
File "/usr/lib/python3.6/site-packages/urllib3/connection.py", line 167, in _new_conn
% (self.host, self.timeout),
urllib3.exceptions.ConnectTimeoutError: (<urllib3.connection.HTTPSConnection object at 0x7f70ef00c438>, 'Connection to bugzilla.suse.com timed out. (connect timeout=10)')
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/lib/python3.6/site-packages/requests/adapters.py", line 449, in send
timeout=timeout
File "/usr/lib/python3.6/site-packages/urllib3/connectionpool.py", line 727, in urlopen
method, url, error=e, _pool=self, _stacktrace=sys.exc_info()[2]
File "/usr/lib/python3.6/site-packages/urllib3/util/retry.py", line 439, in increment
raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='bugzilla.suse.com', port=443): Max retries exceeded with url: /rest/bug/1212271?api_key=XXX (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7f70ef00c438>, 'Connection to bugzilla.suse.com timed out. (connect timeout=10)'))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/bin/fetch_openqa_bugs", line 62, in <module>
raise e
File "/usr/bin/fetch_openqa_bugs", line 48, in <module>
issue = issue_fetcher.get_issue(bugid)
File "/usr/lib/python3.6/site-packages/openqa_bugfetcher/issues/__init__.py", line 88, in get_issue
return self.prefix_table[prefix](self.conf, bugid)
File "/usr/lib/python3.6/site-packages/openqa_bugfetcher/issues/__init__.py", line 24, in __init__
self.fetch(conf)
File "/usr/lib/python3.6/site-packages/openqa_bugfetcher/issues/bugzilla_issue.py", line 27, in fetch
req = rest_get_bug(issue_id)
File "/usr/lib/python3.6/site-packages/openqa_bugfetcher/issues/bugzilla_issue.py", line 25, in rest_get_bug
return requests.get(url, params=get_params, timeout=10)
File "/usr/lib/python3.6/site-packages/requests/api.py", line 76, in get
return request('get', url, params=params, **kwargs)
File "/usr/lib/python3.6/site-packages/requests/api.py", line 61, in request
return session.request(method=method, url=url, **kwargs)
File "/usr/lib/python3.6/site-packages/requests/sessions.py", line 532, in request
resp = self.send(prep, **send_kwargs)
File "/usr/lib/python3.6/site-packages/requests/sessions.py", line 645, in send
r = adapter.send(request, **kwargs)
File "/usr/lib/python3.6/site-packages/requests/adapters.py", line 504, in send
raise ConnectTimeout(e, request=request)
requests.exceptions.ConnectTimeout: HTTPSConnectionPool(host='bugzilla.suse.com', port=443): Max retries exceeded with url: /rest/bug/1212271?api_key=XXX (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7f70ef00c438>, 'Connection to bugzilla.suse.com timed out. (connect timeout=10)'))
The first occurrence seems to be on 2023-12-07 at 04:20. Could this be related to changes in the network infrastructure?
Steps to reproduce
ssh root@openqa-service.suse.de
and execute commands from /etc/crontab
Suggestions
- Reproduce the error, fix it
- Create new bugzilla API key
Updated by nicksinger over 1 year ago
- Status changed from New to In Progress
- Assignee set to nicksinger
Updated by nicksinger over 1 year ago
- Priority changed from High to Normal
I could not reproduce this manually, and the last entry in /tmp/fetch_openqa_bugs_osd.log does not show the issue either. The failure therefore seems to be sporadic and dependent on external factors (e.g. a nightly downtime of some network component in between). Given the sporadic nature I am lowering the priority: the script will eventually be executed again, so no information is at risk of being lost. I am not sure what a proper fix would look like. Most likely some later retry, but I have to think about where: in the script itself, in the cronjob (e.g. by using our "retry command"), or by simply executing the cronjob more often.
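For illustration, the "retry in the script itself" option could look roughly like the sketch below. It is not taken from the openqa_bugfetcher code: the helper name `call_with_retries`, its parameters, and the linear back-off are assumptions made for this example only.

```python
import time


def call_with_retries(func, attempts=3, delay=30.0,
                      exceptions=(Exception,), sleep=time.sleep):
    """Call func() until it succeeds or `attempts` tries are used up.

    Hypothetical helper, not part of openqa_bugfetcher.
    `sleep` is injectable so the back-off can be tested without waiting.
    """
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except exceptions:
            if attempt == attempts:
                # Out of tries: re-raise so the cron mail still reports it.
                raise
            # Linear back-off: delay, 2 * delay, 3 * delay, ...
            sleep(delay * attempt)
```

In the script this might wrap the failing call from the traceback, e.g. `call_with_retries(lambda: issue_fetcher.get_issue(bugid), exceptions=(requests.exceptions.ConnectTimeout,))`, so that only connect timeouts are retried and every other error still surfaces immediately.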
Updated by openqa_review over 1 year ago
- Due date set to 2023-12-27
Setting due date based on mean cycle time of SUSE QE Tools
Updated by nicksinger over 1 year ago
- Status changed from In Progress to Feedback
Updated by nicksinger over 1 year ago
- Status changed from Feedback to Resolved
Package was updated already and changes are now present on the host itself:
openqa-service:/usr/lib/python3.6/site-packages/openqa_bugfetcher # grep -ri "timeout="
issues/github_issue.py: req = requests.get(url, auth=auth, timeout=60)
issues/jira_issue.py: req = requests.get(url, auth=(cred["user"], cred["pass"]), timeout=60)
issues/progress_issue.py: req = requests.get(url, headers={"X-Redmine-API-Key": conf["progress"]["api_key"]}, timeout=60)
issues/bugzilla_issue.py: return requests.get(url, params=get_params, timeout=60)
issues/bugzilla_issue.py: req = requests.get(url, timeout=60)
issues/__init__.py: return requests.get(url, params=get_params, timeout=60)
If it happens again we have to think of a more sophisticated solution. As a reference for future ideas: I looked into https://urllib3.readthedocs.io/en/latest/reference/urllib3.util.html#module-urllib3.util.retry, which can be used through an HTTPAdapter for python-requests, but it requires a major restructuring of the code and I wasn't able to confirm that it actually catches and retries connection timeouts (I think it doesn't, but the docs state otherwise).
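As a rough sketch of the urllib3 `Retry` idea mentioned above: a `requests.Session` can mount an `HTTPAdapter` configured with a `Retry` object. The function name `make_retrying_session` and the chosen retry counts and back-off factor are assumptions for illustration, and whether connect timeouts are really retried on this host remains unverified, as noted in the comment.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


def make_retrying_session(connect_retries=3, backoff_factor=2.0):
    """Return a requests.Session whose adapters carry a Retry policy.

    Hypothetical helper, not part of openqa_bugfetcher.
    """
    retry = Retry(
        total=connect_retries,      # overall cap across all retry kinds
        connect=connect_retries,    # cap for errors raised before the request is sent
        backoff_factor=backoff_factor,  # growing sleep between attempts
    )
    adapter = HTTPAdapter(max_retries=retry)
    session = requests.Session()
    # Replace the default adapters for both schemes.
    session.mount("https://", adapter)
    session.mount("http://", adapter)
    return session
```

The existing calls would then change from `requests.get(url, params=get_params, timeout=60)` to `session.get(url, params=get_params, timeout=60)`, which is the restructuring referred to above: the session has to be created once and passed to each issue class.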
Updated by nicksinger about 1 year ago
- Status changed from Resolved to New
- Assignee deleted (nicksinger)
We see problems with the bugfetcher script again. I can reproduce this manually on my machine and also by executing the script on openqa-service itself. It seems that redmine/progress is really slow: regular browser connections time out after 45 seconds, curl seems to time out even quicker, and our script only waits 60s. I am not sure how to approach this now.
Updated by okurz about 1 year ago
- Related to tickets #133532: Update to Redmine 5 added
Updated by livdywan about 1 year ago
- Related to action #154546: Cron fetch_openqa_bugs refused or timed out trying to fetch individual tickets added
Updated by okurz about 1 year ago
- Due date deleted (2023-12-27)
I am quite sure this is related to recent work on the redmine instance for #133532. What we can do is report the performance problem, which I can now easily reproduce manually, and wait for that to be resolved. In the meantime we could try much longer retry and waiting periods, or a partial shutdown of services, to mitigate.
Updated by okurz about 1 year ago
- Status changed from New to Resolved
- Assignee set to nicksinger
I have run the cron jobs on openqa-service
*/10 * * * * root (date;fetch_openqa_bugs)> /tmp/fetch_openqa_bugs_osd.log
1 */10 * * * root (date;fetch_openqa_bugs /etc/openqa/bugfetcher_o3.conf) > /tmp/fetch_openqa_bugs_o3.log
and they executed quickly, so I assume that changes to progress.o.o fixed that. Setting the ticket back to its previous status.