action #152470
openqa-service fetch_openqa_bugs "requests.exceptions.ConnectTimeout: HTTPSConnectionPool(host='bugzilla.suse.com', port=443)" (closed)
Description
Observation
Daily email with subject
"Cron root@openqa-service (date; fetch_openqa_bugs)> /tmp/fetch_openqa_bugs_osd.log"
and content
Exception occured while fetching bsc#1212271
Traceback (most recent call last):
File "/usr/lib/python3.6/site-packages/urllib3/connection.py", line 160, in _new_conn
(self._dns_host, self.port), self.timeout, **extra_kw
File "/usr/lib/python3.6/site-packages/urllib3/util/connection.py", line 84, in create_connection
raise err
File "/usr/lib/python3.6/site-packages/urllib3/util/connection.py", line 74, in create_connection
sock.connect(sa)
socket.timeout: timed out
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/lib/python3.6/site-packages/urllib3/connectionpool.py", line 677, in urlopen
chunked=chunked,
File "/usr/lib/python3.6/site-packages/urllib3/connectionpool.py", line 381, in _make_request
self._validate_conn(conn)
File "/usr/lib/python3.6/site-packages/urllib3/connectionpool.py", line 978, in _validate_conn
conn.connect()
File "/usr/lib/python3.6/site-packages/urllib3/connection.py", line 309, in connect
conn = self._new_conn()
File "/usr/lib/python3.6/site-packages/urllib3/connection.py", line 167, in _new_conn
% (self.host, self.timeout),
urllib3.exceptions.ConnectTimeoutError: (<urllib3.connection.HTTPSConnection object at 0x7f70ef00c438>, 'Connection to bugzilla.suse.com timed out. (connect timeout=10)')
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/lib/python3.6/site-packages/requests/adapters.py", line 449, in send
timeout=timeout
File "/usr/lib/python3.6/site-packages/urllib3/connectionpool.py", line 727, in urlopen
method, url, error=e, _pool=self, _stacktrace=sys.exc_info()[2]
File "/usr/lib/python3.6/site-packages/urllib3/util/retry.py", line 439, in increment
raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='bugzilla.suse.com', port=443): Max retries exceeded with url: /rest/bug/1212271?api_key=XXX (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7f70ef00c438>, 'Connection to bugzilla.suse.com timed out. (connect timeout=10)'))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/bin/fetch_openqa_bugs", line 62, in <module>
raise e
File "/usr/bin/fetch_openqa_bugs", line 48, in <module>
issue = issue_fetcher.get_issue(bugid)
File "/usr/lib/python3.6/site-packages/openqa_bugfetcher/issues/__init__.py", line 88, in get_issue
return self.prefix_table[prefix](self.conf, bugid)
File "/usr/lib/python3.6/site-packages/openqa_bugfetcher/issues/__init__.py", line 24, in __init__
self.fetch(conf)
File "/usr/lib/python3.6/site-packages/openqa_bugfetcher/issues/bugzilla_issue.py", line 27, in fetch
req = rest_get_bug(issue_id)
File "/usr/lib/python3.6/site-packages/openqa_bugfetcher/issues/bugzilla_issue.py", line 25, in rest_get_bug
return requests.get(url, params=get_params, timeout=10)
File "/usr/lib/python3.6/site-packages/requests/api.py", line 76, in get
return request('get', url, params=params, **kwargs)
File "/usr/lib/python3.6/site-packages/requests/api.py", line 61, in request
return session.request(method=method, url=url, **kwargs)
File "/usr/lib/python3.6/site-packages/requests/sessions.py", line 532, in request
resp = self.send(prep, **send_kwargs)
File "/usr/lib/python3.6/site-packages/requests/sessions.py", line 645, in send
r = adapter.send(request, **kwargs)
File "/usr/lib/python3.6/site-packages/requests/adapters.py", line 504, in send
raise ConnectTimeout(e, request=request)
requests.exceptions.ConnectTimeout: HTTPSConnectionPool(host='bugzilla.suse.com', port=443): Max retries exceeded with url: /rest/bug/1212271?api_key=XXX (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7f70ef00c438>, 'Connection to bugzilla.suse.com timed out. (connect timeout=10)'))
The first occurrence seems to have been on 2023-12-07 at 04:20. Could this be related to changes in the network infrastructure?
Steps to reproduce
ssh root@openqa-service.suse.de
and execute commands from /etc/crontab
Suggestions
- Reproduce the error, fix it
- Create new bugzilla API key
Updated by nicksinger about 1 year ago
- Status changed from New to In Progress
- Assignee set to nicksinger
Updated by nicksinger about 1 year ago
- Priority changed from High to Normal
Could not reproduce manually; the last entry in /tmp/fetch_openqa_bugs_osd.log also doesn't show the issue. The failure therefore seems to be sporadic and dependent on external factors (e.g. a nightly downtime of some network component in between). Given the sporadic nature I am lowering the priority: the script will eventually be executed again and no information is at risk of being lost. I am not sure what a proper fix would look like. Most likely some later retry, but I have to think about where: in the script itself, in the cron job (e.g. using our "retry command"), or by simply running the cron job more often.
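For the "retry in the script itself" option, a minimal sketch could look like the following. It is not the bugfetcher's actual code: `call_with_retries` and its parameters are hypothetical names chosen for illustration, and the default attempt count and delay are arbitrary.

```python
import time


def call_with_retries(func, attempts=3, delay=30.0,
                      exceptions=(Exception,), sleep=time.sleep):
    """Call func() up to `attempts` times.

    Sleeps `delay` seconds after each failed attempt and re-raises the
    last exception once all attempts are used up. `sleep` is injectable
    so the helper can be exercised without actually waiting.
    """
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except exceptions:
            if attempt == attempts:
                raise  # out of attempts: let the caller see the error
            sleep(delay)
```

In fetch_openqa_bugs this could wrap the failing call, e.g. `call_with_retries(lambda: issue_fetcher.get_issue(bugid), exceptions=(requests.exceptions.ConnectTimeout,))`, so that only connect timeouts are retried and every other error still surfaces immediately.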
Updated by openqa_review about 1 year ago
- Due date set to 2023-12-27
Setting due date based on mean cycle time of SUSE QE Tools
Updated by nicksinger about 1 year ago
- Status changed from In Progress to Feedback
Updated by nicksinger about 1 year ago
- Status changed from Feedback to Resolved
The package has already been updated and the changes (request timeouts of 60 seconds) are now present on the host itself:
openqa-service:/usr/lib/python3.6/site-packages/openqa_bugfetcher # grep -ri "timeout="
issues/github_issue.py: req = requests.get(url, auth=auth, timeout=60)
issues/jira_issue.py: req = requests.get(url, auth=(cred["user"], cred["pass"]), timeout=60)
issues/progress_issue.py: req = requests.get(url, headers={"X-Redmine-API-Key": conf["progress"]["api_key"]}, timeout=60)
issues/bugzilla_issue.py: return requests.get(url, params=get_params, timeout=60)
issues/bugzilla_issue.py: req = requests.get(url, timeout=60)
issues/__init__.py: return requests.get(url, params=get_params, timeout=60)
If it happens again we have to think of a more sophisticated solution. As a reference for future ideas: I looked into https://urllib3.readthedocs.io/en/latest/reference/urllib3.util.html#module-urllib3.util.retry, which can be used via an HTTPAdapter for python-requests, but it requires a major restructuring of the code and I wasn't able to confirm that it actually catches and retries connection timeouts (I think it doesn't, but the docs state otherwise).
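For reference, a rough sketch of that approach is below. `make_retrying_session` is a hypothetical helper, not part of openqa_bugfetcher, and the retry counts are arbitrary. According to the urllib3 documentation the `connect` budget covers connection-related errors; whether that includes the connect timeout seen in this ticket has not been verified here.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


def make_retrying_session(retries=3, backoff_factor=1.0):
    """Return a requests.Session whose adapters retry failed connections."""
    retry = Retry(
        total=retries,
        connect=retries,  # budget for connection-related errors
        read=0,           # no retries once the request was sent
        backoff_factor=backoff_factor,  # increasing pause between attempts
    )
    adapter = HTTPAdapter(max_retries=retry)
    session = requests.Session()
    session.mount("https://", adapter)
    session.mount("http://", adapter)
    return session
```

The restructuring mentioned above comes from the fact that each `requests.get(url, ..., timeout=60)` call would have to become `session.get(url, ..., timeout=60)` on a shared session; the timeout still has to be passed per request.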
Updated by nicksinger 11 months ago
- Status changed from Resolved to New
- Assignee deleted (nicksinger)
We see problems with the bugfetcher script again. I can reproduce this manually on my machine and also by executing the script on openqa-service itself. It seems that Redmine (progress) is really slow: regular browser connections time out after 45 seconds, curl seems to time out even sooner, and our script only waits 60 s. I am not sure how to approach this now.
Updated by okurz 11 months ago
- Related to tickets #133532: Update to Redmine 5 added
Updated by livdywan 11 months ago
- Related to action #154546: Cron fetch_openqa_bugs refused or timed out trying to fetch individual tickets added
Updated by okurz 11 months ago
- Due date deleted (2023-12-27)
I am quite sure this is caused by the recent work on the Redmine instance for #133532. What we can do is report the performance problem, which I can now easily reproduce manually, and wait for it to be resolved. In the meantime we could mitigate by trying much longer retry and waiting periods or by partially shutting down services.
Updated by okurz 11 months ago
- Status changed from New to Resolved
- Assignee set to nicksinger
I manually ran the cron jobs on openqa-service:
*/10 * * * * root (date;fetch_openqa_bugs)> /tmp/fetch_openqa_bugs_osd.log
1 */10 * * * root (date;fetch_openqa_bugs /etc/openqa/bugfetcher_o3.conf) > /tmp/fetch_openqa_bugs_o3.log
They executed quickly, so I assume that changes to progress.o.o fixed the problem. Setting the ticket back to its previous status.