action #152470

closed

openqa-service fetch_openqa_bugs "requests.exceptions.ConnectTimeout: HTTPSConnectionPool(host='bugzilla.suse.com', port=443)"

Added by okurz 11 months ago. Updated 10 months ago.

Status:
Resolved
Priority:
Normal
Assignee:
Target version:
Start date:
2023-12-12
Due date:
% Done:

0%

Estimated time:

Description

Observation

Daily email with subject
"Cron root@openqa-service (date; fetch_openqa_bugs)> /tmp/fetch_openqa_bugs_osd.log"

and content

Exception occured while fetching bsc#1212271
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/urllib3/connection.py", line 160, in _new_conn
    (self._dns_host, self.port), self.timeout, **extra_kw
  File "/usr/lib/python3.6/site-packages/urllib3/util/connection.py", line 84, in create_connection
    raise err
  File "/usr/lib/python3.6/site-packages/urllib3/util/connection.py", line 74, in create_connection
    sock.connect(sa)
socket.timeout: timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/urllib3/connectionpool.py", line 677, in urlopen
    chunked=chunked,
  File "/usr/lib/python3.6/site-packages/urllib3/connectionpool.py", line 381, in _make_request
    self._validate_conn(conn)
  File "/usr/lib/python3.6/site-packages/urllib3/connectionpool.py", line 978, in _validate_conn
    conn.connect()
  File "/usr/lib/python3.6/site-packages/urllib3/connection.py", line 309, in connect
    conn = self._new_conn()
  File "/usr/lib/python3.6/site-packages/urllib3/connection.py", line 167, in _new_conn
    % (self.host, self.timeout),
urllib3.exceptions.ConnectTimeoutError: (<urllib3.connection.HTTPSConnection object at 0x7f70ef00c438>, 'Connection to bugzilla.suse.com timed out. (connect timeout=10)')

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/requests/adapters.py", line 449, in send
    timeout=timeout
  File "/usr/lib/python3.6/site-packages/urllib3/connectionpool.py", line 727, in urlopen
    method, url, error=e, _pool=self, _stacktrace=sys.exc_info()[2]
  File "/usr/lib/python3.6/site-packages/urllib3/util/retry.py", line 439, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='bugzilla.suse.com', port=443): Max retries exceeded with url: /rest/bug/1212271?api_key=XXX (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7f70ef00c438>, 'Connection to bugzilla.suse.com timed out. (connect timeout=10)'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/bin/fetch_openqa_bugs", line 62, in <module>
    raise e
  File "/usr/bin/fetch_openqa_bugs", line 48, in <module>
    issue = issue_fetcher.get_issue(bugid)
  File "/usr/lib/python3.6/site-packages/openqa_bugfetcher/issues/__init__.py", line 88, in get_issue
    return self.prefix_table[prefix](self.conf, bugid)
  File "/usr/lib/python3.6/site-packages/openqa_bugfetcher/issues/__init__.py", line 24, in __init__
    self.fetch(conf)
  File "/usr/lib/python3.6/site-packages/openqa_bugfetcher/issues/bugzilla_issue.py", line 27, in fetch
    req = rest_get_bug(issue_id)
  File "/usr/lib/python3.6/site-packages/openqa_bugfetcher/issues/bugzilla_issue.py", line 25, in rest_get_bug
    return requests.get(url, params=get_params, timeout=10)
  File "/usr/lib/python3.6/site-packages/requests/api.py", line 76, in get
    return request('get', url, params=params, **kwargs)
  File "/usr/lib/python3.6/site-packages/requests/api.py", line 61, in request
    return session.request(method=method, url=url, **kwargs)
  File "/usr/lib/python3.6/site-packages/requests/sessions.py", line 532, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/lib/python3.6/site-packages/requests/sessions.py", line 645, in send
    r = adapter.send(request, **kwargs)
  File "/usr/lib/python3.6/site-packages/requests/adapters.py", line 504, in send
    raise ConnectTimeout(e, request=request)
requests.exceptions.ConnectTimeout: HTTPSConnectionPool(host='bugzilla.suse.com', port=443): Max retries exceeded with url: /rest/bug/1212271?api_key=XXX (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7f70ef00c438>, 'Connection to bugzilla.suse.com timed out. (connect timeout=10)'))

The first occurrence seems to be on 2023-12-07 at 04:20. Could this be related to changes in the network infrastructure?

Steps to reproduce

ssh root@openqa-service.suse.de and execute commands from /etc/crontab

Suggestions

  • Reproduce the error, fix it
  • Create new bugzilla API key

Related issues 2 (0 open, 2 closed)

Related to openSUSE admin - tickets #133532: Update to Redmine 5 (Resolved, crameleon, 2023-07-29)

Related to openQA Infrastructure - action #154546: Cron fetch_openqa_bugs refused or timed out trying to fetch individual tickets (Resolved, okurz, 2023-10-23)
Actions #1

Updated by okurz 11 months ago

  • Description updated (diff)
Actions #2

Updated by nicksinger 11 months ago

  • Status changed from New to In Progress
  • Assignee set to nicksinger
Actions #3

Updated by nicksinger 11 months ago

  • Priority changed from High to Normal

I could not reproduce this manually, and the last entry in /tmp/fetch_openqa_bugs_osd.log does not show the issue either. So this seems to be sporadic and dependent on external factors (e.g. a nightly downtime of some network component along the way). Given the sporadic nature I will lower the priority: the script will eventually be executed again, so no information is at risk of being lost. I am not sure what a proper fix would look like. Most likely some form of later retry, but I have to think about where to put it: in the script itself, in the cronjob (e.g. by using our "retry command"), or simply by executing the cronjob more often.
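
The "retry in the script itself" option could look roughly like the following minimal sketch. It is not the bugfetcher's actual code: the helper name `call_with_retries` and all of its parameters are hypothetical, and it uses only the standard library so that it stays independent of how the fetchers are structured.

```python
import time


def call_with_retries(func, attempts=3, delay=5.0, exceptions=(Exception,), sleep=time.sleep):
    """Call func() until it succeeds or the attempts are exhausted.

    Hypothetical helper, not part of openqa_bugfetcher. Only exceptions
    listed in `exceptions` trigger a retry. The delay doubles after each
    failed attempt, and the last exception is re-raised unchanged.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except exceptions:
            if attempt == attempts:
                raise  # out of attempts: let the caller see the real error
            sleep(delay)
            delay *= 2  # simple exponential backoff
```

Assuming the call site from the traceback in the description, usage might be `call_with_retries(lambda: issue_fetcher.get_issue(bugid), exceptions=(requests.exceptions.ConnectTimeout,))`. Retrying only on the timeout exception keeps genuine errors (bad API key, missing bug) visible in the cron mail.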

Actions #4

Updated by openqa_review 11 months ago

  • Due date set to 2023-12-27

Setting due date based on mean cycle time of SUSE QE Tools

Actions #5

Updated by nicksinger 11 months ago

  • Status changed from In Progress to Feedback
Actions #6

Updated by nicksinger 11 months ago

  • Status changed from Feedback to Resolved

The package was already updated and the changes (timeout raised from 10 to 60 seconds) are now present on the host itself:

openqa-service:/usr/lib/python3.6/site-packages/openqa_bugfetcher # grep -ri "timeout="
issues/github_issue.py:            req = requests.get(url, auth=auth, timeout=60)
issues/jira_issue.py:        req = requests.get(url, auth=(cred["user"], cred["pass"]), timeout=60)
issues/progress_issue.py:        req = requests.get(url, headers={"X-Redmine-API-Key": conf["progress"]["api_key"]}, timeout=60)
issues/bugzilla_issue.py:                return requests.get(url, params=get_params, timeout=60)
issues/bugzilla_issue.py:            req = requests.get(url, timeout=60)
issues/__init__.py:            return requests.get(url, params=get_params, timeout=60)

If it happens again we have to think of a more sophisticated solution. As a reference for future ideas: I looked into https://urllib3.readthedocs.io/en/latest/reference/urllib3.util.html#module-urllib3.util.retry, which can be plugged into python-requests through an HTTPAdapter. However, it requires a major restructuring of the code, and I was not able to confirm that it actually catches and retries connection timeouts (I think it doesn't, but the docs state otherwise).
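
For reference, the HTTPAdapter approach could be sketched as below. This is an untested assumption about how it might be wired up, not what the bugfetcher does: `make_retrying_session` and its defaults are hypothetical, and the fetchers would have to call `session.get(...)` instead of `requests.get(...)`. The traceback in the description shows the connect timeout passing through `urllib3/util/retry.py` (`increment`) before `MaxRetryError` is raised, which suggests that connect timeouts are counted against the retry budget, but this should be verified against the urllib3 version installed on the host.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


def make_retrying_session(retries=3, backoff_factor=2.0):
    """Return a requests.Session whose adapters retry failed requests.

    Hypothetical helper. By default requests mounts adapters with zero
    retries, so a single connect timeout ends in MaxRetryError.
    """
    retry = Retry(
        total=retries,                     # overall cap on retries
        connect=retries,                   # errors before the request is sent
        read=retries,                      # errors after the request was sent
        backoff_factor=backoff_factor,     # growing sleep between attempts
        status_forcelist=(502, 503, 504),  # also retry on these HTTP statuses
    )
    adapter = HTTPAdapter(max_retries=retry)
    session = requests.Session()
    session.mount("https://", adapter)
    session.mount("http://", adapter)
    return session
```

A per-request `timeout=` is still needed with such a session, because the retry configuration only controls how often a request is attempted, not how long each attempt may take.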

Actions #7

Updated by nicksinger 10 months ago

  • Status changed from Resolved to New
  • Assignee deleted (nicksinger)

We see problems with the bugfetcher script again. I can reproduce this manually on my machine and also by executing the script on openqa-service itself. It seems that redmine/progress is really slow: regular browser connections time out after 45 seconds, curl seems to give up even sooner, and our script only waits 60s. Not sure how to approach this now.

Actions #8

Updated by okurz 10 months ago

Actions #9

Updated by livdywan 10 months ago

  • Related to action #154546: Cron fetch_openqa_bugs refused or timed out trying to fetch individual tickets added
Actions #10

Updated by okurz 10 months ago

  • Due date deleted (2023-12-27)

I am quite sure this is caused by recent work on the redmine instance in the context of #133532. So what we can do is report the performance problem, which I can now easily reproduce manually, and wait for it to be resolved. In the meantime we could mitigate by trying much longer retry and waiting periods or by partially shutting down services.

Actions #11

Updated by okurz 10 months ago

  • Status changed from New to Resolved
  • Assignee set to nicksinger

I have manually run the cron jobs on openqa-service

*/10     *       *       *       *       root  (date;fetch_openqa_bugs)> /tmp/fetch_openqa_bugs_osd.log
1        */10    *       *       *       root  (date;fetch_openqa_bugs /etc/openqa/bugfetcher_o3.conf) > /tmp/fetch_openqa_bugs_o3.log

and they executed quickly, so I assume that changes to progress.o.o fixed the problem. Setting the ticket back to its previous status.
