action #73633

closed

OSD partially unresponsive, triggering 500 responses, spotty response visible in monitoring panels but no alert triggered (yet)

Added by okurz over 3 years ago. Updated over 3 years ago.

Status: Resolved
Priority: Urgent
Assignee:
Category: -
Target version:
Start date: 2020-10-20
Due date: 2020-11-17
% Done: 0%
Estimated time:

Description

Observation

https://monitor.qa.suse.de/d/WebuiDb/webui-summary?orgId=1&from=1603190156643&to=1603196975018
shows that at around 2020-10-20 12:39 the HTTP response time from OSD increased. Users reported a spotty connection and 500 responses ("unresponsive") during that time, e.g. in https://chat.suse.de/channel/testing?msg=aix9KNXwkWowTd7FA . The spotty response is visible in our monitoring panels, but no alert has triggered in Grafana so far because we do not want the unspecific "No Data" alerts.
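One possible way to catch a spotty response without falling back to a blanket "No Data" alert is to evaluate how many probes in a window returned no data point. The following is only a minimal sketch under assumptions: the function name `classify_window`, the thresholds, and the idea of feeding it a list of response times are hypothetical and not taken from the OSD monitoring setup or from Grafana itself.

```python
from typing import Optional, Sequence


def classify_window(
    samples: Sequence[Optional[float]],
    max_response_s: float = 1.0,      # hypothetical threshold for "slow"
    max_missing_ratio: float = 0.2,   # hypothetical threshold for "spotty"
) -> str:
    """Classify one evaluation window of HTTP response-time samples.

    Each sample is a response time in seconds, or None when the probe
    produced no data point (for example a timeout or a failed request).
    """
    if not samples:
        return "no_data"

    present = [s for s in samples if s is not None]
    if not present:
        # Nothing at all was measured: the unspecific "No Data" case.
        return "no_data"

    # Fraction of probes in this window that returned nothing.
    missing_ratio = 1 - len(present) / len(samples)
    if missing_ratio > max_missing_ratio:
        # Some data arrived, but too many probes failed: spotty response.
        return "spotty"

    # Enough data points: compare the mean response time to the limit.
    if sum(present) / len(present) > max_response_s:
        return "slow"
    return "ok"
```

With such a classification an alert could fire on "spotty" and "slow" while "no_data" is still handled separately, which keeps the unspecific case out of the alert channel.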

Cause, solution and test


Files

ip6tables-save.firewalld.txt (5.98 KB): Dump without default route over v6 (nicksinger, 2020-11-04 15:00)
ip6tables-save.susefirewall.txt (3.73 KB): Dump with default route over v6 (nicksinger, 2020-11-04 15:00)

Related issues 7 (0 open, 7 closed)

Related to openQA Infrastructure - action #75016: [osd-admins][alert] Failed systemd services alert (workers): os-autoinst-openvswitch.service (and var-lib-openqa-share.mount) on openqaworker-arm-2 and others (Resolved, mkittler, 2020-10-21)
Related to openQA Infrastructure - action #75055: grenache-1 can't connect to webui's over IPv4 only (Resolved, nicksinger, 2020-10-22)
Related to openQA Infrastructure - action #76828: big job queue for ppc as powerqaworker-qam-1.qa and malbec.arch and qa-power8-5-kvm were not active (Resolved, okurz, 2020-10-31)
Related to openQA Infrastructure - action #68095: Migrate osd workers from SuSEfirewall2 to firewalld (Resolved, mkittler, 2020-06-15)
Related to openQA Infrastructure - action #80128: openqaworker-arm-2 fails to download from openqa (Resolved, nicksinger, 2020-11-21)
Has duplicate openQA Infrastructure - action #77995: worker instances on grenache-1 seem to fail (sometimes?) to connect to web-uis (Rejected, 2020-11-16)
Copied to openQA Infrastructure - action #78127: follow-up to #73633 - lessons learned and suggestions (Resolved, okurz)
