action #126212

closed

openqa.suse.de response times very slow. No alert fired size:M

Added by nicksinger almost 2 years ago. Updated over 1 year ago.

Status: Resolved
Priority: High
Assignee:
Category: -
Start date: 2023-03-20
Due date:
% Done: 0%
Estimated time:

Description

Observation

Tina observed very slow responses from the OSD web UI at 10:33 CET. Shortly afterwards we were asked about it in #eng-testing.
The higher load is also clearly visible in Grafana: https://stats.openqa-monitor.qa.suse.de/d/Webuinew/webui-summary-new?orgId=1&from=1679293281205&to=1679306017314
As far as I can tell, we received no Apache response time alerts.

Acceptance Criteria

  • AC1: It is known that our alert thresholds are sensible

Suggestions

  • Check what caused the high load, e.g. by analyzing the Apache logs in /var/log/apache2
  • Remediate the offender (e.g. by fixing a script or blocking an IP)
  • Check why the Apache response time alert did not fire and whether something needs to be fixed
    • Should the "Apache Response Time" alert have fired?
    • Maybe the alert threshold was too relaxed and it had not triggered "yet"?
    • The threshold should be 10s, but even the index page without additional AJAX requests took longer? We don't have numbers, though.
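
As a starting point for the first suggestion, a minimal sketch of how the access log could be summarized to spot a heavy client or a hot route. It assumes the log uses Apache's Common/Combined Log Format; the actual LogFormat configured on OSD is not stated in this ticket. The function names and sample lines are hypothetical, and the sample uses made-up documentation IP addresses, not real log data.

```python
import re
from collections import Counter

# Assumption: Common/Combined Log Format, i.e. each line starts with
# client IP, identity, user, [timestamp], "request line".
LOG_LINE = re.compile(r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "(?P<request>[^"]*)"')


def top_clients(lines, limit=10):
    """Return the client IPs with the most requests as (ip, count) pairs."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match:  # skip lines that do not match the assumed format
            counts[match.group("ip")] += 1
    return counts.most_common(limit)


def top_paths(lines, limit=10):
    """Return the most requested paths as (path, count) pairs."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue
        parts = match.group("request").split()
        if len(parts) >= 2:  # expected shape: METHOD PATH PROTOCOL
            counts[parts[1]] += 1
    return counts.most_common(limit)


# Made-up sample lines for illustration only.
SAMPLE = [
    '192.0.2.10 - - [20/Mar/2023:10:30:01 +0100] "GET /tests/overview HTTP/1.1" 200 512',
    '192.0.2.10 - - [20/Mar/2023:10:30:02 +0100] "GET /api/v1/jobs HTTP/1.1" 200 1024',
    '198.51.100.7 - - [20/Mar/2023:10:30:03 +0100] "GET / HTTP/1.1" 200 2048',
    'not an access log line',
]

for ip, hits in top_clients(SAMPLE):
    print(f"{hits:6d} {ip}")
```

Both functions accept any iterable of lines, so an open file handle for whichever access log exists under /var/log/apache2 can be passed in directly instead of the sample list.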

Related issues 5 (1 open, 4 closed)

Related to openQA Project (public) - action #124649: Spotty responses from OSD 2023-02-15 (Resolved, livdywan, 2023-02-15)

Related to openQA Infrastructure (public) - action #116722: openqa.suse.de is not reachable 2022-09-18, no ping response, postgreSQL OOM and kernel panics size:M (Resolved, mkittler, 2022-09-18)

Related to openQA Project (public) - action #112859: Conduct Five Whys for "[alert][osd] openqa.suse.de is not reachable anymore, response times > 30s, multiple alerts over the weekend" (Resolved, okurz, 2022-06-22)

Related to openQA Project (public) - coordination #112961: [epic] Followup to "openqa.suse.de is not reachable anymore, response times > 30s, multiple alerts over the weekend" (New, 2022-06-22)

Related to openQA Infrastructure (public) - action #127052: [alert] Apache Response Time alert followed by DatasourceNoData for min_apache_response size:M (Resolved, okurz, 2023-04-01)
