action #160598
[alert] s390zl12: CPU load alert openQA s390zl12 salt cpu_load_alert_s390zl12 worker size:S (closed)
Status: Resolved
Priority: Normal
Category: -
% Done: 0%
Description
Observation
Could this be related to #158170? Did we allow too many instances?
Summary
System Load too high for a longer time, see https://progress.opensuse.org/issues/150983
Description
System load has been too high for an extended period, so the machine is possibly overloaded. In particular, when too many openQA worker instances are configured, openQA tests become flaky and show lost or repeated characters in VNC typing.
Check which processes keep the machine busy, look for corresponding openQA tests that failed because of this situation, and handle them accordingly, e.g. retrigger the openQA tests after mitigating the root cause.
See https://progress.opensuse.org/issues/150983 for details.
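To judge whether a load value points at too many worker instances, it can help to relate the load average to the number of CPUs on the host. The following is a minimal sketch, assuming a Linux worker that exposes /proc/loadavg; the function names are hypothetical and not part of openQA or the salt states.

```python
import os


def parse_loadavg(text):
    """Return the 1, 5 and 15 minute load averages from /proc/loadavg content."""
    fields = text.split()
    # The first three whitespace-separated fields are the load averages.
    return tuple(float(value) for value in fields[:3])


def load_per_cpu(load, cpu_count):
    """Normalize a load average by the number of CPUs."""
    if cpu_count < 1:
        raise ValueError("cpu_count must be at least 1")
    return load / cpu_count


def read_load_per_cpu(path="/proc/loadavg"):
    """Read the current load averages and return them normalized per CPU.

    Assumption: runs on the affected worker itself.
    """
    with open(path) as handle:
        loads = parse_loadavg(handle.read())
    cpus = os.cpu_count() or 1
    return tuple(load_per_cpu(load, cpus) for load in loads)
```

A per-CPU value that stays well above 1 for the 15 minute average would suggest that more work is queued than the machine can process, which is consistent with the overload described above.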
Values
B=79.57516129032257 C=1
Labels
alertname s390zl12: CPU load alert
grafana_folder openQA
host s390zl12
hostname s390zl12
origin salt
rule_uid cpu_load_alert_s390zl12
type worker
The alert is resolved at the moment, so no rollback steps are needed and normal priority is sufficient for now.
Suggestions
- Consider reducing the worker slots
- Check that the alert threshold is good, or adjust it
- Take a look at the logs from the timeframe of the alert firing
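For the threshold suggestion, one way to reason about a candidate value is to replay recorded load samples and see whether they would count as "too high for a longer time". The sketch below is a simplification and not how the Grafana rule is actually evaluated; the function name, the threshold and the window length are hypothetical parameters to be filled in from the real alert definition.

```python
def sustained_breach(samples, threshold, min_consecutive):
    """Return True if at least min_consecutive samples in a row exceed threshold."""
    run = 0
    for value in samples:
        if value > threshold:
            run += 1
            if run >= min_consecutive:
                return True
        else:
            # A single sample at or below the threshold resets the streak.
            run = 0
    return False
```

Running this over load samples from the timeframe of the alert with different thresholds shows which values would have fired only on sustained load and which would also have fired on short spikes.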