action #94312
closed
[Alerting] web UI: Too many Minion job failures alert - likely due to openqa-client declared deprecated
Description
Observation
Alert email received on 2021-06-20 01:39Z. Details about alert on https://stats.openqa-monitor.qa.suse.de/d/WebuiDb/webui-summary?tab=alert&viewPanel=19&orgId=1
Too many Minion jobs have failed on openqa.suse.de. Review the failed jobs on https://openqa.suse.de/minion/jobs?state=failed and create a ticket if there's not already one and the failed jobs aren't just a symptom of a bigger problem (e.g. database outage). After investigation remove the failed jobs (possibly keeping one instance of a failure kind around). For the general log of the Minion job queue, check out journalctl -fu openqa-gru.service and /var/log/openqa_gru on openqa.suse.de.
Details from openqa-gru on osd:
-- Logs begin at Sun 2021-06-20 03:30:00 CEST, end at Mon 2021-06-21 10:29:02 CEST. --
Jun 21 05:46:46 openqa openqa-gru[1446]: WARNING: openqa-client is deprecated and planned to be removed in the future. Please use openqa-cli>
Jun 21 05:46:48 openqa openqa-gru[1446]: https://openqa.suse.de/tests/6297977 : Unknown issue, to be reviewed -> https://openqa.suse.de/test>
Jun 21 05:46:48 openqa openqa-gru[1446]: Likely the error is within this log excerpt, last lines before shutdown:
Jun 21 05:46:48 openqa openqa-gru[1446]: ---
Jun 21 05:46:48 openqa openqa-gru[1446]: [2021-06-21T05:46:17.919 CEST] [info] ::: backend::baseclass::die_handler: Backend process died, ba>
Jun 21 05:46:48 openqa openqa-gru[1446]: Virtio terminal and svirt serial terminal do not support send_key. Use
Jun 21 05:46:48 openqa openqa-gru[1446]: type_string (possibly with an ANSI/XTERM escape sequence), or switch to a
Jun 21 05:46:48 openqa openqa-gru[1446]: console which sends key presses, not terminal codes.
Jun 21 05:46:48 openqa openqa-gru[1446]: at /usr/lib/os-autoinst/consoles/serial_screen.pm line 68.
Jun 21 05:46:48 openqa openqa-gru[1446]: consoles::serial_screen::send_key(consoles::serial_screen=HASH(0x56014f31dde0), HASH(0x56>
Jun 21 05:46:48 openqa openqa-gru[1446]: backend::baseclass::bouncer(backend::qemu=HASH(0x560150b54ce0), "send_key", HASH(0x56014f>
Jun 21 05:46:48 openqa openqa-gru[1446]: backend::baseclass::send_key(backend::qemu=HASH(0x560150b54ce0), HASH(0x56014f27f2c0)) ca>
Jun 21 05:46:48 openqa openqa-gru[1446]: backend::baseclass::handle_command(backend::qemu=HASH(0x560150b54ce0), HASH(0x56014f2ebb5>
Jun 21 05:46:48 openqa openqa-gru[1446]: backend::baseclass::check_socket(backend::qemu=HASH(0x560150b54ce0), IO::Handle=GLOB(0x56>
Jun 21 05:46:48 openqa openqa-gru[1446]: backend::qemu::check_socket(backend::qemu=HASH(0x560150b54ce0), IO::Handle=GLOB(0x56014ef>
Jun 21 05:46:48 openqa openqa-gru[1446]: eval {...} called at /usr/lib/os-autoinst/backend/baseclass.pm line 191
Jun 21 05:46:48 openqa openqa-gru[1446]: ---
Jun 21 05:46:48 openqa openqa-gru[1446]: 1 unknown issues to be reviewed:
Jun 21 05:46:48 openqa openqa-gru[1446]: - https://openqa.suse.de/tests/6297977 backend died: Virtio terminal and svirt serial ter
Jun 21 05:47:25 openqa openqa-gru[1446]: WARNING: openqa-client is deprecated and planned to be removed in the future. Please use openqa-cli>
Jun 21 05:47:25 openqa openqa-gru[1446]: https://openqa.suse.de/tests/6299083 : Unknown issue, to be reviewed -> https://openqa.suse.de/test>
Jun 21 05:47:25 openqa openqa-gru[1446]: Likely the error is within this log excerpt, last lines before shutdown:
Jun 21 05:47:25 openqa openqa-gru[1446]: ---
Jun 21 05:47:25 openqa openqa-gru[1446]: [2.0K blob data]
Jun 21 05:47:25 openqa openqa-gru[1446]: [1.2K blob data]
Jun 21 05:47:25 openqa openqa-gru[1446]:
Jun 21 05:47:25 openqa openqa-gru[1446]: [2021-06-21T05:47:22.783 CEST] [debug] git fetch: remote: Total 0 (delta 0), reused 0 (delta 0), pa>
Jun 21 05:47:25 openqa openqa-gru[1446]:
Jun 21 05:47:25 openqa openqa-gru[1446]: Could not find '7fe5802d21307dc25430282daefba72e8342a49f' in complete history at /usr/lib/os-autoin>
Jun 21 05:47:25 openqa openqa-gru[1446]: ---
So there are likely two problems:
- openqa-client is used where openqa-cli should be used instead
- unable to find git hashes in history
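To check how often each of the two problems shows up, the journal output could be tallied with a small script. The following is only a minimal sketch: the two patterns are copied from the log excerpt above, while the category names and the function name classify are made up for illustration.

```python
import re
import sys
from collections import Counter

# Patterns copied from the log excerpt in this ticket.
# The category names (dictionary keys) are hypothetical.
PATTERNS = {
    "deprecated_client": re.compile(r"WARNING: openqa-client is deprecated"),
    "missing_git_hash": re.compile(
        r"Could not find '[0-9a-f]{40}' in complete history"
    ),
}


def classify(lines):
    """Count how many log lines match each known failure pattern."""
    counts = Counter()
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


if __name__ == "__main__":
    # Reads log lines from standard input and prints one count per category.
    for name, count in classify(sys.stdin).items():
        print(f"{name}: {count}")
```

Assuming the script is saved under a name such as classify_gru_log.py (hypothetical), it could be fed with the output of journalctl -u openqa-gru.service via a pipe.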
Acceptance criteria
- AC1: Find where openqa-client is used and replace it with openqa-cli, likely in github.com/os-autoinst/scripts/
- AC2: Ensure no out-of-the-ordinary Minion alerts remain
- AC3: Alert fixed
Suggestions
- Start by replacing uses of openqa-client in github.com/os-autoinst/scripts/ with openqa-cli
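Locating the remaining uses in a local checkout of the scripts repository could look roughly like this. It is a sketch under assumptions: the function name find_deprecated_uses and the checkout path are hypothetical, and the regular expression only looks for the literal name openqa-client, so it does not flag lines that already use openqa-cli.

```python
import re
from pathlib import Path

# Matches the deprecated name as a whole word; "openqa-cli" does not match.
DEPRECATED = re.compile(r"\bopenqa-client\b")


def find_deprecated_uses(root):
    """Return (path, line number, stripped line) for each openqa-client use."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        # Skip directories and anything inside the git metadata directory.
        if not path.is_file() or ".git" in path.parts:
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # skip binary or unreadable files
        for number, line in enumerate(text.splitlines(), start=1):
            if DEPRECATED.search(line):
                hits.append((str(path), number, line.strip()))
    return hits


if __name__ == "__main__":
    # "scripts" is an assumed path to a local checkout of os-autoinst/scripts.
    for file_name, number, line in find_deprecated_uses("scripts"):
        print(f"{file_name}:{number}: {line}")
```

Each reported line is a candidate for migration; whether the arguments map one-to-one to openqa-cli has to be checked per call.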
Rollback
- Unpause alert "web UI: Too many Minion job failures alert" again on https://stats.openqa-monitor.qa.suse.de/alerting/list?state=not_ok
Updated by okurz over 3 years ago
- Status changed from Workable to In Progress
You created https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/509 which I merged. You are welcome to branch out all further ideas for improvement into other tickets if you don't plan to solve them right within this ticket, e.g. how to improve the dependency handling of scripts from https://github.com/os-autoinst/scripts.
Updated by mkittler over 3 years ago
PR for migrating to openqa-cli: https://github.com/os-autoinst/scripts/pull/81
SR for failures due to missing dependency on OSD: https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/509
I've also cleaned up some other failures which have piled up. (We have other tickets for them.)
Updated by openqa_review over 3 years ago
- Due date set to 2021-07-06
Setting due date based on mean cycle time of SUSE QE Tools
Updated by mkittler over 3 years ago
- Status changed from In Progress to Resolved
PR and SR have been merged.
I retried the failing jobs and they pass now. I resumed the alert and it is green now.
Updated by okurz over 3 years ago
ok, nice. What about the "unable to find git hashes in history"?