action #94312

closed

[Alerting] web UI: Too many Minion job failures alert - likely due to openqa-client declared deprecated

Added by okurz almost 3 years ago. Updated almost 3 years ago.

Status: Resolved
Priority: Urgent
Assignee:
Category: -
Target version:
Start date: 2021-06-21
Due date: 2021-07-06
% Done: 0%
Estimated time:

Description

Observation

Alert email received on 2021-06-20 01:39Z. Details about the alert are at https://stats.openqa-monitor.qa.suse.de/d/WebuiDb/webui-summary?tab=alert&viewPanel=19&orgId=1

Too many Minion jobs have failed on openqa.suse.de. Review the failed jobs on https://openqa.suse.de/minion/jobs?state=failed and create a ticket if there is not already one and the failed jobs are not just a symptom of a bigger problem (e.g. database outage). After investigation, remove the failed jobs (possibly keeping one instance of each failure kind around). For the general log of the Minion job queue, check out journalctl -fu openqa-gru.service and /var/log/openqa_gru on openqa.suse.de.
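The cleanup rule in the alert text (remove failed jobs but keep one instance per failure kind) could be sketched roughly as follows. This is only an illustration: the record fields `id` and `task`, the sample task names, and the choice to treat the task name as the "failure kind" are all assumptions, not the actual Minion data model.

```python
from collections import defaultdict


def plan_cleanup(failed_jobs):
    """Group failed jobs by task; keep the newest id per task, remove the rest."""
    by_task = defaultdict(list)
    for job in failed_jobs:
        by_task[job["task"]].append(job["id"])

    keep, remove = [], []
    for _task, ids in sorted(by_task.items()):
        ids.sort()
        keep.append(ids[-1])      # one representative per failure kind
        remove.extend(ids[:-1])   # everything older can be deleted
    return keep, remove


# Hypothetical sample records (field names and task names are assumptions).
sample = [
    {"id": 101, "task": "hook_script"},
    {"id": 102, "task": "hook_script"},
    {"id": 103, "task": "other_task"},
]

keep, remove = plan_cleanup(sample)
print("keep:", keep)
print("remove:", remove)
```

In practice the grouping key would probably need to include the error message as well, since one task can fail for unrelated reasons.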

Details from openqa-gru on osd:

-- Logs begin at Sun 2021-06-20 03:30:00 CEST, end at Mon 2021-06-21 10:29:02 CEST. --
Jun 21 05:46:46 openqa openqa-gru[1446]: WARNING: openqa-client is deprecated and planned to be removed in the future. Please use openqa-cli>
Jun 21 05:46:48 openqa openqa-gru[1446]: https://openqa.suse.de/tests/6297977 : Unknown issue, to be reviewed -> https://openqa.suse.de/test>
Jun 21 05:46:48 openqa openqa-gru[1446]: Likely the error is within this log excerpt, last lines before shutdown:
Jun 21 05:46:48 openqa openqa-gru[1446]: ---
Jun 21 05:46:48 openqa openqa-gru[1446]: [2021-06-21T05:46:17.919 CEST] [info] ::: backend::baseclass::die_handler: Backend process died, ba>
Jun 21 05:46:48 openqa openqa-gru[1446]:   Virtio terminal and svirt serial terminal do not support send_key. Use
Jun 21 05:46:48 openqa openqa-gru[1446]:   type_string (possibly with an ANSI/XTERM escape sequence), or switch to a
Jun 21 05:46:48 openqa openqa-gru[1446]:   console which sends key presses, not terminal codes.
Jun 21 05:46:48 openqa openqa-gru[1446]:    at /usr/lib/os-autoinst/consoles/serial_screen.pm line 68.
Jun 21 05:46:48 openqa openqa-gru[1446]:           consoles::serial_screen::send_key(consoles::serial_screen=HASH(0x56014f31dde0), HASH(0x56>
Jun 21 05:46:48 openqa openqa-gru[1446]:           backend::baseclass::bouncer(backend::qemu=HASH(0x560150b54ce0), "send_key", HASH(0x56014f>
Jun 21 05:46:48 openqa openqa-gru[1446]:           backend::baseclass::send_key(backend::qemu=HASH(0x560150b54ce0), HASH(0x56014f27f2c0)) ca>
Jun 21 05:46:48 openqa openqa-gru[1446]:           backend::baseclass::handle_command(backend::qemu=HASH(0x560150b54ce0), HASH(0x56014f2ebb5>
Jun 21 05:46:48 openqa openqa-gru[1446]:           backend::baseclass::check_socket(backend::qemu=HASH(0x560150b54ce0), IO::Handle=GLOB(0x56>
Jun 21 05:46:48 openqa openqa-gru[1446]:           backend::qemu::check_socket(backend::qemu=HASH(0x560150b54ce0), IO::Handle=GLOB(0x56014ef>
Jun 21 05:46:48 openqa openqa-gru[1446]:           eval {...} called at /usr/lib/os-autoinst/backend/baseclass.pm line 191
Jun 21 05:46:48 openqa openqa-gru[1446]: ---
Jun 21 05:46:48 openqa openqa-gru[1446]: 1 unknown issues to be reviewed:
Jun 21 05:46:48 openqa openqa-gru[1446]:  - https://openqa.suse.de/tests/6297977 backend died: Virtio terminal and svirt serial ter
Jun 21 05:47:25 openqa openqa-gru[1446]: WARNING: openqa-client is deprecated and planned to be removed in the future. Please use openqa-cli>
Jun 21 05:47:25 openqa openqa-gru[1446]: https://openqa.suse.de/tests/6299083 : Unknown issue, to be reviewed -> https://openqa.suse.de/test>
Jun 21 05:47:25 openqa openqa-gru[1446]: Likely the error is within this log excerpt, last lines before shutdown:
Jun 21 05:47:25 openqa openqa-gru[1446]: ---
Jun 21 05:47:25 openqa openqa-gru[1446]: [2.0K blob data]
Jun 21 05:47:25 openqa openqa-gru[1446]: [1.2K blob data]
Jun 21 05:47:25 openqa openqa-gru[1446]:   
Jun 21 05:47:25 openqa openqa-gru[1446]: [2021-06-21T05:47:22.783 CEST] [debug] git fetch: remote: Total 0 (delta 0), reused 0 (delta 0), pa>
Jun 21 05:47:25 openqa openqa-gru[1446]:   
Jun 21 05:47:25 openqa openqa-gru[1446]: Could not find '7fe5802d21307dc25430282daefba72e8342a49f' in complete history at /usr/lib/os-autoin>
Jun 21 05:47:25 openqa openqa-gru[1446]: ---

So there are likely two problems:

  • openqa-client is still used and should be replaced by openqa-cli
  • git hashes cannot be found in the history
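To check how often each of the two symptoms shows up in a longer journal dump, a small classifier along these lines might help. It is a sketch only: the two regular expressions are derived from the excerpt above, the category names are made up for illustration, and the sample lines are shortened copies of the log lines (truncated where the original is truncated).

```python
import re
from collections import Counter

# Patterns taken from the log excerpt; the category names are hypothetical.
PATTERNS = {
    "deprecated_client": re.compile(r"WARNING: openqa-client is deprecated"),
    "missing_git_hash": re.compile(
        r"Could not find '[0-9a-f]{40}' in complete history"
    ),
}


def classify(lines):
    """Count how many log lines match each known symptom."""
    counts = Counter()
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


sample = [
    "Jun 21 05:46:46 openqa openqa-gru[1446]: WARNING: openqa-client is deprecated and planned to be removed in the future.",
    "Jun 21 05:47:25 openqa openqa-gru[1446]: WARNING: openqa-client is deprecated and planned to be removed in the future.",
    "Jun 21 05:47:25 openqa openqa-gru[1446]: Could not find '7fe5802d21307dc25430282daefba72e8342a49f' in complete history at /usr/lib/os-autoin",
]

for name, count in sorted(classify(sample).items()):
    print(name, count)
```

The same function could be fed the saved output of the journalctl command mentioned in the alert text, one line per list entry.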

Acceptance criteria

  • AC1: Find where openqa-client is used and replace it with openqa-cli, likely in github.com/os-autoinst/scripts/
  • AC2: Ensure no out-of-the-ordinary Minion alerts remain
  • AC3: Alert fixed

Suggestions

  • Start by replacing uses of openqa-client in github.com/os-autoinst/scripts/ with openqa-cli
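One way to locate the remaining uses before replacing them is a plain text search over a local checkout. The helper below is a minimal sketch under that assumption; the function name `find_legacy_calls` is hypothetical, and it only reports matches rather than rewriting any call.

```python
import re
from pathlib import Path

# Match the legacy command name as a whole word (not "openqa-clients").
LEGACY = re.compile(r"\bopenqa-client\b")


def find_legacy_calls(root):
    """Return (relative path, line number, line) for each openqa-client mention."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        # Skip directories and anything inside the git metadata directory.
        if not path.is_file() or ".git" in path.parts:
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # binary or unreadable file
        for lineno, line in enumerate(text.splitlines(), start=1):
            if LEGACY.search(line):
                hits.append((str(path.relative_to(root)), lineno, line.strip()))
    return hits
```

Calling `find_legacy_calls` with the path of a local clone of the scripts repository would list every file and line that still needs migrating; each hit then has to be converted by hand, since the two clients take their arguments differently.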

Rollback

Actions #1

Updated by okurz almost 3 years ago

  • Description updated (diff)
Actions #2

Updated by mkittler almost 3 years ago

  • Assignee set to mkittler
Actions #3

Updated by okurz almost 3 years ago

  • Status changed from Workable to In Progress

You created https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/509, which I merged. You are welcome to branch out any further ideas for improvement into other tickets if you don't plan to solve them within this ticket, e.g. how to improve the dependency handling of scripts from https://github.com/os-autoinst/scripts

Actions #4

Updated by mkittler almost 3 years ago

PR for migrating to openqa-cli: https://github.com/os-autoinst/scripts/pull/81
SR for failures due to missing dependency on OSD: https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/509


I've also cleaned up some other failures which have piled up. (We have other tickets for them.)

Actions #5

Updated by openqa_review almost 3 years ago

  • Due date set to 2021-07-06

Setting due date based on mean cycle time of SUSE QE Tools

Actions #6

Updated by mkittler almost 3 years ago

  • Status changed from In Progress to Resolved

PR and SR have been merged.

I retried the failing jobs and they pass now. I resumed the alert and it is green now.

Actions #7

Updated by okurz almost 3 years ago

ok, nice. What about the "unable to find git hashes in history"?
