action #94312

Updated by okurz almost 3 years ago

## Observation 

 An alert email was received on 2021-06-20 01:39Z. Details about the alert are on https://stats.openqa-monitor.qa.suse.de/d/WebuiDb/webui-summary?tab=alert&viewPanel=19&orgId=1 

 Too many Minion jobs have failed on openqa.suse.de. Review the failed jobs on https://openqa.suse.de/minion/jobs?state=failed and create a ticket if there is not already one and the failed jobs are not just a symptom of a bigger problem (e.g. a database outage). After the investigation, remove the failed jobs (possibly keeping one instance of each failure kind around). For the general log of the Minion job queue, check out `journalctl -fu openqa-gru.service` and `/var/log/openqa_gru` on openqa.suse.de. 

 Details from openqa-gru on osd: 

 ``` 
 -- Logs begin at Sun 2021-06-20 03:30:00 CEST, end at Mon 2021-06-21 10:29:02 CEST. -- 
 Jun 21 05:46:46 openqa openqa-gru[1446]: WARNING: openqa-client is deprecated and planned to be removed in the future. Please use openqa-cli> 
 Jun 21 05:46:48 openqa openqa-gru[1446]: https://openqa.suse.de/tests/6297977 : Unknown issue, to be reviewed -> https://openqa.suse.de/test> 
 Jun 21 05:46:48 openqa openqa-gru[1446]: Likely the error is within this log excerpt, last lines before shutdown: 
 Jun 21 05:46:48 openqa openqa-gru[1446]: --- 
 Jun 21 05:46:48 openqa openqa-gru[1446]: [2021-06-21T05:46:17.919 CEST] [info] ::: backend::baseclass::die_handler: Backend process died, ba> 
 Jun 21 05:46:48 openqa openqa-gru[1446]:     Virtio terminal and svirt serial terminal do not support send_key. Use 
 Jun 21 05:46:48 openqa openqa-gru[1446]:     type_string (possibly with an ANSI/XTERM escape sequence), or switch to a 
 Jun 21 05:46:48 openqa openqa-gru[1446]:     console which sends key presses, not terminal codes. 
 Jun 21 05:46:48 openqa openqa-gru[1446]:      at /usr/lib/os-autoinst/consoles/serial_screen.pm line 68. 
 Jun 21 05:46:48 openqa openqa-gru[1446]:             consoles::serial_screen::send_key(consoles::serial_screen=HASH(0x56014f31dde0), HASH(0x56> 
 Jun 21 05:46:48 openqa openqa-gru[1446]:             backend::baseclass::bouncer(backend::qemu=HASH(0x560150b54ce0), "send_key", HASH(0x56014f> 
 Jun 21 05:46:48 openqa openqa-gru[1446]:             backend::baseclass::send_key(backend::qemu=HASH(0x560150b54ce0), HASH(0x56014f27f2c0)) ca> 
 Jun 21 05:46:48 openqa openqa-gru[1446]:             backend::baseclass::handle_command(backend::qemu=HASH(0x560150b54ce0), HASH(0x56014f2ebb5> 
 Jun 21 05:46:48 openqa openqa-gru[1446]:             backend::baseclass::check_socket(backend::qemu=HASH(0x560150b54ce0), IO::Handle=GLOB(0x56> 
 Jun 21 05:46:48 openqa openqa-gru[1446]:             backend::qemu::check_socket(backend::qemu=HASH(0x560150b54ce0), IO::Handle=GLOB(0x56014ef> 
 Jun 21 05:46:48 openqa openqa-gru[1446]:             eval {...} called at /usr/lib/os-autoinst/backend/baseclass.pm line 191 
 Jun 21 05:46:48 openqa openqa-gru[1446]: --- 
 Jun 21 05:46:48 openqa openqa-gru[1446]: 1 unknown issues to be reviewed: 
 Jun 21 05:46:48 openqa openqa-gru[1446]:    - https://openqa.suse.de/tests/6297977 backend died: Virtio terminal and svirt serial ter 
 Jun 21 05:47:25 openqa openqa-gru[1446]: WARNING: openqa-client is deprecated and planned to be removed in the future. Please use openqa-cli> 
 Jun 21 05:47:25 openqa openqa-gru[1446]: https://openqa.suse.de/tests/6299083 : Unknown issue, to be reviewed -> https://openqa.suse.de/test> 
 Jun 21 05:47:25 openqa openqa-gru[1446]: Likely the error is within this log excerpt, last lines before shutdown: 
 Jun 21 05:47:25 openqa openqa-gru[1446]: --- 
 Jun 21 05:47:25 openqa openqa-gru[1446]: [2.0K blob data] 
 Jun 21 05:47:25 openqa openqa-gru[1446]: [1.2K blob data] 
 Jun 21 05:47:25 openqa openqa-gru[1446]:    
 Jun 21 05:47:25 openqa openqa-gru[1446]: [2021-06-21T05:47:22.783 CEST] [debug] git fetch: remote: Total 0 (delta 0), reused 0 (delta 0), pa> 
 Jun 21 05:47:25 openqa openqa-gru[1446]:    
 Jun 21 05:47:25 openqa openqa-gru[1446]: Could not find '7fe5802d21307dc25430282daefba72e8342a49f' in complete history at /usr/lib/os-autoin> 
 Jun 21 05:47:25 openqa openqa-gru[1446]: --- 
 ``` 

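 So there are likely two problems: 

 * `openqa-client` is still used and should be replaced by `openqa-cli`, as the deprecation warning in the log says 
 * a git commit hash cannot be found in the history (`Could not find '7fe5802d21307dc25430282daefba72e8342a49f' in complete history`) 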
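
 To see how often each of the two problems shows up, the journal output can be scanned for the two messages quoted in the excerpt. The following is only a minimal sketch: the match strings are copied from the log above, while the category names and the `classify` helper are made up for illustration and are not part of openQA.

```python
import re
from collections import Counter

# Match strings are taken from the log excerpt in this ticket.
# The category names are hypothetical labels chosen for this sketch.
PATTERNS = {
    "deprecated_client": re.compile(r"openqa-client is deprecated"),
    "missing_git_hash": re.compile(
        r"Could not find '([0-9a-f]{7,40})' in complete history"
    ),
}


def classify(lines):
    """Count matching log lines per category and collect the missing hashes."""
    counts = Counter()
    missing_hashes = set()
    for line in lines:
        for name, pattern in PATTERNS.items():
            match = pattern.search(line)
            if match is None:
                continue
            counts[name] += 1
            if name == "missing_git_hash":
                missing_hashes.add(match.group(1))
    return counts, missing_hashes
```

 One way to use it would be to pipe the output of `journalctl -u openqa-gru.service` into a small script that calls `classify(sys.stdin)` and prints both return values.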

 ## Acceptance criteria 
 * **AC1:** Find where openqa-client is used and replace it with openqa-cli, likely in github.com/os-autoinst/scripts/ 
 * **AC2:** Ensure no out-of-the-ordinary Minion alerts remain 
 * **AC3:** Alert fixed 

 ## Suggestions 
 * Start by replacing uses of openqa-client in github.com/os-autoinst/scripts/ with openqa-cli 
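
 As a starting point for this suggestion, the remaining uses can be listed from a local checkout. This is a rough sketch and not an existing tool: the `find_usages` helper and the checkout path in the usage note are assumptions. It only reports candidates. The replacement itself should be reviewed per hit, since the two commands may not accept identical arguments.

```python
import re
from pathlib import Path

# Matches the old command name only; "openqa-cli" alone does not match.
NEEDLE = re.compile(r"\bopenqa-client\b")


def find_usages(root):
    """Return (relative path, line number, stripped line) for every hit under root."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        relative = path.relative_to(root)
        # Skip directories and anything inside the git metadata directory.
        if not path.is_file() or ".git" in relative.parts:
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # binary or unreadable file
        for number, line in enumerate(text.splitlines(), start=1):
            if NEEDLE.search(line):
                hits.append((str(relative), number, line.strip()))
    return hits
```

 For example, `find_usages("scripts")` would be run against a local clone of github.com/os-autoinst/scripts/ (the directory name `scripts` is assumed here), and each returned hit is then checked by hand.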

 ## Rollback 
 * Unpause alert "web UI: Too many Minion job failures alert" again on https://stats.openqa-monitor.qa.suse.de/alerting/list?state=not_ok
