action #95299

closed

Tests timeout with reason 'setup exceeded MAX_SETUP_TIME' on osd ppc64le workers auto_review:"Result: timeout":retry size:M

Added by dzedro over 3 years ago. Updated over 3 years ago.

Status:
Resolved
Priority:
High
Assignee:
Category:
Regressions/Crashes
Target version:
Start date:
2021-07-09
Due date:
% Done:

0%

Estimated time:

Description

Observation

I don't know why these ppc64le jobs timed out. Could it be the cache service?
Should the auto_review regex be more specific, or can it be generic for any timeout?

https://openqa.suse.de/tests/6393939
https://openqa.suse.de/tests/6393941
https://openqa.suse.de/tests/6393945

Steps to reproduce

Find jobs referencing this ticket with the help of
https://raw.githubusercontent.com/os-autoinst/scripts/master/openqa-query-for-job-label
by calling openqa-query-for-job-label poo#95299, or query the database directly:

ssh osd "sudo -u geekotest psql --command=\"select jobs.id,result_dir,t_finished,host,instance from jobs join workers on jobs.assigned_worker_id=workers.id where reason ~ 'timeout: setup exceeded' order by t_finished;\" openqa"

Expected result

A log https://openqa.suse.de/tests/6416847/logfile?filename=autoinst-log.txt shows how it should look:

[2021-07-12T10:36:56.0008 CEST] [info] [pid:64283] Download of SLE-15-SP1-Installer-DVD-ppc64le-GM-DVD1.iso processed:
[info] [#106833]
Cache size of "/var/lib/openqa/cache" is 44GiB, with limit 50GiB
[info] [#106833]
Downloading "SLE-15-SP1-Installer-DVD-ppc64le-GM-DVD1.iso" from "http://openqa.suse.de/tests/6416847/asset/iso/SLE-15-SP1-Installer-DVD-ppc64le-GM-DVD1.iso"
[info] [#106833]
Content of "/var/lib/openqa/cache/openqa.suse.de/SLE-15-SP1-Installer-DVD-ppc64le-GM-DVD1.iso" has not changed, updating last use

[2021-07-12T10:36:56.0100 CEST] [info] [pid:64283] Rsync from 'rsync://openqa.suse.de/tests' to '/var/lib/openqa/cache/openqa.suse.de', request #106839 sent to Cache Service
[2021-07-12T10:37:01.0168 CEST] [info] [pid:64283] Output of rsync:
[info] [#106839] Calling: rsync -avHP rsync://openqa.suse.de/tests/ --delete /var/lib/openqa/cache/openqa.suse.de/tests/
receiving incremental file list

sent 1,992 bytes  received 2,581,652 bytes  1,722,429.33 bytes/sec
total size is 12,978,495,757  speedup is 5,023.33

[2021-07-12T10:37:01.0168 CEST] [info] [pid:64283] Finished to rsync tests
[2021-07-12T10:37:01.0172 CEST] [debug] [pid:70271] +++ worker notes +++

where one can see the output from the rsync call.

Problem

The problem seems to be specific to ppc64le workers, maybe only "malbec" for now, and maybe specific to test syncing. The cacheservice-minion logs do not mention the test rsync request at all.

Suggestions

  • Start by calling git grep on the openQA source code for the log messages seen in the jobs mentioned above (or see the comments below), to find out which debug messages should be expected before the timeout
  • If you identify log messages that could be helpful, e.g. to distinguish whether a request was received by the minion service or not, add them

Related issues 3 (1 open, 2 closed)

Related to openQA Project - coordination #96185: [epic] Multimachine failure rate increased (Resolved, okurz, 2021-07-29)
Copied to openQA Project - action #96254: Tests timeout with MAX_SETUP_TIME - Add an alert if there is any non-restarted job exceeding max_setup_time (New)
Copied to openQA Infrastructure - action #96257: Tests timeout with MAX_SETUP_TIME on osd - Apply a higher MAX_SETUP_TIME applicable for the *complete* OSD infrastructure, e.g. add to every worker config entry (Resolved, mkittler)

Actions #1

Updated by okurz over 3 years ago

  • Project changed from openQA Tests to openQA Project
  • Subject changed from Test timeout auto_review:"Result: timeout":retry to Tests timeout with reason 'setup exceeded MAX_SETUP_TIME' on osd ppc64le workers auto_review:"Result: timeout":retry
  • Description updated (diff)
  • Category set to Regressions/Crashes
  • Priority changed from Normal to High
  • Target version set to Ready

https://github.com/os-autoinst/scripts/#auto-review---automatically-detect-known-issues-in-openqa-jobs-label-openqa-jobs-with-ticket-references-and-optionally-retrigger has an example "steps to reproduce" section that can be added to the ticket. If auto-review matches this ticket to jobs, we can then also search for jobs that reference the ticket and get good statistics.

dzedro wrote:

I don't know why this ppc64le jobs timed out, cache service ?
Should the regex be more specific or can it be generic for any timeout ?

If it worked, I would consider the regex a bit too generic, as it would mean that effectively every job that reproducibly times out would be stuck in a loop, constantly being retriggered.
However, "timeout_exceeded" is a specific result, and so far OSD is configured to trigger auto-review only on "incomplete" and "failed" jobs, see https://gitlab.suse.de/openqa/salt-states-openqa/-/blob/master/openqa/server.sls#L69
You could easily extend that with a line like

          job_done_hook_timeout_exceeded: env host=openqa.suse.de /opt/os-autoinst-scripts/openqa-label-known-issues-hook

or set as described in
https://github.com/os-autoinst/openQA/blob/master/etc/openqa/openqa.ini#L76

          auto_clone_regex: '^(cache failure|terminated prematurely|timeout):'

but I don't know if that actually works for "timeout_exceeded".
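
To get a feeling for which job reasons the auto_clone_regex quoted above would match, here is a minimal Python sketch. The regex is copied from the example; the sample reason strings are assumptions put together from the wording in this ticket, not values read from the database. It only checks the pattern itself and says nothing about whether openQA actually applies it to "timeout_exceeded" jobs.

```python
import re

# Pattern copied from the auto_clone_regex example above.
AUTO_CLONE_REGEX = re.compile(r"^(cache failure|terminated prematurely|timeout):")


def would_auto_clone(reason: str) -> bool:
    """Return True if the given job reason matches the pattern."""
    return AUTO_CLONE_REGEX.search(reason) is not None


# Illustrative reason strings (assumed, not taken from real jobs).
samples = [
    "timeout: setup exceeded MAX_SETUP_TIME",
    "cache failure: example message",
    "setup exceeded MAX_SETUP_TIME",
]

for reason in samples:
    print(f"{reason!r}: {would_auto_clone(reason)}")
```

The pattern is anchored at the start, so only reasons that begin with one of the three prefixes followed by a colon would match.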

Now to the real issue:

I looked in the database and found:

openqa=> select jobs.id,result_dir,t_finished,host,instance from jobs join workers on jobs.assigned_worker_id=workers.id where reason ~ 'timeout: setup exceeded' order by t_finished;
   id    |                                                                                                        result_dir                                                                                                         |     t_finished      |    host    | instance 
---------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------+------------+----------
 6296933 | 06296933-sle-15-SP1-Server-DVD-Updates-s390x-sle_image_on_sle_host:investigate:last_good_tests_and_build:70109c8d5237d35ada19cb80202f02e8dde380ae+20210615-1@s390x-kvm-sle12                                              | 2021-06-20 03:38:59 | grenache-1 |       14
 6296810 | 06296810-sle-15-Server-DVD-Updates-s390x-engines_and_tools:investigate:last_good_tests_and_build:1f2c0d54fa6bde064609447df532c9eada86d16d+20210619-1@s390x-kvm-sle12                                                      | 2021-06-20 03:39:56 | grenache-1 |       45
 6296934 | 06296934-sle-15-SP2-Server-DVD-Updates-s390x-sle_image_on_sle_host:investigate:retry@s390x-kvm-sle12                                                                                                                      | 2021-06-20 03:39:56 | grenache-1 |       15
 6296932 | 06296932-sle-15-SP1-Server-DVD-Updates-s390x-sle_image_on_sle_host:investigate:last_good_build:20210615-1@s390x-kvm-sle12                                                                                                 | 2021-06-20 03:41:00 | grenache-1 |       49
 6296938 | 06296938-sle-12-SP4-Server-DVD-Updates-s390x-engines_and_tools:investigate:retry@s390x-kvm-sle12                                                                                                                          | 2021-06-20 03:41:00 | grenache-1 |       34
 6296930 | 06296930-sle-15-SP1-Server-DVD-Updates-s390x-sle_image_on_sle_host:investigate:retry@s390x-kvm-sle12                                                                                                                      | 2021-06-20 03:41:00 | grenache-1 |       47
 6296931 | 06296931-sle-15-SP1-Server-DVD-Updates-s390x-sle_image_on_sle_host:investigate:last_good_tests:70109c8d5237d35ada19cb80202f02e8dde380ae@s390x-kvm-sle12                                                                   | 2021-06-20 03:41:01 | grenache-1 |       48
 6296935 | 06296935-sle-15-SP2-Server-DVD-Updates-s390x-sle_image_on_sle_host:investigate:last_good_tests:f88d0c029e1fe4c2ecaab5f75149bd21713a5eb5@s390x-kvm-sle12                                                                   | 2021-06-20 03:47:53 | grenache-1 |       35
 6296936 | 06296936-sle-15-SP2-Server-DVD-Updates-s390x-sle_image_on_sle_host:investigate:last_good_build:20210615-1@s390x-kvm-sle12                                                                                                 | 2021-06-20 03:49:20 | grenache-1 |       31
 6296937 | 06296937-sle-15-SP2-Server-DVD-Updates-s390x-sle_image_on_sle_host:investigate:last_good_tests_and_build:f88d0c029e1fe4c2ecaab5f75149bd21713a5eb5+20210615-1@s390x-kvm-sle12                                              | 2021-06-20 03:50:51 | grenache-1 |       36
 6295749 | 06295749-sle-15-SP3-Server-DVD-Incidents-Kernel-KOTD-s390x-Build5.3.18-254.1.g0483fe0-ltp_syscalls_ipc@s390x-kvm-sle12                                                                                                    | 2021-06-20 03:52:24 | grenache-1 |       13
 6295750 | 06295750-sle-15-SP3-Server-DVD-Incidents-Kernel-KOTD-s390x-Build5.3.18-254.1.g0483fe0-ltp_syscalls_debug_pagealloc@s390x-kvm-sle12                                                                                        | 2021-06-20 03:53:29 | grenache-1 |       44
 6295752 | 06295752-sle-15-SP3-Server-DVD-Incidents-Kernel-KOTD-s390x-Build5.3.18-254.1.g0483fe0-ltp_sched@s390x-kvm-sle12                                                                                                           | 2021-06-20 03:55:44 | grenache-1 |       32
 6295751 | 06295751-sle-15-SP3-Server-DVD-Incidents-Kernel-KOTD-s390x-Build5.3.18-254.1.g0483fe0-ltp_syscalls@s390x-kvm-sle12                                                                                                        | 2021-06-20 03:55:45 | grenache-1 |       37
 6295753 | 06295753-sle-15-SP3-Server-DVD-Incidents-Kernel-KOTD-s390x-Build5.3.18-254.1.g0483fe0-ltp_openposix@s390x-kvm-sle12                                                                                                       | 2021-06-20 03:57:15 | grenache-1 |       33
 6295754 | 06295754-sle-15-SP3-Server-DVD-Incidents-Kernel-KOTD-s390x-Build5.3.18-254.1.g0483fe0-ltp_dio@s390x-kvm-sle12                                                                                                             | 2021-06-20 04:07:42 | grenache-1 |       12
 6295755 | 06295755-sle-15-SP3-Server-DVD-Incidents-Kernel-KOTD-s390x-Build5.3.18-254.1.g0483fe0-kernel-live-patching@s390x-kvm-sle12                                                                                                | 2021-06-20 04:14:29 | grenache-1 |       46
 6269112 | 06269112-sle-15-SP3-Server-DVD-SAP-Incidents-x86_64-qam-sles4sap_online_dvd_gnome_hana_nvdimm:investigate:last_good_build::19935:habootstrap-formula@64bit-ipmi-nvdimm                                                    | 2021-06-20 04:25:53 | grenache-1 |       11
 6296940 | 06296940-sle-15-SP3-Server-DVD-Updates-s390x-sle_image_on_sle_host:investigate:retry@s390x-kvm-sle12                                                                                                                      | 2021-06-20 04:39:27 | grenache-1 |       14
 6296941 | 06296941-sle-15-SP3-Server-DVD-Updates-s390x-sle_image_on_sle_host:investigate:last_good_tests:f88d0c029e1fe4c2ecaab5f75149bd21713a5eb5@s390x-kvm-sle12                                                                   | 2021-06-20 04:40:55 | grenache-1 |       15
 6296942 | 06296942-sle-15-SP3-Server-DVD-Updates-s390x-sle_image_on_sle_host:investigate:last_good_build:20210615-1@s390x-kvm-sle12                                                                                                 | 2021-06-20 04:40:55 | grenache-1 |       45
 6296943 | 06296943-sle-15-SP3-Server-DVD-Updates-s390x-sle_image_on_sle_host:investigate:last_good_tests_and_build:f88d0c029e1fe4c2ecaab5f75149bd21713a5eb5+20210615-1@s390x-kvm-sle12                                              | 2021-06-20 04:42:26 | grenache-1 |       34
 6295811 | 06295811-sle-15-SP3-Server-DVD-SAP-Incidents-x86_64-qam-sles4sap_online_dvd_gnome_hana_nvdimm:investigate:last_good_tests_and_build:abc2a9e0439943c86ce1a1a6bc8777fda2bf6804+:19935:habootstrap-formula@64bit-ipmi-nvdimm | 2021-06-20 05:26:57 | grenache-1 |       11
 6296143 | 06296143-sle-15-SP3-Server-DVD-SAP-Incidents-x86_64-qam-sles4sap_online_dvd_gnome_hana_nvdimm:investigate:retry@64bit-ipmi-nvdimm                                                                                         | 2021-06-20 06:28:23 | grenache-1 |       11
 6264554 | 06264554-sle-15-Server-DVD-SAP-Incidents-x86_64-qam-sles4sap_online_dvd_gnome_hana_nvdimm:investigate:retry@64bit-ipmi-nvdimm                                                                                             | 2021-06-20 07:29:50 | grenache-1 |       11
 6289148 | 06289148-sle-15-SP1-Server-DVD-SAP-Incidents-x86_64-qam-sles4sap_online_dvd_gnome_hana_nvdimm:investigate:last_good_build::19983:xterm@64bit-ipmi-nvdimm                                                                  | 2021-06-20 08:31:19 | grenache-1 |       11
 6289149 | 06289149-sle-15-SP1-Server-DVD-SAP-Incidents-x86_64-qam-sles4sap_online_dvd_gnome_hana_nvdimm:investigate:last_good_tests_and_build:8fe545bc1a5752e32689ea7a594c57503b80345c+:19983:xterm@64bit-ipmi-nvdimm               | 2021-06-20 09:32:49 | grenache-1 |       11
 6289736 | 06289736-sle-15-SP1-Server-DVD-SAP-Incidents-x86_64-qam-sles4sap_online_dvd_gnome_hana_nvdimm:investigate:last_good_build::19983:xterm@64bit-ipmi-nvdimm                                                                  | 2021-06-20 10:34:20 | grenache-1 |       11
 6289737 | 06289737-sle-15-SP1-Server-DVD-SAP-Incidents-x86_64-qam-sles4sap_online_dvd_gnome_hana_nvdimm:investigate:last_good_tests_and_build:8fe545bc1a5752e32689ea7a594c57503b80345c+:19983:xterm@64bit-ipmi-nvdimm               | 2021-06-20 11:35:50 | grenache-1 |       11
 6297272 | 06297272-sle-12-SP4-Server-DVD-Updates-s390x-Build20210621-1-qam-minimal+base@s390x-kvm-sle12                                                                                                                             | 2021-06-21 00:17:47 | grenache-1 |       46
 6297415 | 06297415-sle-12-SP5-Server-DVD-Updates-s390x-Build20210621-1-qam-minimal+base@s390x-kvm-sle15                                                                                                                             | 2021-06-21 00:18:05 | grenache-1 |       33
 6297416 | 06297416-sle-12-SP5-Server-DVD-Updates-s390x-Build20210621-1-qam-gnome@s390x-kvm-sle15                                                                                                                                    | 2021-06-21 00:18:05 | grenache-1 |       44
 6297417 | 06297417-sle-12-SP5-Server-DVD-Updates-s390x-Build20210621-1-mru-install-minimal-with-addons@s390x-kvm-sle15                                                                                                              | 2021-06-21 00:18:05 | grenache-1 |       37
 6297564 | 06297564-sle-15-Server-DVD-Updates-s390x-Build20210621-1-mru-install-minimal-with-addons@s390x-kvm-sle12                                                                                                                  | 2021-06-21 00:18:21 | grenache-1 |       34
 6297717 | 06297717-sle-15-SP1-Server-DVD-Updates-s390x-Build20210621-1-qam-minimal+base@s390x-kvm-sle12                                                                                                                             | 2021-06-21 00:18:39 | grenache-1 |       31
 6297719 | 06297719-sle-15-SP1-Server-DVD-Updates-s390x-Build20210621-1-mru-install-minimal-with-addons@s390x-kvm-sle12                                                                                                              | 2021-06-21 00:18:39 | grenache-1 |       45
 6297886 | 06297886-sle-15-SP2-Server-DVD-Updates-s390x-Build20210621-1-qam-gnome@s390x-kvm-sle12                                                                                                                                    | 2021-06-21 00:19:00 | grenache-1 |       47
 6297885 | 06297885-sle-15-SP2-Server-DVD-Updates-s390x-Build20210621-1-qam-minimal+base@s390x-kvm-sle12                                                                                                                             | 2021-06-21 00:19:01 | grenache-1 |       49
 6298045 | 06298045-sle-15-SP3-Server-DVD-Updates-s390x-Build20210621-1-qam-gnome@s390x-kvm-sle12                                                                                                                                    | 2021-06-21 00:19:22 | grenache-1 |       32
 6298858 | 06298858-sle-12-SP4-Server-DVD-Incidents-Kernel-KOTD-s390x-Build4.12.14-97.1.g759e1c1-ltp_cve_git@s390x-kvm-sle12                                                                                                         | 2021-06-21 01:19:04 | grenache-1 |       35
 6298046 | 06298046-sle-15-SP3-Server-DVD-Updates-s390x-Build20210621-1-mru-install-minimal-with-addons@s390x-kvm-sle12                                                                                                              | 2021-06-21 01:19:04 | grenache-1 |       15
 6298988 | 06298988-sle-15-SP1-Server-DVD-Updates-s390x-Build20210621-1-qam-gnome@s390x-kvm-sle12                                                                                                                                    | 2021-06-21 01:19:05 | grenache-1 |       45
 6298678 | 06298678-sle-15-Server-DVD-Incidents-Kernel-KOTD-s390x-Build4.12.14-132.1.g1edcf88-ltp_cve_git@s390x-kvm-sle12                                                                                                            | 2021-06-21 01:19:05 | grenache-1 |       46
 6298679 | 06298679-sle-15-Server-DVD-Incidents-Kernel-KOTD-s390x-Build4.12.14-132.1.g1edcf88-install_ltp+sle+Server-DVD-Incidents-Kernel-KOTD@s390x-kvm-sle12                                                                       | 2021-06-21 01:19:06 | grenache-1 |       14
 6298986 | 06298986-sle-15-Server-DVD-Updates-s390x-Build20210621-1-qam-gnome@s390x-kvm-sle12                                                                                                                                        | 2021-06-21 01:19:06 | grenache-1 |       36
 6298987 | 06298987-sle-15-Server-DVD-Updates-s390x-Build20210621-1-qam-minimal+base@s390x-kvm-sle12                                                                                                                                 | 2021-06-21 01:19:06 | grenache-1 |       33
 6298614 | 06298614-sle-15-SP1-Server-DVD-Incidents-Kernel-KOTD-s390x-Build4.12.14-51.1.g5bd5080-install_ltp+sle+Server-DVD-Incidents-Kernel-KOTD@s390x-kvm-sle12                                                                    | 2021-06-21 01:19:07 | grenache-1 |       37
 6298613 | 06298613-sle-15-SP1-Server-DVD-Incidents-Kernel-KOTD-s390x-Build4.12.14-51.1.g5bd5080-ltp_cve_git@s390x-kvm-sle12                                                                                                         | 2021-06-21 01:19:07 | grenache-1 |       34
 6298859 | 06298859-sle-12-SP4-Server-DVD-Incidents-Kernel-KOTD-s390x-Build4.12.14-97.1.g759e1c1-install_ltp+sle+Server-DVD-Incidents-Kernel-KOTD@s390x-kvm-sle12                                                                    | 2021-06-21 01:19:08 | grenache-1 |       12
 6298974 | 06298974-sle-12-SP4-Server-DVD-Updates-s390x-Build20210621-1-mru-install-minimal-with-addons@s390x-kvm-sle12                                                                                                              | 2021-06-21 01:19:09 | grenache-1 |       31
 6298989 | 06298989-sle-15-SP2-Server-DVD-Updates-s390x-Build20210621-1-mru-install-minimal-with-addons@s390x-kvm-sle12                                                                                                              | 2021-06-21 01:20:36 | grenache-1 |       49
 6299002 | 06299002-sle-15-SP3-Server-DVD-Updates-s390x-Build20210621-1-qam-minimal+base@s390x-kvm-sle12                                                                                                                             | 2021-06-21 01:20:37 | grenache-1 |       47
 6299013 | 06299013-sle-15-SP3-Migration-from-SLE12-SPx-s390x-Build187.1-offline_sles12sp3_ltss_pscc_asmm-lgm_all_full_lock@s390x-kvm-sle12                                                                                          | 2021-06-21 01:47:31 | grenache-1 |       48
 6299014 | 06299014-sle-15-SP3-Migration-from-SLE12-SPx-s390x-Build187.1-offline_sles12sp5_media_base_full@s390x-kvm-sle12                                                                                                           | 2021-06-21 01:54:33 | grenache-1 |       13
 6299023 | 06299023-sle-12-SP4-Server-DVD-Updates-s390x-Build20210621-1-qam-gnome@s390x-kvm-sle12                                                                                                                                    | 2021-06-21 02:19:23 | grenache-1 |       15
 6390578 | 06390578-sle-15-SP3-Server-DVD-Incidents-Install-ppc64le-BuildMR:244267:nodejs14-qam-incidentinstall@ppc64le                                                                                                              | 2021-07-07 13:12:54 | malbec     |        2
 6391210 | 06391210-sle-15-SP2-Server-DVD-Incidents-Install-ppc64le-Build:20300:nodejs12-qam-incidentinstall@ppc64le                                                                                                                 | 2021-07-07 13:26:14 | malbec     |        4
 6391605 | 06391605-sle-15-SP1-Server-DVD-SAP-Incidents-Install-ppc64le-Build:20294:dtb-aarch64-qam-incidentinstall-sap@ppc64le-2g                                                                                                   | 2021-07-07 13:27:18 | malbec     |        3
 6392106 | 06392106-sle-15-SP3-Server-DVD-Incidents-Install-ppc64le-BuildMR:244278:dovecot23-qam-incidentinstall@ppc64le                                                                                                             | 2021-07-07 14:14:44 | malbec     |        2
 6392642 | 06392642-sle-12-SP4-Server-DVD-SAP-Incidents-Install-ppc64le-Build:20298:linuxptp-qam-incidentinstall-sap@ppc64le-2g                                                                                                      | 2021-07-07 14:27:00 | malbec     |        1
 6393089 | 06393089-sle-15-SP3-Server-DVD-Incidents-Install-ppc64le-Build:20116:python-cachetools-qam-incidentinstall@ppc64le                                                                                                        | 2021-07-07 17:13:38 | malbec     |        2
 6393125 | 06393125-sle-15-SP3-Server-DVD-Incidents-Minimal-ppc64le-Build:20293:dtb-aarch64-qam-minimal-full@ppc64le                                                                                                                 | 2021-07-07 17:13:44 | malbec     |        1
 6393261 | 06393261-sle-15-SP3-Server-DVD-Incidents-Kernel-Base-ppc64le-Build:20293:dtb-aarch64-ltp_syscalls_base@ppc64le-virtio                                                                                                     | 2021-07-07 17:14:06 | malbec     |        4
 6393207 | 06393207-sle-15-SP3-Server-DVD-Incidents-Kernel-ppc64le-Build:20293:dtb-aarch64-ltp_fs_bind@ppc64le-virtio                                                                                                                | 2021-07-07 17:31:44 | malbec     |        3
 6394011 | 06394011-sle-15-SP1-Server-DVD-Incidents-Minimal-ppc64le-Build:20294:dtb-aarch64-qam-minimal@ppc64le                                                                                                                      | 2021-07-07 19:15:10 | malbec     |        4
 6393945 | 06393945-sle-15-SP1-Server-DVD-Incidents-Kernel-ppc64le-Build:20294:dtb-aarch64-ltp_containers@ppc64le-virtio                                                                                                             | 2021-07-07 19:36:50 | malbec     |        3
 6393941 | 06393941-sle-15-SP1-Server-DVD-Incidents-Kernel-ppc64le-Build:20294:dtb-aarch64-ltp_can@ppc64le-virtio                                                                                                                    | 2021-07-07 19:36:51 | malbec     |        1
 6393939 | 06393939-sle-15-SP1-Server-DVD-Incidents-Kernel-ppc64le-Build:20294:dtb-aarch64-ltp_aiodio_part3@ppc64le-virtio                                                                                                           | 2021-07-07 19:36:54 | malbec     |        2
 6394566 | 06394566-sle-15-SP3-Server-DVD-Incidents-Install-ppc64le-Build:19446:yast2-country-qam-incidentinstall@ppc64le                                                                                                            | 2021-07-07 20:15:19 | malbec     |        4
 6397191 | 06397191-sle-15-SP2-Server-DVD-Incidents-Install-ppc64le-Build:20116:python-cachetools-qam-incidentinstall@ppc64le                                                                                                        | 2021-07-08 08:12:14 | malbec     |        1
 6397381 | 06397381-sle-15-SP2-Server-DVD-Incidents-Install-ppc64le-BuildMR:244309:hwdata-qam-incidentinstall@ppc64le                                                                                                                | 2021-07-08 09:13:12 | malbec     |        1
 6397702 | 06397702-sle-15-SP2-Server-DVD-Incidents-Install-ppc64le-Build:20310:dovecot23-qam-incidentinstall@ppc64le                                                                                                                | 2021-07-08 11:12:40 | malbec     |        1
 6398451 | 06398451-sle-15-Server-DVD-HA-Incidents-Install-ppc64le-Build:20315:crmsh-qam-incidentinstall-ha@ppc64le                                                                                                                  | 2021-07-08 12:12:56 | malbec     |        1
(73 rows)

So there were previously failures on grenache-1 on 2021-06-20 and 2021-06-21, and now on malbec over the past days, 2021-07-07 and 2021-07-08, with none today.
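
The per-host, per-day summary can also be derived mechanically from the psql output. The following is a rough Python sketch under the assumption that the output has the five "|"-separated columns shown above; the function name is made up for illustration, and the sample contains only four of the 73 rows, with the column padding trimmed.

```python
from collections import Counter


def tally_by_host_and_day(psql_output: str) -> Counter:
    """Count data rows of the psql output per (host, day) pair."""
    counts = Counter()
    for line in psql_output.splitlines():
        fields = [field.strip() for field in line.split("|")]
        # Data rows have five columns and a numeric job id in the first one;
        # this skips the header, the separator and the "(73 rows)" footer.
        if len(fields) != 5 or not fields[0].isdigit():
            continue
        _job_id, _result_dir, t_finished, host, _instance = fields
        day = t_finished.split(" ")[0]
        counts[(host, day)] += 1
    return counts


# Four rows copied from the query result above (padding trimmed).
sample = """\
   id    | result_dir | t_finished | host | instance
---------+------------+------------+------+----------
 6296934 | 06296934-sle-15-SP2-Server-DVD-Updates-s390x-sle_image_on_sle_host:investigate:retry@s390x-kvm-sle12 | 2021-06-20 03:39:56 | grenache-1 | 15
 6297272 | 06297272-sle-12-SP4-Server-DVD-Updates-s390x-Build20210621-1-qam-minimal+base@s390x-kvm-sle12 | 2021-06-21 00:17:47 | grenache-1 | 46
 6390578 | 06390578-sle-15-SP3-Server-DVD-Incidents-Install-ppc64le-BuildMR:244267:nodejs14-qam-incidentinstall@ppc64le | 2021-07-07 13:12:54 | malbec | 2
 6398451 | 06398451-sle-15-Server-DVD-HA-Incidents-Install-ppc64le-Build:20315:crmsh-qam-incidentinstall-ha@ppc64le | 2021-07-08 12:12:56 | malbec | 1
(73 rows)
"""

for (host, day), count in sorted(tally_by_host_and_day(sample).items()):
    print(host, day, count)
```

Fed with the full query output, this should make clusters like the grenache-1 and malbec ones easy to spot.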

Following the trail of https://openqa.suse.de/tests/6393939, the log at https://openqa.suse.de/tests/6393939/logfile?filename=autoinst-log.txt says

[info] [#51162] Downloading "SLE-15-SP1-Installer-DVD-ppc64le-GM-DVD1.iso" from "http://openqa.suse.de/tests/6393939/asset/iso/SLE-15-SP1-Installer-DVD-ppc64le-GM-DVD1.iso"
[info] [#51162] Content of "/var/lib/openqa/cache/openqa.suse.de/SLE-15-SP1-Installer-DVD-ppc64le-GM-DVD1.iso" has not changed, updating last use
[2021-07-07T20:37:39.0882 CEST] [info] [pid:81625] Rsync from 'rsync://openqa.suse.de/tests' to '/var/lib/openqa/cache/openqa.suse.de', request #51165 sent to Cache Service
[2021-07-07T21:36:54.0341 CEST] [info] [pid:81625] +++ worker notes +++
[2021-07-07T21:36:54.0341 CEST] [info] [pid:81625] End time: 2021-07-07 19:36:54
[2021-07-07T21:36:54.0341 CEST] [info] [pid:81625] Result: timeout

So the ISO request #51162 completes correctly, and then a request for syncing tests is sent as request #51165.
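
To see how long the job sat waiting after the rsync request was sent, the two timestamps from the excerpt above can be subtracted. This is a small Python sketch; the helper name is made up for illustration, and it assumes both lines use the same time zone (both say CEST), so the zone is simply ignored.

```python
import re
from datetime import datetime

# Matches the leading "[2021-07-07T20:37:39.0882 CEST]" part of a worker log line.
TIMESTAMP_RE = re.compile(r"^\[(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})\.\d+ [A-Z]+\]")


def parse_log_time(line: str) -> datetime:
    """Extract the timestamp at the start of a worker log line (zone ignored)."""
    match = TIMESTAMP_RE.match(line)
    if match is None:
        raise ValueError(f"no timestamp found in: {line!r}")
    return datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%S")


# Both lines are copied from the autoinst-log.txt excerpt above.
request_line = (
    "[2021-07-07T20:37:39.0882 CEST] [info] [pid:81625] Rsync from "
    "'rsync://openqa.suse.de/tests' to '/var/lib/openqa/cache/openqa.suse.de', "
    "request #51165 sent to Cache Service"
)
timeout_line = "[2021-07-07T21:36:54.0341 CEST] [info] [pid:81625] Result: timeout"

elapsed = parse_log_time(timeout_line) - parse_log_time(request_line)
minutes, seconds = divmod(int(elapsed.total_seconds()), 60)
print(f"waited {minutes} min {seconds} s after the rsync request")
```

Fractional seconds are dropped, which is precise enough to see that the job spent close to an hour without any further log output.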

On malbec, journalctl -u openqa-worker-cacheservice-minion.service shows:

Jul 07 20:37:34 malbec openqa-worker-cacheservice-minion[16175]: [88318] [i] [#51162] Downloading: "SLE-15-SP1-Installer-DVD-ppc64le-GM-DVD1.iso"
Jul 07 20:37:34 malbec openqa-worker-cacheservice-minion[16175]: Can't opendir(/var/lib/openqa/cache/lost+found): Permission denied
Jul 07 20:37:34 malbec openqa-worker-cacheservice-minion[16175]:  at /usr/lib/perl5/vendor_perl/5.26.1/Mojo/File.pm line 74.
Jul 07 20:37:34 malbec openqa-worker-cacheservice-minion[16175]: [88318] [i] [#51162] Cache size of "/var/lib/openqa/cache" is 46GiB, with limit 50GiB
Jul 07 20:37:34 malbec openqa-worker-cacheservice-minion[16175]: [88318] [i] [#51162] Downloading "SLE-15-SP1-Installer-DVD-ppc64le-GM-DVD1.iso" from "http://openqa.suse.de/tests/6393939/asset/iso/SLE-15-SP1>
Jul 07 20:37:35 malbec openqa-worker-cacheservice-minion[16175]: [88318] [i] [#51162] Content of "/var/lib/openqa/cache/openqa.suse.de/SLE-15-SP1-Installer-DVD-ppc64le-GM-DVD1.iso" has not changed, updating >
Jul 07 20:37:35 malbec openqa-worker-cacheservice-minion[16175]: [88318] [i] [#51162] Finished download
Jul 07 20:37:36 malbec openqa-worker-cacheservice-minion[16175]: [88320] [i] [#51163] Downloading: "SLE-15-SP1-Installer-DVD-ppc64le-GM-DVD1.iso"
Jul 07 20:37:36 malbec openqa-worker-cacheservice-minion[16175]: Can't opendir(/var/lib/openqa/cache/lost+found): Permission denied
Jul 07 20:37:36 malbec openqa-worker-cacheservice-minion[16175]:  at /usr/lib/perl5/vendor_perl/5.26.1/Mojo/File.pm line 74.
Jul 07 20:37:36 malbec openqa-worker-cacheservice-minion[16175]: [88320] [i] [#51163] Cache size of "/var/lib/openqa/cache" is 46GiB, with limit 50GiB
Jul 07 20:37:36 malbec openqa-worker-cacheservice-minion[16175]: [88320] [i] [#51163] Downloading "SLE-15-SP1-Installer-DVD-ppc64le-GM-DVD1.iso" from "http://openqa.suse.de/tests/6393945/asset/iso/SLE-15-SP1>
Jul 07 20:37:36 malbec openqa-worker-cacheservice-minion[16175]: [88320] [i] [#51163] Content of "/var/lib/openqa/cache/openqa.suse.de/SLE-15-SP1-Installer-DVD-ppc64le-GM-DVD1.iso" has not changed, updating >
Jul 07 20:37:36 malbec openqa-worker-cacheservice-minion[16175]: [88320] [i] [#51163] Finished download
Jul 07 20:37:37 malbec openqa-worker-cacheservice-minion[16175]: [88321] [i] [#51164] Downloading: "SLE-15-SP1-Installer-DVD-ppc64le-GM-DVD1.iso"
Jul 07 20:37:37 malbec openqa-worker-cacheservice-minion[16175]: Can't opendir(/var/lib/openqa/cache/lost+found): Permission denied
Jul 07 20:37:37 malbec openqa-worker-cacheservice-minion[16175]:  at /usr/lib/perl5/vendor_perl/5.26.1/Mojo/File.pm line 74.
Jul 07 20:37:37 malbec openqa-worker-cacheservice-minion[16175]: [88321] [i] [#51164] Cache size of "/var/lib/openqa/cache" is 46GiB, with limit 50GiB
Jul 07 20:37:37 malbec openqa-worker-cacheservice-minion[16175]: [88321] [i] [#51164] Downloading "SLE-15-SP1-Installer-DVD-ppc64le-GM-DVD1.iso" from "http://openqa.suse.de/tests/6393941/asset/iso/SLE-15-SP1>
Jul 07 20:37:37 malbec openqa-worker-cacheservice-minion[16175]: [88321] [i] [#51164] Content of "/var/lib/openqa/cache/openqa.suse.de/SLE-15-SP1-Installer-DVD-ppc64le-GM-DVD1.iso" has not changed, updating >
Jul 07 20:37:37 malbec openqa-worker-cacheservice-minion[16175]: [88321] [i] [#51164] Finished download
Jul 07 21:15:19 malbec openqa-worker-cacheservice-minion[16175]: [91671] [i] [#51168] Downloading: "SLES-15-SP3-ppc64le-Installtest.qcow2"
Jul 07 21:15:19 malbec openqa-worker-cacheservice-minion[16175]: Can't opendir(/var/lib/openqa/cache/lost+found): Permission denied
Jul 07 21:15:19 malbec openqa-worker-cacheservice-minion[16175]:  at /usr/lib/perl5/vendor_perl/5.26.1/Mojo/File.pm line 74.
Jul 07 21:15:19 malbec openqa-worker-cacheservice-minion[16175]: [91671] [i] [#51168] Cache size of "/var/lib/openqa/cache" is 46GiB, with limit 50GiB
Jul 07 21:15:19 malbec openqa-worker-cacheservice-minion[16175]: [91671] [i] [#51168] Downloading "SLES-15-SP3-ppc64le-Installtest.qcow2" from "http://openqa.suse.de/tests/6394566/asset/hdd/SLES-15-SP3-ppc64>
Jul 07 21:15:19 malbec openqa-worker-cacheservice-minion[16175]: [91671] [i] [#51168] Content of "/var/lib/openqa/cache/openqa.suse.de/SLES-15-SP3-ppc64le-Installtest.qcow2" has not changed, updating last use
Jul 07 21:15:19 malbec openqa-worker-cacheservice-minion[16175]: [91671] [i] [#51168] Finished download
Jul 07 21:15:29 malbec openqa-worker-cacheservice-minion[16175]: [91686] [i] [#51169] Downloading: "SLE-15-SP3-Full-ppc64le-GM-Media1.iso"
Jul 07 21:15:29 malbec openqa-worker-cacheservice-minion[16175]: Can't opendir(/var/lib/openqa/cache/lost+found): Permission denied
Jul 07 21:15:29 malbec openqa-worker-cacheservice-minion[16175]:  at /usr/lib/perl5/vendor_perl/5.26.1/Mojo/File.pm line 74.
Jul 07 21:15:29 malbec openqa-worker-cacheservice-minion[16175]: [91686] [i] [#51169] Cache size of "/var/lib/openqa/cache" is 46GiB, with limit 50GiB
Jul 07 21:15:29 malbec openqa-worker-cacheservice-minion[16175]: [91686] [i] [#51169] Downloading "SLE-15-SP3-Full-ppc64le-GM-Media1.iso" from "http://openqa.suse.de/tests/6394566/asset/iso/SLE-15-SP3-Full-p>
Jul 07 21:15:29 malbec openqa-worker-cacheservice-minion[16175]: [91686] [i] [#51169] Content of "/var/lib/openqa/cache/openqa.suse.de/SLE-15-SP3-Full-ppc64le-GM-Media1.iso" has not changed, updating last use
Jul 07 21:15:29 malbec openqa-worker-cacheservice-minion[16175]: [91686] [i] [#51169] Finished download

So the request for the ISO appears with #51162, and later requests appear as well, but #51165 does not appear at all. I suggest we look into how a request can get lost between the cache service and the Minion, or add corresponding logging for that case.

Actions #2

Updated by okurz over 3 years ago

  • Subject changed from Tests timeout with reason 'setup exceeded MAX_SETUP_TIME' on osd ppc64le workers auto_review:"Result: timeout":retry to Tests timeout with reason 'setup exceeded MAX_SETUP_TIME' on osd ppc64le workers auto_review:"Result: timeout":retry size:M
  • Description updated (diff)
  • Status changed from New to Workable
Actions #3

Updated by mkittler over 3 years ago

  • Assignee set to mkittler
Actions #4

Updated by mkittler over 3 years ago

All of these tests timed out 6 days ago waiting for tests to be synchronized ([2021-07-07T20:37:39.0882 CEST] [info] [pid:81625] Rsync from 'rsync://openqa.suse.de/tests' to '/var/lib/openqa/cache/openqa.suse.de', request #51165 sent to Cache Service).

There's a Minion job for caching tests (http://localhost:9530/minion/jobs?id=51130 on malbec.arch.suse.de) which was started 6 days ago and was stuck for 2 days until it failed with result: 'Job terminated unexpectedly (exit code: 0, signal: 15)'. There are also more of these failures from several months ago. More recent jobs have all succeeded.

So I guess I'll have to find out why these tasks can get stuck. Unfortunately the logs are not very helpful:

martchus@malbec:~> sudo journalctl --no-pager -u openqa-worker-cacheservice-minion | grep -i 51130
Jul 07 14:13:14 malbec openqa-worker-cacheservice-minion[16175]: [54325] [i] [#51130] Sync: "rsync://openqa.suse.de/tests" to "/var/lib/openqa/cache/openqa.suse.de"
Jul 07 14:13:14 malbec openqa-worker-cacheservice-minion[16175]: [54325] [i] [#51130] Calling: rsync -avHP rsync://openqa.suse.de/tests/ --delete /var/lib/openqa/cache/openqa.suse.de/tests/

Before the service is stopped we see this:

Jul 09 05:19:55 malbec openqa-worker-cacheservice-minion[16175]: [124176] [i] [#51254] Sync: "rsync://openqa.suse.de/tests" to "/var/lib/openqa/cache/openqa.suse.de"
Jul 09 05:19:55 malbec openqa-worker-cacheservice-minion[16175]: [124176] [i] [#51254] Calling: rsync -avHP rsync://openqa.suse.de/tests/ --delete /var/lib/openqa/cache/openqa.suse.de/tests/
Jul 09 05:20:05 malbec openqa-worker-cacheservice-minion[16175]: [124176] [i] [#51254] Finished sync: 0
Jul 09 07:16:12 malbec systemd[1]: Stopping OpenQA Worker Cache Service Minion...
Jul 09 07:16:12 malbec openqa-worker-cacheservice-minion[16175]: rsync error: received SIGINT, SIGTERM, or SIGHUP (code 20) at rsync.c(642) [generator=3.1.3]
Jul 09 07:16:12 malbec openqa-worker-cacheservice-minion[16175]: rsync error: received SIGUSR1 (code 19) at main.c(1455) [receiver=3.1.3]
Jul 09 07:16:18 malbec openqa-worker-cacheservice-minion[16175]: [16175] [i] Worker 16175 stopped
Jul 09 07:16:18 malbec systemd[1]: openqa-worker-cacheservice-minion.service: Main process exited, code=exited, status=144/n/a
Jul 09 07:16:18 malbec systemd[1]: Stopped OpenQA Worker Cache Service Minion.

So the request for the ISO appears with #51162, and later requests appear as well, but #51165 does not appear at all. I suggest we look into how a request can get lost between the cache service and the Minion, or add corresponding logging for that case.

Looks like #51165 (and the Minion jobs mentioned in the other openQA jobs) have already been cleaned up from the Minion dashboard. However, I suppose these jobs are not relevant anyway because #51130 was doing the real download. The other jobs were just telling the worker that the real download was happening within #51130 and therefore didn't produce any logs. Not sure whether we should add logging for this.


Jul 07 20:37:34 malbec openqa-worker-cacheservice-minion[16175]: Can't opendir(/var/lib/openqa/cache/lost+found): Permission denied

These log lines are likely not causing the problem. However, I've removed the lost+found directory which was empty (and only readable by root). We don't have such a directory on other workers so it has likely been created by accident.

Actions #5

Updated by mkittler over 3 years ago

  • Status changed from Workable to In Progress

I'm still not sure why this happens, but adding rsync's --timeout option will likely help here. PR: https://github.com/os-autoinst/openQA/pull/4043
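
For illustration, here is a minimal sketch of how an rsync inactivity timeout could be passed and recognized afterwards. This is not the code from the PR: it is written in Python purely as an example, the function names are made up, and 1800 seconds is simply the value visible in the later worker log in this ticket. rsync's --timeout option makes rsync exit when no data has been transferred for the given number of seconds, and rsync reports that case with exit code 30.

```python
import subprocess

RSYNC_IO_TIMEOUT_EXIT = 30  # rsync exit code for "timeout in data send/receive"


def build_rsync_command(source, target, io_timeout=1800):
    """Build an rsync call that gives up after io_timeout seconds without I/O.

    The argument order mirrors the command line shown in the worker logs.
    """
    return [
        "rsync", "-avHP",
        "--timeout", str(io_timeout),
        f"{source}/", "--delete", f"{target}/tests/",
    ]


def sync_tests(source, target, io_timeout=1800):
    """Run rsync and report whether it stopped because of the I/O timeout.

    Hypothetical helper: returns (exit code, timed_out flag).
    """
    result = subprocess.run(build_rsync_command(source, target, io_timeout))
    return result.returncode, result.returncode == RSYNC_IO_TIMEOUT_EXIT
```

With such a timeout a hanging transfer would end in a failed Minion job after the inactivity period instead of staying stuck until the service is stopped.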

Actions #6

Updated by okurz over 3 years ago

mkittler wrote:

Jul 07 20:37:34 malbec openqa-worker-cacheservice-minion[16175]: Can't opendir(/var/lib/openqa/cache/lost+found): Permission denied

These log lines are likely not causing the problem. However, I've removed the lost+found directory which was empty (and only readable by root). We don't have such a directory on other workers so it has likely been created by accident.

The directory lost+found is automatically created on any ext filesystem and is only readable by root; that is correct. We should not try to read it at all. It would be better to avoid reading it within our code, e.g. either don't try to read all directories within the cache dir or, if that is not feasible, at least explicitly exclude "lost+found" … or catch the error message and pipe it to /dev/null.
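
A minimal sketch of the "explicitly exclude" variant, written in Python for illustration only (the cache service itself is Perl, as the Mojo/File.pm line in the log shows). The function name cache_size and its return value are made up for this example and do not reflect the actual implementation.

```python
import os


def cache_size(root, excluded=("lost+found",)):
    """Sum the sizes of all files below root, skipping excluded directory names.

    Returns (total size in bytes, list of paths that could not be read).
    """
    total = 0
    unreadable = []

    def remember_error(err):
        # os.walk calls this for directories it cannot list
        unreadable.append(err.filename)

    for dirpath, dirnames, filenames in os.walk(root, onerror=remember_error):
        # Prune in place so os.walk never descends into excluded directories
        dirnames[:] = [d for d in dirnames if d not in excluded]
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                total += os.lstat(path).st_size
            except OSError:
                unreadable.append(path)
    return total, unreadable
```

Pruning by name avoids the opendir attempt entirely, while the error callback keeps any other unreadable directory from aborting the size calculation.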

Actions #7

Updated by openqa_review over 3 years ago

  • Due date set to 2021-07-28

Setting due date based on mean cycle time of SUSE QE Tools

Actions #8

Updated by mkittler over 3 years ago

  • Status changed from In Progress to Feedback

PR for the lost+found error: https://github.com/os-autoinst/openQA/pull/4048

Note that this error being logged has nothing to do with the problem from the ticket description. It is only a disturbing log message without any impact on the behavior of the cache service (apart from that directory not being taken into account).

The PR for adding an inactivity timeout for the Minion job has been merged and deployed. So far the query select jobs.id,result_dir,t_finished,host,instance from jobs join workers on jobs.assigned_worker_id=workers.id where reason ~ 'timeout: setup exceeded' and t_started >= '2021-07-14T11:40' order by t_finished; returns no results. I'll check again in a few days. (However, judging by the failures on the worker's Minion dashboard, the problem only happened about once a month.)

Actions #9

Updated by okurz over 3 years ago

mkittler wrote:

PR for the lost+found error: https://github.com/os-autoinst/openQA/pull/4048

Note that this error being logged has nothing to do with the problem from the ticket description. It is only a disturbing log message without any impact on the behavior of the cache service (apart from that directory not being taken into account).

agreed.

The PR for adding an inactivity timeout for the Minion job has been merged and deployed. So far the query select jobs.id,result_dir,t_finished,host,instance from jobs join workers on jobs.assigned_worker_id=workers.id where reason ~ 'timeout: setup exceeded' and t_started >= '2021-07-14T11:40' order by t_finished; returns no results. I'll check again in a few days. (However, judging by the failures on the worker's Minion dashboard, the problem only happened about once a month.)

+1

Actions #10

Updated by mkittler over 3 years ago

Now both PRs have been merged. The query still returns no results.

Actions #11

Updated by mkittler over 3 years ago

So far the problem isn't occurring anymore.

Actions #12

Updated by okurz over 3 years ago

ok, what's your plan?

Actions #13

Updated by mkittler over 3 years ago

It is still not happening. The plan is to keep the ticket in feedback for a while because the problem only occurred once a month or so. But we can also just close the ticket (and possibly re-open it if the problem occurs again).

Actions #14

Updated by okurz over 3 years ago

As an alternative, you could consider introducing an alert, because users likely cannot do anything useful when jobs run into the setup timeout.

Actions #15

Updated by mkittler over 3 years ago

Not sure whether it is worth adding an alert. Making it useful, e.g. by including the relevant job IDs (and not just the count), might not be trivial.


Note that a similar problem occurred again:

openqa=> select jobs.id,result_dir,t_finished,host,instance from jobs join workers on jobs.assigned_worker_id=workers.id where reason ~ 'timeout: setup exceeded' and t_started >= '2021-07-14T11:40' order by t_finished;
   id    |                                                   result_dir                                                   |     t_finished      |     host      | instance 
---------+----------------------------------------------------------------------------------------------------------------+---------------------+---------------+----------
 6578161 | 06578161-sle-15-SP3-Server-DVD-Updates-x86_64-Build20210727-1-mru-iscsi_client_normal_auth_backstore_lvm@64bit | 2021-07-27 05:40:56 | openqaworker6 |       17
(1 row)

(https://openqa.suse.de/tests/6578161)

The history doesn't go far enough to tell whether there's a failed cache_tests Minion job (or a passed one which took extraordinarily long). That's all the log has to say:

Jul 27 07:42:38 openqaworker6 openqa-worker-cacheservice-minion[31438]: [37967] [i] [#4278] Sync: "rsync://openqa.suse.de/tests" to "/var/lib/openqa/cache/openqa.suse.de"
Jul 27 07:42:38 openqaworker6 openqa-worker-cacheservice-minion[31438]: [37967] [i] [#4278] Calling: rsync -avHP --timeout 1800 rsync://openqa.suse.de/tests/ --delete /var/lib/openqa/cache/openqa.suse.de/tests/

So the timeout parameter is passed as expected.

According to the timestamps in the job's log, the setup had already taken ~57 minutes before the rsync request was sent:

[2021-07-27T06:40:55.0658 CEST] [info] [pid:30484] +++ setup notes +++
[2021-07-27T06:40:55.0658 CEST] [info] [pid:30484] Running on openqaworker6:17 (Linux 5.3.18-lp152.84-default #1 SMP Tue Jul 20 23:04:11 UTC 2021 (baaeecf) x86_64)
[2021-07-27T06:40:55.0668 CEST] [debug] [pid:30484] Found HDD_1, caching SLES-15-SP3-x86_64-mru-install-desktop-with-addons-Build20210727-1.qcow2
[2021-07-27T06:40:55.0671 CEST] [info] [pid:30484] Downloading SLES-15-SP3-x86_64-mru-install-desktop-with-addons-Build20210727-1.qcow2, request #4149 sent to Cache Service
[2021-07-27T06:51:31.0843 CEST] [info] [pid:30484] Download of SLES-15-SP3-x86_64-mru-install-desktop-with-addons-Build20210727-1.qcow2 processed:
[info] [#4149]
Cache size of "/var/lib/openqa/cache" is 48GiB, with limit 50GiB
[info] [#4149]
Downloading "SLES-15-SP3-x86_64-mru-install-desktop-with-addons-Build20210727-1.qcow2" from "http://openqa.suse.de/tests/6578161/asset/hdd/SLES-15-SP3-x86_64-mru-install-desktop-with-addons-Build20210727-1.qcow2"
[info] [#4149]
Size of "/var/lib/openqa/cache/openqa.suse.de/SLES-15-SP3-x86_64-mru-install-desktop-with-addons-Build20210727-1.qcow2" is 2.5GiB, with ETag ""9d050000-5c812f9f306d1""
[info] [#4149]
Download of "/var/lib/openqa/cache/openqa.suse.de/SLES-15-SP3-x86_64-mru-install-desktop-with-addons-Build20210727-1.qcow2" successful, new cache size is 49GiB

[2021-07-27T06:51:31.0855 CEST] [debug] [pid:30484] Found ISO, caching SLE-15-SP3-Full-x86_64-GM-Media1.iso
[2021-07-27T06:51:31.0858 CEST] [info] [pid:30484] Downloading SLE-15-SP3-Full-x86_64-GM-Media1.iso, request #4187 sent to Cache Service
[2021-07-27T07:37:30.0997 CEST] [info] [pid:30484] Download of SLE-15-SP3-Full-x86_64-GM-Media1.iso processed:
[info] [#4187]
Cache size of "/var/lib/openqa/cache" is 50GiB, with limit 50GiB
[info] [#4187]
Downloading "SLE-15-SP3-Full-x86_64-GM-Media1.iso" from "http://openqa.suse.de/tests/6578161/asset/iso/SLE-15-SP3-Full-x86_64-GM-Media1.iso"
[info] [#4187]
Cache size 49GiB + needed 11GiB exceeds limit of 50GiB, purging least used assets
[info] [#4187]
Purging "/var/lib/openqa/cache/openqa.suse.de/SLED-15-SP3-x86_64-GM-gnome.qcow2" because we need space for new assets, reclaiming 2.6GiB
[info] [#4187]
Purging "/var/lib/openqa/cache/openqa.suse.de/SLE-12-SP5-SAP-DVD-x86_64-GM-DVD1.iso" because we need space for new assets, reclaiming 3.9GiB
[info] [#4187]
Purging "/var/lib/openqa/cache/openqa.suse.de/SLES-12-SP4-x86_64-mru-install-minimal-with-addons-Build20210726-1-Server-DVD-Updates-64bit.qcow2" because we need space for new assets, reclaiming 1.1GiB
[info] [#4187]
Purging "/var/lib/openqa/cache/openqa.suse.de/SLE-15-SP4-Full-x86_64-Build16.3-Media1.iso" because we need space for new assets, reclaiming 11GiB
[info] [#4187]
Size of "/var/lib/openqa/cache/openqa.suse.de/SLE-15-SP3-Full-x86_64-GM-Media1.iso" is 11GiB, with ETag ""2d7b00000-5c2d3a3560f9a""
[info] [#4187]
Download of "/var/lib/openqa/cache/openqa.suse.de/SLE-15-SP3-Full-x86_64-GM-Media1.iso" successful, new cache size is 42GiB

[2021-07-27T07:37:31.0002 CEST] [info] [pid:30484] Rsync from 'rsync://openqa.suse.de/tests' to '/var/lib/openqa/cache/openqa.suse.de', request #4278 sent to Cache Service
[2021-07-27T07:40:55.0720 CEST] [info] [pid:30484] +++ worker notes +++
[2021-07-27T07:40:55.0720 CEST] [info] [pid:30484] End time: 2021-07-27 05:40:55
[2021-07-27T07:40:55.0720 CEST] [info] [pid:30484] Result: timeout
[2021-07-27T07:40:55.0726 CEST] [info] [pid:37617] Uploading autoinst-log.txt

There are no rsync timeouts in the log, though. So I guess this worker really just suffered from very slow downloads at this point and it was not just rsync hanging forever. Not sure whether we can do something about it. (We already restart such jobs anyway via job_done_hook_timeout_exceeded = env host=openqa.suse.de /opt/os-autoinst-scripts/openqa-label-known-issues-hook.)
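
The timing can be checked against the timestamps quoted from the job's log above. The following Python snippet is only a worked calculation; the event labels are mine, and the fractional part and time zone of each timestamp are dropped for simplicity.

```python
from datetime import datetime


def parse(stamp):
    """Parse a log timestamp, keeping only date and time to the second."""
    return datetime.strptime(stamp.split(".")[0], "%Y-%m-%dT%H:%M:%S")


# Timestamps copied from the autoinst-log.txt excerpt of job 6578161
events = [
    ("setup started", "2021-07-27T06:40:55.0658"),
    ("qcow2 download processed", "2021-07-27T06:51:31.0843"),
    ("ISO download processed", "2021-07-27T07:37:30.0997"),
    ("rsync request sent", "2021-07-27T07:37:31.0002"),
    ("job timed out", "2021-07-27T07:40:55.0720"),
]

start = parse(events[0][1])
elapsed = {label: (parse(stamp) - start).total_seconds() for label, stamp in events}

for label, seconds in elapsed.items():
    print(f"{label}: {int(seconds // 60)} min {int(seconds % 60)} s after setup start")
```

The rsync request went out 56 min 36 s after the setup started and the job timed out at exactly 60 min, so only 3 min 24 s were left for syncing the tests.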

Actions #16

Updated by okurz over 3 years ago

Actions #17

Updated by okurz over 3 years ago

As discussed, we should do the following:

  • Add an alert if there is any non-restarted job exceeding max_setup_time (similar to what we already have for incomplete job monitoring), maybe with a hint in the description what SQL query to run to find out the job and find logs from there -> #96254
  • Apply a higher MAX_SETUP_TIME applicable for the complete OSD infrastructure, e.g. add to every worker config entry -> #96257
  • DONE: Add a feature request for "configuration applicable to all jobs for one openQA instance", e.g. add to #65271 with a link back here -> #65271#note-69
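
A rough sketch of the kind of query such an alert could be based on, using an in-memory SQLite table as a stand-in for the real PostgreSQL database. The table here is heavily simplified and hypothetical. In particular, treating a non-empty clone_id as "job was restarted" is an assumption about the schema, and LIKE is used in place of the ~ regex match from the query earlier in this ticket because SQLite has no such operator.

```python
import sqlite3

# Hypothetical, heavily simplified stand-in for the jobs table
conn = sqlite3.connect(":memory:")
conn.execute("create table jobs (id integer primary key, reason text, clone_id integer)")
conn.executemany(
    "insert into jobs (id, reason, clone_id) values (?, ?, ?)",
    [
        (1, "timeout: setup exceeded MAX_SETUP_TIME", 4),     # restarted as job 4
        (2, "timeout: setup exceeded MAX_SETUP_TIME", None),  # not restarted
        (3, None, None),                                      # unrelated job
        (4, None, None),                                      # clone of job 1
    ],
)


def unrestarted_setup_timeouts(connection):
    """Return IDs of jobs that hit the setup timeout and were not restarted."""
    rows = connection.execute(
        "select id from jobs"
        " where reason like 'timeout: setup exceeded%' and clone_id is null"
        " order by id"
    )
    return [row[0] for row in rows]


print(unrestarted_setup_timeouts(conn))
```

Returning the job IDs rather than only a count would address the concern raised earlier that an alert should point to the affected jobs.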
Actions #18

Updated by okurz over 3 years ago

  • Copied to action #96254: Tests timeout with MAX_SETUP_TIME - Add an alert if there is any non-restarted job exceeding max_setup_time added
Actions #19

Updated by okurz over 3 years ago

  • Copied to action #96257: Tests timeout with MAX_SETUP_TIME on osd - Apply a higher MAX_SETUP_TIME applicable for the *complete* OSD infrastructure, e.g. add to every worker config entry added
Actions #20

Updated by okurz over 3 years ago

  • Due date deleted (2021-07-28)
  • Status changed from Feedback to Resolved