action #175464

closed

coordination #102906: [saga][epic] Increased stability of tests with less "known failures", known incompletes handled automatically within openQA

coordination #175515: [epic] incomplete jobs with "Failed to find an available port: Address already in use"

jobs incomplete with auto_review:"setup failure: isotovideo can not be started"

Added by rainerkoenig 16 days ago. Updated 15 days ago.

Status: Resolved
Priority: High
Assignee: okurz
Category: Regressions/Crashes
Target version: Ready
Start date: 2025-01-15
Due date:
% Done: 0%
Estimated time:
Description

Observation

Jobs are failing with "setup failure: isotovideo can not be started" on OSD, e.g. https://openqa.suse.de/tests/16454025 after rsync is done.

Same symptoms on o3, e.g. https://openqa.opensuse.org/tests/4774219 after assets synced successfully.

https://openqa.opensuse.org/tests/4774741 and many others on o3 are incomplete with "setup failure: isotovideo can not be started". Neither autoinst-log.txt nor worker-log.txt contains further details that seem related.

Possible regression since 2025-01-13. From /var/log/zypp/history on w26:

2025-01-13 09:49:42|install|os-autoinst|4.6.1736759998.a4f72cc-lp156.2000.1|x86_64||devel_openQA|5eea1b72f67fa131e63ac48ecaff2191352a5b3acfe6faf02c82013148ef164f|
2025-01-14 16:17:37|install|os-autoinst|4.6.1736869520.3d40ba7-lp156.2001.1|x86_64||devel_openQA|aa6827c564e410e9893e2bff91270229378cf6d1f882520923782ded1ede257b|

The only os-autoinst commit included in that update is "855fbb60 (okurz/fix/diag_explain) t: Fix hidden output of 'diag explain'", which only changes files under "t/".

Maybe openQA-worker? Cross-checking with /var/log/zypp/history, the last good version based on https://openqa.opensuse.org/tests/4769715 seems to be "2025-01-13 15:59:48|install|openQA-worker|4.6.1736782755.0771fde7-lp156.7510.1|x86_64||devel_openQA|e0194139cae23cfdcb5b869bb92962b3106caf041cba9585752756416647cc30|". diff:

6f71c5c20 (okurz/feature/openqa_gru, feature/openqa_gru) systemd: Fix premature kill of openqa-gru background processes
5742c2302 Bump eslint-config-prettier from 9.1.0 to 10.0.1
03a921907 Avoid calling `is_running` unnecessarily in `kill`

Maybe ReadWriteProcess?
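
For reference, a minimal sketch (assuming root access on an affected worker such as w26; the date pattern is illustrative only) of how the install history quoted above can be extracted from the pipe-separated zypp log:

grep -E '^2025-01-1[3-5] .*\|install\|' /var/log/zypp/history | cut -d'|' -f1-4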

Suggestions

  • Identify recent changes relevant to the failure and mitigate as a first step
  • These are seen at least on grenache and worker24

Rollback steps

  • DONE Re-enable openQA-auto-update

Related issues (6): 0 open, 6 closed

Related to openQA Project (public) - action #170209: [sporadic] auto_review:"Failed to find an available port: Address already in use":retry, produces incomplete jobs on OSD, multiple machines size:M - Resolved - mkittler - 2024-11-25

Related to openQA Infrastructure (public) - action #175473: OpenQA Jobs test - Incomplete jobs (not restarted) of last 24h alert Salt - Resolved - okurz - 2024-12-19

Has duplicate openQA Project (public) - action #175470: jobs incomplete with auto_review:"setup failure: isotovideo can not be started" - Rejected - okurz - 2025-01-15

Has duplicate openQA Infrastructure (public) - action #175494: [openQA][worker][ipmi] isotovideo can not be started - Rejected - 2025-01-15

Copied to openQA Project (public) - action #175482: jobs incomplete with auto_review:"setup failure: isotovideo can not be started" - why did no tests prevent this to be deployed on both o3+osd? - Rejected - okurz - 2025-01-15

Copied to openQA Project (public) - action #175518: Conduct "lessons learned" with Five Why analysis for "jobs incomplete with setup failure: isotovideo can not be started" size:S - Resolved - livdywan - 2025-01-24

Actions #1

Updated by rainerkoenig 16 days ago

  • Subject changed from setup failure: isotovideo can not be started to Setup failure: isotovideo can not be started on OSD and O3

We are currently experiencing massive problems on OSD and O3 because tests report

setup failure: isotovideo can not be started

Example for OSD: https://openqa.suse.de/tests/16454025#

Example for O3: https://openqa.opensuse.org/tests/4774219
Just look at the overview on O3 to see how many tests have that dark red dot.

Actions #2

Updated by rainerkoenig 16 days ago

  • Priority changed from Normal to Urgent
Actions #3

Updated by livdywan 16 days ago

  • Description updated (diff)
  • Category set to Regressions/Crashes
  • Target version set to Ready
Actions #4

Updated by livdywan 16 days ago

  • Description updated (diff)
Actions #5

Updated by livdywan 16 days ago

  • Related to action #175470: jobs incomplete with auto_review:"setup failure: isotovideo can not be started" added
Actions #6

Updated by okurz 16 days ago

  • Subject changed from Setup failure: isotovideo can not be started on OSD and O3 to jobs incomplete with auto_review:"setup failure: isotovideo can not be started"
  • Description updated (diff)
  • Priority changed from Urgent to Immediate
Actions #7

Updated by okurz 16 days ago

  • Has duplicate action #175470: jobs incomplete with auto_review:"setup failure: isotovideo can not be started" added
Actions #8

Updated by okurz 16 days ago · Edited

On openqaworker26

zypper se --details perl-Mojo-IOLoop-ReadWriteProcess
…
zypper in --oldpackage perl-Mojo-IOLoop-ReadWriteProcess-0.340.0-bp156.1.1

Triggered a test with

openqa-clone-job --within-instance https://openqa.opensuse.org/tests/4774948 WORKER_CLASS=openqaworker26 _GROUP=0 BUILD= TEST+=-okurz

The cloned job incompleted the same as before: https://openqa.opensuse.org/tests/4774963#
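
As a side note (not part of the ticket), a minimal sketch of how to verify after such a downgrade which build provides the module file; already-running worker processes keep the previously loaded code until they are restarted:

rpm -q perl-Mojo-IOLoop-ReadWriteProcess
rpm -qf /usr/lib/perl5/vendor_perl/5.26.1/Mojo/IOLoop/ReadWriteProcess.pm
systemctl restart openqa-worker.target   # pick up the downgraded module in the worker processes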

Actions #9

Updated by okurz 16 days ago

  • Status changed from New to In Progress
  • Assignee set to okurz
openqaworker26:/tmp # snapper ls
Failed to set locale. Fix your system.
    # | Type   | Pre # | Date                     | User | Used Space | Cleanup | Description           | Userdata     
------+--------+-------+--------------------------+------+------------+---------+-----------------------+--------------
   0  | single |       |                          | root |            |         | current               |              
   1* | single |       | Fri Aug  4 15:09:15 2023 | root |   2.97 MiB |         | first root filesystem |              
6166  | pre    |       | Fri Dec  6 17:35:35 2024 | root | 742.77 MiB | number  | zypp(zypper)          | important=yes
6167  | post   |  6166 | Fri Dec  6 17:36:57 2024 | root | 194.69 MiB | number  |                       | important=yes
6274  | pre    |       | Fri Dec 13 16:54:45 2024 | root | 189.26 MiB | number  | zypp(zypper)          | important=yes
6275  | post   |  6274 | Fri Dec 13 16:55:34 2024 | root |   1.58 MiB | number  |                       | important=yes
6276  | pre    |       | Sat Dec 14 03:33:06 2024 | root | 944.00 KiB | number  | zypp(zypper)          | important=yes
6277  | post   |  6276 | Sat Dec 14 03:33:39 2024 | root | 136.96 MiB | number  |                       | important=yes
6316  | pre    |       | Tue Dec 17 09:32:26 2024 | root | 234.55 MiB | number  | zypp(zypper)          | important=yes
6317  | post   |  6316 | Tue Dec 17 09:33:11 2024 | root | 218.92 MiB | number  |                       | important=yes
6502  | pre    |       | Fri Jan  3 14:30:06 2025 | root | 266.17 MiB | number  | zypp(zypper)          | important=yes
6503  | post   |  6502 | Fri Jan  3 14:30:54 2025 | root | 213.94 MiB | number  |                       | important=yes
6642  | pre    |       | Tue Jan 14 12:33:31 2025 | root | 261.70 MiB | number  | zypp(zypper)          | important=no 
6643  | post   |  6642 | Tue Jan 14 12:33:46 2025 | root | 368.00 KiB | number  |                       | important=no 
6644  | pre    |       | Tue Jan 14 14:33:18 2025 | root | 352.00 KiB | number  | zypp(zypper)          | important=no 
6645  | post   |  6644 | Tue Jan 14 14:33:37 2025 | root | 416.00 KiB | number  |                       | important=no 
6646  | pre    |       | Tue Jan 14 16:17:31 2025 | root | 480.00 KiB | number  | zypp(zypper)          | important=no 
6647  | post   |  6646 | Tue Jan 14 16:17:50 2025 | root | 512.00 KiB | number  |                       | important=no 
6648  | pre    |       | Tue Jan 14 17:53:36 2025 | root | 416.00 KiB | number  | zypp(zypper)          | important=no 
6649  | post   |  6648 | Tue Jan 14 17:53:50 2025 | root | 384.00 KiB | number  |                       | important=no 
6650  | pre    |       | Tue Jan 14 19:17:31 2025 | root | 336.00 KiB | number  | zypp(zypper)          | important=no 
6651  | post   |  6650 | Tue Jan 14 19:17:38 2025 | root | 448.00 KiB | number  |                       | important=no 
6652  | pre    |       | Wed Jan 15 04:24:30 2025 | root | 320.00 KiB | number  | zypp(zypper)          | important=no 
6653  | post   |  6652 | Wed Jan 15 04:24:37 2025 | root | 304.00 KiB | number  |                       | important=no 
6654  | pre    |       | Wed Jan 15 04:36:31 2025 | root | 304.00 KiB | number  | zypp(zypper)          | important=no 
6655  | post   |  6654 | Wed Jan 15 04:36:37 2025 | root | 304.00 KiB | number  |                       | important=no 
6656  | pre    |       | Wed Jan 15 07:12:31 2025 | root | 304.00 KiB | number  | zypp(zypper)          | important=no 
6657  | post   |  6656 | Wed Jan 15 07:12:38 2025 | root | 304.00 KiB | number  |                       | important=no 
6658  | pre    |       | Wed Jan 15 07:54:31 2025 | root | 304.00 KiB | number  | zypp(zypper)          | important=no 
6659  | post   |  6658 | Wed Jan 15 07:54:37 2025 | root | 496.00 KiB | number  |                       | important=no 
…
openqaworker26:/tmp # snapper rollback 6503
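
As a side note (assumption: btrfs root managed by snapper, as on these workers), "snapper rollback" only sets a new default subvolume, so it takes effect on the next boot; this can be checked with e.g.:

btrfs subvolume get-default /
systemctl reboot
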
Actions #10

Updated by okurz 16 days ago · Edited

Triggered rollback on OSD from https://gitlab.suse.de/openqa/osd-deployment/-/jobs/3659170

The w26 job after the snapper rollback seems to be fine: https://openqa.opensuse.org/tests/4774968#live

Actions #11

Updated by okurz 16 days ago

I assume the faulty packages are part of

2025-01-13 21:10:37|install|os-autoinst-distri-opensuse-deps|1.1736801914.80d1fed9-lp156.16130.1|
2025-01-14 09:51:04|install|os-autoinst-distri-opensuse-deps|1.1736847260.51421743-lp156.16131.1|
2025-01-14 10:37:42|install|openQA-common|4.6.1736849661.1b1680ba-lp156.7511.1
2025-01-14 10:37:42|install|openQA-client|4.6.1736849661.1b1680ba-lp156.7511.1
2025-01-14 10:37:45|install|openQA-worker|4.6.1736849661.1b1680ba-lp156.7511.1
2025-01-14 10:37:47|install|openQA-continuous-update|4.6.1736849661.1b1680ba-lp156.7511.1
2025-01-14 10:37:49|install|openQA-auto-update|4.6.1736849661.1b1680ba-lp156.7511.1
2025-01-14 11:40:36|install|os-autoinst-distri-opensuse-deps|1.1736851565.682d637b-lp156.16133.1|
2025-01-14 11:40:36|install|perl-Mojo-IOLoop-ReadWriteProcess|0.340.0-lp156.45.1|
2025-01-14 12:33:37|install|openQA-common|4.6.1736849661.1b1680ba-lp156.7512.1
2025-01-14 12:33:38|install|openQA-continuous-update|4.6.1736849661.1b1680ba-lp156.7512.1
2025-01-14 12:33:38|install|openQA-client|4.6.1736849661.1b1680ba-lp156.7512.1
2025-01-14 12:33:40|install|openQA-auto-update|4.6.1736849661.1b1680ba-lp156.7512.1
2025-01-14 12:33:45|install|openQA-worker|4.6.1736849661.1b1680ba-lp156.7512.1
2025-01-14 14:33:24|install|openQA-common|4.6.1736863711.05c983e0-lp156.7514.1
2025-01-14 14:33:24|install|os-autoinst-distri-opensuse-deps|1.1736864399.8b8e2d18-lp156.16136.1|
2025-01-14 14:33:25|install|openQA-client|4.6.1736863711.05c983e0-lp156.7514.1
2025-01-14 14:33:32|install|openQA-worker|4.6.1736863711.05c983e0-lp156.7514.1
2025-01-14 14:33:33|install|openQA-continuous-update|4.6.1736863711.05c983e0-lp156.7514.1
2025-01-14 14:33:35|install|openQA-auto-update|4.6.1736863711.05c983e0-lp156.7514.1
2025-01-14 16:17:37|install|os-autoinst|4.6.1736869520.3d40ba7-lp156.2001.1
2025-01-14 16:17:39|install|os-autoinst-openvswitch|4.6.1736869520.3d40ba7-lp156.2001.1
2025-01-14 16:17:39|install|openQA-common|4.6.1736869444.81f8cc9f-lp156.7515.1
2025-01-14 16:17:39|install|openQA-client|4.6.1736869444.81f8cc9f-lp156.7515.1
2025-01-14 16:17:45|install|openQA-worker|4.6.1736869444.81f8cc9f-lp156.7515.1
2025-01-14 16:17:46|install|openQA-continuous-update|4.6.1736869444.81f8cc9f-lp156.7515.1
2025-01-14 16:17:48|install|openQA-auto-update|4.6.1736869444.81f8cc9f-lp156.7515.1
2025-01-14 16:17:48|install|perl-Perl-Tidy|20250105.0.0-lp156.3.1|
2025-01-14 16:17:48|install|os-autoinst-devel|4.6.1736869520.3d40ba7-lp156.2001.1
2025-01-14 17:53:42|install|openQA-common|4.6.1736869444.81f8cc9f-lp156.7517.1
2025-01-14 17:53:44|install|openQA-continuous-update|4.6.1736869444.81f8cc9f-lp156.7517.1
2025-01-14 17:53:44|install|openQA-client|4.6.1736869444.81f8cc9f-lp156.7517.1
2025-01-14 17:53:45|install|openQA-auto-update|4.6.1736869444.81f8cc9f-lp156.7517.1
2025-01-14 17:53:49|install|openQA-worker|4.6.1736869444.81f8cc9f-lp156.7517.1
2025-01-14 19:17:36|install|os-autoinst-distri-opensuse-deps|1.1736881600.3e38d3ba-lp156.16137.1|
2025-01-15 04:24:36|install|os-autoinst-distri-opensuse-deps|1.1736914421.21888f1c-lp156.16138.1|
2025-01-15 04:36:36|install|os-autoinst-distri-opensuse-deps|1.1736915087.9b730b21-lp156.16139.1|
2025-01-15 07:12:36|install|os-autoinst-distri-opensuse-deps|1.1736924291.2cfd3894-lp156.16140.1|
2025-01-15 07:54:36|install|os-autoinst-distri-opensuse-deps|1.1736926855.94bc815c-lp156.16141.1|
2025-01-15 09:06:54|install|perl-Mojo-IOLoop-ReadWriteProcess|0.340.0-bp156.1.1|
Actions #12

Updated by okurz 16 days ago · Edited

Now trying to reproduce again on a different o3 worker that is still in the broken state, and then trying to partially revert package versions:

openqa-clone-job --within-instance https://openqa.opensuse.org/tests/4774948 WORKER_CLASS=openqaworker25 _GROUP=0 BUILD= TEST+=-okurz-w25

1 job has been created:

The job incompleted with https://openqa.opensuse.org/tests/4774997#; now reverting openQA packages.

Can't revert: find /var/cache/zypp/ | less only reveals one package, "/var/cache/zypp/packages/devel_openQA/x86_64/openQA-common-4.6.1715785524.aacf9bc3-lp155.6658.1.x86_64.rpm", so there might be a regression from https://github.com/os-autoinst/openQA/pull/6104, reopening #174313

Actions #13

Updated by favogt 16 days ago

Diffing between working and broken snapshots yields this:

--- /.snapshots/6663/snapshot/usr/lib/perl5/vendor_perl/5.26.1/Mojo/IOLoop/ReadWriteProcess.pm  2025-01-14 11:14:21.000000000 +0000
+++ /.snapshots/6662/snapshot/usr/lib/perl5/vendor_perl/5.26.1/Mojo/IOLoop/ReadWriteProcess.pm  2023-10-17 08:24:30.000000000 +0000
@@ -403,8 +403,7 @@
 sub is_running {
   my ($self) = shift;
   $self->session->consume_collected_info;
-  return 0 unless my $pid = $self->process_id;
-  kill(0, ($self->kill_whole_group ? -$pid : $pid));
+  $self->process_id ? kill 0 => $self->process_id : 0;
 }

 sub write_pidfile {
@@ -520,6 +519,7 @@
   my $pid = $self->pid;
   return $self               unless defined $pid;
   return $self->_shutdown(1) unless $self->is_running;
+  $self->_diag("Stopping $pid") if DEBUG;

   my $ret;
   my $attempt      = 1;
@@ -527,9 +527,7 @@
   my $sleep_time   = $self->sleeptime_during_kill;
   my $max_attempts = $self->max_kill_attempts;
   my $signal       = $self->_default_kill_signal;
-  $pid = -$pid if $self->kill_whole_group;
-  $self->_diag("Stopping $pid") if DEBUG;
-
+  $pid = -getpgrp($pid) if $self->kill_whole_group;
   until ((defined $ret && ($ret == $pid || $ret == -1))
       || ($attempt > $max_attempts && $timeout <= 0))
   {

i.e.

openqaworker26:~ # rpm --root /.snapshots/6663/snapshot/ -q perl-Mojo-IOLoop-ReadWriteProcess
perl-Mojo-IOLoop-ReadWriteProcess-0.340.0-lp156.45.1.noarch
openqaworker26:~ # rpm --root /.snapshots/6664/snapshot/ -q perl-Mojo-IOLoop-ReadWriteProcess
perl-Mojo-IOLoop-ReadWriteProcess-0.340.0-bp156.1.1.noarch
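
A minimal, self-contained Perl sketch (illustration only, not the os-autoinst code path) of the semantic difference the is_running hunk above touches: "kill 0 => $pid" probes a single process, while "kill(0, -$pid)" probes the process group with ID $pid, which only exists if some process has that ID as its process group. In the newer build's is_running the group probe is used when kill_whole_group is set; if the registered PID is not (yet) a process-group leader at that moment, the probe fails even though the process is alive, which is one plausible way a freshly started isotovideo could be misreported as not running:

use strict;
use warnings;

my $pid = fork() // die "fork failed: $!";
if ($pid == 0) { sleep 5; exit 0 }    # child stays in the parent's process group

printf "probe pid   %d: %d\n", $pid, kill(0, $pid);     # 1: the process exists
printf "probe pgrp -%d: %d\n", $pid, kill(0, -$pid);    # 0: no process group with that ID
waitpid($pid, 0);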
Actions #15

Updated by okurz 16 days ago

  • Copied to action #175482: jobs incomplete with auto_review:"setup failure: isotovideo can not be started" - why did no tests prevent this to be deployed on both o3+osd? added
Actions #16

Updated by okurz 16 days ago

  • Description updated (diff)

Thanks @favogt

We verified that downgrading perl-Mojo-IOLoop-ReadWriteProcess works when combined with restarting the openQA workers, so doing:

hosts="openqaworker21 openqaworker22 openqaworker23 openqaworker24 openqaworker25 openqaworker26 openqaworker27 openqaworker28 openqaworker-arm21 openqaworker-arm22 qa-power8-3"; for i in $hosts; do echo "### $i" && ssh root@$i "zypper -n in --oldpackage perl-Mojo-IOLoop-ReadWriteProcess-0.340.0-bp156.1.1 && systemctl mask openQA-auto-update && reboot" ; done
Actions #17

Updated by livdywan 16 days ago

  • Related to action #175494: [openQA][worker][ipmi] isotovideo can not be started added
Actions #18

Updated by livdywan 16 days ago

  • Related to deleted (action #175494: [openQA][worker][ipmi] isotovideo can not be started)
Actions #19

Updated by livdywan 16 days ago

  • Has duplicate action #175494: [openQA][worker][ipmi] isotovideo can not be started added
Actions #20

Updated by okurz 16 days ago

For OSD (together with nicksinger)

sudo salt \* cmd.run 'zypper -n in --oldpackage perl-Mojo-IOLoop-ReadWriteProcess-0.340.0-bp156.1.1'
sudo salt -C 'G@roles:worker' cmd.run 'zypper -n dup --dry-run'
Actions #21

Updated by gpathak 16 days ago

Restarted incomplete jobs on osd: host=openqa.suse.de ./openqa-advanced-retrigger-jobs

Actions #22

Updated by okurz 16 days ago

And for o3 I did

failed_since=2025-01-14 ./openqa-advanced-retrigger-jobs
Actions #23

Updated by nicksinger 16 days ago

I did:

salt -C 'G@roles:worker' cmd.run 'zypper in -f --oldpackage -r repo-oss perl-Mojo-IOLoop-ReadWriteProcess && bash -c "reboot &"'

on OSD to cover a downgrade of most OSD workers. Unfortunately not all workers have a repo named repo-oss so @okurz did another try with the explicit version call from https://progress.opensuse.org/issues/175464#note-8

On some hosts (e.g. sapworker1) I also used this command:

zypper lr -Eu | grep -E "leap/15\.[0-9]+/repo/oss/*$" | cut -d "|" -f 1 | awk "{\$1=\$1;print}" | xargs -I{} zypper in -y -n -f --oldpackage -r {} perl-Mojo-IOLoop-ReadWriteProcess

This would work on every worker, but I never managed to execute it via salt (too much escaping needed).
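
As a side note (not from the ticket; the salt:// path is illustrative), one way around the quoting problem is to put the pipeline into a script served from the salt master and run it with cmd.script, which copies the file to each minion before executing it:

# e.g. saved as /srv/salt/scripts/downgrade-rwp.sh on the salt master
salt -C 'G@roles:worker' cmd.script salt://scripts/downgrade-rwp.sh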

Actions #24

Updated by ybonatakis 16 days ago · Edited

okurz wrote in #note-16:

Thanks @favogt

we verified that downgrading perl-Mojo-IOLoop-ReadWriteProcess works together with restarting openQA workers so doing

hosts="openqaworker21 openqaworker22 openqaworker23 openqaworker24 openqaworker25 openqaworker26 openqaworker27 openqaworker28 openqaworker-arm21 openqaworker-arm22 qa-power8-3"; for i in $hosts; do echo "### $i" && ssh root@$i "zypper -n in --oldpackage perl-Mojo-IOLoop-ReadWriteProcess-0.340.0-bp156.1.1 && systemctl mask openQA-auto-update && reboot" ; done

After the downgrade of perl-Mojo-IOLoop-ReadWriteProcess I restarted all the o3 jobs I found via dry_run=1 host=ybonatakis@ariel.dmz-prg2.suse.org ./openqa-advanced-retrigger-jobs. I checked random jobs for "isotovideo can not be started": not reproduced. Other incomplete jobs are still visible, but those are due to missing assets.

I continued going through the job groups and filtering for incompletes. I did not find any, as most jobs were still running or scheduled.

Actions #25

Updated by gpathak 16 days ago

gpathak wrote in #note-21:

Restarted incomplete jobs on osd: host=openqa.suse.de ./openqa-advanced-retrigger-jobs

Triggered failed jobs again:
host=openqa.suse.de failed_since=2025-01-14 ./openqa-advanced-retrigger-jobs

Actions #26

Updated by okurz 16 days ago

  • Priority changed from Immediate to High

With mitigations and rollbacks in place we can lower the prio. Thanks all for participating in the task force.

Actions #27

Updated by okurz 16 days ago

  • Related to action #170209: [sporadic] auto_review:"Failed to find an available port: Address already in use":retry, produces incomplete jobs on OSD, multiple machines size:M added
Actions #28

Updated by gpathak 16 days ago

  • Related to action #175473: OpenQA Jobs test - Incomplete jobs (not restarted) of last 24h alert Salt added
Actions #29

Updated by pcervinka 16 days ago

A job failed just a few minutes ago on qesapworker-prg5:23: https://openqa.suse.de/tests/16458701/logfile?filename=worker-log.txt

[2025-01-15T12:02:33.979239Z] [info] [pid:4210] Preparing cgroup to start isotovideo
[2025-01-15T12:02:33.980406Z] [warn] [pid:4210] Disabling cgroup usage because cgroup creation failed: mkdir /sys/fs/cgroup/systemd: Permission denied at /usr/lib/perl5/vendor_perl/5.26.1/Mojo/File.pm line 85.
[2025-01-15T12:02:33.980559Z] [info] [pid:4210] You can define a custom slice with OPENQA_CGROUP_SLICE or indicating the base mount with MOJO_CGROUP_FS.
[2025-01-15T12:02:33.980914Z] [info] [pid:4210] Starting isotovideo container
[2025-01-15T12:02:33.985090Z] [debug] [pid:4210] Registered process:94385
[2025-01-15T12:02:33.986015Z] [error] [pid:4210] Unable to setup job 16458701: isotovideo can not be started
Actions #30

Updated by okurz 16 days ago

  • Parent task set to #175515
Actions #31

Updated by okurz 16 days ago

  • Copied to action #175518: Conduct "lessons learned" with Five Why analysis for "jobs incomplete with setup failure: isotovideo can not be started" size:S added
Actions #32

Updated by okurz 16 days ago

There are still recent incompletes with the same error as reported, e.g. https://openqa.suse.de/tests/16459246 from w40. I confirmed that w40 has the correctly downgraded package version, but maybe the worker processes have still not been restarted. Trying sudo systemctl restart openqa-worker.target

Called

sudo salt -C 'G@roles:worker' cmd.run 'sudo systemctl restart openqa-worker.target'

and once again

host=openqa.suse.de ./openqa-advanced-retrigger-jobs
Actions #33

Updated by okurz 16 days ago

  • Description updated (diff)
  • Due date set to 2025-01-29
  • Status changed from In Progress to Feedback

Called on ariel

hosts="openqaworker21 openqaworker22 openqaworker23 openqaworker24 openqaworker25 openqaworker26 openqaworker27 openqaworker28 openqaworker-arm21 openqaworker-arm22 qa-power8-3"; for i in $hosts; do echo "### $i" && ssh root@$i "systemctl unmask openQA-auto-update" ; done

I think the unit would have been "openqa-auto-update" anyway. hosts="openqaworker21 openqaworker22 openqaworker23 openqaworker24 openqaworker25 openqaworker26 openqaworker27 openqaworker28 openqaworker-arm21 openqaworker-arm22 qa-power8-3"; for i in $hosts; do echo "### $i" && ssh root@$i "rpm -q perl-Mojo-IOLoop-ReadWriteProcess" ; done also looks ok.

I will check the state again in some hours and/or days.
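
As a quick check (sketch, not from the ticket) of which unit name actually exists on a worker:

systemctl list-unit-files | grep -i auto-update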

Actions #34

Updated by pcervinka 16 days ago

okurz wrote in #note-33:

Called on ariel

hosts="openqaworker21 openqaworker22 openqaworker23 openqaworker24 openqaworker25 openqaworker26 openqaworker27 openqaworker28 openqaworker-arm21 openqaworker-arm22 qa-power8-3"; for i in $hosts; do echo "### $i" && ssh root@$i "systemctl unmask openQA-auto-update" ; done

what about qesapworker-prg5?

Actions #35

Updated by okurz 16 days ago

pcervinka wrote in #note-34:

okurz wrote in #note-33:

Called on ariel

hosts="openqaworker21 openqaworker22 openqaworker23 openqaworker24 openqaworker25 openqaworker26 openqaworker27 openqaworker28 openqaworker-arm21 openqaworker-arm22 qa-power8-3"; for i in $hosts; do echo "### $i" && ssh root@$i "systemctl unmask openQA-auto-update" ; done

what about qesapworker-prg5?

That's part of OSD. I found that not all openQA worker processes had been consistently restarted, which explains your observation from #175464-29; this is fixed meanwhile. A cloned job from the scenario you referenced has meanwhile passed on another worker: https://openqa.suse.de/tests/16458902
qesapworker-prg5 is also ok by now, e.g. see https://openqa.suse.de/tests/16460304

Actions #36

Updated by okurz 15 days ago

  • Due date deleted (2025-01-29)
  • Status changed from Feedback to Resolved

All still good. Follow-up tickets created. Also the query select id,test,clone_id,t_started,t_finished from jobs where result='incomplete' and reason like '%isotovideo can not be started%' and t_finished >= '2025-01-15 11:00' and clone_id is null; did not return any more matches.
