openSUSE Project Management Tool
openQA Infrastructure - action #120261: tests should try to access worker by WORKER_HOSTNAME FQDN but sometimes get 'worker2' or something auto_review:".*curl.*worker\d+:.*failed at.*":retry size:meowhttps://progress.opensuse.org/issues/120261?journal_id=5730662022-11-10T11:24:53Zokurzokurz@suse.com
<ul><li><strong>Copied from</strong> <i><a class="issue tracker-4 status-3 priority-6 priority-high2 closed child" href="/issues/109241">action #109241</a>: Prefer to use domain names rather than IPv4 in salt pillars size:M</i> added</li></ul> openQA Infrastructure - action #120261: tests should try to access worker by WORKER_HOSTNAME FQDN but sometimes get 'worker2' or something auto_review:".*curl.*worker\d+:.*failed at.*":retry size:meowhttps://progress.opensuse.org/issues/120261?journal_id=5730752022-11-10T11:35:33Zokurzokurz@suse.com
<ul><li><strong>Related to</strong> <i><a class="issue tracker-4 status-3 priority-5 priority-high3 closed child" href="/issues/119443">action #119443</a>: Conduct the migration of SUSE openQA systems from Nbg SRV1 to new security zones size:M</i> added</li></ul> openQA Infrastructure - action #120261: tests should try to access worker by WORKER_HOSTNAME FQDN but sometimes get 'worker2' or something auto_review:".*curl.*worker\d+:.*failed at.*":retry size:meowhttps://progress.opensuse.org/issues/120261?journal_id=5732102022-11-10T13:33:14Zmkittlermarius.kittler@suse.com
<ul><li><strong>Description</strong> updated (<a title="View differences" href="/journals/573210/diff?detail_id=539550">diff</a>)</li><li><strong>Assignee</strong> set to <i>mkittler</i></li></ul> openQA Infrastructure - action #120261: tests should try to access worker by WORKER_HOSTNAME FQDN but sometimes get 'worker2' or something auto_review:".*curl.*worker\d+:.*failed at.*":retry size:meowhttps://progress.opensuse.org/issues/120261?journal_id=5732462022-11-10T13:38:29Zmkittlermarius.kittler@suse.com
<ul><li><strong>Subject</strong> changed from <i>tests should try to access worker by WORKER_HOSTNAME FQDN but sometimes get "worker2" or something</i> to <i>tests should try to access worker by WORKER_HOSTNAME FQDN but sometimes get "worker2" or something auto_review:".*curl.*worker\d+:.*failed at.*":retry</i></li></ul><p>Letting auto-review handle the restarting for future occurrences.</p>
<p>About recent jobs: I found all affected jobs by running <code>for d in 0991* 0992* ; do grep --files-with-matches -PR --include autoinst-log.txt '.*curl.*worker\d+:.*failed at.*' "$d" ; done > /home/martchus/failed-du-to-hostname.txt</code> on OSD and restarted them via <code>for i in $(grep --invert-match ':retry' /home/martchus/failed-du-to-hostname.txt | cut -f2 --delimiter\=/ | cut -f1 --delimiter\=-) ; do sudo openqa-cli api --host "openqa.suse.de" -X POST jobs/"$i"/restart ; done</code>. This automatically skips restarting jobs that have already been restarted.</p>
<p>The list of affected jobs found that way is fortunately not <em>that</em> long and seems plausible.</p>
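<p>A minimal sketch of what the <code>grep</code>/<code>cut</code> part of the restart loop above does to each path. The two sample paths are hypothetical and only assume that the job directory is the second path component and starts with the job ID followed by <code>-</code>; the short options are used instead of the long GNU ones.</p>

```shell
#!/bin/sh
# Reads grep output (one path per line) on stdin and prints one job ID per line.
# Lines containing ':retry' are dropped, as in the loop above.
extract_job_ids() {
    grep -v ':retry' | cut -d / -f 2 | cut -d - -f 1
}

# Hypothetical sample paths, not taken from the real failed-du-to-hostname.txt
printf '%s\n' \
    '09921/09921403-sle-example-docker_tests@64bit/autoinst-log.txt' \
    '09921/09921404-sle-example-foo:retry@64bit/autoinst-log.txt' \
    | extract_job_ids
```

<p>Note that the ID keeps its leading zero exactly as it appears in the directory name.</p>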
<p>I'm currently checking on the restarted jobs via:</p>
<pre><code>martchus@openqa:~> cat restarts.json | jq -r '[.[] | .result | .[] | map(.)] | flatten | sort | join(",")'
9921403,9921404,9921405,9921406,9921407,9921408,9921409,9921410,9921411,9921412,9921413,9921414,9921415,9921416,9921417,9921418,9921419,9921420,9921421,9921422,9921423,9921424,9921425,9921426,9921427,9921428,9921429
[martchus@linux-9lzf ~]$ openqa-mon https://openqa.suse.de --jobs 9921403,9921404,9921405,9921406,9921407,9921408,9921409,9921410,9921411,9921412,9921413,9921414,9921415,9921416,9921417,9921418,9921419,9921420,9921421,9921422,9921423,9921424,9921425,9921426,9921427,9921428,9921429
</code></pre> openQA Infrastructure - action #120261: tests should try to access worker by WORKER_HOSTNAME FQDN but sometimes get 'worker2' or something auto_review:".*curl.*worker\d+:.*failed at.*":retry size:meowhttps://progress.opensuse.org/issues/120261?journal_id=5733302022-11-10T15:40:49Zmkittlermarius.kittler@suse.com
<ul><li><strong>Status</strong> changed from <i>New</i> to <i>In Progress</i></li></ul> openQA Infrastructure - action #120261: tests should try to access worker by WORKER_HOSTNAME FQDN but sometimes get 'worker2' or something auto_review:".*curl.*worker\d+:.*failed at.*":retry size:meowhttps://progress.opensuse.org/issues/120261?journal_id=5734952022-11-11T04:28:46Zopenqa_reviewopenqa-review@suse.de
<ul><li><strong>Due date</strong> set to <i>2022-11-25</i></li></ul><p>Setting due date based on mean cycle time of SUSE QE Tools</p>
openQA Infrastructure - action #120261: tests should try to access worker by WORKER_HOSTNAME FQDN but sometimes get 'worker2' or something auto_review:".*curl.*worker\d+:.*failed at.*":retry size:meowhttps://progress.opensuse.org/issues/120261?journal_id=5738462022-11-11T13:41:51Zacarvajalacarvajal@suse.com
<ul></ul><p>Noticed several HA jobs failing on this as well, for example:</p>
<p><a href="https://openqa.suse.de/tests/9917615#step/patch_sle/96" class="external">https://openqa.suse.de/tests/9917615#step/patch_sle/96</a><br>
<a href="https://openqa.suse.de/tests/9917874#step/setup_hosts_and_luns/12" class="external">https://openqa.suse.de/tests/9917874#step/setup_hosts_and_luns/12</a></p>
<p>Tagging those failures with the poo#.</p>
openQA Infrastructure - action #120261: tests should try to access worker by WORKER_HOSTNAME FQDN but sometimes get 'worker2' or something auto_review:".*curl.*worker\d+:.*failed at.*":retry size:meowhttps://progress.opensuse.org/issues/120261?journal_id=5738642022-11-11T14:37:25Zokurzokurz@suse.com
<ul><li><strong>Description</strong> updated (<a title="View differences" href="/journals/573864/diff?detail_id=540096">diff</a>)</li></ul><p>Investigating this together in the team. We learned from the man page of hostname <code>man 5 hostname</code> that /etc/hostname should really only contain a label so we should not put an FQDN there.</p>
<p>As acarvajal stated the issue still reproduced. One open point:</p>
<ol>
<li><em>DONE</em> Why did auto-review not label <a href="https://openqa.suse.de/tests/9917615#step/patch_sle/96">https://openqa.suse.de/tests/9917615#step/patch_sle/96</a>? -> we used double quotes for two terms in the subject line which was the main problem, maybe also the <code>:</code></li>
</ol>
<p>We reviewed the variables, e.g. WORKER_HOSTNAME and despite slight ambiguity regarding the term "hostname", see <a href="https://en.wikipedia.org/wiki/Hostname">https://en.wikipedia.org/wiki/Hostname</a> , we agreed that the variable name is ok.</p>
<p>os-autoinst/testapi.pm defines a function <code>host_ip</code> which for qemu returns an IPv4 address but in other cases returns the value of WORKER_HOSTNAME which normally shouldn't be an IPv4 address but a FQDN.</p>
<ol>
<li>We should consider adding a new method "host_address" returning what host_ip does but providing a default value for WORKER_HOSTNAME if not set and deprecate host_ip with a log message</li>
</ol>
<p>-> <a href="https://github.com/os-autoinst/os-autoinst/pull/2202">https://github.com/os-autoinst/os-autoinst/pull/2202</a></p>
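<p>To make the distinction between a bare label, an FQDN and an IPv4 address from the discussion above concrete, here is a rough shell heuristic. The function name <code>classify_host_value</code> is made up for this illustration and is not part of os-autoinst or openQA; the IPv4 check is deliberately crude (digits and dots only, at least three dots).</p>

```shell
#!/bin/sh
# Hypothetical helper: classify a value such as the content of WORKER_HOSTNAME.
classify_host_value() {
    case "$1" in
        '') echo empty ;;
        *[!0-9.]*)
            # contains something other than digits and dots: a name
            case "$1" in
                *.*) echo fqdn ;;   # has at least one dot
                *) echo label ;;    # single label such as "worker2"
            esac ;;
        *.*.*.*) echo ipv4 ;;       # digits and dots only, three or more dots
        *) echo other ;;
    esac
}

# Example values: the two hostnames appear in this ticket, the address is just an example
for value in worker2 worker2.oqa.suse.de 10.0.2.2; do
    printf '%s -> %s\n' "$value" "$(classify_host_value "$value")"
done
```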
<p>We found that <a href="https://openqa.suse.de/tests/9925066">https://openqa.suse.de/tests/9925066</a> from 2022-11-10 23:37:35Z still has a correct WORKER_HOSTNAME=worker2.oqa.suse.de </p>
<ol>
<li>openQA itself uses a "worker_hostname" attribute that it initializes from <code>POSIX::uname()</code>, so it resolves e.g. to worker2 regardless of the FQDN. This is something that we should change: either get rid of it altogether if it's only for logging, or rename the variable, or change it to return what WORKER_HOSTNAME in the config file would state</li>
</ol>
<p>Coming back, so <a href="https://openqa.suse.de/tests/9925066">https://openqa.suse.de/tests/9925066</a> from 2022-11-10 23:37:35Z is our "last good". The next job on the assigned worker <a href="https://openqa.suse.de/admin/workers/2122" class="external">worker2:45</a> <a href="https://openqa.suse.de/tests/9925838">https://openqa.suse.de/tests/9925838</a> has WORKER_HOSTNAME=worker2 in <a href="https://openqa.suse.de/tests/9925838/file/vars.json">https://openqa.suse.de/tests/9925838/file/vars.json</a></p>
<p>We took a look into system journal. With <code>journalctl --since=today | grep -v 'worker2 worker\[' | less</code> I found</p>
<pre><code>Nov 11 00:00:10 worker2 openqa-worker-cacheservice-minion[28605]: [28605] [i] [#10801] Sync: "rsync://openqa.suse.de/tests" to "/var/lib/openqa/cache/openqa.suse.de"
Nov 11 00:00:10 worker2 openqa-worker-cacheservice-minion[28605]: [28605] [i] [#10801] Calling: rsync -avHP --timeout 1800 rsync://openqa.suse.de/tests/ --delete /var/lib/openqa/cache/openqa.suse.de/tests/
Nov 11 00:01:00 worker2 telegraf[11514]: 2022-11-10T23:01:00Z E! [inputs.exec] Error in plugin: exec: exit status 1 for command '/etc/telegraf/scripts/systemd_list_service_by_state_for_telegraf.sh --state failed': hostname: Name or service not known
Nov 11 00:01:00 worker2 telegraf[11514]: 2022-11-10T23:01:00Z E! [inputs.exec] Error in plugin: exec: exit status 1 for command '/etc/telegraf/scripts/systemd_list_service_by_state_for_telegraf.sh --state masked': hostname: Name or service not known
</code></pre>
<p>and this error recurs periodically. Specifically, that bash script calls <code>hostname -f</code>. I ran</p>
<pre><code>sudo salt --no-color --state-output=changes -C 'G@roles:worker' cmd.run 'hostname -f'
</code></pre>
<p>which shows that this problem really only appears on worker2. I have for now removed worker2 from salt keys and manually edited /etc/openqa/workers.ini to have the FQDN. To be sure that we have consistent data I triggered a reboot of worker2.</p>
<p>After reboot the command <code>hostname -f</code> works just fine also on worker2. Maybe we just never rebooted the host since the migration and something was not restarted or so?</p>
<ol>
<li>If tests turn out to be ok I suggest adding the machine back to salt, letting salt apply a high state and monitoring tests for at least 2 days</li>
</ol>
<p><a class="user active user-mention" href="https://progress.opensuse.org/users/22072">@mkittler</a> please decide yourself which of the above points you would like to follow-up with yourself or pull out into separate tickets</p>
openQA Infrastructure - action #120261: tests should try to access worker by WORKER_HOSTNAME FQDN but sometimes get 'worker2' or something auto_review:".*curl.*worker\d+:.*failed at.*":retry size:meowhttps://progress.opensuse.org/issues/120261?journal_id=5741282022-11-14T09:44:43Zokurzokurz@suse.com
<ul><li><strong>Related to</strong> <i><a class="issue tracker-4 status-5 priority-4 priority-default closed" href="/issues/120363">action #120363</a>: [qe-core][functional] test fails in prepare_test_data</i> added</li></ul> openQA Infrastructure - action #120261: tests should try to access worker by WORKER_HOSTNAME FQDN but sometimes get 'worker2' or something auto_review:".*curl.*worker\d+:.*failed at.*":retry size:meowhttps://progress.opensuse.org/issues/120261?journal_id=5741372022-11-14T10:00:18Zokurzokurz@suse.com
<ul></ul><p>Output of <code>openqa-query-for-job-label poo#120261</code> as of now:</p>
<pre><code>9926841|2022-11-11 14:26:35|done|failed|docker_tests||worker2
9926842|2022-11-11 14:25:59|done|failed|podman_tests||worker2
9926833|2022-11-11 14:18:33|done|failed|kubectl_tests||worker2
9918009|2022-11-11 05:14:06|done|failed|default||worker2
9917877|2022-11-11 05:09:14|done|failed|migration_offline_scc_verify_sle12sp5_ha_alpha_node02||worker2
9917874|2022-11-11 05:09:14|done|failed|migration_offline_scc_verify_sle12sp5_ha_alpha_node01||worker2
9917912|2022-11-11 04:42:34|done|failed|migration_online_sle15sp3_ha_alpha_node01||worker2
9917914|2022-11-11 04:07:53|done|failed|migration_online_sle15sp3_ha_alpha_node02||worker2
9917615|2022-11-11 02:03:44|done|failed|migration_offline_dvd_sle15sp3_ha_alpha_node02||worker2
9917612|2022-11-11 01:59:47|done|failed|migration_offline_dvd_sle15sp3_ha_alpha_node01||worker2
</code></pre>
<p>so either no further jobs failed or the regex fails to match.</p>
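<p>A small sketch for summarizing such pipe-separated output, counting jobs per value of the last column and reporting the most recent timestamp. The helper name <code>summarize_labeled_jobs</code> is made up here; the three sample lines are copied from the output above.</p>

```shell
#!/bin/sh
# Reads `openqa-query-for-job-label` style lines on stdin:
#   id|timestamp|state|result|test||worker
summarize_labeled_jobs() {
    awk -F'|' '
        NF >= 2 {
            count[$NF]++                      # last field, "worker2" in the output above
            if ($2 > latest) latest = $2      # timestamps sort correctly as strings
        }
        END {
            for (w in count) printf "%s: %d job(s)\n", w, count[w]
            if (latest != "") printf "latest: %s\n", latest
        }'
}

printf '%s\n' \
    '9926842|2022-11-11 14:25:59|done|failed|podman_tests||worker2' \
    '9926833|2022-11-11 14:18:33|done|failed|kubectl_tests||worker2' \
    '9918009|2022-11-11 05:14:06|done|failed|default||worker2' \
    | summarize_labeled_jobs
```

<p>In practice the input would come from <code>openqa-query-for-job-label poo#120261 | summarize_labeled_jobs</code>.</p>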
openQA Infrastructure - action #120261: tests should try to access worker by WORKER_HOSTNAME FQDN but sometimes get 'worker2' or something auto_review:".*curl.*worker\d+:.*failed at.*":retry size:meowhttps://progress.opensuse.org/issues/120261?journal_id=5741402022-11-14T10:04:39Zokurzokurz@suse.com
<ul><li><strong>Subject</strong> changed from <i>tests should try to access worker by WORKER_HOSTNAME FQDN but sometimes get "worker2" or something auto_review:".*curl.*worker\d+:.*failed at.*":retry</i> to <i>tests should try to access worker by WORKER_HOSTNAME FQDN but sometimes get "worker2" or something auto_review:".*curl.*worker\d+.*failed at.*":retry</i></li></ul> openQA Infrastructure - action #120261: tests should try to access worker by WORKER_HOSTNAME FQDN but sometimes get 'worker2' or something auto_review:".*curl.*worker\d+:.*failed at.*":retry size:meowhttps://progress.opensuse.org/issues/120261?journal_id=5741642022-11-14T10:31:25Zokurzokurz@suse.com
<ul><li><strong>Subject</strong> changed from <i>tests should try to access worker by WORKER_HOSTNAME FQDN but sometimes get "worker2" or something auto_review:".*curl.*worker\d+.*failed at.*":retry</i> to <i>tests should try to access worker by WORKER_HOSTNAME FQDN but sometimes get 'worker2' or something auto_review:".*curl.*worker\d+.*failed at.*":retry</i></li></ul> openQA Infrastructure - action #120261: tests should try to access worker by WORKER_HOSTNAME FQDN but sometimes get 'worker2' or something auto_review:".*curl.*worker\d+:.*failed at.*":retry size:meowhttps://progress.opensuse.org/issues/120261?journal_id=5741912022-11-14T12:34:50Zmkittlermarius.kittler@suse.com
<ul></ul><p>Note that the <code>"</code> in the ticket name prevented <code>openqa-label-known-issues</code> from working. That's why it was changed. Maybe we should document this limitation somewhere.</p>
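<p>The subject change also dropped the <code>:</code> after <code>worker\d+</code>. The following sketch compares both patterns against a few sample lines. The log lines are hypothetical and not copied from a real <code>autoinst-log.txt</code>, and <code>\d</code> is written as <code>[0-9]</code> so that plain <code>grep -E</code> can be used.</p>

```shell
#!/bin/sh
# The two auto_review patterns from the ticket subject, with \d spelled as [0-9]
old_pattern='.*curl.*worker[0-9]+:.*failed at.*'
new_pattern='.*curl.*worker[0-9]+.*failed at.*'

# usage: matches PATTERN LINE; exit status 0 if LINE matches PATTERN
matches() {
    printf '%s\n' "$2" | grep -Eq "$1"
}

# Hypothetical log lines for illustration only
short_host='Test died: command "curl -O http://worker2:20013/data/foo" failed at lib/foo.pm line 1'
short_host_no_port='Test died: command "curl -O http://worker2/data/foo" failed at lib/foo.pm line 1'
fqdn_host='Test died: command "curl -O http://worker2.oqa.suse.de:20013/data/foo" failed at lib/foo.pm line 1'

for line in "$short_host" "$short_host_no_port" "$fqdn_host"; do
    if matches "$old_pattern" "$line"; then old=match; else old=no-match; fi
    if matches "$new_pattern" "$line"; then new=match; else new=no-match; fi
    printf 'old=%s new=%s %s\n' "$old" "$new" "$line"
done
```

<p>Under these assumptions the relaxed pattern also matches a line in which the full FQDN was used, so a label applied by it does not by itself prove that the short hostname was involved.</p>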
openQA Infrastructure - action #120261: tests should try to access worker by WORKER_HOSTNAME FQDN but sometimes get 'worker2' or something auto_review:".*curl.*worker\d+:.*failed at.*":retry size:meowhttps://progress.opensuse.org/issues/120261?journal_id=5742122022-11-14T12:59:43Zokurzokurz@suse.com
<ul></ul><p>Over the weekend no further jobs were labeled with this ticket so we assume that our hypothesis is correct that salt updated the worker config file with the incomplete hostname and not the FQDN due to <code>hostname -f</code> not being able to return the full hostname, maybe shortly after bootup. So I am conducting a "reboot test":</p>
<pre><code>host=worker2.oqa.suse.de; for run in {1..5}; do for host in $host; do echo -n "run: $run, $host: ping .. " && timeout -k 5 600 sh -c "until ping -c30 $host >/dev/null; do :; done" && echo -n "ok, ssh .. " && timeout -k 5 600 sh -c "until nc -z -w 1 $host 22; do :; done" && echo -n "ok, hostname/uptime/reboot: " && ssh $host "hostname -f && uptime && sudo reboot" && sleep 120 || break; done || break; done
</code></pre>
<p>result:</p>
<pre><code>run: 1, worker2.oqa.suse.de: ping .. ok, ssh .. ok, hostname/uptime/reboot: worker2.oqa.suse.de
13:54:52 up 1:58, 1 user, load average: 5.81, 5.87, 6.38
Connection to worker2.oqa.suse.de closed by remote host.
run: 2, worker2.oqa.suse.de: ping .. ok, ssh .. ok, hostname/uptime/reboot: worker2.oqa.suse.de
14:04:19 up 0:03, 0 users, load average: 2.93, 2.36, 1.01
Connection to worker2.oqa.suse.de closed by remote host.
run: 3, worker2.oqa.suse.de: ping .. ok, ssh .. ok, hostname/uptime/reboot: worker2.oqa.suse.de
14:13:44 up 0:03, 0 users, load average: 3.06, 2.47, 1.05
Connection to worker2.oqa.suse.de closed by remote host.
run: 4, worker2.oqa.suse.de: ping .. ok, ssh .. ok, hostname/uptime/reboot: worker2.oqa.suse.de
14:23:05 up 0:03, 0 users, load average: 2.21, 1.68, 0.72
Connection to worker2.oqa.suse.de closed by remote host.
run: 5, worker2.oqa.suse.de: ping .. ok, ssh .. ok, hostname/uptime/reboot: worker2.oqa.suse.de
14:32:31 up 0:03, 0 users, load average: 2.43, 2.47, 1.11
Connection to worker2.oqa.suse.de closed by remote host.
</code></pre>
<p>so retrieving the FQDN works after every reboot. What could still be happening is that the hostname cannot be properly retrieved during early system initialization. The command <code>sudo journalctl -b | grep hostname</code> reveals:</p>
<pre><code>Nov 14 14:39:23 worker2 ovs-ctl[1812]: hostname: Name or service not known
Nov 14 14:39:23 worker2 ovs-vsctl[1817]: ovs|00001|vsctl|INFO|Called as ovs-vsctl --no-wait add Open_vSwitch . external-ids hostname=worker2
Nov 14 14:39:23 worker2 ovs-ctl[1945]: hostname: Name or service not known
Nov 14 14:39:23 worker2 ovs-vsctl[1954]: ovs|00001|vsctl|INFO|Called as ovs-vsctl --no-wait add Open_vSwitch . external-ids hostname=worker2
Nov 14 14:40:03 worker2 worker[3023]: - worker hostname: worker2
Nov 14 14:40:03 worker2 worker[2986]: - worker hostname: worker2
Nov 14 14:40:03 worker2 worker[3024]: - worker hostname: worker2
Nov 14 14:40:03 worker2 worker[2987]: - worker hostname: worker2
Nov 14 14:40:03 worker2 worker[3027]: - worker hostname: worker2
Nov 14 14:40:03 worker2 worker[2985]: - worker hostname: worker2
…
Nov 14 14:40:04 worker2 worker[2988]: - worker hostname: worker2
Nov 14 14:40:07 worker2 salt-minion[3182]: [ERROR ] Master hostname: 'openqa.suse.de' not found or not responsive. Retrying in 30 seconds
Nov 14 14:40:37 worker2 salt-minion[3182]: [ERROR ] Master hostname: 'openqa.suse.de' not found or not responsive. Retrying in 30 seconds
Nov 14 14:41:00 worker2 telegraf[2839]: 2022-11-14T13:41:00Z E! [inputs.exec] Error in plugin: exec: exit status 1 for command '/etc/telegraf/scripts/systemd_list_service_by_state_for_telegraf.sh --state failed': hostname: Name or service not known
Nov 14 14:41:00 worker2 telegraf[2839]: 2022-11-14T13:41:00Z E! [inputs.exec] Error in plugin: exec: exit status 1 for command '/etc/telegraf/scripts/systemd_list_service_by_state_for_telegraf.sh --state masked': hostname: Name or service not known
Nov 14 14:41:07 worker2 salt-minion[3182]: [ERROR ] Master hostname: 'openqa.suse.de' not found or not responsive. Retrying in 30 seconds
Nov 14 14:41:37 worker2 salt-minion[3182]: [ERROR ] Master hostname: 'openqa.suse.de' not found or not responsive. Retrying in 30 seconds
Nov 14 14:42:17 worker2 worker[5628]: - worker hostname: worker2
…
Nov 14 15:04:48 worker2 worker[7783]: - worker hostname: worker2
</code></pre>
<p>Anyway, following up the points from <a class="issue tracker-4 status-3 priority-6 priority-high2 closed child" title="action: tests should try to access worker by WORKER_HOSTNAME FQDN but sometimes get 'worker2' or somethin... (Resolved)" href="https://progress.opensuse.org/issues/120261#note-8">#120261#note-8</a> would help</p>
<p>EDIT: Maybe also <a href="https://askubuntu.com/questions/185070/why-do-i-get-hostname-name-or-service-not-known-error/185084">https://askubuntu.com/questions/185070/why-do-i-get-hostname-name-or-service-not-known-error/185084</a> and <a href="https://unix.stackexchange.com/questions/283550/i-got-error-hostname-name-or-service-not-known-when-checking-ip-of-hostname">https://unix.stackexchange.com/questions/283550/i-got-error-hostname-name-or-service-not-known-when-checking-ip-of-hostname</a> and <a href="https://man7.org/linux/man-pages/man8/nss-myhostname.8.html">https://man7.org/linux/man-pages/man8/nss-myhostname.8.html</a> help</p>
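<p>If <code>hostname -f</code> can fail or return only a short name at certain times, whatever writes the worker config could guard against that. This is a minimal sketch of such a guard, not what salt or the worker actually does. The function name <code>resolve_fqdn</code> is made up, and "contains a dot" is only a rough stand-in for "is an FQDN".</p>

```shell
#!/bin/sh
# usage: resolve_fqdn [COMMAND...]
# Runs COMMAND (default: hostname -f) and prints its output only if the
# command succeeded and the result contains a dot. Returns 1 otherwise.
resolve_fqdn() {
    [ $# -gt 0 ] || set -- hostname -f
    name=$("$@" 2>/dev/null) || return 1
    case "$name" in
        *.*) printf '%s\n' "$name" ;;
        *) return 1 ;;
    esac
}

if fqdn=$(resolve_fqdn); then
    echo "would use WORKER_HOSTNAME = $fqdn"
else
    echo "no FQDN available, leaving existing configuration untouched" >&2
fi
```

<p>The optional command argument only exists so that the check can be exercised without depending on the local resolver, e.g. <code>resolve_fqdn echo worker2</code> fails while <code>resolve_fqdn echo worker2.oqa.suse.de</code> succeeds.</p>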
openQA Infrastructure - action #120261: tests should try to access worker by WORKER_HOSTNAME FQDN but sometimes get 'worker2' or something auto_review:".*curl.*worker\d+:.*failed at.*":retry size:meowhttps://progress.opensuse.org/issues/120261?journal_id=5747702022-11-15T11:40:08Zmkittlermarius.kittler@suse.com
<ul></ul><p>Two improvements for the worker:</p>
<ul>
<li><a href="https://github.com/os-autoinst/openQA/pull/4899" class="external">https://github.com/os-autoinst/openQA/pull/4899</a></li>
<li><a href="https://github.com/os-autoinst/openQA/pull/4900" class="external">https://github.com/os-autoinst/openQA/pull/4900</a></li>
</ul>
openQA Infrastructure - action #120261: tests should try to access worker by WORKER_HOSTNAME FQDN but sometimes get 'worker2' or something auto_review:".*curl.*worker\d+:.*failed at.*":retry size:meowhttps://progress.opensuse.org/issues/120261?journal_id=5748032022-11-15T12:11:41Zmkittlermarius.kittler@suse.com
<ul></ul><p>The latest occurrence is still from 2022-11-11 (based on the automatic labeling which should now work as we've tested yesterday).</p>
openQA Infrastructure - action #120261: tests should try to access worker by WORKER_HOSTNAME FQDN but sometimes get 'worker2' or something auto_review:".*curl.*worker\d+:.*failed at.*":retry size:meowhttps://progress.opensuse.org/issues/120261?journal_id=5750642022-11-16T08:08:44Zokurzokurz@suse.com
<ul><li><strong>Related to</strong> <i><a class="issue tracker-4 status-3 priority-5 priority-high3 closed behind-schedule" href="/issues/120579">action #120579</a>: test fails in openqa_worker</i> added</li></ul> openQA Infrastructure - action #120261: tests should try to access worker by WORKER_HOSTNAME FQDN but sometimes get 'worker2' or something auto_review:".*curl.*worker\d+:.*failed at.*":retry size:meowhttps://progress.opensuse.org/issues/120261?journal_id=5757602022-11-17T10:50:41Zlivdywanliv.dywan@suse.com
<ul><li><strong>Subject</strong> changed from <i>tests should try to access worker by WORKER_HOSTNAME FQDN but sometimes get 'worker2' or something auto_review:".*curl.*worker\d+.*failed at.*":retry</i> to <i>tests should try to access worker by WORKER_HOSTNAME FQDN but sometimes get 'worker2' or something auto_review:".*curl.*worker\d+.*failed at.*":retry size:meow</i></li><li><strong>Description</strong> updated (<a title="View differences" href="/journals/575760/diff?detail_id=541773">diff</a>)</li></ul> openQA Infrastructure - action #120261: tests should try to access worker by WORKER_HOSTNAME FQDN but sometimes get 'worker2' or something auto_review:".*curl.*worker\d+:.*failed at.*":retry size:meowhttps://progress.opensuse.org/issues/120261?journal_id=5758232022-11-17T12:56:55Zmkittlermarius.kittler@suse.com
<ul><li><strong>Description</strong> updated (<a title="View differences" href="/journals/575823/diff?detail_id=541851">diff</a>)</li></ul><p>The most recent jobs returned by <code>openqa-query-for-job-label poo#120261</code> are still from 11-11. So I assume also relying on the new worker feature which was tested on worker11 and worker12 didn't make things worse. So I'm preparing a SR to use it everywhere. I'll also add worker11 and worker12 back to salt.</p>
openQA Infrastructure - action #120261: tests should try to access worker by WORKER_HOSTNAME FQDN but sometimes get 'worker2' or something auto_review:".*curl.*worker\d+:.*failed at.*":retry size:meowhttps://progress.opensuse.org/issues/120261?journal_id=5758322022-11-17T13:18:30Zmkittlermarius.kittler@suse.com
<ul></ul><p>SR (see previous comment): <a href="https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/772" class="external">https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/772</a></p>
openQA Infrastructure - action #120261: tests should try to access worker by WORKER_HOSTNAME FQDN but sometimes get 'worker2' or something auto_review:".*curl.*worker\d+:.*failed at.*":retry size:meowhttps://progress.opensuse.org/issues/120261?journal_id=5760332022-11-18T09:33:03Zmkittlermarius.kittler@suse.com
<ul><li><strong>Description</strong> updated (<a title="View differences" href="/journals/576033/diff?detail_id=542073">diff</a>)</li></ul> openQA Infrastructure - action #120261: tests should try to access worker by WORKER_HOSTNAME FQDN but sometimes get 'worker2' or something auto_review:".*curl.*worker\d+:.*failed at.*":retry size:meowhttps://progress.opensuse.org/issues/120261?journal_id=5761592022-11-18T10:42:09Zokurzokurz@suse.com
<ul></ul><p><a href="https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/772" class="external">https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/772</a> merged. Please monitor the impact and also remove <a href="https://gitlab.suse.de/openqa/salt-pillars-openqa/-/blob/master/openqa/workerconf.sls#L1160" class="external">https://gitlab.suse.de/openqa/salt-pillars-openqa/-/blob/master/openqa/workerconf.sls#L1160</a> when this works</p>
openQA Infrastructure - action #120261: tests should try to access worker by WORKER_HOSTNAME FQDN but sometimes get 'worker2' or something auto_review:".*curl.*worker\d+:.*failed at.*":retry size:meowhttps://progress.opensuse.org/issues/120261?journal_id=5762162022-11-18T11:56:39Zmkittlermarius.kittler@suse.com
<ul></ul><p>The SR has been merged. So far it looks still good.</p>
<p>Here's another SR to remove the explicit override of <code>WORKER_HOSTNAME</code> for grenache-1: <a href="https://gitlab.suse.de/openqa/salt-pillars-openqa/-/merge_requests/463" class="external">https://gitlab.suse.de/openqa/salt-pillars-openqa/-/merge_requests/463</a></p>
openQA Infrastructure - action #120261: tests should try to access worker by WORKER_HOSTNAME FQDN but sometimes get 'worker2' or something auto_review:".*curl.*worker\d+:.*failed at.*":retry size:meowhttps://progress.opensuse.org/issues/120261?journal_id=5762462022-11-18T12:41:32Zmkittlermarius.kittler@suse.com
<ul><li><strong>Status</strong> changed from <i>In Progress</i> to <i>Feedback</i></li></ul> openQA Infrastructure - action #120261: tests should try to access worker by WORKER_HOSTNAME FQDN but sometimes get 'worker2' or something auto_review:".*curl.*worker\d+:.*failed at.*":retry size:meowhttps://progress.opensuse.org/issues/120261?journal_id=5763242022-11-20T18:59:32Zokurzokurz@suse.com
<ul></ul><p><a href="https://gitlab.suse.de/openqa/salt-pillars-openqa/-/merge_requests/463" class="external">https://gitlab.suse.de/openqa/salt-pillars-openqa/-/merge_requests/463</a> merged.</p>
<p>Please look into:</p>
<pre><code>$ openqa-query-for-job-label poo#120261
9986752|2022-11-18 09:36:39|done|failed|gi-guest_developing-on-host_developing-xen||worker2
9981295|2022-11-17 22:38:36|done|failed|sriov_pci_passthrough-guest_sles12sp5_fv-on-host_developing-xen||worker2
</code></pre> openQA Infrastructure - action #120261: tests should try to access worker by WORKER_HOSTNAME FQDN but sometimes get 'worker2' or something auto_review:".*curl.*worker\d+:.*failed at.*":retry size:meowhttps://progress.opensuse.org/issues/120261?journal_id=5764632022-11-21T09:44:05Zokurzokurz@suse.com
<ul></ul><p>FYI: Traffic between .oqa.suse.de and .qa.suse.de unblocked, see <a class="issue tracker-4 status-3 priority-4 priority-default closed child" title="action: Conduct the migration of SUSE QA systems (non-tools-team maintained) from Nbg SRV1 to new securit... (Resolved)" href="https://progress.opensuse.org/issues/120264#note-9">#120264#note-9</a></p>
<p>I checked<br>
<a href="https://openqa.suse.de/tests/9986752#step/xen_guest_irqbalance/175" class="external">https://openqa.suse.de/tests/9986752#step/xen_guest_irqbalance/175</a> and found it failing to upload but the FQDN was specified so that's not the same problem. Same for <a href="https://openqa.suse.de/tests/9981295#step/sriov_network_card_pci_passthrough/173" class="external">https://openqa.suse.de/tests/9981295#step/sriov_network_card_pci_passthrough/173</a></p>
<p>I think worker2 can be considered good again and you can conduct the rollback steps and conclude.</p>
openQA Infrastructure - action #120261: tests should try to access worker by WORKER_HOSTNAME FQDN but sometimes get 'worker2' or something auto_review:".*curl.*worker\d+:.*failed at.*":retry size:meowhttps://progress.opensuse.org/issues/120261?journal_id=5766602022-11-21T15:58:33Zmkittlermarius.kittler@suse.com
<ul><li><strong>Status</strong> changed from <i>Feedback</i> to <i>Resolved</i></li></ul><p>I don't know on which basis we can consider worker2 specifically good again because it wasn't in salt (so salt would not have overridden the FQDN setting with a short hostname anyways). However, all the other workers look still good so I suppose the setup relying on the worker auto-detection is good enough. I added worker2 back to salt so it is now also using that setup.</p>
openQA Infrastructure - action #120261: tests should try to access worker by WORKER_HOSTNAME FQDN but sometimes get 'worker2' or something auto_review:".*curl.*worker\d+:.*failed at.*":retry size:meowhttps://progress.opensuse.org/issues/120261?journal_id=5768372022-11-22T09:05:17Zokurzokurz@suse.com
<ul><li><strong>Status</strong> changed from <i>Resolved</i> to <i>Feedback</i></li></ul><p><a href="https://suse.slack.com/archives/C02CANHLANP/p1669103545665849?thread_ts=1669103545.665849&cid=C02CANHLANP" class="external">https://suse.slack.com/archives/C02CANHLANP/p1669103545665849?thread_ts=1669103545.665849&cid=C02CANHLANP</a></p>
openQA Infrastructure - action #120261: tests should try to access worker by WORKER_HOSTNAME FQDN but sometimes get 'worker2' or something auto_review:".*curl.*worker\d+:.*failed at.*":retry size:meowhttps://progress.opensuse.org/issues/120261?journal_id=5768502022-11-22T09:30:06Zmkittlermarius.kittler@suse.com
<ul></ul><p>Looks like <code>WORKER_HOSTNAME</code> was completely empty when <a href="https://openqa.suse.de/tests/10011018" class="external">https://openqa.suse.de/tests/10011018</a> ran (no entry in <a href="https://openqa.suse.de/tests/10011018/file/vars.json" class="external">https://openqa.suse.de/tests/10011018/file/vars.json</a> and variable not logged in <a href="https://openqa.suse.de/tests/10011018/logfile?filename=worker-log.txt" class="external">https://openqa.suse.de/tests/10011018/logfile?filename=worker-log.txt</a>). Likely the worker couldn't determine the FQDN on startup due to some network issue.</p>
openQA Infrastructure - action #120261: tests should try to access worker by WORKER_HOSTNAME FQDN but sometimes get 'worker2' or something auto_review:".*curl.*worker\d+:.*failed at.*":retry size:meowhttps://progress.opensuse.org/issues/120261?journal_id=5768642022-11-22T10:03:37Zmkittlermarius.kittler@suse.com
<ul></ul><p>I had a closer look at the logs¹. Apparently just an old version of the worker package was installed at the time. Only on "Nov 22 06:22:49" logs show "worker address (WORKER_HOSTNAME)" (log message from newer worker version) and not just "worker hostname" (log message from older worker version). If it is just the worker version then we should be good again (and don't need to blame DNS availability).</p>
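<p>The two startup messages can be used to tell which worker version wrote the most recent startup banner in a log. A minimal sketch, assuming only the two message fragments quoted above; the helper name <code>worker_log_flavor</code> is made up, and the "new" sample line is hypothetical because the exact format of the newer message is not shown here.</p>

```shell
#!/bin/sh
# Reads journal lines on stdin and prints "new", "old" or "unknown",
# depending on the last startup banner line seen.
worker_log_flavor() {
    awk '
        /worker address \(WORKER_HOSTNAME\)/ { flavor = "new" }
        /- worker hostname:/                 { flavor = "old" }
        END { print (flavor == "" ? "unknown" : flavor) }'
}

# First line as in the older log format, second line hypothetical
printf '%s\n' \
    'Nov 22 05:22:39 worker2 worker[32113]: - worker hostname: worker2' \
    'Nov 22 06:22:49 worker2 worker[19963]: - worker address (WORKER_HOSTNAME): worker2.oqa.suse.de' \
    | worker_log_flavor

# Possible usage on a worker host:
#   sudo journalctl -u openqa-worker-auto-restart@45 | worker_log_flavor
```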
<hr>
<p>¹</p>
<pre><code>sudo journalctl --since '7 hours ago' -fu openqa-worker-auto-restart@45
…
Nov 22 05:22:38 worker2 worker[30782]: [debug] [pid:30782] Informing openqa.suse.de that we are going offline
Nov 22 05:22:38 worker2 systemd[1]: openqa-worker-auto-restart@45.service: Deactivated successfully.
Nov 22 05:22:38 worker2 systemd[1]: openqa-worker-auto-restart@45.service: Scheduled restart job, restart counter is at 59.
Nov 22 05:22:38 worker2 systemd[1]: Stopped openQA Worker #45.
Nov 22 05:22:38 worker2 systemd[1]: Starting openQA Worker #45...
Nov 22 05:22:38 worker2 systemd[1]: Started openQA Worker #45.
Nov 22 05:22:39 worker2 worker[32113]: [info] [pid:32113] worker 45:
Nov 22 05:22:39 worker2 worker[32113]: - config file: /etc/openqa/workers.ini
Nov 22 05:22:39 worker2 worker[32113]: - worker hostname: worker2
Nov 22 05:22:39 worker2 worker[32113]: - isotovideo version: 33
Nov 22 05:22:39 worker2 worker[32113]: - websocket API version: 1
Nov 22 05:22:39 worker2 worker[32113]: - web UI hosts: openqa.suse.de
Nov 22 05:22:39 worker2 worker[32113]: - class: s390-kvm-sle12,worker2
Nov 22 05:22:39 worker2 worker[32113]: - no cleanup: no
Nov 22 05:22:39 worker2 worker[32113]: - pool directory: /var/lib/openqa/pool/45
Nov 22 05:22:39 worker2 worker[32113]: [info] [pid:32113] Project dir for host openqa.suse.de is /var/lib/openqa/share
Nov 22 05:22:39 worker2 worker[32113]: [info] [pid:32113] Registering with openQA openqa.suse.de
Nov 22 05:22:39 worker2 worker[32113]: [info] [pid:32113] Establishing ws connection via ws://openqa.suse.de/api/v1/ws/2122
Nov 22 05:22:39 worker2 worker[32113]: [info] [pid:32113] Registered and connected via websockets with openQA host openqa.suse.de and worker ID 2122
Nov 22 06:18:04 worker2 worker[32113]: [warn] [pid:32113] Websocket connection to http://openqa.suse.de/api/v1/ws/2122 finished by remote side with code 1006, no reason - trying again in 10 seconds
Nov 22 06:18:14 worker2 worker[32113]: [info] [pid:32113] Registering with openQA openqa.suse.de
Nov 22 06:18:16 worker2 worker[32113]: [info] [pid:32113] Establishing ws connection via ws://openqa.suse.de/api/v1/ws/2122
Nov 22 06:18:18 worker2 worker[32113]: [info] [pid:32113] Registered and connected via websockets with openQA host openqa.suse.de and worker ID 2122
Nov 22 06:22:46 worker2 systemd[1]: Reloading openQA Worker #45...
Nov 22 06:22:46 worker2 worker[32113]: [info] [pid:32113] Received signal HUP
Nov 22 06:22:46 worker2 worker[32113]: [debug] [pid:32113] Informing openqa.suse.de that we are going offline
Nov 22 06:22:46 worker2 systemd[1]: Reloaded openQA Worker #45.
Nov 22 06:22:46 worker2 systemd[1]: openqa-worker-auto-restart@45.service: Deactivated successfully.
Nov 22 06:22:46 worker2 systemd[1]: openqa-worker-auto-restart@45.service: Scheduled restart job, restart counter is at 60.
Nov 22 06:22:46 worker2 systemd[1]: Stopped openQA Worker #45.
Nov 22 06:22:46 worker2 systemd[1]: Starting openQA Worker #45...
Nov 22 06:22:46 worker2 systemd[1]: Started openQA Worker #45.
Nov 22 06:22:49 worker2 worker[19963]: [info] [pid:19963] worker 45:
Nov 22 06:22:49 worker2 worker[19963]: - config file: /etc/openqa/workers.ini
Nov 22 06:22:49 worker2 worker[19963]: - name used to register: worker2
Nov 22 06:22:49 worker2 worker[19963]: - worker address (WORKER_HOSTNAME): worker2.oqa.suse.de
Nov 22 06:22:49 worker2 worker[19963]: - isotovideo version: 34
Nov 22 06:22:49 worker2 worker[19963]: - websocket API version: 1
Nov 22 06:22:49 worker2 worker[19963]: - web UI hosts: openqa.suse.de
Nov 22 06:22:49 worker2 worker[19963]: - class: s390-kvm-sle12,worker2
Nov 22 06:22:49 worker2 worker[19963]: - no cleanup: no
Nov 22 06:22:49 worker2 worker[19963]: - pool directory: /var/lib/openqa/pool/45
Nov 22 06:22:49 worker2 worker[19963]: [info] [pid:19963] Project dir for host openqa.suse.de is /var/lib/openqa/share
Nov 22 06:22:49 worker2 worker[19963]: [info] [pid:19963] Registering with openQA openqa.suse.de
Nov 22 06:22:49 worker2 worker[19963]: [info] [pid:19963] Establishing ws connection via ws://openqa.suse.de/api/v1/ws/2122
Nov 22 06:22:49 worker2 worker[19963]: [info] [pid:19963] Registered and connected via websockets with openQA host openqa.suse.de and worker ID 2122
Nov 22 08:44:35 worker2 worker[19963]: [info] [pid:19963] Received signal HUP
Nov 22 08:44:35 worker2 worker[19963]: [debug] [pid:19963] Informing openqa.suse.de that we are going offline
Nov 22 08:44:35 worker2 systemd[1]: Reloading openQA Worker #45...
Nov 22 08:44:36 worker2 systemd[1]: openqa-worker-auto-restart@45.service: Deactivated successfully.
Nov 22 08:44:36 worker2 systemd[1]: Reloaded openQA Worker #45.
Nov 22 08:44:36 worker2 systemd[1]: openqa-worker-auto-restart@45.service: Scheduled restart job, restart counter is at 61.
Nov 22 08:44:36 worker2 systemd[1]: Stopped openQA Worker #45.
</code></pre> openQA Infrastructure - action #120261: tests should try to access worker by WORKER_HOSTNAME FQDN but sometimes get 'worker2' or something auto_review:".*curl.*worker\d+:.*failed at.*":retry size:meowhttps://progress.opensuse.org/issues/120261?journal_id=5768712022-11-22T10:05:22Zlivdywanliv.dywan@suse.com
<ul><li><strong>Related to</strong> <i><a class="issue tracker-4 status-3 priority-6 priority-high2 closed child" href="/issues/120441">action #120441</a>: OSD parallel jobs failed with "get_job_autoinst_url: No worker info for job xxx available" size:meow</i> added</li></ul> openQA Infrastructure - action #120261: tests should try to access worker by WORKER_HOSTNAME FQDN but sometimes get 'worker2' or something auto_review:".*curl.*worker\d+:.*failed at.*":retry size:meowhttps://progress.opensuse.org/issues/120261?journal_id=5768772022-11-22T10:09:34Zmkittlermarius.kittler@suse.com
<ul></ul><p>I've just added the worker back to salt and applied the salt states immediately. I suppose I should have updated all packages as well. It is fine now anyway because a deployment has been done (automatically) in the meantime.</p>
<p>Since that was the only problem, we can likely give the FQDN setup another chance (before falling back to using IPs).</p>
openQA Infrastructure - action #120261: tests should try to access worker by WORKER_HOSTNAME FQDN but sometimes get 'worker2' or something auto_review:".*curl.*worker\d+:.*failed at.*":retry size:meowhttps://progress.opensuse.org/issues/120261?journal_id=5768832022-11-22T10:14:46Zmkittlermarius.kittler@suse.com
<ul><li><strong>Related to</strong> deleted (<i><a class="issue tracker-4 status-3 priority-6 priority-high2 closed child" href="/issues/120441">action #120441</a>: OSD parallel jobs failed with "get_job_autoinst_url: No worker info for job xxx available" size:meow</i>)</li></ul> openQA Infrastructure - action #120261: tests should try to access worker by WORKER_HOSTNAME FQDN but sometimes get 'worker2' or something auto_review:".*curl.*worker\d+:.*failed at.*":retry size:meowhttps://progress.opensuse.org/issues/120261?journal_id=5773502022-11-23T12:39:19Zmkittlermarius.kittler@suse.com
<ul><li><strong>Status</strong> changed from <i>Feedback</i> to <i>Resolved</i></li></ul><p>I found no further jobs so I suppose it is good after the update.</p>
<p>(I actually found <a href="https://openqa.suse.de/tests/10013250" class="external">https://openqa.suse.de/tests/10013250</a> but it had the correct <code>WORKER_HOSTNAME</code> setting and failed because the SUT could not resolve the FQDN, see <a href="https://openqa.suse.de/tests/10013250#step/journal_check/8" class="external">https://openqa.suse.de/tests/10013250#step/journal_check/8</a>. I removed the wrong ticket reference on that job. It has already been restarted, ran on worker2 again and also succeeded. Not sure why it couldn't resolve the <em>correct</em> FQDN in this case but since the FQDN itself was correct I'd ignore this for now unless we see it more frequently.)</p>
openQA Infrastructure - action #120261: tests should try to access worker by WORKER_HOSTNAME FQDN but sometimes get 'worker2' or something auto_review:".*curl.*worker\d+:.*failed at.*":retry size:meowhttps://progress.opensuse.org/issues/120261?journal_id=5786382022-11-28T12:16:01ZMDouchamartin.doucha@suse.com
<ul><li><strong>Status</strong> changed from <i>Resolved</i> to <i>Workable</i></li></ul><p>New cases have popped up:<br>
<a href="https://openqa.suse.de/tests/10041747#step/svirt_upload_assets/9" class="external">https://openqa.suse.de/tests/10041747#step/svirt_upload_assets/9</a><br>
<a href="https://openqa.suse.de/tests/10041749#step/install_ltp/429" class="external">https://openqa.suse.de/tests/10041749#step/install_ltp/429</a><br>
<a href="https://openqa.suse.de/tests/10042208#step/boot_ltp/149" class="external">https://openqa.suse.de/tests/10042208#step/boot_ltp/149</a></p>
openQA Infrastructure - action #120261: tests should try to access worker by WORKER_HOSTNAME FQDN but sometimes get 'worker2' or something auto_review:".*curl.*worker\d+:.*failed at.*":retry size:meowhttps://progress.opensuse.org/issues/120261?journal_id=5787942022-11-28T17:22:46Zmkittlermarius.kittler@suse.com
<ul></ul><p>The hostname resolution was indeed not working:</p>
<pre><code>martchus@worker2:~> sudo journalctl -u openqa-worker-auto-restart@* | grep 'worker address'
Nov 23 21:41:33 worker2 worker[22686]: - worker address (WORKER_HOSTNAME): worker2.oqa.suse.de
Nov 23 21:46:18 worker2 worker[25007]: - worker address (WORKER_HOSTNAME): worker2.oqa.suse.de
Nov 23 21:46:26 worker2 worker[25034]: - worker address (WORKER_HOSTNAME): worker2.oqa.suse.de
…
Nov 27 03:12:10 worker2 worker[5125]: - worker address (WORKER_HOSTNAME): worker2.oqa.suse.de
Nov 27 03:21:02 worker2 worker[5937]: - worker address (WORKER_HOSTNAME): worker2.oqa.suse.de
Nov 27 03:22:44 worker2 worker[6012]: - worker address (WORKER_HOSTNAME): worker2.oqa.suse.de
Nov 27 03:37:52 worker2 worker[3420]: - worker address (WORKER_HOSTNAME): worker2
Nov 27 03:37:52 worker2 worker[3419]: - worker address (WORKER_HOSTNAME): worker2
Nov 27 03:37:52 worker2 worker[3458]: - worker address (WORKER_HOSTNAME): worker2
Nov 27 03:37:52 worker2 worker[3425]: - worker address (WORKER_HOSTNAME): worker2
Nov 27 03:37:52 worker2 worker[3407]: - worker address (WORKER_HOSTNAME): worker2
Nov 27 03:37:52 worker2 worker[3437]: - worker address (WORKER_HOSTNAME): worker2
Nov 27 03:37:52 worker2 worker[3417]: - worker address (WORKER_HOSTNAME): worker2
Nov 27 03:37:52 worker2 worker[3442]: - worker address (WORKER_HOSTNAME): worker2
Nov 27 03:37:52 worker2 worker[3433]: - worker address (WORKER_HOSTNAME): worker2
Nov 27 03:37:52 worker2 worker[3451]: - worker address (WORKER_HOSTNAME): worker2
Nov 27 03:37:52 worker2 worker[3418]: - worker address (WORKER_HOSTNAME): worker2
Nov 27 03:37:52 worker2 worker[3409]: - worker address (WORKER_HOSTNAME): worker2
Nov 27 03:37:52 worker2 worker[3440]: - worker address (WORKER_HOSTNAME): worker2
Nov 27 03:37:52 worker2 worker[3428]: - worker address (WORKER_HOSTNAME): worker2
Nov 27 03:37:52 worker2 worker[3415]: - worker address (WORKER_HOSTNAME): worker2
Nov 27 03:37:52 worker2 worker[3445]: - worker address (WORKER_HOSTNAME): worker2
Nov 27 03:37:52 worker2 worker[3444]: - worker address (WORKER_HOSTNAME): worker2
Nov 27 03:37:52 worker2 worker[3461]: - worker address (WORKER_HOSTNAME): worker2
Nov 27 03:37:52 worker2 worker[3411]: - worker address (WORKER_HOSTNAME): worker2
Nov 27 03:37:52 worker2 worker[3446]: - worker address (WORKER_HOSTNAME): worker2
Nov 27 03:37:52 worker2 worker[3452]: - worker address (WORKER_HOSTNAME): worker2
Nov 27 03:37:52 worker2 worker[3464]: - worker address (WORKER_HOSTNAME): worker2
Nov 27 03:37:52 worker2 worker[3416]: - worker address (WORKER_HOSTNAME): worker2
Nov 27 03:37:52 worker2 worker[3435]: - worker address (WORKER_HOSTNAME): worker2
Nov 27 03:37:52 worker2 worker[3449]: - worker address (WORKER_HOSTNAME): worker2
Nov 27 03:37:52 worker2 worker[3450]: - worker address (WORKER_HOSTNAME): worker2
Nov 27 03:37:52 worker2 worker[3453]: - worker address (WORKER_HOSTNAME): worker2
Nov 27 03:37:52 worker2 worker[3436]: - worker address (WORKER_HOSTNAME): worker2
Nov 27 03:37:52 worker2 worker[3423]: - worker address (WORKER_HOSTNAME): worker2
Nov 27 03:37:52 worker2 worker[3413]: - worker address (WORKER_HOSTNAME): worker2
Nov 27 03:37:52 worker2 worker[3410]: - worker address (WORKER_HOSTNAME): worker2
Nov 27 03:37:52 worker2 worker[3448]: - worker address (WORKER_HOSTNAME): worker2
Nov 27 03:37:52 worker2 worker[3439]: - worker address (WORKER_HOSTNAME): worker2
Nov 27 03:37:52 worker2 worker[3421]: - worker address (WORKER_HOSTNAME): worker2
Nov 27 03:37:52 worker2 worker[3426]: - worker address (WORKER_HOSTNAME): worker2
Nov 27 03:37:52 worker2 worker[3424]: - worker address (WORKER_HOSTNAME): worker2
Nov 27 03:37:52 worker2 worker[3414]: - worker address (WORKER_HOSTNAME): worker2
Nov 27 03:37:52 worker2 worker[3443]: - worker address (WORKER_HOSTNAME): worker2
Nov 27 03:37:52 worker2 worker[3438]: - worker address (WORKER_HOSTNAME): worker2
Nov 27 03:37:52 worker2 worker[3463]: - worker address (WORKER_HOSTNAME): worker2
Nov 27 03:37:52 worker2 worker[3447]: - worker address (WORKER_HOSTNAME): worker2
Nov 27 03:37:52 worker2 worker[3459]: - worker address (WORKER_HOSTNAME): worker2
Nov 27 03:37:52 worker2 worker[3427]: - worker address (WORKER_HOSTNAME): worker2
Nov 27 03:37:52 worker2 worker[3465]: - worker address (WORKER_HOSTNAME): worker2
Nov 27 03:37:52 worker2 worker[3434]: - worker address (WORKER_HOSTNAME): worker2
Nov 27 03:37:52 worker2 worker[3429]: - worker address (WORKER_HOSTNAME): worker2
Nov 27 03:37:52 worker2 worker[3406]: - worker address (WORKER_HOSTNAME): worker2
Nov 27 03:37:52 worker2 worker[3412]: - worker address (WORKER_HOSTNAME): worker2
Nov 27 03:37:52 worker2 worker[3432]: - worker address (WORKER_HOSTNAME): worker2
Nov 27 03:37:52 worker2 worker[3431]: - worker address (WORKER_HOSTNAME): worker2
Nov 27 03:37:52 worker2 worker[3422]: - worker address (WORKER_HOSTNAME): worker2
Nov 27 03:37:52 worker2 worker[3430]: - worker address (WORKER_HOSTNAME): worker2
Nov 27 03:37:52 worker2 worker[3408]: - worker address (WORKER_HOSTNAME): worker2
Nov 27 10:47:12 worker2 worker[15958]: - worker address (WORKER_HOSTNAME): worker2.oqa.suse.de
Nov 27 10:56:55 worker2 worker[16865]: - worker address (WORKER_HOSTNAME): worker2.oqa.suse.de
Nov 27 12:29:22 worker2 worker[28039]: - worker address (WORKER_HOSTNAME): worker2.oqa.suse.de
Nov 27 12:30:07 worker2 worker[28192]: - worker address (WORKER_HOSTNAME): worker2.oqa.suse.de
…
</code></pre>
<hr>
<p>Since the FQDN can apparently not be reliably detected (at least shortly after booting) I see multiple options:</p>
<ol>
<li>We go back to using IP addresses again
<ol>
<li>by setting <code>WORKER_HOSTNAME</code> again via salt to an IPv4 address.</li>
<li>by adding a fallback to the worker's auto-detection so it'll at least assign an IP if it cannot determine the FQDN.</li>
<li>by setting <code>WORKER_HOSTNAME_FALLBACK</code> via salt to an IPv4 address to fix the flaw of 1.2 pointed out in my next comment.</li>
</ol></li>
<li>We let the worker consider itself "broken" if the FQDN isn't set or cannot be auto-detected. We can simply re-use the already existing mechanisms which also made a re-try for the auto-detection easy to implement.</li>
</ol>
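<p>Option 2 could look roughly like the following sketch. It is written in Python for illustration only and is not the actual worker code: the function names <code>determine_worker_address</code> and <code>worker_state</code>, the state names and the "a usable FQDN contains a dot" check are all assumptions made for this example.</p>
<pre><code class="python">import socket


def determine_worker_address(configured=None, detect=socket.getfqdn):
    """Return the address tests should use, or None if it is undetermined."""
    if configured:
        # An explicitly configured WORKER_HOSTNAME always wins.
        return configured
    detected = detect()
    # Assumption: a name without a dot (e.g. "worker2") is not a usable FQDN.
    if "." in detected:
        return detected
    return None


def worker_state(address):
    """Hypothetical state names; an undetermined address means 'broken'."""
    # A broken worker does not accept jobs, so the detection can simply be
    # re-tried later instead of running jobs with a wrong address.
    return "broken" if address is None else "idle"
</code></pre>
<p>With such a check a worker started before the network is fully up would stay "broken" until a later re-try yields a full name, instead of handing "worker2" to jobs.</p>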
openQA Infrastructure - action #120261: tests should try to access worker by WORKER_HOSTNAME FQDN but sometimes get 'worker2' or something auto_review:".*curl.*worker\d+:.*failed at.*":retry size:meowhttps://progress.opensuse.org/issues/120261?journal_id=5787972022-11-28T17:40:54Zokurzokurz@suse.com
<ul></ul><p>1.2 sounds like a good idea </p>
<p>How likely do you consider the hypothesis that this is a worker2 specific problem? Did we see any problem on grenache-1 that also runs a lot of non-qemu workers?</p>
openQA Infrastructure - action #120261: tests should try to access worker by WORKER_HOSTNAME FQDN but sometimes get 'worker2' or something auto_review:".*curl.*worker\d+:.*failed at.*":retry size:meowhttps://progress.opensuse.org/issues/120261?journal_id=5791032022-11-29T09:56:03Zmkittlermarius.kittler@suse.com
<ul><li><strong>Due date</strong> changed from <i>2022-11-25</i> to <i>2022-12-09</i></li></ul><p>There's more work to be done so I'm bumping the due date.</p>
<hr>
<blockquote>
<p>1.2 sounds like a good idea</p>
</blockquote>
<p>When thinking about it further myself I must say that it likely won't cut it. If the worker is started early enough not even an IP address might be assigned yet. With 1.1 we likely won't run into the problem because Salt supposedly only becomes "active" once the network is up (at least we haven't seen any problems putting the IP address there via Salt in the past).</p>
<p>So it's likely either 1.1 or 2 after all. I've also added the new option 1.3 but I find it rather messy if this gets spread over two different places (Salt and the worker code).</p>
openQA Infrastructure - action #120261: tests should try to access worker by WORKER_HOSTNAME FQDN but sometimes get 'worker2' or something auto_review:".*curl.*worker\d+:.*failed at.*":retry size:meowhttps://progress.opensuse.org/issues/120261?journal_id=5795382022-11-30T09:23:56Zmkittlermarius.kittler@suse.com
<ul><li><strong>Status</strong> changed from <i>Workable</i> to <i>Feedback</i></li></ul><p>Draft PR for option 2: <a href="https://github.com/os-autoinst/openQA/pull/4935" class="external">https://github.com/os-autoinst/openQA/pull/4935</a></p>
openQA Infrastructure - action #120261: tests should try to access worker by WORKER_HOSTNAME FQDN but sometimes get 'worker2' or something auto_review:".*curl.*worker\d+:.*failed at.*":retry size:meowhttps://progress.opensuse.org/issues/120261?journal_id=5815842022-12-06T11:36:25Zmkittlermarius.kittler@suse.com
<ul><li><strong>Subject</strong> changed from <i>tests should try to access worker by WORKER_HOSTNAME FQDN but sometimes get 'worker2' or something auto_review:".*curl.*worker\d+.*failed at.*":retry size:meow</i> to <i>tests should try to access worker by WORKER_HOSTNAME FQDN but sometimes get 'worker2' or something auto_review:".*curl.*worker\d+:.*failed at.*":retry size:meow</i></li></ul><p>The PR has just been merged.</p>
<p>I've also re-conducted AT-1 to see how bad the situation is (without the PR being deployed so far). Some jobs were labeled but only one of them was actually about this issue (<a href="https://openqa.suse.de/tests/10088274" class="external">https://openqa.suse.de/tests/10088274</a>). The other jobs were labeled incorrectly, due to the regex of this ticket being too generic, wrong manual labeling, or wrong automatic takeover (the failure happened in the very same set of test modules so that's expected). I removed the wrong labels and pinged Nan Zhang about the wrong labeling in the chat.</p>
<p>That's where this ticket's regex wrongly matched: <a href="https://openqa.suse.de/tests/10090329#step/suseconnect_scc/17" class="external">https://openqa.suse.de/tests/10090329#step/suseconnect_scc/17</a> - Such jobs contain log lines like</p>
<pre><code>[2022-12-06T02:48:45.338104+01:00] [info] ::: basetest::runtest: # Test died: command 'curl --form upload=@/tmp/full_journal.log --form upname=journal_check-full_journal.log http://worker2.oqa.suse.de:20113/VCc6rYDvppJx12SJ/uploadlog/full_journal.log' failed at /usr/lib/os-autoinst/testapi.pm line 922.
testapi::assert_script_run("curl --form upload=\@/tmp/full_journal.log --form upname=journ"..., 90) called at /usr/lib/os-autoinst/testapi.pm line 2191
testapi::upload_logs("/tmp/full_journal.log") called at sle/tests/console/journal_check.pm line 91
</code></pre>
<p>So I made the regex more specific to include the colon.</p>
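<p>The effect of the added colon can be checked with a small sketch. Both patterns are taken from the old and new ticket subject and the first log line is shortened from the one quoted above. The second line, which uses the short hostname, is constructed for illustration. Python's <code>re</code> module is used here only as an assumption, the auto-review tooling may use a different regex engine.</p>
<pre><code class="python">import re

# Patterns from the ticket subject before and after the change.
OLD_PATTERN = re.compile(r".*curl.*worker\d+.*failed at.*")
NEW_PATTERN = re.compile(r".*curl.*worker\d+:.*failed at.*")

# Log line with a correct FQDN, shortened from the job quoted above.
FQDN_LINE = (
    "Test died: command 'curl --form upload=@/tmp/full_journal.log "
    "http://worker2.oqa.suse.de:20113/VCc6rYDvppJx12SJ/uploadlog/full_journal.log' "
    "failed at /usr/lib/os-autoinst/testapi.pm line 922."
)

# Constructed line showing the actual problem: only the short hostname is used.
SHORT_LINE = FQDN_LINE.replace("worker2.oqa.suse.de", "worker2")


def matches(pattern, line):
    """Return True if the pattern is found anywhere in the line."""
    return pattern.search(line) is not None
</code></pre>
<p>The old pattern matches both lines, while the new one only matches the line where the port directly follows the short hostname ("worker2:20113").</p>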
openQA Infrastructure - action #120261: tests should try to access worker by WORKER_HOSTNAME FQDN but sometimes get 'worker2' or something auto_review:".*curl.*worker\d+:.*failed at.*":retry size:meowhttps://progress.opensuse.org/issues/120261?journal_id=5816172022-12-06T12:53:40Zokurzokurz@suse.com
<ul><li><strong>Related to</strong> <i><a class="issue tracker-4 status-3 priority-5 priority-high3 closed behind-schedule" href="/issues/121567">action #121567</a>: test fails in test_running</i> added</li></ul> openQA Infrastructure - action #120261: tests should try to access worker by WORKER_HOSTNAME FQDN but sometimes get 'worker2' or something auto_review:".*curl.*worker\d+:.*failed at.*":retry size:meowhttps://progress.opensuse.org/issues/120261?journal_id=5822442022-12-07T10:14:15Zokurzokurz@suse.com
<ul></ul><p>As discussed after the daily on 2022-12-07, these are the refreshed ACs we want to cover:</p>
<ol>
<li>It is possible for users to override WORKER_HOSTNAME with a specific FQDN or IP address</li>
<li>For a properly resolvable FQDN WORKER_HOSTNAME does not need to be configured and developer mode still works, i.e. we don't need to specify WORKER_HOSTNAME by default in OSD salt pillars because our FQDNs are sane</li>
<li>For a single-host webui+worker setup the worker by default connects to 'localhost' (i.e. default for the HOST variable) and also WORKER_HOSTNAME does not need to be specified even if FQDN is not fully resolvable, i.e. openQA-in-openQA tests work with an "incomplete qemu usermode network"</li>
<li>A combination of localhost+non-local in HOST does not need to auto-resolve correctly for localhost, i.e. users need to ensure a proper FQDN resolution that also works locally (2.) or just need to specify WORKER_HOSTNAME manually</li>
<li>REJECT In case auto-resolution does not work (reg. 2.) openQA jobs still start to run and fail accordingly with a message, e.g. that log upload fails -> We can only ensure 2. if we never start jobs without a valid WORKER_HOSTNAME <em>unless</em> we explicitly "opt-out"</li>
</ol>
openQA Infrastructure - action #120261: tests should try to access worker by WORKER_HOSTNAME FQDN but sometimes get 'worker2' or something auto_review:".*curl.*worker\d+:.*failed at.*":retry size:meowhttps://progress.opensuse.org/issues/120261?journal_id=5824572022-12-07T15:09:30Zmkittlermarius.kittler@suse.com
<ul></ul><p>"Avoid running jobs with undetermined worker address" has been deployed today so this issue should be gone. There's also <a href="https://github.com/os-autoinst/openQA/pull/4949" class="external">https://github.com/os-autoinst/openQA/pull/4949</a> which hasn't been deployed yet to fix the regression handled in <a class="issue tracker-4 status-3 priority-5 priority-high3 closed behind-schedule" title="action: test fails in test_running (Resolved)" href="https://progress.opensuse.org/issues/121567">#121567</a> according to <a class="issue tracker-4 status-3 priority-6 priority-high2 closed child" title="action: tests should try to access worker by WORKER_HOSTNAME FQDN but sometimes get 'worker2' or somethin... (Resolved)" href="https://progress.opensuse.org/issues/120261#note-42">#120261#note-42</a>.</p>
openQA Infrastructure - action #120261: tests should try to access worker by WORKER_HOSTNAME FQDN but sometimes get 'worker2' or something auto_review:".*curl.*worker\d+:.*failed at.*":retry size:meowhttps://progress.opensuse.org/issues/120261?journal_id=5824632022-12-07T15:22:24Zmkittlermarius.kittler@suse.com
<ul></ul><p>There was another occurrence (<a href="https://openqa.suse.de/tests/10086991" class="external">https://openqa.suse.de/tests/10086991</a>) but it ran before the change had been deployed.</p>
<p>Unfortunately 9 other jobs were labeled via automatic carry over. The jobs the label had been carried over from were either already wrongly labeled manually (e.g. <a href="https://openqa.suse.de/tests/10086966#comments" class="external">https://openqa.suse.de/tests/10086966#comments</a>) or the carry over's inability to distinguish different problems led to the carry over (e.g. <a href="https://openqa.suse.de/tests/10097687#comments" class="external">https://openqa.suse.de/tests/10097687#comments</a>). Maybe we can at least improve the latter. For now I've just removed the wrong labels.</p>
openQA Infrastructure - action #120261: tests should try to access worker by WORKER_HOSTNAME FQDN but sometimes get 'worker2' or something auto_review:".*curl.*worker\d+:.*failed at.*":retry size:meowhttps://progress.opensuse.org/issues/120261?journal_id=5824962022-12-07T18:15:51Zokurzokurz@suse.com
<ul></ul><p><a class="issue tracker-4 status-3 priority-5 priority-high3 closed behind-schedule" title="action: test fails in test_running (Resolved)" href="https://progress.opensuse.org/issues/121567">#121567</a> has been resolved. IMHO we can resolve here as well?</p>
openQA Infrastructure - action #120261: tests should try to access worker by WORKER_HOSTNAME FQDN but sometimes get 'worker2' or something auto_review:".*curl.*worker\d+:.*failed at.*":retry size:meowhttps://progress.opensuse.org/issues/120261?journal_id=5830362022-12-09T10:31:41Zokurzokurz@suse.com
<ul><li><strong>Due date</strong> changed from <i>2022-12-09</i> to <i>2022-12-16</i></li><li><strong>Status</strong> changed from <i>Feedback</i> to <i>Workable</i></li></ul><pre><code>$ openqa-query-for-job-label poo#120261
10111972|2022-12-09 06:52:49|done|failed|fips_install_lvm_encrypt_separate_boot||worker2
10110342|2022-12-09 04:03:30|done|failed|fips_install_lvm_encrypt_separate_boot||worker2
10105119|2022-12-08 04:27:58|done|failed|fips_install_lvm_encrypt_separate_boot||worker2
10104295|2022-12-08 03:42:15|done|failed|fips_install_lvm_encrypt_separate_boot||worker2
10086982|2022-12-07 16:41:23|done|failed|xfstests_xfs-generic||worker2
10086976|2022-12-07 16:17:13|done|failed|xfstests_btrfs-generic||worker2
10097638|2022-12-07 08:31:21|done|failed|fips_install_lvm_encrypt_separate_boot||worker2
10086991|2022-12-06 01:31:41|done|failed|jeos-base+sdk+desktop||worker2
10089358|2022-12-06 01:30:37|done|failed|jeos-containers-podman:investigate:last_good_tests:4adccec6a14a722ab92fc25f72b7a5a6a9d2bfd5||worker2
10086996|2022-12-06 01:22:02|done|failed|jeos-extratest||worker2
</code></pre>
<p>Looking at the most recent one I find <a href="https://openqa.suse.de/tests/10077880/file/vars.json" class="external">https://openqa.suse.de/tests/10077880/file/vars.json</a> referencing WORKER_HOSTNAME=worker2 which again is wrong and problematic.</p>
<p>Right now the worker <a href="https://openqa.suse.de/admin/workers/2111" class="external">https://openqa.suse.de/admin/workers/2111</a> shows again the correct "worker2.oqa.suse.de".</p>
<p>I logged in to worker2 over ssh, checked <code>rpm -q --changelog openQA-worker</code> and found that the latest PR <a href="https://github.com/os-autoinst/openQA/pull/4949" class="external">https://github.com/os-autoinst/openQA/pull/4949</a> is also included.</p>
<p>So apparently the later PR did not resolve the issues. We need to focus on this task next week.</p>
openQA Infrastructure - action #120261: tests should try to access worker by WORKER_HOSTNAME FQDN but sometimes get 'worker2' or something auto_review:".*curl.*worker\d+:.*failed at.*":retry size:meowhttps://progress.opensuse.org/issues/120261?journal_id=5831292022-12-09T12:52:41Zokurzokurz@suse.com
<ul><li><strong>Status</strong> changed from <i>Workable</i> to <i>Feedback</i></li></ul><p>I recommend hardcoding the FQDN in the worker config of worker2 only, assuming that we only have special problems there:<br>
<a href="https://gitlab.suse.de/openqa/salt-pillars-openqa/-/merge_requests/474" class="external">https://gitlab.suse.de/openqa/salt-pillars-openqa/-/merge_requests/474</a></p>
openQA Infrastructure - action #120261: tests should try to access worker by WORKER_HOSTNAME FQDN but sometimes get 'worker2' or something auto_review:".*curl.*worker\d+:.*failed at.*":retry size:meowhttps://progress.opensuse.org/issues/120261?journal_id=5839512022-12-12T08:21:58Zmkittlermarius.kittler@suse.com
<ul></ul><p>Your investigation doesn't make sense. The problematic job <a href="https://openqa.suse.de/tests/10077880" class="external">https://openqa.suse.de/tests/10077880</a> ran 8 days ago. At that point <a href="https://github.com/os-autoinst/openQA/pull/4935" class="external">https://github.com/os-autoinst/openQA/pull/4935</a> (which is the actually relevant PR) hadn't been merged yet as it was only merged 6 days ago.</p>
<p>I'll nevertheless check labeled jobs again myself to see whether there are any relevant ones.</p>
openQA Infrastructure - action #120261: tests should try to access worker by WORKER_HOSTNAME FQDN but sometimes get 'worker2' or something auto_review:".*curl.*worker\d+:.*failed at.*":retry size:meowhttps://progress.opensuse.org/issues/120261?journal_id=5839602022-12-12T08:37:28Zmkittlermarius.kittler@suse.com
<ul></ul><p>I only found jobs that were wrongly labeled manually, jobs with a wrong carry over, or jobs that ran before <a href="https://github.com/os-autoinst/openQA/pull/4935" class="external">https://github.com/os-autoinst/openQA/pull/4935</a> had been merged.</p>
openQA Infrastructure - action #120261: tests should try to access worker by WORKER_HOSTNAME FQDN but sometimes get 'worker2' or something auto_review:".*curl.*worker\d+:.*failed at.*":retry size:meowhttps://progress.opensuse.org/issues/120261?journal_id=5839662022-12-12T08:40:23Zmkittlermarius.kittler@suse.com
<ul></ul><p>MR to remove workaround: <a href="https://gitlab.suse.de/openqa/salt-pillars-openqa/-/merge_requests/475" class="external">https://gitlab.suse.de/openqa/salt-pillars-openqa/-/merge_requests/475</a></p>
openQA Infrastructure - action #120261: tests should try to access worker by WORKER_HOSTNAME FQDN but sometimes get 'worker2' or something auto_review:".*curl.*worker\d+:.*failed at.*":retry size:meowhttps://progress.opensuse.org/issues/120261?journal_id=5859582022-12-19T10:33:52Zokurzokurz@suse.com
<ul><li><strong>Due date</strong> deleted (<del><i>2022-12-16</i></del>)</li><li><strong>Status</strong> changed from <i>Feedback</i> to <i>Resolved</i></li></ul><p>mkittler wrote:</p>
<blockquote>
<p>MR to remove workaround: <a href="https://gitlab.suse.de/openqa/salt-pillars-openqa/-/merge_requests/475" class="external">https://gitlab.suse.de/openqa/salt-pillars-openqa/-/merge_requests/475</a></p>
</blockquote>
<p>MR was merged. I am not aware of further problems. It seems we are good.</p>
openQA Infrastructure - action #120261: tests should try to access worker by WORKER_HOSTNAME FQDN but sometimes get 'worker2' or something auto_review:".*curl.*worker\d+:.*failed at.*":retry size:meowhttps://progress.opensuse.org/issues/120261?journal_id=5870682022-12-22T01:48:38Zyosunyosun@suse.com
<ul></ul><p>Hello, I have a question, may I expect this change enable in osd SP5 test build61.1 or later? because my s390x case still fails in curl download log part in build61.1, any idea?<br>
<a href="https://openqa.suse.de/tests/10185510#step/generate_report/2" class="external">https://openqa.suse.de/tests/10185510#step/generate_report/2</a></p>
openQA Infrastructure - action #120261: tests should try to access worker by WORKER_HOSTNAME FQDN but sometimes get 'worker2' or something auto_review:".*curl.*worker\d+:.*failed at.*":retry size:meowhttps://progress.opensuse.org/issues/120261?journal_id=5872872022-12-22T13:05:34Zokurzokurz@suse.com
<ul></ul><p>yosun wrote:</p>
<blockquote>
<p>Hello, I have a question, may I expect this change enable in osd SP5 test build61.1 or later?</p>
</blockquote>
<p>yes, the change is effective since <a href="https://gitlab.suse.de/openqa/salt-pillars-openqa/-/merge_requests/475" class="external">https://gitlab.suse.de/openqa/salt-pillars-openqa/-/merge_requests/475</a> was merged and the linked pipeline job passed.</p>
<blockquote>
<p>because my s390x case still fails in curl download log part in build61.1, any idea?<br>
<a href="https://openqa.suse.de/tests/10185510#step/generate_report/2" class="external">https://openqa.suse.de/tests/10185510#step/generate_report/2</a></p>
</blockquote>
<p><a href="https://openqa.suse.de/tests/10185510#step/generate_report/2" class="external">https://openqa.suse.de/tests/10185510#step/generate_report/2</a> shows</p>
<pre><code># wait_serial expected: "curl -O http://worker2.oqa.suse.de:20403/poqC6RHW8bUb6NVl/files/status.log; cat status.log > /opt/status.log; echo ZlKoJ-\$?-"
# Result:
</code></pre>
<p>so the correct FQDN "worker2.oqa.suse.de" is referenced. Hence the problem must be something different. I suggest you investigate in more detail and create a specific ticket for that</p>
openQA Infrastructure - action #120261: tests should try to access worker by WORKER_HOSTNAME FQDN but sometimes get 'worker2' or something auto_review:".*curl.*worker\d+:.*failed at.*":retry size:meowhttps://progress.opensuse.org/issues/120261?journal_id=5958102023-01-21T00:30:48Zopenqa_reviewopenqa-review@suse.de
<ul><li><strong>Status</strong> changed from <i>Resolved</i> to <i>Feedback</i></li></ul><p>This is an autogenerated message for openQA integration by the openqa_review script:</p>
<p>This bug is still referenced in a failing openQA test: xfstests_xfs<br>
<a href="https://openqa.suse.de/tests/10339785#step/generate_report/1" class="external">https://openqa.suse.de/tests/10339785#step/generate_report/1</a></p>
<p>To prevent further reminder comments one of the following options should be followed:</p>
<ol>
<li>The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted</li>
<li>The openQA job group is moved to "Released" or "EOL" (End-of-Life)</li>
<li>The bugref in the openQA scenario is removed or replaced, e.g. <code>label:wontfix:boo1234</code></li>
</ol>
<p>Expect the next reminder at the earliest in 28 days if nothing changes in this ticket.</p>
openQA Infrastructure - action #120261: tests should try to access worker by WORKER_HOSTNAME FQDN but sometimes get 'worker2' or something auto_review:".*curl.*worker\d+:.*failed at.*":retry size:meowhttps://progress.opensuse.org/issues/120261?journal_id=5960052023-01-23T10:02:04Zmkittlermarius.kittler@suse.com
<ul><li><strong>Status</strong> changed from <i>Feedback</i> to <i>Resolved</i></li></ul><p>This was just a takeover from a two-month-old job with a wrong manual amendment. The failing job has <code>Test died: command 'curl -O http://worker2.oqa.suse.de:20303</code> so the FQDN <em>is</em> correct (and the download error happened despite that). I'm closing this issue again and have pinged the user.</p>
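<p>To illustrate why such a job does not belong to this ticket, here is a small sketch that applies the auto_review expression from the ticket title to two log lines. Both sample lines are hypothetical (only the host part follows the real logs), and treating auto_review as a plain regex search over a single line is an assumption of this sketch.</p>

```python
import re

# Pattern copied from the auto_review expression in the ticket title.
AUTO_REVIEW = re.compile(r".*curl.*worker\d+:.*failed at.*")

# Hypothetical log lines; file name and line number are invented for the example.
short_host = (
    "Test died: command 'curl -O http://worker2:20303/files/status.log' "
    "failed at example.pm line 1"
)
fqdn_host = (
    "Test died: command 'curl -O http://worker2.oqa.suse.de:20303/files/status.log' "
    "failed at example.pm line 1"
)

# "worker\d+:" requires the colon right after the digits, so only the
# short hostname form can match; the FQDN has ".oqa.suse.de" in between.
matches_short = bool(AUTO_REVIEW.search(short_host))
matches_fqdn = bool(AUTO_REVIEW.search(fqdn_host))
print("short hostname matches:", matches_short)
print("FQDN matches:", matches_fqdn)
```

<p>Under these assumptions only the short hostname form is picked up by the pattern, so a job failing with the FQDN in the curl command can only carry the label through a take-over or a manual reference.</p>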
openQA Infrastructure - action #120261: tests should try to access worker by WORKER_HOSTNAME FQDN but sometimes get 'worker2' or something auto_review:".*curl.*worker\d+:.*failed at.*":retry size:meowhttps://progress.opensuse.org/issues/120261?journal_id=5985932023-02-07T00:36:41Zopenqa_reviewopenqa-review@suse.de
<ul><li><strong>Status</strong> changed from <i>Resolved</i> to <i>Feedback</i></li></ul><p>This is an autogenerated message for openQA integration by the openqa_review script:</p>
<p>This bug is still referenced in a failing openQA test: jeos-apparmor@svirt-vmware65<br>
<a href="https://openqa.suse.de/tests/10436752#step/journal_check/1" class="external">https://openqa.suse.de/tests/10436752#step/journal_check/1</a></p>
<p>To prevent further reminder comments one of the following options should be followed:</p>
<ol>
<li>The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted</li>
<li>The openQA job group is moved to "Released" or "EOL" (End-of-Life)</li>
<li>The bugref in the openQA scenario is removed or replaced, e.g. <code>label:wontfix:boo1234</code></li>
</ol>
<p>Expect the next reminder at the earliest in 28 days if nothing changes in this ticket.</p>
openQA Infrastructure - action #120261: tests should try to access worker by WORKER_HOSTNAME FQDN but sometimes get 'worker2' or something auto_review:".*curl.*worker\d+:.*failed at.*":retry size:meowhttps://progress.opensuse.org/issues/120261?journal_id=5988872023-02-07T10:01:38Zokurzokurz@suse.com
<ul><li><strong>Status</strong> changed from <i>Feedback</i> to <i>Resolved</i></li></ul><p>I called</p>
<pre><code>$ interval='90 day' openqa-query-for-job-label poo#120261
10436780|2023-02-06 17:04:32|done|failed|jeos-main||worker2
10353788|2023-01-20 14:54:15|done|failed|jeos-apparmor||worker2
10353481|2023-01-20 14:22:53|done|failed|jeos-main||worker2
10165585|2022-12-14 10:24:43|done|failed|jeos-containers-podman||worker2
10086991|2022-12-06 01:31:41|done|failed|jeos-base+sdk+desktop||worker2
10086996|2022-12-06 01:22:02|done|failed|jeos-extratest||worker2
10088170|2022-12-06 01:20:01|done|failed|online_upgrade_sles15sp4_hyperv||worker2
10088145|2022-12-06 01:05:30|done|failed|textmode_svirt||worker2
10089379|2022-12-06 00:54:27|done|failed|jeos-base+sdk+desktop||worker2
10087142|2022-12-06 00:54:20|done|failed|jeos-main||worker2
</code></pre>
<p>and removed the label from the previously mentioned job and from the more recent ones in the list above. I also checked <a href="https://openqa.io.suse.de/openqa-review/openqa_suse_de_status.html" class="external">https://openqa.io.suse.de/openqa-review/openqa_suse_de_status.html</a> for references, which showed only two jobs from the list above. We found only automatic take-overs and no manually referenced jobs, so I would say that for now it is enough to remove those references from the jobs, assuming that there won't be more take-overs.</p>
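<p>Picking the "more recent" jobs from such output can be sketched as follows. The column meanings (job id, timestamp, state, result, test name) are inferred from the rows above rather than from the script's documentation, the cut-off date is arbitrary, and <code>parse_jobs</code> is a hypothetical helper.</p>

```python
from datetime import datetime
from typing import NamedTuple


class LabeledJob(NamedTuple):
    job_id: int
    timestamp: datetime
    state: str
    result: str
    test: str


def parse_jobs(output: str) -> list[LabeledJob]:
    """Parse pipe-separated rows; the column meaning is an assumption."""
    jobs = []
    for line in output.strip().splitlines():
        fields = line.strip().split("|")
        if len(fields) < 5 or not fields[0].isdigit():
            continue  # skip anything that is not a job row
        when = datetime.strptime(fields[1], "%Y-%m-%d %H:%M:%S")
        jobs.append(LabeledJob(int(fields[0]), when, fields[2], fields[3], fields[4]))
    return jobs


# A few rows copied from the query output above.
output = """
10436780|2023-02-06 17:04:32|done|failed|jeos-main||worker2
10353788|2023-01-20 14:54:15|done|failed|jeos-apparmor||worker2
10353481|2023-01-20 14:22:53|done|failed|jeos-main||worker2
10165585|2022-12-14 10:24:43|done|failed|jeos-containers-podman||worker2
"""

jobs = parse_jobs(output)
cutoff = datetime(2023, 1, 1)  # arbitrary cut-off for "more recent"
recent = [job for job in jobs if job.timestamp >= cutoff]
for job in recent:
    print(f"https://openqa.suse.de/tests/{job.job_id}", job.test)
```

<p>The printed URLs are the jobs whose comments would then be checked by hand; removing the label itself is not covered by this sketch.</p>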