openSUSE Project Management Tool
openQA Project - action #80910: openQA workers read updated configuration, e.g. WORKER_CLASS, whenever they are ready to pick up new jobshttps://progress.opensuse.org/issues/80910?journal_id=3572482020-12-09T13:42:06Zlivdywanliv.dywan@suse.com
<ul><li><strong>Assignee</strong> set to <i>livdywan</i></li></ul> openQA Project - action #80910: openQA workers read updated configuration, e.g. WORKER_CLASS, whenever they are ready to pick up new jobshttps://progress.opensuse.org/issues/80910?journal_id=3573182020-12-09T16:20:42Zokurzokurz@suse.com
<ul></ul><p>@cdywan please be aware of <a class="issue tracker-6 status-3 priority-4 priority-default closed child parent" title="coordination: [epic] Continuous deployment (package upgrade or config update) without interrupting currently ru... (Resolved)" href="https://progress.opensuse.org/issues/80908#note-3">#80908#note-3</a> . We might be able to solve this story as well as the generic one to restart for upgrade by "terminate after executing all currently assigned jobs" and letting systemd restart and hence implicitly also load config again.</p>
openQA Project - action #80910: openQA workers read updated configuration, e.g. WORKER_CLASS, whenever they are ready to pick up new jobshttps://progress.opensuse.org/issues/80910?journal_id=3574282020-12-10T04:07:50Zopenqa_reviewopenqa-review@suse.de
<ul><li><strong>Due date</strong> set to <i>2020-12-24</i></li></ul><p>Setting due date based on mean cycle time of SUSE QE Tools</p>
openQA Project - action #80910: openQA workers read updated configuration, e.g. WORKER_CLASS, whenever they are ready to pick up new jobshttps://progress.opensuse.org/issues/80910?journal_id=3575542020-12-10T10:44:13Zlivdywanliv.dywan@suse.com
<ul></ul><p>okurz wrote:</p>
<blockquote>
<p>@cdywan please be aware of <a class="issue tracker-6 status-3 priority-4 priority-default closed child parent" title="coordination: [epic] Continuous deployment (package upgrade or config update) without interrupting currently ru... (Resolved)" href="https://progress.opensuse.org/issues/80908#note-3">#80908#note-3</a> . We might be able to solve this story as well as the generic one to restart for upgrade by "terminate after executing all currently assigned jobs" and letting systemd restart and hence implicitly also load config again.</p>
</blockquote>
<p>Ack. We should probably have a ticket for that then since that's the epic.</p>
openQA Project - action #80910: openQA workers read updated configuration, e.g. WORKER_CLASS, whenever they are ready to pick up new jobshttps://progress.opensuse.org/issues/80910?journal_id=3578002020-12-11T13:21:23Zlivdywanliv.dywan@suse.com
<ul><li><strong>Assignee</strong> deleted (<del><i>livdywan</i></del>)</li></ul><p>cdywan wrote:</p>
<blockquote>
<p>okurz wrote:</p>
<blockquote>
<p>@cdywan please be aware of <a class="issue tracker-6 status-3 priority-4 priority-default closed child parent" title="coordination: [epic] Continuous deployment (package upgrade or config update) without interrupting currently ru... (Resolved)" href="https://progress.opensuse.org/issues/80908#note-3">#80908#note-3</a> . We might be able to solve this story as well as the generic one to restart for upgrade by "terminate after executing all currently assigned jobs" and letting systemd restart and hence implicitly also load config again.</p>
</blockquote>
<p>Ack. We should probably have a ticket for that then since that's the epic.</p>
</blockquote>
<p>Indeed we have <a class="issue tracker-4 status-3 priority-4 priority-default closed child" title="action: terminate worker process after executing all currently assigned jobs based on config/env variable (Resolved)" href="https://progress.opensuse.org/issues/80986">#80986</a> now.</p>
openQA Project - action #80910: openQA workers read updated configuration, e.g. WORKER_CLASS, whenever they are ready to pick up new jobshttps://progress.opensuse.org/issues/80910?journal_id=3585542020-12-16T11:09:28Zokurzokurz@suse.com
<ul><li><strong>Status</strong> changed from <i>Workable</i> to <i>Blocked</i></li><li><strong>Assignee</strong> set to <i>mkittler</i></li></ul><p><a class="user active user-mention" href="https://progress.opensuse.org/users/22072">@mkittler</a> please check again after <a class="issue tracker-4 status-3 priority-4 priority-default closed child" title="action: terminate worker process after executing all currently assigned jobs based on config/env variable (Resolved)" href="https://progress.opensuse.org/issues/80986">#80986</a> if this is implicitly done or not needed anymore.</p>
openQA Project - action #80910: openQA workers read updated configuration, e.g. WORKER_CLASS, whenever they are ready to pick up new jobshttps://progress.opensuse.org/issues/80910?journal_id=3587502020-12-17T12:50:58Zmkittlermarius.kittler@suse.com
<ul><li><strong>Status</strong> changed from <i>Blocked</i> to <i>In Progress</i></li></ul><p>The phrasing "whenever they are ready to pick up new jobs" really calls for a solution which covers idling workers as well like my PR <a href="https://github.com/os-autoinst/openQA/pull/3641" class="external">https://github.com/os-autoinst/openQA/pull/3641</a>. Considering I've already created a PR it is no longer blocked.</p>
openQA Project - action #80910: openQA workers read updated configuration, e.g. WORKER_CLASS, whenever they are ready to pick up new jobshttps://progress.opensuse.org/issues/80910?journal_id=3599542020-12-28T15:06:36Zlivdywanliv.dywan@suse.com
<ul><li><strong>Due date</strong> changed from <i>2020-12-24</i> to <i>2021-01-08</i></li></ul><p>PR has <em>not</em> been merged <em>yet</em>. Updating <em>due date</em> to account for holidays.</p>
<p>See also <a href="https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/423/diffs" class="external">!423</a> for the related salt change (not covered by this ticket)</p>
openQA Project - action #80910: openQA workers read updated configuration, e.g. WORKER_CLASS, whenever they are ready to pick up new jobshttps://progress.opensuse.org/issues/80910?journal_id=3728692021-01-11T11:05:52Zokurzokurz@suse.com
<ul></ul><p>We (mkittler, cdywan, okurz) discussed this together. The mentioned PR is merged, but the caveat is that "one more job" will still run with the old config. <a class="user active user-mention" href="https://progress.opensuse.org/users/22072">@mkittler</a> will change the code accordingly.</p>
openQA Project - action #80910: openQA workers read updated configuration, e.g. WORKER_CLASS, whenever they are ready to pick up new jobshttps://progress.opensuse.org/issues/80910?journal_id=3728872021-01-11T12:18:13Zmkittlermarius.kittler@suse.com
<ul></ul><p>I'd like to note that this even leaves one more caveat considering the example given in AC1:</p>
<blockquote>
<p>AC1: openQA workers read updated configuration, e.g. WORKER_CLASS, whenever they are ready to pick up new jobs</p>
</blockquote>
<p>One further complication is that to actually apply the new <code>WORKER_CLASS</code> the worker does not only need to re-read the config but also to re-register with its web UIs. That should be easy but I'd like to mention it because the code changes will be a little bit more than expected.</p>
<p>There's another problem when it comes to triggering the re-reading. I would have implemented this feature so that the worker re-reads the config before starting a new job so the new job will definitely run under the new config to avoid the "one more job" problem mentioned in <a class="user active user-mention" href="https://progress.opensuse.org/users/17668">@okurz</a>'s previous comment. That is usually fine except for settings which don't affect the job itself but the scheduling of further jobs like the <code>WORKER_CLASS</code>. Even if I also make the worker reload the config after finishing its current jobs the new <code>WORKER_CLASS</code> would still not be applied when a worker is idling. I could implement a periodic check for re-reading the config file while idling to solve this. I could also use Inotify using <code>Linux::Perl::inotify</code> or <code>Linux::Inotify2</code>. (None of them are currently in TW.)</p>
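<p>For illustration only, the "periodic check while idling" idea could look roughly like the sketch below. It is written in Python rather than in the worker's own code base, the class name <code>ReloadingConfig</code> and the <code>[global]</code> section lookup are assumptions made for the example, and it only covers re-reading the file. Re-registering with the web UIs after a changed <code>WORKER_CLASS</code> is not shown.</p>

```python
import configparser
import os


class ReloadingConfig:
    """Hypothetical helper: re-read an INI file whenever its mtime changes."""

    def __init__(self, path):
        self.path = path
        self._mtime = None
        self.parser = configparser.ConfigParser()
        self.refresh()

    def refresh(self):
        """Reload the file if it changed; return True when a reload happened."""
        mtime = os.stat(self.path).st_mtime_ns
        if mtime == self._mtime:
            return False
        parser = configparser.ConfigParser()
        parser.read(self.path)
        self.parser, self._mtime = parser, mtime
        return True

    def worker_class(self):
        # Assumes the setting lives in a [global] section; adjust as needed.
        return self.parser.get("global", "WORKER_CLASS", fallback="")
```

<p>An idle worker would call <code>refresh()</code> from a timer and again right before accepting a job, and would re-register only when <code>refresh()</code> returns true.</p>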
openQA Project - action #80910: openQA workers read updated configuration, e.g. WORKER_CLASS, whenever they are ready to pick up new jobshttps://progress.opensuse.org/issues/80910?journal_id=3728962021-01-11T12:54:50Zlivdywanliv.dywan@suse.com
<ul></ul><p>mkittler wrote:</p>
<blockquote>
<p>I'd like to note that this even leaves one more caveat considering the example given in AC1:</p>
<blockquote>
<p>AC1: openQA workers read updated configuration, e.g. WORKER_CLASS, whenever they are ready to pick up new jobs</p>
</blockquote>
<p>There's another problem when it comes to triggering the re-reading. I would have implemented this feature so that the worker re-reads the config before starting a new job so the new job will definitely run under the new config to avoid the "one more job" problem mentioned in <a class="user active user-mention" href="https://progress.opensuse.org/users/17668">@okurz</a>'s previous comment. That is usually fine except for settings which don't affect the job itself but the scheduling of further jobs like the <code>WORKER_CLASS</code>. Even if I also make the worker reload the config after finishing its current jobs the new <code>WORKER_CLASS</code> would still not be applied when a worker is idling. I could implement a periodic check for re-reading the config file while idling to solve this. I could also use Inotify using <code>Linux::Perl::inotify</code> or <code>Linux::Inotify2</code>. (None of them are currently in TW.)</p>
</blockquote>
<p><code>read updated configuration, e.g. WORKER_CLASS, whenever they are ready to pick up new jobs</code> could be taken to mean read the file when there's a new job. No file monitoring required.</p>
openQA Project - action #80910: openQA workers read updated configuration, e.g. WORKER_CLASS, whenever they are ready to pick up new jobshttps://progress.opensuse.org/issues/80910?journal_id=3729172021-01-11T14:03:18Zlivdywanliv.dywan@suse.com
<ul></ul><p>The systemd way suggested: <a href="https://www.freedesktop.org/software/systemd/man/systemd.path.html#PathExists=" class="external">PathModified</a> which could emit a signal to terminate the worker (and <em>also</em> read the config as a side effect).</p>
openQA Project - action #80910: openQA workers read updated configuration, e.g. WORKER_CLASS, whenever they are ready to pick up new jobshttps://progress.opensuse.org/issues/80910?journal_id=3729352021-01-11T15:20:52Zmkittlermarius.kittler@suse.com
<ul></ul><blockquote>
<p>read the file when there's a new job</p>
</blockquote>
<p>As discussed this simplification is not possible. It would mean a job for the old <code>WORKER_CLASS</code> runs with the new configuration.</p>
<p>We also came to further conclusions:</p>
<ol>
<li>Using systemd to fire the signal is a conceivable idea.</li>
<li>We should avoid the changes mentioned in <a href="#note-10">#note-10</a> because it leads too far. The worker should not have to deal with watching its config file.</li>
<li>For now I am just going to enable <code>OPENQA_WORKER_TERMINATE_AFTER_JOBS_DONE</code> by switching to <code>openqa-worker-auto-restart@.service</code> to terminate and restart the worker after each job assignment has been processed (see <a href="https://github.com/os-autoinst/openQA/pull/3636" class="external">https://github.com/os-autoinst/openQA/pull/3636</a>). This means we will always run one more job with the old configuration (per idle worker slot) but it is likely better than nothing. This likely does not fulfill AC1 as it was originally meant but should be ok for now.</li>
</ol>
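<p>To make point 3 concrete, here is a rough Python sketch of the control flow, not the actual worker code. The function names, the callbacks and the rule that any value other than empty or <code>0</code> enables the behaviour are assumptions for illustration. Only the variable name <code>OPENQA_WORKER_TERMINATE_AFTER_JOBS_DONE</code> is taken from the comment above.</p>

```python
import os


def terminate_after_jobs_done(env=os.environ):
    # Assumption: any value other than "" or "0" enables the behaviour.
    return env.get("OPENQA_WORKER_TERMINATE_AFTER_JOBS_DONE", "") not in ("", "0")


def run_worker(assignments, load_config, run_job, env=os.environ):
    """Process job assignments; return how many assignments were processed."""
    config = load_config()  # read once per process lifetime
    processed = 0
    for jobs in assignments:  # one assignment may contain several jobs
        for job in jobs:
            run_job(job, config)
        processed += 1
        if terminate_after_jobs_done(env):
            # Exit here; a supervisor such as systemd restarts the process,
            # and the new process reads the current configuration.
            break
    return processed
```

<p>Because the config is read once at startup, an idle worker still runs its next assignment with the old settings before it exits, which is the "one more job" caveat.</p>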
openQA Project - action #80910: openQA workers read updated configuration, e.g. WORKER_CLASS, whenever they are ready to pick up new jobshttps://progress.opensuse.org/issues/80910?journal_id=3729382021-01-11T15:32:34Zmkittlermarius.kittler@suse.com
<ul></ul><p>I've just done 3. from the previous comment on the o3 worker <code>imagetester</code> to see how it works in production.</p>
<p>To revert in case of problems, just use:</p>
<pre><code>systemctl disable --now openqa-worker-auto-restart@{1,2}
systemctl enable --now openqa-worker@{1,2}
</code></pre> openQA Project - action #80910: openQA workers read updated configuration, e.g. WORKER_CLASS, whenever they are ready to pick up new jobshttps://progress.opensuse.org/issues/80910?journal_id=3741492021-01-12T10:00:37Zmkittlermarius.kittler@suse.com
<ul></ul><p>PR showing what the systemd way would look like: <a href="https://github.com/os-autoinst/openQA/pull/3666" class="external">https://github.com/os-autoinst/openQA/pull/3666</a></p>
openQA Project - action #80910: openQA workers read updated configuration, e.g. WORKER_CLASS, whenever they are ready to pick up new jobshttps://progress.opensuse.org/issues/80910?journal_id=3773062021-01-12T12:20:32Zmkittlermarius.kittler@suse.com
<ul></ul><p><code>openqa-worker-auto-restart@.service</code> seems to work on <code>imagetester</code>. Note that the <code>openqa-worker.target</code> was not enabled/started on this machine but it is on other machines and interferes with enabling other worker services (see <a href="https://progress.opensuse.org/issues/80986#note-13" class="external">https://progress.opensuse.org/issues/80986#note-13</a>). So it must be disabled before starting any custom service files in place of <code>openqa-worker@.service</code>. This disables the automatic restart of the service on package updates as well.</p>
openQA Project - action #80910: openQA workers read updated configuration, e.g. WORKER_CLASS, whenever they are ready to pick up new jobshttps://progress.opensuse.org/issues/80910?journal_id=3773302021-01-12T13:33:43Zlivdywanliv.dywan@suse.com
<ul><li><strong>Due date</strong> changed from <i>2021-01-08</i> to <i>2021-01-15</i></li></ul> openQA Project - action #80910: openQA workers read updated configuration, e.g. WORKER_CLASS, whenever they are ready to pick up new jobshttps://progress.opensuse.org/issues/80910?journal_id=3775342021-01-13T10:06:43Zmkittlermarius.kittler@suse.com
<ul></ul><p>By the way, here's a draft of what a switch to <code>openqa-worker-auto-restart@.service</code> would look like in salt: <a href="https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/426/diffs" class="external">https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/426/diffs</a></p>
<p>With this SR we still need to disable and stop the regular <code>openqa-worker@.service</code> manually (e.g. using <code>salt -C 'G@roles:worker' cmd.run 'systemctl disable --now openqa-worker@*'</code>). For the least interruption this could be done before the next deployment.</p>
<p>The users requesting this were only talking about OSD and we update o3 more frequently anyways. So I'm not going to apply usage of <code>openqa-worker-auto-restart@.service</code> on all o3 workers for now (unless someone says we want that).</p>
openQA Project - action #80910: openQA workers read updated configuration, e.g. WORKER_CLASS, whenever they are ready to pick up new jobshttps://progress.opensuse.org/issues/80910?journal_id=3797872021-01-21T19:17:04Zlivdywanliv.dywan@suse.com
<ul><li><strong>Blocked by</strong> <i><a class="issue tracker-4 status-3 priority-4 priority-default closed child" href="/issues/80986">action #80986</a>: terminate worker process after executing all currently assigned jobs based on config/env variable</i> added</li></ul> openQA Project - action #80910: openQA workers read updated configuration, e.g. WORKER_CLASS, whenever they are ready to pick up new jobshttps://progress.opensuse.org/issues/80910?journal_id=3797892021-01-21T19:17:07Zlivdywanliv.dywan@suse.com
<ul><li><strong>Due date</strong> deleted (<del><i>2021-01-15</i></del>)</li></ul><p>I guess we're still waiting on <a class="issue tracker-4 status-3 priority-4 priority-default closed child" title="action: terminate worker process after executing all currently assigned jobs based on config/env variable (Resolved)" href="https://progress.opensuse.org/issues/80986">#80986</a></p>
openQA Project - action #80910: openQA workers read updated configuration, e.g. WORKER_CLASS, whenever they are ready to pick up new jobshttps://progress.opensuse.org/issues/80910?journal_id=3800412021-01-23T04:33:19Zopenqa_reviewopenqa-review@suse.de
<ul><li><strong>Due date</strong> set to <i>2021-02-06</i></li></ul><p>Setting due date based on mean cycle time of SUSE QE Tools</p>
openQA Project - action #80910: openQA workers read updated configuration, e.g. WORKER_CLASS, whenever they are ready to pick up new jobshttps://progress.opensuse.org/issues/80910?journal_id=3801852021-01-25T17:39:09Zlivdywanliv.dywan@suse.com
<ul><li><strong>Status</strong> changed from <i>In Progress</i> to <i>Blocked</i></li></ul> openQA Project - action #80910: openQA workers read updated configuration, e.g. WORKER_CLASS, whenever they are ready to pick up new jobshttps://progress.opensuse.org/issues/80910?journal_id=3805772021-01-27T13:18:02Zmkittlermarius.kittler@suse.com
<ul><li><strong>Status</strong> changed from <i>Blocked</i> to <i>In Progress</i></li></ul><ul>
<li>PR: <a href="https://github.com/os-autoinst/openQA/pull/3635" class="external">https://github.com/os-autoinst/openQA/pull/3635</a></li>
<li>SR to enable it on OSD: <a href="https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/438" class="external">https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/438</a></li>
</ul>
openQA Project - action #80910: openQA workers read updated configuration, e.g. WORKER_CLASS, whenever they are ready to pick up new jobshttps://progress.opensuse.org/issues/80910?journal_id=3817292021-02-05T09:47:23Zokurzokurz@suse.com
<ul></ul><p>Both merged. Your salt change might help with <a class="issue tracker-4 status-3 priority-3 priority-lowest closed" title="action: ensure openqa worker instances are disabled and stopped when &quot;numofworkers&quot; is reduced in salt pi... (Resolved)" href="https://progress.opensuse.org/issues/63874">#63874</a> as well, although I still hope for an easier solution than the sed+awk+tr magic. I hoped we could rely on openqa-worker.target or something.</p>
openQA Project - action #80910: openQA workers read updated configuration, e.g. WORKER_CLASS, whenever they are ready to pick up new jobshttps://progress.opensuse.org/issues/80910?journal_id=3821592021-02-10T11:07:41Zmkittlermarius.kittler@suse.com
<ul></ul><p>It does not fully work in production yet. I edited <code>workers.ini</code> twice on <code>openqaworker-arm-1</code>; each time <code>Received signal HUP</code> was logged immediately and the worker itself behaved as expected:</p>
<pre><code>Feb 10 11:00:40 openqaworker-arm-1 worker[3961]: [debug] [pid:3961] REST-API call: POST http://openqa.suse.de/api/v1/jobs/5442303/status
Feb 10 11:00:50 openqaworker-arm-1 worker[3961]: [debug] [pid:3961] REST-API call: POST http://openqa.suse.de/api/v1/jobs/5442303/status
Feb 10 11:01:00 openqaworker-arm-1 worker[3961]: [debug] [pid:3961] REST-API call: POST http://openqa.suse.de/api/v1/jobs/5442303/status
Feb 10 11:01:10 openqaworker-arm-1 worker[3961]: [debug] [pid:3961] REST-API call: POST http://openqa.suse.de/api/v1/jobs/5442303/status
Feb 10 11:01:15 openqaworker-arm-1 worker[3961]: [info] [pid:3961] Received signal HUP
Feb 10 11:01:20 openqaworker-arm-1 worker[3961]: [debug] [pid:3961] REST-API call: POST http://openqa.suse.de/api/v1/jobs/5442303/status
Feb 10 11:01:21 openqaworker-arm-1 worker[3961]: [info] [pid:3961] Isotovideo exit status: 0
Feb 10 11:01:21 openqaworker-arm-1 worker[3961]: [debug] [pid:3961] Stopping job 5442303 from openqa.suse.de: 05442303-sle-15-SP3-Full-aarch64-Build145.1-migration_offline_sle15sp2_ha_alpha_node02@aarch64 - reason: done
Feb 10 11:01:21 openqaworker-arm-1 worker[3961]: [debug] [pid:3961] REST-API call: POST http://openqa.suse.de/api/v1/jobs/5442303/status
Feb 10 11:01:21 openqaworker-arm-1 worker[3961]: [info] [pid:3961] +++ worker notes +++
Feb 10 11:01:21 openqaworker-arm-1 worker[3961]: [info] [pid:3961] End time: 2021-02-10 11:01:21
Feb 10 11:01:21 openqaworker-arm-1 worker[3961]: [info] [pid:3961] Result: done
Feb 10 11:01:21 openqaworker-arm-1 worker[3961]: [info] [pid:35402] Uploading vars.json
Feb 10 11:01:21 openqaworker-arm-1 worker[3961]: [info] [pid:35402] Uploading autoinst-log.txt
Feb 10 11:01:21 openqaworker-arm-1 worker[3961]: [info] [pid:35402] Uploading worker-log.txt
Feb 10 11:01:21 openqaworker-arm-1 worker[3961]: [info] [pid:35402] Uploading serial0.txt
Feb 10 11:01:21 openqaworker-arm-1 worker[3961]: [info] [pid:35402] Uploading video_time.vtt
Feb 10 11:01:22 openqaworker-arm-1 worker[3961]: [info] [pid:35402] Uploading serial_terminal.txt
Feb 10 11:01:22 openqaworker-arm-1 worker[3961]: [debug] [pid:3961] Setting job 5442303 to done
Feb 10 11:01:22 openqaworker-arm-1 worker[3961]: [debug] [pid:3961] Unable to read result-patch_sle.json: Can't open file "/var/lib/openqa/pool/1/testresults/result-patch_sle.json": No such file or directory at /usr/share/openqa/script/../lib/OpenQA/Worker/Job.pm line 1152.
Feb 10 11:01:22 openqaworker-arm-1 worker[3961]: [debug] [pid:3961] REST-API call: POST http://openqa.suse.de/api/v1/jobs/5442303/status
Feb 10 11:01:22 openqaworker-arm-1 worker[3961]: [debug] [pid:3961] REST-API call: POST http://openqa.suse.de/api/v1/jobs/5442303/set_done?reason=isotovideo+done%3A+isotovideo+received+signal+HUP&worker_id=476
Feb 10 11:01:22 openqaworker-arm-1 worker[3961]: [debug] [pid:3961] Job 5442303 from openqa.suse.de finished - reason: done
Feb 10 11:01:22 openqaworker-arm-1 worker[3961]: [debug] [pid:3961] Informing openqa.suse.de that we are going offline
Feb 10 11:01:23 openqaworker-arm-1 systemd[1]: openqa-worker-auto-restart@1.service: Service RestartSec=100ms expired, scheduling restart.
Feb 10 11:01:23 openqaworker-arm-1 systemd[1]: Stopped openQA Worker #1.
Feb 10 11:01:23 openqaworker-arm-1 systemd[1]: Starting openQA Worker #1...
Feb 10 11:01:23 openqaworker-arm-1 systemd[1]: Started openQA Worker #1.
Feb 10 11:01:26 openqaworker-arm-1 worker[35409]: [info] [pid:35409] worker 1:
Feb 10 11:01:26 openqaworker-arm-1 worker[35409]: - config file: /etc/openqa/workers.ini
Feb 10 11:01:26 openqaworker-arm-1 worker[35409]: - worker hostname: openqaworker-arm-1
Feb 10 11:01:26 openqaworker-arm-1 worker[35409]: - isotovideo version: 20
Feb 10 11:01:26 openqaworker-arm-1 worker[35409]: - websocket API version: 1
Feb 10 11:01:26 openqaworker-arm-1 worker[35409]: - web UI hosts: openqa.suse.de
Feb 10 11:01:26 openqaworker-arm-1 worker[35409]: - class: qemu_aarch64,qemu_aarch64_slow_worker,tap,openqaworker-arm-1
Feb 10 11:01:26 openqaworker-arm-1 worker[35409]: - no cleanup: no
Feb 10 11:01:26 openqaworker-arm-1 worker[35409]: - pool directory: /var/lib/openqa/pool/1
Feb 10 11:01:26 openqaworker-arm-1 worker[35409]: [info] [pid:35409] CACHE: caching is enabled, setting up /var/lib/openqa/cache/openqa.suse.de
Feb 10 11:01:26 openqaworker-arm-1 worker[35409]: [info] [pid:35409] Project dir for host openqa.suse.de is /var/lib/openqa/share
Feb 10 11:01:26 openqaworker-arm-1 worker[35409]: [info] [pid:35409] Registering with openQA openqa.suse.de
Feb 10 11:01:26 openqaworker-arm-1 worker[35409]: [info] [pid:35409] Establishing ws connection via ws://openqa.suse.de/api/v1/ws/476
Feb 10 11:01:26 openqaworker-arm-1 worker[35409]: [info] [pid:35409] Registered and connected via websockets with openQA host openqa.suse.de and worker ID 476
Feb 10 11:01:36 openqaworker-arm-1 worker[35409]: [info] [pid:35409] Received signal HUP
Feb 10 11:01:36 openqaworker-arm-1 worker[35409]: [debug] [pid:35409] Informing openqa.suse.de that we are going offline
Feb 10 11:01:37 openqaworker-arm-1 systemd[1]: openqa-worker-auto-restart@1.service: Service RestartSec=100ms expired, scheduling restart.
Feb 10 11:01:37 openqaworker-arm-1 systemd[1]: Stopped openQA Worker #1.
Feb 10 11:01:37 openqaworker-arm-1 systemd[1]: Starting openQA Worker #1...
Feb 10 11:01:37 openqaworker-arm-1 systemd[1]: Started openQA Worker #1.
Feb 10 11:01:40 openqaworker-arm-1 worker[35450]: [info] [pid:35450] worker 1:
Feb 10 11:01:40 openqaworker-arm-1 worker[35450]: - config file: /etc/openqa/workers.ini
Feb 10 11:01:40 openqaworker-arm-1 worker[35450]: - worker hostname: openqaworker-arm-1
Feb 10 11:01:40 openqaworker-arm-1 worker[35450]: - isotovideo version: 20
Feb 10 11:01:40 openqaworker-arm-1 worker[35450]: - websocket API version: 1
Feb 10 11:01:40 openqaworker-arm-1 worker[35450]: - web UI hosts: openqa.suse.de
Feb 10 11:01:40 openqaworker-arm-1 worker[35450]: - class: qemu_aarch64,qemu_aarch64_slow_worker,tap,openqaworker-arm-1
Feb 10 11:01:40 openqaworker-arm-1 worker[35450]: - no cleanup: no
Feb 10 11:01:40 openqaworker-arm-1 worker[35450]: - pool directory: /var/lib/openqa/pool/1
Feb 10 11:01:40 openqaworker-arm-1 worker[35450]: [info] [pid:35450] CACHE: caching is enabled, setting up /var/lib/openqa/cache/openqa.suse.de
Feb 10 11:01:40 openqaworker-arm-1 worker[35450]: [info] [pid:35450] Project dir for host openqa.suse.de is /var/lib/openqa/share
Feb 10 11:01:40 openqaworker-arm-1 worker[35450]: [info] [pid:35450] Registering with openQA openqa.suse.de
</code></pre>
<p>However, the job failed with</p>
<pre><code>Result: failed finished 5 minutes ago ( 12:14 minutes )
Reason: isotovideo done: isotovideo received signal HUP
</code></pre>
<p>(<a href="https://openqa.suse.de/tests/5442303">https://openqa.suse.de/tests/5442303</a>)</p>
<p>because systemd apparently sends the signal to the entire process group (and not just the worker process).</p>
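<p>The difference between signalling one process and signalling a whole group can be reproduced in isolation. The following Python sketch is an illustration under assumptions, not the worker code: a sleeping child stands in for isotovideo, and the function and variable names are made up for the example. Sending SIGHUP only to the main PID leaves the child running, whereas a group-wide signal would reach it as well.</p>

```python
import os
import signal
import subprocess
import sys
import time

stop_requested = False


def on_hup(signum, frame):
    # Only record the request; the current job is allowed to finish.
    global stop_requested
    stop_requested = True


def run_demo():
    signal.signal(signal.SIGHUP, on_hup)
    # Stand-in for isotovideo: a child in the same process group as the worker.
    child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(30)"])
    try:
        # Signal the main process only. Using
        # os.killpg(os.getpgrp(), signal.SIGHUP) here would also hit the child.
        os.kill(os.getpid(), signal.SIGHUP)
        time.sleep(0.2)
        child_alive = child.poll() is None
    finally:
        child.terminate()
        child.wait()
    return stop_requested, child_alive
```

<p>Calling <code>run_demo()</code> should report that the stop request was recorded while the child kept running, which is the behaviour wanted for the worker and its job.</p>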
openQA Project - action #80910: openQA workers read updated configuration, e.g. WORKER_CLASS, whenever they are ready to pick up new jobshttps://progress.opensuse.org/issues/80910?journal_id=3821782021-02-10T16:18:16Zmkittlermarius.kittler@suse.com
<ul></ul><p>This PR should fix it: <a href="https://github.com/os-autoinst/openQA/pull/3716" class="external">https://github.com/os-autoinst/openQA/pull/3716</a></p>
openQA Project - action #80910: openQA workers read updated configuration, e.g. WORKER_CLASS, whenever they are ready to pick up new jobshttps://progress.opensuse.org/issues/80910?journal_id=3825552021-02-15T09:03:14Zokurzokurz@suse.com
<ul></ul><p>From this morning, after the weekly automatic reboot:</p>
<pre><code># salt -l error --no-color -C 'G@roles:worker' cmd.run "systemctl list-units --failed | grep service"
openqaworker2.suse.de:
openqaworker8.suse.de:
* openqa-worker@12.service loaded failed failed openQA Worker #12
openqaworker5.suse.de:
openqaworker9.suse.de:
QA-Power8-5-kvm.qa.suse.de:
openqaworker6.suse.de:
QA-Power8-4-kvm.qa.suse.de:
malbec.arch.suse.de:
grenache-1.qa.suse.de:
openqaworker10.suse.de:
openqaworker-arm-1.suse.de:
* openqa-worker@2.service loaded failed failed openQA Worker #2
openqaworker-arm-3.suse.de:
* openqa-worker@10.service loaded failed failed openQA Worker #10
* openqa-worker@13.service loaded failed failed openQA Worker #13
* openqa-worker@16.service loaded failed failed openQA Worker #16
* openqa-worker@19.service loaded failed failed openQA Worker #19
* openqa-worker@5.service loaded failed failed openQA Worker #5
* openqa-worker@7.service loaded failed failed openQA Worker #7
* openqa-worker@9.service loaded failed failed openQA Worker #9
openqaworker-arm-2.suse.de:
* openqa-worker@17.service loaded failed failed openQA Worker #17
* openqa-worker@2.service loaded failed failed openQA Worker #2
* openqa-worker@6.service loaded failed failed openQA Worker #6
* openqa-worker@9.service loaded failed failed openQA Worker #9
powerqaworker-qam-1:
Minion did not return. [Not connected]
openqaworker13.suse.de:
Minion did not return. [Not connected]
openqaworker3.suse.de:
Minion did not return. [Not connected]
ERROR: Minions returned with non-zero exit code
</code></pre>
<p>It seems there are multiple instances of openqa-worker@ that should not be running, because openqa-worker-auto-restart@ should be active instead. Can you check/explain/fix that?</p>
openQA Project - action #80910: openQA workers read updated configuration, e.g. WORKER_CLASS, whenever they are ready to pick up new jobshttps://progress.opensuse.org/issues/80910?journal_id=3827772021-02-15T15:59:34Zlivdywanliv.dywan@suse.com
<ul><li><strong>Due date</strong> changed from <i>2021-02-06</i> to <i>2021-02-19</i></li></ul><p><a class="user active user-mention" href="https://progress.opensuse.org/users/22072">@mkittler</a> Can you please check the issues mentioned?</p>
openQA Project - action #80910: openQA workers read updated configuration, e.g. WORKER_CLASS, whenever they are ready to pick up new jobshttps://progress.opensuse.org/issues/80910?journal_id=3828012021-02-15T17:17:32Zmkittlermarius.kittler@suse.com
<ul></ul><p>I've already checked and the fix (<a href="https://gitlab.suse.de/openqa/salt-states-openqa/-/commit/0ca8d841188f0353b8b48f90f03e9735675accd3" class="external">https://gitlab.suse.de/openqa/salt-states-openqa/-/commit/0ca8d841188f0353b8b48f90f03e9735675accd3</a>) has been merged.</p>
openQA Project - action #80910: openQA workers read updated configuration, e.g. WORKER_CLASS, whenever they are ready to pick up new jobshttps://progress.opensuse.org/issues/80910?journal_id=3847482021-02-22T15:30:40Zlivdywanliv.dywan@suse.com
<ul><li><strong>Due date</strong> changed from <i>2021-02-19</i> to <i>2021-02-26</i></li></ul><p>Should this be in <em>Feedback</em>? </p>
openQA Project - action #80910: openQA workers read updated configuration, e.g. WORKER_CLASS, whenever they are ready to pick up new jobshttps://progress.opensuse.org/issues/80910?journal_id=3849732021-02-23T10:21:57Zmkittlermarius.kittler@suse.com
<ul></ul><p>I can continue working on this after the next OSD deployment. Until then I can't even get feedback, so it is actually blocked. The same applies to the parent ticket.</p>
openQA Project - action #80910: openQA workers read updated configuration, e.g. WORKER_CLASS, whenever they are ready to pick up new jobshttps://progress.opensuse.org/issues/80910?journal_id=3851652021-02-23T22:02:22Zokurzokurz@suse.com
<ul><li><strong>Status</strong> changed from <i>In Progress</i> to <i>Feedback</i></li></ul><p>We use "Feedback" when we wait for defined events that need active checks by the assignee, e.g. "wait until after the next OSD deployment" is such a case. "In Progress" means that you are busy coding, researching, trying, debugging, etc.</p>
openQA Project - action #80910: openQA workers read updated configuration, e.g. WORKER_CLASS, whenever they are ready to pick up new jobshttps://progress.opensuse.org/issues/80910?journal_id=3853332021-02-24T11:34:56Zmkittlermarius.kittler@suse.com
<ul><li><strong>Status</strong> changed from <i>Feedback</i> to <i>Resolved</i></li></ul><p>Works in production, see <a class="issue tracker-6 status-3 priority-4 priority-default closed child parent" title="coordination: [epic] Continuous deployment (package upgrade or config update) without interrupting currently ru... (Resolved)" href="https://progress.opensuse.org/issues/80908#note-18">#80908#note-18</a></p>
openQA Project - action #80910: openQA workers read updated configuration, e.g. WORKER_CLASS, whenever they are ready to pick up new jobshttps://progress.opensuse.org/issues/80910?journal_id=3860442021-02-26T10:33:41Zokurzokurz@suse.com
<ul><li><strong>Due date</strong> deleted (<del><i>2021-02-26</i></del>)</li></ul>