openSUSE Project Management Tool (https://progress.opensuse.org/)
openQA Infrastructure - action #93119: [s390x] Update of s390x Test infrastructure after shutdown of Mainframe zEC12https://progress.opensuse.org/issues/93119?journal_id=4104872021-05-26T07:25:17Zmgriessmeiermgriessmeier@suse.com
<ul><li><strong>Description</strong> updated (<a title="View differences" href="/journals/410487/diff?detail_id=389967">diff</a>)</li></ul> openQA Infrastructure - action #93119: [s390x] Update of s390x Test infrastructure after shutdown of Mainframe zEC12https://progress.opensuse.org/issues/93119?journal_id=4106022021-05-26T09:29:33Zokurzokurz@suse.com
<ul><li><strong>Target version</strong> set to <i>future</i></li></ul><p>Cool, thank you. For now you can track it as a "personal" task. Unless you can resolve it yourself, please take care to assign it to a squad eventually. I suggest either [qe-core] or [tools].</p>
openQA Infrastructure - action #93119: [s390x] Update of s390x Test infrastructure after shutdown of Mainframe zEC12https://progress.opensuse.org/issues/93119?journal_id=4106462021-05-26T09:55:34Zgeorggkioulis@suse.com
<ul><li><strong>Description</strong> updated (<a title="View differences" href="/journals/410646/diff?detail_id=390096">diff</a>)</li><li><strong>Target version</strong> deleted (<del><i>future</i></del>)</li></ul> openQA Infrastructure - action #93119: [s390x] Update of s390x Test infrastructure after shutdown of Mainframe zEC12https://progress.opensuse.org/issues/93119?journal_id=4106552021-05-26T10:26:22Zokurzokurz@suse.com
<ul><li><strong>Target version</strong> set to <i>future</i></li></ul><p><a class="user active user-mention" href="https://progress.opensuse.org/users/30196">@geor</a> I assume you deleted the target version by mistake in an edit conflict, so I am setting it back to "future".</p>
<p>On request by mgriessmeier creating a temporary worker configuration on grenache-1.qa. In /etc/openqa/workers.ini:</p>
<pre><code>[66]
WORKER_CLASS=s390-kvm-sle12-poo93119-okurz,grenache-1
NETDEV=eth0
SUT_IP=10.161.145.90
VIRSH_HOSTNAME=s390zp19.suse.de
VIRSH_PASSWORD=nots3cr3t
VIRSH_GUEST=10.161.145.90
VIRSH_MAC=52:54:00:12:5c:d6
VIRSH_CMDLINE=ifcfg=dhcp
VIRSH_INSTANCE=1
</code></pre>
<p>and starting with <code>systemctl start openqa-worker@66</code>. Scheduled new job with</p>
<pre><code>openqa-clone-job --skip-chained-deps --within-instance https://openqa.suse.de/tests/6107218 _GROUP=0 WORKER_CLASS=s390-kvm-sle12-poo93119-okurz
</code></pre>
<p>Created job #6111084: sle-15-SP2-Server-DVD-Updates-s390x-Build20210526-1-ltp_cve@s390x-kvm-sle12 -> <a href="https://openqa.suse.de/tests/6111084" class="external">https://openqa.suse.de/tests/6111084</a></p>
<p>EDIT: job is fine</p>
openQA Infrastructure - action #93119: [s390x] Update of s390x Test infrastructure after shutdown of Mainframe zEC12https://progress.opensuse.org/issues/93119?journal_id=4106612021-05-26T10:47:31Zmgriessmeiermgriessmeier@suse.com
<ul><li><strong>Description</strong> updated (<a title="View differences" href="/journals/410661/diff?detail_id=390114">diff</a>)</li></ul><p>new LPAR installed, <a href="https://gitlab.suse.de/openqa/salt-pillars-openqa/-/merge_requests/320" class="external">https://gitlab.suse.de/openqa/salt-pillars-openqa/-/merge_requests/320</a> created, waiting for merge</p>
openQA Infrastructure - action #93119: [s390x] Update of s390x Test infrastructure after shutdown of Mainframe zEC12https://progress.opensuse.org/issues/93119?journal_id=4106842021-05-26T11:29:07Zmgriessmeiermgriessmeier@suse.com
<ul><li><strong>Description</strong> updated (<a title="View differences" href="/journals/410684/diff?detail_id=390128">diff</a>)</li></ul> openQA Infrastructure - action #93119: [s390x] Update of s390x Test infrastructure after shutdown of Mainframe zEC12https://progress.opensuse.org/issues/93119?journal_id=4107322021-05-26T13:27:59Zokurzokurz@suse.com
<ul></ul><p><a href="https://gitlab.suse.de/openqa/salt-pillars-openqa/-/merge_requests/320" class="external">https://gitlab.suse.de/openqa/salt-pillars-openqa/-/merge_requests/320</a> was merged. Tests were picked up to run on the new instances. So far no problems observed.</p>
openQA Infrastructure - action #93119: [s390x] Update of s390x Test infrastructure after shutdown of Mainframe zEC12https://progress.opensuse.org/issues/93119?journal_id=4107422021-05-26T14:04:26Zokurzokurz@suse.com
<ul><li><strong>Subject</strong> changed from <i>[s390x] Update of s390x Test infrastructure after shutdown of Mainframe zEC12</i> to <i>[s390x] Update of s390x Test infrastructure after shutdown of Mainframe zEC12 auto_review:"(?s)2021-05-2[1-5]T.*Error connecting to &lt;root@s390p.*.suse.de&gt;: No route to host":retry</i></li></ul><p>Now that the new workers are in place we can use auto-review not only to label openQA jobs but also to retrigger them. The pattern selects only jobs within the timeframe 2021-05-21 to 2021-05-25 so that it does not match any potential future problems.</p>
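<p>To illustrate how the date restriction in the pattern works, here is a minimal Python sketch. The pattern is copied from the subject above; the log excerpts, the timestamp layout and the host name <code>s390p8</code> are invented for illustration and are not taken from a real job. Python's <code>re</code> module is used here only as a stand-in for whatever regex engine auto-review actually applies.</p>

```python
import re

# Pattern copied from the ticket subject. (?s) lets "." also match newlines,
# so the date and the error message may sit on different lines.
PATTERN = re.compile(
    r"(?s)2021-05-2[1-5]T.*Error connecting to <root@s390p.*.suse.de>: No route to host"
)

# Hypothetical log excerpt (assumed format, not from a real openQA job).
in_window = (
    "[2021-05-23T10:15:02.123 CEST] [debug] starting backend\n"
    "[2021-05-23T10:15:09.456 CEST] [info] Error connecting to "
    "<root@s390p8.suse.de>: No route to host\n"
)
# Same excerpt, but dated after the 2021-05-21 to 2021-05-25 window.
after_window = in_window.replace("2021-05-23", "2021-05-27")


def matches(log: str) -> bool:
    """Return True if the auto_review pattern matches anywhere in the log."""
    return PATTERN.search(log) is not None


print(matches(in_window))
print(matches(after_window))
```

<p>Note that the dots in <code>.suse.de</code> are unescaped in the original pattern, so they match any character, not only a literal dot.</p>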
openQA Infrastructure - action #93119: [s390x] Update of s390x Test infrastructure after shutdown of Mainframe zEC12https://progress.opensuse.org/issues/93119?journal_id=4107482021-05-26T14:12:18Zokurzokurz@suse.com
<ul><li><strong>Subject</strong> changed from <i>[s390x] Update of s390x Test infrastructure after shutdown of Mainframe zEC12 auto_review:"(?s)2021-05-2[1-5]T.*Error connecting to &lt;root@s390p.*.suse.de&gt;: No route to host":retry</i> to <i>[s390x] Update of s390x Test infrastructure after shutdown of Mainframe zEC12 auto_review:"(?s)2021-05-2[1-5]T.*Error connecting to &lt;root@s390p.*.suse.de&gt;: No route to host"</i></li></ul><p>Wait. If I understand the intended change correctly we will not have any machine "zkvm" anymore at all so we would need to change all test schedules, correct?</p>
<p>EDIT: for the sake of being backward-compatible, how about applying the worker class "svirt" to some or all s390-kvm-sle12 instances?</p>
openQA Infrastructure - action #93119: [s390x] Update of s390x Test infrastructure after shutdown of Mainframe zEC12https://progress.opensuse.org/issues/93119?journal_id=4107502021-05-26T14:16:28Zgeorggkioulis@suse.com
<ul></ul><p>okurz wrote:</p>
<blockquote>
<p>Wait. If I understand the intended change correctly we will not have any machine "zkvm" anymore at all so we would need to change all test schedules, correct?</p>
</blockquote>
<p>Well, for now we still have the zkvm Machine entry; it just points to an s390-kvm-sle12 worker class, so it should not affect scheduling. But maybe it should be re-adapted later to be consistent.</p>
openQA Infrastructure - action #93119: [s390x] Update of s390x Test infrastructure after shutdown of Mainframe zEC12https://progress.opensuse.org/issues/93119?journal_id=4107552021-05-26T14:29:15Zokurzokurz@suse.com
<ul></ul><p>yes, I saw that now as well. So it seems like someone updated the machine. Then what we would need is manual tinkering with the API to reschedule all failed tests with the updated worker class, hm …</p>
openQA Infrastructure - action #93119: [s390x] Update of s390x Test infrastructure after shutdown of Mainframe zEC12https://progress.opensuse.org/issues/93119?journal_id=4107632021-05-26T14:33:00Zgeorggkioulis@suse.com
<ul></ul><p>okurz wrote:</p>
<blockquote>
<p>yes, I saw that now as well. So it seems like someone updated the machine. Then what we would need is manual tinkering with the API to reschedule all failed tests with the updated worker class, hm …</p>
</blockquote>
<p>I took the liberty of updating the machine entry. I am also in the process of rescheduling any zkvm s390 jobs to update their settings, and will do any further tinkering as needed.</p>
openQA Infrastructure - action #93119: [s390x] Update of s390x Test infrastructure after shutdown of Mainframe zEC12https://progress.opensuse.org/issues/93119?journal_id=4107682021-05-26T14:36:45Zmgriessmeiermgriessmeier@suse.com
<ul></ul><p>yeah technically the Machines 'zkvm' and s390-kvm-sle15 could be deleted...</p>
openQA Infrastructure - action #93119: [s390x] Update of s390x Test infrastructure after shutdown of Mainframe zEC12https://progress.opensuse.org/issues/93119?journal_id=4107722021-05-26T14:51:43Zokurzokurz@suse.com
<ul><li><strong>Subject</strong> changed from <i>[s390x] Update of s390x Test infrastructure after shutdown of Mainframe zEC12 auto_review:"(?s)2021-05-2[1-5]T.*Error connecting to &lt;root@s390p.*.suse.de&gt;: No route to host"</i> to <i>[s390x] Update of s390x Test infrastructure after shutdown of Mainframe zEC12 auto_review:"(?s)2021-05-2[1-5]T.*Error connecting to &lt;root@s390p.*.suse.de&gt;: No route to host":retry:WORKER_CLASS=s390-kvm-sle12</i></li></ul><p><a class="user active user-mention" href="https://progress.opensuse.org/users/30196">@geor</a> I think I have an idea how to extend auto-review to retrigger tests with changed settings. I hope you leave some old un-retriggered fails for me :D</p>
openQA Infrastructure - action #93119: [s390x] Update of s390x Test infrastructure after shutdown of Mainframe zEC12https://progress.opensuse.org/issues/93119?journal_id=4107742021-05-26T14:59:54Zokurzokurz@suse.com
<ul><li><strong>Subject</strong> changed from <i>[s390x] Update of s390x Test infrastructure after shutdown of Mainframe zEC12 auto_review:"(?s)2021-05-2[1-5]T.*Error connecting to &lt;root@s390p.*.suse.de&gt;: No route to host":retry:WORKER_CLASS=s390-kvm-sle12</i> to <i>[s390x] Update of s390x Test infrastructure after shutdown of Mainframe zEC12 auto_review:"(?s)2021-05-2[1-5]T.*Error connecting to &lt;root@s390p.*.suse.de&gt;: No route to host"</i></li></ul><p>OK, never mind. I will follow up with my approach elsewhere. For now you can go ahead with your way of handling all failed jobs that you can find.</p>
openQA Infrastructure - action #93119: [s390x] Update of s390x Test infrastructure after shutdown of Mainframe zEC12https://progress.opensuse.org/issues/93119?journal_id=4107872021-05-26T16:30:35Zgeorggkioulis@suse.com
<ul></ul><p>okurz wrote:</p>
<blockquote>
<p><a class="user active user-mention" href="https://progress.opensuse.org/users/30196">@geor</a> I think I have an idea how to extend auto-review to retrigger tests with changed settings. I hope you leave some old un-retriggered fails for me :D</p>
</blockquote>
<p>Sorry just saw it! But this sounds useful!</p>
openQA Infrastructure - action #93119: [s390x] Update of s390x Test infrastructure after shutdown of Mainframe zEC12https://progress.opensuse.org/issues/93119?journal_id=4117682021-05-31T13:07:05Zgeorggkioulis@suse.com
<ul></ul><p>mgriessmeier wrote:</p>
<blockquote>
<p>yeah technically the Machines 'zkvm' and s390-kvm-sle15 could be deleted...</p>
</blockquote>
<p>I just started migrating all zkvm jobs to s390x-kvm-sle12. When all seems well I will delete the zkvm Machine entry.</p>
openQA Infrastructure - action #93119: [s390x] Update of s390x Test infrastructure after shutdown of Mainframe zEC12https://progress.opensuse.org/issues/93119?journal_id=4127792021-06-02T13:44:09Zgeorggkioulis@suse.com
<ul></ul><p>geor wrote:</p>
<blockquote>
<p>mgriessmeier wrote:</p>
<blockquote>
<p>yeah technically the Machines 'zkvm' and s390-kvm-sle15 could be deleted...</p>
</blockquote>
<p>I just started migrating all zkvm jobs to s390x-kvm-sle12. When all seems well I will delete the zkvm Machine entry.</p>
</blockquote>
<p>Done, MRs can be found <a href="https://gitlab.suse.de/qsf-u/qa-sle-functional-userspace/-/merge_requests/134" class="external">here</a>, <a href="https://gitlab.suse.de/qa-maintenance/qam-openqa-yml/-/merge_requests/126" class="external">here</a> and <a href="https://gitlab.suse.de/qa-maintenance/qam-openqa-yml/-/merge_requests/130" class="external">here</a></p>
<p>I'm keeping an eye on the newly scheduled s390 jobs that used to run on zkvm, just to be sure, but all looks good for now.</p>
<p>I think next week I will follow up by replacing s390-kvm-sle15 as well, in all job groups that reference it.</p>
openQA Infrastructure - action #93119: [s390x] Update of s390x Test infrastructure after shutdown of Mainframe zEC12https://progress.opensuse.org/issues/93119?journal_id=4188342021-06-22T07:31:48Zmgriessmeiermgriessmeier@suse.com
<ul><li><strong>Description</strong> updated (<a title="View differences" href="/journals/418834/diff?detail_id=397627">diff</a>)</li><li><strong>Status</strong> changed from <i>In Progress</i> to <i>Feedback</i></li><li><strong>Priority</strong> changed from <i>High</i> to <i>Normal</i></li></ul> openQA Infrastructure - action #93119: [s390x] Update of s390x Test infrastructure after shutdown of Mainframe zEC12https://progress.opensuse.org/issues/93119?journal_id=4189002021-06-22T09:17:39Zokurzokurz@suse.com
<ul><li><strong>Description</strong> updated (<a title="View differences" href="/journals/418900/diff?detail_id=397687">diff</a>)</li></ul><p>I have added the additional setting <code>_MACHINE_COMMENT="As temporary workaround worker class set to s390x-kvm-sle12 instead of s390x-kvm-sle15, see https://progress.opensuse.org/issues/93119"</code> to the MACHINE config "s390x-kvm-sle15" on <a href="https://openqa.suse.de/admin/machines" class="external">https://openqa.suse.de/admin/machines</a></p>
<p>We still need to update existing scheduled jobs, though. With <code>openqa-cli api --pretty --osd jobs state=scheduled machine=zkvm | jq '.jobs | .[] | select(.settings.WORKER_CLASS=="svirt") | .id'</code> we can identify currently scheduled jobs that request the worker class "svirt", which no longer has any worker available. I don't know how to update the settings of an existing job using <code>openqa-cli</code> (I tried <code>openqa-cli api --pretty -X put --osd jobs/6287610 -d '{"settings[WORKER_CLASS]": "s390x-kvm-sle12"}'</code>), but we can also do it via SQL. So I did:</p>
<pre><code>for i in $(openqa-cli api --pretty --osd jobs state=scheduled machine=zkvm | jq '.jobs | .[] | select(.settings.WORKER_CLASS=="svirt") | .id'); do ssh osd "sudo -u geekotest psql --command=\"update job_settings set value = 's390-kvm-sle12' where job_id = $i and key = 'WORKER_CLASS';\" openqa"; done
</code></pre>
<p>and the same for WORKER_CLASS "s390x-kvm-sle15". Now to check if jobs are being picked up.</p>
<p>EDIT: Jobs were not picked up until mkittler restarted the openQA scheduler. Likely there is some memory caching going on and the scheduler does not see the manual updates in the database.</p>
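<p>For readers less familiar with jq, the selection step in the loop above can be sketched in Python. This assumes only the response shape that the jq filter itself implies (a top-level <code>jobs</code> list whose entries carry <code>id</code> and <code>settings.WORKER_CLASS</code>); the sample response and its job ids are made up, and the function name is hypothetical.</p>

```python
import json

# Hypothetical response body; only the fields used by the jq filter are shown.
SAMPLE_RESPONSE = """
{"jobs": [
  {"id": 1, "settings": {"WORKER_CLASS": "svirt"}},
  {"id": 2, "settings": {"WORKER_CLASS": "s390-kvm-sle12"}},
  {"id": 3, "settings": {"WORKER_CLASS": "svirt"}}
]}
"""


def job_ids_for_worker_class(response_text, worker_class):
    """Return the ids of jobs whose WORKER_CLASS setting equals worker_class.

    Mirrors: jq '.jobs | .[] | select(.settings.WORKER_CLASS=="svirt") | .id'
    """
    jobs = json.loads(response_text)["jobs"]
    return [
        job["id"]
        for job in jobs
        if job.get("settings", {}).get("WORKER_CLASS") == worker_class
    ]


print(job_ids_for_worker_class(SAMPLE_RESPONSE, "svirt"))
```

<p>The resulting ids are what the shell loop feeds, one at a time, into the <code>update job_settings</code> statement.</p>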
openQA Infrastructure - action #93119: [s390x] Update of s390x Test infrastructure after shutdown of Mainframe zEC12https://progress.opensuse.org/issues/93119?journal_id=4189842021-06-22T12:43:33Zokurzokurz@suse.com
<ul><li><strong>Subject</strong> changed from <i>[s390x] Update of s390x Test infrastructure after shutdown of Mainframe zEC12 auto_review:"(?s)2021-05-2[1-5]T.*Error connecting to &lt;root@s390p.*.suse.de&gt;: No route to host"</i> to <i>[s390x] Update of s390x Test infrastructure after shutdown of Mainframe zEC12 auto_review:"(?s)2021-.*T.*Error connecting to &lt;root@s390p.*.suse.de&gt;: No route to host"</i></li></ul> openQA Infrastructure - action #93119: [s390x] Update of s390x Test infrastructure after shutdown of Mainframe zEC12https://progress.opensuse.org/issues/93119?journal_id=4197642021-06-23T15:24:48Zokurzokurz@suse.com
<ul><li><strong>Related to</strong> <i><a class="issue tracker-4 status-3 priority-4 priority-default closed behind-schedule" href="/issues/94465">action #94465</a>: [tools] zkvm tests are scheduled by retriggering month old jobs even though we do not have any "svirt" workers anymore</i> added</li></ul> openQA Infrastructure - action #93119: [s390x] Update of s390x Test infrastructure after shutdown of Mainframe zEC12https://progress.opensuse.org/issues/93119?journal_id=4341382021-08-10T12:15:47Ztinitatina.mueller+trick-redmine@suse.com
<ul><li><strong>Subject</strong> changed from <i>[s390x] Update of s390x Test infrastructure after shutdown of Mainframe zEC12 auto_review:"(?s)2021-.*T.*Error connecting to &lt;root@s390p.*.suse.de&gt;: No route to host"</i> to <i>[s390x] Update of s390x Test infrastructure after shutdown of Mainframe zEC12</i></li></ul><p>Temporarily disabled auto_review:</p>
<pre><code>auto_review:"(?s)2021-.*T.*Error connecting to &lt;root@s390p.*.suse.de&gt;: No route to host"
</code></pre>
<p>Because the regex is slow and can take over 4 minutes.<br>
We should improve the regex. Is there a log example with lines we have to match?</p>
<p>See also <a class="issue tracker-4 status-3 priority-4 priority-default closed" title="action: Slow grep in openqa-label-known-issues leads to high CPU usage (Resolved)" href="https://progress.opensuse.org/issues/96713">#96713</a></p>
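<p>One possible direction for speeding this up, sketched in Python under stated assumptions: with <code>(?s)</code> and several <code>.*</code>, a backtracking regex engine may rescan the rest of the log from every occurrence of <code>2021-</code> when the error text is absent. Since the pattern can only match if its literal part <code>Error connecting to &lt;root@s390p</code> is present, a cheap substring check can skip the regex for most logs. The function name and the sample logs are hypothetical, and whether this fits into openqa-label-known-issues is not something this sketch establishes.</p>

```python
import re

# Pattern copied from the disabled auto_review expression.
SLOW_PATTERN = re.compile(
    r"(?s)2021-.*T.*Error connecting to <root@s390p.*.suse.de>: No route to host"
)

# Literal fragment of the pattern: no match is possible without it.
REQUIRED_LITERAL = "Error connecting to <root@s390p"


def matches_with_prefilter(log):
    """Run the expensive regex only if the required literal is present."""
    if REQUIRED_LITERAL not in log:
        return False  # plain substring search, no backtracking involved
    return SLOW_PATTERN.search(log) is not None


# Hypothetical log excerpts (assumed format, not from a real job).
log_with_error = (
    "[2021-06-01T08:00:00.000 CEST] [debug] connecting\n"
    "[2021-06-01T08:00:05.000 CEST] [info] Error connecting to "
    "<root@s390p8.suse.de>: No route to host\n"
)
log_without_error = (
    "[2021-06-01T08:00:00.000 CEST] [debug] connecting\n"
    "[2021-06-01T08:00:05.000 CEST] [info] connected\n"
)

print(matches_with_prefilter(log_with_error))
print(matches_with_prefilter(log_without_error))
```

<p>The prefilter does not change which logs match; it only avoids running the regex on logs that cannot match.</p>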
openQA Infrastructure - action #93119: [s390x] Update of s390x Test infrastructure after shutdown of Mainframe zEC12https://progress.opensuse.org/issues/93119?journal_id=4355782021-08-16T21:14:02Zokurzokurz@suse.com
<ul></ul><p>I guess by now we should really not have any more jobs matching the original pattern unless people really try to retrigger multi-month old jobs</p>
openQA Infrastructure - action #93119: [s390x] Update of s390x Test infrastructure after shutdown of Mainframe zEC12https://progress.opensuse.org/issues/93119?journal_id=4363282021-08-18T12:15:38Ztinitatina.mueller+trick-redmine@suse.com
<ul></ul><p>The 8-day-old tests <a href="https://openqa.suse.de/tests/6795459" class="external">https://openqa.suse.de/tests/6795459</a> and <a href="https://openqa.suse.de/tests/6795460" class="external">https://openqa.suse.de/tests/6795460</a> still have <code>WORKER_CLASS=s390x-kvm-sle15</code>.</p>
<p>Since there is no worker matching that, I cancelled the jobs.</p>
openQA Infrastructure - action #93119: [s390x] Update of s390x Test infrastructure after shutdown of Mainframe zEC12https://progress.opensuse.org/issues/93119?journal_id=7374112023-11-23T13:14:37Zmgriessmeiermgriessmeier@suse.com
<ul><li><strong>Status</strong> changed from <i>Feedback</i> to <i>Closed</i></li></ul><p>not applicable anymore</p>