openSUSE Project Management Tool: Issues | https://progress.opensuse.org/ | 2023-05-10T13:33:13Z
openQA Infrastructure - action #129065 (Resolved): [alert] HTTP Response alert fired, OSD loads s... | https://progress.opensuse.org/issues/129065 | 2023-05-10T13:33:13Z | nicksinger (nsinger@suse.com)
<a name="Observation"></a>
<h2 >Observation<a href="#Observation" class="wiki-anchor">¶</a></h2>
<p><a href="https://stats.openqa-monitor.qa.suse.de/d/WebuiDb/webui-summary?orgId=1&from=1683722604024&to=1683725326412&viewPanel=78">https://stats.openqa-monitor.qa.suse.de/d/WebuiDb/webui-summary?orgId=1&from=1683722604024&to=1683725326412&viewPanel=78</a> alerted on 2023-05-10 15:07 CEST</p>
<a name="Acceptance-criteria"></a>
<h2 >Acceptance criteria<a href="#Acceptance-criteria" class="wiki-anchor">¶</a></h2>
<ul>
<li><strong>AC1</strong>: The alert is not firing anymore.</li>
<li><strong>AC2</strong>: Logs have been investigated.</li>
</ul>
<a name="Suggestions"></a>
<h2 >Suggestions<a href="#Suggestions" class="wiki-anchor">¶</a></h2>
<ul>
<li>Look into the timeframe
<a href="https://stats.openqa-monitor.qa.suse.de/d/WebuiDb/webui-summary?orgId=1&from=1683723624920&to=1683724305517">https://stats.openqa-monitor.qa.suse.de/d/WebuiDb/webui-summary?orgId=1&from=1683723624920&to=1683724305517</a> and compare with other panels on OSD to see whether it is visible what made the system busy. DONE: nothing too unusual; maybe slightly elevated I/O times, but far from concerning</li>
<li><p><a class="user active user-mention" href="https://progress.opensuse.org/users/17668">@okurz</a> suggested in <a href="https://suse.slack.com/archives/C02AJ1E568M/p1683724668733689?thread_ts=1683724103.321589&cid=C02AJ1E568M">https://suse.slack.com/archives/C02AJ1E568M/p1683724668733689?thread_ts=1683724103.321589&cid=C02AJ1E568M</a> that it might be caused by something we don't collect metrics from - brainstorm what these could be, implement metrics for them</p>
<ul>
<li>Open network connections - nsinger observed peaks of >2k, ~75% of them related to httpd-prefork, ~20% to openqa-websocket</li>
<li>Quoted from the chat:
<blockquote>
<p>(Nick Singer) I'm currently logged into OSD. CPU utilization is quite high with a long-term load of 12 and a short-term load of ~14 with only 12 cores on OSD. velociraptor goes up to 200% and is in general quite high in the process list, but also telegraf and obviously openQA itself.<br>
(Oliver Kurz) all of that sounds fine. When the HTTP response was high I just took a look and the CPU usage was near 0, same as we suspected in the past. Remember our debugging on why qanet is slow? Comparable to that, but here it's likely apache, number of concurrent connections, something like that</p>
</blockquote></li>
</ul></li>
<li><p>Take <a href="https://suse.slack.com/archives/C02CANHLANP/p1683723956965209">https://suse.slack.com/archives/C02CANHLANP/p1683723956965209</a> into account - is there something we can do to improve this situation?</p></li>
</ul>
<blockquote>
<p>(Joaquin Rivera) is OSD also slow for someone else? (edited) <br>
(Fabian Vogt) That might be partially because of the yast2_nfs_server jobs for investigation. You might want to delete them now that they did their job. (e.g. <a href="https://openqa.suse.de/tests/11085729">https://openqa.suse.de/tests/11085729</a>. Don't open, might crash your browser...). those jobs are special. serial_terminal has some race condition so they hammer enter_cmd + assert_script_run in a loop until it fails</p>
</blockquote>
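<p>The per-process connection breakdown mentioned above can be reproduced with a small shell sketch; it assumes iproute2's <code>ss</code> output format and needs root so process names are visible:</p>

```shell
# Count established TCP connections grouped by owning process name.
count_by_process() {
  grep -o 'users:(("[^"]*"' | cut -d'"' -f2 | sort | uniq -c | sort -rn
}
# Live usage on OSD: ss -tnp state established | count_by_process
```

<p>The same pipeline also works on archived <code>ss</code> dumps, which helps when comparing against the time of the alert.</p>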
<a name="Out-of-scope"></a>
<h2 >Out of scope<a href="#Out-of-scope" class="wiki-anchor">¶</a></h2>
<ul>
<li>limiting the number of test result steps uploads or handling the effect of test result steps uploading -> <a class="issue tracker-4 status-3 priority-4 priority-default closed child" title="action: Limit the number of uploadable test result steps size:M (Resolved)" href="https://progress.opensuse.org/issues/129068">#129068</a></li>
</ul>
openQA Infrastructure - action #128420 (Resolved): [alert][grafana] 100% packet loss from qa-powe... | https://progress.opensuse.org/issues/128420 | 2023-04-28T16:55:01Z | nicksinger (nsinger@suse.com)
<a name="Observation"></a>
<h2 >Observation<a href="#Observation" class="wiki-anchor">¶</a></h2>
<p>Starting 2023-04-27 15:15:00, the machines mentioned in the title failed to access/ping the s390 LPARs. Something between these hosts has changed or broken and needs to be fixed.<br>
We had similar issues in the past; see the following SD tickets:</p>
<ul>
<li><a href="https://sd.suse.com/servicedesk/customer/portal/1/SD-92689" class="external">https://sd.suse.com/servicedesk/customer/portal/1/SD-92689</a></li>
<li><a href="https://sd.suse.com/servicedesk/customer/portal/1/SD-115963" class="external">https://sd.suse.com/servicedesk/customer/portal/1/SD-115963</a></li>
</ul>
<a name="Suggestions"></a>
<h2 >Suggestions<a href="#Suggestions" class="wiki-anchor">¶</a></h2>
<ul>
<li>Check what these machines have in common. A quick look showed that they are all in the "old" QA network, close to each other: <a href="https://racktables.suse.de/index.php?page=rack&rack_id=516" class="external">https://racktables.suse.de/index.php?page=rack&rack_id=516</a></li>
<li>Check whether other machines in that location, network, room, or on the same switch have the same problems</li>
<li>Create a new SD ticket referencing the old ones. Robert mentioned in one of them that we might need to get rid of a second uplink </li>
</ul>
<a name="Rollback-steps"></a>
<h2 >Rollback steps<a href="#Rollback-steps" class="wiki-anchor">¶</a></h2>
<ol>
<li>Remove silence for rule_uid=2Z025iB4km </li>
</ol>
openQA Infrastructure - action #128417 (Resolved): [alert][grafana] openqaw5-xen: partitions usag... | https://progress.opensuse.org/issues/128417 | 2023-04-28T16:44:30Z | nicksinger (nsinger@suse.com)
<a name="Observation"></a>
<h2 >Observation<a href="#Observation" class="wiki-anchor">¶</a></h2>
<p>On 2023-04-28 16:30 the partition usage of openqaw5-xen skyrocketed to >90% (<a href="https://stats.openqa-monitor.qa.suse.de/d/GDopenqaw5-xen/dashboard-for-openqaw5-xen?orgId=1&viewPanel=65090&from=1682657429086&to=1682699823248" class="external">https://stats.openqa-monitor.qa.suse.de/d/GDopenqaw5-xen/dashboard-for-openqaw5-xen?orgId=1&viewPanel=65090&from=1682657429086&to=1682699823248</a>) and shortly afterwards an alert fired. Someone or something cleaned up a short time later, bringing usage back down to a reasonable 40%.</p>
<a name="Suggestions"></a>
<h2 >Suggestions<a href="#Suggestions" class="wiki-anchor">¶</a></h2>
<ul>
<li>DONE: Check with e.g. <a class="user active user-mention" href="https://progress.opensuse.org/users/17668">@okurz</a> if this was maybe a one-time thing because somebody moved stuff around manually</li>
<li>DONE: Manually clean up files in /var/lib/libvirt/images; ask in #eng-testing what the files are needed for</li>
<li>Plug in more SSDs. We likely have some spares in the FC Basement shelves</li>
<li>Check virsh XMLs to crosscheck openQA jobs before deleting anything for good</li>
<li><del>Adjust the alert to allow longer periods over the threshold</del> We decided that our thresholds are feasible</li>
</ul>
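<p>For the manual cleanup suggestion above, a quick way to see what filled the partition is to rank files by size (a sketch assuming GNU <code>find</code>; the path is the one from the ticket):</p>

```shell
# List the largest files below a directory, biggest first (size in bytes).
largest_files() {  # usage: largest_files <dir> [count]
  find "$1" -type f -printf '%s\t%p\n' | sort -rn | head -n "${2:-10}"
}
# e.g. largest_files /var/lib/libvirt/images 20
```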
openQA Infrastructure - coordination #113674 (Resolved): [epic] Configure I/O alerts again for th... | https://progress.opensuse.org/issues/113674 | 2022-07-15T15:16:23Z | nicksinger (nsinger@suse.com)
<a name="Summary"></a>
<h1 >Summary<a href="#Summary" class="wiki-anchor">¶</a></h1>
<p>With <a class="issue tracker-4 status-3 priority-4 priority-default closed child" title="action: Webui Summary dashboard in Grafana is missing I/O panels size:M (Resolved)" href="https://progress.opensuse.org/issues/112733">#112733</a> we got new I/O panels for the webui. Due to the nature of <strong>repeating panels</strong> we cannot add an alert for the IO time with the current alerting backend we use. This should be possible with unified alerting: <a href="https://grafana.com/blog/2021/06/14/the-new-unified-alerting-system-for-grafana-everything-you-need-to-know/" class="external">https://grafana.com/blog/2021/06/14/the-new-unified-alerting-system-for-grafana-everything-you-need-to-know/</a></p>
<a name="Acceptance-criteria"></a>
<h2 >Acceptance criteria<a href="#Acceptance-criteria" class="wiki-anchor">¶</a></h2>
<ul>
<li><strong>AC1:</strong> alerts for each disk on the webui with according thresholds</li>
<li><strong>AC2:</strong> grouping of alerts is properly configured and understood</li>
<li><strong>AC3:</strong> alerts can be configured across multiple panels (using repeated panels)</li>
</ul>
<a name="Suggestions"></a>
<h2 >Suggestions<a href="#Suggestions" class="wiki-anchor">¶</a></h2>
<ul>
<li>Take a look at our previous alerting rule: <a href="https://gitlab.suse.de/openqa/salt-states-openqa/-/blob/1c505df5e92420d0f266e7ea4b3a049aae892dd5/monitoring/grafana/webui.dashboard.json#L3757-3842" class="external">https://gitlab.suse.de/openqa/salt-states-openqa/-/blob/1c505df5e92420d0f266e7ea4b3a049aae892dd5/monitoring/grafana/webui.dashboard.json#L3757-3842</a></li>
<li>Find out how to migrate to the new system, automatically or manually</li>
<li>Repeating panels are important here so we can let Grafana create multiple panels based on different variables, as opposed to having to copy and duplicate panels via salt
<ul>
<li>Currently we have panels built from template variables, which cannot have alerts attached</li>
<li>Ask Nick in case it's unclear</li>
</ul></li>
<li>Try out with an official test instance of Grafana available from their website</li>
<li>Test with a container</li>
<li>Confirm what we end up with, e.g. new JSON or a different layout</li>
<li>Keep in mind this is the default for Grafana 10 and our current setup may not be supportable long-term</li>
</ul>
openQA Infrastructure - coordination #112718 (Resolved): [alert][osd] openqa.suse.de is not reach... | https://progress.opensuse.org/issues/112718 | 2022-06-20T03:38:42Z | nicksinger (nsinger@suse.com)
<a name="Observation"></a>
<h2 >Observation<a href="#Observation" class="wiki-anchor">¶</a></h2>
<p>We received a lot of alerts over the weekend regarding failed minion jobs and others. Checking Grafana I can see that the problem started Saturday, 18th of June around 13:00 CET: <a href="https://stats.openqa-monitor.qa.suse.de/d/WebuiDb/webui-summary?orgId=1&from=1655549105000&to=now" class="external">https://stats.openqa-monitor.qa.suse.de/d/WebuiDb/webui-summary?orgId=1&from=1655549105000&to=now</a><br>
The number of returned PostgreSQL rows looks very suspicious and is now five times as high as before: <a href="https://stats.openqa-monitor.qa.suse.de/d/WebuiDb/webui-summary?orgId=1&from=1655475539000&to=now&viewPanel=89" class="external">https://stats.openqa-monitor.qa.suse.de/d/WebuiDb/webui-summary?orgId=1&from=1655475539000&to=now&viewPanel=89</a></p>
<a name="Suggestions"></a>
<h2 >Suggestions<a href="#Suggestions" class="wiki-anchor">¶</a></h2>
<ul>
<li>Load <a href="https://progress.opensuse.org/projects/openqav3/wiki/#Backup" class="external">OSD database dump</a> from after the incident started and try to reproduce the problem</li>
<li>Research how to find out where heavy queries come from</li>
<li>Research what can cause rows returned to grow from <100k to 20-60M</li>
</ul>
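<p>For researching where heavy queries come from, one common starting point is the <code>pg_stat_statements</code> extension; whether it is enabled on OSD is an assumption here, and the column names follow PostgreSQL 13+:</p>

```shell
# Emit a query ranking statements by total execution time; pipe into psql.
heavy_queries_sql() {
  cat <<'SQL'
SELECT calls, rows, round(total_exec_time) AS total_ms,
       left(query, 80) AS query
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;
SQL
}
# Usage on the host (user/db names may differ):
#   heavy_queries_sql | sudo -u postgres psql openqa
```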
<a name="Problem"></a>
<h2 >Problem<a href="#Problem" class="wiki-anchor">¶</a></h2>
<ul>
<li><strong>H1:</strong> The migration to "bigint" triggered a query planner update, causing it to end up with sub-optimal query plans. As auto-vacuum eventually triggers "ANALYZE", we assume the system would eventually recover automatically by using optimized queries. This is likely what happened on o3 after a period of 1-2 days. On OSD we do not have enough performance headroom (in particular CPU and potentially disk I/O) to cover such periods.</li>
</ul>
<a name="Rollback-and-cleanup-steps"></a>
<h2 >Rollback and cleanup steps<a href="#Rollback-and-cleanup-steps" class="wiki-anchor">¶</a></h2>
<ul>
<li><em>DONE:</em> on osd <code>systemctl enable --now telegraf</code></li>
<li><em>DONE:</em> on osd <code>systemctl unmask --now openqa-scheduler</code> -> <a class="issue tracker-6 status-3 priority-4 priority-default closed child parent" title="coordination: [alert][osd] openqa.suse.de is not reachable anymore, response times > 30s, multiple alerts over ... (Resolved)" href="https://progress.opensuse.org/issues/112718#note-57">#112718#note-57</a></li>
<li><em>DONE:</em> Retrigger all incomplete jobs since 2022-06-18 with <a href="https://github.com/os-autoinst/scripts/blob/master/openqa-advanced-retrigger-jobs" class="external">https://github.com/os-autoinst/scripts/blob/master/openqa-advanced-retrigger-jobs</a> -> <a class="issue tracker-6 status-3 priority-4 priority-default closed child parent" title="coordination: [alert][osd] openqa.suse.de is not reachable anymore, response times > 30s, multiple alerts over ... (Resolved)" href="https://progress.opensuse.org/issues/112718#note-57">#112718#note-57</a></li>
<li><em>DONE:</em> Retrigger failed obs-sync trigger events: <a href="https://openqa.suse.de/admin/obs_rsync/" class="external">https://openqa.suse.de/admin/obs_rsync/</a></li>
<li><em>DONE:</em> Retrigger failed qem-bot trigger events: <a href="https://gitlab.suse.de/qa-maintenance/bot-ng/-/pipelines" class="external">https://gitlab.suse.de/qa-maintenance/bot-ng/-/pipelines</a></li>
<li><em>DONE:</em> Retrigger failed openQA bot trigger events: <a href="https://gitlab.suse.de/qa-maintenance/openQABot/-/pipelines" class="external">https://gitlab.suse.de/qa-maintenance/openQABot/-/pipelines</a></li>
<li><em>DONE:</em> Unmask and start on openqaworker10 and 13: <code>sudo systemctl unmask --now openqa-worker-cacheservice openqa-worker@{1..20}</code></li>
<li><em>DONE:</em> Remove /etc/openqa/templates/main/index.html.ep</li>
<li><em>DONE:</em> Apply salt high state and check that files are back to maintained format, e.g.
<ul>
<li><em>DONE:</em> /usr/share/openqa/script/openqa-gru</li>
</ul></li>
<li><em>DONE:</em> on osd <code>systemctl unmask --now salt-master</code> and ensure that /etc/telegraf/telegraf.d/telegraf-webui.conf is reverted</li>
<li>Unpause alerts:
<ul>
<li>Broken workers</li>
<li>Failed systemd services (except openqa.suse.de)</li>
<li>Open database connections by user</li>
<li>openqa-scheduler.service</li>
<li>salt-master.service</li>
<li>web UI: Too many minion job failures</li>
</ul></li>
</ul>
openQA Infrastructure - action #70834 (Resolved): [alert] Refine I/O time alerts for OSD | https://progress.opensuse.org/issues/70834 | 2020-09-02T07:52:00Z | nicksinger (nsinger@suse.com)
<p>We have several IO time alerts for OSD itself:</p>
<ul>
<li><a href="https://stats.openqa-monitor.qa.suse.de/d/WebuiDb/webui-summary?panelId=46&fullscreen&edit&tab=alert" class="external">https://stats.openqa-monitor.qa.suse.de/d/WebuiDb/webui-summary?panelId=46&fullscreen&edit&tab=alert</a></li>
<li><a href="https://stats.openqa-monitor.qa.suse.de/d/WebuiDb/webui-summary?panelId=47&fullscreen&edit&tab=alert" class="external">https://stats.openqa-monitor.qa.suse.de/d/WebuiDb/webui-summary?panelId=47&fullscreen&edit&tab=alert</a></li>
<li><a href="https://stats.openqa-monitor.qa.suse.de/d/WebuiDb/webui-summary?panelId=48&fullscreen&edit&tab=alert" class="external">https://stats.openqa-monitor.qa.suse.de/d/WebuiDb/webui-summary?panelId=48&fullscreen&edit&tab=alert</a></li>
<li><a href="https://stats.openqa-monitor.qa.suse.de/d/WebuiDb/webui-summary?panelId=57&fullscreen&edit&tab=alert" class="external">https://stats.openqa-monitor.qa.suse.de/d/WebuiDb/webui-summary?panelId=57&fullscreen&edit&tab=alert</a></li>
</ul>
<p>They need to be reworked so that:</p>
<ol>
<li>The right disk is shown for the right purpose (e.g. /dev/vde is not /results any longer)
<ul>
<li><a href="https://gitlab.suse.de/openqa/salt-states-openqa/-/blob/master/openqa/telegraf-webui.conf#L32" class="external">https://gitlab.suse.de/openqa/salt-states-openqa/-/blob/master/openqa/telegraf-webui.conf#L32</a> might needs adjustments to store persistent identifier like UUIDs</li>
<li>The panel itself maybe can be generated out of info from salt (mountpoint): <a href="https://gitlab.suse.de/openqa/salt-states-openqa/-/blob/master/openqa/monitoring/grafana/webui.dashboard.json#L5769" class="external">https://gitlab.suse.de/openqa/salt-states-openqa/-/blob/master/openqa/monitoring/grafana/webui.dashboard.json#L5769</a></li>
</ul></li>
<li>DONE: <del>The alert thresholds need to be adjusted to not trigger that often</del>
<ul>
<li><del>Spikes of up to 7s seem to happen from time to time</del></li>
<li><del>The situation gets critical if these spikes continue for several minutes</del></li>
</ul></li>
</ol>
<p><del>All above linked alerts are on pause right now since they don't provide a big benefit being that flaky.</del></p>
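<p>Regarding the persistent-identifier suggestion above, a device node can be mapped to its stable by-uuid name with plain shell; <code>/dev/vde</code> is just the example device from this ticket:</p>

```shell
# Print the stable by-uuid name(s) pointing at a given device node.
device_uuid() {  # usage: device_uuid /dev/vde [by-uuid-dir]
  dev=$(readlink -f "$1")
  for link in "${2:-/dev/disk/by-uuid}"/*; do
    [ "$(readlink -f "$link")" = "$dev" ] && basename "$link"
  done
}
```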
openQA Project - action #34555 (Rejected): "openqa/script/client assets get" does not work (anymo... | https://progress.opensuse.org/issues/34555 | 2018-04-09T12:21:53Z | nicksinger (nsinger@suse.com)
<p>While fixing a trigger-script I encountered the following problem on my TW box:</p>
<pre><code>glados git/scripts ‹oqainoqa_versionfix› » /usr/share/openqa/script/client --host "https://openqa.opensuse.org" assets get
Use of uninitialized value in numeric eq (==) at /usr/share/openqa/script/client line 247.
Use of uninitialized value in printf at /usr/share/openqa/script/client line 256.
ERROR: - Connect timeout
</code></pre>
<p>client.conf:</p>
<pre><code>[openqa.suse.de]
key = VALID_KEY
secret = VALID_SECRET
</code></pre>
<p>Also <code>--host "openqa.opensuse.org"</code> didn't work. It works with version openQA-client-4.5.1521796063.34353006-31.1.noarch on riafarov's PC.<br>
It didn't work with version openQA-client-4.5.1522146606.26761592-34.1.noarch or the most recent packaged version openQA-client-4.5.1523007547.90f9c396-44.1.noarch</p>
openQA Tests - action #31963 (Resolved): [functional][ha][yast][fast][easy] test fails in addon_p... | https://progress.opensuse.org/issues/31963 | 2018-02-19T12:13:53Z | nicksinger (nsinger@suse.com)
<a name="Observation"></a>
<h2 >Observation<a href="#Observation" class="wiki-anchor">¶</a></h2>
<p>openQA test in scenario sle-15-Installer-DVD-x86_64-addon-module-ftp@64bit fails in<br>
<a href="https://openqa.suse.de/tests/1483151/modules/addon_products_sle/steps/9" class="external">addon_products_sle</a> after <a href="https://bugzilla.suse.com/show_bug.cgi?id=1057223" class="external">bsc#1057223</a> got fixed.</p>
<p>Now we need to adapt the test to handle the new license agreement screen. This involves at least needle updates.</p>
<p>After further investigation we found out that this is most likely related to a product issue (the SLES license appears twice). However, I have lost the overview of all these modules/addons/products and I don't want to argue with the devs anymore.</p>
<a name="Reproducible"></a>
<h2 >Reproducible<a href="#Reproducible" class="wiki-anchor">¶</a></h2>
<p>Fails since Build <a href="https://openqa.suse.de/tests/1483151" class="external">459.1</a></p>
<a name="Expected-result"></a>
<h2 >Expected result<a href="#Expected-result" class="wiki-anchor">¶</a></h2>
<p><strong>AC1:</strong> addon_products_sle is able to handle the license agreement screen properly to continue testing.<br>
<strong>AC2:</strong> Create a BSC and argue about technical excuses with the devs</p>
<a name="Further-details"></a>
<h2 >Further details<a href="#Further-details" class="wiki-anchor">¶</a></h2>
<p>Last good: <a href="https://openqa.suse.de/tests/1475944" class="external">457.1</a> (or more recent)</p>
<p>Always latest result in this scenario: <a href="https://openqa.suse.de/tests/latest?arch=x86_64&version=15&flavor=Installer-DVD&distri=sle&test=addon-module-ftp&machine=64bit" class="external">latest</a></p>
openQA Tests - action #31603 (Resolved): [functional][medium] xen-pv tests do not register agains... | https://progress.opensuse.org/issues/31603 | 2018-02-09T16:07:36Z | nicksinger (nsinger@suse.com)
<a name="Observation"></a>
<h2 >Observation<a href="#Observation" class="wiki-anchor">¶</a></h2>
<p>Tests running on xen-pv register against scc.suse.com: <a href="https://openqa.suse.de/tests/1459355/modules/scc_registration/steps/3" class="external">scc_registration</a></p>
<p>This results in the following repos being added to the SUT:</p>
<pre><code> # | Alias | Name | Enabled | GPG Check | Refresh | URI
---+-----------------------------------------------------------------------------------------+----------------------------------------------------+---------+-----------+---------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
1 | Basesystem_Module_15_x86_64:SLE-Module-Basesystem15-Debuginfo-Pool | SLE-Module-Basesystem15-Debuginfo-Pool | No | ---- | ---- | https://updates.suse.com/SUSE/Products/SLE-Module-Basesystem/15/x86_64/product_debug?jFkKFPOsZpUhSUc7IH_UxVkGrqEVckAtQeTA9zjkkRUABslMWEm1xoNnfQUsU7ju1cvV6F0vDzVR8qtwBgtxFmDwBtTL0I7nsk9jjrySKRiqpiNg2jfU1BcR3oSfzi1tK7Whh4OQSIsJ2PVvZWP3XT9pyiG8TcSa_8Y
2 | Basesystem_Module_15_x86_64:SLE-Module-Basesystem15-Debuginfo-Updates | SLE-Module-Basesystem15-Debuginfo-Updates | No | ---- | ---- | https://updates.suse.com/SUSE/Updates/SLE-Module-Basesystem/15/x86_64/update_debug?CKtFrwmtv_ofFSr0zvrgaqpdXbJDJ-z0ccdjkZP65uRH6uunbtlMUnEho8dqgNcNS08WuZOfG8RUMI7W19BqIHfnaN-m7xW4kJoH4HdP2hgl7NE3J_jQF5hwZo1WF8UjMNacXYUrVyJY4Nte9v_YaIz0TMZMih_q
3 | Basesystem_Module_15_x86_64:SLE-Module-Basesystem15-Pool | SLE-Module-Basesystem15-Pool | Yes | (r ) Yes | No | https://updates.suse.com/SUSE/Products/SLE-Module-Basesystem/15/x86_64/product?qbyfz3cSl0jlK_uGuB7z495hO84cl1AiucQngDYMbhoMZ4r0eiPensBJCpWuxFM4h7qMklg9WgpEkFYBbvEXOqXf3dA1svKYHQTacI8ffm0Yrc3vyzfsC5KQqS4sanvJBQDEEArgm7p8AkpOM0YHcfcABZs
4 | Basesystem_Module_15_x86_64:SLE-Module-Basesystem15-Source-Pool | SLE-Module-Basesystem15-Source-Pool | No | ---- | ---- | https://updates.suse.com/SUSE/Products/SLE-Module-Basesystem/15/x86_64/product_source?cDlGk1URM4KC4_M5vgraVg4kNRUUiiD3Zt3PX00D_BNUqyGzvK82Kq03xwnq3aJAHQUAH_DrRYRBHr0GK49k-K5OJdyd0xn6EoKYRLNzaS8jvfaVl58iY7-sVnfKpXOPLY8WGb2jU-Ft47vQ0dhk-rYNJFaTs4GCVXrc
5 | Basesystem_Module_15_x86_64:SLE-Module-Basesystem15-Updates | SLE-Module-Basesystem15-Updates | Yes | (r ) Yes | Yes | https://updates.suse.com/SUSE/Updates/SLE-Module-Basesystem/15/x86_64/update?6casmX_siDQE1VgVAWy-0wZBnlY6WRIGgLhsLp26P_yyM_WNznLKnumxYmSk9BtFN7xRghr4Mzn88CNYGUga0REICDM-VkaeuRnS4uPEMy5w-B5HWpx6byKYjKyNvrcC297k9ksyJH_f3kfuix1uyY1L
6 | SUSE_Linux_Enterprise_Server_15_x86_64:SLE-Product-SLES15-Pool | SLE-Product-SLES15-Pool | Yes | (r ) Yes | No | https://updates.suse.com/SUSE/Products/SLE-Product-SLES/15/x86_64/product?ZCVRqS8a1yyAvvJFsPLzqqMtDGlYxWsXmws3s-9YKbvE09Q_nrzbM6slCouxuuP3lVVFMLRJJw3qN3ysNeArgR1k8I4--R2kx8X3gCAWnPkg3v5g6VDvF4teznUldXvA1jwQtyw05ADt8k_TgmDt
7 | SUSE_Linux_Enterprise_Server_15_x86_64:SLE-Product-SLES15-Updates | SLE-Product-SLES15-Updates | Yes | (r ) Yes | Yes | https://updates.suse.com/SUSE/Updates/SLE-Product-SLES/15/x86_64/update?mVFklqQaqWoMZGbgyJ4b_aU9R4ghGpmU5UXsZ3SHzcC3VCJYlF-pXjPNnV82O30E1VmLwpjxKTneI_pDE8_5K8P94BXTsJrP6jaWpWb_k3fdL_MG7-1GmR2HTUsvhrh0Tb3Ra640r1pUVciRkQ
8 | Server_Applications_Module_15_x86_64:SLE-Module-Server-Applications15-Debuginfo-Pool | SLE-Module-Server-Applications15-Debuginfo-Pool | No | ---- | ---- | https://updates.suse.com/SUSE/Products/SLE-Module-Server-Applications/15/x86_64/product_debug?V2eS_ZjAtvrTattiP3x3isQL63o7DxxVQyL3h9b5zOkWUR2RHRXW-8yIaspPz5koj7hTxTt8WyhK0_wkZAGORpWbR0MdemDdI9SX6xSUZEoQMPaQtapHe7wbRzYEa6ETZ5Y6_wkAX60zzs3WrIhRDIoNGZmsaeUNgwEtq8ULAX3QuHI
9 | Server_Applications_Module_15_x86_64:SLE-Module-Server-Applications15-Debuginfo-Updates | SLE-Module-Server-Applications15-Debuginfo-Updates | No | ---- | ---- | https://updates.suse.com/SUSE/Updates/SLE-Module-Server-Applications/15/x86_64/update_debug?riqKW81Kba4LljHUoIug0ZPIQTw0QG0hRRE4_ZuGNxawHzjZdSy6nTMmC25tDgeL8E6stMNsIN5-WkSSzXXcCarFO5spwLxamKguesIoGOjqtYGM7FRns--dVgiLVImT2Sn9zErCMHI4obUhgb923u4kTUOiS_AjJ3RqPOe92wEI
10 | Server_Applications_Module_15_x86_64:SLE-Module-Server-Applications15-Pool | SLE-Module-Server-Applications15-Pool | Yes | (r ) Yes | No | https://updates.suse.com/SUSE/Products/SLE-Module-Server-Applications/15/x86_64/product?YbWQyJAPI7FSIDOMmA6U5sG9z-pHpdNBoW9Ia1rJJMzry7wXbKx5EgOvpEayRQCoRZcaaepBeR9XEQO4SxLETW3atvPwJHE1VRZDXgHa4iuXvCjRTgKtxlVz3cC_Ry9Nod6JhRVhbz8xHHV90z6KvzpCXYHKa9tKzptBaQ0
11 | Server_Applications_Module_15_x86_64:SLE-Module-Server-Applications15-Source-Pool | SLE-Module-Server-Applications15-Source-Pool | No | ---- | ---- | https://updates.suse.com/SUSE/Products/SLE-Module-Server-Applications/15/x86_64/product_source?wfT1ibVGZswD3enZi6l_qD9NPN6puGNYMf_5bH9_TxpFWthLa5F24ahxYIEMRYK74_toxUX3GHMKXvJPgnq29Sr2n5aZsVzzFrj92k52fMWGyuVUst7O3_8mum8phD558I2q1TJuHsjvraqmx5xgsvD84Gbg0a9KbxOoTBMr5RFXpq1f
12 | Server_Applications_Module_15_x86_64:SLE-Module-Server-Applications15-Updates | SLE-Module-Server-Applications15-Updates | Yes | (r ) Yes | Yes | https://updates.suse.com/SUSE/Updates/SLE-Module-Server-Applications/15/x86_64/update?12G6EXPLytchxzDtQtrGRUsbLPHhWuDTz1o4dmfmH1MWnPq9oaPxPfuF1GDhPkRpznZvhxzOOI1Lt_LbJdQUy4yGwg75T_cW0WxOycIFoXCObDXyFHhwG7kr7AC27DwZV2gpKwz-ezerz3GbqCboZbpywl-NnXRSKHoi
</code></pre>
<p>and finally explodes when reaching the <a href="https://openqa.suse.de/tests/1459355#step/zypper_lifecycle/14" class="external">zypper_lifecycle</a> module because it cannot access these URLs.<br>
E.g. <a href="https://updates.suse.com/SUSE/Updates/SLE-Module-Server-Applications/15/x86_64/update/repodata/">https://updates.suse.com/SUSE/Updates/SLE-Module-Server-Applications/15/x86_64/update/repodata/</a> -> 403 - Forbidden</p>
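<p>The long query strings in the repo URIs above are per-registration authentication tokens; requesting the same path without a token, as in the repodata URL quoted here, is what yields the 403. A tiny sketch of the relationship:</p>

```shell
# Split an SCC repo URI into its path, dropping the auth token after '?'.
strip_token() { cut -d'?' -f1; }
# Fetching the bare path without a valid token returns "403 - Forbidden".
```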
<a name="Reproducible"></a>
<h2 >Reproducible<a href="#Reproducible" class="wiki-anchor">¶</a></h2>
<p>Seen on textmode and graphical xen-pv jobs.</p>
<a name="Acceptance-criteria"></a>
<h2 >Acceptance criteria<a href="#Acceptance-criteria" class="wiki-anchor">¶</a></h2>
<p><strong>AC1</strong>: Make use of proxy-SCC like other tests do</p>
openQA Project - action #19156 (Closed): "isotovideo failed" on aarch64 - no obvious reason, no m... | https://progress.opensuse.org/issues/19156 | 2017-05-12T12:27:02Z | nicksinger (nsinger@suse.com)
<p><a href="https://openqa.suse.de/tests/930599" class="external">https://openqa.suse.de/tests/930599</a> fails without an obvious reason.<br>
The only information provided by the logs is:</p>
<pre><code>23:17:30.5925 32519 isotovideo failed
</code></pre>
<p>Previous runs all look good and even other tests on aarch64 do not fail because of this, so this is most likely a sporadic issue without good reproducibility.</p>
openQA Infrastructure - action #18164 (Resolved): [devops][tools] monitoring of openqa worker ins... | https://progress.opensuse.org/issues/18164 | 2017-03-30T08:07:18Z | nicksinger (nsinger@suse.com)
<p>As already mentioned by okurz in poo#12912, we need proper monitoring of all machines important to openQA.<br>
OSD is already in the Icinga instance maintained by Infra, so I am creating this ticket to also keep track of the workers themselves.</p>
openQA Tests - action #18162 (Resolved): [tools][openqa-monitoring] START_AFTER_TEST not found | https://progress.opensuse.org/issues/18162 | 2017-03-30T07:46:49Z | nicksinger (nsinger@suse.com)
<a name="observation"></a>
<h2 >observation<a href="#observation" class="wiki-anchor">¶</a></h2>
<pre><code>[Wed Mar 29 23:20:25 2017] [4347:warn] START_AFTER_TEST=RAID0:64bit not found - check for typos and dependency cycles
[Wed Mar 29 23:56:50 2017] [9214:warn] START_AFTER_TEST=RAID0:64bit not found - check for typos and dependency cycles
[Wed Mar 29 23:57:00 2017] [9214:warn] START_AFTER_TEST=install_only:64bit not found - check for typos and dependency cycles
[Wed Mar 29 23:57:09 2017] [9214:warn] START_AFTER_TEST=install_only:64bit not found - check for typos and dependency cycles
[Wed Mar 29 23:57:10 2017] [8449:warn] START_AFTER_TEST=install_only:64bit not found - check for typos and dependency cycles
[Wed Mar 29 23:57:10 2017] [8449:warn] START_AFTER_TEST=install_only:64bit not found - check for typos and dependency cycles
[Wed Mar 29 23:57:10 2017] [8449:warn] START_AFTER_TEST=install_only:64bit not found - check for typos and dependency cycles
[and many more…]
</code></pre>
<a name="suggestion"></a>
<h2 >suggestion<a href="#suggestion" class="wiki-anchor">¶</a></h2>
<p>Since this is now a known issue and not critical for the server's operation, I'll provide a PR for okurz's logwarn script to ignore this warning.</p>
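<p>The substance of such a PR is one regular expression; sketched here with <code>grep</code> rather than in logwarn's own configuration syntax, which is not reproduced here:</p>

```shell
# Drop the known START_AFTER_TEST warnings, let everything else through.
filter_known_warnings() {
  grep -Ev 'START_AFTER_TEST=[^ ]+ not found - check for typos and dependency cycles'
}
```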
openQA Project - action #18076 (Resolved): [tools][openqa-monitoring] no products found, retryin... | https://progress.opensuse.org/issues/18076 | 2017-03-28T08:15:16Z | nicksinger (nsinger@suse.com)
<a name="observation"></a>
<h2 >observation<a href="#observation" class="wiki-anchor">¶</a></h2>
<pre><code>[Tue Mar 28 07:09:40 2017] [31798:warn] no products found, retrying version wildcard
[Tue Mar 28 06:21:26 2017] [25543:warn] no products found, retrying version wildcard
[Tue Mar 28 05:50:48 2017] [14721:warn] no products found, retrying version wildcard
[Tue Mar 28 05:50:49 2017] [14721:warn] no products found, retrying version wildcard
[and many more…]
</code></pre>
<a name="suggestion"></a>
<h2 >suggestion<a href="#suggestion" class="wiki-anchor">¶</a></h2>
<p>I'm not sure about the true meaning of this warning, but its lack of information makes it almost useless anyway.<br>
Since this is now a known issue, I'll provide a PR for okurz's logwarn script to ignore this warning.</p>
openQA Project - action #18052 (Rejected): [tools][openqa-monitoring] Error on AMQP channel rece... | https://progress.opensuse.org/issues/18052 | 2017-03-27T14:04:27Z | nicksinger (nsinger@suse.com)
<a name="observation"></a>
<h2 >observation<a href="#observation" class="wiki-anchor">¶</a></h2>
<pre><code>[Mon Mar 27 12:11:02 2017] [1401:warn] Error on AMQP channel received: Failed closing channel: Unknown channel id received: 1
[Mon Mar 27 12:11:02 2017] [1401:warn] Error on AMQP channel received: Failed closing channel: Unknown channel id received: 1
[Mon Mar 27 12:11:02 2017] [1401:warn] Error on AMQP channel received: Failed closing channel: Unknown channel id received: 1
[Mon Mar 27 12:11:02 2017] [1401:warn] Error on AMQP channel received: Failed closing channel: Unknown channel id received: 1
[Mon Mar 27 12:11:02 2017] [1401:warn] Error on AMQP channel received: Failed closing channel: Unknown channel id received: 1
[Mon Mar 27 12:11:02 2017] [1401:warn] Error on AMQP channel received: Failed closing channel: Unknown channel id received: 1
[Mon Mar 27 12:11:02 2017] [1401:warn] Error on AMQP channel received: Failed closing channel: Unknown channel id received: 1
[Mon Mar 27 12:11:02 2017] [1401:warn] Error on AMQP channel received: Failed closing channel: Unknown channel id received: 1
[Mon Mar 27 12:11:02 2017] [1401:warn] Error on AMQP channel received: Failed closing channel: Unknown channel id received: 1
[Mon Mar 27 12:11:02 2017] [1401:warn] Error on AMQP channel received: Failed closing channel: Unknown channel id received: 1
[Mon Mar 27 12:11:02 2017] [1401:warn] Error on AMQP channel received: Failed closing channel: Unknown channel id received: 1
[Mon Mar 27 12:11:02 2017] [1401:warn] Error on AMQP channel received: Failed closing channel: Unknown channel id received: 1
[Mon Mar 27 12:11:02 2017] [1401:warn] Error on AMQP channel received: Failed closing channel: Unknown channel id received: 1
[and many more…]
</code></pre>
<a name="suggestion"></a>
<h2 >suggestion<a href="#suggestion" class="wiki-anchor">¶</a></h2>
<p>Something seems to use channel 1, which is not known to AMQP. Further investigation is needed to find the root cause of the problem.<br>
Since this is now a known issue, I'll provide a PR for okurz's logwarn script to ignore this warning.</p>
openQA Project - action #18048 (Resolved): [tools][openqa-monitoring] AMQP connection closed | https://progress.opensuse.org/issues/18048 | 2017-03-27T13:43:53Z | nicksinger (nsinger@suse.com)
<a name="observation"></a>
<h2 >observation<a href="#observation" class="wiki-anchor">¶</a></h2>
<pre><code>[Mon Mar 27 10:11:24 2017] [25848:warn] AMQP connection closed
[Mon Mar 27 10:11:28 2017] [25898:warn] AMQP connection closed
[Mon Mar 27 10:11:45 2017] [24427:warn] AMQP connection closed
[Mon Mar 27 10:11:55 2017] [27069:warn] AMQP connection closed
[Mon Mar 27 10:12:03 2017] [25967:warn] AMQP connection closed
[Mon Mar 27 10:12:03 2017] [25704:warn] AMQP connection closed
[Mon Mar 27 10:12:03 2017] [22927:warn] AMQP connection closed
[Mon Mar 27 10:12:03 2017] [22869:warn] AMQP connection closed
[Mon Mar 27 10:12:29 2017] [25848:warn] AMQP connection closed
[Mon Mar 27 10:12:33 2017] [25711:warn] AMQP connection closed
[Mon Mar 27 10:12:33 2017] [25898:warn] AMQP connection closed
[Mon Mar 27 10:13:00 2017] [27069:warn] AMQP connection closed
[Mon Mar 27 10:13:08 2017] [22927:warn] AMQP connection closed
[Mon Mar 27 10:13:08 2017] [25967:warn] AMQP connection closed
[Mon Mar 27 10:13:08 2017] [25704:warn] AMQP connection closed
[and many more…]
</code></pre>
<a name="suggestion"></a>
<h2 >suggestion<a href="#suggestion" class="wiki-anchor">¶</a></h2>
<p>I guess the connections to the AMQP sockets are not closed after use and/or time out after some while. Maybe a proper close of these sockets is required, maybe the timeout is too low, or maybe it's an entirely different problem.<br>
Since this is now a known issue, I'll provide a PR for okurz's logwarn script to ignore this warning.</p>