openSUSE Project Management Tool: Issues
https://progress.opensuse.org/
https://progress.opensuse.org/themes/openSUSE/favicon/favicon.ico?1582917784
2023-09-07T08:04:46Z
openSUSE Project Management Tool
Redmine
openQA Infrastructure - action #135335 (Resolved): [tools] gitlabci salt-pillars-openqa deploy f...
https://progress.opensuse.org/issues/135335
2023-09-07T08:04:46Z
osukup
<a name="Observation"></a>
<h2 >Observation<a href="#Observation" class="wiki-anchor">¶</a></h2>
<p><a href="https://gitlab.suse.de/openqa/salt-pillars-openqa/-/jobs/1810907">https://gitlab.suse.de/openqa/salt-pillars-openqa/-/jobs/1810907</a></p>
<p>from log:</p>
<pre><code> ID: wicked ifup br1
Function: cmd.run
Result: False
Comment: Command "wicked ifup br1" run
Started: 07:07:10.803006
Duration: 30119.464 ms
Changes:
----------
pid:
16955
retcode:
157
stderr:
stdout:
br1 no-device
Name: /etc/sysconfig/network/ifcfg-tap0 - Function: file.managed - Result: Clean Started: - 07:07:40.934647 Duration: 7.021 ms
Name: /etc/sysconfig/network/ifcfg-tap64 - Function: file.managed - Result: Clean Started: - 07:07:40.945434 Duration: 5.152 ms
Name: /etc/sysconfig/network/ifcfg-tap128 - Function: file.managed - Result: Clean Started: - 07:07:40.954239 Duration: 5.0 ms
Name: /etc/sysconfig/network/ifcfg-tap1 - Function: file.managed - Result: Clean Started: - 07:07:40.962885 Duration: 5.013 ms
Name: /etc/sysconfig/network/ifcfg-tap65 - Function: file.managed - Result: Clean Started: - 07:07:40.971581 Duration: 5.002 ms
Name: /etc/sysconfig/network/ifcfg-tap129 - Function: file.managed - Result: Clean Started: - 07:07:40.980234 Duration: 4.984 ms
Name: /etc/sysconfig/network/ifcfg-tap2 - Function: file.managed - Result: Clean Started: - 07:07:40.988915 Duration: 5.037 ms
Name: /etc/sysconfig/network/ifcfg-tap66 - Function: file.managed - Result: Clean Started: - 07:07:40.997624 Duration: 4.987 ms
Name: /etc/sysconfig/network/ifcfg-tap130 - Function: file.managed - Result: Clean Started: - 07:07:41.006362 Duration: 4.946 ms
Name: /etc/sysconfig/network/ifcfg-tap3 - Function: file.managed - Result: Clean Started: - 07:07:41.015198 Duration: 5.266 ms
Name: /etc/sysconfig/network/ifcfg-tap67 - Function: file.managed - Result: Clean Started: - 07:07:41.024299 Duration: 5.178 ms
Name: /etc/sysconfig/network/ifcfg-tap131 - Function: file.managed - Result: Clean Started: - 07:07:41.033154 Duration: 4.992 ms
Name: /etc/sysconfig/network/ifcfg-tap4 - Function: file.managed - Result: Clean Started: - 07:07:41.041806 Duration: 4.955 ms
Name: /etc/sysconfig/network/ifcfg-tap68 - Function: file.managed - Result: Clean Started: - 07:07:41.050532 Duration: 5.287 ms
Name: /etc/sysconfig/network/ifcfg-tap132 - Function: file.managed - Result: Clean Started: - 07:07:41.059443 Duration: 4.926 ms
Name: /etc/sysconfig/network/ifcfg-tap5 - Function: file.managed - Result: Clean Started: - 07:07:41.068081 Duration: 4.993 ms
Name: /etc/sysconfig/network/ifcfg-tap69 - Function: file.managed - Result: Clean Started: - 07:07:41.076758 Duration: 4.93 ms
Name: /etc/sysconfig/network/ifcfg-tap133 - Function: file.managed - Result: Clean Started: - 07:07:41.085353 Duration: 4.942 ms
Name: /etc/sysconfig/network/ifcfg-tap6 - Function: file.managed - Result: Clean Started: - 07:07:41.093943 Duration: 5.056 ms
Name: /etc/sysconfig/network/ifcfg-tap70 - Function: file.managed - Result: Clean Started: - 07:07:41.102645 Duration: 4.935 ms
Name: /etc/sysconfig/network/ifcfg-tap134 - Function: file.managed - Result: Clean Started: - 07:07:41.111287 Duration: 4.987 ms
Name: /etc/sysconfig/network/ifcfg-tap7 - Function: file.managed - Result: Clean Started: - 07:07:41.119942 Duration: 4.913 ms
Name: /etc/sysconfig/network/ifcfg-tap71 - Function: file.managed - Result: Clean Started: - 07:07:41.128614 Duration: 4.959 ms
Name: /etc/sysconfig/network/ifcfg-tap135 - Function: file.managed - Result: Clean Started: - 07:07:41.137410 Duration: 4.953 ms
Name: /etc/sysconfig/network/ifcfg-tap8 - Function: file.managed - Result: Clean Started: - 07:07:41.146176 Duration: 4.935 ms
Name: /etc/sysconfig/network/ifcfg-tap72 - Function: file.managed - Result: Clean Started: - 07:07:41.154807 Duration: 5.035 ms
Name: /etc/sysconfig/network/ifcfg-tap136 - Function: file.managed - Result: Clean Started: - 07:07:41.163660 Duration: 4.937 ms
Name: /etc/sysconfig/network/ifcfg-tap9 - Function: file.managed - Result: Clean Started: - 07:07:41.172266 Duration: 4.954 ms
Name: /etc/sysconfig/network/ifcfg-tap73 - Function: file.managed - Result: Clean Started: - 07:07:41.181001 Duration: 4.95 ms
Name: /etc/sysconfig/network/ifcfg-tap137 - Function: file.managed - Result: Clean Started: - 07:07:41.189605 Duration: 5.503 ms
</code></pre>
<a name="Acceptance-Criteria"></a>
<h2 >Acceptance Criteria<a href="#Acceptance-Criteria" class="wiki-anchor">¶</a></h2>
<ul>
<li><strong>AC1</strong>: Salt states apply successfully on imageworker</li>
</ul>
<a name="Suggestions"></a>
<h2 >Suggestions<a href="#Suggestions" class="wiki-anchor">¶</a></h2>
<ul>
<li>Investigate whether one of the service definitions is missing a "requires" or similar</li>
<li>Commands were re-run - consider persistent mitigations if this is causing other pipelines to fail</li>
<li>This seems to affect imagetester, openqaworker17.qa.suse.cz, openqaworker16.qa.suse.cz and openqaworker18.qa.suse.cz so far</li>
</ul>
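<p>For log output as long as the one above, it helps to mechanically pull out just the failing states. A sketch (the salt highstate excerpt is embedded as sample data; on a real run you would feed the GitLab job log in instead):</p>

```shell
#!/bin/sh
# Print the ID of every failed state in salt highstate output.
# The embedded excerpt mirrors the log above.
log=$(mktemp)
cat > "$log" <<'EOF'
          ID: wicked ifup br1
    Function: cmd.run
      Result: False
     Comment: Command "wicked ifup br1" run
     Started: 07:07:10.803006
    Duration: 30119.464 ms
EOF
# remember the last "ID:" line; print it whenever a "Result: False" follows
failed=$(awk '/^[[:space:]]*ID:/ { id = $0; sub(/^[[:space:]]*ID:[[:space:]]*/, "", id) }
              /Result: False/    { print id }' "$log")
echo "$failed"
rm -f "$log"
```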
openQA Infrastructure - action #135206 (Rejected): [tools] GitlabCI telegraf step on salt-states-...
https://progress.opensuse.org/issues/135206
2023-09-05T20:20:38Z
osukup
<p><a href="https://gitlab.suse.de/openqa/salt-pillars-openqa/-/pipelines/791107" class="external">https://gitlab.suse.de/openqa/salt-pillars-openqa/-/pipelines/791107</a></p>
<p>From log:</p>
<pre><code>openqaworker16.qa.suse.cz:
2023-09-05T20:00:54Z E! [inputs.ping] Error in plugin: host "walter1.qe.nue2.suse.org": exit status 2 - /usr/bin/ping: walter1.qe.nue2.suse.org: Name or service not known
2023-09-05T20:00:54Z E! [inputs.ping] Error in plugin: host "qa-jump.qe.nue2.suse.org": exit status 2 - /usr/bin/ping: qa-jump.qe.nue2.suse.org: Name or service not known
2023-09-05T20:00:54Z E! [telegraf] Error running agent: input plugins recorded 2 errors
telegraf errors
worker30.oqa.prg2.suse.org:
telegraf is fine
openqaworker17.qa.suse.cz:
2023-09-05T20:00:54Z E! [inputs.ping] Error in plugin: host "walter1.qe.nue2.suse.org": exit status 2 - /usr/bin/ping: walter1.qe.nue2.suse.org: Name or service not known
2023-09-05T20:00:54Z E! [inputs.ping] Error in plugin: host "qa-jump.qe.nue2.suse.org": exit status 2 - /usr/bin/ping: qa-jump.qe.nue2.suse.org: Name or service not known
2023-09-05T20:00:54Z E! [telegraf] Error running agent: input plugins recorded 2 errors
telegraf errors
openqaworker18.qa.suse.cz:
2023-09-05T20:00:54Z E! [inputs.ping] Error in plugin: host "walter1.qe.nue2.suse.org": exit status 2 - /usr/bin/ping: walter1.qe.nue2.suse.org: Name or service not known
2023-09-05T20:00:54Z E! [inputs.ping] Error in plugin: host "qa-jump.qe.nue2.suse.org": exit status 2 - /usr/bin/ping: qa-jump.qe.nue2.suse.org: Name or service not known
2023-09-05T20:00:54Z E! [telegraf] Error running agent: input plugins recorded 2 errors
</code></pre>
<p>It looks like some hosts have a problem with DNS:</p>
<p>openqaworker16.qa.suse.cz<br>
openqaworker17.qa.suse.cz<br>
openqaworker18.qa.suse.cz<br>
openqaworker14.qa.suse.cz<br>
qesapworker-prg4.qa.suse.cz<br>
qesapworker-prg5.qa.suse.cz<br>
qesapworker-prg7.qa.suse.cz<br>
qesapworker-prg6.qa.suse.cz<br>
openqa-monitor.qa.suse.de </p>
<p><strong>AC1:</strong> The pipeline passes</p>
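<p>The affected hostnames can be extracted mechanically from the telegraf log. A sketch (the log lines are embedded as sample data; each extracted host could then be checked with <code>getent hosts</code> on the affected workers):</p>

```shell
#!/bin/sh
# Extract the distinct hostnames that telegraf could not resolve
# from inputs.ping error lines like the ones above.
log=$(mktemp)
cat > "$log" <<'EOF'
2023-09-05T20:00:54Z E! [inputs.ping] Error in plugin: host "walter1.qe.nue2.suse.org": exit status 2 - /usr/bin/ping: walter1.qe.nue2.suse.org: Name or service not known
2023-09-05T20:00:54Z E! [inputs.ping] Error in plugin: host "qa-jump.qe.nue2.suse.org": exit status 2 - /usr/bin/ping: qa-jump.qe.nue2.suse.org: Name or service not known
EOF
# the failing hostname is the quoted value after 'host'
unresolved=$(sed -n 's/.*host "\([^"]*\)".*Name or service not known/\1/p' "$log" | sort -u)
echo "$unresolved"
rm -f "$log"
```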
openQA Infrastructure - action #134816 (Resolved): [tools] grafana dashboard for `OpenQA Jobs tes...
https://progress.opensuse.org/issues/134816
2023-08-30T08:46:38Z
osukup
<a name="Observation"></a>
<h2 >Observation<a href="#Observation" class="wiki-anchor">¶</a></h2>
<p>Dashboard <a href="https://stats.openqa-monitor.qa.suse.de/d/nRDab3Jiz/openqa-jobs-test?orgId=1" class="external">https://stats.openqa-monitor.qa.suse.de/d/nRDab3Jiz/openqa-jobs-test?orgId=1</a></p>
<p>Data is missing in the graphs showing running tests since yesterday's migration.</p>
<a name="Acceptance-criteria"></a>
<h2 >Acceptance criteria<a href="#Acceptance-criteria" class="wiki-anchor">¶</a></h2>
<ul>
<li><strong>AC1:</strong> No missing data for osd on Grafana</li>
<li><strong>AC2:</strong> Alerts related to affected panels are functioning</li>
</ul>
<a name="Suggestions"></a>
<h2 >Suggestions<a href="#Suggestions" class="wiki-anchor">¶</a></h2>
<ul>
<li>In the salt states, in monitoring/telegraf/telegraf-webui.conf, instead of <code>grains['fqdn']</code> use something like <code>grains.get('primary_webui_domain', grains.get('fqdn'))</code>. Alternatively we could use the "id" in place of the FQDN</li>
<li>If the above does not work then use an OR expression since we already have data with different domains in the db (or implement that to cover the data from 2023-08-29 to today)</li>
<li>Also check whether alerts need to be covered</li>
<li>As an alternative, can we change the FQDN of osd to point to openqa.suse.de again?
<ul>
<li>Apparently a bad idea according to mcaj (not sure why)</li>
</ul></li>
<li>See existing MR: <a href="https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/953" class="external">https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/953</a></li>
</ul>
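<p>The first suggestion would amount to a one-line change in the template. A sketch, assuming the file is a Jinja-templated telegraf config; the <code>[global_tags]</code> section and the tag name are assumptions here, only the grains expression comes from the suggestion above:</p>

```jinja
# monitoring/telegraf/telegraf-webui.conf (sketch; section and tag name assumed)
[global_tags]
  # fall back to the FQDN when no explicit web UI domain grain is set
  host = "{{ grains.get('primary_webui_domain', grains.get('fqdn')) }}"
```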
openQA Infrastructure - action #133154 (Resolved): osd-deployment failed because unreachable workers
https://progress.opensuse.org/issues/133154
2023-07-21T08:58:16Z
osukup
<p><a href="https://gitlab.suse.de/openqa/osd-deployment/-/pipelines/736743" class="external">https://gitlab.suse.de/openqa/osd-deployment/-/pipelines/736743</a></p>
<p>from logs:</p>
<pre><code>sapworker1.qe.nue2.suse.org:
Minion did not return. [Not connected]
openqaworker1.qe.nue2.suse.org:
Minion did not return. [Not connected]
sapworker2.qe.nue2.suse.org:
Minion did not return. [Not connected]
sapworker3.qe.nue2.suse.org:
Minion did not return. [Not connected]
+++ kill %1
</code></pre>
<p>Tried to ping/ssh these hosts and none of them is reachable.<br>
IPMI is also without any response, and these hosts have corresponding "host up" alerts in Grafana.</p>
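<p>The unreachable minions can be listed mechanically from output like the above. A sketch (the salt output is embedded as sample data; the minion name is the line preceding each "Minion did not return" message):</p>

```shell
#!/bin/sh
# List the minions that did not respond in salt batch output.
log=$(mktemp)
cat > "$log" <<'EOF'
sapworker1.qe.nue2.suse.org:
    Minion did not return. [Not connected]
openqaworker1.qe.nue2.suse.org:
    Minion did not return. [Not connected]
EOF
# keep the previous line; print it (minus the trailing colon) on a match
dead=$(awk '/Minion did not return/ { sub(/:$/, "", prev); print prev } { prev = $0 }' "$log")
echo "$dead"
rm -f "$log"
```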
openQA Infrastructure - action #133127 (Resolved): Frankencampus network broken + GitlabCi failed...
https://progress.opensuse.org/issues/133127
2023-07-20T17:34:02Z
osukup
<a name="Observation"></a>
<h2 >Observation<a href="#Observation" class="wiki-anchor">¶</a></h2>
<p>Job <a href="https://gitlab.suse.de/qa-maintenance/bot-ng/-/pipelines/735816" class="external">https://gitlab.suse.de/qa-maintenance/bot-ng/-/pipelines/735816</a></p>
<p>In reality the job passed, but the upload of artifacts failed.</p>
<p>from logs:</p>
<pre><code>WARNING: Uploading artifacts as "archive" to coordinator... 502 Bad Gateway id=1702329 responseStatus=502 Bad Gateway status=502 token=64_L_XM4
WARNING: Retrying... context=artifacts-uploader error=invalid argument
WARNING: Uploading artifacts as "archive" to coordinator... 502 Bad Gateway id=1702329 responseStatus=502 Bad Gateway status=502 token=64_L_XM4
WARNING: Retrying... context=artifacts-uploader error=invalid argument
WARNING: Uploading artifacts as "archive" to coordinator... 502 Bad Gateway id=1702329 responseStatus=502 Bad Gateway status=502 token=64_L_XM4
FATAL: invalid argument
Cleaning up project directory and file based variables
00:01
ERROR: Job failed: exit code 1
</code></pre>
openQA Infrastructure - action #133097 (Resolved): cron on OSD (date; fetch_openqa_bugs /etc/open...
https://progress.opensuse.org/issues/133097
2023-07-20T07:45:15Z
osukup
<pre><code>Exception occured while fetching boo#1115169
Traceback (most recent call last):
File "/usr/bin/fetch_openqa_bugs", line 62, in <module>
raise e
File "/usr/bin/fetch_openqa_bugs", line 55, in <module>
client.openqa_request("PUT", "bugs/%s" % bug_dbid, data=issue.get_dict())
File "/usr/lib/python3.6/site-packages/openqa_client/client.py", line 298, in openqa_request
return self.do_request(req, retries=retries, wait=wait, parse=True)
File "/usr/lib/python3.6/site-packages/openqa_client/client.py", line 238, in do_request
raise err
File "/usr/lib/python3.6/site-packages/openqa_client/client.py", line 213, in do_request
request.method, resp.url, resp.status_code
openqa_client.exceptions.RequestError: ('PUT', 'https://openqa.opensuse.org/api/v1/bugs/1021', 403)
</code></pre>
<p>It could be caused by the broken IDP login service: <a href="https://suse.slack.com/archives/C029APBKLGK/p1689838423782549" class="external">https://suse.slack.com/archives/C029APBKLGK/p1689838423782549</a></p>
openQA Infrastructure - action #132926 (Workable): OSD cron -> (fetch_openqa_bugs)> /tmp/fetch_op...
https://progress.opensuse.org/issues/132926
2023-07-18T07:56:34Z
osukup
<a name="Observation"></a>
<h2 >Observation<a href="#Observation" class="wiki-anchor">¶</a></h2>
<p>OSD cron -> (fetch_openqa_bugs)> /tmp/fetch_openqa_bugs_osd.log failed:</p>
<p>from traceback:</p>
<pre><code>requests.exceptions.ConnectTimeout: HTTPSConnectionPool(host='api.github.com', port=443): Max retries exceeded with url: /repos/SUSE/ha-sap-terraform-deployments/issues/857 (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7f7439e43b38>, 'Connection to api.github.com timed out. (connect timeout=10)'))
</code></pre>
<p>fetch_openqa_bugs failed when fetching issues from GitHub.</p>
<a name="Acceptance-criteria"></a>
<h2 >Acceptance criteria<a href="#Acceptance-criteria" class="wiki-anchor">¶</a></h2>
<ul>
<li><strong>AC1:</strong> It is understood why the error occurred</li>
<li><strong>AC2:</strong> The error does not persist</li>
</ul>
<a name="Suggestions"></a>
<h2 >Suggestions<a href="#Suggestions" class="wiki-anchor">¶</a></h2>
<ul>
<li>Make sure you can login, see <a href="https://gitlab.suse.de/OPS-Service/salt/-/blob/production/pillar/id/openqa-service_qe_suse_de.sls#L11" class="external">https://gitlab.suse.de/OPS-Service/salt/-/blob/production/pillar/id/openqa-service_qe_suse_de.sls#L11</a> or ask dheidler/mkittler to do that for you</li>
<li>Assuming "host unavailable", check how long the script retried
<ul>
<li>Retry more often?</li>
<li>Wait longer between attempts?</li>
</ul></li>
<li><a href="https://github.com/os-autoinst/openqa_bugfetcher" class="external">https://github.com/os-autoinst/openqa_bugfetcher</a></li>
</ul>
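<p>The "retry more often / wait longer" suggestions could be prototyped as a small POSIX shell helper; the function name and the flaky demo command below are made up for illustration, not taken from the bugfetcher code:</p>

```shell
#!/bin/sh
# retry ATTEMPTS DELAY CMD... : run CMD up to ATTEMPTS times,
# sleeping DELAY seconds between failed attempts.
retry() {
  attempts=$1 delay=$2
  shift 2
  try=1
  while ! "$@"; do
    [ "$try" -ge "$attempts" ] && return 1
    sleep "$delay"
    try=$((try + 1))
  done
  return 0
}

# demo: a command that fails twice, then succeeds on the third call
n=0
flaky() { n=$((n + 1)); [ "$n" -ge 3 ]; }
retry 5 0 flaky && echo "succeeded after $n attempts"
```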
openQA Infrastructure - action #130132 (Resolved): jenkins.qa.suse.de seems down
https://progress.opensuse.org/issues/130132
2023-05-31T11:17:23Z
osukup
<p>Jenkins got stuck in emergency mode again. @nsinger booted the system using Ctrl-D.</p>
openQA Infrastructure - action #125228 (Rejected): Salt pillars deployment failed on storage.oqa....
https://progress.opensuse.org/issues/125228
2023-03-01T12:27:23Z
osukup
<pre><code> ID: /root/.ssh/id_ed25519.backup_osd
Function: file.managed
Result: False
Comment: Pillar id_ed25519.backup_osd does not exist
Started: 13:09:31.581660
Duration: 2.844 ms
Changes:
</code></pre>
openQA Infrastructure - action #125132 (Resolved): [alert] logrotate failed on OSD
https://progress.opensuse.org/issues/125132
2023-02-28T09:54:59Z
osukup
<p>from journalctl:</p>
<pre><code>Feb 15 00:00:07 openqa logrotate[12569]: logrotate does not support parallel execution on the same set of logfiles.
Feb 15 00:00:07 openqa logrotate[12569]: error: state file /var/lib/misc/logrotate.status is already locked
Feb 15 00:00:00 openqa systemd[1]: Starting Rotate log files...
</code></pre>
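<p>The "state file is already locked" error means two logrotate instances raced for the same state file. The locking behaviour can be reproduced with flock(1) on a throwaway file; the real state file is /var/lib/misc/logrotate.status:</p>

```shell
#!/bin/sh
# Hold an exclusive lock on a stand-in state file, then try to take it
# again, as a second concurrent logrotate instance would.
statefile=$(mktemp)
exec 9>"$statefile"
flock -n 9                       # first "logrotate" holds the lock
if flock -n "$statefile" -c true; then
  status=unlocked
else
  status=locked                  # what the second, failing run sees
fi
echo "$status"
```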
openQA Infrastructure - action #114908 (Resolved): [tools] https://stats.openqa-monitor.qa.suse.d...
https://progress.opensuse.org/issues/114908
2022-08-02T12:17:54Z
osukup
<p>The Grafana overview page isn't responding.</p>
openQA Infrastructure - action #109301 (Rejected): openqaworker14 + openqaworker15 sporadically g...
https://progress.opensuse.org/issues/109301
2022-03-31T09:07:53Z
osukup
<a name="OBSERVATION"></a>
<h2 >OBSERVATION<a href="#OBSERVATION" class="wiki-anchor">¶</a></h2>
<p>On reboot, these workers from time to time fail to boot correctly, ending in emergency mode:</p>
<pre><code>bře 08 14:34:24 openqaworker14 kernel: Loading iSCSI transport class v2.0-870.
bře 08 14:34:24 openqaworker14 systemd[1]: Finished Create Volatile Files and Directories.
bře 08 14:34:24 openqaworker14 systemd[1]: Starting Security Auditing Service...
bře 08 14:34:24 openqaworker14 openqa-establish-nvme-setup[1557]: NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
bře 08 14:34:24 openqaworker14 openqa-establish-nvme-setup[1557]: nvme0n1 259:0 0 3.5T 0 disk
bře 08 14:34:24 openqaworker14 openqa-establish-nvme-setup[1557]: ├─nvme0n1p1 259:1 0 512M 0 part
bře 08 14:34:24 openqaworker14 openqa-establish-nvme-setup[1557]: ├─nvme0n1p2 259:2 0 1T 0 part /
bře 08 14:34:24 openqaworker14 openqa-establish-nvme-setup[1557]: └─nvme0n1p3 259:3 0 2.5T 0 part
bře 08 14:34:24 openqaworker14 openqa-establish-nvme-setup[1557]: └─md127 9:127 0 2.5T 0 raid0
bře 08 14:34:24 openqaworker14 openqa-establish-nvme-setup[1552]: Stopping current RAID "/dev/md/openqa"
bře 08 14:34:24 openqaworker14 systemd[1]: Finished Flush Journal to Persistent Storage.
bře 08 14:34:24 openqaworker14 kernel: i40iw_open: i40iw_open completed
bře 08 14:34:24 openqaworker14 systemd[1]: Created slice Slice /system/rdma-load-modules.
bře 08 14:34:24 openqaworker14 systemd[1]: Starting Load RDMA modules from /etc/rdma/modules/iwarp.conf...
bře 08 14:34:24 openqaworker14 systemd[1]: Starting Load RDMA modules from /etc/rdma/modules/rdma.conf...
bře 08 14:34:24 openqaworker14 kernel: ixgbe 0000:d8:00.1: Multiqueue Enabled: Rx Queue count = 63, Tx Queue count = 63 XDP Queue count = 0
bře 08 14:34:24 openqaworker14 systemd[1]: Finished Load RDMA modules from /etc/rdma/modules/iwarp.conf.
bře 08 14:34:24 openqaworker14 openqa-establish-nvme-setup[1559]: mdadm: stopped /dev/md/openqa
bře 08 14:34:24 openqaworker14 openqa-establish-nvme-setup[1552]: Creating RAID0 "/dev/md/openqa" on: /dev/nvme0n1p3
bře 08 14:34:24 openqaworker14 openqa-establish-nvme-setup[1574]: mdadm: /dev/nvme0n1p3 appears to be part of a raid array:
bře 08 14:34:24 openqaworker14 openqa-establish-nvme-setup[1574]: level=raid0 devices=1 ctime=Mon Mar 7 10:20:52 2022
bře 08 14:34:24 openqaworker14 openqa-establish-nvme-setup[1574]: mdadm: unexpected failure opening /dev/md127
bře 08 14:34:24 openqaworker14 openqa-establish-nvme-setup[1552]: Unable to create RAID, mdadm returned with non-zero code
bře 08 14:34:24 openqaworker14 kernel: i40iw_open: i40iw_open completed
bře 08 14:34:24 openqaworker14 systemd[1]: openqa_nvme_format.service: Main process exited, code=exited, status=1/FAILURE
bře 08 14:34:24 openqaworker14 systemd[1]: openqa_nvme_format.service: Failed with result 'exit-code'.
bře 08 14:34:24 openqaworker14 systemd[1]: Failed to start Setup NVMe before mounting it.
bře 08 14:34:24 openqaworker14 systemd[1]: Dependency failed for /var/lib/openqa.
bře 08 14:34:24 openqaworker14 systemd[1]: Dependency failed for openQA Worker #1.
bře 08 14:34:24 openqaworker14 systemd[1]: openqa-worker-auto-restart@1.service: Job openqa-worker-auto-restart@1.service/start failed with result 'dependency'.
bře 08 14:34:24 openqaworker14 systemd[1]: Dependency failed for var-lib-openqa-share.automount.
bře 08 14:34:24 openqaworker14 systemd[1]: var-lib-openqa-share.automount: Job var-lib-openqa-share.automount/start failed with result 'dependency'.
bře 08 14:34:24 openqaworker14 systemd[1]: Dependency failed for openQA Worker #3.
bře 08 14:34:24 openqaworker14 systemd[1]: openqa-worker-auto-restart@3.service: Job openqa-worker-auto-restart@3.service/start failed with result 'dependency'.
bře 08 14:34:24 openqaworker14 systemd[1]: Dependency failed for Prepare NVMe after mounting it.
bře 08 14:34:24 openqaworker14 systemd[1]: openqa_nvme_prepare.service: Job openqa_nvme_prepare.service/start failed with result 'dependency'.
bře 08 14:34:24 openqaworker14 systemd[1]: Dependency failed for Local File Systems.
bře 08 14:34:24 openqaworker14 systemd[1]: local-fs.target: Job local-fs.target/start failed with result 'dependency'.
bře 08 14:34:24 openqaworker14 systemd[1]: local-fs.target: Triggering OnFailure= dependencies.
bře 08 14:34:24 openqaworker14 systemd[1]: Dependency failed for openQA Worker #2.
bře 08 14:34:24 openqaworker14 systemd[1]: openqa-worker-auto-restart@2.service: Job openqa-worker-auto-restart@2.service/start failed with result 'dependency'.
bře 08 14:34:24 openqaworker14 systemd[1]: Dependency failed for openQA Worker #4.
bře 08 14:34:24 openqaworker14 systemd[1]: openqa-worker-auto-restart@4.service: Job openqa-worker-auto-restart@4.service/start failed with result 'dependency'.
bře 08 14:34:24 openqaworker14 systemd[1]: var-lib-openqa.mount: Job var-lib-openqa.mount/start failed with result 'dependency'.
</code></pre>
<p>The cause of the problem is probably a difference in the hardware configuration of these workers. Our standard workers have 1x HDD with the OS and 1x NVMe SSD with /dev/md/openqa. These workers have only one NVMe SSD,<br>
configured as:</p>
<pre><code>nvme0n1
├─nvme0n1p1 vfat FAT32 9AED-277B 506M 1% /boot/efi
├─nvme0n1p2 btrfs 5a405f4e-bd0c-46cb-a5ee-a0e976968be1 1016,5G 1% /
└─nvme0n1p3 linux_raid_member 1.2 openqaworker14:openqa 03972fdb-874d-cbec-4cb8-bca5412d90a2
└─md127 ext2 1.0 4c30279b-d757-4a97-b636-539b18bc9e22 2,3T 0% /var/lib/openqa
</code></pre>
openQA Infrastructure - action #106594 (Resolved): [tools] openqaworker-arm-3 periodically fails ...
https://progress.opensuse.org/issues/106594
2022-02-10T11:36:16Z
osukup
<a name="Observation"></a>
<h2 >Observation<a href="#Observation" class="wiki-anchor">¶</a></h2>
<p>from journalctl -xe -u os-autoinst-openvswitch</p>
<pre><code>úno 09 21:56:21 openqaworker-arm-3 os-autoinst-openvswitch[2924]: Waiting for IP on bridge 'br1', 300s left ...
úno 09 21:56:22 openqaworker-arm-3 os-autoinst-openvswitch[2924]: Waiting for IP on bridge 'br1', 299s left ...
....
úno 09 22:01:20 openqaworker-arm-3 os-autoinst-openvswitch[2924]: Waiting for IP on bridge 'br1', 3s left ...
úno 09 22:01:21 openqaworker-arm-3 os-autoinst-openvswitch[2924]: Waiting for IP on bridge 'br1', 2s left ...
úno 09 22:01:22 openqaworker-arm-3 os-autoinst-openvswitch[2924]: can't parse bridge local port IP at /usr/lib/os-autoinst/os-autoinst-openvswitch line 43.
úno 09 22:01:22 openqaworker-arm-3 os-autoinst-openvswitch[2924]: Waiting for IP on bridge 'br1', 1s left ...
úno 09 22:01:22 openqaworker-arm-3 systemd[1]: os-autoinst-openvswitch.service: Main process exited, code=exited, status=255/EXCEPTION
</code></pre>
<p>The default timeout is 60 seconds; on openqaworker-arm-3 it is now 5 minutes, but that still isn't enough after a system reboot.</p>
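<p>What the service does while logging those messages is essentially a wait-for-IP poll on br1. A minimal shell sketch of that loop (demonstrated on <code>lo</code> below, since br1 only exists on the workers; function name and parameters are made up):</p>

```shell
#!/bin/sh
# Poll an interface until it has an IPv4 address or the timeout expires,
# roughly what os-autoinst-openvswitch waits for on br1.
wait_for_ip() { # wait_for_ip IFACE TIMEOUT_SECONDS
  waited=0
  while [ "$waited" -lt "$2" ]; do
    addr=$(ip -4 -o addr show "$1" 2>/dev/null | awk '{ print $4; exit }')
    if [ -n "$addr" ]; then
      echo "$addr"
      return 0
    fi
    sleep 1
    waited=$((waited + 1))
  done
  return 1
}

wait_for_ip lo 3    # loopback always has an address
```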
<a name="Rollback-steps"></a>
<h2 >Rollback steps<a href="#Rollback-steps" class="wiki-anchor">¶</a></h2>
<ul>
<li>Unpause alert "Failed systemd services alert (except openqa.suse.de)"</li>
</ul>
openQA Infrastructure - action #106365 (Resolved): Improve security for OSD worker credentials br...
https://progress.opensuse.org/issues/106365
2022-02-09T10:25:15Z
osukup
<a name="Motivation"></a>
<h2 >Motivation<a href="#Motivation" class="wiki-anchor">¶</a></h2>
<p><a href="https://progress.opensuse.org/issues/105405" class="external">https://progress.opensuse.org/issues/105405</a>: the changed visibility of salt-pillars-openqa broke the <code>deploy</code> stage of CI.</p>
<a name="Acceptance-criteria"></a>
<h2 >Acceptance criteria<a href="#Acceptance-criteria" class="wiki-anchor">¶</a></h2>
<ul>
<li><strong>AC1</strong>: Working salt-states+salt-pillars pipelines in gitlab</li>
<li><strong>AC2:</strong> salt-pillars repo stays non-public</li>
</ul>
<a name="Suggestions"></a>
<h2 >Suggestions<a href="#Suggestions" class="wiki-anchor">¶</a></h2>
<ul>
<li>Try out deploy tokens on OSD to fetch the git repo</li>
</ul>
openQA Infrastructure - action #106035 (Rejected): [qe-tools] dehydrated service fails on osd
https://progress.opensuse.org/issues/106035
2022-02-07T08:09:49Z
osukup
<p>OSD has systemd in a degraded state because the dehydrated service ends up in a failed state:</p>
<pre><code>dehydrated.service - Certificate Update Runner for Dehydrated
Loaded: loaded (/usr/lib/systemd/system/dehydrated.service; static)
Active: failed (Result: exit-code) since Mon 2022-02-07 09:03:35 CET; 4min 58s ago
TriggeredBy: ● dehydrated.timer
Process: 26947 ExecStart=/usr/bin/dehydrated --cron (code=exited, status=1/FAILURE)
Main PID: 26947 (code=exited, status=1/FAILURE)
Feb 07 09:03:34 openqa systemd[1]: Starting Certificate Update Runner for Dehydrated...
Feb 07 09:03:34 openqa dehydrated[26947]: # INFO: Using main config file /etc/dehydrated/config
Feb 07 09:03:34 openqa dehydrated[26947]: # INFO: Using additional config file /etc/dehydrated/config.d/suse-ca.sh
Feb 07 09:03:34 openqa dehydrated[26947]: # INFO: Running /usr/bin/dehydrated as dehydrated/dehydrated
Feb 07 09:03:34 openqa sudo[26947]: root : PWD=/ ; USER=dehydrated ; GROUP=dehydrated ; COMMAND=/usr/bin/dehydrated --cron
Feb 07 09:03:35 openqa dehydrated[27267]: {}
Feb 07 09:03:35 openqa systemd[1]: dehydrated.service: Main process exited, code=exited, status=1/FAILURE
Feb 07 09:03:35 openqa systemd[1]: dehydrated.service: Failed with result 'exit-code'.
Feb 07 09:03:35 openqa systemd[1]: Failed to start Certificate Update Runner for Dehydrated.
</code></pre>