openSUSE Project Management Tool: Issues | https://progress.opensuse.org/ | 2024-03-26T10:10:15Z
openQA Infrastructure - action #158041 (Resolved): grenache needs upgrade to 15.5 | https://progress.opensuse.org/issues/158041 | 2024-03-26T10:10:15Z | okurz (okurz@suse.com)
<a name="Motivation"></a>
<h2 >Motivation<a href="#Motivation" class="wiki-anchor">¶</a></h2>
<p>grenache-1 was offline for many months and therefore missed the upgrade of our infrastructure to Leap 15.5. It is still on 15.4, so we should upgrade it as well.</p>
<a name="Acceptance-criteria"></a>
<h2 >Acceptance criteria<a href="#Acceptance-criteria" class="wiki-anchor">¶</a></h2>
<ul>
<li><strong>AC1:</strong> grenache-1 runs a stable Leap 15.5</li>
<li><strong>AC2:</strong> osd-deployment, salt states deployment and alerts are all in a good state for grenache-1</li>
</ul>
<a name="Suggestions"></a>
<h2 >Suggestions<a href="#Suggestions" class="wiki-anchor">¶</a></h2>
<ul>
<li>Conduct the distribution upgrade according to <a href="https://progress.opensuse.org/projects/openqav3/wiki/#Distribution-upgrades" class="external">https://progress.opensuse.org/projects/openqav3/wiki/#Distribution-upgrades</a></li>
<li>Apply the necessary package locks accordingly</li>
<li>Remove obsolete package locks</li>
<li>Ensure system is fully upgraded</li>
<li>Try multiple reboots</li>
<li>Ensure that there are no related alerts</li>
</ul>
<a name="Rollback-actions"></a>
<h2 >Rollback actions<a href="#Rollback-actions" class="wiki-anchor">¶</a></h2>
<ul>
<li>Remove silence "alertname=Failed systemd services alert (except openqa.suse.de)" from <a href="https://monitor.qa.suse.de/alerting/silences" class="external">https://monitor.qa.suse.de/alerting/silences</a></li>
<li>Remove silence "alertname=grenache-1: host up alert" from <a href="https://monitor.qa.suse.de/alerting/silences" class="external">https://monitor.qa.suse.de/alerting/silences</a></li>
</ul>
openQA Infrastructure - action #158026 (Resolved): osd-deployment exceeds 2h maximum runtime duri... | https://progress.opensuse.org/issues/158026 | 2024-03-26T08:23:01Z | livdywan (liv.dywan@suse.com)
<a name="Observation"></a>
<h2 >Observation<a href="#Observation" class="wiki-anchor">¶</a></h2>
<p><a href="https://gitlab.suse.de/openqa/osd-deployment/-/jobs/2426666" class="external">https://gitlab.suse.de/openqa/osd-deployment/-/jobs/2426666</a></p>
<pre><code>+ retry -r 3 -- zypper --no-refresh -n dup --replacefiles
Loading repository data...
..Reading installed packages...
.Warning: You are about to do a distribution upgrade with all enabled repositories. Make sure these repositories are compatible before you continue. See 'man zypper' for more information about this command.
Computing distribution upgrade...
.
The following 7 packages are going to be upgraded:
openQA openQA-client openQA-common openQA-doc openQA-local-db system-user-velociraptor velociraptor
7 packages to upgrade.
Overall download size: 0 B. Already cached: 21.2 MiB. After the operation, additional 4.7 KiB will be used.
[...]
Checking for file conflicts: [..done]
(1/4) Installing: openQA-common-4.6.1711372491.18a87328-lp155.6447.1.ppc64le [...done]
(2/4) Installing: os-autoinst-distri-opensuse-deps-1.1711423505.d81d6831-lp155.14058.1.noarch [...done]
(3/4) Installing: openQA-client-4.6.1711372491.18a87328-lp155.6447.1.ppc64le [...done]
(4/4) Installing: openQA-worker-4.6.1711372491.18a87328-lp155.6447.1.ppc64le [....done]....................................................................................................................................................
[...]
............................................................................
ERROR: Job failed: execution took longer than 2h0m0s seconds
</code></pre>
<a name="Suggestions"></a>
<h2 >Suggestions<a href="#Suggestions" class="wiki-anchor">¶</a></h2>
<ul>
<li></li>
</ul>
openQA Infrastructure - action #158023 (Resolved): salt-states-openqa pipeline invalid arguments ... | https://progress.opensuse.org/issues/158023 | 2024-03-26T08:18:33Z | livdywan (liv.dywan@suse.com)
<a name="Observation"></a>
<h2 >Observation<a href="#Observation" class="wiki-anchor">¶</a></h2>
<p><a href="https://gitlab.suse.de/openqa/salt-states-openqa/-/jobs/2425817" class="external">https://gitlab.suse.de/openqa/salt-states-openqa/-/jobs/2425817</a> and also <a href="https://gitlab.suse.de/openqa/salt-pillars-openqa/-/jobs/2422794" class="external">https://gitlab.suse.de/openqa/salt-pillars-openqa/-/jobs/2422794</a></p>
<pre><code>monitor.qe.nue2.suse.org:
Passed invalid arguments to state.highstate: expected str, bytes or os.PathLike object, not list
</code></pre>
<a name="Suggestions"></a>
<h2 >Suggestions<a href="#Suggestions" class="wiki-anchor">¶</a></h2>
<ul>
<li></li>
</ul>
openQA Infrastructure - action #158020 (Resolved): salt-states-openqa pipeline times out | https://progress.opensuse.org/issues/158020 | 2024-03-26T08:13:58Z | livdywan (liv.dywan@suse.com)
<a name="Observation"></a>
<h2 >Observation<a href="#Observation" class="wiki-anchor">¶</a></h2>
<p><a href="https://gitlab.suse.de/openqa/salt-pillars-openqa/-/jobs/2425611" class="external">https://gitlab.suse.de/openqa/salt-pillars-openqa/-/jobs/2425611</a></p>
<pre><code> ID: SUSE:SLE-15-SP6:Update:BCI
Function: cmd.run
Name: su geekotest -c 'mkdir -p SUSE:SLE-15-SP6:Update:BCI && python3 script/sctimeout: sending signal TERM to command 'ssh'
</code></pre>
<p><a href="https://gitlab.suse.de/openqa/salt-states-openqa/-/jobs/2425891" class="external">https://gitlab.suse.de/openqa/salt-states-openqa/-/jobs/2425891</a></p>
<pre><code> ID: stop_and_disable_all_not_configured_workers
Function: cmd.run
Name: services=$(systemctl list-units --all 'openqa-worker-auto-restart@*.service' | sed -e '/.*openqa-worker-auto-restart@.*\.service.*/!d' -e 's|.*openqa-worker-auto-restart@\(.*\)\.service.*|\1|' | awk '{ if($0 > 16) print "openqa-worker-auto-restart@" $0 ".service openqa-reload-worker-auto-restart@" $0 ".path" }' | tr '\n' ' '); [ -z "$services" ] || systemctl disable --ntimeout: sending signal TERM to command 'ssh'
</code></pre>
<a name="Suggestions"></a>
<h2 >Suggestions<a href="#Suggestions" class="wiki-anchor">¶</a></h2>
<ul>
<li></li>
</ul>
QA - action #157858 (Resolved): Repeated reminder comments about SLO's for openqatests size:S | https://progress.opensuse.org/issues/157858 | 2024-03-25T08:37:52Z | livdywan (liv.dywan@suse.com)
<a name="Motivation"></a>
<h2 >Motivation<a href="#Motivation" class="wiki-anchor">¶</a></h2>
<p><a class="issue tracker-4 status-3 priority-5 priority-high3 closed child" title="action: No ticket reminder comments about SLO's for openqatests size:M (Resolved)" href="https://progress.opensuse.org/issues/157522">#157522</a> addressed a bug that prevented reminder comments from being sent. Unfortunately comments are added even if a comment was already present. This is especially visible in <em>immediate</em> tickets, for example #153115, which get daily reminders - as per <a class="issue tracker-4 status-3 priority-5 priority-high3 closed child" title="action: Automated alerts and reminders about SLO's for openqatests (only one reminder) size:M (Resolved)" href="https://progress.opensuse.org/issues/116545">#116545</a> only one comment is supposed to be added. Maybe this is a regression or the check is not comprehensive enough.</p>
<a name="Acceptance-criteria"></a>
<h2 >Acceptance criteria<a href="#Acceptance-criteria" class="wiki-anchor">¶</a></h2>
<ul>
<li><strong>AC1:</strong> Reminders are only added once</li>
</ul>
<a name="Suggestions"></a>
<h2 >Suggestions<a href="#Suggestions" class="wiki-anchor">¶</a></h2>
<ul>
<li>We already have the code that should handle that: Review the implementation from <a class="issue tracker-4 status-3 priority-5 priority-high3 closed child" title="action: Automated alerts and reminders about SLO's for openqatests (only one reminder) size:M (Resolved)" href="https://progress.opensuse.org/issues/116545">#116545</a> for gaps in the current logic in <a href="https://github.com/openSUSE/backlogger/blob/main/backlogger.py" class="external">https://github.com/openSUSE/backlogger/blob/main/backlogger.py</a></li>
<li>Investigate if something changed with current comments, maybe the Redmine upgrade made a difference here (complete guess)?</li>
<li>Maybe the regex needs to be adapted and/or better covered with unit testing</li>
</ul>
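<p>The "only one reminder" check from the suggestions above could be covered by a small, testable helper. The following is a minimal sketch and not the actual backlogger.py logic: the reminder wording in <code>REMINDER_RE</code> and the helper names are hypothetical, and the issue is assumed to be a dict shaped like a Redmine API response that includes <code>journals</code> with <code>notes</code>.</p>

```python
import re

# Hypothetical reminder wording; the text backlogger really posts may differ.
REMINDER_RE = re.compile(r"reminder:\s+this ticket exceeded its SLO", re.IGNORECASE)


def has_reminder(journals):
    """Return True if any journal entry already carries a reminder comment."""
    for entry in journals:
        # Journal entries without a comment may have notes set to None or "".
        notes = entry.get("notes") or ""
        if REMINDER_RE.search(notes):
            return True
    return False


def should_add_reminder(issue):
    """Decide whether a new reminder comment is needed for an issue dict."""
    return not has_reminder(issue.get("journals", []))
```

<p>Keeping the decision in a pure function like this makes it easy to unit test against real comment texts, for example ones copied from tickets that received duplicate reminders.</p>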
QA - action #157819 (Resolved): Can't login to walter1 and walter2 offline | https://progress.opensuse.org/issues/157819 | 2024-03-24T18:55:53Z | okurz (okurz@suse.com)
<a name="Observation"></a>
<h2 >Observation<a href="#Observation" class="wiki-anchor">¶</a></h2>
<pre><code>$ ssh root@walter1.qe.nue2.suse.org
Password:
</code></pre><pre><code>$ ping walter2.qe.nue2.suse.org
PING walter2.qe.nue2.suse.org(2a07:de40:a102:5:10:168:192:2 (2a07:de40:a102:5:10:168:192:2)) 56 data bytes
From 2a07:de40:a100:1:ffff:ffff:ffff:ffff (2a07:de40:a100:1:ffff:ffff:ffff:ffff) icmp_seq=3 Destination unreachable: Address unreachable
$ ping -4 root@walter2.qe.nue2.suse.org
ping: root@walter2.qe.nue2.suse.org: Name or service not known
</code></pre>
<a name="Acceptance-criteria"></a>
<h2 >Acceptance criteria<a href="#Acceptance-criteria" class="wiki-anchor">¶</a></h2>
<ul>
<li><strong>AC1:</strong> SSH login for QE tools members works for both walter1+2</li>
</ul>
<a name="Suggestions"></a>
<h2 >Suggestions<a href="#Suggestions" class="wiki-anchor">¶</a></h2>
<ul>
<li>Try to reproduce the problem yourself</li>
<li>Create SUSE IT ticket <a href="https://progress.opensuse.org/projects/qa/wiki/Tools#SUSE-IT-ticket-handling" class="external">https://progress.opensuse.org/projects/qa/wiki/Tools#SUSE-IT-ticket-handling</a></li>
</ul>
QA - action #157522 (Resolved): No ticket reminder comments about SLO's for openqatests size:M | https://progress.opensuse.org/issues/157522 | 2024-03-19T11:20:59Z | livdywan (liv.dywan@suse.com)
<a name="Motivation"></a>
<h2 >Motivation<a href="#Motivation" class="wiki-anchor">¶</a></h2>
<p><a href="https://opensuse.github.io/openqa-tests-backlog/" class="external">https://opensuse.github.io/openqa-tests-backlog/</a> contains various queries which, when flagged red, should also result in comments on the relevant tickets. No such comments <a href="https://progress.opensuse.org/activity?from=2023-03-01&user_id=40859" class="external">have been observed</a> in several months.</p>
<a name="Acceptance-criteria"></a>
<h2 >Acceptance criteria<a href="#Acceptance-criteria" class="wiki-anchor">¶</a></h2>
<ul>
<li><strong>AC1:</strong> Neglected tickets receive a reminder comment</li>
</ul>
<a name="Suggestions"></a>
<h2 >Suggestions<a href="#Suggestions" class="wiki-anchor">¶</a></h2>
<ul>
<li>Investigate if reminder comments are correctly enabled, e.g. by checking <a href="https://github.com/openSUSE/openqa-tests-backlog/blob/main/.github/workflows/backlogger.yml#L22" class="external">https://github.com/openSUSE/openqa-tests-backlog/blob/main/.github/workflows/backlogger.yml#L22</a>
<ul>
<li><code>--reminder-comment-on-issues</code> checks that there is no existing comment - maybe there is a flaw in the logic?</li>
<li>Comments are created via API calls - <a href="https://github.com/openSUSE/openqa-tests-backlog/actions/runs/8341719003/job/22828346106" class="external">pipelines on GitHub</a> don't show any errors, though?</li>
</ul></li>
<li>Consider adding more verbose logging to see if things work correctly in production</li>
</ul>
QA - action #157237 (Resolved): dependabot PRs for the dashboard are not getting approved and mer... | https://progress.opensuse.org/issues/157237 | 2024-03-14T11:05:52Z | livdywan (liv.dywan@suse.com)
<a name="Observation"></a>
<h2 >Observation<a href="#Observation" class="wiki-anchor">¶</a></h2>
<p>See also <a href="https://suse.slack.com/archives/C02AJ1E568M/p1710360358208519" class="external">the conversation in Slack</a>.</p>
<a name="Acceptance-criteria"></a>
<h2 >Acceptance criteria<a href="#Acceptance-criteria" class="wiki-anchor">¶</a></h2>
<ul>
<li><strong>AC1:</strong> dependabot PR's are merged without any human interaction</li>
</ul>
<a name="Suggestions"></a>
<h2 >Suggestions<a href="#Suggestions" class="wiki-anchor">¶</a></h2>
<ul>
<li>Investigate if the <a href="https://github.com/openSUSE/qem-dashboard/blob/main/.mergify.yml#L27" class="external">mergify config</a> is effective and works as intended (hypothesis being that it doesn't)</li>
</ul>
QA - action #156775 (Resolved): cpanspec should adopt new %patch syntax size:S | https://progress.opensuse.org/issues/156775 | 2024-03-06T15:19:43Z | tinita (tina.mueller+trick-redmine@suse.com)
<a name="Motivation"></a>
<h2 >Motivation<a href="#Motivation" class="wiki-anchor">¶</a></h2>
<p><a href="https://lists.opensuse.org/archives/list/factory@lists.opensuse.org/thread/YNXIWWHY7E2ZDMLKL44K7RR4Y2LCDV45/" class="external">https://lists.opensuse.org/archives/list/factory@lists.opensuse.org/thread/YNXIWWHY7E2ZDMLKL44K7RR4Y2LCDV45/</a></p>
<p><code>%patchN</code> is deprecated and should be replaced with <code>%patch -PN</code>.<br>
In cpanspec we are using <code>%autosetup</code> if possible, but otherwise we still use <code>%patchN</code>.</p>
<a name="Acceptance-criteria"></a>
<h2 >Acceptance criteria<a href="#Acceptance-criteria" class="wiki-anchor">¶</a></h2>
<ul>
<li><strong>AC1:</strong> Perl packages in Factory (especially openQA ones) continue to build after the RPM change</li>
</ul>
<a name="Acceptance-tests"></a>
<h2 >Acceptance tests<a href="#Acceptance-tests" class="wiki-anchor">¶</a></h2>
<ul>
<li><strong>AT1-1:</strong> Given RPM 4.20 is available, all packages in devel:languages:perl still build without an error related to (at least) <code>%patchN</code></li>
</ul>
<a name="Suggestions"></a>
<h2 >Suggestions<a href="#Suggestions" class="wiki-anchor">¶</a></h2>
<ul>
<li>Check all packages in devel:languages:perl to ensure that they don't use <code>%patchN</code></li>
<li>Look into
<a href="https://github.com/openSUSE/cpanspec/blob/master/cpanspec#L1356-L1358" class="external">https://github.com/openSUSE/cpanspec/blob/master/cpanspec#L1356-L1358</a>
and adapt/extend to use explicit <code>%patch -PN</code> instead</li>
<li>Ensure all packages in devel:languages:perl use that, e.g. by creating submit requests in individual batches. Possibly most are done already or don't rely on <code>%patch</code></li>
<li>Ensure that all submit requests are accepted and also forwarded into openSUSE:Factory accordingly</li>
<li>Crosscheck with scripting that no packages are left in d:l:p (and possibly openSUSE:Factory) using <code>%patchN</code></li>
</ul>
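<p>The rewrite and the final crosscheck from the suggestions above could be scripted roughly as follows. This is a minimal sketch working on spec file text only; the function names are hypothetical, it is not the cpanspec implementation, and it only handles <code>%patchN</code> at the start of a line (optionally indented).</p>

```python
import re

# Deprecated form: "%patch0", "%patch12 -p1", optionally indented.
# "%patch -P3" does not match because no digit follows "%patch" directly.
OLD_PATCH_RE = re.compile(r"^([ \t]*)%patch(\d+)\b(.*)$", re.MULTILINE)


def modernize_patch_macros(spec_text):
    """Rewrite every "%patchN [args]" line to "%patch -PN [args]"."""
    return OLD_PATCH_RE.sub(r"\1%patch -P\2\3", spec_text)


def uses_old_patch_syntax(spec_text):
    """Return True if the spec text still contains a deprecated %patchN line."""
    return OLD_PATCH_RE.search(spec_text) is not None
```

<p><code>uses_old_patch_syntax</code> could be run over checked-out spec files of a project to list the packages that still need a submit request.</p>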
<a name="Out-of-scope"></a>
<h2 >Out of scope<a href="#Out-of-scope" class="wiki-anchor">¶</a></h2>
<ul>
<li>Ignore devel:languages:perl:CPAN-[A-Z]: packages there are only updated on new CPAN releases, and they are not (yet) accepted into devel:languages:perl, so we don't need to care about them</li>
</ul>
openQA Project - action #156754 (Resolved): "DBIx::Class::Row::update(): Can't update OpenQA::Sch... | https://progress.opensuse.org/issues/156754 | 2024-03-06T11:43:54Z | okurz (okurz@suse.com)
<a name="Observation"></a>
<h2 >Observation<a href="#Observation" class="wiki-anchor">¶</a></h2>
<p>As seen in <a class="issue tracker-4 status-3 priority-4 priority-default closed" title="action: [alert] "HTTP Response" alert fired shortly on 2024-02-12 and 2024-03-04 size:M (Resolved)" href="https://progress.opensuse.org/issues/155326">#155326</a></p>
<p>OSD journal logs show some DBIx error:</p>
<pre><code>Feb 12 00:38:12 openqa openqa[11635]: [error] [ztQJ1_pAsMiS] DBIx::Class::Row::update(): Can't update OpenQA::Schema::Result::JobLocks=HASH(0x55b77ea45e28): row not found at /usr/share/openqa/script/../lib/OpenQA/Resource/Locks.pm line 139
</code></pre>
<a name="Suggestions"></a>
<h2 >Suggestions<a href="#Suggestions" class="wiki-anchor">¶</a></h2>
<ul>
<li>Specifically look into the only error in the log excerpt "[ztQJ1_pAsMiS] DBIx::Class::Row::update(): Can't update OpenQA::Schema::Result::JobLocks=HASH(0x55b77ea45e28): row not found at /usr/share/openqa/script/../lib/OpenQA/Resource/Locks.pm line 139"</li>
</ul>
openQA Infrastructure - action #156514 (Resolved): Cron <root@openqa-service> (date; fetch_openqa... | https://progress.opensuse.org/issues/156514 | 2024-03-04T07:37:00Z | livdywan (liv.dywan@suse.com)
<a name="Observation"></a>
<h2 >Observation<a href="#Observation" class="wiki-anchor">¶</a></h2>
<p>Cron <a href="mailto:root@openqa-service">root@openqa-service</a> (date; fetch_openqa_bugs)> /tmp/fetch_openqa_bugs_osd.log at Mon, 4 Mar 2024 00:20:40 +0000 (UTC):</p>
<pre><code>Exception occured while fetching bsc#1158056
Traceback (most recent call last):
File "/usr/bin/fetch_openqa_bugs", line 62, in <module>
raise e
File "/usr/bin/fetch_openqa_bugs", line 48, in <module>
issue = issue_fetcher.get_issue(bugid)
File "/usr/lib/python3.6/site-packages/openqa_bugfetcher/issues/__init__.py", line 88, in get_issue
return self.prefix_table[prefix](self.conf, bugid)
File "/usr/lib/python3.6/site-packages/openqa_bugfetcher/issues/__init__.py", line 24, in __init__
self.fetch(conf)
File "/usr/lib/python3.6/site-packages/openqa_bugfetcher/issues/bugzilla_issue.py", line 28, in fetch
data = req.json()
File "/usr/lib/python3.6/site-packages/requests/models.py", line 898, in json
return complexjson.loads(self.text, **kwargs)
File "/usr/lib64/python3.6/site-packages/simplejson/__init__.py", line 525, in loads
return _default_decoder.decode(s)
File "/usr/lib64/python3.6/site-packages/simplejson/decoder.py", line 370, in decode
obj, end = self.raw_decode(s)
File "/usr/lib64/python3.6/site-packages/simplejson/decoder.py", line 400, in raw_decode
return self.scan_once(s, idx=_w(s, idx).end())
simplejson.errors.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
</code></pre>
<a name="Suggestions"></a>
<h2 >Suggestions<a href="#Suggestions" class="wiki-anchor">¶</a></h2>
<ul>
<li>Investigate what is causing the JSON error - likely an error string was returned instead of JSON</li>
<li>Handle the exception in bugzilla_issue.py and print a meaningful error</li>
<li>Investigate why 3 seemingly identical errors are sent in separate emails around the same time</li>
</ul>
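<p>Handling the decode failure with a meaningful message, as suggested above, might look like the following. This is a minimal sketch, not the code in bugzilla_issue.py: <code>BugFetchError</code> and <code>parse_bug_response</code> are hypothetical names, and the HTTP status and body are passed in as plain values so the example does not depend on <code>requests</code>.</p>

```python
import json


class BugFetchError(Exception):
    """Hypothetical error type for a bug that could not be fetched or parsed."""


def parse_bug_response(bugid, status_code, body):
    """Parse a response body as JSON or fail with a message naming the bug."""
    try:
        return json.loads(body)
    except ValueError as exc:
        # json.JSONDecodeError and simplejson's JSONDecodeError both derive
        # from ValueError, so this covers either decoder.
        snippet = body[:80].replace("\n", " ")
        raise BugFetchError(
            f"{bugid}: expected JSON but got HTTP {status_code} "
            f"with body starting {snippet!r}"
        ) from exc
```

<p>Including the status code and the start of the body in the message should make it visible right in the cron mail whether an error page was returned instead of JSON.</p>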
openQA Infrastructure - action #156481 (Resolved): cron -> (fetch_openqa_bugs)> /tmp/fetch_openqa... | https://progress.opensuse.org/issues/156481 | 2024-03-01T17:06:37Z | livdywan (liv.dywan@suse.com)
<a name="Observation"></a>
<h2 >Observation<a href="#Observation" class="wiki-anchor">¶</a></h2>
<p>Cron <a href="mailto:root@openqa-service">root@openqa-service</a> (date; fetch_openqa_bugs)> /tmp/fetch_openqa_bugs_osd.log:</p>
<pre><code>openqa_client.exceptions.ConnectionError: HTTPSConnectionPool(host='openqa.suse.de', port=443): Max retries exceeded with url: /api/v1/bugs?refreshable=1&delta=86400 (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f91f6877080>: Failed to establish a new connection: [Errno 113] No route to host',))
</code></pre>
<a name="Suggestions"></a>
<h2 >Suggestions<a href="#Suggestions" class="wiki-anchor">¶</a></h2>
<ul>
<li></li>
</ul>
openQA Infrastructure - action #156460 (Resolved): Potential FS corruption on osd due to 2 VMs ac... | https://progress.opensuse.org/issues/156460 | 2024-03-01T13:51:21Z | jbaier_cz (jbaier@suse.cz)
<a name="Observation"></a>
<h2 >Observation<a href="#Observation" class="wiki-anchor">¶</a></h2>
<p>Users noticed slowness of osd in <a href="https://suse.slack.com/archives/C02CANHLANP/p1709297645213609" class="external">https://suse.slack.com/archives/C02CANHLANP/p1709297645213609</a>; openqa-monitor.qa.suse.de also shows availability problems.</p>
<p>Logs on osd show a potential problem with the FS:</p>
<pre><code>Mar 01 14:29:14 openqa salt-master[25856]: [ERROR ] Unable to remove /var/cache/salt/master/jobs/26/4669e8a06e5502583ba67b138a9c30b97efbfff1f8af0b92f937ad8b70035d: [Errno 117] Structure needs cleaning: '.min>
Mar 01 14:29:14 openqa kernel: EXT4-fs error (device vda1): ext4_lookup:1855: inode #467326: comm salt-master: deleted inode referenced: 467329
Mar 01 14:29:14 openqa kernel: EXT4-fs error (device vda1): ext4_lookup:1855: inode #467326: comm salt-master: deleted inode referenced: 467329
Mar 01 14:29:14 openqa kernel: EXT4-fs error (device vda1): ext4_lookup:1855: inode #428053: comm salt-master: deleted inode referenced: 428056
Mar 01 14:29:14 openqa kernel: EXT4-fs error (device vda1): ext4_lookup:1855: inode #428053: comm salt-master: deleted inode referenced: 428056
Mar 01 14:29:14 openqa salt-master[25856]: [ERROR ] Unable to remove /var/cache/salt/master/jobs/08/96cf9ed4cc58d8c044fe257e5e977516e49383070eea5680e3f8d53fc31712: [Errno 117] Structure needs cleaning: '.min>
Mar 01 14:29:14 openqa kernel: EXT4-fs error (device vda1): ext4_lookup:1855: inode #358221: comm salt-master: deleted inode referenced: 358225
Mar 01 14:29:14 openqa kernel: EXT4-fs error (device vda1): ext4_lookup:1855: inode #358221: comm salt-master: deleted inode referenced: 358225
Mar 01 14:29:14 openqa salt-master[25856]: [ERROR ] Unable to remove /var/cache/salt/master/jobs/eb/8843afe01ce61b501612957cc3df3a3d8371a9c2694ebd800b47d514066853: [Errno 117] Structure needs cleaning: '.min>
Mar 01 14:29:14 openqa openqa-websockets-daemon[15372]: [debug] [pid:15372] Updating seen of worker 1951 from worker_status (free)
</code></pre>
<p>There might be a situation where two VMs were running with the same backing device according to <a href="https://suse.slack.com/archives/C02CANHLANP/p1709299401351479?thread_ts=1709297645.213609&cid=C02CANHLANP" class="external">https://suse.slack.com/archives/C02CANHLANP/p1709299401351479?thread_ts=1709297645.213609&cid=C02CANHLANP</a></p>
<p>The server was rebooted to bring it into a consistent state, but due to the FS corruption osd is currently in maintenance mode and needs recovery.</p>
openQA Infrastructure - action #156331 (Resolved): [gitlab] New pipeline schedules cannot be crea... | https://progress.opensuse.org/issues/156331 | 2024-02-29T12:50:10Z | jbaier_cz (jbaier@suse.cz)
<a name="Observation"></a>
<h2 >Observation<a href="#Observation" class="wiki-anchor">¶</a></h2>
<p>New pipeline schedules can’t be created.</p>
<a name="Steps-to-reproduce"></a>
<h2 >Steps to reproduce<a href="#Steps-to-reproduce" class="wiki-anchor">¶</a></h2>
<ol>
<li>Visit pipeline schedules of any project with CI/CD enabled.</li>
<li>Observe message: You have exceeded the maximum number of pipeline schedules for your plan. To create a new schedule, either increase your plan limit or delete an exisiting schedule.</li>
<li>See disabled button “New schedule”.</li>
</ol>
<a name="Expected-result"></a>
<h2 >Expected result<a href="#Expected-result" class="wiki-anchor">¶</a></h2>
<p>New pipeline schedules can be created.</p>
<a name="Impact"></a>
<h2 >Impact<a href="#Impact" class="wiki-anchor">¶</a></h2>
<p>Without the ability to create more schedules, the automation process might be hindered.</p>
<a name="Further-details"></a>
<h2 >Further details<a href="#Further-details" class="wiki-anchor">¶</a></h2>
<p>This issue can be easily solved by following the steps mentioned in <a href="https://gitlab.suse.de/help/administration/instance_limits#number-of-pipeline-schedules" class="external">https://gitlab.suse.de/help/administration/instance_limits#number-of-pipeline-schedules</a></p>
openQA Infrastructure - action #156301 (Resolved): [bot-ng] Pipeline failed / KeyError: 'priority... | https://progress.opensuse.org/issues/156301 | 2024-02-29T08:54:46Z | livdywan (liv.dywan@suse.com)
<a name="Observation"></a>
<h2 >Observation<a href="#Observation" class="wiki-anchor">¶</a></h2>
<p><a href="https://gitlab.suse.de/qa-maintenance/bot-ng/-/jobs/2327183" class="external">https://gitlab.suse.de/qa-maintenance/bot-ng/-/jobs/2327183</a></p>
<pre><code>++ retry -r 30 -e -- ./qem-bot/bot-ng.py -c /etc/openqabot --token [MASKED] incidents-run
[...]
KeyError: 'priority'
Retrying up to 19 more times after sleeping 6144s …
2024-02-29 06:28:46 INFO Bot schedule starts now
Traceback (most recent call last):
File "./qem-bot/bot-ng.py", line 7, in <module>
main()
File "/builds/qa-maintenance/bot-ng/qem-bot/openqabot/main.py", line 32, in main
sys.exit(cfg.func(cfg))
File "/builds/qa-maintenance/bot-ng/qem-bot/openqabot/args.py", line 24, in do_incident_schedule
bot = OpenQABot(args)
File "/builds/qa-maintenance/bot-ng/qem-bot/openqabot/openqabot.py", line 24, in __init__
self.incidents = get_incidents(self.token)
File "/builds/qa-maintenance/bot-ng/qem-bot/openqabot/loader/qem.py", line 41, in get_incidents
xs.append(Incident(i))
File "/builds/qa-maintenance/bot-ng/qem-bot/openqabot/types/incident.py", line 23, in __init__
self.priority = incident["priority"]
KeyError: 'priority'
Retrying up to 18 more times after sleeping 12288s …
ERROR: Job failed: execution took longer than 4h0m0s seconds
</code></pre>
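<p>The traceback above shows the loader failing on the first incident that lacks a <code>priority</code> key. One way to tolerate such data is sketched below. This is a reduced stand-in and not the qem-bot code: the <code>Incident</code> class only models two fields, the <code>number</code> key is an assumption, and treating a missing priority as <code>None</code> is a design choice that would need to be checked against how the bot uses the value.</p>

```python
import logging

log = logging.getLogger("bot.loader")


class Incident:
    """Reduced stand-in for the bot's incident type; only two fields are modelled."""

    def __init__(self, data):
        self.number = data["number"]  # assumed key, required in this sketch
        self.priority = data.get("priority")  # optional: None when absent


def load_incidents(raw_incidents):
    """Build Incident objects, skipping entries that lack required fields."""
    incidents = []
    for data in raw_incidents:
        try:
            incidents.append(Incident(data))
        except KeyError as exc:
            # One malformed incident should not abort the whole run.
            log.warning("Skipping incident with missing field %s: %r", exc, data)
    return incidents
```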
<a name="Suggestions"></a>
<h2 >Suggestions<a href="#Suggestions" class="wiki-anchor">¶</a></h2>
<ul>
<li><strong>DONE</strong> Restart pipelines</li>
<li>Investigate if there is new data the bot is not handling correctly</li>
<li>Don't provoke timeouts with retrying on reproducible errors</li>
<li>Look into unit test coverage</li>
</ul>
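<p>The suggestion not to retry on reproducible errors can be illustrated with a retry helper that re-raises deterministic failures immediately. This is a minimal sketch in Python, not the <code>retry</code> shell script used in the pipeline; which exception types count as fatal is an assumption made for the example.</p>

```python
import time


def retry(func, retries=3, base_delay=1.0, fatal=(KeyError, TypeError), sleep=time.sleep):
    """Call func, retrying with exponential backoff on transient errors only.

    Exceptions listed in `fatal` are treated as reproducible and re-raised
    immediately, so they cannot consume the job's time budget with sleeps.
    """
    delay = base_delay
    for attempt in range(retries + 1):
        try:
            return func()
        except fatal:
            raise
        except Exception:
            if attempt == retries:
                raise
            sleep(delay)
            delay *= 2
```

<p>With a split like this, the <code>KeyError: 'priority'</code> from the log would fail the job on the first attempt instead of sleeping for hours until the 4h limit is hit.</p>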