openSUSE Project Management Tool
openQA Infrastructure - action #70768: obs_rsync_run and obs_rsync_update_builds_text Minion tasks fail frequentlyhttps://progress.opensuse.org/issues/70768?journal_id=3216912020-09-01T12:03:43Zokurzokurz@suse.com
<ul><li><strong>Tags</strong> set to <i>obs_rsync, alert, minion</i></li><li><strong>Project</strong> changed from <i>openQA Project</i> to <i>openQA Infrastructure</i></li><li><strong>Description</strong> updated (<a title="View differences" href="/journals/321691/diff?detail_id=318850">diff</a>)</li><li><strong>Status</strong> changed from <i>New</i> to <i>Workable</i></li><li><strong>Target version</strong> set to <i>Ready</i></li></ul><p>mkittler wrote: "Maybe there's also an actual bug we need to fix." but so far it looks like a problem regarding actual project directories not existing on IBS so I think this fits better in "openQA Infrastructure" until we know if there <em>is</em> an actual issue to fix in our code</p>
openQA Infrastructure - action #70768: obs_rsync_run and obs_rsync_update_builds_text Minion tasks fail frequentlyhttps://progress.opensuse.org/issues/70768?journal_id=3217242020-09-01T12:21:53Zmkittlermarius.kittler@suse.com
<ul><li><strong>Subject</strong> changed from <i>obs_rsync_run Minion tasks fail frequently</i> to <i>obs_rsync_run and obs_rsync_update_builds_text Minion tasks fail frequently</i></li><li><strong>Description</strong> updated (<a title="View differences" href="/journals/321724/diff?detail_id=318910">diff</a>)</li></ul>
openQA Infrastructure - action #70768: obs_rsync_run and obs_rsync_update_builds_text Minion tasks fail frequentlyhttps://progress.opensuse.org/issues/70768?journal_id=3226932020-09-04T11:35:07Zokurzokurz@suse.com
<ul></ul><p>deleted all but the oldest failed minion jobs from <a href="https://openqa.suse.de/minion/jobs?state=failed" class="external">https://openqa.suse.de/minion/jobs?state=failed</a></p>
openQA Infrastructure - action #70768: obs_rsync_run and obs_rsync_update_builds_text Minion tasks fail frequentlyhttps://progress.opensuse.org/issues/70768?journal_id=3227022020-09-04T11:40:09Zokurzokurz@suse.com
<ul><li><strong>Related to</strong> <i><a class="issue tracker-4 status-3 priority-6 priority-high2 closed" href="/issues/70975">action #70975</a>: [alert] too many failed minion jobs</i> added</li></ul>
openQA Infrastructure - action #70768: obs_rsync_run and obs_rsync_update_builds_text Minion tasks fail frequentlyhttps://progress.opensuse.org/issues/70768?journal_id=3483732020-11-07T12:26:45Zokurzokurz@suse.com
<ul><li><strong>Priority</strong> changed from <i>Normal</i> to <i>High</i></li></ul><p>I repeatedly have to delete the corresponding minion jobs. This happened again as reported in <a class="issue tracker-4 status-3 priority-4 priority-default closed parent" title="action: [osd][retrospective] multiple unattended alerts, unattended gitlab CI pipeline fails, all osd aar... (Resolved)" href="https://progress.opensuse.org/issues/77089">#77089</a>, so I am increasing the priority. At the very least, if we do not have a better idea, we need to adjust monitoring so that it does not alert us about this.</p>
openQA Infrastructure - action #70768: obs_rsync_run and obs_rsync_update_builds_text Minion tasks fail frequentlyhttps://progress.opensuse.org/issues/70768?journal_id=3520022020-11-19T08:07:15ZXiaojing_liuxliu1@suse.com
<ul></ul><p>Some <code>obs_rsync_run</code> minion jobs failed. For example:<br>
<a href="https://openqa.suse.de/minion/jobs?id=933724" class="external">https://openqa.suse.de/minion/jobs?id=933724</a></p>
<pre><code>"result" => {
  "code" => 512,
  "message" => "SUSE:SLE-15-SP3:GA:TEST/base/ exit code: 1 (1 failures total so far)\nSUSE:SLE-15-SP3:GA:TEST/jeos/ exit code: 1 (2 failures total so far)"
},
</code></pre>
<p>And <a href="https://openqa.suse.de/minion/jobs?id=933542" class="external">https://openqa.suse.de/minion/jobs?id=933542</a> shows:</p>
<pre><code>"result" => {
  "code" => 256,
  "message" => "No changes found since last run, skipping {SUSE:SLE-15-SP3:GA:TEST/base/}\nSUSE:SLE-15-SP3:GA:TEST/jeos/ exit code: 1 (1 failures total so far)\nNo changes found since last run, skipping {SUSE:SLE-15-SP3:GA:TEST/migration/}"
},
</code></pre>
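<p>The two results above can be read as follows. This is a minimal sketch in Python, assuming that <code>code</code> is a raw POSIX wait status (exit status in the high byte) and that every failed sync adds one line of the form "&lt;path&gt; exit code: N" to <code>message</code>; neither assumption is confirmed in this ticket. The function names are made up for illustration.</p>

```python
import re


def exit_status(wait_status):
    """Return the exit status stored in the high byte of a POSIX wait status."""
    return (wait_status >> 8) & 0xFF


def failed_targets(message):
    """Collect the paths whose message line reports an exit code."""
    pattern = re.compile(r"^(\S+) exit code: \d+", re.MULTILINE)
    return pattern.findall(message)


# Messages copied from Minion jobs 933724 and 933542 quoted above.
message_933724 = (
    "SUSE:SLE-15-SP3:GA:TEST/base/ exit code: 1 (1 failures total so far)\n"
    "SUSE:SLE-15-SP3:GA:TEST/jeos/ exit code: 1 (2 failures total so far)"
)
message_933542 = (
    "No changes found since last run, skipping {SUSE:SLE-15-SP3:GA:TEST/base/}\n"
    "SUSE:SLE-15-SP3:GA:TEST/jeos/ exit code: 1 (1 failures total so far)\n"
    "No changes found since last run, skipping {SUSE:SLE-15-SP3:GA:TEST/migration/}"
)

print(exit_status(512), failed_targets(message_933724))
print(exit_status(256), failed_targets(message_933542))
```

<p>Under these assumptions the decoded status equals the number of failed paths in each message (2 and 1), which would suggest the task reports its failure count as the exit status.</p>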
<p><code>obs_rsync_update_builds_text</code> failed like this:<br>
<a href="https://openqa.suse.de/minion/jobs?id=932921" class="external">https://openqa.suse.de/minion/jobs?id=932921</a></p>
<pre><code>"result" => {
  "code" => 256,
  "message" => ""
},
</code></pre>
openQA Infrastructure - action #70768: obs_rsync_run and obs_rsync_update_builds_text Minion tasks fail frequentlyhttps://progress.opensuse.org/issues/70768?journal_id=3585522020-12-16T11:07:10Zokurzokurz@suse.com
<ul><li><strong>Description</strong> updated (<a title="View differences" href="/journals/358552/diff?detail_id=355662">diff</a>)</li></ul>
openQA Infrastructure - action #70768: obs_rsync_run and obs_rsync_update_builds_text Minion tasks fail frequentlyhttps://progress.opensuse.org/issues/70768?journal_id=3588922020-12-18T10:15:21Zokurzokurz@suse.com
<ul></ul><p>Grafana does not know which jobs are failing, only how many. The data comes from the openQA route <code>/admin/influxdb/minion</code>,<br>
e.g.<br>
<a href="https://openqa.suse.de/admin/influxdb/minion" class="external">https://openqa.suse.de/admin/influxdb/minion</a></p>
<p>Example output:</p>
<pre><code>openqa_minion_jobs,url=https://openqa.suse.de active=0i,delayed=0i,failed=5i,inactive=0i
openqa_minion_workers,url=https://openqa.suse.de active=0i,inactive=1i
</code></pre>
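<p>To make clear what a consumer of this route actually receives, here is a minimal Python sketch that parses the sample output. It assumes each measurement is on its own line, as in the InfluxDB line protocol, and it only handles integer fields without escaping or timestamps. <code>parse_measurement</code> is a made-up helper, not part of openQA or any InfluxDB client.</p>

```python
def parse_measurement(line):
    """Split one 'name,tag=value field=1i,...' record into (name, tags, fields)."""
    head, field_part = line.strip().split(" ", 1)
    name, *tag_items = head.split(",")
    tags = dict(item.split("=", 1) for item in tag_items)
    fields = {}
    for item in field_part.split(","):
        key, value = item.split("=", 1)
        # Integer fields carry a trailing "i" in the line protocol.
        fields[key] = int(value.rstrip("i"))
    return name, tags, fields


# Sample taken from the route output quoted above.
sample = (
    "openqa_minion_jobs,url=https://openqa.suse.de active=0i,delayed=0i,failed=5i,inactive=0i\n"
    "openqa_minion_workers,url=https://openqa.suse.de active=0i,inactive=1i"
)

records = {}
for line in sample.splitlines():
    name, _tags, fields = parse_measurement(line)
    records[name] = fields

print(records["openqa_minion_jobs"]["failed"])
```

<p>The parsed record holds only per-state counts. No task names are present, which is why any filtering has to happen on the openQA side before the numbers are emitted.</p>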
<p>The best thing I can think of right now is to change the openQA code so that we can filter which minion jobs are included in the count, for example based on a config variable holding a regex string.</p>
<p>In <a href="https://github.com/os-autoinst/openQA/blob/e8470f0ab1b197c8dac5914645350d1412b62a2d/lib/OpenQA/WebAPI/Controller/Admin/Influxdb.pm#L107" class="external">https://github.com/os-autoinst/openQA/blob/e8470f0ab1b197c8dac5914645350d1412b62a2d/lib/OpenQA/WebAPI/Controller/Admin/Influxdb.pm#L107</a></p>
<p>one could filter out jobs based on this regex string, read for example from</p>
<pre><code>$self->app->config->{global}->{influxdb_minion_job_blocklist}
</code></pre>
<p>and then the only thing to do for OSD would be to add this parameter in our config, for example after the line<br>
<a href="https://gitlab.suse.de/openqa/salt-states-openqa/-/blob/master/openqa/server.sls#L37" class="external">https://gitlab.suse.de/openqa/salt-states-openqa/-/blob/master/openqa/server.sls#L37</a><br>
add</p>
<pre><code>influxdb_minion_job_blocklist: .*obs_rsync.*
</code></pre>
openQA Infrastructure - action #70768: obs_rsync_run and obs_rsync_update_builds_text Minion tasks fail frequentlyhttps://progress.opensuse.org/issues/70768?journal_id=3589062020-12-18T10:34:08ZXiaojing_liuxliu1@suse.com
<ul><li><strong>Assignee</strong> set to <i>Xiaojing_liu</i></li></ul>
openQA Infrastructure - action #70768: obs_rsync_run and obs_rsync_update_builds_text Minion tasks fail frequentlyhttps://progress.opensuse.org/issues/70768?journal_id=3594002020-12-22T11:36:20Zmkittlermarius.kittler@suse.com
<ul></ul><p>Adding generic filtering here seems reasonable. However, in practice it is not completely straightforward to implement. So far we're using the Minion framework to provide us with the statistics. Annoyingly, this <a href="https://metacpan.org/pod/Minion#stats" class="external">statistics function</a> is quite limited, so we cannot easily filter here.</p>
<p>I see multiple ways to work around the limitation:</p>
<ol>
<li>We could use the <a href="https://metacpan.org/pod/Minion#jobs" class="external">jobs function</a> instead. However, from the documentation it isn't clear how negative conditions would work, and maybe they are not even possible. One way to work around this would be to query for the tasks we want to filter out and subtract the number of returned jobs from the statistics we've got so far. Within this ticket we only care about failed jobs, so one such query for the failed jobs would be sufficient. If the filtering should work regardless of the job state, we would need to invoke the jobs function 4 times (once for each state).</li>
<li>We could also try to access the database tables Minion uses under the hood directly. Using SQL directly we would likely be able to apply the filtering more efficiently and it would likely still be a single query. In order to use the database directly, one would use <code>$app->minion->backend->pg</code> which will return a <a href="https://metacpan.org/pod/Mojo::Pg" class="external">Mojo::Pg</a> object.</li>
</ol>
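<p>The subtraction idea from the first option can be sketched outside of openQA as follows. This is plain Python rather than the Minion API: the job records and the task name <code>some_other_task</code> are hypothetical stand-ins for what a query for failed jobs would return, and the regex is the blocklist value suggested earlier in this ticket.</p>

```python
import re


def filtered_failed_count(total_failed, failed_jobs, blocklist):
    """Subtract failed jobs whose task matches the blocklist regex from the total."""
    if not blocklist:
        # No blocklist configured: report the unfiltered statistic.
        return total_failed
    pattern = re.compile(blocklist)
    ignored = sum(1 for job in failed_jobs if pattern.search(job["task"]))
    return total_failed - ignored


# Hypothetical failed jobs; only the task name matters for the filter.
failed_jobs = [
    {"id": 1, "task": "obs_rsync_run"},
    {"id": 2, "task": "obs_rsync_update_builds_text"},
    {"id": 3, "task": "some_other_task"},
]

print(filtered_failed_count(len(failed_jobs), failed_jobs, r".*obs_rsync.*"))
```

<p>With the sample data only the one job outside the blocklist remains in the count. Extending this to all job states would mean repeating the same subtraction once per state, which is the cost mentioned in the first option.</p>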
openQA Infrastructure - action #70768: obs_rsync_run and obs_rsync_update_builds_text Minion tasks fail frequentlyhttps://progress.opensuse.org/issues/70768?journal_id=3597762020-12-25T09:05:34ZXiaojing_liuxliu1@suse.com
<ul></ul><p><a href="https://github.com/os-autoinst/openQA/pull/3654" class="external">https://github.com/os-autoinst/openQA/pull/3654</a></p>
openQA Infrastructure - action #70768: obs_rsync_run and obs_rsync_update_builds_text Minion tasks fail frequentlyhttps://progress.opensuse.org/issues/70768?journal_id=3602642020-12-30T10:36:57Zlivdywanliv.dywan@suse.com
<ul><li><strong>Status</strong> changed from <i>Workable</i> to <i>In Progress</i></li></ul><p>I suspect this should be <em>In Progress</em> since there's a PR proposing the fix</p>
openQA Infrastructure - action #70768: obs_rsync_run and obs_rsync_update_builds_text Minion tasks fail frequentlyhttps://progress.opensuse.org/issues/70768?journal_id=3774172021-01-13T06:46:31ZXiaojing_liuxliu1@suse.com
<ul></ul><p>PR in openQA has been merged.<br>
Here is the related MR for OSD: <a href="https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/427" class="external">https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/427</a></p>
openQA Infrastructure - action #70768: obs_rsync_run and obs_rsync_update_builds_text Minion tasks fail frequentlyhttps://progress.opensuse.org/issues/70768?journal_id=3776902021-01-13T18:00:03Zlivdywanliv.dywan@suse.com
<ul><li><strong>Status</strong> changed from <i>In Progress</i> to <i>Feedback</i></li></ul>
openQA Infrastructure - action #70768: obs_rsync_run and obs_rsync_update_builds_text Minion tasks fail frequentlyhttps://progress.opensuse.org/issues/70768?journal_id=3783252021-01-18T04:05:45ZXiaojing_liuxliu1@suse.com
<ul><li><strong>Status</strong> changed from <i>Feedback</i> to <i>Resolved</i></li><li><strong>Estimated time</strong> set to <i>16.00 h</i></li></ul><p>On OSD, <a href="https://openqa.suse.de/minion/jobs?state=failed" class="external">https://openqa.suse.de/minion/jobs?state=failed</a> shows 10 failed jobs, including 4 'obs_rsync_run' and 1 'obs_rsync_update_builds_text'. Querying <a href="https://openqa.suse.de/admin/influxdb/minion" class="external">https://openqa.suse.de/admin/influxdb/minion</a> then returns <code>openqa_minion_jobs,url=https://openqa.suse.de active=4i,delayed=0i,failed=5i,inactive=36i openqa_minion_workers,url=https://openqa.suse.de active=1i,inactive=0i</code>, i.e. only 5 failed jobs (the 10 failed jobs minus the 5 obs_rsync ones), so I consider this ticket resolved.</p>
openQA Infrastructure - action #70768: obs_rsync_run and obs_rsync_update_builds_text Minion tasks fail frequentlyhttps://progress.opensuse.org/issues/70768?journal_id=4452362021-09-13T09:01:13Zmkittlermarius.kittler@suse.com
<ul><li><strong>Related to</strong> <i><a class="issue tracker-6 status-1 priority-4 priority-default child parent" href="/issues/96263">coordination #96263</a>: [epic] Exclude certain Minion tasks from "Too many Minion job failures alert" alert</i> added</li></ul>
openQA Infrastructure - action #70768: obs_rsync_run and obs_rsync_update_builds_text Minion tasks fail frequentlyhttps://progress.opensuse.org/issues/70768?journal_id=5315812022-06-22T11:01:41Zlivdywanliv.dywan@suse.com
<ul><li><strong>Copied to</strong> <i><a class="issue tracker-4 status-12 priority-3 priority-lowest" href="/issues/112871">action #112871</a>: obs_rsync_run Minion tasks fail with no error message size:M</i> added</li></ul>