openSUSE Project Management Tool
QA - coordination #126167: [epic][qem-bot] Inconsistent job counts in qem-dashboard size:M
https://progress.opensuse.org/issues/126167?journal_id=614672 (2023-03-17T13:58:21Z, jbaier_cz, jbaier@suse.cz)
<ul></ul><p>Also see the related Slack conversation: <a href="https://app.slack.com/client/T02863RC2AC/C02CANHLANP/thread/C02CANHLANP-1678977155.383529" class="external">https://app.slack.com/client/T02863RC2AC/C02CANHLANP/thread/C02CANHLANP-1678977155.383529</a></p>
<p>There is one more use case in which a deleted job might occur:</p>
<blockquote>
<p>yes, I've deleted the two failed ltp_aio_stress jobs because they've been merged into a single runfile in the new LTP release and failed during env setup. I've cloned the correct ltp_aio_stress job manually instead.</p>
</blockquote>
<p>So the non-existent jobs might simply have been deleted by users. Maybe we want a simple way to delete them in the dashboard too, or we might document that deleting jobs is not a good idea and should be replaced by forcing the result to soft-fail and/or by creating an ignore-for-auto-approval comment (feature from #95479).</p>
https://progress.opensuse.org/issues/126167?journal_id=614675 (2023-03-17T13:59:19Z, kraih, sebastian.riedel@suse.com)
<ul></ul><p>Let's take a look at what's in the dashboard database:</p>
<pre><code>dashboard_db=# select * from incidents where number = 28181;
id | number | rr_number | project | approved | emu | active | packages | review | review_qam
---------+--------+-----------+------------------------+----------+-----+--------+----------------------------------------------------------------------------------------------------------------------------------------------+--------+------------
7765521 | 28181 | 292112 | SUSE:Maintenance:28181 | f | f | t | {kernel-debug,kernel-default,kernel-docs,kernel-ec2,kernel-obs-build,kernel-obs-qa,kernel-source,kernel-syms,kernel-vanilla,kernel-zfcpdump} | t | t
(1 row)
</code></pre><pre><code>dashboard_db=# select id, flavor, version, settings::json->'BUILD' as build from incident_openqa_settings where incident = 7765521 order by id desc;
id | flavor | version | build
---------+--------------------------------------+---------+---------------------
1986102 | Server-DVD-TERADATA-Incidents-Kernel | 12-SP3 | ":28181:kernel-ec2"
1986101 | Server-DVD-Incidents-TERADATA | 12-SP3 | ":28181:kernel-ec2"
(2 rows)
</code></pre><pre><code>dashboard_db=# SELECT oj.id, job_id, status, build, updated FROM incident_openqa_settings ios JOIN openqa_jobs oj ON oj.incident_settings=ios.id WHERE incident=7765521 ORDER BY updated;
id | job_id | status | build | updated
-----------+----------+--------+-------------------+-------------------------------
404426218 | 10689482 | failed | :28181:kernel-ec2 | 2023-03-14 11:23:35.780127+01
404426219 | 10689483 | failed | :28181:kernel-ec2 | 2023-03-14 11:23:35.790424+01
404426011 | 10689476 | passed | :28181:kernel-ec2 | 2023-03-17 14:45:51.9446+01
404426213 | 10689477 | passed | :28181:kernel-ec2 | 2023-03-17 14:45:59.898062+01
404426214 | 10689478 | passed | :28181:kernel-ec2 | 2023-03-17 14:45:59.906756+01
404426215 | 10689479 | passed | :28181:kernel-ec2 | 2023-03-17 14:45:59.91699+01
404426216 | 10689480 | passed | :28181:kernel-ec2 | 2023-03-17 14:45:59.929089+01
404426217 | 10689481 | passed | :28181:kernel-ec2 | 2023-03-17 14:45:59.937704+01
404426220 | 10689484 | passed | :28181:kernel-ec2 | 2023-03-17 14:45:59.946364+01
404426221 | 10689485 | passed | :28181:kernel-ec2 | 2023-03-17 14:45:59.95557+01
404426222 | 10689486 | passed | :28181:kernel-ec2 | 2023-03-17 14:45:59.964067+01
404426223 | 10689487 | passed | :28181:kernel-ec2 | 2023-03-17 14:45:59.972838+01
404426224 | 10689488 | passed | :28181:kernel-ec2 | 2023-03-17 14:45:59.981982+01
404426225 | 10689489 | passed | :28181:kernel-ec2 | 2023-03-17 14:45:59.991306+01
404426226 | 10689490 | passed | :28181:kernel-ec2 | 2023-03-17 14:46:00.000173+01
404426227 | 10689491 | passed | :28181:kernel-ec2 | 2023-03-17 14:46:00.009543+01
404426228 | 10689492 | passed | :28181:kernel-ec2 | 2023-03-17 14:46:00.019818+01
404426229 | 10689493 | passed | :28181:kernel-ec2 | 2023-03-17 14:46:00.028551+01
404426230 | 10689494 | passed | :28181:kernel-ec2 | 2023-03-17 14:46:00.038049+01
404426231 | 10689495 | passed | :28181:kernel-ec2 | 2023-03-17 14:46:00.04872+01
404426232 | 10689496 | passed | :28181:kernel-ec2 | 2023-03-17 14:46:00.065565+01
404426233 | 10689497 | passed | :28181:kernel-ec2 | 2023-03-17 14:46:00.080267+01
404426234 | 10689498 | passed | :28181:kernel-ec2 | 2023-03-17 14:46:00.091491+01
404426235 | 10689499 | passed | :28181:kernel-ec2 | 2023-03-17 14:46:00.103288+01
404426236 | 10689500 | passed | :28181:kernel-ec2 | 2023-03-17 14:46:00.113283+01
404426237 | 10689501 | passed | :28181:kernel-ec2 | 2023-03-17 14:46:00.122294+01
404426238 | 10689502 | passed | :28181:kernel-ec2 | 2023-03-17 14:46:00.131599+01
404426239 | 10689503 | passed | :28181:kernel-ec2 | 2023-03-17 14:46:00.140256+01
404426240 | 10689504 | passed | :28181:kernel-ec2 | 2023-03-17 14:46:00.15304+01
404426241 | 10689505 | passed | :28181:kernel-ec2 | 2023-03-17 14:46:00.165617+01
404426242 | 10689506 | passed | :28181:kernel-ec2 | 2023-03-17 14:46:00.175326+01
404426243 | 10689507 | passed | :28181:kernel-ec2 | 2023-03-17 14:46:00.185705+01
404426244 | 10689508 | passed | :28181:kernel-ec2 | 2023-03-17 14:46:00.196715+01
404426245 | 10689509 | passed | :28181:kernel-ec2 | 2023-03-17 14:46:00.205899+01
404426246 | 10689510 | passed | :28181:kernel-ec2 | 2023-03-17 14:46:00.214893+01
404426247 | 10689511 | passed | :28181:kernel-ec2 | 2023-03-17 14:46:00.224048+01
404426248 | 10689512 | passed | :28181:kernel-ec2 | 2023-03-17 14:46:00.233713+01
404426249 | 10689513 | passed | :28181:kernel-ec2 | 2023-03-17 14:46:00.245921+01
404426250 | 10689514 | passed | :28181:kernel-ec2 | 2023-03-17 14:46:00.256369+01
404426251 | 10689515 | passed | :28181:kernel-ec2 | 2023-03-17 14:46:00.267382+01
404426252 | 10689516 | passed | :28181:kernel-ec2 | 2023-03-17 14:46:00.277881+01
404426253 | 10689517 | passed | :28181:kernel-ec2 | 2023-03-17 14:46:00.288087+01
404426254 | 10689518 | passed | :28181:kernel-ec2 | 2023-03-17 14:46:00.297678+01
404426255 | 10689519 | passed | :28181:kernel-ec2 | 2023-03-17 14:46:00.307895+01
404426256 | 10689520 | passed | :28181:kernel-ec2 | 2023-03-17 14:46:00.317602+01
404426257 | 10689521 | passed | :28181:kernel-ec2 | 2023-03-17 14:46:00.328972+01
404426258 | 10689522 | passed | :28181:kernel-ec2 | 2023-03-17 14:46:00.339168+01
404426259 | 10689523 | passed | :28181:kernel-ec2 | 2023-03-17 14:46:00.350696+01
404426260 | 10689524 | passed | :28181:kernel-ec2 | 2023-03-17 14:46:00.362105+01
404426261 | 10689525 | passed | :28181:kernel-ec2 | 2023-03-17 14:46:00.372032+01
404426262 | 10689526 | passed | :28181:kernel-ec2 | 2023-03-17 14:46:00.382003+01
404426263 | 10689527 | passed | :28181:kernel-ec2 | 2023-03-17 14:46:00.391538+01
404426264 | 10689528 | passed | :28181:kernel-ec2 | 2023-03-17 14:46:00.40127+01
404426265 | 10689529 | passed | :28181:kernel-ec2 | 2023-03-17 14:46:00.410842+01
407680891 | 10690141 | passed | :28181:kernel-ec2 | 2023-03-17 14:46:00.421098+01
(55 rows)
</code></pre>
https://progress.opensuse.org/issues/126167?journal_id=614681 (2023-03-17T14:03:29Z, kraih, sebastian.riedel@suse.com)
<ul></ul><p>jbaier_cz wrote:</p>
<blockquote>
<p>So the non-existent jobs might simply have been deleted by users. Maybe we want a simple way to delete them in the dashboard too, or we might document that deleting jobs is not a good idea and should be replaced by forcing the result to soft-fail and/or by creating an ignore-for-auto-approval comment (feature from #95479).</p>
</blockquote>
<p>That's what it looks like indeed. Should we maybe have an API endpoint in the dashboard like <code>DELETE /api/jobs/&lt;job_id&gt;</code> that the bot calls, since it knows when a job is missing in openQA?</p>
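A minimal sketch of how the bot side of such an endpoint call could look. The route shape, the dashboard base URL, and the token-auth header are assumptions for illustration, not the actual qem-dashboard API:

```python
# Hypothetical sketch: how qem-bot could tell the dashboard about a job
# that no longer exists in openQA. Endpoint path, base URL, and auth
# header are assumptions, not the real qem-dashboard API.
from urllib.parse import urljoin

DASHBOARD = "http://dashboard.qam.suse.de/"  # illustrative base URL

def delete_job_request(job_id: int, token: str) -> tuple:
    """Build the (method, url, headers) triple for removing a dashboard job."""
    url = urljoin(DASHBOARD, f"api/jobs/{job_id}")
    headers = {"Authorization": f"Token {token}"}
    return ("DELETE", url, headers)

# At the point where the bot logs "Job 10689483 not found in openQA",
# it could build and send this request:
method, url, headers = delete_job_request(10689483, "secret")
```

The actual HTTP call would then be issued with whatever client the bot already uses; the sketch only shows the request that would be constructed.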
<p><a href="https://gitlab.suse.de/qa-maintenance/bot-ng/-/jobs/1458064:" class="external">https://gitlab.suse.de/qa-maintenance/bot-ng/-/jobs/1458064:</a></p>
<pre><code>2023-03-16 14:04:42 INFO Job 10689483 not found in openQA
</code></pre>
https://progress.opensuse.org/issues/126167?journal_id=614687 (2023-03-17T14:11:52Z, jbaier_cz, jbaier@suse.cz)
<ul></ul><p>I would generally agree; the only issue here is that I am not 100% sure it is OK to delete a missing openQA job without any manual intervention. My example case: an incident has two openQA jobs, one passes and the other fails. After some period of time, the failing one gets deleted (for example due to retention settings in the job group). Now the incident has only one successful job and will be auto-approved even though the bug indicated by the (now already deleted) job is still there.</p>
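The hazard described in this scenario can be shown with a tiny toy model (pure sketch, not bot code): an approval rule that only looks at the jobs still on record silently flips from "blocked" to "approved" once the failed job's record disappears.

```python
# Toy illustration of the auto-approval hazard: approval based only on
# the jobs still present flips once the failed job record is deleted.

def naive_approve(jobs):
    """Approve when every remaining job passed."""
    return all(j["status"] == "passed" for j in jobs)

jobs = [
    {"id": 1, "status": "passed"},
    {"id": 2, "status": "failed"},  # the job that found the bug
]
assert naive_approve(jobs) is False  # blocked, as it should be

# Retention cleanup deletes the failed job record:
jobs = [j for j in jobs if j["status"] != "failed"]
assert naive_approve(jobs) is True  # now auto-approved despite the bug
```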
https://progress.opensuse.org/issues/126167?journal_id=614693 (2023-03-17T14:37:00Z, kraih, sebastian.riedel@suse.com)
<ul></ul><p>jbaier_cz wrote:</p>
<blockquote>
<p>I would generally agree; the only issue here is that I am not 100% sure it is OK to delete a missing openQA job without any manual intervention. My example case: an incident has two openQA jobs, one passes and the other fails. After some period of time, the failing one gets deleted (for example due to retention settings in the job group). Now the incident has only one successful job and will be auto-approved even though the bug indicated by the (now already deleted) job is still there.</p>
</blockquote>
<p>I got the impression that from the reviewer perspective all jobs no longer present in openQA are not considered by them anyway. If they do matter after all then we need a whole new dashboard feature here. Perhaps flag missing jobs as such in the database and present them accordingly in the dashboard ui.</p>
https://progress.opensuse.org/issues/126167?journal_id=614720 (2023-03-17T15:50:58Z, MDoucha, martin.doucha@suse.com)
<ul></ul><p>I recommend flagging the missing jobs in the dashboard: block auto-review but allow manual approval. The dashboard could also collect some info about the missing jobs from the openQA audit log, mainly who deleted the jobs and when. The reviewer should then double-check whether deleting the jobs was appropriate and either reschedule the missing jobs or approve manually.</p>
<p>Deleting jobs should happen very rarely, e.g. when we decide to drop some jobs from the schedule because they're obsolete and the jobs in question stay broken for a few incidents before the removal gets approved and merged.</p>
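The "flag instead of delete" idea above can be sketched against an in-memory SQLite stand-in for the dashboard schema. The `missing` column name is an assumption; the real qem-dashboard schema shown earlier in this thread does not have it yet:

```python
# Sketch of "flag instead of delete" using an in-memory SQLite stand-in
# for the dashboard schema (the `missing` column is hypothetical).
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE openqa_jobs (
    job_id  INTEGER PRIMARY KEY,
    status  TEXT NOT NULL,
    missing INTEGER NOT NULL DEFAULT 0)""")
db.executemany("INSERT INTO openqa_jobs VALUES (?, ?, ?)",
               [(10689482, "failed", 0), (10689476, "passed", 0)])

# Instead of deleting, the bot flags the job it can no longer find in openQA:
db.execute("UPDATE openqa_jobs SET missing = 1 WHERE job_id = ?", (10689482,))

# Auto-approval blocks whenever any job is not passed OR is flagged missing,
# while the UI can still render the flagged rows for the human reviewer:
blockers = db.execute(
    "SELECT COUNT(*) FROM openqa_jobs WHERE status != 'passed' OR missing = 1"
).fetchone()[0]
auto_approve = blockers == 0  # False here: the flagged job keeps it blocked
```

The key property is that the row (and thus the evidence of the failure) survives cleanup, so the incident stays blocked until a human decides.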
https://progress.opensuse.org/issues/126167?journal_id=614729 (2023-03-17T16:03:51Z, kraih, sebastian.riedel@suse.com)
<ul><li><strong>Assignee</strong> set to <i>kraih</i></li></ul>
https://progress.opensuse.org/issues/126167?journal_id=614732 (2023-03-17T16:04:37Z, kraih, sebastian.riedel@suse.com)
<ul><li><strong>Tags</strong> set to <i>reactive work</i></li></ul>
https://progress.opensuse.org/issues/126167?journal_id=614762 (2023-03-18T08:57:33Z, okurz, okurz@suse.com)
<ul><li><strong>Target version</strong> set to <i>Ready</i></li></ul>
https://progress.opensuse.org/issues/126167?journal_id=615050 (2023-03-20T10:30:50Z, mgrifalconi)
<ul></ul><p>The direction we are taking with openQA review is to minimize manual actions, in order to reduce mistakes and make the process more efficient. But for this special case I agree to still require one, considering how rarely it happens and the risk of approving something by mistake.</p>
<p><a class="user active user-mention" href="https://progress.opensuse.org/users/33956">@MDoucha</a> a comment about: "The reviewer should then double check whether deleting the jobs was appropriate and either reschedule the missing jobs or approve manually."</p>
<p>I agree with that statement only if by "reviewer" you mean your squad's internal reviewer, who finds out an RR is blocked by looking at the dashboard and seeing a red box with the squad's name.</p>
<p>The "openQA review" should only be a safety net to make sure RRs do not rot in the queue when squads fail to do their internal review on time.</p>
https://progress.opensuse.org/issues/126167?journal_id=615770 (2023-03-21T15:45:47Z, osukup)
<ul></ul><ul>
<li>We really need the ability to force-reschedule jobs: some element in the UI which either removes the records of already scheduled jobs for an incident, or marks them in the database so they are not served to qem-bot during the incident scheduling run, causing the tests to be rescheduled.</li>
</ul>
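The marking variant of this force-reschedule idea can be sketched in a few lines. The `rescheduled` flag and the row shapes are hypothetical; only the settings IDs are taken from the query output earlier in the thread:

```python
# Sketch of force-reschedule via marking: flag existing incident settings
# rows instead of deleting them, and exclude flagged rows from what the
# scheduler sees (the `rescheduled` flag is hypothetical).
incident_settings = [
    {"id": 1986101, "incident": 7765521, "rescheduled": False},
    {"id": 1986102, "incident": 7765521, "rescheduled": True},  # forced via UI
]

def settings_served_to_bot(rows):
    """qem-bot only sees unflagged rows, so flagged ones get scheduled anew."""
    return [r for r in rows if not r["rescheduled"]]

served = settings_served_to_bot(incident_settings)
```

Because the flagged row is hidden rather than deleted, the history stays queryable while the next scheduling run treats the incident as not yet covered.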
https://progress.opensuse.org/issues/126167?journal_id=616622 (2023-03-23T10:45:56Z, livdywan, liv.dywan@suse.com)
<ul><li><strong>Tracker</strong> changed from <i>action</i> to <i>coordination</i></li><li><strong>Subject</strong> changed from <i>[qem-bot] Inconsistent job counts in qem-dashboard</i> to <i>[epic][qem-bot] Inconsistent job counts in qem-dashboard size:M</i></li><li><strong>Description</strong> updated (<a title="View differences" href="/journals/616622/diff?detail_id=579020">diff</a>)</li><li><strong>Status</strong> changed from <i>New</i> to <i>Blocked</i></li></ul>
https://progress.opensuse.org/issues/126167?journal_id=616637 (2023-03-23T11:05:32Z, okurz, okurz@suse.com)
<ul></ul><p>Blocked by what?</p>
https://progress.opensuse.org/issues/126167?journal_id=616754 (2023-03-23T15:22:10Z, kraih, sebastian.riedel@suse.com)
<ul></ul><p>okurz wrote:</p>
<blockquote>
<p>Blocked by what?</p>
</blockquote>
<p>In the estimation meeting I promised to create 3 follow-up tickets that will block this one. And I'm about to start writing them. :)</p>
https://progress.opensuse.org/issues/126167?journal_id=616784 (2023-03-23T16:00:15Z, kraih, sebastian.riedel@suse.com)
<ul></ul><p>Blocked by <a class="issue tracker-4 status-3 priority-4 priority-default closed child behind-schedule" title="action: [qem-dashboard] Add an API endpoint to flag openQA jobs as missing in openQA size:M (Resolved)" href="https://progress.opensuse.org/issues/126548">#126548</a>.</p>
https://progress.opensuse.org/issues/126167?journal_id=618941 (2023-03-30T20:38:46Z, kraih, sebastian.riedel@suse.com)
<ul></ul><p>Blocked by <a class="issue tracker-4 status-3 priority-4 priority-default closed child" title="action: [qem-bot] Flag missing openQA jobs with qem-dashboard API size:M (Resolved)" href="https://progress.opensuse.org/issues/126551">#126551</a>.</p>
https://progress.opensuse.org/issues/126167?journal_id=645926 (2023-06-22T12:37:19Z, okurz, okurz@suse.com)
<ul><li><strong>Status</strong> changed from <i>Blocked</i> to <i>New</i></li><li><strong>Assignee</strong> deleted (<del><i>kraih</i></del>)</li><li><strong>Target version</strong> changed from <i>Ready</i> to <i>future</i></li></ul><p>Two subtasks resolved, the third is targeted at <i>future</i>.</p>