action #121588
openJobs shown on qem-dashboard are missing in OSD
Description
Observation
Missing tests for SLES 15-SP3 on dashboard:
http://dashboard.qam.suse.de/incident/26100
When using the dashboard, it shows the icon for SLES-15-SP3 in RED, but when opening the link, the page is empty:
https://openqa.suse.de/tests/overview?build=%3A26100%3Atcl&distri=sle&groupid=367
Updated by kraih about 2 years ago
- Subject changed from incident jobs are not created to Jobs shown on qem-dashboard are missing in OSD
- Description updated (diff)
Updated by kraih about 2 years ago
I've looked at the dashboard database and these are the OSD jobs in question:
dashboard_db=# SELECT job_group, group_id, job_id, status, updated FROM openqa_jobs oj JOIN incident_openqa_settings os ON oj.incident_settings = os.id WHERE os.incident = 5449942 AND group_id = 367;
job_group | group_id | job_id | status | updated
-----------------------------------+----------+---------+---------+-------------------------------
Maintenance: SLE 15 SP3 Incidents | 367 | 9587249 | failed | 2022-11-01 05:46:32.415023+01
Maintenance: SLE 15 SP3 Incidents | 367 | 9587251 | failed | 2022-11-01 05:46:32.431005+01
Maintenance: SLE 15 SP3 Incidents | 367 | 9587252 | stopped | 2022-11-01 05:46:32.43856+01
Maintenance: SLE 15 SP3 Incidents | 367 | 9587250 | failed | 2022-11-01 05:46:32.422772+01
Maintenance: SLE 15 SP3 Incidents | 367 | 9587253 | stopped | 2022-11-01 05:46:32.446451+01
Maintenance: SLE 15 SP3 Incidents | 367 | 9587254 | failed | 2022-11-01 05:46:32.374261+01
Maintenance: SLE 15 SP3 Incidents | 367 | 9587265 | passed | 2022-11-01 05:46:32.707045+01
Maintenance: SLE 15 SP3 Incidents | 367 | 9587256 | passed | 2022-11-01 05:46:32.62426+01
Maintenance: SLE 15 SP3 Incidents | 367 | 9587258 | passed | 2022-11-01 05:46:32.644131+01
Maintenance: SLE 15 SP3 Incidents | 367 | 9587262 | passed | 2022-11-01 05:46:32.680608+01
Maintenance: SLE 15 SP3 Incidents | 367 | 9587261 | passed | 2022-11-01 05:46:32.672063+01
Maintenance: SLE 15 SP3 Incidents | 367 | 9587257 | failed | 2022-11-01 05:46:32.63335+01
Maintenance: SLE 15 SP3 Incidents | 367 | 9587260 | passed | 2022-11-01 05:46:32.663346+01
Maintenance: SLE 15 SP3 Incidents | 367 | 9587255 | passed | 2022-11-01 05:46:32.610727+01
Maintenance: SLE 15 SP3 Incidents | 367 | 9587259 | passed | 2022-11-01 05:46:32.654128+01
Maintenance: SLE 15 SP3 Incidents | 367 | 9587264 | passed | 2022-11-01 05:46:32.69896+01
Maintenance: SLE 15 SP3 Incidents | 367 | 9587266 | passed | 2022-11-01 05:46:32.716857+01
Maintenance: SLE 15 SP3 Incidents | 367 | 9587263 | passed | 2022-11-01 05:46:32.689074+01
Maintenance: SLE 15 SP3 Incidents | 367 | 9587269 | failed | 2022-11-01 05:46:31.736583+01
Maintenance: SLE 15 SP3 Incidents | 367 | 9587268 | failed | 2022-11-01 05:46:31.757022+01
Maintenance: SLE 15 SP3 Incidents | 367 | 9587270 | failed | 2022-11-01 05:46:32.043626+01
Maintenance: SLE 15 SP3 Incidents | 367 | 9587267 | failed | 2022-11-01 05:46:31.766053+01
(22 rows)
Looking at the OSD database, they are indeed not present there.
Edit: Updated data with states and timestamps.
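The check described above (comparing the dashboard's stored job ids against what OSD actually knows) could be automated. A minimal sketch, assuming openQA's `GET /api/v1/jobs/<id>` route returns 404 for jobs that have been pruned; function names and the separation into a pure partition helper are illustrative:

```python
# Sketch: find dashboard job ids that no longer exist in OSD.
# Assumes GET /api/v1/jobs/<id> answers 404 for pruned jobs.
from urllib.request import urlopen
from urllib.error import HTTPError

OSD = "https://openqa.suse.de"


def job_exists(job_id, base_url=OSD):
    """Return True if OSD still knows this job, False on 404 (pruned)."""
    try:
        urlopen(f"{base_url}/api/v1/jobs/{job_id}")
        return True
    except HTTPError as err:
        if err.code == 404:
            return False
        raise  # other errors (5xx etc.) should not be mistaken for pruning


def partition_jobs(job_ids, exists=job_exists):
    """Split job ids into (present, missing) using the given existence check."""
    present, missing = [], []
    for jid in job_ids:
        (present if exists(jid) else missing).append(jid)
    return present, missing
```

The existence check is injected as a parameter so the logic can be exercised without network access.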
Updated by okurz about 2 years ago
- Target version set to future
Well, likely the jobs have just been pruned in the meantime. We can check if we can retain results for longer based on available space; that would help to mitigate the problem. But we should also think of solutions that ensure the dashboard only reflects jobs that are actually present in OSD. Well, ok, there can be some minutes of grace time until the new data is synchronized.
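The suggestion above amounts to a periodic cleanup with a grace period: a dashboard job that is absent from OSD is only flagged once it is older than the sync window. A hedged sketch of that selection logic; the ten-minute grace value and the row shape `(job_id, updated)` are assumptions, not anything the dashboard currently implements:

```python
# Sketch: select dashboard jobs that are missing from OSD and past the
# grace period, i.e. candidates for removal or flagging. The GRACE value
# is an assumed placeholder for the real sync window.
from datetime import datetime, timedelta, timezone

GRACE = timedelta(minutes=10)  # assumed grace time until new data is synced


def stale_dashboard_jobs(rows, osd_ids, now=None, grace=GRACE):
    """rows: iterable of (job_id, updated) tuples from the dashboard DB.
    osd_ids: set of job ids currently present in OSD.
    Returns the ids that OSD no longer has and that are older than grace."""
    now = now or datetime.now(timezone.utc)
    return [jid for jid, updated in rows
            if jid not in osd_ids and now - updated > grace]
```

Recently updated rows stay untouched even when OSD does not know them yet, which covers the synchronization delay mentioned above.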
Updated by kraih about 2 years ago
- Target version deleted (
future)
okurz wrote:
Well, likely the jobs have just been pruned in the meantime. We can check if we can retain results for longer based on available space; that would help to mitigate the problem. But we should also think of solutions that ensure the dashboard only reflects jobs that are actually present in OSD. Well, ok, there can be some minutes of grace time until the new data is synchronized.
That's something the new /api/v1/job_settings/jobs
API could be used for. Although I suspect that its current limit to the 20k most recent job ids would cause new cases of missing data. So maybe that should be addressed first. :)
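The truncation concern raised here could at least be detected by the consumer: if a response from that endpoint contains exactly as many ids as the cap, older jobs may have been silently cut off. A minimal sketch; the query parameter names and response shape are assumptions about the new route, only the cap-detection helper is exercised:

```python
# Sketch: detect when a /api/v1/job_settings/jobs response may have been
# truncated by the 20k-most-recent-job-ids limit mentioned above.
import json
from urllib.request import urlopen

LIMIT = 20000  # current cap on returned job ids, per the comment above


def maybe_truncated(job_ids, limit=LIMIT):
    """True when the result length reaches the cap, so older matching
    jobs may have been dropped from the response."""
    return len(job_ids) >= limit


def jobs_for_setting(base_url, key, value):
    # Hypothetical query shape; the real parameter names may differ.
    url = f"{base_url}/api/v1/job_settings/jobs?key={key}&list_value={value}"
    with urlopen(url) as res:
        return json.load(res)["jobs"]
```

A caller could log a warning when `maybe_truncated` fires, which would make the missing-data cases visible instead of silent.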