action #156535

Updated by dheidler 3 months ago

## Observation 
 See #156460-8 
 > I'm checking results in the maintenance dashboard and I can see at http://dashboard.qam.suse.de/blocked?group_names=hpc&incident=32814 that jobs are either running or not finished. But the job groups in openQA are green and empty: https://openqa.suse.de/tests/overview?build=%3A32814%3Apython3&groupid=364 and https://openqa.suse.de/tests/overview?build=%3A32814%3Apython3&groupid=434.
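
To compare further incidents against the dashboard, the openQA overview links quoted above can be rebuilt programmatically. This is a minimal sketch that assumes the build name follows the `:<incident>:<package>` pattern seen in those two URLs; the helper name `overview_url` is hypothetical and not part of openQA.

```python
from urllib.parse import urlencode

OPENQA_HOST = "https://openqa.suse.de"


def overview_url(incident, package, group_id, host=OPENQA_HOST):
    """Build an openQA test overview URL for one incident build and job group.

    Assumption: builds are named ":<incident>:<package>", as in the
    URLs quoted in the observation.
    """
    build = f":{incident}:{package}"
    # urlencode percent-encodes the colons, matching the quoted links
    query = urlencode({"build": build, "groupid": group_id})
    return f"{host}/tests/overview?{query}"


if __name__ == "__main__":
    for group in (364, 434):
        print(overview_url(32814, "python3", group))
```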

 ## Suggestions 
 * Possibly an easy workaround is to retrigger the build of the corresponding release requests 
 * Check audit logs for a trace of how incident tests are scheduled in general (not the specific ones we lost), or ask in #discuss-qa-maintenance or ask the maintenance coordinators 
 * Messages like https://gitlab.suse.de/qa-maintenance/bot-ng/-/jobs/2340034#L2244 suggest we have some unmatched jobs 
 * Remove the corresponding inconsistent results from the qem-dashboard database 
 * Trigger the corresponding jobs to sync and schedule pending data on https://gitlab.suse.de/qa-maintenance/bot-ng/-/pipeline_schedules 
 * Also trigger aggregate tests so that we do not need to wait until the end of the day for tests to start 
 * Monitor the execution of jobs and the presentation on the dashboard 
 * Re-enable scheduling of aggregates on https://gitlab.suse.de/qa-maintenance/bot-ng/-/pipeline_schedules 
 * Again monitor the execution of jobs and the presentation on the dashboard 
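
Triggering a pipeline schedule can also be done without the web UI, through GitLab's "play a pipeline schedule" REST endpoint. The sketch below only builds the request and does not send it. The API base URL `https://gitlab.suse.de/api/v4`, the schedule ID and the token are assumptions for illustration; the real schedule IDs have to be looked up on the pipeline schedules page.

```python
from urllib.parse import quote
from urllib.request import Request

# Assumption: the instance exposes the standard GitLab v4 API under /api/v4
GITLAB_API = "https://gitlab.suse.de/api/v4"
PROJECT = "qa-maintenance/bot-ng"


def play_schedule_request(schedule_id, token):
    """Build (but do not send) a request that runs a pipeline schedule now."""
    # GitLab accepts the URL-encoded project path in place of a numeric ID
    project = quote(PROJECT, safe="")
    url = f"{GITLAB_API}/projects/{project}/pipeline_schedules/{schedule_id}/play"
    return Request(url, method="POST", headers={"PRIVATE-TOKEN": token})


if __name__ == "__main__":
    # 42 and "secret" are placeholders, not real values
    req = play_schedule_request(42, "secret")
    print(req.get_method(), req.full_url)
    # To actually trigger it: urllib.request.urlopen(req)
```

Keeping request construction separate from sending makes it easy to review which schedule would be triggered before doing so.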


 ## Rollback actions 
 * Remove silence `alertname=Queue: State (SUSE) alert` from https://stats.openqa-monitor.qa.suse.de/alerting/silences 
 * Reactivate `Schedule updates/aggregates (0 20 * * 1-5,7)` at https://gitlab.suse.de/qa-maintenance/bot-ng/-/pipeline_schedules on 2024-03-05 
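
For reference, the cron expression `0 20 * * 1-5,7` fires at 20:00 on Monday through Friday and on Sunday, so Saturday is the only day skipped. The following sketch checks a timestamp against the expression. It is a simplified matcher written for illustration (only `*`, single values, ranges and comma lists are supported, and the timezone configured on the schedule is not modeled), not the scheduler GitLab uses.

```python
from datetime import datetime

SCHEDULE = "0 20 * * 1-5,7"


def expand_field(field, lo, hi):
    """Expand a cron field such as '*', '20' or '1-5,7' into a set of ints."""
    values = set()
    for part in field.split(","):
        if part == "*":
            values.update(range(lo, hi + 1))
        elif "-" in part:
            start, end = part.split("-")
            values.update(range(int(start), int(end) + 1))
        else:
            values.add(int(part))
    return values


def cron_matches(expr, when):
    """Return True if the datetime `when` matches the 5-field cron expression."""
    minute, hour, dom, month, dow = expr.split()
    # cron accepts both 0 and 7 for Sunday; isoweekday() uses Mon=1 .. Sun=7
    weekdays = {7 if d == 0 else d for d in expand_field(dow, 0, 7)}
    return (
        when.minute in expand_field(minute, 0, 59)
        and when.hour in expand_field(hour, 0, 23)
        and when.day in expand_field(dom, 1, 31)
        and when.month in expand_field(month, 1, 12)
        and when.isoweekday() in weekdays
    )


if __name__ == "__main__":
    # 2024-03-05 is a Tuesday, 2024-03-09 a Saturday
    print(cron_matches(SCHEDULE, datetime(2024, 3, 5, 20, 0)))
    print(cron_matches(SCHEDULE, datetime(2024, 3, 9, 20, 0)))
```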

 ## Out of scope 
 * Fix the problematic design of qem-dashboard
