action #181382
closed
OpenQA Jobs test - Incomplete jobs (not restarted) of last 24h alert Salt 2025-04-24 size:S
0%
Description
Observation
Alert on 2025-04-24 14:21:00 +0000 UTC
Suggestions
- Check why openQA jobs end up as incomplete: Use https://github.com/os-autoinst/scripts/blob/master/openqa-incompletes-stats
- Based on that, fix the most prominent issues
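As a rough illustration of the kind of tally that helps find "the most prominent issues", here is a minimal sketch that groups incomplete jobs by their reason. It is not the openqa-incompletes-stats script itself. The function name `summarize_incomplete_reasons`, the job dictionaries with `result` and `reason` keys, and the sample reason strings are all assumptions made for this example.

```python
import re
from collections import Counter


def summarize_incomplete_reasons(jobs, top=5):
    """Return the most frequent reasons among incomplete jobs.

    `jobs` is assumed to be a list of dicts with "result" and "reason"
    keys (a hypothetical shape chosen for this sketch).
    """
    counts = Counter()
    for job in jobs:
        if job.get("result") != "incomplete":
            continue  # only incomplete jobs are of interest here
        reason = job.get("reason") or "unknown"
        lines = reason.splitlines()
        first_line = lines[0] if lines else "unknown"
        # Mask digits so reasons differing only in IDs are counted together.
        normalized = re.sub(r"\d+", "N", first_line).strip() or "unknown"
        counts[normalized] += 1
    return counts.most_common(top)


if __name__ == "__main__":
    # Hypothetical sample data, not taken from OSD.
    sample = [
        {"id": 1, "result": "incomplete", "reason": "backend died: QEMU exited unexpectedly"},
        {"id": 2, "result": "incomplete", "reason": "cache failure: download of asset 123 failed"},
        {"id": 3, "result": "incomplete", "reason": "cache failure: download of asset 456 failed"},
        {"id": 4, "result": "passed", "reason": None},
        {"id": 5, "result": "incomplete", "reason": None},
    ]
    for reason, count in summarize_incomplete_reasons(sample):
        print(f"{count:3d}  {reason}")
```

Masking digits is a design choice for this sketch: it keeps reasons that differ only in asset or job IDs in one bucket, so the dominant failure cause stands out.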
Updated by ybonatakis about 1 month ago
Recovered at 2025-04-24 14:21:00 +0000 UTC
Updated by okurz about 1 month ago
- Target version set to Ready
ybonatakis wrote:
Alert on 2025-04-24 14:21:00 +0000 UTC
I think the service restarted around that time
iob@openqa:~> systemctl status salt-minion.service
● salt-minion.service - The Salt Minion
     Loaded: loaded (/usr/lib/systemd/system/salt-minion.service; enabled; preset: disabled)
     Active: active (running) since Thu 2025-04-24 14:09:40 UTC; 2h 14min ago
   Main PID: 1458 (salt-minion)
      Tasks: 5 (limit: 4915)
        CPU: 1min 21.762s
     CGroup: /system.slice/salt-minion.service
             ├─1458 /usr/bin/python3.6 /usr/bin/salt-minion
             └─1838 /usr/bin/python3.6 /usr/bin/salt-minion
Warning: some journal files were not opened due to insufficient permissions.
Why did you show the status of salt-minion? That has nothing to do with openQA jobs. And sharing the warning message that you get because you ran systemctl status without root is not really helpful.
Updated by okurz about 1 month ago
- Subject changed from OpenQA Jobs test - Incomplete jobs (not restarted) of last 24h alert Salt 2025-04-24 to OpenQA Jobs test - Incomplete jobs (not restarted) of last 24h alert Salt 2025-04-24 size:S
- Description updated (diff)
- Status changed from New to Workable
Updated by nicksinger about 1 month ago
- Status changed from Workable to In Progress
- Assignee set to nicksinger
This was right after the longer OSD downtime. I will check the references of these jobs anyway and see if we missed something else.
Updated by openqa_review about 1 month ago
- Due date set to 2025-05-17
Setting due date based on mean cycle time of SUSE QE Tools
Updated by nicksinger 29 days ago
- Status changed from In Progress to Resolved
I already looked at the jobs last week with openqa-incompletes-stats and found that most of the not-restarted jobs were caused by https://progress.opensuse.org/issues/181184. All of them have the proper label, and a look around on OSD showed that by now most tests have run again (either the restarts finished, proper references were added after the fact, or the tests ran as part of the normal schedule). I don't think there is much we can improve here; we should focus on the stability of OSD in general, as we already describe in other tasks.