action #181382

closed

OpenQA Jobs test - Incomplete jobs (not restarted) of last 24h alert Salt 2025-04-24 size:S

Added by ybonatakis about 1 month ago. Updated 28 days ago.

Status: Resolved
Priority: High
Assignee:
Category: Regressions/Crashes
Start date: 2025-04-24
Due date:
% Done: 0%
Estimated time:
Tags:

Description

Observation

Alert on 2025-04-24 14:21:00 +0000 UTC

https://monitor.qa.suse.de/d/nRDab3Jiz/openqa-jobs-test?orgId=1&viewPanel=panel-17&from=2025-04-23T16:18:30.947Z&to=2025-04-24T16:12:52.123Z&timezone=UTC

Suggestions

Actions #1

Updated by ybonatakis about 1 month ago

Recovered at 2025-04-24 14:21:00 +0000 UTC

Actions #2

Updated by okurz about 1 month ago

  • Target version set to Ready

ybonatakis wrote:

Alert on 2025-04-24 14:21:00 +0000 UTC

https://monitor.qa.suse.de/d/nRDab3Jiz/openqa-jobs-test?orgId=1&viewPanel=panel-17&from=2025-04-23T16:18:30.947Z&to=2025-04-24T16:12:52.123Z&timezone=UTC

I think the service restarted around that time

iob@openqa:~> systemctl status salt-minion.service 
● salt-minion.service - The Salt Minion
     Loaded: loaded (/usr/lib/systemd/system/salt-minion.service; enabled; preset: disabled)
     Active: active (running) since Thu 2025-04-24 14:09:40 UTC; 2h 14min ago
   Main PID: 1458 (salt-minion)
      Tasks: 5 (limit: 4915)
        CPU: 1min 21.762s
     CGroup: /system.slice/salt-minion.service
             ├─1458 /usr/bin/python3.6 /usr/bin/salt-minion
             └─1838 /usr/bin/python3.6 /usr/bin/salt-minion

Warning: some journal files were not opened due to insufficient permissions.

Why did you show the status of salt-minion? That has nothing to do with openQA jobs. Also, sharing the warning message that you only get because you ran systemctl status without root is not really helpful.

Actions #3

Updated by okurz about 1 month ago

  • Subject changed from OpenQA Jobs test - Incomplete jobs (not restarted) of last 24h alert Salt 2025-04-24 to OpenQA Jobs test - Incomplete jobs (not restarted) of last 24h alert Salt 2025-04-24 size:S
  • Description updated (diff)
  • Status changed from New to Workable

Actions #4

Updated by nicksinger about 1 month ago

  • Status changed from Workable to In Progress
  • Assignee set to nicksinger

This was right after the longer OSD downtime. I will check the references of these jobs anyway and see if we missed something else.

Actions #5

Updated by openqa_review about 1 month ago

  • Due date set to 2025-05-17

Setting due date based on mean cycle time of SUSE QE Tools

Actions #6

Updated by nicksinger 29 days ago

  • Status changed from In Progress to Resolved

I already looked at the jobs last week with openqa-incompletes-stats and found that most of the not-restarted jobs were caused by https://progress.opensuse.org/issues/181184. All of them have the proper label, and a look around on OSD showed that by now most tests have run again (either the restarts finished, proper references were added after the fact, or the tests ran as part of the normal schedule). I don't think there is much we can improve here; we should focus on the stability of OSD in general, as we already describe in other tasks.

Actions #7

Updated by okurz 28 days ago

  • Due date deleted (2025-05-17)