action #111926
closed
osd-deployment pipeline failed: test 481 -le 0, due to job age alert, likely just the raspberry pi based tests stuck in schedule
Description
Observation
https://gitlab.suse.de/openqa/osd-deployment/-/jobs/996509
++ echo '$ test ${#new} -le ${#old}'
$ test ${#new} -le ${#old}
++ test 481 -le 0
Cleaning up project directory and file based variables
00:01
ERROR: Job failed: command terminated with exit code 1
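For context, a minimal sketch of what the failing check does. `${#var}` expands to the length of a string, so the check compares the sizes of two captured outputs, and `test 481 -le 0` is false, which yields exit code 1. The function name `alerts_not_worse` and the sample strings are hypothetical. That `old` and `new` hold alert output captured before and after the deployment is an assumption, not something the log confirms.

```shell
#!/bin/sh
# Hypothetical helper; "old" and "new" are assumed to hold alert output
# captured before and after the deployment (not confirmed by the log).
alerts_not_worse() {
    old=$1
    new=$2
    # ${#var} is the string length in characters, so this compares
    # the size of the two outputs, not the number of alerts.
    test "${#new}" -le "${#old}"
}

# An empty "old" and a non-empty "new" fail the check, as in the log above.
if alerts_not_worse "" "job age alert firing"; then
    echo "check passed"
else
    echo "check failed"
fi
```

Because only lengths are compared, any alert that starts firing while the pipeline runs fails the job, even when it is unrelated to the deployment itself.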
Updated by tinita over 2 years ago
- Project changed from QA to openQA Infrastructure
Updated by tinita over 2 years ago
I retriggered and now it passes.
https://gitlab.suse.de/openqa/osd-deployment/-/jobs/997379
Updated by okurz over 2 years ago
- Related to action #111590: [alert] HPC jobs not picked up for multiple days, job age alert triggered added
Updated by okurz over 2 years ago
- Tags set to reactive work
- Priority changed from Normal to High
- Target version set to Ready
Updated by livdywan over 2 years ago
The pipeline apparently finished on June 1 at 8:18 am GMT+2, which is before #111590#note-13. This means the failure was simply reported later than the assessment that the alert was fine.
Updated by okurz over 2 years ago
- Tags changed from reactive work to reactive work, next-office-day
- Subject changed from osd-deployment pipeline failed: test 481 -le 0 to osd-deployment pipeline failed: test 481 -le 0, due to job age alert, likely just the raspberry pi based tests stuck in schedule
- Assignee set to nicksinger
@nicksinger you mentioned this issue so you can look into this anyway the next day in office.
For now https://stats.openqa-monitor.qa.suse.de/d/7W06NBWGk/job-age?tab=alert&viewPanel=5&orgId=1&from=1653896632482&to=1654210000358 looks fine. But if we are nearing the alert threshold and this is still not solved by then, or if we even see an additional alert at that time, then please pause the alert until the problem is fixed.
Updated by nicksinger over 2 years ago
- Status changed from New to Feedback
okurz wrote:
@nicksinger you mentioned this issue so you can look into this anyway the next day in office.
I asked @dheidler for support in https://suse.slack.com/archives/C02CANHLANP/p1654596394537099. Without any information about the setup there is not much I can do.
Updated by dheidler over 2 years ago
The worker host (which is an RPi 400 at the moment) was restarted, but the openQA-worker@*.service units didn't come up due to https://progress.opensuse.org/issues/105855
I fixed it for now by restarting the worker daemons.
Updated by dheidler over 2 years ago
- Related to action #105855: [easy][beginner] openqa-worker@.service should handle NTP unavailability gracefully added
Updated by nicksinger over 2 years ago
- Status changed from Feedback to Resolved
I think we can resolve this again, as we have a follow-up ticket about the real problem.