action #111926

closed

osd-deployment pipeline failed: test 481 -le 0, due to job age alert, likely just the raspberry pi based tests stuck in schedule

Added by tinita almost 2 years ago. Updated almost 2 years ago.

Status: Resolved
Priority: High
Assignee:
Category: -
Target version:
Start date: 2022-06-01
Due date:
% Done: 0%
Estimated time:
Description

Observation

https://gitlab.suse.de/openqa/osd-deployment/-/jobs/996509

++ echo '$ test ${#new} -le ${#old}'
$ test ${#new} -le ${#old}
++ test 481 -le 0
Cleaning up project directory and file based variables
00:01
ERROR: Job failed: command terminated with exit code 1
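How to read the failing step: in Bash, `${#var}` expands to the length of the string stored in `var`, so this check only passes when `new` is no longer than `old`. The log does not show what the two variables actually contain, only their lengths. The sketch below is an illustration under that assumption: it uses made-up placeholder values that merely reproduce the logged lengths (481 and 0) to show why `test` returned a non-zero status.

```shell
#!/bin/bash
# Placeholder values (assumption): the real contents of `old` and `new`
# are not visible in the pipeline log, only their lengths.
old=''
new=$(printf 'x%.0s' {1..481})   # a 481-character dummy string

# ${#var} expands to the number of characters in var
echo "old=${#old} new=${#new}"

# Same comparison as the pipeline step; in the CI job a non-zero
# exit status from this command is what failed the job (exit code 1)
if test ${#new} -le ${#old}; then
    echo "check passed"
else
    echo "check failed: test ${#new} -le ${#old}"
fi
```

With these placeholder values the script prints `old=0 new=481` followed by `check failed: test 481 -le 0`, matching the logged comparison.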

Related issues: 2 (0 open, 2 closed)

Related to openQA Project - action #111590: [alert] HPC jobs not picked up for multiple days, job age alert triggered (Resolved, livdywan, start 2022-05-25, due 2022-06-14)

Related to openQA Project - action #105855: [easy][beginner] openqa-worker@.service should handle NTP unavailability gracefully (Resolved, livdywan, start 2022-02-02, due 2022-07-22)

Actions #1

Updated by tinita almost 2 years ago

  • Project changed from QA to openQA Infrastructure
Actions #2

Updated by tinita almost 2 years ago

Actions #3

Updated by okurz almost 2 years ago

  • Related to action #111590: [alert] HPC jobs not picked up for multiple days, job age alert triggered added
Actions #4

Updated by okurz almost 2 years ago

  • Tags set to reactive work
  • Priority changed from Normal to High
  • Target version set to Ready
Actions #5

Updated by livdywan almost 2 years ago

The pipeline apparently finished on June 1 at 8:18 am GMT+2, which is before #111590#note-13. So the failure happened before the alert was assessed as fine; it was just reported later than that assessment.

Actions #6

Updated by okurz almost 2 years ago

  • Tags changed from reactive work to reactive work, next-office-day
  • Subject changed from osd-deployment pipeline failed: test 481 -le 0 to osd-deployment pipeline failed: test 481 -le 0, due to job age alert, likely just the raspberry pi based tests stuck in schedule
  • Assignee set to nicksinger

@nicksinger, you mentioned this issue, so you can look into it anyway on your next day in the office.

For now, https://stats.openqa-monitor.qa.suse.de/d/7W06NBWGk/job-age?tab=alert&viewPanel=5&orgId=1&from=1653896632482&to=1654210000358 looks fine. However, if we approach the alert threshold while this is still not solved, or if we see another alert in the meantime, please pause the alert until the problem is fixed.

Actions #7

Updated by nicksinger almost 2 years ago

  • Status changed from New to Feedback

okurz wrote:

@nicksinger, you mentioned this issue, so you can look into it anyway on your next day in the office.

I asked @dheidler for support in https://suse.slack.com/archives/C02CANHLANP/p1654596394537099; without any information about the setup, there is not much I can do.

Actions #8

Updated by dheidler almost 2 years ago

The worker host (currently an RPi 400) was restarted, but the openqa-worker@*.service units didn't come up due to https://progress.opensuse.org/issues/105855

I fixed it for now by restarting the worker daemons.

Actions #9

Updated by dheidler almost 2 years ago

  • Related to action #105855: [easy][beginner] openqa-worker@.service should handle NTP unavailability gracefully added
Actions #10

Updated by nicksinger almost 2 years ago

  • Status changed from Feedback to Resolved

I think we can resolve this again, as we have a follow-up ticket about the real problem.
