action #111926

osd-deployment pipeline failed: test 481 -le 0, due to job age alert, likely just the raspberry pi based tests stuck in schedule

Added by tinita 2 months ago. Updated about 2 months ago.

Status: Resolved
Priority: High
Assignee:
Target version:
Start date: 2022-06-01
Due date:
% Done: 0%
Estimated time:

Description

Observation

https://gitlab.suse.de/openqa/osd-deployment/-/jobs/996509

++ echo '$ test ${#new} -le ${#old}'
$ test ${#new} -le ${#old}
++ test 481 -le 0
Cleaning up project directory and file based variables
00:01
ERROR: Job failed: command terminated with exit code 1
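
For readers unfamiliar with the check: `${#var}` expands to the length of the string stored in `var`, so `test ${#new} -le ${#old}` only succeeds when `new` is no longer than `old`. The sketch below reproduces the `test 481 -le 0` result under stated assumptions. Only the `test` line itself is taken from the job log; the function name `no_new_output` and the contents of `old` and `new` are hypothetical, since the log does not show what the pipeline stores in them.

```shell
#!/bin/bash
# Hypothetical reconstruction of the failing check. Only the line
# `test ${#new} -le ${#old}` comes from the job log; everything else
# (function name, variable contents) is an assumption for illustration.

# Succeeds (exit status 0) when the second string is no longer than the first.
no_new_output() {
    local old=$1
    local new=$2
    # ${#var} is the length in characters of the string held in var.
    test ${#new} -le ${#old}
}

# An empty "old" value against 481 characters of "new" content makes the
# check fail, which corresponds to `test 481 -le 0` in the log.
old=""
new=$(printf '%481s' '')   # 481 spaces as placeholder content
if no_new_output "$old" "$new"; then
    echo "check passed"
else
    echo "check failed: ${#new} > ${#old}"
fi
```

With these placeholder values the script takes the failure branch, just as the pipeline job exited with code 1.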

Related issues

Related to openQA Project - action #111590: [alert] HPC jobs not picked up for multiple days, job age alert triggered | Resolved | 2022-05-25 | 2022-06-14

Related to openQA Project - action #105855: [easy][beginner] openqa-worker@.service should handle NTP unavailability gracefully | Resolved | 2022-02-02 | 2022-07-22

History

#1 Updated by tinita 2 months ago

  • Project changed from QA to openQA Infrastructure

#2 Updated by tinita 2 months ago

#3 Updated by okurz 2 months ago

  • Related to action #111590: [alert] HPC jobs not picked up for multiple days, job age alert triggered added

#4 Updated by okurz 2 months ago

  • Tags set to reactive work
  • Priority changed from Normal to High
  • Target version set to Ready

#5 Updated by cdywan 2 months ago

The pipeline apparently finished on June 1 at 8:18 am GMT+2, which is before #111590#note-13. So this failure was only reported after the alert had already been assessed as fine.

#6 Updated by okurz 2 months ago

  • Tags changed from reactive work to reactive work, next-office-day
  • Subject changed from osd-deployment pipeline failed: test 481 -le 0 to osd-deployment pipeline failed: test 481 -le 0, due to job age alert, likely just the raspberry pi based tests stuck in schedule
  • Assignee set to nicksinger

nicksinger, you mentioned this issue, so you can look into it on your next day in the office anyway.

For now https://stats.openqa-monitor.qa.suse.de/d/7W06NBWGk/job-age?tab=alert&viewPanel=5&orgId=1&from=1653896632482&to=1654210000358 looks fine. But if we are nearing the alert threshold and this is still not solved by then, or if we even see an additional alert in the meantime, then please pause the alert until the problem is fixed.

#7 Updated by nicksinger 2 months ago

  • Status changed from New to Feedback

okurz wrote:

nicksinger, you mentioned this issue, so you can look into it on your next day in the office anyway.

I asked dheidler for support in https://suse.slack.com/archives/C02CANHLANP/p1654596394537099. Without any information about the setup there is not much I can do.

#8 Updated by dheidler 2 months ago

The worker host (currently an RPi 400) was restarted, but the openqa-worker@*.service units didn't come up due to https://progress.opensuse.org/issues/105855

I fixed it for now by restarting the worker daemons.

#9 Updated by dheidler 2 months ago

  • Related to action #105855: [easy][beginner] openqa-worker@.service should handle NTP unavailability gracefully added

#10 Updated by nicksinger about 2 months ago

  • Status changed from Feedback to Resolved

I think we can resolve this again, as we have a follow-up ticket about the real problem.
