Project

General

Profile

Actions

action #178249

open

coordination #176337: [saga][epic] Stable os-autoinst backends with stable command execution (no mistyping)

coordination #176340: [epic] Stable qemu backend with no unexpected mistyping

load detection and job flagging under high load conditions in openQA job execution size:S

Added by gpuliti 30 days ago. Updated 6 days ago.

Status:
Workable
Priority:
Low
Assignee:
-
Category:
Regressions/Crashes
Target version:
Start date:
Due date:
% Done:

0%

Estimated time:

Description

Motivation

Related to point 2 of #175060#note-13

Recent test failures in openQA seems that can be related to the high system load in circleCI. Similar situations can happen in any environment running openQA so we should introduce a load detection during openQA job execution and flag openQA jobs failing or "flagging" jobs where load was high and result might be tainted

Acceptance criteria

  • AC1: Failed, Unreviewed openQA jobs clearly show if the load of executing openQA workers was high
  • AC2: No too verbose output is shown if the load is not high

Suggestions

  • https://app.circleci.com/pipelines/github/os-autoinst/openQA/16047/workflows/be893671-47af-4c11-a2a7-a69140b42c57/jobs/154592
  • Override the load detection to more easily reproduce the issue?
  • Consider the number of CPU's in this context
  • Come up with a general definition of "high load" or provide a mechanism easily customized to individual workers
  • Should isotovideo be aware of this mechanism? Or provide data at least? isotovideo could print out the system load during execution regardless of final result. Then in one of our investigation scripts or worker we could read this value and present it, e.g. in an openqa-investigate comment. As alternative implement this within the openQA worker (using existing code for the load check while the worker is idling).

Related issues 1 (1 open0 closed)

Precedes openQA Project (public) - action #175060: [sporadic] [Workflow] Failed: os-autoinst/openQA on master / test (7dc9d82) size:MBlockedgpuliti

Actions
Actions #1

Updated by gpuliti 30 days ago

  • Parent task deleted (#175060)
Actions #2

Updated by gpuliti 30 days ago

  • Precedes action #175060: [sporadic] [Workflow] Failed: os-autoinst/openQA on master / test (7dc9d82) size:M added
Actions #3

Updated by okurz 30 days ago

  • Parent task set to #176340
Actions #4

Updated by tinita 29 days ago

  • Description updated (diff)
Actions #5

Updated by tinita 29 days ago ยท Edited

Is the word "job" here referring to an openQA job or a CircleCI job?

Actions #6

Updated by tinita 21 days ago

  • Subject changed from load detection and job flagging when high load conditions in openQA to load detection and job flagging under high load conditions in openQA job execution size:S
  • Description updated (diff)
  • Status changed from New to Workable
Actions #7

Updated by okurz 16 days ago

  • Priority changed from Normal to Low
Actions #8

Updated by gpuliti 6 days ago

Sorry for the delay, didn't saw the comment. It refer to openqa jobs.

Actions

Also available in: Atom PDF