action #70774

open

coordination #80142: [saga][epic] Scale out: Redundant/load-balancing deployments of openQA, easy containers, containers on kubernetes

coordination #96263: [epic] Exclude certain Minion tasks from "Too many Minion job failures alert" alert

save_needle Minion tasks fail frequently and needles could get lost

Added by mkittler over 4 years ago. Updated 7 months ago.

Status: New
Priority: Low
Assignee: -
Category: Feature requests
Target version:
Start date: 2020-09-01
Due date:
% Done: 0%
Estimated time:
Tags:

Description

Observation

The save_needle Minion task fails frequently on OSD and also sometimes on o3.

This can be observed using the following query parameters: https://openqa.suse.de/minion/jobs?state=failed&offset=0&task=save_needle
I'm going to remove most of these jobs to calm down the alert, but right now 24 jobs have piled up over 2 months. The problem has actually existed for longer than 2 months, but the failures have been cleaned up manually so far.

The problem is always that the Git working tree is in a state that the task cannot handle:

1.

  "result" => {
    "error" => "<strong>Failed to save addon_products-module-dev-tools-pvm-20200805.</strong><br><pre>Unable to commit via Git: On branch master\nYour branch is up to date with 'origin/master'.\n\nnothing to commit, working tree clean</pre>"
  },

2.

  "result" => {
    "error" => "<strong>Failed to save manually_add_profile-AppArmor-Chose-a-program-to-generate-a-profile-20200827.</strong><br><pre>Unable to reset repository to origin/master: error: cannot rebase: Your index contains uncommitted changes.\nerror: Please commit or stash them.</pre>"
  },

Acceptance criteria

  • AC1: The save_needle task can handle the problematic situations listed below.

Suggestions

It would be useful if the task could handle the problematic situations itself instead of requiring manual intervention. Note that the delete_needle task (which shares the same Git code) is also affected. We likely see fewer problems there because that task is not executed as often.

Problematic situations

  1. No diff has been produced that could be committed: Maybe this simply happens when there is no actual change, in which case we can return early.
  2. The Git directory contains uncommitted changes: We could save these changes on a new branch before rebasing.
  3. We cannot push the new commit because new commits have been pushed to the remote from elsewhere in the meantime: Just repeat the procedure.
  4. The fetch needles script is interfering.
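
The handling suggested for situations 1 to 3 could look roughly like the following Python sketch. It is not the openQA implementation (which is Perl); the names save_with_recovery, make_git_runner and apply_change are hypothetical, the remote and branch are assumed to be origin and master as in the error messages above, and git stash stands in for the "new branch" idea from situation 2 because it is simpler to show. Situation 4 is not covered.

```python
import subprocess
from typing import Callable, List, Tuple

# A runner takes git arguments and returns (exit code, combined output).
GitRunner = Callable[[List[str]], Tuple[int, str]]


def make_git_runner(repo_dir: str) -> GitRunner:
    """Build a runner that executes real git commands inside repo_dir."""
    def run(args: List[str]) -> Tuple[int, str]:
        proc = subprocess.run(["git", "-C", repo_dir, *args],
                              capture_output=True, text=True)
        return proc.returncode, proc.stdout + proc.stderr
    return run


def _check(result: Tuple[int, str], what: str) -> str:
    """Return the command output, or raise if the command failed."""
    code, output = result
    if code != 0:
        raise RuntimeError(f"Unable to {what}: {output}")
    return output


def save_with_recovery(git: GitRunner, apply_change: Callable[[], None],
                       message: str, max_attempts: int = 3) -> str:
    """Apply a change, commit and push it; returns 'unchanged' or 'pushed'."""
    for attempt in range(1, max_attempts + 1):
        # Situation 2: park leftover changes so the reset cannot fail on them.
        leftovers = _check(git(["status", "--porcelain"]), "read status")
        if leftovers.strip():
            _check(git(["stash", "push", "--include-untracked", "-m",
                        f"leftovers before save attempt {attempt}"]),
                   "stash leftovers")

        _check(git(["fetch", "origin"]), "fetch from origin")
        _check(git(["reset", "--hard", "origin/master"]),
               "reset repository to origin/master")

        apply_change()  # hypothetical callback that writes the needle files
        _check(git(["add", "--all"]), "stage changes")

        # Situation 1: exit code 0 means nothing is staged, so return early.
        code, output = git(["diff", "--cached", "--quiet"])
        if code == 0:
            return "unchanged"
        if code != 1:
            raise RuntimeError(f"Unable to inspect staged changes: {output}")

        _check(git(["commit", "-m", message]), "commit via Git")

        # Situation 3: a rejected push just triggers another round.
        code, _ = git(["push", "origin", "HEAD:master"])
        if code == 0:
            return "pushed"
    raise RuntimeError(f"Unable to push after {max_attempts} attempts")
```

A caller would pass make_git_runner with the path of the needles checkout as the first argument. Passing the runner in as a parameter keeps the recovery logic testable without a real repository.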

Related issues 4 (0 open, 4 closed)

Related to openQA Project (public) - coordination #33745: [epic] Improve handling of external Git repositories (for needles) [Resolved, mkittler, 2024-06-20]

Related to openQA Infrastructure (public) - action #61221: osd: unable to save needles, minion fails with "fatal: Unable to create '/var/lib/openqa/.../needles/.git/index.lock'" [Resolved, okurz, 2019-12-20]

Related to openQA Infrastructure (public) - action #98499: [alert] web UI: Too many Minion job failures alert size:S [Resolved, mkittler, 2021-09-13]

Has duplicate openQA Project (public) - action #75070: save_needle minion task fails because "Your branch is ahead of 'origin/master'" [Rejected, 2020-10-22]
