action #61221

openQA Project - coordination #58184: [saga][epic][use case] full version control awareness within openQA, e.g. user forks and branches, fully versioned test schedules and configuration settings

openQA Project - coordination #45302: [epic] smarter fetchneedles (was: fetchneedles should ensure we are always on a branch (and try to self-repair))

osd: unable to save needles, minion fails with "fatal: Unable to create '/var/lib/openqa/.../needles/.git/index.lock'"

Added by okurz almost 2 years ago. Updated about 1 year ago.

Status: New
Priority: Low
Assignee: -
Target version:
Start date: 2019-12-20
Due date:
% Done: 0%
Estimated time:

Description

Observation

A Grafana monitoring alert was triggered:

[osd-admins] [Alerting] Minion Jobs alert
From:   Grafana <osd-admin@suse.de>
To: osd-admins@suse.de
Sender: osd-admins <osd-admins-bounces+okurz=suse.de@suse.de>
List-Id:    <osd-admins.suse.de>
Date:   19/12/2019 16.05

[Alerting] Minion Jobs alert
Too many failed Minion jobs
Metric name: Failed
Value: 21.505

referencing https://stats.openqa-monitor.qa.suse.de/d/WebuiDb/webui-summary?fullscreen&edit&tab=alert&panelId=19&orgId=1&refresh=30s

https://openqa.suse.de/minion/jobs?state=failed&offset=0 shows e.g. https://openqa.suse.de/minion/jobs?id=300982 with the details:

{
  "args" => [
    {
      "imagedir" => "",
      "imagedistri" => undef,
      "imagename" => "partitioning_raid-9.png",
      "imageversion" => undef,
      "job_id" => 3727183,
      "needle_json" => "{\r\n  \"area\": [\r\n    {\r\n      \"height\": 43,\r\n      \"ypos\": 725,\r\n      \"xpos\": 956,\r\n      \"type\": \"match\",\r\n      \"width\": 68\r\n    },\r\n    {\r\n      \"type\": \"match\",\r\n      \"width\": 20,\r\n      \"xpos\": 84,\r\n      \"ypos\": 209,\r\n      \"height\": 54\r\n    }\r\n  ],\r\n  \"properties\": [],\r\n  \"tags\": [\r\n    \"ENV-15SP2ORLATER-1\",\r\n    \"partitioning_raid-hard_disks-unfolded\",\r\n    \"storage-ng\"\r\n  ]\r\n}",
      "needledir" => "/var/lib/openqa/share/tests/sle/products/sle/needles",
      "needlename" => "partitioning_raid-hard_disks-unfolded-icon_scheme-hyperv-20191219",
      "overwrite" => undef,
      "user_id" => 194
    }
  ],
  "attempts" => 1,
  "children" => [],
  "created" => "2019-12-19T15:28:05.15569Z",
  "delayed" => "2019-12-19T15:28:05.15569Z",
  "finished" => "2019-12-19T15:28:07.02545Z",
  "id" => 300982,
  "notes" => {
    "gru_id" => 27415232,
    "ttl" => 60
  },
  "parents" => [],
  "priority" => 10,
  "queue" => "default",
  "result" => {
    "error" => "<strong>Failed to save partitioning_raid-hard_disks-unfolded-icon_scheme-hyperv-20191219.</strong><br><pre>Unable to add via Git: fatal: Unable to create '/var/lib/openqa/share/tests/sle/products/sle/needles/.git/index.lock': File exists.\n\nAnother git process seems to be running in this repository, e.g.\nan editor opened by 'git commit'. Please make sure all processes\nare terminated then try again. If it still fails, a git process\nmay have crashed in this repository earlier:\nremove the file manually to continue.</pre>"
  },
  "retried" => undef,
  "retries" => 0,
  "started" => "2019-12-19T15:28:05.17528Z",
  "state" => "failed",
  "task" => "save_needle",
  "time" => "2019-12-20T06:15:21.83364Z",
  "worker" => 294
}

In the needle directory on osd I can see:

geekotest@openqa:~/share/tests/sle/products/sle/needles> git status
On branch master
Your branch is up to date with 'origin/master'.

Untracked files:
  (use "git add <file>..." to include in what will be committed)

        partitioning_raid-hard_disks-unfolded-icon_scheme-hyperv-20191219.json
        partitioning_raid-hard_disks-unfolded-icon_scheme-hyperv-20191219.png

nothing added to commit but untracked files present (use "git add" to track)

So the files were created and the branch is up to date with origin, but the files are neither committed nor pushed.

There are many more failed minion jobs, mainly about "TTL Expired".

Problem

  • H1: The save_needle minion job ran while fetchneedles-sles was still running and failed because the repository was already locked.
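
The failure mode assumed in H1 can be reproduced in a throwaway repository. This is only a sketch: the temporary repository and the file name example-needle.json are made up for illustration, and the lock file is created by hand to stand in for a concurrently running git process such as a fetch. It does not touch the osd checkout.

```shell
#!/bin/sh
# Reproduce "Unable to create '.../.git/index.lock': File exists"
# in a temporary repository (hypothetical setup, not the osd checkout).
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo" || exit 1

# Simulate another git process holding the index lock.
touch .git/index.lock

echo '{}' > example-needle.json
if err=$(git add example-needle.json 2>&1); then
    lock_result="added"
else
    lock_result="blocked"
fi
echo "with lock: $lock_result"
echo "$err"

# Once the lock is gone, the same command goes through.
rm -f .git/index.lock
if git add example-needle.json; then
    unlock_result="added"
else
    unlock_result="blocked"
fi
echo "without lock: $unlock_result"
```

If H1 holds, the needle files themselves are written before `git add` runs, which would match the untracked files seen in `git status` above.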

Workaround

Commit manually and push.
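
A minimal sketch of the manual workaround, assuming it is run as the user owning the checkout (geekotest in the session above) and that the remote is called origin, as the `git status` output suggests. The helper name commit_needle and the commit message are hypothetical, not what openQA itself uses. The helper refuses to act while an index.lock exists, since another git process may still be running.

```shell
# Hypothetical helper: commit one untracked needle (JSON + PNG) and push it.
commit_needle() {
    needledir=$1
    needle=$2
    # Do not touch the repository while a lock file is present.
    if [ -e "$needledir/.git/index.lock" ]; then
        echo "index.lock present in $needledir, not committing" >&2
        return 1
    fi
    git -C "$needledir" add -- "$needle.json" "$needle.png" &&
    git -C "$needledir" commit -q -m "Add needle $needle" &&
    git -C "$needledir" push -q origin HEAD
}

# Example call for the needle from this ticket (left commented out):
# commit_needle /var/lib/openqa/share/tests/sle/products/sle/needles \
#     partitioning_raid-hard_disks-unfolded-icon_scheme-hyperv-20191219
```

If the lock file is still there although no git process is running, git's own message applies: remove the file manually, then retry.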


Related issues

Related to openQA Project - action #70774: save_needle Minion tasks fail frequently (New, 2020-09-01)

Related to openQA Project - action #89560: Add alert for blocked gitlab account when users are unable to save/commit needles (Workable, 2021-03-05)

History

#1 Updated by okurz about 1 year ago

  • Target version set to Ready

#2 Updated by okurz about 1 year ago

  • Priority changed from Normal to Low
  • Parent task set to #45302

#3 Updated by okurz about 1 year ago

  • Subject changed from osd: unable to save needles, "fatal: Unable to create '/var/lib/openqa/share/tests/sle/products/sle/needles/.git/index.lock'" to osd: unable to save needles, minion fails with "fatal: Unable to create '/var/lib/openqa/.../needles/.git/index.lock'"
  • Target version changed from Ready to future

#4 Updated by okurz about 1 year ago

  • Related to action #70774: save_needle Minion tasks fail frequently added

#5 Updated by okurz 8 months ago

  • Related to action #89560: Add alert for blocked gitlab account when users are unable to save/commit needles added
