action #61221

closed

openQA Project (public) - coordination #58184: [saga][epic][use case] full version control awareness within openQA

openQA Project (public) - coordination #45302: [epic] smarter fetchneedles (was: fetchneedles should ensure we are always on a branch (and try to self-repair))

osd: unable to save needles, minion fails with "fatal: Unable to create '/var/lib/openqa/.../needles/.git/index.lock'"

Added by okurz almost 5 years ago. Updated 6 months ago.

Status: Resolved
Priority: Low
Category: Regressions/Crashes
Start date: 2019-12-20
% Done: 0%

Description

Observation

The Grafana monitoring alert fired:

[osd-admins] [Alerting] Minion Jobs alert
From:   Grafana <osd-admin@suse.de>
To: osd-admins@suse.de
Sender: osd-admins <osd-admins-bounces+okurz=suse.de@suse.de>
List-Id:    <osd-admins.suse.de>
Date:   19/12/2019 16.05

[Alerting] Minion Jobs alert
Too many failed Minion jobs
Metric name    Value
Failed         21.505

The alert references https://stats.openqa-monitor.qa.suse.de/d/WebuiDb/webui-summary?fullscreen&edit&tab=alert&panelId=19&orgId=1&refresh=30s

The list of failed Minion jobs at https://openqa.suse.de/minion/jobs?state=failed&offset=0 includes, for example, https://openqa.suse.de/minion/jobs?id=300982 with the following details:

{
  "args" => [
    {
      "imagedir" => "",
      "imagedistri" => undef,
      "imagename" => "partitioning_raid-9.png",
      "imageversion" => undef,
      "job_id" => 3727183,
      "needle_json" => "{\r\n  \"area\": [\r\n    {\r\n      \"height\": 43,\r\n      \"ypos\": 725,\r\n      \"xpos\": 956,\r\n      \"type\": \"match\",\r\n      \"width\": 68\r\n    },\r\n    {\r\n      \"type\": \"match\",\r\n      \"width\": 20,\r\n      \"xpos\": 84,\r\n      \"ypos\": 209,\r\n      \"height\": 54\r\n    }\r\n  ],\r\n  \"properties\": [],\r\n  \"tags\": [\r\n    \"ENV-15SP2ORLATER-1\",\r\n    \"partitioning_raid-hard_disks-unfolded\",\r\n    \"storage-ng\"\r\n  ]\r\n}",
      "needledir" => "/var/lib/openqa/share/tests/sle/products/sle/needles",
      "needlename" => "partitioning_raid-hard_disks-unfolded-icon_scheme-hyperv-20191219",
      "overwrite" => undef,
      "user_id" => 194
    }
  ],
  "attempts" => 1,
  "children" => [],
  "created" => "2019-12-19T15:28:05.15569Z",
  "delayed" => "2019-12-19T15:28:05.15569Z",
  "finished" => "2019-12-19T15:28:07.02545Z",
  "id" => 300982,
  "notes" => {
    "gru_id" => 27415232,
    "ttl" => 60
  },
  "parents" => [],
  "priority" => 10,
  "queue" => "default",
  "result" => {
    "error" => "<strong>Failed to save partitioning_raid-hard_disks-unfolded-icon_scheme-hyperv-20191219.</strong><br><pre>Unable to add via Git: fatal: Unable to create '/var/lib/openqa/share/tests/sle/products/sle/needles/.git/index.lock': File exists.\n\nAnother git process seems to be running in this repository, e.g.\nan editor opened by 'git commit'. Please make sure all processes\nare terminated then try again. If it still fails, a git process\nmay have crashed in this repository earlier:\nremove the file manually to continue.</pre>"
  },
  "retried" => undef,
  "retries" => 0,
  "started" => "2019-12-19T15:28:05.17528Z",
  "state" => "failed",
  "task" => "save_needle",
  "time" => "2019-12-20T06:15:21.83364Z",
  "worker" => 294
}
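The error string in the job result mixes HTML markup with the path of the lock file that blocked `git add`. Here is a minimal Python sketch that extracts that path so it can be inspected by hand. It assumes the error keeps the `Unable to create '<path>': File exists` wording that git printed in this job. The function name `lock_path_from_error` is hypothetical.

```python
import re
from typing import Optional

# Matches git's "Unable to create '<...>/index.lock': File exists" wording,
# as it appears in the Minion job result above (assumed stable).
LOCK_RE = re.compile(r"Unable to create '([^']+/index\.lock)': File exists")


def lock_path_from_error(error: str) -> Optional[str]:
    """Return the index.lock path named in a save_needle error, if any."""
    match = LOCK_RE.search(error)
    return match.group(1) if match else None


error = (
    "<pre>Unable to add via Git: fatal: Unable to create "
    "'/var/lib/openqa/share/tests/sle/products/sle/needles/.git/index.lock': "
    "File exists.\n</pre>"
)
print(lock_path_from_error(error))
```

If the function returns a path, the lock file still exists, and no git process is running in that repository, then git's own advice applies: the lock is stale and can be removed manually.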

In the needle directory on osd I can see:

geekotest@openqa:~/share/tests/sle/products/sle/needles> git status
On branch master
Your branch is up to date with 'origin/master'.

Untracked files:
  (use "git add <file>..." to include in what will be committed)

        partitioning_raid-hard_disks-unfolded-icon_scheme-hyperv-20191219.json
        partitioning_raid-hard_disks-unfolded-icon_scheme-hyperv-20191219.png

nothing added to commit but untracked files present (use "git add" to track)

So the files were created and the branch is otherwise clean, but the files were neither committed nor pushed.
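This state (needle files on disk but untracked) can be detected from `git status --porcelain`, which prefixes untracked entries with `??`. Here is a minimal Python sketch with two assumptions: a needle is exactly one `.json` plus one `.png` with the same base name, as in the listing above, and paths contain no characters that git would quote. The function name `untracked_needles` is hypothetical.

```python
from pathlib import PurePosixPath
from typing import Dict, List, Set


def untracked_needles(porcelain: str) -> List[str]:
    """Return needle names whose .json and .png files are both untracked.

    `porcelain` is the output of `git status --porcelain`; untracked entries
    start with "?? ". Quoted paths (special characters) are not handled.
    """
    suffixes: Dict[str, Set[str]] = {}
    for line in porcelain.splitlines():
        if not line.startswith("?? "):
            continue  # ignore staged or modified tracked files
        path = PurePosixPath(line[3:])
        name = str(path.with_suffix(""))
        suffixes.setdefault(name, set()).add(path.suffix)
    # Assumption: a complete needle is a JSON definition plus a PNG image
    return sorted(n for n, exts in suffixes.items() if {".json", ".png"} <= exts)


status = (
    "?? partitioning_raid-hard_disks-unfolded-icon_scheme-hyperv-20191219.json\n"
    "?? partitioning_raid-hard_disks-unfolded-icon_scheme-hyperv-20191219.png\n"
)
print(untracked_needles(status))
```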

There are many more failed Minion jobs, mostly with "TTL Expired".

Problem

  • H1: The save_needle Minion job ran while fetchneedles-sles was running and failed because the needles git repository was already locked (index.lock existed).
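If H1 holds, the failure is a transient race between two git processes. One possible mitigation is to retry the git step with back-off while the lock is held. It is sketched here in Python as an illustration and is not what openQA's save_needle task actually does. The runner interface, `run_git_with_retry` and `fake_run` are hypothetical names. The runner is injected so the logic can be exercised without a real repository.

```python
import time
from typing import Callable, Sequence, Tuple

# Hypothetical runner interface: takes git arguments, returns (exit code, stderr).
Runner = Callable[[Sequence[str]], Tuple[int, str]]

# Substring of git's error when another process holds the index lock
LOCK_MARKER = "index.lock': File exists"


def run_git_with_retry(
    run: Runner,
    args: Sequence[str],
    attempts: int = 5,
    delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Run a git command, retrying only while index.lock is held elsewhere."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    code, stderr = run(args)
    for attempt in range(1, attempts):
        if code == 0 or LOCK_MARKER not in stderr:
            break  # success, or a failure that waiting will not fix
        sleep(delay * attempt)  # linear back-off while the lock is held
        code, stderr = run(args)
    # Deliberately never deletes index.lock: it may belong to a live process
    return code, stderr


# Usage with a fake runner: the first call hits the lock, the second succeeds
replies = iter([
    (128, "fatal: Unable to create '/tmp/needles/.git/index.lock': File exists."),
    (0, ""),
])
calls = []


def fake_run(args):
    calls.append(list(args))
    return next(replies)


code, _ = run_git_with_retry(fake_run, ["add", "needle.json"], sleep=lambda s: None)
print(code, len(calls))
```

Retrying only narrows the race window. Having fetchneedles and save_needle share a single lock would close it, at the cost of save_needle waiting for a full fetch.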

Workaround

Commit the files manually and push them.
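The manual workaround amounts to three git commands run in the needles directory. The Python sketch below only builds and prints them so they can be reviewed first. The commit message is hypothetical; the message openQA itself generates may differ. Pushing assumes the branch tracks `origin/master`, as the `git status` output above shows.

```python
import shlex
from typing import List, Sequence


def manual_commit_commands(files: Sequence[str], message: str) -> List[List[str]]:
    """Build the git commands for the manual workaround: stage, commit, push."""
    return [
        ["git", "add", "--", *files],
        ["git", "commit", "-m", message],
        ["git", "push"],  # assumes the branch tracks origin/master
    ]


name = "partitioning_raid-hard_disks-unfolded-icon_scheme-hyperv-20191219"
# Hypothetical commit message
cmds = manual_commit_commands([f"{name}.json", f"{name}.png"], f"Add needle {name}")
for cmd in cmds:
    print(shlex.join(cmd))
```

To actually run them, pass each command to `subprocess.run(cmd, cwd=needledir, check=True)` as the user owning the checkout (geekotest on osd).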


Related issues: 2 (2 open, 0 closed)

Related to openQA Project (public) - action #70774: save_needle Minion tasks fail frequently and needles could get lost (New, 2020-09-01)

Related to openQA Project (public) - action #89560: Add alert for blocked gitlab account when users are unable to save/commit needles (Workable, 2021-03-05)

#1

Updated by okurz over 4 years ago

  • Target version set to Ready
#2

Updated by okurz about 4 years ago

  • Priority changed from Normal to Low
  • Parent task set to #45302
#3

Updated by okurz about 4 years ago

  • Subject changed from osd: unable to save needles, "fatal: Unable to create '/var/lib/openqa/share/tests/sle/products/sle/needles/.git/index.lock'" to osd: unable to save needles, minion fails with "fatal: Unable to create '/var/lib/openqa/.../needles/.git/index.lock'"
  • Target version changed from Ready to future
#4

Updated by okurz about 4 years ago

  • Related to action #70774: save_needle Minion tasks fail frequently and needles could get lost added
#5

Updated by okurz almost 4 years ago

  • Related to action #89560: Add alert for blocked gitlab account when users are unable to save/commit needles added
#6

Updated by okurz 9 months ago

Another failure, just from yesterday: https://openqa.suse.de/minion/jobs?id=10638935

error: 'Failed to save import-untrusted-gpg-key-nvidia-compute-9CD0A493D42D0685-2-20240313. Unable to commit via Git: fatal: unable to write new_index file'

#7

Updated by okurz 6 months ago

  • Category set to Regressions/Crashes
  • Status changed from New to Resolved
  • Assignee set to okurz
  • Target version changed from future to Ready
