action #61221
closedopenQA Project (public) - coordination #58184: [saga][epic][use case] full version control awareness within openQA
openQA Project (public) - coordination #45302: [epic] smarter fetchneedles (was: fetchneedles should ensure we are always on a branch (and try to self-repair))
osd: unable to save needles, minion fails with "fatal: Unable to create '/var/lib/openqa/.../needles/.git/index.lock'"
0%
Description
Observation
grafana monitoring alert failed:
[osd-admins] [Alerting] Minion Jobs alert
From: Grafana <osd-admin@suse.de>
To: osd-admins@suse.de
Sender: osd-admins <osd-admins-bounces+okurz=suse.de@suse.de>
List-Id: <osd-admins.suse.de>
Date: 19/12/2019 16.05
*/[Alerting] Minion Jobs alert/*
Too many failed Minion jobs
*Metric name*   *Value*
Failed          21.505
https://openqa.suse.de/minion/jobs?state=failed&offset=0 shows e.g. https://openqa.suse.de/minion/jobs?id=300982 with the details:
{
  "args" => [
    {
      "imagedir" => "",
      "imagedistri" => undef,
      "imagename" => "partitioning_raid-9.png",
      "imageversion" => undef,
      "job_id" => 3727183,
      "needle_json" => "{\r\n \"area\": [\r\n {\r\n \"height\": 43,\r\n \"ypos\": 725,\r\n \"xpos\": 956,\r\n \"type\": \"match\",\r\n \"width\": 68\r\n },\r\n {\r\n \"type\": \"match\",\r\n \"width\": 20,\r\n \"xpos\": 84,\r\n \"ypos\": 209,\r\n \"height\": 54\r\n }\r\n ],\r\n \"properties\": [],\r\n \"tags\": [\r\n \"ENV-15SP2ORLATER-1\",\r\n \"partitioning_raid-hard_disks-unfolded\",\r\n \"storage-ng\"\r\n ]\r\n}",
      "needledir" => "/var/lib/openqa/share/tests/sle/products/sle/needles",
      "needlename" => "partitioning_raid-hard_disks-unfolded-icon_scheme-hyperv-20191219",
      "overwrite" => undef,
      "user_id" => 194
    }
  ],
  "attempts" => 1,
  "children" => [],
  "created" => "2019-12-19T15:28:05.15569Z",
  "delayed" => "2019-12-19T15:28:05.15569Z",
  "finished" => "2019-12-19T15:28:07.02545Z",
  "id" => 300982,
  "notes" => {
    "gru_id" => 27415232,
    "ttl" => 60
  },
  "parents" => [],
  "priority" => 10,
  "queue" => "default",
  "result" => {
    "error" => "<strong>Failed to save partitioning_raid-hard_disks-unfolded-icon_scheme-hyperv-20191219.</strong><br><pre>Unable to add via Git: fatal: Unable to create '/var/lib/openqa/share/tests/sle/products/sle/needles/.git/index.lock': File exists.\n\nAnother git process seems to be running in this repository, e.g.\nan editor opened by 'git commit'. Please make sure all processes\nare terminated then try again. If it still fails, a git process\nmay have crashed in this repository earlier:\nremove the file manually to continue.</pre>"
  },
  "retried" => undef,
  "retries" => 0,
  "started" => "2019-12-19T15:28:05.17528Z",
  "state" => "failed",
  "task" => "save_needle",
  "time" => "2019-12-20T06:15:21.83364Z",
  "worker" => 294
}
In the needle directory on osd I can see:
geekotest@openqa:~/share/tests/sle/products/sle/needles> git status
On branch master
Your branch is up to date with 'origin/master'.
Untracked files:
(use "git add <file>..." to include in what will be committed)
partitioning_raid-hard_disks-unfolded-icon_scheme-hyperv-20191219.json
partitioning_raid-hard_disks-unfolded-icon_scheme-hyperv-20191219.png
nothing added to commit but untracked files present (use "git add" to track)
So the files are created and the branch is up to date with origin, but the new files are neither committed nor pushed.
There are many more failed minion jobs, mainly about "TTL Expired".
Problem
- H1: While fetchneedles-sles was running, the needle commit minion job ran and failed on the already locked directory.
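If H1 holds, the save_needle job ran its Git command while another Git process still held .git/index.lock. One possible mitigation, sketched here only under that assumption, is to wait briefly for the lock to disappear before touching the repository. wait_for_git_lock is a hypothetical helper and not part of openQA. Polling only narrows the race window, it does not close it.

```shell
#!/bin/sh
# wait_for_git_lock DIR [TRIES]
# Hypothetical helper: poll until DIR/.git/index.lock is gone.
# Returns 0 once the lock is absent, 1 if it still exists after
# TRIES one-second waits (default: 10).
wait_for_git_lock() {
    lock="$1/.git/index.lock"
    tries="${2:-10}"
    while [ -e "$lock" ] && [ "$tries" -gt 0 ]; do
        sleep 1
        tries=$((tries - 1))
    done
    # Succeed only if no lock file is left.
    [ ! -e "$lock" ]
}
```

It could be used as a guard in front of any Git call in the needle directory, e.g. `wait_for_git_lock "$needledir" && git add ...`.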
Workaround
Commit manually and push.
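A minimal sketch of this manual workaround follows. It assumes the untracked .json/.png pair shown by git status and the master branch tracking origin. commit_needle is a hypothetical helper, and the commit message format is an assumption, not what openQA itself writes.

```shell
#!/bin/sh
# commit_needle NEEDLEDIR NAME
# Hypothetical helper: stage, commit and push one needle by hand.
commit_needle() {
    needledir="$1"   # e.g. /var/lib/openqa/share/tests/sle/products/sle/needles
    name="$2"        # needle name without file extension
    (
        # Run in a subshell so the caller's working directory is unchanged.
        cd "$needledir" || exit 1
        # The untracked needle is a JSON description plus a PNG screenshot.
        git add -- "$name.json" "$name.png" &&
        # Commit message format is an assumption.
        git commit -q -m "Add needle $name" &&
        # Branch and remote as reported by "git status" in the observation.
        git push -q origin master
    )
}
```

For the needle from this report the call would be `commit_needle /var/lib/openqa/share/tests/sle/products/sle/needles partitioning_raid-hard_disks-unfolded-icon_scheme-hyperv-20191219`, run as the user owning the checkout.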
Updated by okurz about 4 years ago
- Priority changed from Normal to Low
- Parent task set to #45302
Updated by okurz about 4 years ago
- Subject changed from osd: unable to save needles, "fatal: Unable to create '/var/lib/openqa/share/tests/sle/products/sle/needles/.git/index.lock'" to osd: unable to save needles, minion fails with "fatal: Unable to create '/var/lib/openqa/.../needles/.git/index.lock'"
- Target version changed from Ready to future
Updated by okurz about 4 years ago
- Related to action #70774: save_needle Minion tasks fail frequently and needles could get lost added
Updated by okurz almost 4 years ago
- Related to action #89560: Add alert for blocked gitlab account when users are unable to save/commit needles added
Updated by okurz 9 months ago
Just from yesterday: https://openqa.suse.de/minion/jobs?id=10638935
error: 'Failed to save import-untrusted-gpg-key-nvidia-compute-9CD0A493D42D0685-2-20240313.Unable
to commit via Git: fatal: unable to write new_index file'