action #70774
coordination #80142: [saga][epic] Scale out: Redundant/load-balancing deployments of openQA, easy containers, containers on kubernetes
coordination #96263: [epic] Exclude certain Minion tasks from "Too many Minion job failures alert" alert
save_needle Minion tasks fail frequently
Description
Observation
The save_needle Minion task fails frequently on OSD and also sometimes on o3.
This can be observed using the following query parameters: https://openqa.suse.de/minion/jobs?state=failed&offset=0&task=save_needle
I'm going to remove most of these jobs to calm down the alert, but right now 24 jobs have piled up over 2 months. The problem has actually existed for longer than 2 months, but the failures have been cleaned up manually so far.
The problem here is always that the Git working tree is in a state that the task cannot handle:
1. "result" => { "error" => "<strong>Failed to save addon_products-module-dev-tools-pvm-20200805.</strong><br><pre>Unable to commit via Git: On branch master\nYour branch is up to date with 'origin/master'.\n\nnothing to commit, working tree clean</pre>" },
2. "result" => { "error" => "<strong>Failed to save manually_add_profile-AppArmor-Chose-a-program-to-generate-a-profile-20200827.</strong><br><pre>Unable to reset repository to origin/master: error: cannot rebase: Your index contains uncommitted changes.\nerror: Please commit or stash them.</pre>" },
Suggestions
It would be useful if the task could handle the problematic situations itself instead of requiring manual intervention. Note that the delete_needle task (which shares the same Git code) is also affected. We likely see fewer problems there because that task is executed less often.
Problematic situations
- No diff has been produced that could be committed: This probably just means there is no actual change, in which case we can simply return early.
- The Git directory contains uncommitted changes: We could save these changes on a new branch before rebasing.
- We cannot push the new commit because new commits have been pushed to the remote from elsewhere in the meantime: Just repeat the procedure.
- The fetch needles script is interfering.
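A minimal sketch of how the first and third situations might be handled, written in Python purely for illustration (the task's actual Git code is not shown in this ticket). The names GitError, run_git_in and commit_and_push are hypothetical, and the sketch assumes the remote is origin and the branch is master, as in the error messages above. The git runner is passed in as a callable so the control flow can be exercised without a real repository. Saving uncommitted changes on a new branch and the interference from the fetch needles script are not covered.

```python
import subprocess
from typing import Callable, List, Tuple

# A runner takes git arguments and returns (exit code, combined output).
GitRunner = Callable[[List[str]], Tuple[int, str]]


class GitError(RuntimeError):
    """Raised when a git step fails in a way this sketch does not handle."""


def run_git_in(repo_dir: str) -> GitRunner:
    """Build a runner that executes git inside repo_dir (hypothetical helper)."""
    def run(args: List[str]) -> Tuple[int, str]:
        proc = subprocess.run(
            ["git", "-C", repo_dir, *args], capture_output=True, text=True
        )
        return proc.returncode, proc.stdout + proc.stderr
    return run


def commit_and_push(git: GitRunner, paths: List[str], message: str,
                    max_attempts: int = 3) -> str:
    """Stage paths, commit and push them; returns 'unchanged' or 'pushed'."""
    code, out = git(["add", "--", *paths])
    if code != 0:
        raise GitError(f"Unable to add via Git: {out}")

    # Situation 1: `git diff --cached --quiet` exits with 0 when the index
    # matches HEAD for these paths, so there is nothing to commit.
    code, _ = git(["diff", "--cached", "--quiet", "--", *paths])
    if code == 0:
        return "unchanged"

    code, out = git(["commit", "-m", message, "--", *paths])
    if code != 0:
        raise GitError(f"Unable to commit via Git: {out}")

    # Situation 3: if the push fails, rebase onto the remote and try again.
    for _attempt in range(max_attempts):
        code, out = git(["push", "origin", "HEAD:master"])
        if code == 0:
            return "pushed"
        # Assumption: the push was rejected because the remote has moved on.
        code, rebase_out = git(["pull", "--rebase", "origin", "master"])
        if code != 0:
            raise GitError(f"Unable to rebase onto origin/master: {rebase_out}")
    raise GitError(f"Unable to push after {max_attempts} attempts: {out}")
```

With a real checkout this could be called as commit_and_push(run_git_in(needledir), [json_path, png_path], commit_message), where needledir, json_path, png_path and commit_message are placeholders. Returning "unchanged" instead of raising would keep the "nothing to commit" case from showing up as a failed Minion job.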
History
#1
Updated by mkittler almost 2 years ago
- Related to action #33745: Improve handling external Git repositories (for needles) added
#2
Updated by okurz almost 2 years ago
- Target version set to Ready
#3
Updated by mkittler almost 2 years ago
- Tags set to alert
#4
Updated by okurz over 1 year ago
- Description updated (diff)
- Category set to Feature requests
I think you once already proposed a solution to "repair" the git state. Can you reference that again so that we can think about what we can take from that?
#5
Updated by okurz over 1 year ago
- Related to action #61221: osd: unable to save needles, minion fails with "fatal: Unable to create '/var/lib/openqa/.../needles/.git/index.lock'" added
#6
Updated by okurz over 1 year ago
- Target version changed from Ready to future
#7
Updated by Xiaojing_liu over 1 year ago
- Has duplicate action #75070: save_needle minion task fails because "Your branch is ahead of 'origin/master'" added
#8
Updated by mkittler 10 months ago
- Related to coordination #96263: [epic] Exclude certain Minion tasks from "Too many Minion job failures alert" alert added
#9
Updated by okurz 10 months ago
- Related to action #98499: [alert] web UI: Too many Minion job failures alert size:S added
#11
Updated by cdywan 7 months ago
From yesterday:
---
args:
- commit_message: ''
  imagedir: ''
  imagedistri: ''
  imagename: hexchat-23.png
  imageversion: ''
  job_id: 7760510
  needle_json: "{\r\n \"area\": [\r\n {\r\n \"ypos\": 181,\r\n \"type\":
    \"match\",\r\n \"xpos\": 335,\r\n \"click_point\": {\r\n \"ypos\":
    18,\r\n \"xpos\": 119\r\n },\r\n \"height\": 34,\r\n \"width\":
    180\r\n }\r\n ],\r\n \"properties\": [],\r\n \"tags\": [\r\n \"hexchat-nick-bernhard\"\r\n
    \ ]\r\n}"
  needledir: /var/lib/openqa/share/tests/sle/products/sle/needles
  needlename: hexchat-nick-bernhard--20211130
  overwrite: '1'
  user_id: 175
attempts: 1
children: []
created: 2021-11-30T11:20:59.50934Z
delayed: 2021-11-30T11:20:59.50934Z
expires: 2021-11-30T11:21:59.50934Z
finished: 2021-11-30T11:21:01.01916Z
id: 3536401
lax: 0
notes:
  gru_id: 30650350
parents: []
priority: 20
queue: default
result:
  error: "<strong>Failed to save hexchat-nick-bernhard--20211130.</strong><br><pre>Unable
    to commit via Git: On branch master\nYour branch is up to date with 'origin/master'.\n\nUntracked
    files:\n (use \"git add <file>...\" to include in what will be committed)\n\tnautilus-1-20211102.json\n\tnautilus-1-20211102.png\n\tseahorse_sshkey-seahorse-display-sshkey-20211022.json\n\tseahorse_sshkey-seahorse-display-sshkey-20211022.png\n\nnothing
    added to commit but untracked files present (use \"git add\" to track)</pre>"
retried: ~
retries: 0
started: 2021-11-30T11:20:59.51323Z
state: failed
task: save_needle
time: 2021-12-01T10:16:52.2364Z
worker: 575
The error is this one:
Failed to save hexchat-nick-bernhard--20211130.
Unable to commit via Git: On branch master
Your branch is up to date with 'origin/master'.

Untracked files:
  (use "git add <file>..." to include in what will be committed)
    nautilus-1-20211102.json
    nautilus-1-20211102.png
    seahorse_sshkey-seahorse-display-sshkey-20211022.json
    seahorse_sshkey-seahorse-display-sshkey-20211022.png

nothing added to commit but untracked files present (use "git add" to track)
#12
Updated by cdywan 13 days ago
We have some very similar looking new cases:
<strong>Failed to save foreground-winget-install-20220621.</strong><br><pre>Unable to commit via Git: On branch master\nYour branch is up to date with 'origin/master'.\n\nUntracked files:\n (use \"git add <file>...\" to include in what will be committed)\n\tfirefox-private-facebook-20220412.json\n\tfirefox-private-facebook-20220412.png\n\tnautilus-1-20211102.json\n\tnautilus-1-20211102.png\n\tseahorse_sshkey-seahorse-display-sshkey-20211022.json\n\tseahorse_sshkey-seahorse-display-sshkey-20211022.png\n\tsystem-indicator-20220511.json\n\tsystem-indicator-20220511.png\n\tyast2_lan_hostname_tab-20220615.json\n\tyast2_lan_hostname_tab-20220615.png\n\nnothing added to commit but untracked files present (use \"git add\" to track)\n</pre>
And another one with a different error:
<strong>Failed to save foreground-winget-install-20220621.</strong><br><pre>Unable to commit via Git: fatal: Unable to create '/var/lib/openqa/share/tests/sle/products/sle/needles/.git/index.lock': File exists. Another git process seems to be running in this repository, e.g. an editor opened by 'git commit'. Please make sure all processes are terminated then try again. If it still fails, a git process may have crashed in this repository earlier: remove the file manually to continue.</pre>
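For triage, the failure messages quoted in this ticket can be grouped by a few distinctive substrings. The following is a rough sketch in Python; the category labels and the function name classify_failure are made up for illustration, and only the matched substrings come from the messages and issue titles above.

```python
import re
from typing import List, Pattern, Tuple

# Substrings taken from the error messages quoted in this ticket.
# The labels on the left are hypothetical names chosen for this sketch.
PATTERNS: List[Tuple[str, Pattern[str]]] = [
    ("index_locked", re.compile(r"index\.lock': File exists")),
    ("uncommitted_changes", re.compile(r"Your index contains uncommitted changes")),
    ("untracked_only", re.compile(r"nothing added to commit but untracked files present")),
    ("nothing_to_commit", re.compile(r"nothing to commit, working tree clean")),
    ("ahead_of_remote", re.compile(r"Your branch is ahead of 'origin/master'")),
]


def classify_failure(error: str) -> str:
    """Return the label of the first pattern found in error, else 'unknown'."""
    for label, pattern in PATTERNS:
        if pattern.search(error):
            return label
    return "unknown"
```

Applied to the "error" field of each failed save_needle or delete_needle job, this would show which of the situations accounts for most of the failures before any of them is fixed.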