action #100712
closed
Investigate what broke git checkouts on o3
0%
Description
Pretty much all git repos on o3 broke overnight:
fatal: .git/index: index file smaller than expected
was printed for /var/lib/openqa/tests/{opensuse,openqa,obs}/
as well as /var/lib/openqa/tests/opensuse/products/opensuse/needles.
The .git/index file had 0 size for them.
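A minimal sketch of how the affected checkouts could be listed, assuming the symptom is always a zero-byte .git/index. The function name find_empty_git_indexes is hypothetical and not part of fetchneedles or openQA:

```shell
# Hypothetical helper: print every zero-byte .git/index below the given directories.
find_empty_git_indexes() {
    # -size 0 only matches empty files; -path restricts the match to index files
    find "$@" -type f -path '*/.git/index' -size 0 2>/dev/null
}
```

For example, find_empty_git_indexes /var/lib/openqa/tests would also cover the nested needles checkout, because it lives below that directory.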
The index file of the needles repo had a birthtime of 2021-10-09 22:41:05.923862229 +0000, which coincides with a cron run of fetchneedles:
From geekotest@ariel.suse-dmz.opensuse.org Sat Oct 9 22:41:09 2021
Return-Path: <geekotest@ariel.suse-dmz.opensuse.org>
X-Original-To: geekotest
Delivered-To: geekotest@ariel.suse-dmz.opensuse.org
Received: by ariel.suse-dmz.opensuse.org (Postfix, from userid 493)
id 66A5C18B5F; Sat, 9 Oct 2021 22:41:08 +0000 (UTC)
From: "(Cron Daemon)" <geekotest@ariel.suse-dmz.opensuse.org>
To: geekotest@ariel.suse-dmz.opensuse.org
Subject: Cron <geekotest@ariel> env updateall=1 force=1 /usr/share/openqa/script/fetchneedles
Content-Type: text/plain; charset=UTF-8
Auto-Submitted: auto-generated
Precedence: bulk
X-Cron-Env: <XDG_SESSION_ID=19181>
X-Cron-Env: <XDG_RUNTIME_DIR=/run/user/493>
X-Cron-Env: <DBUS_SESSION_BUS_ADDRESS=unix:path=/run/user/493/bus>
X-Cron-Env: <XDG_SESSION_TYPE=unspecified>
X-Cron-Env: <XDG_SESSION_CLASS=background>
X-Cron-Env: <LANG=en_US.UTF-8>
X-Cron-Env: <LC_CTYPE=en_US.UTF-8>
X-Cron-Env: <SHELL=/bin/sh>
X-Cron-Env: <HOME=/var/lib/openqa>
X-Cron-Env: <PATH=/usr/bin:/bin>
X-Cron-Env: <LOGNAME=geekotest>
X-Cron-Env: <USER=geekotest>
Message-Id: <20211009224109.66A5C18B5F@ariel.suse-dmz.opensuse.org>
Date: Sat, 9 Oct 2021 22:41:04 +0000 (UTC)
fatal: It seems that there is already a rebase-merge directory, and
I wonder if you are in the middle of another rebase. If that is the
case, please try
git rebase (--continue | --abort | --skip)
If that is not the case, please
rm -fr ".git/rebase-merge"
and run me again. I am stopping in case you still have something
valuable there.
Use force=1 to discard uncommitted changes before rebasing
From geekotest@ariel.suse-dmz.opensuse.org Sun Oct 10 08:31:04 2021
Return-Path: <geekotest@ariel.suse-dmz.opensuse.org>
X-Original-To: geekotest
Delivered-To: geekotest@ariel.suse-dmz.opensuse.org
Received: by ariel.suse-dmz.opensuse.org (Postfix, from userid 493)
id D457D18B5F; Sun, 10 Oct 2021 08:31:04 +0000 (UTC)
From: "(Cron Daemon)" <geekotest@ariel.suse-dmz.opensuse.org>
To: geekotest@ariel.suse-dmz.opensuse.org
Subject: Cron <geekotest@ariel> env updateall=1 force=1 /usr/share/openqa/script/fetchneedles
Content-Type: text/plain; charset=UTF-8
Auto-Submitted: auto-generated
Precedence: bulk
X-Cron-Env: <XDG_SESSION_ID=1>
X-Cron-Env: <XDG_RUNTIME_DIR=/run/user/493>
X-Cron-Env: <DBUS_SESSION_BUS_ADDRESS=unix:path=/run/user/493/bus>
X-Cron-Env: <XDG_SESSION_TYPE=unspecified>
X-Cron-Env: <XDG_SESSION_CLASS=background>
X-Cron-Env: <LANG=en_US.UTF-8>
X-Cron-Env: <LC_CTYPE=en_US.UTF-8>
X-Cron-Env: <SHELL=/bin/sh>
X-Cron-Env: <HOME=/var/lib/openqa>
X-Cron-Env: <PATH=/usr/bin:/bin>
X-Cron-Env: <LOGNAME=geekotest>
X-Cron-Env: <USER=geekotest>
Message-Id: <20211010083104.D457D18B5F@ariel.suse-dmz.opensuse.org>
Date: Sun, 10 Oct 2021 08:31:04 +0000 (UTC)
fatal: .git/index: index file smaller than expected
fatal: .git/index: index file smaller than expected
fatal: .git/index: index file smaller than expected
Use force=1 to discard uncommitted changes before rebasing
fatal: .git/index: index file smaller than expected
fatal: .git/index: index file smaller than expected
fatal: It seems that there is already a rebase-merge directory, and
I wonder if you are in the middle of another rebase. If that is the
case, please try
git rebase (--continue | --abort | --skip)
If that is not the case, please
rm -fr ".git/rebase-merge"
and run me again. I am stopping in case you still have something
valuable there.
Use force=1 to discard uncommitted changes before rebasing
fatal: .git/index: index file smaller than expected
fatal: .git/index: index file smaller than expected
fatal: .git/index: index file smaller than expected
Use force=1 to discard uncommitted changes before rebasing
I fixed those repos manually by doing git reset --quiet; git status; git pull.
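The manual fix could be sketched roughly as below. This is not the exact procedure used here: it assumes the empty index is deleted first so that git reset can rebuild it from HEAD, and the function name repair_git_index is hypothetical.

```shell
# Hypothetical helper: rebuild the index of one checkout if it is empty.
repair_git_index() {
    repo=$1
    index="$repo/.git/index"
    # Only touch checkouts whose index exists but has zero size
    if [ -f "$index" ] && [ ! -s "$index" ]; then
        # Assumption: drop the empty file so git recreates it
        rm -f "$index"
        # Mixed reset repopulates the index from HEAD, leaving the working tree alone
        git -C "$repo" reset --quiet
    fi
}
```

Running git status and git pull afterwards, as done manually above, would confirm the checkout is usable again. Checkouts with a healthy index are left untouched.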
The openqa tests repo needed a manual rebase to deal with a conflict.
The fatal: It seems that there is already a rebase-merge directory error is probably a red herring: it is only printed once while multiple repos are broken, it is still printed after the index purge, and it has been going on since Wed, 14 Jul 2021 17:30:10 +0000 (UTC) according to root's mailbox.
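A rough sketch of how the first occurrence of an error could be looked up in an mbox file. It assumes every message starts with a "From " line and has a Date: header before the body, as in the cron mails above; the function name first_message_date_with is hypothetical.

```shell
# Hypothetical helper: print the Date header of the first message containing a fixed string.
first_message_date_with() {
    pattern=$1
    mbox=$2
    awk -v pat="$pattern" '
        /^From / { date = "" }                                   # new message starts
        /^Date: / && date == "" { date = substr($0, 7) }         # remember its first Date header
        index($0, pat) { print date; exit }                      # stop at the first match
    ' "$mbox"
}
```

For example, first_message_date_with 'already a rebase-merge directory' /var/spool/mail/root would print the date of the oldest mail with that error.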
All of the error logs ended up in /var/spool/mail/root, which probably should be archived:
ariel:~ # ll -h /var/spool/mail/root
-rw------- 1 root root 508M Oct 11 08:32 /var/spool/mail/root
ariel:~ # wc -l /var/spool/mail/root
10018188 /var/spool/mail/root
Updated by okurz about 3 years ago
- Assignee set to okurz
- Target version set to Ready
@fvogt as I understood you already fixed the individual repos. In the past we have seen similar errors and I always tried to improve fetchneedles one step at a time. There has been no recent change in fetchneedles, however. There was just https://github.com/os-autoinst/openQA/commit/986bac2a9b8a42cd8cd673f061d5407ccd893717#diff-0bb3fc4e32c66e0e4e124d1288c9e57e8e32f17d020d88fc2f085693996814f6 in August in the past months, so after the incident, and it also looks like it cannot possibly cause such an error.
As you fixed the current situation should we still treat it as "Urgent"?
favogt wrote:
All of the error logs ended up in /var/spool/mail/root, which probably should be archived
What do you mean by "should be archived"?
Updated by favogt about 3 years ago
iforster has the suspicion that this might be related to/caused by the recent NetApp failure which brought down some services. I asked infra about that, maybe it fits.
It's a bit weird though that it would only hit .git/index files, and that in multiple subsequent fetch runs.
okurz wrote:
@fvogt as I understood you already fixed the individual repos. In the past we have seen similar errors
Also corrupt git files? Merge conflicts and similar probably aren't related.
and I always tried to improve fetchneedles one step at a time. There has been no recent change in fetchneedles, however. There was just https://github.com/os-autoinst/openQA/commit/986bac2a9b8a42cd8cd673f061d5407ccd893717#diff-0bb3fc4e32c66e0e4e124d1288c9e57e8e32f17d020d88fc2f085693996814f6 in August in the past months, so after the incident, and it also looks like it cannot possibly cause such an error.
Yeah, I couldn't find anything either.
As you fixed the current situation should we still treat it as "Urgent"?
Without knowing the cause it's not unlikely that it happens again.
favogt wrote:
All of the error logs ended up in /var/spool/mail/root, which probably should be archived
What do you mean by "should be archived"?
Stored elsewhere in compressed form to keep it, but make it easier to inspect for future events.
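Something along these lines would do, as a sketch only: the function name archive_mailbox and the destination directory are hypothetical, and the original mailbox is deliberately left untouched so that truncating it stays a separate, manual decision.

```shell
# Hypothetical helper: store a gzip-compressed, date-stamped copy of a mailbox.
archive_mailbox() {
    mbox=$1
    dest_dir=$2
    stamp=$(date -u +%Y-%m-%d)
    mkdir -p "$dest_dir"
    out="$dest_dir/$(basename "$mbox")-$stamp.gz"
    # Write the compressed copy and print its path on success
    gzip -c "$mbox" > "$out" && printf '%s\n' "$out"
}
```

The archived copy can still be searched without unpacking it to disk, for example with gzip -dc piped into grep.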
Updated by favogt about 3 years ago
- Status changed from New to Closed
Got an answer:
yes, very likely related. From geekotest@ariel.suse-dmz.opensuse.org Sat Oct 9 22:41:09 2021 is within a few seconds of when the trouble on the host started
So let's close this for now, and hope it doesn't happen again.