tickets #45614: A couple of openSUSE machines run out of disk space
Status: Closed
Added by Anonymous almost 6 years ago. Updated almost 6 years ago.
Description
Hi
Sorry to say, but while debugging a problem with one of the hypervisor
machines, I noticed that some openSUSE machines are running out of disk
space. Namely:
- boosters.infra.opensuse.org
- mirrordb3.infra.opensuse.org
- mirrordb4.infra.opensuse.org
- narwal3.infra.opensuse.org
- osc-collab.infra.opensuse.org
Please inform the administrators of those boxes, so they can start a
cleanup round.
Another topic:
- icc.infra.opensuse.org hangs
- narwal2.infra.opensuse.org hangs in maintenance mode (see screen)
Please investigate.
Regards
Lars
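The ticket doesn't say how the full disks were spotted. Below is a minimal Python sketch of a local check an admin could run on each listed host before a cleanup round. It assumes a hypothetical 90% threshold and only looks at the filesystem containing a given path. The helper names are illustrative and are not existing openSUSE tooling.

```python
import shutil

# Hypothetical threshold: the ticket doesn't define "running out of disk space".
THRESHOLD = 0.90


def usage_ratio(total: int, used: int) -> float:
    """Fraction of the filesystem in use, from 0.0 to 1.0."""
    if total <= 0:
        raise ValueError("total must be positive")
    return used / total


def is_nearly_full(total: int, used: int, threshold: float = THRESHOLD) -> bool:
    """True if the used share has reached the threshold."""
    return usage_ratio(total, used) >= threshold


def check_mount(path: str = "/", threshold: float = THRESHOLD) -> bool:
    """Check the local filesystem that contains `path`."""
    # shutil.disk_usage returns a named tuple (total, used, free) in bytes.
    usage = shutil.disk_usage(path)
    return is_nearly_full(usage.total, usage.used, threshold)


if __name__ == "__main__":
    usage = shutil.disk_usage("/")
    percent = usage_ratio(usage.total, usage.used) * 100
    state = "nearly full" if check_mount("/") else "ok"
    print(f"/: {percent:.1f}% used ({state})")
```

Keeping the threshold logic separate from `shutil.disk_usage` means it can be tested without touching a real disk.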
Checklist
- disk space: boosters
- disk space: mirrordb3
- disk space: mirrordb4
- disk space: narwal3
- disk space: osc-collab
- down: icc
- down: narwal2
- down: aedir1
- down: aedir2
- down: lnt
- down: CaaSP cluster (endpoints fail)
- down: 101.opensuse.org (CaaSP?)
- down: provo-mirror
Updated by Anonymous almost 6 years ago
Dear sender
I'm out of office until Tuesday, 2019-01-02, and will not read my Email regularly.
In urgent cases, please contact my manager, Roland Haidl rhaidl@suse.com.
You might also contact:
- autobuild@suse.de for all questions around Autobuild and the Build Service
With kind regards
Lars Vogdt
Updated by tampakrap almost 6 years ago
- Category set to Servers hosted in NBG
- Assignee set to tampakrap
@Lars, thanks a lot for handling the hypervisor issue while everyone was on Christmas break, and also for bringing back the failed VMs, including the very important mirrordb1! Your effort is really appreciated!
As for the rest of the still failed VMs, I'll get to them with a bit of delay, as I'm about to leave on a business trip for the whole week and will have very limited availability.
A few more VMs that have been reported directly to me as broken are:
- aedir[1-2].i.o.o
- lnt.i.o.o
- the CaaSP cluster (not all of the cluster's VMs seem to be down, but the endpoint fails)
Updated by cboltz almost 6 years ago
I have good and bad news:
bad: provo-mirror is also down (no idea why, I'd guess it's unrelated to the NBG hypervisor problems)
good 1: I manually compressed the nginx logs on narwal3 some days ago, so the disk space issue is fixed for now (interestingly, the logs were rotated, but not compressed)
good 2: I'm working on replacing the old narwals with some salt (both the webservers and automated git pull) and hope to have it ready in the next days, so maybe you won't need to spend too much time fixing narwal2 ;-)
I'll also add a checklist to the ticket (one item per server) to make sure nothing gets lost ;-)
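The manual narwal3 fix (gzipping nginx logs that had been rotated but not compressed) can be sketched in Python as follows. Assumptions: rotated files are named like `access.log.1` or `error.log.2`, and the directory is passed on the command line. The ticket doesn't name the files or their location. The durable fix is presumably to enable logrotate's `compress` directive for these logs, but the ticket doesn't say what was changed there.

```python
import gzip
import re
import shutil
import sys
from pathlib import Path

# Assumed naming scheme for rotated files: "access.log.1", "error.log.2", ...
ROTATED = re.compile(r"\.log\.\d+$")


def compress_rotated_logs(log_dir: Path) -> list[Path]:
    """Gzip rotated-but-uncompressed logs in log_dir and return the new .gz paths."""
    created: list[Path] = []
    if not log_dir.is_dir():
        return created
    for path in sorted(log_dir.iterdir()):
        # Skip the live log, already-compressed files and anything else.
        if not path.is_file() or not ROTATED.search(path.name):
            continue
        target = path.with_name(path.name + ".gz")
        if target.exists():
            continue  # don't overwrite an existing archive
        with path.open("rb") as src, gzip.open(target, "wb") as dst:
            shutil.copyfileobj(src, dst)
        path.unlink()  # delete the original only after the copy completed
        created.append(target)
    return created


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python3 compress_logs.py /var/log/nginx
    for gz in compress_rotated_logs(Path(sys.argv[1])):
        print("compressed", gz)
```

The original file is removed only after its compressed copy has been fully written, so an interrupted run leaves the uncompressed log in place.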
Updated by cboltz almost 6 years ago
- Checklist item changed from to [ ] disk space: boosters, [ ] disk space: mirrordb3, [ ] disk space: mirrordb4, [x] disk space: narwal3, [ ] disk space: osc-collab, [ ] down: icc, [ ] down: narwal2, [ ] down: aedir1, [ ] down: aedir2, [ ] down: lnt, [ ] down: CaaSP cluster (endpoints fail), [ ] down: provo-mirror
Updated by mcaj almost 6 years ago
- Checklist item changed from [ ] down: lnt to [x] down: lnt
FYI, I checked the status of the machine lnt.infra.opensuse.org aka lnt.opensuse.org.
The machine was not responding to ping. I found only one message on the serial console output:
[16776824.048003] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [systemd:1]
I was not able to log in via virt-manager. The machine did not react to a soft reboot, so I had to force a reboot.
After the forced reboot it seems to be up and running, and the website https://lnt.opensuse.org/ is working.
However, the machine's admin should check its logs.
Martin
Updated by mcaj almost 6 years ago
The VM icc is broken and rebooting does not help.
There is a message from the kernel:
Probing EDD (edd=off to disable)... ok
and then this message:
PANIC early exception 0d rip 10:ffffffff810321f5 error 0 cr2 0
Updated by cboltz almost 6 years ago
mcaj wrote:
The VM machine icc is broken and reboot does not help.
The is a message from kernel: [...]
PANIC early exception 0d rip 10:ffffffff810321f5 error 0 cr2 0
Wild guess: try booting the previous kernel
Updated by cboltz almost 6 years ago
- Checklist item added: [ ] down: 101.opensuse.org (CaaSP?)
101.opensuse.org shows "404 Not Found: Requested route ('101.cf.infra.opensuse.org') does not exist.", added to the checklist
Updated by cboltz almost 6 years ago
- Checklist item changed from [ ] down: provo-mirror to [x] down: provo-mirror
provo-mirror has been back for about 17 hours, and we instantly got ticket #46031 because it's outdated ;-)
Thanks to whoever brought provo-mirror back!
Updated by Anonymous almost 6 years ago
On Fri, 11 Jan 2019 16:54:18 +0000,
admin@opensuse.org wrote:
provo-mirror is back since about 17 hours - and we instantly got
ticket #46031 because it's outdated ;-)
Thanks to whoever brought provo-mirror back!
FYI: provo-mirror had "disk full". Luckily we found someone with big
pockets at SUSE who sponsored some more space (30TB).
The machine is providing, and will continue to provide, outdated packages
over the weekend of 12/13 Jan, as we decided to stop updating it with the
latest builds and instead speed up the sync of the underlying LVM move
process.
provo-mirror should be back on track (and hopefully stay online and
up to date for a longer time) early next week. Until then, it might be
a good idea to rely on download.opensuse.org to get the latest
packages. For installation media and some (not updated) packages or
repositories, the packages on provo-mirror should be good enough
(that's the reason why we leave it online). Thankfully, MirrorBrain
behind download.opensuse.org knows which packages or ISO images can be
used and which cannot, and will redirect you to other mirrors if the
files on provo-mirror are outdated.
I hope this explains the situation.
With kind regards,
Lars
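Lars describes MirrorBrain's behaviour in only one sentence. The toy Python redirector below illustrates the same idea: send a client only to a mirror whose copy of a file matches the current checksum, and otherwise fall back to the origin. This is not MirrorBrain's actual code or data model. The `Mirror` class, the priority ranking, the checksum comparison and the fallback to download.opensuse.org are all assumptions, and MirrorBrain's real selection criteria are not modelled.

```python
from dataclasses import dataclass, field


@dataclass
class Mirror:
    name: str
    priority: int  # lower value = preferred (hypothetical ranking)
    # path -> checksum of the copy this mirror currently holds (hypothetical scan data)
    files: dict[str, str] = field(default_factory=dict)


def pick_mirror(path: str, current_checksum: str, mirrors: list[Mirror],
                fallback: str = "download.opensuse.org") -> str:
    """Name of the preferred mirror with an up-to-date copy of path,
    or the fallback host if no mirror has the current version."""
    fresh = [m for m in mirrors if m.files.get(path) == current_checksum]
    if not fresh:
        return fallback
    return min(fresh, key=lambda m: m.priority).name


if __name__ == "__main__":
    provo = Mirror("provo-mirror", 0, {"iso/example.iso": "a1", "repo/example.rpm": "old"})
    other = Mirror("other-mirror", 1, {"repo/example.rpm": "new"})
    print(pick_mirror("iso/example.iso", "a1", [provo, other]))    # provo-mirror
    print(pick_mirror("repo/example.rpm", "new", [provo, other]))  # other-mirror
```

This mirrors the situation in the ticket: an unchanged ISO can still be served from a stale provo-mirror, while an updated package is redirected elsewhere.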
Updated by cboltz almost 6 years ago
That's the best reason I ever heard for making a server read-only :-)
Updated by tampakrap almost 6 years ago
- Checklist item changed from to [x] down: CaaSP cluster (endpoints fail)
Updated by tampakrap almost 6 years ago
All CaaSP nodes are back up. The NFS server that k8s uses as storage was also down; I brought it up, but it still hasn't caught up. Thus Cloud Foundry and the websites on top of it are down at the moment.
Updated by tampakrap almost 6 years ago
- Checklist item changed from to [x] disk space: mirrordb3
Updated by tampakrap almost 6 years ago
- Checklist item changed from to [x] disk space: mirrordb4
Updated by tampakrap almost 6 years ago
I marked mirrordb3/4 as done because they are no longer used and are pending destruction. I'm waiting for darix's OK first.
Updated by tampakrap almost 6 years ago
- Checklist item changed from to [x] down: 101.opensuse.org (CaaSP?)
Updated by tampakrap almost 6 years ago
- Checklist item changed from to [x] down: aedir1
Updated by tampakrap almost 6 years ago
- Checklist item changed from to [x] down: aedir2
Updated by tampakrap almost 6 years ago
- Checklist item changed from to [x] down: icc
Updated by cboltz almost 6 years ago
- Checklist item changed from [x] down: icc to [ ] down: icc
icc.o.o still shows the 503 maintenance page :-(
I can ping the VM, so maybe "only" the service is down.
Updated by tampakrap almost 6 years ago
- Checklist item changed from to [x] down: narwal2
Updated by tampakrap almost 6 years ago
- Checklist item changed from to [x] disk space: boosters
Updated by tampakrap almost 6 years ago
- Status changed from New to Closed
Closing this one, as icc and osc-collab have dedicated maintainers who are already aware of the issues. Anyone, feel free to file separate tickets for those.