tickets #45614: A couple of openSUSE machines run out of disk space - openSUSE admin - openSUSE Project Management Tool

tickets #45614

closed

A couple of openSUSE machines run out of disk space

Added by Anonymous over 6 years ago. Updated about 6 years ago.

Status: Closed
Priority: Normal
Assignee: tampakrap
Category: Servers hosted in NBG

Description

Sorry to say, but while debugging a problem with one of the hypervisor
machines, I noticed that some openSUSE machines are running out of disk
space. Namely:

boosters.infra.opensuse.org
mirrordb3.infra.opensuse.org
mirrordb4.infra.opensuse.org
narwal3.infra.opensuse.org
osc-collab.infra.opensuse.org

Please inform the administrators of those boxes, so they can start a
cleanup round.

Another topic:

icc.infra.opensuse.org hangs
narwal2.infra.opensuse.org hangs in maintenance mode (see screen)

Please investigate.

Regards
Lars
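The report does not say how the low-disk condition was detected. As a minimal sketch, assuming Python 3.9+ on the hosts and an arbitrary 90% "running out" threshold (both assumptions, not taken from the ticket), a per-host check could look like this. The names `DiskStatus`, `check_disk`, `is_running_out` and `report` are hypothetical.

```python
import shutil
from dataclasses import dataclass

# Hypothetical threshold: treat a filesystem as "running out" at 90% used.
DEFAULT_THRESHOLD = 0.90


@dataclass
class DiskStatus:
    path: str
    total: int  # bytes
    used: int   # bytes
    free: int   # bytes available to unprivileged users

    @property
    def used_fraction(self) -> float:
        # Guard against a zero-sized filesystem to avoid division by zero.
        return self.used / self.total if self.total else 0.0


def check_disk(path: str = "/") -> DiskStatus:
    # shutil.disk_usage returns a named tuple (total, used, free) in bytes.
    usage = shutil.disk_usage(path)
    return DiskStatus(path, usage.total, usage.used, usage.free)


def is_running_out(status: DiskStatus, threshold: float = DEFAULT_THRESHOLD) -> bool:
    return status.used_fraction >= threshold


def report(paths: list[str], threshold: float = DEFAULT_THRESHOLD) -> list[str]:
    # One human-readable line per mount point, flagged when above the threshold.
    lines = []
    for p in paths:
        st = check_disk(p)
        flag = "LOW SPACE" if is_running_out(st, threshold) else "ok"
        lines.append(f"{p}: {st.used_fraction:.1%} used, {st.free // 2**30} GiB free [{flag}]")
    return lines
```

Running `report(["/", "/var/log"])` on each of the listed hosts would give a quick overview of which filesystems need a cleanup round.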

Checklist

disk space: boosters
disk space: mirrordb3
disk space: mirrordb4
disk space: narwal3
disk space: osc-collab
down: icc
down: narwal2
down: aedir1
down: aedir2
down: lnt
down: CaaSP cluster (endpoints fail)
down: 101.opensuse.org (CaaSP?)
down: provo-mirror

History

Updated by Anonymous over 6 years ago

Dear sender

I'm out of the office until Tuesday, 2019-01-02, and will not read my email regularly.
In urgent cases, please contact my manager, Roland Haidl rhaidl@suse.com.

You might also contact:

autobuild@suse.de for all questions around Autobuild and the Build Service

With kind regards
Lars Vogdt

--
Lars Vogdt Lars.Vogdt@suse.com

BuildOPS Team Lead - SUSE Linux GmbH - Managing Directors: Jeff Hawn, Jennifer Guild, Felix Imendörffer - Maxfeldstraße 5, 90409 Nuernberg, Germany - HRB 16746 (AG Nuernberg)

admin@opensuse.org 12/30/18 09:33 >>>

[openSUSE Tracker]
Issue #45614 has been reported by Lars.Vogdt@suse.com.

tickets #45614: A couple of openSUSE machines run out of disk space
https://progress.opensuse.org/issues/45614

Author: Lars.Vogdt@suse.com
Status: New
Priority: Normal
Assignee:
Category:

Updated by tampakrap over 6 years ago

Category set to Servers hosted in NBG
Assignee set to tampakrap

@Lars, thanks a lot for handling the hypervisor issue while everyone was on Christmas break, and also for bringing back the failed VMs, including the very important mirrordb1! Your effort is really appreciated!

As for the remaining failed VMs, I'll get to them with a bit of delay, as I'm about to leave on a business trip for the whole week and will have very limited availability.

A few more VMs that have been reported directly to me as broken are:

aedir[1-2].i.o.o
lnt.i.o.o
the CaaSP cluster (not all of its VMs seem to be down, but the endpoint fails)

Updated by tampakrap over 6 years ago

Private changed from Yes to No

Updated by cboltz over 6 years ago

I have good and bad news:

bad: provo-mirror is also down (no idea why, I'd guess it's unrelated to the NBG hypervisor problems)

good 1: I manually compressed the nginx logs on narwal3 some days ago, so the disk space issue is fixed for now (interestingly, the logs were rotated, but not compressed)

good 2: I'm working on replacing the old narwals with some salt (both the webservers and the automated git pull) and hope to have it ready in the next few days, so maybe you won't need to spend too much time fixing narwal2 ;-)

I'll also add a checklist to the ticket (one item per server) to make sure nothing gets lost ;-)
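cboltz notes that narwal3's nginx logs were rotated but never compressed. In logrotate the relevant directive is `compress` (with `delaycompress` optionally leaving the most recent rotation uncompressed), so the lasting fix is to make sure the nginx logrotate config uses it. For a one-off manual cleanup like the one described above, a minimal sketch could look like the following. It assumes logrotate-style names such as `access.log.2`; the function name `compress_rotated_logs` and its `min_index` parameter are hypothetical.

```python
import gzip
import re
import shutil
from pathlib import Path

# Assumed logrotate naming: <name>.log.<N> without a ".gz" suffix.
ROTATED = re.compile(r".+\.log\.(\d+)")


def compress_rotated_logs(log_dir: str, min_index: int = 2) -> list[Path]:
    """Gzip rotated-but-uncompressed logs; return the archives created.

    min_index=2 leaves <name>.log.1 alone, mimicking logrotate's
    delaycompress. The live log (<name>.log) never matches the pattern.
    """
    compressed = []
    # sorted() materialises the listing first, so new .gz files are not revisited.
    for path in sorted(Path(log_dir).iterdir()):
        m = ROTATED.fullmatch(path.name)
        if not path.is_file() or m is None or int(m.group(1)) < min_index:
            continue
        target = path.with_name(path.name + ".gz")
        if target.exists():
            continue  # don't clobber an existing archive
        with path.open("rb") as src, gzip.open(target, "wb") as dst:
            shutil.copyfileobj(src, dst)
        path.unlink()  # delete the original only after the archive was written
        compressed.append(target)
    return compressed
```

On a typical installation this would be called as `compress_rotated_logs("/var/log/nginx")`; that path is the common default and an assumption here, not something stated in the ticket.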

Updated by cboltz about 6 years ago

Checklist item changed from to [ ] disk space: boosters, [ ] disk space: mirrordb3, [ ] disk space: mirrordb4, [x] disk space: narwal3, [ ] disk space: osc-collab, [ ] down: icc, [ ] down: narwal2, [ ] down: aedir1, [ ] down: aedir2, [ ] down: lnt, [ ] down: CaaSP cluster (endpoints fail), [ ] down: provo-mirror

Updated by mcaj about 6 years ago

Checklist item changed from [ ] disk space: boosters, [ ] disk space: mirrordb3, [ ] disk space: mirrordb4, [x] disk space: narwal3, [ ] disk space: osc-collab, [ ] down: icc, [ ] down: narwal2, [ ] down: aedir1, [ ] down: aedir2, [ ] down: lnt, [ ] down: CaaSP cluster (endpoints fail), [ ] down: provo-mirror to [ ] disk space: boosters, [ ] disk space: mirrordb3, [ ] disk space: mirrordb4, [x] disk space: narwal3, [ ] disk space: osc-collab, [ ] down: icc, [ ] down: narwal2, [ ] down: aedir1, [ ] down: aedir2, [x] down: lnt, [ ] down: CaaSP cluster (endpoints fail), [ ] down: provo-mirror

FYI: I checked the status of lnt.infra.opensuse.org (aka lnt.opensuse.org).

The machine was not responding to ping. I found only one message in the serial console output:

[16776824.048003] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [systemd:1]

I was not able to log in via virt-manager. The machine did not react to a soft reboot, so I had to force a reboot.

After the forced reboot it seems to be up and running, and the website https://lnt.opensuse.org/ is working. However, the machine's admin should check its logs.

Martin

Updated by mcaj about 6 years ago

The icc VM is broken, and rebooting does not help.

There is a message from the kernel:
Probing EDD (edd=off to disable)... ok

and then this message:

PANIC early exception 0d rip 10:ffffffff810321f5 error 0 cr2 0

Updated by cboltz about 6 years ago

mcaj wrote:

> The VM machine icc is broken and reboot does not help.
>
> The is a message from kernel: [...]
> PANIC early exception 0d rip 10:ffffffff810321f5 error 0 cr2 0

Wild guess: try booting the previous kernel

Updated by cboltz about 6 years ago

Checklist item changed from [ ] disk space: boosters, [ ] disk space: mirrordb3, [ ] disk space: mirrordb4, [x] disk space: narwal3, [ ] disk space: osc-collab, [ ] down: icc, [ ] down: narwal2, [ ] down: aedir1, [ ] down: aedir2, [x] down: lnt, [ ] down: CaaSP cluster (endpoints fail), [ ] down: provo-mirror to [ ] disk space: boosters, [ ] disk space: mirrordb3, [ ] disk space: mirrordb4, [x] disk space: narwal3, [ ] disk space: osc-collab, [ ] down: icc, [ ] down: narwal2, [ ] down: aedir1, [ ] down: aedir2, [x] down: lnt, [ ] down: CaaSP cluster (endpoints fail), [ ] down: 101.opensuse.org (CaaSP?), [ ] down: provo-mirror

101.opensuse.org shows "404 Not Found: Requested route ('101.cf.infra.opensuse.org') does not exist.", added to the checklist

Updated by cboltz about 6 years ago

Checklist item changed from [ ] disk space: boosters, [ ] disk space: mirrordb3, [ ] disk space: mirrordb4, [x] disk space: narwal3, [ ] disk space: osc-collab, [ ] down: icc, [ ] down: narwal2, [ ] down: aedir1, [ ] down: aedir2, [x] down: lnt, [ ] down: CaaSP cluster (endpoints fail), [ ] down: 101.opensuse.org (CaaSP?), [ ] down: provo-mirror to [ ] disk space: boosters, [ ] disk space: mirrordb3, [ ] disk space: mirrordb4, [x] disk space: narwal3, [ ] disk space: osc-collab, [ ] down: icc, [ ] down: narwal2, [ ] down: aedir1, [ ] down: aedir2, [x] down: lnt, [ ] down: CaaSP cluster (endpoints fail), [ ] down: 101.opensuse.org (CaaSP?), [x] down: provo-mirror

provo-mirror has been back for about 17 hours, and we instantly got ticket #46031 because it's outdated ;-)

Thanks to whoever brought provo-mirror back!

Updated by Anonymous about 6 years ago

On Fri, 11 Jan 2019 16:54:18 +0000, admin@opensuse.org wrote:

> provo-mirror is back since about 17 hours - and we instantly got
> ticket #46031 because it's outdated ;-)
>
> Thanks to whoever brought provo-mirror back!

FYI: provo-mirror's disk was full. Luckily we found someone with deep
pockets at SUSE who sponsored more space (30 TB).

The machine is serving, and will keep serving, outdated packages over
the weekend of 12/13 Jan, as we decided to stop syncing the latest
builds and instead speed up the underlying LVM move.

provo-mirror should be back on track (and hopefully stay online and
up to date for longer) early next week. Until then, it might be a good
idea to rely on download.opensuse.org to get the latest packages. For
installation media and some (not updated) packages or repositories,
the packages on provo-mirror should be good enough (that's why we leave
it online). Thankfully, MirrorBrain behind download.opensuse.org knows
which packages or ISO images can be used and which cannot, and will
redirect you to other mirrors if the files on provo-mirror are outdated.

I hope this explains the situation.

With kind regards,
Lars
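Lars describes MirrorBrain redirecting requests away from provo-mirror when its copy of a file is outdated. The toy model below only illustrates that idea and is not MirrorBrain's implementation. It assumes a hypothetical per-mirror inventory (file path to size) from an earlier scan, treats a missing or differently sized file as outdated, and falls back to a hypothetical default URL when no mirror qualifies. `Mirror`, `join_url` and `pick_mirror` are illustrative names.

```python
from dataclasses import dataclass, field


@dataclass
class Mirror:
    name: str
    base_url: str
    # Hypothetical scan result: file path -> size in bytes on this mirror.
    files: dict[str, int] = field(default_factory=dict)


def join_url(base: str, path: str) -> str:
    return f"{base.rstrip('/')}/{path.lstrip('/')}"


def pick_mirror(path: str, expected_size: int, mirrors: list[Mirror], fallback: str) -> str:
    # Mirrors are tried in the given order (a stand-in for proximity/priority).
    for mirror in mirrors:
        # Only redirect to a mirror whose copy matches the expected size;
        # a missing or differently sized file is treated as outdated.
        if mirror.files.get(path) == expected_size:
            return join_url(mirror.base_url, path)
    # No suitable mirror: serve from the fallback location instead.
    return join_url(fallback, path)
```

Matching on size alone is a deliberate simplification; a real redirector would also have to consider timestamps or checksums to decide whether a mirror's copy is current.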

Updated by cboltz about 6 years ago

That's the best reason I ever heard for making a server read-only :-)

Updated by tampakrap about 6 years ago

Checklist item changed from to [x] down: CaaSP cluster (endpoints fail)

Updated by tampakrap about 6 years ago

All CaaSP nodes are back up again. The NFS server that k8s uses as storage was also down. I brought it back up, but it still hasn't caught up, so Cloud Foundry and the websites running on top of it are down at the moment.

Updated by tampakrap about 6 years ago

Checklist item changed from to [x] disk space: mirrordb3

Updated by tampakrap about 6 years ago

Checklist item changed from to [x] disk space: mirrordb4

Updated by tampakrap about 6 years ago

I marked mirrordb3/4 as done because they are no longer used and are pending destruction. I'm waiting for darix's OK first.

Updated by tampakrap about 6 years ago

Checklist item changed from to [x] down: 101.opensuse.org (CaaSP?)

Updated by tampakrap about 6 years ago

Checklist item changed from to [x] down: aedir1

Updated by tampakrap about 6 years ago

Checklist item changed from to [x] down: aedir2

Updated by tampakrap about 6 years ago

Checklist item changed from to [x] down: icc

Updated by cboltz about 6 years ago

Checklist item changed from [ ] disk space: boosters, [x] disk space: mirrordb3, [x] disk space: mirrordb4, [x] disk space: narwal3, [ ] disk space: osc-collab, [x] down: icc, [ ] down: narwal2, [x] down: aedir1, [x] down: aedir2, [x] down: lnt, [x] down: CaaSP cluster (endpoints fail), [x] down: 101.opensuse.org (CaaSP?), [x] down: provo-mirror to [ ] disk space: boosters, [x] disk space: mirrordb3, [x] disk space: mirrordb4, [x] disk space: narwal3, [ ] disk space: osc-collab, [ ] down: icc, [ ] down: narwal2, [x] down: aedir1, [x] down: aedir2, [x] down: lnt, [x] down: CaaSP cluster (endpoints fail), [x] down: 101.opensuse.org (CaaSP?), [x] down: provo-mirror

icc.o.o still shows the 503 maintenance page :-(

I can ping the VM, so maybe "only" the service is down.

Updated by tampakrap about 6 years ago

Checklist item changed from to [x] down: narwal2

Updated by tampakrap about 6 years ago

Checklist item changed from to [x] disk space: boosters

Updated by tampakrap about 6 years ago

Status changed from New to Closed

Closing this one, as icc and osc-collab have dedicated maintainers who are already aware of the issues. Feel free to file separate tickets for those.
