tickets #114851

closed

mirrorcache.o.o down (workaround in place)

Added by cboltz over 1 year ago. Updated 8 months ago.

Status:
Resolved
Priority:
Normal
Assignee:
Category:
Mirrors
Target version:
-
Start date:
2022-07-31
Due date:
% Done:
90%

Estimated time:

Description

At around 15:00 CEST today, I noticed that mirrorcache.o.o is down.

The server is pingable and in theory allows ssh logins, but asks for a password I don't know (ssh key doesn't get accepted).

However, mirrorcache isn't reachable:

[16:31:24] <acidsys> "failed to start mirrorcache webapp" but reboot did not help

Also, the server doesn't connect to the saltmaster, which prevented using the "salt backdoor" to get access to the server.

As a workaround, I changed the haproxy config so that requests to mirrorcache.o.o now get routed to mirrorcache-eu.i.o.o. (Please revert that after fixing the server.)

Bonus points if you

  • make sure the server connects to the saltmaster
  • apply at least the basic salt highstate so that ssh logins etc. work
  • add the root password of all mirrorcache* machines to the infra pass (gitlab.infra.opensuse.org/infra/pass.git)
Actions #1

Updated by cboltz over 1 year ago

  • Private changed from Yes to No
Actions #2

Updated by cboltz over 1 year ago

The main part (mirrorcache) is fixed, and I reverted the workarounds in the haproxy config yesterday.

However, the things listed under "bonus points" are still not fixed, which means nobody will be able to log in (and fix things) the next time the server fails.

Actions #3

Updated by andriinikitin over 1 year ago

  • Status changed from New to Workable
Actions #4

Updated by andriinikitin over 1 year ago

  • % Done changed from 0 to 90

cboltz wrote:

At around 15:00 CEST today, I noticed that mirrorcache.o.o is down.

The problem was that the WebUI couldn't connect to the database at startup after the reboot. This should be fixed by https://github.com/openSUSE/MirrorCache/commit/278719a6753
I am not sure whether I should add a retry there: I have the impression that it could damage the DB in extreme cases, but maybe I will add it at some point.

Bonus points if you

  • make sure the server connects to the saltmaster

It should be fixed now. (I was testing salt on mirrorcache some time ago, so the minion was configured as local-only.)

  • apply at least the basic salt highstate so that ssh logins etc. work

I am not sure which exact state should be applied here, but at least cmd.run does work, so I guess it can be marked as complete.

  • add the root password of all mirrorcache* machines to the infra pass (gitlab.infra.opensuse.org/infra/pass.git)

I don't know the current root password. But since it is now possible to add an ssh key with salt, can this be skipped? (Or I would appreciate it if somebody did it or told me how to do it.)

I will mark the ticket as resolved in a few days unless there are any comments.

Actions #5

Updated by cboltz over 1 year ago

andriinikitin wrote:

cboltz wrote:

At around 15:00 CEST today, I noticed that mirrorcache.o.o is down.

The problem was that the WebUI couldn't connect to the database at startup after the reboot. This should be fixed by https://github.com/openSUSE/MirrorCache/commit/278719a6753

I guess that means you have mysql/mariadb running locally?

I am not sure whether I should add a retry there: I have the impression that it could damage the DB in extreme cases, but maybe I will add it at some point.

I don't know the details, but in general, I can't imagine that retrying to connect to the database would cause damage.

Bonus points if you

  • make sure the server connects to the saltmaster

It should be fixed now. (I was testing salt on mirrorcache some time ago, so the minion was configured as local-only.)

Confirmed, test.ping works now :-)

  • apply at least the basic salt highstate so that ssh logins etc. work

I am not sure which exact state should be applied here, but at least cmd.run does work, so I guess it can be marked as complete.

cmd.run (aka "the salt backdoor") is a workaround, and I added my ssh key to more than one machine this way ;-) - but it's far from perfect.

role.base makes sure that, for example, ssh logins as a user (via ldap/freeipa) work, and that sudo works for users in the wheel group. It also applies some other useful config we want everywhere; see the salt repo for details.

So - please run a highstate (maybe first with test=True) to apply these things. (Note: This needs the pillar/id/ files, so you'll have to wait until your MR gets merged.)

  • add the root password of all mirrorcache* machines to the infra pass (gitlab.infra.opensuse.org/infra/pass.git)

I don't know the current root password. But since it is now possible to add an ssh key with salt, can this be skipped? (Or I would appreciate it if somebody did it or told me how to do it.)

Since you can log in as root via ssh, you could simply set a new password ;-)

Actions #6

Updated by pjessen over 1 year ago

cboltz wrote:

andriinikitin wrote:

I am not sure whether I should add a retry there: I have the impression that it could damage the DB in extreme cases, but maybe I will add it at some point.

I don't know the details, but in general, I can't imagine that retrying to connect to the database would cause damage.

Me neither. Surely every application will always have some reconnect code.
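
For illustration, the kind of reconnect logic discussed here could look roughly like the sketch below. It is a generic Python example, not MirrorCache's actual code and not what the linked commit does; the function name `connect_with_retry`, its parameters, and the choice of exponential backoff are all assumptions made for this example. It only retries opening the connection, which does not touch any data, and gives up after a bounded number of attempts so that a real outage is still reported.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def connect_with_retry(
    connect: Callable[[], T],
    attempts: int = 5,
    initial_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call connect() until it succeeds or the attempts are used up.

    `connect` is a hypothetical callable that opens the database
    connection and raises ConnectionError while the database is not
    reachable (for example, right after a reboot).
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")

    delay = initial_delay
    for attempt in range(1, attempts + 1):
        try:
            return connect()
        except ConnectionError:
            if attempt == attempts:
                # Out of attempts: re-raise so the failure stays visible.
                raise
            sleep(delay)
            # Double the wait each time, capped at max_delay.
            delay = min(delay * 2, max_delay)

    raise AssertionError("unreachable")
```

A webapp could wrap its startup connection in such a helper, so that a database which comes up a few seconds late after a reboot does not leave the service failed.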

Actions #7

Updated by crameleon 8 months ago

  • Status changed from Workable to Resolved

Doesn't sound like anything is left to do here.
