Profile

nicksinger

  • Email: nsinger@suse.com
  • Role: Sysadmin
  • Connect profile: 1
  • Registered on: 2017-03-01
  • Last connection: 2017-03-01

Activity

2020-12-04

09:25 openQA Infrastructure action #78010: unreliable reboots on openqaworker3, likely due to openqa_nvme_format (was: [alert] PROBLEM Host Alert: openqaworker3.suse.de is DOWN)
So I diffed with what fvogt did:
~~~ diff
diff --git a/openqa/nvme_store/openqa_nvme_create.service b/openqa/nvme...
~~~

2020-12-03

14:18 openQA Infrastructure action #68053: powerqaworker-qam-1 fails to come up on reboot
nicksinger wrote:
> I've created PPC ticket #179273 to address the "Permanent IOA failure" (I/O adapter seems broken...
14:16 openQA Infrastructure action #80688 (Resolved): Upgrade IO firmware for powerqaworker-qam-1
Also see https://progress.opensuse.org/issues/68053#note-26
To resolve [RT-PPC #179273] we got asked to upgrade the ...
13:08 openQA Project coordination #65271: [epic] Various feature requests
* re-read workers.ini if you send a SIGHUP to openqa-worker service
* use this to make complete automated salt-de...
12:41 openQA Infrastructure action #80656: OSD deployment failed at 2020-12-02 because 'malbec.arch.suse.de' is down
```
[ 41.796572] Btrfs loaded
[ 41.797425] BTRFS: device fsid ae18adf5-d27e-4fa1-93a1-6ab55263c29d devid 1 tran...
```

2020-12-02

10:05 openQA Infrastructure action #80656 (In Progress): OSD deployment failed at 2020-12-02 because 'malbec.arch.suse.de' is down
10:05 openQA Infrastructure action #80656: OSD deployment failed at 2020-12-02 because 'malbec.arch.suse.de' is down
Seems like a more severe issue. I can't find the system's boot disk at all:
```
/ # blkid
/dev/sdm1: UUID="6c7adf...
```
09:26 openQA Infrastructure action #80656: OSD deployment failed at 2020-12-02 because 'malbec.arch.suse.de' is down
I'm not sure how the machine booted previously. I *assume* we booted PXE and from there "timed out" into "boot from H...
07:35 openQA Infrastructure action #80656: OSD deployment failed at 2020-12-02 because 'malbec.arch.suse.de' is down
I'm currently in the process of recovering the machine. Afterwards I will re-add the salt-key on OSD

2020-11-30

13:59 openQA Infrastructure action #78010: unreliable reboots on openqaworker3, likely due to openqa_nvme_format (was: [alert] PROBLEM Host Alert: openqaworker3.suse.de is DOWN)
I've rejected the salt-key for now on OSD to prevent automatic startup of workers. What I found while booting is that...
