tickets #26832

closed

openQA PowerPC workers are Offline since 5 days

Added by michel_mno over 6 years ago. Updated over 5 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: OBS
Target version: -
Start date:
Due date:
% Done: 100%
Estimated time:

Description

Hi there,
The openQA PowerPC workers have been offline for 5 days (1).

Could somebody restart them?

(1) https://openqa.opensuse.org/admin/workers

power8:1 Offline
power8:10 Offline
power8:11 Offline
power8:12 Offline
power8:13 Offline
power8:14 Offline
power8:15 Offline
power8:16 Offline
power8:2 Offline
power8:3 Offline
power8:4 Offline
power8:5 Offline
power8:6 Offline
power8:7 Offline
power8:8 Offline
power8:9 Offline

Worker power8:1
Host: power8
Instance: 1
Seen: 5 days ago

===

--
Michel Normand


Related issues 1 (0 open, 1 closed)

Copied to openSUSE admin - tickets #38789: time on progress.o.o is off by 20 minutes | Closed | okurz | 2018-07-24

Actions #1

Updated by michel_mno over 6 years ago

According to the openQA log, somebody restarted them an hour ago:


about an hour ago k0da worker_register { "host": "power8", "id": 17, "instance": "5", "caps": { "cpu_arch": "ppc64le", "mem_max": "130609", "cpu_modelname": "POWER8E (raw), altivec supported", "cpu_opmode": null, "worker_class": "qemu_ppc64le,qemu_ppc64,qemu_ppc,heavyload" } }
about an hour ago k0da worker_register { "id": 14, "instance": "2", "caps": { "cpu_arch": "ppc64le", "mem_max": "130609", "cpu_modelname": "POWER8E (raw), altivec supported", "cpu_opmode": null, "worker_class": "qemu_ppc64le,qemu_ppc64,qemu_ppc,heavyload" }, "host": "power8" }
about an hour ago k0da worker_register { "instance": "4", "caps": { "worker_class": "qemu_ppc64le,qemu_ppc64,qemu_ppc,heavyload", "cpu_opmode": null, "mem_max": "130609", "cpu_modelname": "POWER8E (raw), altivec supported", "cpu_arch": "ppc64le" }, "id": 16, "host": "power8" }
about an hour ago k0da worker_register { "id": 18, "instance": "6", "caps": { "cpu_arch": "ppc64le", "mem_max": "130609", "cpu_modelname": "POWER8E (raw), altivec supported", "worker_class": "qemu_ppc64le,qemu_ppc64,qemu_ppc,heavyload", "cpu_opmode": null }, "host": "power8" }
about an hour ago k0da worker_register { "host": "power8", "instance": "3", "caps": { "cpu_arch": "ppc64le", "mem_max": "130609", "cpu_modelname": "POWER8E (raw), altivec supported", "worker_class": "qemu_ppc64le,qemu_ppc64,qemu_ppc,heavyload", "cpu_opmode": null }, "id": 15 }
about an hour ago k0da worker_register { "id": 13, "caps": { "cpu_opmode": null, "worker_class": "qemu_ppc64le,qemu_ppc64,qemu_ppc,heavyload", "cpu_arch": "ppc64le", "mem_max": "130609", "cpu_modelname": "POWER8E (raw), altivec supported" }, "instance": "1", "host": "power8" }

Actions #2

Updated by Anonymous over 6 years ago

  • Category set to OBS
  • Status changed from New to Closed
  • Assignee set to Anonymous
  • % Done changed from 0 to 100

Sorry for the delay - "someone" was needed to find the power button ;-)

=> will create a new issue requesting monitoring of these machines now. Might be that I'm coming back to you with the question on "how to do this" ;)

Actions #3

Updated by michel_mno over 6 years ago

lrupp, could you switch the issue to Public? I am not allowed to do it.

Actions #4

Updated by TBro over 6 years ago

  • Private changed from Yes to No

it's public now.
cheers,
Thorsten

Actions #5

Updated by michel_mno almost 6 years ago

The same problem again: the PowerPC workers have not been active for 8 days.
All PowerPC workers are offline and TW snapshot 20180608 is stuck waiting for workers.
https://openqa.opensuse.org/tests/overview?distri=opensuse&version=Tumbleweed&build=20180608&groupid=4

The question was asked on IRC twice, but got no answer; could somebody handle it?

irc opensuse-factory extract:
June 15 15:11:59 <maxlin>       just out of curiosity, do we have any problem on power8 openqa worker? I asking because powerpc jobs were scheduled 5 days and we have no power worker available
June 15 15:12:08 <maxlin>       on o3
...
June 18 11:50:28 <michel_mno>   Hello there, I am just reconnecting to irc,  was there an update about the reason why  PowerPC workers are not active for o3 since at least 8 days ? https://openqa.opensuse.org/admin/workers/13
Actions #6

Updated by okurz almost 6 years ago

  • Status changed from Closed to Workable
Actions #7

Updated by okurz almost 6 years ago

From the logs on power8:

$ journalctl -u openqa-worker@1.service
Jun 15 13:26:29 power8 worker[18252]: [error] ignoring server - server refused with code 403: {"error":"timestamp mismatch","error_status":403}

And the time is off by about 5 minutes. ntpd is running but reports problems synchronizing the time. Does someone else from heroes know how the machines within the openSUSE network should synchronize their time?
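The 403 above is the kind of failure that a timestamp-checked API produces once the client clock drifts too far from the server's. The following is a minimal Python sketch of that general technique, not openQA's actual code: the function names, the signing scheme, and the 300-second tolerance are all assumptions made for illustration.

```python
import hashlib
import hmac
from typing import Tuple

# Hypothetical tolerance; the real server-side limit is not stated in this ticket.
TOLERANCE_SECONDS = 300


def sign_request(secret: bytes, path: str, timestamp: float) -> str:
    """Return a hex HMAC over the request path and the client's timestamp."""
    message = f"{path}{timestamp}".encode()
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def verify_request(secret: bytes, path: str, timestamp: float,
                   signature: str, server_now: float) -> Tuple[int, str]:
    """Check the signature first, then reject timestamps outside the tolerance."""
    expected = sign_request(secret, path, timestamp)
    if not hmac.compare_digest(expected, signature):
        return 403, "bad signature"
    if abs(server_now - timestamp) > TOLERANCE_SECONDS:
        return 403, "timestamp mismatch"
    return 200, "ok"
```

Under these assumptions, a correctly signed request is still refused when the sender's clock is further off than the tolerance, which would match the symptom here: the worker processes run, but the server ignores them.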

Actions #8

Updated by pjessen almost 6 years ago

okurz wrote:

And the time is off by about 5 minutes. ntpd is running but reports problems synchronizing the time. Does someone else from heroes know how the machines within the openSUSE network should synchronize their time?

ntp or chrony should be fine. What is ntpd reporting?

Actions #9

Updated by tampakrap almost 6 years ago

If your machines are inside the heroes vlan, you can use ntp[1-3].infra.opensuse.org. If your machines are outside the heroes vlan (which is the case for openqa workers), then you can use ntp[1-2].opensuse.org
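To check how far a worker's clock is from one of these servers without relying on ntpd, a small SNTP query can help. The sketch below is an illustration only: the helper names are hypothetical, it sends a single unauthenticated client packet, and it applies the standard NTP offset formula to the reply.

```python
import socket
import struct
import time

# Seconds between the NTP epoch (1900-01-01) and the Unix epoch (1970-01-01).
NTP_EPOCH_OFFSET = 2208988800


def read_timestamp(packet: bytes, start: int) -> float:
    """Decode the 64-bit NTP timestamp at byte offset `start` as Unix seconds."""
    seconds, fraction = struct.unpack("!II", packet[start:start + 8])
    return seconds - NTP_EPOCH_OFFSET + fraction / 2**32


def clock_offset(t0: float, t1: float, t2: float, t3: float) -> float:
    """Standard NTP offset: t0/t3 are client send/receive, t1/t2 server receive/send."""
    return ((t1 - t0) + (t2 - t3)) / 2


def query_offset(server: str, timeout: float = 5.0) -> float:
    """Ask `server` for the time once and return the estimated local clock error."""
    request = b"\x23" + 47 * b"\0"  # LI=0, version 4, mode 3 (client)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        t0 = time.time()
        sock.sendto(request, (server, 123))
        packet, _ = sock.recvfrom(512)
        t3 = time.time()
    t1 = read_timestamp(packet, 32)  # server receive timestamp
    t2 = read_timestamp(packet, 40)  # server transmit timestamp
    return clock_offset(t0, t1, t2, t3)
```

Calling `query_offset("ntp1.opensuse.org")` on a worker outside the heroes vlan should return the offset in seconds, assuming UDP port 123 is reachable from that host; a positive value means the local clock is behind the server.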

Actions #10

Updated by okurz almost 6 years ago

  • Status changed from Workable to In Progress
  • Assignee set to okurz

Thanks for the information. I assume this is something to be properly covered by salt-recipes again in the future :/

Content from /etc/ntp.conf on power8:

# grep -v '^#' /etc/ntp.conf
tinker panic 0

disable monitor

driftfile /var/lib/ntp/drift/ntp.drift
logfile /var/log/ntp.log
restrict -6 default ignore
restrict default  ignore
restrict -6 ::1
restrict 127.0.0.1
server 192.168.128.17 iburst
restrict 192.168.128.17 nomodify notrap nopeer
server 192.168.128.18 iburst
restrict 192.168.128.18 nomodify notrap nopeer
disable monitor
keys /etc/ntp.keys              # path for keys file
trustedkey 1                    # define trusted keys
requestkey 1                    # key (7) for accessing server variables
controlkey 1                    # key (6) for accessing server variables

whereas on openqaworker1 we have:

# grep -v '^#' /etc/ntp.conf

restrict -4 default kod notrap nomodify nopeer noquery
restrict -6 default kod notrap nomodify nopeer noquery

restrict 127.0.0.1
restrict ::1



driftfile /var/lib/ntp/drift/ntp.drift # path for drift file

logfile   /var/log/ntp      # alternate log file


keys /etc/ntp.keys      # path for keys file
trustedkey 1            # define trusted keys
requestkey 1            # key (7) for accessing server variables
controlkey 1                    # key (6) for accessing server variables

Even though both machines run openSUSE Leap 42.3, the two files look completely different.

The configured NTP servers "192.168.128.17" and "192.168.128.18" are not reachable (anymore?), so I configured the servers "ntp1.opensuse.org" and "ntp2.opensuse.org" instead.
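The exact edit made to /etc/ntp.conf is not shown in this ticket. As a rough sketch of what such a change could look like, the hypothetical Python helper below drops the `server` and `restrict` entries for the unreachable addresses and appends entries for the new servers, mirroring the `server ... iburst` / `restrict ... nomodify notrap nopeer` pattern already used in the power8 file. Adding a matching `restrict` line is an assumption based on that file's `restrict default ignore` policy.

```python
from typing import Iterable, List


def replace_ntp_servers(conf_text: str, old_servers: Iterable[str],
                        new_servers: List[str]) -> str:
    """Remove entries for old_servers and append entries for new_servers."""
    old = set(old_servers)
    kept = []
    for line in conf_text.splitlines():
        fields = line.split()
        is_entry = len(fields) >= 2 and fields[0] in ("server", "restrict")
        if is_entry and fields[1] in old:
            continue  # entry refers to a server that is no longer reachable
        kept.append(line)
    for name in new_servers:
        # Mirror the pattern of the existing file: one server line, one restrict line.
        kept.append(f"server {name} iburst")
        kept.append(f"restrict {name} nomodify notrap nopeer")
    return "\n".join(kept) + "\n"
```

For the power8 file, this would be called with `["192.168.128.17", "192.168.128.18"]` as the old servers and `["ntp1.opensuse.org", "ntp2.opensuse.org"]` as the new ones; ntpd would then need a restart to pick up the change.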

Actions #11

Updated by okurz almost 6 years ago

Is there a common approach to configuring NTP on these machines, and does anyone know why, for example, openqaworker1 does not have any NTP server configured?

Actions #12

Updated by okurz over 5 years ago

  • Status changed from In Progress to Resolved

ok, fine. Everything manual then … until we reopen the ticket in some months… :)

Actions #13

Updated by okurz over 5 years ago

  • Copied to tickets #38789: time on progress.o.o is off by 20 minutes added