Tickets #26832
Closed: openQA PowerPC workers offline for 5 days
Done: 100%
Description
Hi there,
openQA PowerPC workers have been offline for 5 days (1).
Could somebody restart them?
(1) https://openqa.opensuse.org/admin/workers
power8:1 Offline
power8:10 Offline
power8:11 Offline
power8:12 Offline
power8:13 Offline
power8:14 Offline
power8:15 Offline
power8:16 Offline
power8:2 Offline
power8:3 Offline
power8:4 Offline
power8:5 Offline
power8:6 Offline
power8:7 Offline
power8:8 Offline
power8:9 Offline
Worker power8:1
Host: power8
Instance: 1
Seen: 5 days ago
===
--
Michel Normand
Updated by michel_mno over 6 years ago
According to the openQA log, somebody restarted them about an hour ago:
about an hour ago k0da worker_register { "host": "power8", "id": 17, "instance": "5", "caps": { "cpu_arch": "ppc64le", "mem_max": "130609", "cpu_modelname": "POWER8E (raw), altivec supported", "cpu_opmode": null, "worker_class": "qemu_ppc64le,qemu_ppc64,qemu_ppc,heavyload" } }
about an hour ago k0da worker_register { "id": 14, "instance": "2", "caps": { "cpu_arch": "ppc64le", "mem_max": "130609", "cpu_modelname": "POWER8E (raw), altivec supported", "cpu_opmode": null, "worker_class": "qemu_ppc64le,qemu_ppc64,qemu_ppc,heavyload" }, "host": "power8" }
about an hour ago k0da worker_register { "instance": "4", "caps": { "worker_class": "qemu_ppc64le,qemu_ppc64,qemu_ppc,heavyload", "cpu_opmode": null, "mem_max": "130609", "cpu_modelname": "POWER8E (raw), altivec supported", "cpu_arch": "ppc64le" }, "id": 16, "host": "power8" }
about an hour ago k0da worker_register { "id": 18, "instance": "6", "caps": { "cpu_arch": "ppc64le", "mem_max": "130609", "cpu_modelname": "POWER8E (raw), altivec supported", "worker_class": "qemu_ppc64le,qemu_ppc64,qemu_ppc,heavyload", "cpu_opmode": null }, "host": "power8" }
about an hour ago k0da worker_register { "host": "power8", "instance": "3", "caps": { "cpu_arch": "ppc64le", "mem_max": "130609", "cpu_modelname": "POWER8E (raw), altivec supported", "worker_class": "qemu_ppc64le,qemu_ppc64,qemu_ppc,heavyload", "cpu_opmode": null }, "id": 15 }
about an hour ago k0da worker_register { "id": 13, "caps": { "cpu_opmode": null, "worker_class": "qemu_ppc64le,qemu_ppc64,qemu_ppc,heavyload", "cpu_arch": "ppc64le", "mem_max": "130609", "cpu_modelname": "POWER8E (raw), altivec supported" }, "instance": "1", "host": "power8" }
Updated by Anonymous over 6 years ago
- Category set to OBS
- Status changed from New to Closed
- Assignee set to Anonymous
- % Done changed from 0 to 100
Sorry for the delay - "someone" was needed to find the power button ;-)
=> I will now create a new issue requesting monitoring of these machines. I might come back to you with the question of "how to do this" ;)
Updated by michel_mno over 6 years ago
lrupp, could you switch the issue to Public? I am not allowed to do it.
Updated by michel_mno almost 6 years ago
Same problem again: the PowerPC workers have not been active for 8 days.
All PowerPC workers are offline, and Tumbleweed snapshot 20180608 is stuck waiting for them.
https://openqa.opensuse.org/tests/overview?distri=opensuse&version=Tumbleweed&build=20180608&groupid=4
The question was asked twice on IRC without an answer; could somebody handle it?
Extract from IRC opensuse-factory:
June 15 15:11:59 <maxlin> just out of curiosity, do we have any problem on power8 openqa worker? I asking because powerpc jobs were scheduled 5 days and we have no power worker available
juin 15 15:12:08 <maxlin> on o3
...
June 18 11:50:28 <michel_mno> Hello there, I am just reconnecting to irc, was there an update about the reason why PowerPC workers are not active for o3 since at least 8 days ? https://openqa.opensuse.org/admin/workers/13
Updated by okurz almost 6 years ago
From the logs on power8:
$ journalctl -u openqa-worker@1.service
Jun 15 13:26:29 power8 worker[18252]: [error] ignoring server - server refused with code 403: {"error":"timestamp mismatch","error_status":403}
And the time is off by about 5 minutes. ntpd is running but reports problems synchronizing the time. Does anyone else from the heroes know how machines within the openSUSE network should synchronize their time?
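One way to confirm such an offset from the worker side is to compare the local clock with the `Date` header returned by the openQA web server. This is a minimal sketch, not part of openQA: the function names are hypothetical, it assumes `curl` and GNU `date` are available, and the 300-second `TOLERANCE` is an assumed threshold chosen to match the roughly 5-minute offset seen here, not a value taken from openQA's code.

```shell
# Hypothetical threshold in seconds; NOT taken from openQA's implementation.
TOLERANCE=300

# Print the absolute difference in seconds between two Unix timestamps.
clock_skew() {
    d=$(( $1 - $2 ))
    if [ "$d" -lt 0 ]; then d=$(( -d )); fi
    echo "$d"
}

# Print the web server's current time as a Unix timestamp, read from the
# HTTP Date header of a HEAD request. Requires curl and GNU date.
server_epoch() {
    hdr=$(curl -sI "$1" | tr -d '\r' | sed -n 's/^[Dd][Aa][Tt][Ee]: *//p' | head -n 1)
    date -u -d "$hdr" +%s
}

# Succeed if the two timestamps differ by at most TOLERANCE seconds.
skew_ok() {
    [ "$(clock_skew "$1" "$2")" -le "$TOLERANCE" ]
}
```

Usage on a worker: `skew_ok "$(date -u +%s)" "$(server_epoch https://openqa.opensuse.org)" || echo "clock skew exceeds ${TOLERANCE}s"`. The `Date` header only has one-second resolution, which is sufficient for offsets measured in minutes.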
Updated by pjessen almost 6 years ago
okurz wrote:
And the time is off by about 5 minutes. ntpd is running but reports problems synchronizing the time. Does anyone else from the heroes know how machines within the openSUSE network should synchronize their time?
ntp or chrony should be fine. What is ntpd reporting?
Updated by tampakrap almost 6 years ago
If your machines are inside the heroes VLAN, you can use ntp[1-3].infra.opensuse.org. If they are outside the heroes VLAN (which is the case for the openQA workers), you can use ntp[1-2].opensuse.org.
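The server choice described above can be captured in a small helper that prints matching `server` lines for ntp.conf. This is only a sketch: the function name and its `inside`/`outside` argument are hypothetical, and the `iburst` option simply follows the existing power8 configuration.

```shell
# Hypothetical helper: print ntp.conf "server" lines for the recommended
# openSUSE NTP servers, depending on where the machine sits.
ntp_servers() {
    case "$1" in
        inside)  hosts="ntp1.infra.opensuse.org ntp2.infra.opensuse.org ntp3.infra.opensuse.org" ;;
        outside) hosts="ntp1.opensuse.org ntp2.opensuse.org" ;;
        *) echo "usage: ntp_servers inside|outside" >&2; return 1 ;;
    esac
    # Word splitting of $hosts is intended here.
    for h in $hosts; do
        printf 'server %s iburst\n' "$h"
    done
}
```

For an openQA worker outside the heroes VLAN, `ntp_servers outside` would print the two `ntp[1-2].opensuse.org` lines.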
Updated by okurz almost 6 years ago
- Status changed from Workable to In Progress
- Assignee set to okurz
Thanks for the information. I assume this is something to be properly covered by salt-recipes again in the future :/
Content from /etc/ntp.conf on power8:
# grep -v '^#' /etc/ntp.conf
tinker panic 0
disable monitor
driftfile /var/lib/ntp/drift/ntp.drift
logfile /var/log/ntp.log
restrict -6 default ignore
restrict default ignore
restrict -6 ::1
restrict 127.0.0.1
server 192.168.128.17 iburst
restrict 192.168.128.17 nomodify notrap nopeer
server 192.168.128.18 iburst
restrict 192.168.128.18 nomodify notrap nopeer
disable monitor
keys /etc/ntp.keys # path for keys file
trustedkey 1 # define trusted keys
requestkey 1 # key (7) for accessing server variables
controlkey 1 # key (6) for accessing server variables
whereas on openqaworker1 we have:
# grep -v '^#' /etc/ntp.conf
restrict -4 default kod notrap nomodify nopeer noquery
restrict -6 default kod notrap nomodify nopeer noquery
restrict 127.0.0.1
restrict ::1
driftfile /var/lib/ntp/drift/ntp.drift # path for drift file
logfile /var/log/ntp # alternate log file
keys /etc/ntp.keys # path for keys file
trustedkey 1 # define trusted keys
requestkey 1 # key (7) for accessing server variables
controlkey 1 # key (6) for accessing server variables
Even though both run openSUSE Leap 42.3, their configurations look completely different.
The configured NTP servers "192.168.128.17" and "192.168.128.18" are not reachable (anymore?), so I configured the servers "ntp1.opensuse.org" and "ntp2.opensuse.org" instead.
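The manual change described above could be scripted roughly as follows. This is a hedged sketch, not what was actually run: the function name is hypothetical, it requires GNU sed for `-i` and `-E`, and it should be tried on a copy of /etc/ntp.conf first. Because the power8 file contains `restrict default ignore`, the sketch assumes each new server also needs its own `restrict ... nomodify notrap nopeer` line, mirroring the entries for 192.168.128.17/18 that it removes.

```shell
# Hypothetical helper: swap the unreachable 192.168.128.17/18 servers for
# ntp1/ntp2.opensuse.org in the given ntp.conf-style file (GNU sed needed).
replace_ntp_servers() {
    conf="$1"
    # Delete every line (server and restrict) mentioning the old addresses.
    sed -i -E '/192\.168\.128\.1[78]([^0-9]|$)/d' "$conf"
    # Append a server line plus a matching restrict line for each new host.
    for h in ntp1.opensuse.org ntp2.opensuse.org; do
        printf 'server %s iburst\nrestrict %s nomodify notrap nopeer\n' "$h" "$h" >> "$conf"
    done
}
```

Usage: `cp /etc/ntp.conf /tmp/ntp.conf.new && replace_ntp_servers /tmp/ntp.conf.new`, then review the result with `diff /etc/ntp.conf /tmp/ntp.conf.new` before copying it into place and restarting ntpd (the service name `ntpd` is an assumption). Afterwards `ntpq -p` lists the peers together with their current offsets.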
Updated by okurz almost 6 years ago
Is there a common approach to configuring ntp on these machines? And does anyone know why, for example, openqaworker1 has no NTP server configured at all?
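Until there is a common approach, a quick check for the problem seen on openqaworker1 is whether ntp.conf contains any active time source at all. A minimal sketch with a hypothetical function name: it only inspects the given file for uncommented `server`, `pool` or `peer` lines, and does not follow `includefile` directives or any other way servers might be supplied.

```shell
# Hypothetical helper: succeed if the file has at least one active
# "server", "pool" or "peer" line (commented-out lines are ignored).
has_ntp_server() {
    grep -Eq '^[[:space:]]*(server|pool|peer)[[:space:]]+' "$1"
}
```

Usage: `has_ntp_server /etc/ntp.conf || echo "no NTP server configured"`. With the openqaworker1 configuration shown above, this would print the warning.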
Updated by okurz over 5 years ago
- Status changed from In Progress to Resolved
OK, fine. Everything manual then… until we reopen the ticket in a few months :)
Updated by okurz over 5 years ago
- Copied to tickets #38789: time on progress.o.o is off by 20 minutes added