action #7190
The worker died due to the scheduler not responding
0%
Description
I saw this issue on openqa.o.o; the log can be found at https://openqa.opensuse.org/tests/55761 (the link is kept for 3 months).
History
#1
Updated by mlin7442 almost 8 years ago
Looks like I pasted the wrong log; the correct log is here: http://susepaste.org/95635732
#2
Updated by mlin7442 almost 8 years ago
- Subject changed from The worker died due to the scheduler doesn't reponding to The worker died due to the scheduler doesn't responding
#3
Updated by oholecek almost 8 years ago
https://github.com/os-autoinst/openQA/pull/329 should make the worker stop complaining about undefined variables and quit gracefully. The worker quitting is in fact by design once the scheduler starts returning 4XX result codes.
To investigate further I would need the scheduler part of the logs. If this weren't on opensuse.o.o, I would guess that the Demo account API keys had timed out.
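The by-design behaviour described in this comment, a worker that exits once the scheduler answers with a 4XX code, can be sketched roughly as follows. This is only an illustration in Python, not the actual openQA worker code: `should_quit`, `run_worker`, the `poll` callable and `max_polls` are hypothetical names introduced for the example.

```python
def should_quit(status_code: int) -> bool:
    """Return True for 4XX client errors, which this sketch treats as fatal."""
    return 400 <= status_code < 500


def run_worker(poll, max_polls=10):
    """Poll a scheduler until it returns a 4XX code or max_polls is reached.

    `poll` is a hypothetical callable that returns an HTTP status code.
    Returns the status code that ended the loop, or None if no 4XX was seen.
    """
    for _ in range(max_polls):
        status = poll()
        if should_quit(status):
            # Stop cleanly instead of retrying with rejected credentials.
            return status
    return None
```

In this sketch a 5XX or a successful response keeps the loop going; only a client error such as 401 or 403 ends it.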
#4
Updated by mlin7442 almost 8 years ago
- File openqa-20150409.xz added
- Status changed from New to In Progress
- Assignee set to oholecek
- Target version set to Sprint 16
Assigned to Ondřej and attached the openQA log.
#5
Updated by oholecek almost 8 years ago
- Subject changed from The worker died due to the scheduler doesn't responding to The worker died due to the scheduler not responding
- Status changed from In Progress to Feedback
The scheduler logs rule out API key expiry, so the auth failure must come from an HMAC validation failure. Given the inactivity timeout reported by both the worker and the scheduler, there were indeed network-related problems (possibly even a MITM attack?).
Because the worker is expected to quit when the scheduler returns 4XX results, I don't think there is anything more to do. Or maybe we could change the systemd unit to restart the worker after some time?
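To illustrate how a valid API key can still fail HMAC validation when the network misbehaves, here is a minimal Python sketch of timestamped request signing. It is not the openQA implementation: the message layout (URL followed by timestamp), the SHA-1 digest, the 300-second freshness window and the names `sign` and `verify` are all assumptions made for the example.

```python
import hashlib
import hmac


def sign(url: str, timestamp: float, secret: str) -> str:
    """Compute a hex HMAC over the URL and timestamp (assumed message layout)."""
    message = f"{url}{timestamp}".encode()
    return hmac.new(secret.encode(), message, hashlib.sha1).hexdigest()


def verify(url, timestamp, signature, secret, now, max_age=300.0):
    """Accept a request only if it is fresh and its signature matches."""
    # A request delayed beyond max_age is rejected even with a valid key.
    if abs(now - timestamp) > max_age:
        return False
    expected = sign(url, timestamp, secret)
    # compare_digest avoids leaking information through comparison timing.
    return hmac.compare_digest(expected, signature)
```

Under these assumptions, a request held up by a stalled connection, or one altered in transit, would be rejected in the same way as a request signed with the wrong secret.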