action #7190
closed
The worker died due to the scheduler not responding
Description
This issue was seen on openqa.oo. The log can be found at https://openqa.opensuse.org/tests/55761 (the link is kept for 3 months).
Files
Updated by mlin7442 over 9 years ago
Looks like I pasted the wrong log; the correct log is here: http://susepaste.org/95635732
Updated by mlin7442 over 9 years ago
- Subject changed from The worker died due to the scheduler doesn't reponding to The worker died due to the scheduler doesn't responding
Updated by oholecek over 9 years ago
This https://github.com/os-autoinst/openQA/pull/329 should make the worker stop complaining about undefined vars and quit peacefully. The worker quitting is in fact by design when the scheduler starts to return 4XX result codes.
To investigate further I would need the scheduler part of the logs. If it weren't on opensuse.o.o, I would guess the Demo account API keys had timed out.
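To illustrate the "quit on 4XX" behaviour described above, here is a minimal sketch in Python. It is not the openQA worker code: `should_quit` and `run_worker` are hypothetical names, and the list of status codes stands in for real scheduler replies.

```python
def should_quit(status_code: int) -> bool:
    """Return True for any 4XX (client error) status code."""
    return 400 <= status_code < 500


def run_worker(responses):
    """Process scheduler replies in order and stop at the first 4XX.

    `responses` is an iterable of HTTP status codes (a stand-in for
    real scheduler replies; hypothetical, for illustration only).
    Returns a tuple (outcome, status, handled_count).
    """
    handled = 0
    for status in responses:
        if should_quit(status):
            # By design the worker exits cleanly instead of retrying.
            return ("quit", status, handled)
        handled += 1
    return ("done", None, handled)
```

With this sketch, a sequence such as `[200, 200, 403, 200]` stops at the 403 after two replies have been handled, and the trailing 200 is never processed.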
Updated by mlin7442 over 9 years ago
- File openqa-20150409.xz openqa-20150409.xz added
- Status changed from New to In Progress
- Assignee set to oholecek
- Target version set to Sprint 16
Assigned to Ondřej and attached the openqa log.
Updated by oholecek over 9 years ago
- Subject changed from The worker died due to the scheduler doesn't responding to The worker died due to the scheduler not responding
- Status changed from In Progress to Feedback
The scheduler logs rule out API key expiry, so the auth failure must come from an HMAC validation failure. Given the inactivity timeout reported by both the worker and the scheduler, there were indeed network-related problems (how about a possible MITM attack?).
Because the worker is expected to quit when the scheduler returns 4XX results, I don't think there is anything more to do. Or maybe change the systemd unit to restart the worker after some time?
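For readers unfamiliar with how an HMAC check can fail while the API key itself is still valid, here is a minimal sketch in Python. It assumes a scheme where the request path and a timestamp are signed with a shared secret and the server rejects stale timestamps. The function names, the SHA-1 digest, and the 300-second window are assumptions for illustration, not taken from the openQA source.

```python
import hashlib
import hmac


def sign(path: str, timestamp: float, secret: str) -> str:
    """Sign the request path plus timestamp with the shared secret."""
    message = f"{path}{timestamp}".encode()
    return hmac.new(secret.encode(), message, hashlib.sha1).hexdigest()


def verify(path: str, timestamp: float, signature: str, secret: str,
           now: float, max_age: float = 300.0) -> bool:
    """Accept the request only if it is fresh and the signature matches.

    `max_age` (seconds) is a hypothetical freshness window.
    """
    if abs(now - timestamp) > max_age:
        # A request delayed by network problems can end up here
        # even though key and secret are both correct.
        return False
    expected = sign(path, timestamp, secret)
    # Constant-time comparison avoids leaking how many characters match.
    return hmac.compare_digest(expected, signature)
```

Under these assumptions, a request that is delayed beyond the freshness window, or whose path or signature was altered in transit, fails validation and would be answered with a 4XX code, even though the API key has not expired.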