action #19732
closed[tools][openqa-monitoring] "openqaworker<> wants to grab a new job - killing the old one: …"
Description
observation
[Sun Jun 11 00:34:56 2017] [scheduler:warn] openqaworker5:20 wants to grab a new job - killing the old one: 993795
[Sun Jun 11 00:34:56 2017] [scheduler:warn] openqaworker7:20 wants to grab a new job - killing the old one: 993749
[Sun Jun 11 00:34:58 2017] [scheduler:warn] openqaworker7:10 wants to grab a new job - killing the old one: 993785
[Sun Jun 11 00:34:58 2017] [scheduler:warn] openqaworker6:15 wants to grab a new job - killing the old one: 993807
[and many more…]
suggestion
- check if it is really acceptable behaviour that we just do not care about the old job and kill it
- if acceptable, downgrade warn to info/debug
workaround
monitoring silenced with https://github.com/okurz/openqa_monitoring/pull/12
Updated by okurz over 7 years ago
- Copied from action #19730: [tools][openqa-monitoring] "can't remove <needle_path>" added
Updated by coolo about 7 years ago
- Status changed from New to Resolved
No, this is not acceptable - but when it happens there is no harm done, as the system cures itself. If it happens en masse, though, we have a problem.
I'm not sure how you want to handle such cases in the monitoring - I can tell you it hasn't happened in the last few days, so I'm closing the issue.
Updated by okurz about 7 years ago
- Status changed from Resolved to In Progress
- Assignee set to okurz
Ok, I will see what I can do about "forward message only if appearing more than a few"
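A minimal sketch of what "forward only if appearing more than a few" could look like, assuming the log line format quoted in the observation above. The name `should_forward` and the default threshold of 3 are hypothetical and chosen for illustration only; this is not how the monitoring scripts are actually implemented.

```python
import re

# Matches the scheduler warning quoted in the observation above.
# The named groups are only there for readability.
KILL_RE = re.compile(
    r"\[scheduler:warn\] (?P<worker>[^\s:]+):(?P<instance>\d+) "
    r"wants to grab a new job - killing the old one: (?P<job>\d+)"
)


def should_forward(log_lines, threshold=3):
    """Return True if the warning occurs more than `threshold` times.

    `threshold` is a hypothetical tuning knob, not an existing setting.
    """
    hits = sum(1 for line in log_lines if KILL_RE.search(line))
    return hits > threshold
```

With the four lines from the observation, `should_forward(lines)` forwards the alert, while a single stray occurrence would stay silent.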
Updated by okurz almost 7 years ago
- Assignee deleted (okurz)
I checked the monitoring log messages and the same message rarely, if ever, appears twice. When it does recur, it is at least always a different instance on the same worker.
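To illustrate why exact-duplicate matching rarely triggers here, the following sketch counts the warning per worker host and ignores the instance number and job id. It assumes the log format quoted in the observation; `count_per_host` is a hypothetical helper, not part of any existing monitoring script.

```python
import re
from collections import Counter

# Captures the worker host name and drops the ":<instance>" suffix.
WORKER_RE = re.compile(
    r"\] (?P<host>[^\s:]+):\d+ wants to grab a new job"
)


def count_per_host(log_lines):
    """Count 'wants to grab a new job' warnings per worker host."""
    counts = Counter()
    for line in log_lines:
        match = WORKER_RE.search(line)
        if match:
            counts[match.group("host")] += 1
    return counts
```

Grouping this way would let a threshold apply per host rather than per literal message text.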
Updated by EDiGiacinto almost 7 years ago
- Status changed from In Progress to Resolved
I think this can be closed now - the code is not executing that path anymore [1] [2] (it was left in to keep the old behavior intact and let us run tests against the scheduling logic/priority)
1: https://github.com/os-autoinst/openQA/blob/2dbaf4975c6a3e8f1639f34df4784e31c6ac4d72/lib/OpenQA/Scheduler/Scheduler.pm#L250
2: https://github.com/os-autoinst/openQA/blob/2dbaf4975c6a3e8f1639f34df4784e31c6ac4d72/lib/OpenQA/Scheduler/Scheduler.pm#L618
Updated by okurz almost 7 years ago
- Subject changed from [tools][openqa-monitoring] "openqaworker<> wants to grab a new job - killing the old one: …" to [tools][openqa-monitoring] "openqaworker<> wants to grab a new job - killing the old one: …"
Shouldn't we remove the check from the logwarn monitoring script then? If you are not interested in that way of monitoring, we can also think about putting it to rest for good.
Updated by EDiGiacinto almost 7 years ago
Yeah sure, we can remove it, but I'm not receiving emails from any monitoring service. :)
Updated by okurz almost 7 years ago
Hm, I think we should be more explicit: would you like to update https://github.com/os-autoinst/openqa-logwarn ?
If you want, I could add you as an email recipient for the monitoring alerts. Side note: I don't recall receiving any message by email recently, so I think none are being sent at all at the moment. Probably the mail sending is broken or disabled on osd.