Workers cannot share webUI with different versions. Was: SLE ppc64le workers incomplete immediately after starting jobs, no autoinst-log.txt uploaded.
With openQA-worker 4.5.1502956099.9fa63536-38.1, all our SLE ppc64le workers seem to fail immediately: jobs end as incomplete right after starting and no autoinst-log.txt is uploaded. For examples, take a look at the history of malbec:1 or QA-Power8-5-kvm:2.
Steps to reproduce
- (Re-)Start any job on malbec or QA-Power8-5-kvm
- Observe the job ending as incomplete shortly afterwards
Linked to #20544, but it seems to be even more severe now on these machines. Maybe it is related to the configuration change that added the yast team's additional webUI + scheduler host? (I assume mudler also tested a shared worker when one webUI host is not available?!?)
#3 Updated by okurz almost 3 years ago
That job continued further but then ended incomplete later, again with no autoinst-log.txt uploaded. Trying the same experiment again on QA-Power8-5-kvm: @2 restarted immediately, @1 restarted after I set workers.ini to register only against openqa.suse.de. malbec:/etc/openqa/workers.ini has already been reset by salt, as expected; that's ok.
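To double-check which webUI hosts a worker would register against after such a change, something like the following rough sketch could help. It assumes workers.ini has a `[global]` section with a space-separated `HOST` key; the helper name `registered_hosts` and the host `yast-webui.example` are made up for illustration and are not part of openQA.

```python
import configparser


def registered_hosts(ini_text: str) -> list:
    """Return the webUI hosts listed in a workers.ini-style text.

    Assumption: hosts live in [global] under a space-separated HOST key.
    """
    # Disable interpolation so '%' characters in URLs are taken literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    # The fallback covers a missing [global] section or a missing HOST key.
    return parser.get("global", "HOST", fallback="").split()


# Hypothetical shared configuration: two webUI hosts.
SHARED = "[global]\nHOST = https://openqa.suse.de https://yast-webui.example\n"
# Configuration restricted to a single webUI host.
SINGLE = "[global]\nHOST = https://openqa.suse.de\n"

if __name__ == "__main__":
    print(registered_hosts(SHARED))
    print(registered_hosts(SINGLE))
```

On a real worker one would pass the contents of /etc/openqa/workers.ini instead of the inline strings.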
#4 Updated by okurz almost 3 years ago
- Category changed from Concrete Bugs to 168
- Assignee set to EDiGiacinto
- Priority changed from Immediate to Normal
So the above diagnosis helped. The yast openQA webUI was still at an old version and was announcing new jobs, and the worker was trying to take them, which caused problems (according to rbrown). The yast openQA webUI host has been upgraded as well, and the worker was also changed to disable sharing for now: https://gitlab.suse.de/openqa/salt-pillars-openqa/commit/be58b44240b6686c89859d1bf1093ff7d0040dfe . I guess we can go back once EDiGiacinto has tested that workers also behave fine when one webUI host is not taking the registration.
#6 Updated by szarate over 2 years ago
- Subject changed from SLE ppc64le workers incomplete immediately after starting jobs, no autoinst-log.txt uploaded. to Workers cannot share webUI with different versions. Was: SLE ppc64le workers incomplete immediately after starting jobs, no autoinst-log.txt uploaded.
- Status changed from In Progress to Rejected
Rejecting this as it is invalid.
However, a new item should be added to ensure that new workers can only work with a webUI of their own version. I know there's a variable for that, but I'm not sure if it's being used at all.
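As a rough illustration of what such a check could look like (this is not openQA's actual mechanism, and it does not use the variable mentioned above, whose name is not given here), a worker could compare its own version with the version each webUI reports and skip hosts that differ. The function names, the rule of ignoring the package release suffix, and the second host and its version string are all assumptions made for this sketch.

```python
def normalize(version: str) -> str:
    """Drop the package release suffix, i.e. the part after the last '-'.

    Assumption: two builds of the same upstream version are compatible.
    """
    return version.rsplit("-", 1)[0]


def compatible_hosts(worker_version: str, host_versions: dict) -> list:
    """Return only the hosts whose version matches the worker's version."""
    wanted = normalize(worker_version)
    return [
        host
        for host, version in host_versions.items()
        if normalize(version) == wanted
    ]


if __name__ == "__main__":
    worker = "4.5.1502956099.9fa63536-38.1"
    # Hypothetical versions reported by two webUI hosts.
    reported = {
        "openqa.suse.de": "4.5.1502956099.9fa63536-1.1",
        "yast-webui.example": "4.4.1490000000.abcdef12-2.1",
    }
    print(compatible_hosts(worker, reported))
```

Whether an exact match is the right rule, or whether a looser comparison would do, is an open design question for the new item.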