action #23476
closed
Workers cannot share webUI with different versions. Was: SLE ppc64le workers incomplete immediately after starting jobs, no autoinst-log.txt uploaded.
Added by okurz over 7 years ago.
Updated over 7 years ago.
Category:
Feature requests
Description
Observation
With openQA-worker 4.5.1502956099.9fa63536-38.1, all our SLE ppc64le workers seem to end up incomplete immediately after starting a job, with no autoinst-log.txt uploaded. For examples, see the job history of malbec:1 or QA-Power8-5-kvm:2.
Steps to reproduce
- (Re-)Start any job on malbec or QA-Power8-5-kvm
- Observe the job ending as incomplete shortly afterwards
Problem
Linked to #20544, but it seems to be even more severe now on these machines. Maybe it is something about the configuration that includes the additional yast team's webUI + scheduler host? (I assume mudler also tested a shared worker when one webUI host is not available?!?)
- Related to action #20544: [tools] Research/investigate ways to optimize scheduler grab_job added
- Status changed from New to In Progress
That job continued further but then ended incomplete later, again with no autoinst-log.txt uploaded. Trying the same experiment again on QA-Power8-5-kvm: @2 restarted immediately, @1 restarted after I set the workers.ini to only register against openqa.suse.de. malbec:/etc/openqa/workers.ini has already been reset by salt, as expected, so that's ok.
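For anyone repeating the experiment, a quick way to check whether a worker is still shared is to list the hosts in its workers.ini. This is only a minimal sketch: it assumes the hosts are given as a whitespace-separated `HOST` value in the `[global]` section, and `yast-openqa.example` is a made-up placeholder, not the real yast webUI host.

```python
import configparser


def configured_hosts(ini_text):
    """Return the webUI hosts listed under [global] HOST, split on whitespace.

    Assumption: the worker configuration keeps all hosts in one HOST value.
    """
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    # fallback="" yields an empty list when the section or key is missing
    return parser.get("global", "HOST", fallback="").split()


# Hypothetical configurations; the second host name is a placeholder.
shared = "[global]\nHOST = openqa.suse.de yast-openqa.example\n"
single = "[global]\nHOST = openqa.suse.de\n"

for name, text in (("shared", shared), ("single", single)):
    hosts = configured_hosts(text)
    print(name, hosts, "shared worker" if len(hosts) > 1 else "not shared")
```

A worker with more than one entry is registering against several webUI hosts, which is the setup that was being ruled out here.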
- Category changed from Regressions/Crashes to 168
- Assignee set to EDiGiacinto
- Priority changed from Immediate to Normal
So the above diagnosis helped. The yast openQA webUI was still at an old version and kept announcing new jobs, which the worker then tried to take, and that caused the problems (according to rbrown). The yast openQA webUI host has been upgraded as well, and in addition the worker configuration was changed to disable sharing for now: https://gitlab.suse.de/openqa/salt-pillars-openqa/commit/be58b44240b6686c89859d1bf1093ff7d0040dfe . I guess we can go back once EDiGiacinto has tested that workers also behave fine when one webUI host is not accepting the registration.
Yes, a worker can't be shared by two different versions of the webUI, especially since we made a lot of changes in those areas in a short period of time.
- Subject changed from SLE ppc64le workers incomplete immediately after starting jobs, no autoinst-log.txt uploaded. to Workers cannot share webUI with different versions. Was: SLE ppc64le workers incomplete immediately after starting jobs, no autoinst-log.txt uploaded.
- Status changed from In Progress to Rejected
Rejecting this as it is invalid.
However, a new item should be added to ensure that new workers can only work with webUI of their own version... I know there's a variable for that, but I'm not sure if it's being used at all.
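To make the idea for that new item concrete, here is a rough sketch of the kind of check a worker could do before registering. It is not how openQA does it: the function name, the host names, and the way a webUI would report its version are all hypothetical, and the version strings are placeholders.

```python
def compatible_hosts(worker_version, host_versions):
    """Split hosts into those reporting the worker's version and the rest.

    host_versions maps a webUI host name to the version it reports.
    How a host reports its version is left open here (assumption).
    """
    accepted = [h for h, v in host_versions.items() if v == worker_version]
    skipped = [h for h, v in host_versions.items() if v != worker_version]
    return accepted, skipped


# Hypothetical example: one host matches the worker, the other is older.
reported = {"openqa.suse.de": "2", "yast-openqa.example": "1"}
accepted, skipped = compatible_hosts("2", reported)
print("register with:", accepted)
print("skip (version mismatch):", skipped)
```

With a check like this, a worker would simply not take jobs from a webUI running a different version instead of ending up incomplete.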