action #23476
closed
Workers cannot share webUI with different versions. Was: SLE ppc64le workers incomplete immediately after starting jobs, no autoinst-log.txt uploaded.
0%
Description
Observation
With openQA-worker 4.5.1502956099.9fa63536-38.1, all our SLE ppc64le workers seem to fail as incomplete immediately after starting a job, with no autoinst-log.txt uploaded. See, for example, the job history of malbec:1 or QA-Power8-5-kvm:2.
Steps to reproduce
- (Re-)Start any job on malbec or QA-Power8-5-kvm
- Observe the job ending as incomplete shortly afterwards
Problem
Linked to #20544, but it seems to be even more severe now on these machines. Could it be related to the configuration change that adds the YaST team's webUI + scheduler host? (I assume mudler also tested a shared worker while one webUI host is unavailable?)
Updated by okurz over 7 years ago
- Related to action #20544: [tools] Research/investigate ways to optimize scheduler grab_job added
Updated by okurz over 7 years ago
- Status changed from New to In Progress
I temporarily disabled the registration to the yast openQA host and restarted malbec:1. It is running https://openqa.suse.de/tests/1120780#live right now. Is that a workaround?
Updated by okurz over 7 years ago
That job continued further but later ended incomplete, again with no autoinst-log.txt uploaded. Trying the same experiment again on QA-Power8-5-kvm: @2 restarted immediately, @1 restarted after I set workers.ini to register only against openqa.suse.de. malbec:/etc/openqa/workers.ini has already been reset by salt as expected, that's ok.
Updated by okurz over 7 years ago
- Category changed from Regressions/Crashes to 168
- Assignee set to EDiGiacinto
- Priority changed from Immediate to Normal
So the above diagnosis helped. The yast openQA webui was still at an old version and was trying to announce new jobs, and the worker was trying to take them, which caused problems (according to rbrown). The yast openQA webui host has now been upgraded as well, and the worker configuration was also changed to disable sharing for now: https://gitlab.suse.de/openqa/salt-pillars-openqa/commit/be58b44240b6686c89859d1bf1093ff7d0040dfe . I guess we can go back once EDiGiacinto has tested that workers also behave fine when one webui host is not accepting the registration.
Updated by EDiGiacinto over 7 years ago
Yes, a worker can't be shared by two different versions of the webUI, especially since we made a lot of changes in those areas in a short period of time.
Updated by szarate over 7 years ago
- Subject changed from SLE ppc64le workers incomplete immediately after starting jobs, no autoinst-log.txt uploaded. to Workers cannot share webUI with different versions. Was: SLE ppc64le workers incomplete immediately after starting jobs, no autoinst-log.txt uploaded.
- Status changed from In Progress to Rejected
Rejecting this as it is invalid.
However, a new item should be added to ensure that new workers can only work with webUI of their own version... I know there's a variable for that, but I'm not sure if it's being used at all.
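As a rough illustration of the proposed follow-up item, here is a minimal Python sketch of a version gate: a worker keeps only those webUI hosts whose version matches its own. This is not how openQA implements it. The names `Version`, `parse_version` and `compatible_hosts` are hypothetical, the host `yast-openqa.example` and its version string are made up for the example, and the sketch assumes version strings follow the shape of the package version quoted in the observation (`major.minor.timestamp.commit`, optionally followed by a `-release` suffix).

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Version:
    """Hypothetical parsed form of a version such as 4.5.1502956099.9fa63536."""
    major: int
    minor: int
    timestamp: int
    commit: str


def parse_version(text: str) -> Version:
    """Parse 'major.minor.timestamp.commit[-release]' (assumed format)."""
    # Drop the package release suffix, e.g. '-38.1'.
    core = text.split("-", 1)[0]
    parts = core.split(".")
    if len(parts) != 4:
        raise ValueError(f"unexpected version format: {text!r}")
    major, minor, timestamp, commit = parts
    return Version(int(major), int(minor), int(timestamp), commit)


def compatible_hosts(worker_version: str, host_versions: dict[str, str]) -> list[str]:
    """Return the hosts whose version equals the worker's version."""
    worker = parse_version(worker_version)
    return [
        host
        for host, version in host_versions.items()
        if parse_version(version) == worker
    ]


if __name__ == "__main__":
    # Hypothetical data: one up-to-date host, one outdated host.
    hosts = {
        "openqa.suse.de": "4.5.1502956099.9fa63536-38.1",
        "yast-openqa.example": "4.4.1490000000.0123abcd-1.1",
    }
    print(compatible_hosts("4.5.1502956099.9fa63536-38.1", hosts))
```

With such a check, a worker shared between two webUI hosts would skip registering against the outdated one instead of accepting jobs it cannot run. Exact equality is the strictest possible policy and is chosen here only because the comment above asks for "their own version".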