action #23476

Workers cannot share webUI with different versions. Was: SLE ppc64le workers incomplete immediately after starting jobs, no autoinst-log.txt uploaded.

Added by okurz almost 3 years ago. Updated over 2 years ago.

Status: Rejected
Priority: Normal
Assignee:
Category: Feature requests
Target version: -
Start date: 2017-08-19
Due date:
% Done: 0%
Estimated time:
Difficulty:
Duration:

Description

Observation

With openQA-worker 4.5.1502956099.9fa63536-38.1, all our SLE ppc64le workers seem to end jobs as incomplete immediately after starting them, without uploading autoinst-log.txt. See, e.g., the job history of malbec:1 or QA-Power8-5-kvm:2.

Steps to reproduce

  • (Re-)Start any job on malbec or QA-Power8-5-kvm
  • Observe that the job soon ends as incomplete

Problem

Linked to #20544, but it seems even more severe on these machines now. Maybe it is related to the configuration that adds the YaST team's additional webUI + scheduler host? (I assume mudler also tested a shared worker with one webUI host unavailable?)


Related issues

Related to openQA Project - action #20544: [tools] Research/investigate ways to optimize scheduler grab_job (Resolved, 2017-07-18)

History

#1 Updated by okurz almost 3 years ago

  • Related to action #20544: [tools] Research/investigate ways to optimize scheduler grab_job added

#2 Updated by okurz almost 3 years ago

  • Status changed from New to In Progress

I temporarily disabled registration with the YaST openQA host and restarted malbec:1; it is now running https://openqa.suse.de/tests/1120780#live. Is that a workaround?

#3 Updated by okurz almost 3 years ago

That job got further but still ended as incomplete later, again without uploading autoinst-log.txt. I am trying the same experiment on QA-Power8-5-kvm: @2 was restarted immediately, @1 after I set workers.ini to register only against openqa.suse.de. malbec:/etc/openqa/workers.ini has already been reset by salt, as expected; that's OK.
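The workaround above restricts the worker to a single webUI host. A minimal Python sketch of what that change amounts to, assuming (verify against the openQA version in use) that workers.ini lists the webUI hosts space-separated under `HOST` in its `[global]` section. `yast-openqa.example` is a placeholder, since the YaST host's name is not given in this ticket, and `registered_hosts` is a hypothetical helper, not part of openQA.

```python
import configparser

# Assumed layout: [global] HOST lists every webUI the worker registers with.
# "yast-openqa.example" is a placeholder for the YaST team's host.
SHARED = "[global]\nHOST = openqa.suse.de yast-openqa.example\n"
WORKAROUND = "[global]\nHOST = openqa.suse.de\n"


def registered_hosts(ini_text: str) -> list[str]:
    """Return the webUI hosts a worker config would register with (hypothetical helper)."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    # Missing section or key means no hosts; multiple hosts are whitespace-separated.
    return parser.get("global", "HOST", fallback="").split()
```

With the shared config the worker would talk to both hosts; after the workaround only openqa.suse.de remains.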

#4 Updated by okurz almost 3 years ago

  • Category changed from Concrete Bugs to 168
  • Assignee set to EDiGiacinto
  • Priority changed from Immediate to Normal

The diagnosis above helped. According to rbrown, the YaST openQA webUI was still on an old version and was announcing new jobs, which the worker then tried to take, and that caused the failures. The YaST openQA webUI host has since been upgraded, and the worker configuration was also changed to disable sharing for now: https://gitlab.suse.de/openqa/salt-pillars-openqa/commit/be58b44240b6686c89859d1bf1093ff7d0040dfe . We can probably revert that once EDiGiacinto has tested that workers also behave correctly when one webUI host does not accept the registration.

#5 Updated by EDiGiacinto over 2 years ago

Yes, a worker can't be shared between two webUIs running different versions, especially since we made a lot of changes in those areas in a short period of time.

#6 Updated by szarate over 2 years ago

  • Subject changed from SLE ppc64le workers incomplete immediately after starting jobs, no autoinst-log.txt uploaded. to Workers cannot share webUI with different versions. Was: SLE ppc64le workers incomplete immediately after starting jobs, no autoinst-log.txt uploaded.
  • Status changed from In Progress to Rejected

Rejecting this as it is invalid.

However, a new item should be added to ensure that new workers only work with a webUI of their own version. I know there's a variable for that, but I'm not sure whether it is used at all.
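One way such a check could work, sketched in Python: the worker only takes jobs from webUI hosts that report exactly its own API version. `WORKER_API_VERSION`, `WebUIHost`, and the assumption that each webUI reports an integer API version at registration are hypothetical stand-ins; this is not the variable mentioned above and not actual openQA code.

```python
from dataclasses import dataclass

# Hypothetical: API version spoken by this worker build (not a real openQA constant).
WORKER_API_VERSION = 2


@dataclass
class WebUIHost:
    url: str
    api_version: int  # hypothetical: version the webUI reports during registration


def accepts_jobs_from(host: WebUIHost, worker_api_version: int = WORKER_API_VERSION) -> bool:
    """Accept jobs only from a webUI speaking exactly the worker's API version."""
    return host.api_version == worker_api_version


def usable_hosts(hosts: list[WebUIHost]) -> list[str]:
    """Filter the configured hosts down to those the worker should take jobs from."""
    return [h.url for h in hosts if accepts_jobs_from(h)]
```

For example, `usable_hosts([WebUIHost("openqa.suse.de", 2), WebUIHost("yast-openqa.example", 1)])` keeps only openqa.suse.de, so an outdated webUI would be skipped instead of handing the worker jobs it cannot run. Requiring strict equality is the conservative choice given how quickly this area was changing.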
