action #20812

closed

Jobs will be assigned to workers with wrong arch unless WORKER_CLASS is set somewhere

Added by AdamWill over 7 years ago. Updated almost 6 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: Feature requests
Target version:
Start date:
Due date:
% Done: 0%
Estimated time:

Description

I think since commit fd3c570f8f4554037ffae1179742b9025390eabe, there doesn't seem to be any simple arch-based protection against jobs running on a worker of the wrong arch any more. The %cando matrix in Common.pm is still there, but if you trace it out, the code which ultimately decides whether a job is appropriate (job_grab, in Scheduler/Scheduler.pm) never actually cares about it any more. The values from it get passed into job_grab as the 'workercaps' arg, and the only thing the function does with 'workercaps' is pass it back to the worker (when it does $worker->seen($workercaps)); it does nothing else with those values. So unless the test suite, machine or product specifies WORKER_CLASS, openQA will happily go ahead and try to run an x86_64 job on a ppc64 worker. To cite an entirely random example. Or, you know, possibly not entirely random:

https://openqa.fedoraproject.org/admin/workers/19

I'm gonna go ahead and add WORKER_CLASS to all our machine definitions in our distri to fix our instance, but I do think it's worth reporting that openQA does the wrong thing if WORKER_CLASS isn't explicitly set.
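For illustration, the kind of arch check the description says job_grab no longer performs might look roughly like this. It is a Python sketch, not openQA's Perl code: the compatibility table only imitates the spirit of %cando, and the 'cpu_arch' key in workercaps as well as the function names worker_can_run and grab_job are assumptions made for the example.

```python
# Hypothetical, simplified compatibility matrix in the spirit of %cando in
# Common.pm: worker CPU arch -> job ARCH values it may run.
# The entries are illustrative; openQA's real matrix may differ.
CANDO = {
    "x86_64": ["x86_64", "i686", "i586"],
    "i686": ["i686", "i586"],
    "ppc64": ["ppc64", "ppc"],
    "ppc64le": ["ppc64le"],
    "aarch64": ["aarch64"],
}


def worker_can_run(workercaps: dict, job_settings: dict) -> bool:
    """Return True if a worker with these capabilities may take the job."""
    cpu_arch = workercaps.get("cpu_arch")  # assumed key name
    job_arch = job_settings.get("ARCH")
    if job_arch is None:
        # A job without ARCH is not restricted by this check.
        return True
    return job_arch in CANDO.get(cpu_arch, [])


def grab_job(workercaps: dict, pending_jobs: list):
    """Pick the first pending job the worker can run, or None if there is none."""
    for job in pending_jobs:
        if worker_can_run(workercaps, job["settings"]):
            return job
    return None


ppc_worker = {"cpu_arch": "ppc64"}
pending = [
    {"id": 1, "settings": {"ARCH": "x86_64"}},
    {"id": 2, "settings": {"ARCH": "ppc64"}},
]
chosen = grab_job(ppc_worker, pending)
print(chosen["id"])  # prints 2: the x86_64 job is skipped
```

The point of the sketch is that workercaps actually takes part in the selection instead of only being handed back to the worker.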


Related issues: 2 (0 open, 2 closed)

Related to openQA Project (public) - coordination #32851: [tools][EPIC] Scheduling redesign (Resolved, okurz, 2018-05-05)

Related to openQA Project (public) - action #33580: Jobs are assigned to workers with different backend (Rejected, okurz, 2018-03-21)
Actions #1

Updated by coolo about 7 years ago

  • Target version set to Ready
Actions #2

Updated by coolo about 7 years ago

The protection might not be so useful on larger clusters, where requiring WORKER_CLASS would be the easier solution. But test developers have single-host installations, and we need to protect them from running random architectures :)

Actions #3

Updated by dasantiago almost 7 years ago

Actions #4

Updated by EDiGiacinto almost 7 years ago

A bit tricky in practice for s390x jobs, where the workers are actually x86_64 and CPU_ARCH is set to that value.

Just to clarify in terms of ACs:

  • Make the scheduler aware of the worker's job capabilities, and do not assign jobs to workers with a different architecture
  • While doing this, take into account workers that execute jobs for a different platform, either by the worker explicitly declaring that or by inferring it through some other mechanism
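One way to satisfy both acceptance criteria is to let an explicit declaration from the worker take precedence over inference from its CPU arch. The Python sketch below assumes a hypothetical workercaps key, supported_archs, and an illustrative inference table; neither is an existing openQA setting.

```python
from typing import Optional

# Illustrative inference table (assumption; not openQA's real %cando).
INFERRED_ARCHS = {
    "x86_64": {"x86_64", "i686", "i586"},
    "aarch64": {"aarch64"},
}


def runnable_archs(workercaps: dict) -> set:
    """Archs a worker may run: explicit declaration first, else inferred from its CPU."""
    declared = workercaps.get("supported_archs")  # hypothetical key
    if declared:
        return set(declared)
    return INFERRED_ARCHS.get(workercaps.get("cpu_arch"), set())


def can_assign(workercaps: dict, job_arch: Optional[str]) -> bool:
    """Jobs without an ARCH are not restricted by this check."""
    return job_arch is None or job_arch in runnable_archs(workercaps)


# An x86_64 host driving an s390x backend declares that explicitly ...
s390x_driver = {"cpu_arch": "x86_64", "supported_archs": ["s390x"]}
# ... while a plain x86_64 QEMU worker relies on inference.
qemu_worker = {"cpu_arch": "x86_64"}

print(can_assign(s390x_driver, "s390x"))  # True
print(can_assign(qemu_worker, "s390x"))   # False
print(can_assign(qemu_worker, "i586"))    # True
```

Letting the declaration replace (rather than extend) the inferred set keeps an s390x-driving worker from being matched to ordinary x86_64 jobs just because of its host CPU_ARCH.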
Actions #5

Updated by AdamWill almost 7 years ago

Indeed, we have a similar case with running an ARM test on x86_64 (using extreeeemeeeely slooooooooow emulation).

I mean, it's possible there's no really great fix here. If attempting to fix it gets too complex there's probably a point at which we should just stop, throw out the %cando matrix, and document "you should do this in instance config with worker classes". I don't think that's too terrible so long as it's written down.

Actions #6

Updated by dasantiago almost 7 years ago

  • Related to action #33580: Jobs are assigned to workers with different backend added
Actions #7

Updated by coolo over 6 years ago

  • Difficulty set to easy

I wouldn't worry about deployments with complicated workers; admins of those need to read the documentation. But jobs post and isos post should make sure that we have a WORKER_CLASS, and default it to qemu_$ARCH to make the result predictable.
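The suggested default could be sketched as a small normalization step applied to job settings when they are posted. This is a Python illustration of the proposed behavior only; the function name with_default_worker_class is hypothetical, and it is not the code from the PR that resolved this issue.

```python
def with_default_worker_class(settings: dict) -> dict:
    """Return a copy of the job settings with WORKER_CLASS defaulted to qemu_<ARCH>.

    An explicitly set WORKER_CLASS is left untouched; without ARCH nothing is added.
    """
    result = dict(settings)  # do not mutate the caller's settings
    if not result.get("WORKER_CLASS") and result.get("ARCH"):
        result["WORKER_CLASS"] = f"qemu_{result['ARCH']}"
    return result


posted = {"DISTRI": "fedora", "ARCH": "ppc64le"}
print(with_default_worker_class(posted)["WORKER_CLASS"])  # qemu_ppc64le

# "custom_s390x" is a made-up class name standing in for an admin's own setting.
explicit = {"ARCH": "s390x", "WORKER_CLASS": "custom_s390x"}
print(with_default_worker_class(explicit)["WORKER_CLASS"])  # custom_s390x
```

An empty WORKER_CLASS is treated the same as a missing one, so the default also applies when a machine or product defines the variable without a value.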

Actions #8

Updated by mkittler almost 6 years ago

  • Assignee set to mkittler
  • Target version changed from Ready to Current Sprint
Actions #9

Updated by mkittler almost 6 years ago

  • Status changed from New to In Progress
Actions #10

Updated by mkittler almost 6 years ago

  • Status changed from In Progress to Resolved

PR has been merged
