coordination #47117

[epic] Fix worker->websocket->scheduler->webui connection

Added by coolo almost 3 years ago. Updated over 1 year ago.

Status: Resolved
Priority: Normal
Assignee:
Category: Feature requests
Target version: -
Start date: 2019-02-04
Due date:
% Done: 0%
Estimated time:
Difficulty:

Description

We have multiple problems with the way we split state between the various components. Simple fixes won't help, because our architecture is just too complex for the little it gains us :(


Related issues

Related to openQA Project - action #42986: parallel jobs not reliable killed/restarted (Resolved, 2018-10-28)

Related to openQA Project - action #47060: [worker service][scheduling] openqaworker2:21 ~ openqaworker2:24 stops getting new jobs for over 1 day. (Resolved, 2019-02-03)

Related to openQA Project - action #46886: worker3 keeps jobs (Resolved, 2019-01-31)

Related to openQA Project - action #42980: job stayed in assigned but is dead (Rejected, 2018-10-27)

Related to openQA Project - action #47087: [scheduling] Workers on openqaworker2 stuck frequently (Resolved, 2019-02-04)

Related to openQA Project - action #46187: Create list of "worker responsibilities" (Resolved, 2019-01-15)

Blocked by openQA Project - action #46802: Replace D-Bus with plain HTTP (Resolved, 2019-01-29)

History

#1 Updated by coolo almost 3 years ago

  • Related to action #42986: parallel jobs not reliable killed/restarted added

#2 Updated by coolo almost 3 years ago

  • Related to action #47060: [worker service][scheduling] openqaworker2:21 ~ openqaworker2:24 stops getting new jobs for over 1 day. added

#3 Updated by coolo almost 3 years ago

#4 Updated by coolo almost 3 years ago

  • Blocks action #41066: Scheduling jobs for IPMI (bare metal) on the same worker (aka FOLLOW_TEST_DIRECTLY aka START_DIRECTLY_AFTER_TEST). added

#5 Updated by coolo almost 3 years ago

#6 Updated by coolo almost 3 years ago

  • Related to action #42980: job stayed in assigned but is dead added

#7 Updated by coolo almost 3 years ago

  • Related to action #47087: [scheduling] Workers on openqaworker2 stuck frequently added

#8 Updated by mkittler almost 3 years ago

A few notes from our last tools team meeting regarding the worker:

  • Remove engines to simplify the code. The worker was never really independent from isotovideo anyways.
  • Use Minion jobs for more than just the cache service (e.g. for uploading results).
  • Allow multiple slots per (systemd) service.

#9 Updated by mkittler almost 3 years ago

  • Related to action #46187: Create list of "worker responsibilities" added

#10 Updated by okurz over 2 years ago

  • Category changed from 122 to Feature requests

#11 Updated by okurz over 2 years ago

#12 Updated by mkittler over 2 years ago

  • Blocks deleted (action #41066: Scheduling jobs for IPMI (bare metal) on the same worker (aka FOLLOW_TEST_DIRECTLY aka START_DIRECTLY_AFTER_TEST).)

#13 Updated by mkittler over 2 years ago

  • Blocks deleted (action #41027: worker disconnects during cleanup)

#14 Updated by okurz almost 2 years ago

  • Subject changed from EPIC: Fix worker->websocket->scheduler->webui connection to [epic] Fix worker->websocket->scheduler->webui connection
  • Status changed from New to Resolved
  • Assignee set to okurz

Over the past months a lot of work has happened in the area of communication and architecture. We have definitely not changed the big-picture architecture, but we have reworked things and at least added much more coverage. Some of the points that had been mentioned have been done since then, e.g. "use external processes for uploading results". For anything beyond that, only more specific tasks would help. I don't see how a ticket "Fix worker…" would help anymore.

#15 Updated by szarate over 1 year ago

  • Tracker changed from action to coordination
  • Difficulty deleted (hard)
