coordination #47117

closed

[epic] Fix worker->websocket->scheduler->webui connection

Added by coolo almost 6 years ago. Updated about 4 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: Feature requests
Target version: -
Start date: 2019-02-04
Due date:
% Done: 0%
Estimated time:

Description

We have multiple problems with the way we split state between various components. And simple fixes won't help, as our architecture is just too complex for little gain :(


Related issues 7 (0 open, 7 closed)

Related to openQA Project (public) - action #42986: parallel jobs not reliable killed/restarted (Resolved, coolo, 2018-10-28)

Related to openQA Project (public) - action #47060: [worker service][scheduling] openqaworker2:21 ~ openqaworker2:24 stops getting new jobs for over 1 day. (Resolved, mkittler, 2019-02-03)

Related to openQA Project (public) - action #46886: worker3 keeps jobs (Resolved, 2019-01-31)

Related to openQA Project (public) - action #42980: job stayed in assigned but is dead (Rejected, mkittler, 2018-10-27)

Related to openQA Project (public) - action #47087: [scheduling] Workers on openqaworker2 stuck frequently (Resolved, mkittler, 2019-02-04)

Related to openQA Project (public) - action #46187: Create list of "worker responsibilities" (Resolved, mkittler, 2019-01-15)

Blocked by openQA Project (public) - action #46802: Replace D-Bus with plain HTTP (Resolved, kraih, 2019-01-29)

Actions #1

Updated by coolo almost 6 years ago

  • Related to action #42986: parallel jobs not reliable killed/restarted added
Actions #2

Updated by coolo almost 6 years ago

  • Related to action #47060: [worker service][scheduling] openqaworker2:21 ~ openqaworker2:24 stops getting new jobs for over 1 day. added
Actions #3

Updated by coolo almost 6 years ago

Actions #4

Updated by coolo almost 6 years ago

  • Blocks action #41066: Scheduling jobs for IPMI (bare metal) on the same worker (aka FOLLOW_TEST_DIRECTLY aka START_DIRECTLY_AFTER_TEST). added
Actions #5

Updated by coolo almost 6 years ago

Actions #6

Updated by coolo almost 6 years ago

  • Related to action #42980: job stayed in assigned but is dead added
Actions #7

Updated by coolo almost 6 years ago

  • Related to action #47087: [scheduling] Workers on openqaworker2 stuck frequently added
Actions #8

Updated by mkittler almost 6 years ago

A few notes from our last tools team meeting regarding the worker:

  • Remove engines to simplify the code. The worker was never really independent from isotovideo anyways.
  • Use Minion jobs for more than just the cache service (e.g. for uploading results).
  • Allow multiple slots per (systemd) service.
Actions #9

Updated by mkittler almost 6 years ago

  • Related to action #46187: Create list of "worker responsibilities" added
Actions #10

Updated by okurz over 5 years ago

  • Category changed from 122 to Feature requests
Actions #11

Updated by okurz over 5 years ago

Actions #12

Updated by mkittler about 5 years ago

  • Blocks deleted (action #41066: Scheduling jobs for IPMI (bare metal) on the same worker (aka FOLLOW_TEST_DIRECTLY aka START_DIRECTLY_AFTER_TEST).)
Actions #13

Updated by mkittler about 5 years ago

  • Blocks deleted (action #41027: worker disconnects during cleanup)
Actions #14

Updated by okurz over 4 years ago

  • Subject changed from EPIC: Fix worker->websocket->scheduler->webui connection to [epic] Fix worker->websocket->scheduler->webui connection
  • Status changed from New to Resolved
  • Assignee set to okurz

Over the past months a lot of work has happened in the area of communication and architecture. We have definitely not changed the big-picture architecture, but we have reworked things and at least added much more coverage. Some points that had been mentioned have been done since then, e.g. "use external processes for uploading results". For anything further, only more specific tasks would help. I don't see how a ticket "Fix worker…" would help anymore.

Actions #15

Updated by szarate about 4 years ago

  • Tracker changed from action to coordination
  • Difficulty deleted (hard)