coordination #43934
coordination #80142: [saga][epic] Scale out: Redundant/load-balancing deployments of openQA, easy containers, containers on kubernetes
[epic] Manage o3 infrastructure with salt again
Added by okurz about 6 years ago. Updated about 2 years ago.
33% done
Description
Observation
See #43823#note-1. Previously we had a salt-minion on each worker even though no salt recipes were used; at least we used salt for structured remote execution ;)
Expected result
As salt was there, is the preferred system management solution, and should be extended with full recipes, we should have a salt-minion available on all the workers as well.
To be covered for o3 in system management, e.g. salt states
Updated by okurz about 6 years ago
- Copied from action #43823: o3 workers immediately incompleting all jobs, caching service can not be reached added
Updated by okurz about 6 years ago
- Copied to action #43937: align o3 workers (done: "imagetester" and) "power8" with the others which are currently "transactional-update" hosts added
Updated by RBrownSUSE about 6 years ago
Indeed, this is intentional, but temporarily intentional
Given Salt's habit of exploding spectacularly when the master is not updated but the minions are, and given we patched the minions first and they now auto update, it would be suicidal to have a master running on leap 42.3 o3 talking to minions running on Leap 15.0 workers.
When o3 is also Leap 15 and updated at least as frequently as the workers, then this makes sense, thanks for tracking the item :)
Updated by okurz about 6 years ago
I see, so I created a new ticket #43976 to cover that.
Updated by nicksinger about 6 years ago
- Copied to deleted (action #43937: align o3 workers (done: "imagetester" and) "power8" with the others which are currently "transactional-update" hosts)
Updated by nicksinger about 6 years ago
- Blocks action #43937: align o3 workers (done: "imagetester" and) "power8" with the others which are currently "transactional-update" hosts added
Updated by okurz over 5 years ago
- Blocks deleted (action #43937: align o3 workers (done: "imagetester" and) "power8" with the others which are currently "transactional-update" hosts)
Updated by okurz over 5 years ago
- Related to action #43937: align o3 workers (done: "imagetester" and) "power8" with the others which are currently "transactional-update" hosts added
Updated by okurz over 5 years ago
- Status changed from Blocked to Workable
I think @nicksinger got it upside down with the blocks/blocked relation; however, I do not see that this ticket here is strongly blocking or blocked. A relationship, sure.
Updated by okurz over 5 years ago
- Subject changed from salt is gone from o3 workers? to Manage o3 infrastructure with salt again
- Priority changed from Normal to Low
Because salt is not currently used within the o3 infrastructure and because of error messages within the journal on o3, for now I disabled the salt master on o3 as well with "systemctl disable --now salt-master" to prevent the error message "Exception during resolving address: [Errno 1] Unknown host".
Updated by okurz about 5 years ago
- Related to action #44066: merge the two osd salt git repos added
Updated by okurz about 5 years ago
- Blocks action #53573: Failed service "irqbalance" on aarch64.o.o added
Updated by okurz about 5 years ago
- Blocks deleted (action #53573: Failed service "irqbalance" on aarch64.o.o)
Updated by okurz about 5 years ago
- Related to action #53573: Failed service "irqbalance" on aarch64.o.o added
Updated by okurz almost 5 years ago
Had a chat with lrupp/kl_eisbaer: He made me aware of https://build.opensuse.org/package/show/OBS:Server:Unstable/OBS-WorkerOnly which provides the images that are used by OBS workers.
The images are loaded over PXE; the PXE config points to an image, and this image path is a symlink. This makes it easy to switch from one image to the other if something is identified as broken.
[04/02/2020 10:20:36] <kl_eisbaer> the most funny part is the one that adjusts the worker after the PXE boot.
[04/02/2020 10:20:57] <kl_eisbaer> here we use a script in the init phase, that downloads files/settings from the server
[04/02/2020 10:21:22] <kl_eisbaer> with this, we can adjust the configuration of the worker (how many parallel builds, how much disk space, etc)
[04/02/2020 10:21:44] <kl_eisbaer> okurz: if you are interested, I can give you a short introduction into our setup
[04/02/2020 10:22:36] <kl_eisbaer> ...which even allows us to move workers between OBS/IBS via script since we have access to the switches :-)
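For illustration, a symlink-based switch as described above could look roughly like this on the PXE server; the paths and image names below are made up, not taken from the actual OBS setup:

    # PXE config references a stable path which is actually a symlink (illustrative paths)
    ls -l /srv/tftpboot/images/obs-worker-current
    # obs-worker-current -> OBS-WorkerOnly.x86_64-3.1.raw

    # roll all PXE-booted workers back to the previous image by flipping the symlink
    ln -sfn OBS-WorkerOnly.x86_64-3.0.raw /srv/tftpboot/images/obs-worker-current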
Updated by okurz almost 5 years ago
My current proposal is the following:
- Ensure salt-minion on all o3 workers
- Ensure salt-master on o3
- Ensure workers are connected to o3 and salt key is accepted
- Move gitlab.suse.de/openqa/salt-states-openqa to github, e.g. in https://github.com/os-autoinst scope, and create back-mirror into salt-states repo or get rid of it completely
Does anyone see problems with this approach? A sketch of the first three steps follows below.
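A minimal sketch of those steps, assuming the standard openSUSE package and service names; the master hostname is just an example:

    # on each worker
    zypper -n install salt-minion
    echo "master: openqa1-opensuse" > /etc/salt/minion.d/master.conf   # hostname is an assumption
    systemctl enable --now salt-minion

    # on o3
    zypper -n install salt-master
    systemctl enable --now salt-master
    salt-key -A   # accept the pending worker keys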
Updated by nicksinger about 4 years ago
RBrownSUSE wrote:
Indeed, this is intentional, but temporarily intentional
Given Salt's habit of exploding spectacularly when the master is not updated but the minions are, and given we patched the minions first and they now auto update, it would be suicidal to have a master running on leap 42.3 o3 talking to minions running on Leap 15.0 workers.
When o3 is also Leap 15 and updated at least as frequently as the workers, then this makes sense, thanks for tracking the item :)
@okurz this still applies somewhat. While o3 is on 15.2 in the meantime, it still needs manual updates. This raises a few points from my side:
- Isn't the topic "install salt" blocked by the migration of o3 onto transactional servers?
- If salt explodes that spectacularly with non-matching versions, should we maybe look into something like ansible?
- Should we at least deploy an ssh-key onto ariel which can access all workers over ssh, and install something like pssh (https://linux.die.net/man/1/pssh)?
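To illustrate the last point, a rough sketch of what that could look like; the host list file name is only an example:

    # on ariel: create a key and push it to every worker listed in workers.txt
    ssh-keygen -t ed25519 -N '' -f ~/.ssh/id_ed25519
    for h in $(cat workers.txt); do ssh-copy-id -i ~/.ssh/id_ed25519.pub root@$h; done

    # run the same command on all workers in parallel, showing output inline
    pssh -h workers.txt -l root -i 'zypper -n dup'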
Updated by RBrownSUSE about 4 years ago
nicksinger wrote:
@okurz this still applies somewhat. While o3 is on 15.2 in the meantime, it still needs manual updates. This raises a few points from my side:
- Isn't the topic "install salt" blocked by the migration of o3 onto transactional servers?
Saltstack supports transactional systems meanwhile - https://github.com/openSUSE/salt/pull/271
- If salt explodes that spectacularly with non-matching versions; should we maybe look into something like ansible?
Salt can run masterless, in which case the versions are unrelated
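As a rough illustration of masterless operation (the state tree location is the salt default, not verified against our setup):

    # executed directly on the host, no salt-master involved;
    # states are read from the local file roots, /srv/salt by default
    salt-call --local state.apply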
- Should we at least deploy a ssh-key onto ariel which can access all workers over ssh and install something like pssh (https://linux.die.net/man/1/pssh)?
Updated by nicksinger about 4 years ago
RBrownSUSE wrote:
nicksinger wrote:
@okurz this still applies somewhat. While o3 is on 15.2 in the meantime, it still needs manual updates. This raises a few points from my side:
- Isn't the topic "install salt" blocked by the migration of o3 onto transactional servers?
Saltstack supports transactional systems meanwhile - https://github.com/openSUSE/salt/pull/271
kind of https://github.com/saltstack/salt/pull/58520 ;)
It also doesn't solve the problem of o3 being upgraded manually, so version differences can still happen. Masterless salt is an interesting point you have there. Can it cover our (current) main use-case: executing commands on multiple hosts?
Updated by okurz about 4 years ago
I would not be concerned with version differences until I see that failing. Running salt-master (again) on o3 and a salt-minion on each worker, accepting the salt keys, and then simply using it for distributed command execution, e.g. cmd.run, is a good start.
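For illustration, once the keys are accepted this boils down to something like the following on o3; the target pattern is just an example:

    # list and accept pending minion keys
    salt-key -L
    salt-key -A

    # run a command on all connected workers
    salt '*' cmd.run 'uptime'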
Updated by okurz over 3 years ago
- Tags set to salt, system management, o3, osd, open source, infrastucture
- Tracker changed from action to coordination
- Project changed from openQA Infrastructure (public) to openQA Project (public)
- Subject changed from Manage o3 infrastructure with salt again to [epic] Manage o3 infrastructure with salt again
- Category set to Organisational
- Assignee set to okurz
- Parent task set to #80142
Updated by okurz over 3 years ago
- Status changed from Workable to Blocked
Created two specific subtasks to make picking up easier :)
Updated by okurz over 3 years ago
- Target version changed from Ready to future
With the two subtasks in "future" we can also move this epic there for now.
Updated by okurz about 2 years ago
- Tags changed from salt, system management, o3, osd, open source, infrastucture to salt, system management, o3, osd, open source, infrastructure, infra
Updated by okurz about 2 years ago
- Tags changed from salt, system management, o3, osd, open source, infrastructure, infra to salt, system management, o3, osd, open source, infra