coordination #161414

open

[epic] Improved salt based infrastructure management

Added by okurz 8 months ago. Updated 4 days ago.

Status: New
Priority: High
Assignee: -
Category: Feature requests
Target version:
Start date: 2020-08-07
Due date:
% Done: 62%
Estimated time: (Total: 0.00 h)
Tags:

Subtasks 35 (13 open, 22 closed)

action #69718: harmonize timezone used for our machines size:M | Resolved | dheidler | 2020-08-07
action #94492: Configure retention/downsampling policy for monitoring data stored within InfluxDB size:M | Resolved | mkittler | 2021-06-22
action #103380: Configure retention/downsampling policy for specific monitoring data stored within InfluxDB | Blocked | okurz | 2021-12-01
action #161423: [timeboxed:10h] Incomplete config files on OSD due to salt - Improve salt state application from remotely accessible salt master size:S | Resolved | okurz | 2024-06-03
action #161426: incomplete config files on OSD due to salt - introduce post-deploy monitoring steps like in osd-deployment but in salt-states-openqa | New | - | 2024-06-03
action #161429: incomplete config files on OSD due to salt - create annotations in grafana on the time of the osd deployment as well as salt-states-openqa deployments | New | - | 2024-06-03
action #162377: incomplete config files on OSD due to salt - Prevent conflicting state applications on OSD "fstab" size:S | Resolved | okurz | 2024-06-03
action #163790: OSD openqa.ini is corrupted, invalid characters size:M | Resolved | okurz | 2024-07-10
action #167051: https://gitlab.suse.de/openqa/salt-pillars-openqa/-/jobs/3109145 failed due to telegraf errors on monitor.qa.suse.de size:S | Resolved | nicksinger | 2024-09-19
action #167719: No new data in monitor.qe.nue2.suse.org due to influxdb failing to write with ""error opening new segment file for wal (1): write /var/lib/influxdb/….wal: no space left on device" | Resolved | okurz | 2024-10-02
action #167722: Efficient use of monitoring data within influxdb on monitor.qe.nue2.suse.org size:M | Workable | - | 2024-10-02
action #167728: grafana dashboard for monitor.qe.nue2.suse.org size:S | Resolved | gpathak | 2024-10-02
action #168145: implement telegraf health check and adjust according pipelines | New | - | -
action #168148: hackweek idea: use loki to monitor our log files and explore alerting possibilites based on these size:S | Resolved | nicksinger | -
action #170077: Put more storage into qamaster "to make our lives easier in general" size:M | Resolved | okurz | 2024-11-19
action #173344: Extend iPXE in qe/oqa.*.suse.org to also display on local console size:S | Resolved | gpathak | -
action #173347: Ensure we have a current backup of qamaster VMs, VM config, jenkins data, data from backup-vm itself, etc. size:S | Resolved | gpathak | -
action #173350: Migrate VMs from qamaster to modern hypervisor solution | New | - | 2024-11-29
action #173353: physically label slots 10+11 on qamaster size:S | Resolved | robert.richardson | 2024-11-29
action #173674: qamaster-independent backup size:S | Blocked | dheidler | 2024-12-03
action #175407: salt state for machine monitor.qe.nue2.suse.org was broken for almost 2 months, nothing was alerting us size:S | Resolved | okurz | -
action #175629: diesel+petrol (possibly all ppc64le OPAL machines) often run into salt error "Not connected" or "No response" due to wireguard services failing to start on boot size:S | Resolved | nicksinger | 2025-01-16
action #175686: OSD webUI ended up with "502 Bad Gateway" from nginx on 2025-01-17, needed manual restart of openqa-webui | Resolved | okurz | 2025-01-17
action #175689: monitor.qe.nue2.suse.org "502 Bad Gateway" from nginx on 2025-01-17, missing grafana server files? | Resolved | nicksinger | 2025-01-17
action #175707: OSD backups missing since 2024-11 on backup-vm.qe.nue2.suse.org size:S | Blocked | dheidler | 2025-01-17
action #175710: OSD openqa.ini is corrupted, invalid characters, again 2025-01-17 | Blocked | okurz | 2024-07-10
action #175740: [alert] deploy pipeline for salt-states-openqa failed, multiple host run into salt error "Not connected" or "No response" | Resolved | okurz | 2025-01-16
action #175791: [alert] storage: partitions usage (%) alert size:S | Blocked | gpathak | -
action #175848: Validate sls files in salt-{states,pillars}-openqa with a best effort approach | New | - | 2025-01-20
action #175851: Prevent re-evaluation of "stop_and_disable_all_not_configured_workers" state on every run size:S | Resolved | jbaier_cz | 2025-01-20
action #176013: [alert] web UI: Too many Minion job failures alert size:S | Resolved | ybonatakis | 2025-01-23
openQA Project (public) - action #176121: salt-states-openqa pipeline deploy fails on master, SaltReqTimeoutError: Message timed out | Resolved | okurz | 2025-01-24
action #176124: OSD influxdb minion route seemingly returns only a very small number of failed minion jobs, not all | Resolved | tinita | -
action #176175: [alert] Grafana failed to start due to corrupted config file | Blocked | okurz | 2025-01-26
action #176250: file corruption in salt controlled config files size:M | Workable | - | -

Actions #1

Updated by okurz 8 months ago

  • Subtask #161423 added
Actions #2

Updated by okurz 8 months ago

  • Subtask #161426 added
Actions #3

Updated by okurz 8 months ago

  • Subtask #161429 added
Actions #4

Updated by okurz 8 months ago

  • Subtask #162377 added
Actions #5

Updated by okurz 4 months ago

  • Subtask #167719 added
Actions #6

Updated by okurz 4 months ago

  • Subtask #167722 added
Actions #7

Updated by okurz 4 months ago

  • Subtask #167728 added
Actions #8

Updated by okurz 4 months ago

  • Subtask #103380 added
Actions #9

Updated by okurz 4 months ago

  • Subtask #94492 added
Actions #10

Updated by okurz 4 months ago

  • Subtask #167051 added
Actions #11

Updated by nicksinger 4 months ago

  • Subtask #168145 added
Actions #12

Updated by nicksinger 4 months ago

  • Subtask #168148 added
Actions #13

Updated by okurz 2 months ago

  • Subtask #170077 added
Actions #14

Updated by okurz 2 months ago

  • Subtask #173344 added
Actions #15

Updated by okurz 2 months ago

  • Subtask #173347 added
Actions #16

Updated by okurz 2 months ago

  • Subtask #173350 added
Actions #17

Updated by okurz 2 months ago

  • Subtask #173353 added
Actions #18

Updated by okurz 2 months ago

  • Subtask #173674 added
Actions #19

Updated by okurz 22 days ago

  • Subtask #175407 added
Actions #20

Updated by okurz 22 days ago

  • Subtask #175629 added
Actions #21

Updated by okurz 21 days ago

  • Subtask #175686 added
Actions #22

Updated by okurz 21 days ago

  • Subtask #175689 added
Actions #23

Updated by okurz 21 days ago

  • Subtask #175695 added
Actions #24

Updated by okurz 21 days ago

  • Subtask #163790 added
Actions #25

Updated by okurz 21 days ago

  • Subtask #175707 added
Actions #26

Updated by okurz 21 days ago

  • Subtask #175710 added
Actions #27

Updated by jbaier_cz 21 days ago

  • Subtask #175740 added
Actions #28

Updated by okurz 18 days ago

  • Subtask #175791 added
Actions #29

Updated by okurz 18 days ago

  • Subtask #175851 added
Actions #30

Updated by okurz 17 days ago

  • Subtask #175848 added
Actions #31

Updated by okurz 17 days ago

  • Subtask #69718 added
Actions #32

Updated by okurz 15 days ago

  • Subtask #176013 added
Actions #33

Updated by gpuliti 15 days ago

  • Subtask deleted (#176013)
Actions #34

Updated by gpuliti 15 days ago

  • Subtask #176013 added
Actions #35

Updated by okurz 14 days ago

  • Subtask #176124 added
Actions #36

Updated by okurz 14 days ago

  • Subtask #176121 added
Actions #37

Updated by okurz 11 days ago

  • Subtask #176175 added
Actions #38

Updated by okurz 10 days ago

  • Subtask #176250 added
Actions #39

Updated by okurz 8 days ago

  • Subtask deleted (#175695)