action #176250
opencoordination #161414: [epic] Improved salt based infrastructure management
file corruption in salt controlled config files size:M
Status:
Blocked
Priority:
Low
Assignee:
Category:
Regressions/Crashes
Target version:
Start date:
Due date:
% Done:
0%
Estimated time:
Description
Observation¶
Multiple config files were somehow corrupted by salt or incompletely written. First #163790, then #175710, both on OSD. The same also happened on monitor, see #176175. okurz first assumed that this might be related to excessive load on OSD, which runs both the salt master and a salt minion. However, a similar problem appeared on monitor, which runs only a salt minion, so the salt master alone cannot be the cause. So far the problem has only happened on virtual machines (both OSD and monitor are VMs).
Acceptance Criteria¶
- AC1: We have consistent and stable application of config files managed by salt
Suggestions¶
- Try to reproduce the problem in a separate testing environment, e.g. a single VM from https://download.opensuse.org/distribution/leap/15.6/appliances/openSUSE-Leap-15.6-Minimal-VM.x86_64-kvm-and-xen.qcow2, and apply the local salt state from https://gitlab.suse.de/openqa/salt-states-openqa using the role webui and/or monitor while putting the VM under stress, e.g. with the application stress-ng
- Read README from https://gitlab.suse.de/openqa/salt-states-openqa
- Run salt repeatedly with a command like
sudo nice env runs=300 count-fail-ratio salt --state-output=changes -C "*" state.apply queue=True | grep -v 'Result.*Clean' 2>&1 | tee -a salt_state.log
- Give upstream research another try. So far okurz has not found anything related. Consider asking domain experts from the salt community.
- Maybe the problem is related to the rather outdated python+salt stack within our infrastructure (as we run Leap). So once you can reproduce the problem in a clean environment, consider running updated python and/or salt as applicable, e.g. check whether you can also reproduce the problem within Tumbleweed.
- If the problem cannot be reproduced in a synthetic environment, then consider an idea from tinita: "adding another dummy.ini besides openqa.ini, with the same content, and in the loop calling salt.apply, the file is copied to a folder, so we have a list of files and can trace the changes between each call"
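
tinita's snapshot idea could be sketched roughly as follows. This is only an illustration, not an existing script: the function names (snapshot, trace_changes, run_loop), the snapshot naming scheme and the apply_state callback are all hypothetical. How salt is actually invoked inside the loop is left to the caller, so the sketch makes no assumption about the salt command line.

```python
import difflib
import hashlib
import shutil
from pathlib import Path


def snapshot(source, snapshot_dir, run):
    """Copy `source` into `snapshot_dir`, encoding the run number in the name."""
    snapshot_dir.mkdir(parents=True, exist_ok=True)
    # Zero-padded suffix (hypothetical scheme) so that names sort in run order.
    target = snapshot_dir / f"{source.name}.{run:04d}"
    shutil.copy2(source, target)
    return target


def _digest(path):
    """SHA-256 of the raw file content, used to detect any byte-level change."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def trace_changes(snapshot_dir, name):
    """Return (previous, current, unified diff) for each consecutive pair of
    snapshots of the file `name` whose content differs."""
    snapshots = sorted(
        p for p in snapshot_dir.iterdir()
        if p.is_file() and p.name.startswith(name + ".")
    )
    changes = []
    for prev, curr in zip(snapshots, snapshots[1:]):
        if _digest(prev) == _digest(curr):
            continue
        diff = "".join(difflib.unified_diff(
            prev.read_text(errors="replace").splitlines(keepends=True),
            curr.read_text(errors="replace").splitlines(keepends=True),
            fromfile=prev.name,
            tofile=curr.name,
        ))
        changes.append((prev.name, curr.name, diff))
    return changes


def run_loop(apply_state, source, snapshot_dir, runs):
    """Call `apply_state()` `runs` times, snapshot `source` after each call,
    then report every change between consecutive snapshots."""
    for run in range(1, runs + 1):
        apply_state()  # caller-supplied: whatever applies the salt state
        snapshot(source, snapshot_dir, run)
    return trace_changes(snapshot_dir, source.name)
```

Here apply_state would be whatever applies the salt state on the affected host, and source would be the dummy.ini placed next to openqa.ini. Since salt should write the same content on every run, any entry returned by run_loop points at the exact run in which the file changed, and the diff shows what was lost or truncated.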