action #92302

Updated by okurz almost 3 years ago

## Observation 

 The systemd unit for the NFS mount point recently failed on one of our ARM workers and then recovered on its own. The problem likely occurs when an ARM machine is rebooted multiple times, so that we eventually hit the alert window.

 ## Acceptance criteria 
 * **AC1:** No alerts about failed systemd services caused by NFS mount failures on ARM workers 

 ## Suggestions 
 * Read the suggestion on how to check reboot stability in https://progress.opensuse.org/projects/openqav3/wiki/Wiki#Best-practices-for-infrastructure-work 
 * Try to reproduce the problem by rebooting openqaworker-arm-1 or openqaworker-arm-2 in a loop, then check whether the alert triggers or stays pending long enough that it would trigger 
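
 The reboot loop suggested above could be sketched roughly as follows. This is only a minimal sketch, not the procedure from the wiki: the function name `check_reboot_stability` and the variables `SSH_CMD` and `WAIT_SECS` are hypothetical names introduced here, and it assumes passwordless SSH and sudo on the worker. It stops at the first boot that shows a failed unit so the state can be inspected.

```shell
#!/bin/sh
# Hypothetical helper (not from the wiki): reboot a host repeatedly and
# report failed systemd units after each boot.
# SSH_CMD and WAIT_SECS are assumptions added so the loop can be dry-run.
SSH_CMD="${SSH_CMD:-ssh}"
WAIT_SECS="${WAIT_SECS:-10}"

check_reboot_stability() {
    host="$1"
    runs="$2"
    run=1
    while [ "$run" -le "$runs" ]; do
        # Wait until the host accepts SSH connections again.
        until $SSH_CMD "$host" true 2>/dev/null; do
            sleep "$WAIT_SECS"
        done
        # Prints one line per failed unit, nothing if all units are fine.
        failed=$($SSH_CMD "$host" "systemctl --failed --no-legend")
        if [ -n "$failed" ]; then
            echo "run $run: failed units on $host: $failed"
            return 1
        fi
        echo "run $run: $host ok"
        # The SSH session usually drops during reboot, so ignore its exit code.
        $SSH_CMD "$host" "sudo reboot" || true
        # Give the host time to actually go down before polling again.
        sleep "$WAIT_SECS"
        run=$((run + 1))
    done
}
```

 A call such as `check_reboot_stability openqaworker-arm-1 10` would then run ten reboot cycles. `WAIT_SECS` likely needs to be raised for slow-booting ARM machines, and the check only sees units that are failed at the moment of login, so whether the alert itself would fire still has to be verified separately in monitoring.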

 ## Rollback 

 * ssh openqaworker-arm-3 "sudo systemctl enable --now salt-minion" 
 * ssh osd "salt-key -y -a openqaworker-arm-3.suse.de && sudo salt 'openqaworker-arm-3*' state.apply" 
 * Unpause alerts for openqaworker-arm-3
