action #120261

Updated by mkittler over 1 year ago

## Observation 
 The openQA test in scenario sle-15-SP4-JeOS-for-kvm-and-xen-Updates-x86_64-jeos-kdump@svirt-xen-pv fails in 
 [image_info](https://openqa.suse.de/tests/9916769/modules/image_info/steps/7) 
 when it tries to reach the worker via the `WORKER_HOSTNAME` FQDN. In this case the FQDN should be worker2.oqa.suse.de, but the job gets just "worker2" instead. 

 It looks like `WORKER_HOSTNAME` really is misconfigured in those cases. For example, when the same problem happened on worker6 yesterday, `workers.ini` contained just "WORKER_HOSTNAME=worker6". So this appears to be a problem at the salt level, where the FQDN grain doesn't return the actual fully qualified domain name. On worker6, re-applying the salt states got the FQDN configured again. Rebooting the machine did *not* break it again. 
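
 The misconfiguration described above can be detected locally on a worker. The following is a minimal sketch, not an existing tool: the function names are hypothetical, and "contains a dot" is only a rough heuristic for "looks like an FQDN". It assumes `workers.ini` has a single `WORKER_HOSTNAME = value` line, with or without spaces around `=`.

```shell
#!/bin/sh
# Hypothetical helpers, written for this ticket only.

# Print the first WORKER_HOSTNAME value found in the ini-style file "$1".
worker_hostname() {
    sed -n 's/^[[:space:]]*WORKER_HOSTNAME[[:space:]]*=[[:space:]]*\([^[:space:]]*\).*$/\1/p' "$1" | head -n 1
}

# Succeed only if "$1" has a dot with at least one character on each side.
# This is a heuristic, not a full hostname validation.
looks_like_fqdn() {
    case "$1" in
        ?*.?*) return 0 ;;
        *) return 1 ;;
    esac
}

# Report whether the configured worker hostname looks fully qualified.
check_workers_ini() {
    name=$(worker_hostname "$1")
    if looks_like_fqdn "$name"; then
        echo "ok: $name"
    else
        echo "short: $name"
        return 1
    fi
}
```

 On a worker this could be called as `check_workers_ini /etc/openqa/workers.ini`. A non-zero exit status means the value is a short name such as "worker6".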

 ## Steps to reproduce 

 Find jobs referencing this ticket with the help of 
 https://raw.githubusercontent.com/os-autoinst/scripts/master/openqa-query-for-job-label: 
 call `openqa-query-for-job-label poo#120261` 

 ## Acceptance criteria 
 * **AC1:** All recent jobs that failed to upload because of an incomplete worker hostname are retriggered and their clones end up ok 
 * **AC2:** Jobs are able to upload logs after reboot of the worker machine 
 * **AC3:** Jobs still work just after a salt high state was applied 

 ## Acceptance tests 
 * **AT1-1:** `openqa-query-for-job-label poo#120261` returns no matches more recent than 48h 
 * **AT2-1:** Reboot the machine at least 2 times, trigger openQA tests (or wait for jobs to finish automatically) and verify that jobs upload logs successfully 
 * **AT3-1:** Apply salt high state from OSD, trigger openQA tests (or await automatic results) and verify that jobs upload logs successfully 

 ## Suggestions 
 * See what has been done in #109241 originally 
 * Maybe we need to specify the FQDN in /etc/hostname. If we do that, then we should revisit all occurrences of "grains['host']" in https://gitlab.suse.de/openqa/salt-states-openqa 
 * Check via `sudo salt -C 'G@roles:worker' cmd.run 'grep -i worker_hostname /etc/openqa/workers.ini'` on OSD whether all hostnames are configured correctly 
 * If all other options fail, we can still fall back to hardcoding IPv4 addresses, but FQDNs would be preferred 
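
 The fleet-wide check suggested above could be filtered so that only misconfigured workers are listed. This is a sketch under assumptions: `flag_short_hostnames` is a hypothetical name, and the filter assumes salt prints each minion ID on an unindented line ending in `:` followed by the indented `grep` output, with no color codes in the stream. If the output format on OSD differs, the patterns need adjusting.

```shell
#!/bin/sh
# Hypothetical filter: read `salt ... cmd.run` output on stdin and print
# "minion: value" for every WORKER_HOSTNAME value that contains no dot.
flag_short_hostnames() {
    awk '
        # Unindented line ending in ":" is taken to be a minion ID.
        /^[^[:space:]].*:$/ { minion = substr($0, 1, length($0) - 1); next }
        # Indented grep result: keep what follows "=" and trim trailing space.
        tolower($0) ~ /worker_hostname/ {
            value = $0
            sub(/^[^=]*=[[:space:]]*/, "", value)
            sub(/[[:space:]]+$/, "", value)
            if (value !~ /\./) print minion ": " value
        }
    '
}
```

 It would be used by piping the command from the list into it, for example `sudo salt -C 'G@roles:worker' cmd.run 'grep -i worker_hostname /etc/openqa/workers.ini' | flag_short_hostnames`. Empty output then means every worker reported a value containing a dot.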

 ## Rollback steps 
 * Add back worker2 to salt 

 ## Out of scope 
 Automatically distinguishing whether the upload problem originates from test object misconfigurations, product regressions or problems within os-autoinst or openQA