action #158266
Updated by okurz 9 months ago
## Observation
From https://suse.slack.com/archives/C02CANHLANP/p1711700522125619
> Warning: tests are failing on ppc64 worker host diesel around 5 hours ago, seem qemu VM can't start. https://openqa.suse.de/admin/workers/3393 https://openqa.suse.de/admin/workers/3388 https://openqa.suse.de/admin/workers/3390
`autoinst-log.txt` says:
```
[2024-03-29T09:37:43.496499+01:00] [debug] [pid:18748] QEMU: error: kvm run failed Device or resource busy
[2024-03-29T09:37:43.496606+01:00] [debug] [pid:18748] QEMU: This is probably because your SMT is enabled.
[2024-03-29T09:37:43.496679+01:00] [debug] [pid:18748] QEMU: VCPU can only run on primary threads with all secondary threads offline.
```
There is the "smt_off" service https://gitlab.suse.de/openqa/salt-states-openqa/-/blob/master/openqa/worker.sls?ref_type=heads#L263 which is meant to fix the SMT problem. The service was running fine; I restarted it anyway and then restarted https://openqa.suse.de/tests/13906928#live, but the problem still reproduces.
Only diesel is affected, mania and petrol seem fine.
## Suggestions
* *DONE* `ssh osd 'sudo salt-key -y -d diesel.qe.nue2.suse.org'`
* *DONE* `ssh diesel.qe.nue2.suse.org "sed -i 's/qemu_ppc64le,/qemu_ppc64le-poo158266,/' /etc/openqa/workers.ini && systemctl restart openqa-worker-auto-restart@{1..8} && systemctl disable --now salt-minion telegraf"`
* *DONE* `host=openqa.suse.de WORKER=diesel result="result='failed'" comment="label:poo158266" ./openqa-advanced-retrigger-jobs`
* Investigate what is different on diesel compared to mania and petrol. Maybe mania and petrol are also affected but nobody has noticed yet, or maybe they just haven't rebooted yet
* Fix the problem
* Verify the fix
* Execute the rollback actions
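
The comparison between diesel and mania+petrol could be scripted roughly like this. It is only a sketch under assumptions, not a tested procedure: it assumes `ppc64_cpu` (from powerpc-utils) is installed on the workers, that its output looks like `SMT is off` or `SMT=8`, that the unit can be queried as `smt_off`, and that mania and petrol are reachable under the same `qe.nue2.suse.org` domain as diesel. The function names `classify_smt` and `report_host` are made up for illustration.

```shell
#!/bin/bash
# Hypothetical helper: map the output of `ppc64_cpu --smt` to on/off/unknown.
# Assumed output forms: "SMT is off", "SMT is on", "SMT=<n>".
classify_smt() {
    case "$1" in
        "SMT is off"|"SMT=1") echo off ;;
        "SMT is on"|SMT=*)    echo on ;;
        *)                    echo unknown ;;
    esac
}

# Hypothetical helper: print SMT state, smt_off unit state, kernel version
# and boot time for one host, so the hosts can be compared side by side.
report_host() {
    local host=$1 smt
    smt=$(ssh "$host" 'ppc64_cpu --smt' 2>&1)
    echo "$host: smt=$(classify_smt "$smt") ($smt)"
    ssh "$host" 'systemctl is-active smt_off; uname -r; uptime -s'
}

# Usage (hostnames for mania and petrol are assumed):
# for h in diesel mania petrol; do report_host "$h.qe.nue2.suse.org"; done
```

The boot time is included because a host that has not rebooted recently may simply not have hit the problem yet.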
## Rollback actions
* `ssh diesel.qe.nue2.suse.org "sed -i 's/qemu_ppc64le-poo158266,/qemu_ppc64le,/' /etc/openqa/workers.ini && systemctl restart openqa-worker-auto-restart@{1..8} && systemctl enable --now salt-minion telegraf"`
* `ssh osd 'sudo salt-key -y -a diesel.qe.nue2.suse.org'`