action #158266

closed

openQA jobs on diesel ppc64le fail due to auto_review:"QEMU: This is probably because your SMT is enabled."

Added by okurz about 2 months ago. Updated 24 days ago.

Status:
Resolved
Priority:
Low
Assignee:
Category:
Regressions/Crashes
Target version:
Start date:
2024-03-29
Due date:
% Done:
0%
Estimated time:
Tags:

Description

Observation

From https://suse.slack.com/archives/C02CANHLANP/p1711700522125619

Warning: tests have been failing on ppc64 worker host diesel since around 5 hours ago; it seems the QEMU VM can't start. https://openqa.suse.de/admin/workers/3393 https://openqa.suse.de/admin/workers/3388 https://openqa.suse.de/admin/workers/3390

autoinst-log.txt says

[2024-03-29T09:37:43.496499+01:00] [debug] [pid:18748] QEMU: error: kvm run failed Device or resource busy
[2024-03-29T09:37:43.496606+01:00] [debug] [pid:18748] QEMU: This is probably because your SMT is enabled.
[2024-03-29T09:37:43.496679+01:00] [debug] [pid:18748] QEMU: VCPU can only run on primary threads with all secondary threads offline.
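
To check whether another job failed in the same way, its log can be searched for the message quoted above. This is only a minimal sketch: the function name log_has_smt_failure is hypothetical and not part of openQA, and the log path is passed in by the caller.

```shell
# Succeeds (exit status 0) if the given log contains the SMT-related
# QEMU failure message from this ticket, fails otherwise.
# $1: path to a log file such as a downloaded autoinst-log.txt
# Hypothetical helper for illustration, not part of openQA.
log_has_smt_failure() {
    grep -q 'QEMU: This is probably because your SMT is enabled\.' "$1"
}

# Example call, assuming autoinst-log.txt is in the current directory:
#   log_has_smt_failure autoinst-log.txt && echo "SMT failure found"
```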

There is the "smt_off" service https://gitlab.suse.de/openqa/salt-states-openqa/-/blob/master/openqa/worker.sls?ref_type=heads#L263 which is meant to fix the problem regarding SMT. The service was running fine, but I restarted it anyway and then restarted https://openqa.suse.de/tests/13906928#live. The problem still seems to reproduce.

Only diesel is affected, mania and petrol seem fine.
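
Since QEMU complains that secondary threads are not offline, one way to compare diesel with mania and petrol is to list which CPUs are online although they are not the first thread of their core. The following is a rough sketch under assumptions: the function name is hypothetical, the default of 8 threads per core is a guess for these machines, and it assumes that threads of one core are numbered consecutively in sysfs.

```shell
# Print CPUs that are online but are not the first thread of their core.
# $1: sysfs CPU directory (default: /sys/devices/system/cpu)
# $2: hardware threads per core (assumption: 8, adjust for the actual host)
# Hypothetical helper for illustration only.
list_online_secondary_threads() {
    cpu_dir=${1:-/sys/devices/system/cpu}
    threads_per_core=${2:-8}
    for path in "$cpu_dir"/cpu[0-9]*; do
        [ -d "$path" ] || continue
        n=${path##*/cpu}
        # A missing "online" file means the CPU cannot be taken offline,
        # so it is treated as online.
        state=1
        [ -r "$path/online" ] && state=$(cat "$path/online")
        if [ "$state" = "1" ] && [ $((n % threads_per_core)) -ne 0 ]; then
            echo "cpu$n"
        fi
    done
}

# Example call on a worker host:
#   list_online_secondary_threads
```

Empty output would mean that only primary threads are online. On the hosts themselves, `ppc64_cpu --smt` from powerpc-utils reports the SMT state directly.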

Suggestions

  • DONE ssh osd 'sudo salt-key -y -d diesel.qe.nue2.suse.org'
  • DONE ssh diesel.qe.nue2.suse.org 'sed -i 's/qemu_ppc64le,/qemu_ppc64le-poo158266,/' /etc/openqa/workers.ini && systemctl restart openqa-worker-auto-restart@{1..8} && systemctl disable --now salt-minion telegraf'
  • DONE host=openqa.suse.de WORKER=diesel result="result='failed'" comment="label:poo158266" ./openqa-advanced-retrigger-jobs
  • Investigate what is different on diesel vs. mania+petrol. Maybe mania+petrol are also affected but this has not been noticed yet, e.g. because they have not rebooted yet
  • Fix the problem, optionally wait for reported bug
  • Verify
  • Roll back
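
The worker class rename used above to take diesel out of production, and its later reversal, are the same substitution with swapped arguments. Below is a small sketch of that idea; the function name swap_worker_class is hypothetical, and it works on whatever file is passed in, so it can be tried on a copy of workers.ini first.

```shell
# Replace worker class "$2," with "$3," in file "$1" (first match per line).
# Hypothetical helper for illustration; try it on a copy before touching
# /etc/openqa/workers.ini.
swap_worker_class() {
    file=$1 old=$2 new=$3
    tmp=$(mktemp) || return 1
    # Write via a temporary file, then copy back to keep the original inode.
    sed "s/${old},/${new},/" "$file" > "$tmp" && cat "$tmp" > "$file"
    rm -f "$tmp"
}

# Example calls on a copy of the configuration:
#   swap_worker_class workers.ini.copy qemu_ppc64le qemu_ppc64le-poo158266   # quarantine
#   swap_worker_class workers.ini.copy qemu_ppc64le-poo158266 qemu_ppc64le   # roll back
```

As in the ticket's commands, the trailing comma in the pattern is what keeps the substitution from matching a class name that has already been renamed.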

Rollback actions

  • DONE ssh diesel.qe.nue2.suse.org 'sed -i 's/qemu_ppc64le-poo158266,/qemu_ppc64le,/' /etc/openqa/workers.ini && systemctl restart openqa-worker-auto-restart@{1..8} && systemctl enable --now salt-minion telegraf'
  • DONE ssh osd 'sudo salt-key -y -a diesel.qe.nue2.suse.org'
  • ssh root@kerosene.qe.nue2.suse.org 'zypper rl powerpc-utils && zypper -n in powerpc-utils'
  • Revert https://gitlab.suse.de/openqa/salt-pillars-openqa/-/merge_requests/763