action #111755
closed
power8 fails to execute jobs successfully, no kvm, but also no sshd auto_review:"(?s)power8.*no kvm-img/qemu-img found":retry
Description
Observation
Brought up by maxlin in https://matrix.to/#/!ilXMcHXPOjTZeauZcg:libera.chat/$KnzYkYddVbHWHofrb5F3EEl1haJ_imSCFatVCfL1Jp8
https://openqa.opensuse.org/tests/2393439
It looks like qemu was removed by an upgrade gone wrong. If you or anyone else can log in, I suggest looking into the zypper log to see what went wrong and installing the missing packages. Logging in over ssh is also not possible.
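A minimal sketch of how the zypper log could be checked for removed packages. It assumes that zypper records transactions in /var/log/zypp/history as pipe-separated lines (timestamp, action, package name, ...). The helper name removed_packages and the sample lines are made up for illustration and are not taken from the affected machine.

```shell
# Hypothetical helper: list packages that a zypper history file records as removed.
# Assumes pipe-separated fields: timestamp|action|name|version|arch|...
removed_packages() {
    awk -F'|' '
        /^#/ { next }                     # skip comment lines
        {
            action = $2
            gsub(/ /, "", action)         # the action field may be space-padded
            if (action == "remove") print $3
        }' "$1"
}

# Made-up sample data, only for illustration
sample=$(mktemp)
cat > "$sample" <<'EOF'
# comment line
2022-05-25 03:00:01|install|example-a|1.0-1|ppc64le||repo-oss|
2022-05-25 03:00:02|remove |example-b|1.0-1|ppc64le|root@host|
EOF
removed_packages "$sample"
```

On the real host one would call removed_packages /var/log/zypp/history and compare the result against the packages a worker needs.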
Steps to reproduce
Find jobs referencing this ticket with the help of https://raw.githubusercontent.com/os-autoinst/scripts/master/openqa-query-for-job-label ,
call openqa-query-for-job-label poo#111755
Updated by okurz over 2 years ago
- Subject changed from power8 fails to execute jobs successfully, no kvm, but also no sshd to power8 fails to execute jobs successfully, no kvm, but also no sshd auto_review:"(?s)power8.*no kvm-img/qemu-img found":retry
- Description updated (diff)
Reinstalled openssh-server and enabled the service. Could log in over ssh again. Installed qemu-tools. Added auto-review regex.
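To illustrate what the added regex is meant to catch, here is a small sketch. It assumes the expression is evaluated as a Perl-compatible regex against multi-line failure text; how openQA applies it internally is not described in this ticket. The helper name matches_auto_review and the sample text are hypothetical.

```shell
# Hypothetical helper: succeed if the text on stdin matches the ticket's auto_review regex.
matches_auto_review() {
    # -P: Perl-compatible regex, -z: treat the whole input as one record so that
    # (?s) lets "." span newlines, -q: no output, only the exit status.
    grep -Pzq '(?s)power8.*no kvm-img/qemu-img found'
}

# Made-up failure text, only for illustration
printf 'worker power8:1\nsome other line\nno kvm-img/qemu-img found\n' \
    | matches_auto_review && echo "would be labeled with this ticket and retried"
```

The (?s) modifier matters here because the host name and the error message typically appear on different lines.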
Updated by okurz over 2 years ago
zypper has a problem with the ppc repo-oss:
# zypper se -t pattern
Problem retrieving files from 'repo-oss'.
Download (curl) error for 'http://download.opensuse.org/ports/ppc/distribution/leap/15.3/repo/oss/repodata/repomd.xml':
Error code: HTTP response: 425
Error message: The requested URL returned error: 425 Too Early
I suspect that the "--force-resolution" option in our openQA upgrade scripting is just too dangerous and we should remove it.
Updated by okurz over 2 years ago
After triggering a reboot the system did not boot and was stuck in petitboot. Well, there was no kernel so that is understandable. I did
for i in dev sys proc ; do mount -o bind /$i /var/petitboot/mnt/dev/sda3/$i; done
ln -sf /etc/resolv.conf /var/petitboot/mnt/dev/sda3/etc
chroot /var/petitboot/mnt/dev/sda3
Then manually loaded files for the "repo-oss" with something like
curl https://download.opensuse.org/ports/ppc/distribution/leap/15.3/repo/oss/repodata/repomd.xml > /var/cache/zypp/raw/repo-oss/repodata/repomd.xml
and another one that zypper -n in wget mentioned, with a hashsum included.
Then zypper --no-refresh -n in kernel-default worked. Also did zypper --no-refresh -n in -t pattern kvm_server base and triggered a reboot.
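The manual download into the zypper raw cache can be sketched as a small helper that only prints the curl command for a given repo-relative file, so it can be reviewed before running it inside the chroot. The URL and cache directory are the ones from the commands above; the function name fetch_cmd is hypothetical.

```shell
# Repo URL and raw cache directory as used in this ticket
REPO_URL=https://download.opensuse.org/ports/ppc/distribution/leap/15.3/repo/oss
CACHE_ROOT=/var/cache/zypp/raw/repo-oss

# Hypothetical helper: print the curl command that would place one
# repo-relative file into the raw cache. Nothing is downloaded here.
fetch_cmd() {
    printf 'curl -fsS --create-dirs -o %s/%s %s/%s\n' \
        "$CACHE_ROOT" "$1" "$REPO_URL" "$1"
}

fetch_cmd repodata/repomd.xml
```

Piping the printed line to sh would run it. The second file that zypper asked for would be passed the same way, using the exact relative path from the zypper error message.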
Now the system booted up to the point of showing a getty login prompt on IPMI SOL but wicked seemed not to be installed.
So did
ip link set dev eth4 up
ip addr add 192.168.112.2/24 dev eth4
ip route add default via 192.168.112.254
echo -e 'search openqanet.opensuse.org
nameserver 192.168.112.100
' >> /etc/resolv.conf
After that installed wicked and reinstalled os-autoinst and openQA to ensure all deps are there. Then openQA jobs immediately started to be picked up. But I rebooted the machine anyway to check if it is stable. Also did failed_since=2022-05-25 worker=power8 ./openqa-advanced-retrigger-jobs.
Now monitoring jobs.
Updated by okurz over 2 years ago
Found more jobs that still failed trying to sync stuff. rsync was there but a dependency was missing. Did zypper -n in --force rsync which reinstalled dependencies to fix this. Now jobs are fine, e.g. https://openqa.opensuse.org/tests/2395231#
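A sketch of how such a half-installed tool could be spotted, assuming the missing dependency was a shared library, which this ticket does not state explicitly. The helper name missing_libs is hypothetical.

```shell
# Hypothetical helper: print shared libraries that the dynamic loader cannot
# resolve for a command found in PATH. Prints nothing when everything resolves.
missing_libs() {
    bin=$(command -v "$1") || return 1
    ldd "$bin" 2>/dev/null | awk '/not found/ { print $1 }'
}

missing_libs rsync || echo "rsync is not installed at all"
```

Any library name printed here would point at a package to reinstall, e.g. with zypper -n in --force as done above.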
Also did https://github.com/os-autoinst/openQA/pull/4678 and merged so the same problem shouldn't happen again.
But the problem with HTTP response 425 still happens when I do zypper ref on power8, asking around:
hi, on the machine power8.openqa.opensuse.org I am seeing
Download (curl) error for 'http://download.opensuse.org/ports/ppc/distribution/leap/15.3/repo/oss/repodata/repomd.xml': Error code: HTTP response: 425 Error message: The requested URL returned error: 425 Too Early
on zypper ref. curl of the mentioned file itself looks fine locally for me as well as on the host but zypper can't read it. Removing the repo with zypper rr and adding it back does not help, neither does switching to https. Does anyone have an idea?
- Asked in https://matrix.to/#/!ilXMcHXPOjTZeauZcg:libera.chat/$_iXxEczzPNCLQyg4sJjgMGUgmMfG-stWExmriEXiDkQ?via=libera.chat&via=matrix.org&via=m4u.asia
- https://matrix.to/#/!FlfFztBGbhgNNtDbhN:libera.chat/$ekB-Vue7mpSFGH4oygMZejgNy1cipD_jwE8HMkObvGg?via=libera.chat&via=matrix.org&via=opensuse.org
- https://suse.slack.com/archives/C028VS8TM2B/p1653916397654209
EDIT: Answer by anikitin in https://suse.slack.com/archives/C028VS8TM2B/p1653917247360799?thread_ts=1653916397.654209&cid=C028VS8TM2B, nothing conclusive yet.
Updated by openqa_review over 2 years ago
- Due date set to 2022-06-14
Setting due date based on mean cycle time of SUSE QE Tools
Updated by okurz over 2 years ago
- Description updated (diff)
- Due date deleted (2022-06-14)
- Status changed from In Progress to Resolved
Clarified with anikitin. He confirmed that the observed behaviour is a bug in the mirror infrastructure code. He applied a workaround and will look into a proper fix eventually. I can confirm that zypper ref works fine now.