action #111755

closed

power8 fails to execute jobs successfully, no kvm, but also no sshd auto_review:"(?s)power8.*no kvm-img/qemu-img found":retry

Added by okurz over 2 years ago. Updated over 2 years ago.

Status: Resolved
Priority: Urgent
Assignee:
Category: -
Target version:
Start date: 2022-05-30
Due date:
% Done: 0%
Estimated time:

Description

Observation

Brought up by maxlin in https://matrix.to/#/!ilXMcHXPOjTZeauZcg:libera.chat/$KnzYkYddVbHWHofrb5F3EEl1haJ_imSCFatVCfL1Jp8

https://openqa.opensuse.org/tests/2393439
It looks like qemu was removed by an upgrade gone wrong. If you or anyone else can log in, I suggest looking into the zypper log to see what went wrong and installing the missing packages. Logging in over ssh also fails.
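To follow up on the zypper log, one option is to list what was removed recently. This is a minimal sketch, assuming the pipe-separated layout of /var/log/zypp/history (date|action|name|version|arch|...); the function name removed_packages is made up for illustration.

```shell
# List packages removed according to a zypp history file.
# Assumption: entries look like "date|action|name|version|arch|..." and
# removals use the action "remove" (possibly padded with spaces).
removed_packages() {
    history_file="${1:-/var/log/zypp/history}"
    awk -F'|' '
        /^#/ { next }                      # skip comment lines
        {
            action = $2
            gsub(/[ \t]+/, "", action)     # strip padding around the action
            if (action == "remove") print $1 "  " $3 "-" $4
        }
    ' "$history_file"
}

# Example: removed_packages | grep -i qemu
```

Filtering the output for qemu or openssh should show whether and when the upgrade dropped them.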

Steps to reproduce

Find jobs referencing this ticket with the help of
https://raw.githubusercontent.com/os-autoinst/scripts/master/openqa-query-for-job-label ,
call openqa-query-for-job-label poo#111755

Actions #1

Updated by okurz over 2 years ago

  • Subject changed from power8 fails to execute jobs successfully, no kvm, but also no sshd to power8 fails to execute jobs successfully, no kvm, but also no sshd auto_review:"(?s)power8.*no kvm-img/qemu-img found":retry
  • Description updated (diff)

Reinstalled openssh-server and enabled the service; could log in over ssh again. Installed qemu-tools. Added auto-review regex.
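The regex in the subject starts with (?s) so that "." also matches newlines, because "power8" and the error message can sit on different lines of the job output. A small sketch for trying the pattern locally, assuming GNU grep with PCRE support; the function name and the sample text in the comment are hypothetical, not taken from a real job.

```shell
# Check whether text on stdin matches the auto_review regex from the subject.
# Assumptions: GNU grep built with PCRE support (-P); -z makes grep read the
# whole input as one record so that "(?s)" lets "." match across newlines.
matches_auto_review() {
    grep -Pzq '(?s)power8.*no kvm-img/qemu-img found'
}

# Hypothetical log excerpt:
# printf 'worker: power8\n...\nno kvm-img/qemu-img found\n' | matches_auto_review && echo retry
```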

Actions #2

Updated by okurz over 2 years ago

zypper has a problem with the ppc repo-oss:

# zypper se -t pattern
Problem retrieving files from 'repo-oss'.
Download (curl) error for 'http://download.opensuse.org/ports/ppc/distribution/leap/15.3/repo/oss/repodata/repomd.xml':
Error code: HTTP response: 425
Error message: The requested URL returned error: 425 Too Early

I suspect that the "--force-resolution" option in our openQA upgrade scripting is just too dangerous and we should remove it.

Actions #3

Updated by okurz over 2 years ago

After triggering a reboot the system did not boot and was stuck in petitboot. Well, there was no kernel so that is understandable. I did

for i in dev sys proc ; do mount -o bind /$i /var/petitboot/mnt/dev/sda3/$i; done
ln -sf /etc/resolv.conf /var/petitboot/mnt/dev/sda3/etc
chroot /var/petitboot/mnt/dev/sda3
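The bind mounts stay active after leaving the chroot, so it can help to write down the matching unmount steps up front. A sketch that only prints the commands for a given root so they can be reviewed and run by hand; chroot_plan is a hypothetical helper name.

```shell
# Print the commands to prepare a chroot under "$1" and to undo them afterwards.
# Nothing is executed; the output is meant to be reviewed and run by hand.
chroot_plan() {
    root="${1:?usage: chroot_plan ROOT}"
    for i in dev sys proc; do
        echo "mount -o bind /$i $root/$i"
    done
    echo "chroot $root"
    # Unmount in reverse order once the chroot shell has exited.
    for i in proc sys dev; do
        echo "umount $root/$i"
    done
}

# Example: chroot_plan /var/petitboot/mnt/dev/sda3
```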

Then manually loaded files for the "repo-oss" with something like

curl https://download.opensuse.org/ports/ppc/distribution/leap/15.3/repo/oss/repodata/repomd.xml > /var/cache/zypp/raw/repo-oss/repodata/repomd.xml

and another file, one with a hashsum included, that zypper -n in wget mentioned.
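When repodata files are copied into the cache by hand, a quick integrity check can catch a truncated download. The sketch below assumes the second file was one of the repodata files whose name starts with its checksum, and that this checksum is the SHA-256 of the content; the comment above does not say which file it was, so both points are assumptions, and verify_repodata_file is a hypothetical name.

```shell
# Verify a downloaded repodata file whose name starts with its checksum,
# e.g. "<hash>-primary.xml.gz".
# Assumption: the prefix is the SHA-256 of the file content.
verify_repodata_file() {
    file="$1"
    expected=$(basename "$file" | cut -d- -f1)
    actual=$(sha256sum "$file" | cut -d' ' -f1)
    [ "$expected" = "$actual" ]
}

# Example: verify_repodata_file /var/cache/zypp/raw/repo-oss/repodata/<hash>-primary.xml.gz && echo ok
```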

Then zypper --no-refresh -n in kernel-default worked. Also did zypper --no-refresh -n in -t pattern kvm_server base and triggered a reboot.

Now the system booted up to the point of showing a getty login prompt on IPMI SOL, but wicked seemed not to be installed.
So I did

ip link set dev eth4 up
ip addr add 192.168.112.2/24 dev eth4
ip route add default via 192.168.112.254
echo -e 'search openqanet.opensuse.org
nameserver 192.168.112.100
' >> /etc/resolv.conf
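Appending with >> adds the same entries again each time the command is repeated, for example after another failed boot. A sketch of an idempotent variant; append_once is a hypothetical helper, not part of any existing tooling.

```shell
# Append a line to a file only if that exact line is not already present.
append_once() {
    line="$1"
    file="$2"
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Example with the values used above:
# append_once 'search openqanet.opensuse.org' /etc/resolv.conf
# append_once 'nameserver 192.168.112.100' /etc/resolv.conf
```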

After that installed wicked and reinstalled os-autoinst and openQA to ensure all deps are there. Then openQA jobs immediately started to be picked up. But I rebooted the machine anyway to check if it is stable. Also did failed_since=2022-05-25 worker=power8 ./openqa-advanced-retrigger-jobs.

Now monitoring jobs.

Actions #4

Updated by okurz over 2 years ago

Found more jobs that still failed trying to sync stuff. rsync was there but a dependency was missing. Did zypper -n in --force rsync, which reinstalled dependencies and fixed this. Jobs are fine now, e.g. https://openqa.opensuse.org/tests/2395231#
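The comment does not say which dependency was missing. If it was a shared library, a filter like the following could help spot similar cases for other binaries; it is a sketch that only parses ldd output, and missing_libs is a hypothetical name.

```shell
# Print the names of shared libraries that "ldd" reports as "not found".
# Reads ldd output on stdin so it can be used as a filter.
missing_libs() {
    awk '/=> not found/ { print $1 }'
}

# Example: ldd "$(command -v rsync)" | missing_libs
```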

Also created and merged https://github.com/os-autoinst/openQA/pull/4678 so the same problem shouldn't happen again.

But the problem with HTTP response 425 still happens when I do zypper ref on power8. Asked around:

hi, on the machine power8.openqa.opensuse.org I am seeing

Download (curl) error for 'http://download.opensuse.org/ports/ppc/distribution/leap/15.3/repo/oss/repodata/repomd.xml':
Error code: HTTP response: 425
Error message: The requested URL returned error: 425 Too Early

on zypper ref. A curl of the mentioned file itself looks fine, both locally for me and on the host, but zypper can't read it. Removing the repo with zypper rr and adding it back does not help, neither does switching to https. Does anyone have an idea?

EDIT: Answer by anikitin in https://suse.slack.com/archives/C028VS8TM2B/p1653917247360799?thread_ts=1653916397.654209&cid=C028VS8TM2B, nothing conclusive yet.

Actions #5

Updated by openqa_review over 2 years ago

  • Due date set to 2022-06-14

Setting due date based on mean cycle time of SUSE QE Tools

Actions #6

Updated by okurz over 2 years ago

  • Description updated (diff)
  • Due date deleted (2022-06-14)
  • Status changed from In Progress to Resolved

Clarified with anikitin. He confirmed that the observed behaviour is a bug in the mirror infrastructure code. He applied a workaround and will look into a proper fix eventually. I can confirm that zypper ref works fine now.
