we are trying to debug openQA multi-machine tests on openqaworker-arm21+arm22 and struggle to come up with a manual qemu command line that makes the machine boot. I tried e.g. /usr/bin/qemu-system-aarch64 -device virtio-gpu-pci,edid=on,xres=1024,yres=768 -m 2048 -machine virt,gic-version=host -cpu host -mem-prealloc -mem-path /dev/hugepages/ -netdev user,id=qanet0 -device virtio-net,netdev=qanet0,mac=52:54:00:19:34:56 -enable-kvm -no-shutdown -vnc :91,share=force-shared
but when connecting over VNC I only see that the guest has not initialized the display yet. Any ideas what's missing?
openqa-clone-job --parental-inheritance --within-instance https://openqa.opensuse.org/tests/3815206 PAUSE_AT=rescuesystem_validate_131 {BUILD,TEST}+=-poo150920 _GROUP=0
rescue_system@aarch64 -> https://openqa.opensuse.org/tests/3816230
and to crosscheck with warm21
openqa-clone-job --parental-inheritance --within-instance https://openqa.opensuse.org/tests/3815206 PAUSE_AT=rescuesystem_validate_131 {BUILD,TEST}+=-poo150920 _GROUP=0 WORKER_CLASS=openqaworker-arm21
Both tests start up just fine but the developer mode actually can not be connected over the webUI, for both of them. Crosschecked with another machine and https://openqa.opensuse.org/tests/3816232#live for openqaworker24 is just fine. Aso both tests are not related to multi-machine and actually don't configure network in the rescue mode those are not helping towards debugging network within the hosts but the question is why the hosts do not allow to reach the developer mode. Well, in the end mkittler fixed it. Likely wrong combinations of os-autoinst-setup-multi-machine and parameters were triggered.
So let's try developer mode and the only real multi-machine scenario on aarch64
openqa-clone-job --parental-inheritance --within-instance https://openqa.opensuse.org/tests/3815268 PAUSE_AT=libzypp_config {BUILD,TEST}+=-poo150920 _GROUP=0 WORKER_CLASS=openqaworker-arm21
-> https://openqa.opensuse.org/tests/3816247
but I didn't follow up on the same day so eventually the job ran into timeout as expected.
In the end mkittler fixed the config at least on warm21 with
firewall-cmd --zone public --remove-interface=eth0
firewall-cmd --zone trusted --add-interface=eth0
I assume that nicksinger executed os-autoinst-setup-multi-machine but did not see (or ignore) error messages about the inability to add eth0 to trusted as it was already in public according to #150920-22 ?
Anyway now the developer mode works fine and so far also I assume multi-machine tests on warm21 so I added back "tap" to warm21:/etc/openqa/workers.ini and I am monitoring jobs.
Due to the new worker class there was a sudden rise of jobs and also jobs ending up incomplete with "Reason: cache failure: Cache service queue already full (5)" which is unfortunate. I now increased the cache size which is at least slightly related and might help. On both warm21+warm22 we have a 6TB NVMe from which we run most mount points including /var/lib/openqa/cache with enough free space so I set
# Limit size of CACHEDIRECTORY to the specified value in GiB (50 GiB by default)
CACHELIMIT = 1000
Multiple multi-machine tests now passed including
so I assume it's ok if we keep tap. Now to warm22.
openqa-clone-job --parental-inheritance --within-instance https://openqa.opensuse.org/tests/3815268 {BUILD,TEST}+=-poo150920 _GROUP=0 WORKER_CLASS=openqaworker-arm22
openqa-clone-job --skip-chained-deps --parental-inheritance --within-instance https://openqa.opensuse.org/tests/3818743 {BUILD,TEST}+=-poo150920 _GROUP=0 WORKER_CLASS=openqaworker-arm22
yep, also fine.
So let's check across both warm21+warm22:
openqa-clone-job --skip-chained-deps --parental-inheritance --within-instance https://openqa.opensuse.org/tests/3818743 {BUILD,TEST}+=-poo150920 _GROUP=0 WORKER_CLASS:ovs-server=openqaworker-arm22 WORKER_CLASS:ovs-client=openqaworker-arm21
Both successful so enabled "tap" for production on openqaworker-arm22 again as well. With that we can resolve here.