action #58289
closed
Huge amount of "Needle file .* not found where expected. Check /var/lib/openqa for distri symlinks" on o3 in /var/log/openqa
Description
Observation
[2019-10-17T01:42:10.0358 UTC] [error] [pid:21333] Needle file /var/lib/openqa/share/42-installation_overview-Staging_Update-20190826.json not found where expected. Check /var/lib/openqa for distri symlinks
[2019-10-17T01:42:10.0364 UTC] [error] [pid:21333] Needle file /var/lib/openqa/share/inst-overview-gnome-leap-20161212.json not found where expected. Check /var/lib/openqa for distri symlinks
[2019-10-17T01:42:10.0371 UTC] [error] [pid:21333] Needle file /var/lib/openqa/share/inst-overview-gnome-leap-20180914.json not found where expected. Check /var/lib/openqa for distri symlinks
…
The needles are in /var/lib/openqa/share/tests/opensuse/products/opensuse/needles and should be looked up there instead of directly under "/var/lib/openqa/share/".
The first occurrence seems to be in /var/log/openqa.1.xz:
[2019-10-17T01:35:53.0168 UTC] [error] [pid:1946] Needle file /var/lib/openqa/share/grub2-TW-virtio-20190303.json not found where expected. Check /var/lib/openqa for distri symlinks
The webUI was restarted at 01:00 UTC, so the worker upgrade is the most likely cause.
Because of this, I received reports from openqa-logwarn trying to post 20 MB emails to o3-admins@suse.de. I am not sure whether there are other impacts.
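To gauge how many distinct needles are affected without reading through the whole log, the messages can be grouped by path. The following is a minimal sketch, assuming the log lines look like the ones quoted above; the function name `summarize_missing_needles` and the regular expression are illustrative and not part of openQA.

```python
import re
from collections import Counter

# Pattern modelled on the log lines quoted in this ticket; the exact
# log format beyond these examples is an assumption.
NEEDLE_ERROR = re.compile(
    r"\[error\] \[pid:(?P<pid>\d+)\] "
    r"Needle file (?P<path>\S+) not found where expected"
)


def summarize_missing_needles(lines):
    """Count how often each needle path is reported as missing."""
    counts = Counter()
    for line in lines:
        match = NEEDLE_ERROR.search(line)
        if match:
            counts[match.group("path")] += 1
    return counts


# Two of the lines from the observation above, used as sample input.
sample = [
    "[2019-10-17T01:42:10.0364 UTC] [error] [pid:21333] Needle file "
    "/var/lib/openqa/share/inst-overview-gnome-leap-20161212.json not found "
    "where expected. Check /var/lib/openqa for distri symlinks",
    "[2019-10-17T01:42:10.0371 UTC] [error] [pid:21333] Needle file "
    "/var/lib/openqa/share/inst-overview-gnome-leap-20180914.json not found "
    "where expected. Check /var/lib/openqa for distri symlinks",
]

for path, count in summarize_missing_needles(sample).most_common():
    print(count, path)
```

For the real log, the same function could be fed an open file handle on /var/log/openqa instead of the sample list.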
Problem
From /var/log/zypp/history:
2019-10-16 01:00:42|install|os-autoinst|4.5.1571127896.7bd3da32-lp151.193.1|x86_64||devel_openQA|1955dc706c08f41bdbe416d015a3913158b5109a5c0c65d079624f47ecbd6f4b|
2019-10-16 01:00:54|install|openQA-worker|4.6.1571122761.67cc75da9-lp151.1919.1|noarch||devel_openQA|3fb5de00ae76ea4012ae21b3f8951c1c15bdaab5059d3c2f0bfc8f01c7353065|
2019-10-17 01:50:51|install|os-autoinst|4.5.1571258068.dd114f84-lp151.195.1|x86_64||devel_openQA|ebb0d827dc41f42b7eff028260f164beaa09045f6e6175f01532867476facd54|
2019-10-17 01:51:00|install|openQA-worker|4.6.1571253176.1a322744e-lp151.1926.1|noarch||devel_openQA|1e3b706f34bfee6253d3cd399c00a20b1bf075f8a2945def74d53e87995f0170|
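The commit ranges to inspect can be read off these history entries. The sketch below assumes the pipe-separated layout seen in the lines above (date, action, package, version) and assumes that the last dot-separated component of the upstream version is the short git hash, which matches the ranges used in this ticket. The helper names `parse_install` and `commit_ranges` are hypothetical.

```python
from collections import defaultdict


def parse_install(line):
    """Return (package, commit) for an 'install' entry, else None."""
    fields = line.strip().split("|")
    if len(fields) < 4 or fields[1] != "install":
        return None
    package, version = fields[2], fields[3]
    # "4.5.1571127896.7bd3da32-lp151.193.1" -> "4.5.1571127896.7bd3da32"
    upstream = version.split("-", 1)[0]
    # Assumption: the last component of the upstream version is the git hash.
    commit = upstream.rsplit(".", 1)[-1]
    return package, commit


def commit_ranges(lines):
    """Map each package to 'first..last' over its recorded installs."""
    commits = defaultdict(list)
    for line in lines:
        parsed = parse_install(line)
        if parsed:
            commits[parsed[0]].append(parsed[1])
    return {
        pkg: f"{seen[0]}..{seen[-1]}"
        for pkg, seen in commits.items()
        if len(seen) > 1
    }


# The four history lines quoted above.
history = [
    "2019-10-16 01:00:42|install|os-autoinst|4.5.1571127896.7bd3da32-lp151.193.1|x86_64||devel_openQA|1955dc706c08f41bdbe416d015a3913158b5109a5c0c65d079624f47ecbd6f4b|",
    "2019-10-16 01:00:54|install|openQA-worker|4.6.1571122761.67cc75da9-lp151.1919.1|noarch||devel_openQA|3fb5de00ae76ea4012ae21b3f8951c1c15bdaab5059d3c2f0bfc8f01c7353065|",
    "2019-10-17 01:50:51|install|os-autoinst|4.5.1571258068.dd114f84-lp151.195.1|x86_64||devel_openQA|ebb0d827dc41f42b7eff028260f164beaa09045f6e6175f01532867476facd54|",
    "2019-10-17 01:51:00|install|openQA-worker|4.6.1571253176.1a322744e-lp151.1926.1|noarch||devel_openQA|1e3b706f34bfee6253d3cd399c00a20b1bf075f8a2945def74d53e87995f0170|",
]

for package, commit_range in commit_ranges(history).items():
    print(package, commit_range)
```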
- os-autoinst: 7bd3da32..dd114f84
$ git log1 --no-merges 7bd3da32..dd114f84
5df73dd6 (okurz/fix/typo) needle: Fix typo 'parrent'
9ce54ebe Use $needle::needles_dir in needle downloader of developer mode
df9256f6 Log data and pool dir when running fullstack test
aabfa1ab Allow loading needles from current working directory
e1fb6561 Improve error handling when parsing needle JSON
0e6da28e Extend architecture.md to cover needle handling
- openQA: 67cc75da9..1a322744e
$ git log1 --no-merges 67cc75da9..1a322744e
916c45f5c PostgreSQL errors can be localized, so just use the name of the unique constraint
ce83ab943 (okurz/enhance/worker_reconnect) worker: Do not treat reconnect attempts as errors but with warning only
8811ad46c (Martchus/prevent-deadlocks) Remove wrong error handling code when sending ws messages
7506e0ae4 Prevent potential deadlocks in scheduler and ws server
81d318dd5 Hide old job templates editor for new groups
3172996fd (kraih/screenshots_resultset) Handle unique constraint correctly
51967db7f Add missing resultset for screenshots and make a few small optimizations
94afcda64 Drop -v flag on test runs and avoid noisy job "name"
e7c3f3cff clone job: Support specifying a port in host URL
I suspect os-autoinst aabfa1ab but it could be openQA 3172996fd as well.
I rolled back the workers with:
for i in aarch64 openqaworker1 openqaworker4 imagetester; do echo $i && ssh root@$i "transactional-update rollback last && reboot"; done
Updated by okurz about 5 years ago
sudo tail -f /var/log/openqa | grep 'not found where expected'
still shows many matches after the worker rollback. So maybe the webUI upgrade is the cause?
zypper in --oldpackage /var/cache/zypp/packages/devel_openQA/noarch/openQA{,-common,-client}-4.6.1571122761.67cc75da9-lp151.1919.1.noarch.rpm
Loading repository data...
Reading installed packages...
Resolving package dependencies...
The following 3 packages are going to be downgraded:
openQA openQA-client openQA-common
3 packages to downgrade.
Overall download size: 2.4 MiB. Already cached: 0 B. After the operation, 1.1 KiB will be freed.
Continue? [y/n/v/...? shows all options] (y): d
The following 3 packages are going to be downgraded:
openQA
4.6.1571253176.1a322744e-lp151.1926.1 -> 4.6.1571122761.67cc75da9-lp151.1919.1 noarch
Plain RPM files cache obs://build.opensuse.org/devel:openQA
openQA-client
4.6.1571253176.1a322744e-lp151.1926.1 -> 4.6.1571122761.67cc75da9-lp151.1919.1 noarch
Plain RPM files cache obs://build.opensuse.org/devel:openQA
openQA-common
4.6.1571253176.1a322744e-lp151.1926.1 -> 4.6.1571122761.67cc75da9-lp151.1919.1 noarch
Plain RPM files cache obs://build.opensuse.org/devel:openQA
… still happening
Updated by okurz about 5 years ago
- Status changed from In Progress to Workable
- Assignee deleted (okurz)
Updated by Guillaume_G about 5 years ago
This may be related: we cannot access any needle from the openQA web interface. Click on the "Screenshot" list on https://openqa.opensuse.org/tests/1059301#step/setup_zdup/4 and no image is listed.
Updated by okurz about 5 years ago
- Status changed from Workable to In Progress
- Assignee set to okurz
@Guillaume_G yes, I think that's the same problem.
I think we found it. Thanks to mkittler and nsinger for the quick mob debug session. This is fun :)
It was the commit I suspected; the rollback on aarch64 was just incomplete.
https://github.com/os-autoinst/os-autoinst/pull/1233 for the revert-fix
The faulty commit changed the needle path that the worker sends to the webUI so that it contains only the file name. As a result, the webUI cannot reference the screenshots correctly anymore.
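The effect can be illustrated with a small sketch. It is not the actual openQA or os-autoinst code: the function `resolve_needle` is a hypothetical stand-in, and the assumption that the webUI resolves the reported path relative to /var/lib/openqa/share is inferred only from the error messages above.

```python
import posixpath

SHARE_DIR = "/var/lib/openqa/share"  # prefix seen in the error messages
# Needle location named in the ticket description, relative to SHARE_DIR.
NEEDLES_DIR = "tests/opensuse/products/opensuse/needles"


def resolve_needle(reported_path, share_dir=SHARE_DIR):
    """Hypothetical lookup: join the reported path onto the share dir."""
    return posixpath.join(share_dir, reported_path)


needle = "inst-overview-gnome-leap-20161212.json"

# Path including the needles directory: resolves to the real location.
with_dir = resolve_needle(posixpath.join(NEEDLES_DIR, needle))

# File name only, as sent after the faulty commit: resolves directly
# under the share directory, matching the path in the error message.
name_only = resolve_needle(needle)

print(with_dir)
print(name_only)
```

With only the file name, the resolved path is the one reported as missing in the log, while the path that includes the needles directory points to where the needles actually are.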
Updated by okurz about 5 years ago
- Related to action #56789: New needles from git repository not working with openqa-clone-custom-git-refspec added
Updated by okurz about 5 years ago
- Status changed from In Progress to Feedback
- Priority changed from Urgent to Normal
With the revert merged, the next nightly update should also be fine. The workers are currently rolled back to the old version, so they are ok as well. Urgency removed. I can check the state of the webUI and the workers again tomorrow and leave the rest of the work to #56789.
Updated by okurz about 5 years ago
- Status changed from Feedback to Resolved
The workers do not exactly look ok, see #58403, but the rest is fine.