[openqa-in-openqa][sporadic] test fails in start_test due to empty response from o3 size:M
openQA test in scenario openqa-Tumbleweed-dev-x86_64-openqa_install+publish@64bit-2G fails in start_test due to empty response from internal openQA instance
Test suite description
Maintainer: email@example.com
Test for installation of openQA itself. To be used with "openqa" distri. Publishes a qcow2 image including the openQA installation ready to run as an appliance.
Fails since (at least) Build :TW.16820 (current job)
https://openqa.opensuse.org/tests/3086954#comments shows that this is a sporadic issue.
Last good: :TW.16819 (or more recent)
- Catch the error if no jobs are found from an API query; maybe just set -u in bash is enough to fail when using variables?
- Right now the test module looks for openQA assets to find the "most recent Tumbleweed build" and then queries jobs. However, assets can very well be created and registered before all openQA jobs are scheduled, so this approach can fail. Find a better way in https://github.com/os-autoinst/os-autoinst-distri-openQA/blob/master/tests/osautoinst/start_test.pm to identify the latest job in a scenario and just clone that.
- If the above fails then consider adding a retry with waiting in between
- Ensure we're not waiting too long, e.g. avoid waiting a day and spawning more jobs in between
- The issue doesn't seem to be reproducible reliably
Always latest result in this scenario: latest
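Regarding the set -u suggestion above, here is a minimal sketch (not the code from start_test.pm; require_job_id is a hypothetical helper made up for illustration). It assumes the query result ends up in a shell variable. set -u only aborts on unset variables, while an empty command substitution leaves the variable set to an empty string, so an explicit emptiness check would still be needed:

```shell
set -u  # aborts on *unset* variables; a variable set to "" is still accepted

# Hypothetical helper: fail loudly when a query produced no job id.
require_job_id() {
    local job_id="$1"
    if [ -z "$job_id" ]; then
        echo "no candidate job found" >&2
        return 1
    fi
    echo "$job_id"
}

# An empty command substitution leaves the variable set (to ""), so
# set -u alone would not stop the script at this point.
empty_result="$(printf '')"
if require_job_id "$empty_result" >/dev/null 2>&1; then
    echo "unexpected: empty result accepted"
else
    echo "empty result rejected"
fi
```

In the real test the argument would be whatever the job query returned; the check itself is independent of how the id is obtained.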
#1 Updated by mkittler about 2 months ago
Looks like no subsequent tests ran into the issue again (and there were a lot of them). The only failure is https://openqa.opensuse.org/tests/3099784#step/start_test/13 but it ran into a timeout instead of getting an empty reply.
Unfortunately the logs don't give any obvious insight into why this API call didn't return any jobs.
#2 Updated by mkittler about 2 months ago
- Subject changed from [openqa-in-openqa][sporadic] test fails in start_test due to empty response from internal openQA instance to [openqa-in-openqa][sporadic] test fails in start_test due to empty response from o3
… due to empty response from internal openQA instance
It is actually querying o3 here, not an internally spawned openQA instance. The command is clearly openqa-cli api --host http://openqa.opensuse.org jobs …, so I'm changing the ticket title.
Note that this test finds the latest TW build on o3 by querying assets. It then attempts to find candidate jobs of that build (matching certain criteria), and a matching job would then be cloned. Here the build could be found but the query for candidate jobs returned no results. Not sure why that would be the case, though. Maybe the asset had already been registered (and thus showed up in the initial query) but the jobs hadn't been (visibly) scheduled yet (and thus the 2nd query returned no results)? That would be possible. Presumably retrying the 2nd query would help then (considering https://openqa.opensuse.org/tests/overview?distri=opensuse&version=Tumbleweed&build=20230131&arch=x86_64 shows that jobs for this build have eventually been scheduled).
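To illustrate the retry idea, a rough sketch of a bounded retry around the 2nd query could look like the following. This is only an assumption about how it might be done, not what the test module does: retry_until_nonempty and fake_job_query are made-up names, and fake_job_query is a local stand-in that involves no network and returns a made-up job id on its third call.

```shell
# Hypothetical helper: run a command until it prints something, at most
# $1 times, sleeping $2 seconds between attempts.
retry_until_nonempty() {
    local max_attempts="$1" delay="$2" attempt=1 result
    shift 2
    while [ "$attempt" -le "$max_attempts" ]; do
        result="$("$@")" || result=""
        if [ -n "$result" ]; then
            printf '%s\n' "$result"
            return 0
        fi
        # Only wait when another attempt follows, so the total wait is bounded.
        if [ "$attempt" -lt "$max_attempts" ]; then
            sleep "$delay"
        fi
        attempt=$((attempt + 1))
    done
    echo "no result after $max_attempts attempts" >&2
    return 1
}

# Stand-in for the real job query: prints nothing on the first two calls,
# then a made-up job id. The call count lives in a file because command
# substitution runs the function in a subshell.
counter_file="$(mktemp)"
echo 0 > "$counter_file"
fake_job_query() {
    local calls
    calls=$(( $(cat "$counter_file") + 1 ))
    echo "$calls" > "$counter_file"
    if [ "$calls" -ge 3 ]; then
        echo "12345"
    fi
}

job_id="$(retry_until_nonempty 5 0 fake_job_query)"
echo "found job $job_id"
rm -f "$counter_file"
```

In the actual test the stand-in would be replaced by the real candidate-job query, and the attempt count and delay would need to be small enough that the test does not wait across newly scheduled builds.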
#3 Updated by cdywan about 2 months ago
- Subject changed from [openqa-in-openqa][sporadic] test fails in start_test due to empty response from o3 to [openqa-in-openqa][sporadic] test fails in start_test due to empty response from o3 size:M
- Description updated (diff)
- Status changed from New to Workable
#4 Updated by mkittler about 2 months ago
- Status changed from Workable to In Progress
- Assignee set to mkittler
#5 Updated by mkittler about 2 months ago
Draft PR (still need to do a verification run): https://github.com/os-autoinst/os-autoinst-distri-openQA/pull/107
#6 Updated by openqa_review about 2 months ago
- Due date set to 2023-02-23
Setting due date based on mean cycle time of SUSE QE Tools
#7 Updated by mkittler about 2 months ago
- Status changed from In Progress to Feedback
I did a verification run. The PR should be good to merge now.
#8 Updated by mkittler about 1 month ago
- Status changed from Feedback to Resolved
We currently have other issues with the openQA-in-openQA test but I haven't seen this issue anymore.