action #123864: [openqa-in-openqa][sporadic] test fails in start_test due to empty response from o3 size:M - openQA Tests (public) - openSUSE Project Management Tool

action #123864

closed

[openqa-in-openqa][sporadic] test fails in start_test due to empty response from o3 size:M

Added by okurz over 2 years ago. Updated over 2 years ago.

Status:

Resolved

Priority:

High

Assignee:

mkittler

Category:

Bugs in existing tests

Target version:

openQA Project (public) - Ready

Start date:

2023-02-01

Due date:

2023-02-23

Description

Observation

openQA test in scenario openqa-Tumbleweed-dev-x86_64-openqa_install+publish@64bit-2G fails in start_test due to empty response from internal openQA instance

Test suite description

Maintainer: okurz@suse.de
Test for installation of openQA itself. To be used with "openqa" distri. Publishes a qcow2 image including the openQA installation ready to run as an appliance.

Reproducible

Fails since (at least) Build :TW.16820 (current job)
https://openqa.opensuse.org/tests/3086954#comments shows that this is a sporadic issue.

Expected result

Last good: :TW.16819 (or more recent)

Suggestions

Catch the error if an API query finds no jobs. Maybe "set -u" in bash is already enough to fail when an unset variable is used?
Right now the test module looks for openQA assets to find the "most recent Tumbleweed build" and then queries jobs for that build. Assets can be created and registered before all openQA jobs are scheduled, so this approach can fail. Find a better way in https://github.com/os-autoinst/os-autoinst-distri-openQA/blob/master/tests/osautoinst/start_test.pm to identify the latest job in a scenario and just clone that.

If the above fails, consider adding a retry with waiting in between.
Ensure we're not waiting too long, e.g. avoid waiting a day and spawning more jobs in between.
The issue doesn't seem to be reliably reproducible.
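
Regarding the "set -u" idea, a minimal sketch of how it behaves when a query returns nothing. The function query_job_id is a hypothetical stand-in for the real API call and is not taken from start_test.pm. A variable assigned from an empty command substitution still counts as set, so "set -u" alone would likely not catch this case, whereas the ${var:?} expansion rejects both unset and empty values:

```shell
#!/bin/sh
# Hypothetical stand-in for an API query that finds no job and prints nothing.
query_job_id() { printf ''; }

# With `set -u`, a variable that holds an empty string is still "set",
# so the expansion succeeds and the empty ID goes unnoticed.
check_with_set_u() (
    set -u
    job_id=$(query_job_id)
    echo "cloning job '$job_id'"
)

# `${var:?message}` aborts the subshell for unset *and* empty values.
check_with_guard() (
    job_id=$(query_job_id)
    echo "cloning job '${job_id:?no job found}'"
)

check_with_set_u && echo "set -u did not notice the empty job ID"
check_with_guard 2>/dev/null || echo "the guard rejected the empty job ID"
```

Both functions use a subshell body so that the abort triggered by the guard does not terminate the calling script.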

Further details

Always latest result in this scenario: latest

Updated by mkittler over 2 years ago

Looks like no subsequent tests ran into the issue again (and there were a lot of them). The only failure is https://openqa.opensuse.org/tests/3099784#step/start_test/13 but it ran into a timeout instead of getting an empty reply.

Unfortunately the logs don't give any obvious insight into why this API call didn't return any jobs.

Updated by mkittler over 2 years ago

Subject changed from [openqa-in-openqa][sporadic] test fails in start_test due to empty response from internal openQA instance to [openqa-in-openqa][sporadic] test fails in start_test due to empty response from o3

… due to empty response from internal openQA instance

It is actually querying o3 here, not an internally spawned openQA instance. The command is clearly openqa-cli api --host http://openqa.opensuse.org jobs …. So I'm changing the ticket title.

Note that this test determines the latest TW build on o3 by querying assets. It then attempts to find candidate jobs of that build (matching certain criteria), and one of those jobs would be cloned. Here the build was found but the query for candidate jobs returned no results. I'm not sure why that would be the case, though. Maybe the asset had already been registered (and thus showed up in the initial query) but the jobs hadn't been (visibly) scheduled yet (and thus the second query returned no results)? That would be possible. Presumably retrying the second query would help then, considering https://openqa.opensuse.org/tests/overview?distri=opensuse&version=Tumbleweed&build=20230131&arch=x86_64 shows that jobs for this build were eventually scheduled.
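
Retrying the second query could look roughly like the sketch below. The helper retry_nonempty and the stand-in fake_job_query are hypothetical names made up for illustration; in the test the retried command would be the openqa-cli job query mentioned above, whose arguments are not reproduced here. The attempt limit and delay bound the total waiting time, so the test cannot end up waiting indefinitely:

```shell
#!/bin/sh
# retry_nonempty MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds with non-empty output, then prints that output.
# Gives up with status 1 after MAX_ATTEMPTS, so the total wait is bounded by
# roughly MAX_ATTEMPTS * DELAY_SECONDS.
retry_nonempty() {
    max_attempts=$1
    delay=$2
    shift 2
    attempt=1
    while :; do
        if output=$("$@") && [ -n "$output" ]; then
            printf '%s\n' "$output"
            return 0
        fi
        if [ "$attempt" -ge "$max_attempts" ]; then
            return 1
        fi
        attempt=$((attempt + 1))
        sleep "$delay"
    done
}

# Hypothetical stand-in for the job query: it prints nothing on the first two
# calls (jobs not scheduled yet) and a result from the third call on.
calls_file=$(mktemp)
fake_job_query() {
    echo x >> "$calls_file"
    if [ "$(grep -c x "$calls_file")" -ge 3 ]; then
        echo "job 42"
    fi
}

# Usage: at most 5 attempts, no delay for the demo (the test would wait longer).
retry_nonempty 5 0 fake_job_query || echo "no candidate jobs found" >&2
rm -f "$calls_file"
```

The call counter lives in a temporary file because each query runs inside a command substitution, where changes to shell variables would be lost.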

Updated by livdywan over 2 years ago

Subject changed from [openqa-in-openqa][sporadic] test fails in start_test due to empty response from o3 to [openqa-in-openqa][sporadic] test fails in start_test due to empty response from o3 size:M
Description updated (diff)
Status changed from New to Workable