action #123864

closed

[openqa-in-openqa][sporadic] test fails in start_test due to empty response from o3 size:M

Added by okurz about 1 year ago. Updated about 1 year ago.

Status: Resolved
Priority: High
Assignee:
Category: Bugs in existing tests
Target version:
Start date: 2023-02-01
Due date: 2023-02-23
% Done: 0%
Estimated time:
Difficulty:

Description

Observation

openQA test in scenario openqa-Tumbleweed-dev-x86_64-openqa_install+publish@64bit-2G fails in start_test due to empty response from internal openQA instance

Test suite description

Maintainer: okurz@suse.de. Test for installation of openQA itself. To be used with "openqa" distri. Publishes a qcow2 image including the openQA installation, ready to run as an appliance.

Reproducible

Fails since (at least) Build :TW.16820 (current job)
https://openqa.opensuse.org/tests/3086954#comments shows that this is a sporadic issue.

Expected result

Last good: :TW.16819 (or more recent)

Suggestions

  • Catch the error if an API query finds no jobs. Maybe set -u in bash is already enough to make the script fail when it uses an unset variable?
  • Right now the test module looks at openQA assets to find the "most recent Tumbleweed build" and then queries jobs for that build. However, assets can be created and registered before all openQA jobs are scheduled, so this approach can fail. Find a better way in https://github.com/os-autoinst/os-autoinst-distri-openQA/blob/master/tests/osautoinst/start_test.pm to identify the latest job in a scenario and just clone that.
  • If the above fails, consider adding a retry with waiting in between
  • Ensure we're not waiting too long, e.g. avoid waiting a day and spawning more jobs in between
  • The issue doesn't seem to be reproducible reliably
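
A minimal sketch of the first two suggestions, in Python rather than the Perl/bash used by the actual test module. It picks the newest job from the records returned by a job query and fails loudly when the query returned nothing. The names latest_job_id and NoCandidateJobs are hypothetical, and the sketch assumes that each job record carries an integer "id" and that newer jobs have higher ids.

```python
from typing import Any, Iterable, Mapping


class NoCandidateJobs(RuntimeError):
    """Raised when a job query yields nothing that could be cloned."""


def latest_job_id(jobs: Iterable[Mapping[str, Any]]) -> int:
    """Return the id of the newest job, or fail if there are no jobs.

    Assumption: every record has an "id" field and ids grow over time,
    so the highest id belongs to the most recently created job.
    """
    ids = [int(job["id"]) for job in jobs]
    if not ids:
        # Fail explicitly instead of passing an empty value on to later steps.
        raise NoCandidateJobs("job query returned no candidates")
    return max(ids)
```

Raising a dedicated exception plays the same role as set -u would in bash: an empty result stops the run at the point where it occurs instead of surfacing later as a confusing follow-up failure.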

Further details

Always latest result in this scenario: latest

Actions #1

Updated by mkittler about 1 year ago

Looks like no subsequent tests ran into the issue again (and there were a lot of them). The only failure is https://openqa.opensuse.org/tests/3099784#step/start_test/13 but it ran into a timeout instead of getting an empty reply.

Unfortunately the logs don't give any obvious insight into why this API call didn't return any jobs.

Actions #2

Updated by mkittler about 1 year ago

  • Subject changed from [openqa-in-openqa][sporadic] test fails in start_test due to empty response from internal openQA instance to [openqa-in-openqa][sporadic] test fails in start_test due to empty response from o3

… due to empty response from internal openQA instance

It is actually querying o3 here, not an internally spawned openQA instance. The command is clearly openqa-cli api --host http://openqa.opensuse.org jobs …. So I'm changing the ticket title.

Note that this test finds the latest TW build on o3 by querying assets. Then it attempts to find candidate jobs of that build (matching certain criteria). One of those jobs would then be cloned. Here the build could be found but the query for candidate jobs returned no results. Not sure why that would be the case, though. Maybe the asset had already been registered (and thus showed up in the initial query) but jobs hadn't been (visibly) scheduled yet (and thus the 2nd query returned no results)? That would be possible. Presumably retrying the 2nd query would help then (considering https://openqa.opensuse.org/tests/overview?distri=opensuse&version=Tumbleweed&build=20230131&arch=x86_64 shows that jobs for this build have eventually been scheduled).
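
To illustrate the retry idea: a minimal Python sketch, not the code from the actual fix. The function query_with_retry and its parameters are hypothetical. The query argument stands in for whatever performs the second API call and returns a list of job records. The total wait is bounded by (attempts - 1) * delay, so the test cannot hang for a long time while new builds keep arriving.

```python
import time
from typing import Any, Callable, List, Mapping

JobList = List[Mapping[str, Any]]


def query_with_retry(
    query: Callable[[], JobList],
    attempts: int = 5,
    delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> JobList:
    """Call query() until it returns a non-empty list or attempts run out.

    The sleep function is injectable so the logic can be tested without
    actually waiting.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        jobs = query()
        if jobs:
            return jobs
        if attempt < attempts:
            # Give the scheduler time to register jobs for the new build.
            sleep(delay)
    raise RuntimeError(f"no candidate jobs after {attempts} attempts")
```

With the defaults above the worst case is four waits of 30 seconds each, i.e. two minutes, before the test fails with a clear message.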

Actions #3

Updated by livdywan about 1 year ago

  • Subject changed from [openqa-in-openqa][sporadic] test fails in start_test due to empty response from o3 to [openqa-in-openqa][sporadic] test fails in start_test due to empty response from o3 size:M
  • Description updated (diff)
  • Status changed from New to Workable

Actions #4

Updated by mkittler about 1 year ago

  • Status changed from Workable to In Progress
  • Assignee set to mkittler

Actions #5

Updated by mkittler about 1 year ago

Draft PR (still need to do a verification run): https://github.com/os-autoinst/os-autoinst-distri-openQA/pull/107

Actions #6

Updated by openqa_review about 1 year ago

  • Due date set to 2023-02-23

Setting due date based on mean cycle time of SUSE QE Tools

Actions #7

Updated by mkittler about 1 year ago

  • Status changed from In Progress to Feedback

I did a verification run. The PR should be good to merge now.

Actions #8

Updated by mkittler about 1 year ago

  • Status changed from Feedback to Resolved

We currently have other issues with the openQA-in-openQA test, but I haven't seen this issue again.
