action #96010

[qem] test fails in hawk_gui acquiring a lock as the support server ended prematurely after a '503 response: Service Unavailable; URL was http://openqa.suse.de/api/v1/mm/children'

Added by martinsmac 3 months ago. Updated about 1 month ago.

Status: Resolved
Priority: High
Assignee:
Category: Concrete Bugs
Target version:
Start date: 2021-07-26
Due date:
% Done: 0%
Estimated time:
Difficulty:

Description

Observation

openQA test in scenario sle-15-SP1-Server-DVD-HA-Incidents-x86_64-qam_ha_hawk_client@64bit fails in hawk_gui

Test suite description

The base test suite is used for job templates defined in YAML documents. It has no settings of its own.

Reproducible

Fails since (at least) Build :20208:fence-agents

Expected result

Last good: :20487:novnc (or more recent)

Further details

Always latest result in this scenario: latest

The test is failing in several steps; the most common failure is the one described in this ticket. More research is needed to determine the root cause.


Related issues

Copied to openQA Project - action #98940: mmapi calls can still fail despite retries (Resolved)

History

#1 Updated by okurz 3 months ago

  • Project changed from openQA Tests to openQA Project
  • Subject changed from [qem] test fails in hawk_gui to [qem] test fails in hawk_gui acquiring a lock as the support server ended prematurely after a '503 response: Service Unavailable; URL was http://openqa.suse.de/api/v1/mm/children'
  • Category changed from Bugs in existing tests to Concrete Bugs
  • Status changed from New to In Progress
  • Assignee set to okurz
  • Priority changed from Normal to High
  • Target version set to Ready

I have not seen this before, but here is what I see:

[2021-07-26T12:47:42.719 CEST] [debug] Waiting for 3 jobs to finish
[2021-07-26T12:47:43.734 CEST] [debug] get_children: 503 response: Service Unavailable; URL was http://openqa.suse.de/api/v1/mm/children
[2021-07-26T12:47:43.734 CEST] [debug] Waiting for 0 jobs to finish

So it looks like the worker failed to reach OSD (openqa.suse.de) and treated that as if there were no jobs left to wait for. I will take a deeper look.
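To illustrate the failure mode described above, here is a minimal Python sketch (os-autoinst itself is written in Perl, and this is not its code). The names `ApiError`, `children_to_wait_for_buggy`, `children_to_wait_for`, `wait_for_children` and `scripted_fetch` are hypothetical. It shows why mapping a failed `get_children` request to an empty list is dangerous: the caller can no longer tell an outage apart from "all children finished".

```python
import time
from typing import Callable, Iterable, List, Optional, Union


class ApiError(Exception):
    """Hypothetical error for a failed mm API request, e.g. an HTTP 503."""


Fetch = Callable[[], List[int]]


def children_to_wait_for_buggy(fetch: Fetch) -> List[int]:
    # Flawed pattern: the error is swallowed and becomes an empty list,
    # which the caller reads as "0 jobs left to wait for".
    try:
        return fetch()
    except ApiError:
        return []


def children_to_wait_for(fetch: Fetch) -> Optional[List[int]]:
    # Safer pattern: return None ("unknown") so the caller keeps waiting
    # instead of mistaking an outage for "all children finished".
    try:
        return fetch()
    except ApiError:
        return None


def wait_for_children(fetch: Fetch, max_polls: int,
                      sleep: Callable[[float], None] = time.sleep,
                      interval: float = 1.0) -> bool:
    """Poll until the API reports no running children; False if max_polls runs out."""
    for _ in range(max_polls):
        running = children_to_wait_for(fetch)
        if running is not None and not running:
            return True  # the API answered and no children are left
        sleep(interval)  # children still running, or status unknown: poll again
    return False


def scripted_fetch(responses: Iterable[Union[List[int], Exception]]) -> Fetch:
    """Test double: each call returns the next response, or raises it if it is an exception."""
    it = iter(responses)

    def fetch() -> List[int]:
        item = next(it)
        if isinstance(item, Exception):
            raise item
        return item

    return fetch
```

For example, `wait_for_children(scripted_fetch([ApiError("503"), [1, 2], []]), max_polls=5)` keeps polling through the 503 and only returns `True` once an empty list comes back, whereas `children_to_wait_for_buggy` would have returned `[]` on the very first call.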

#2 Updated by okurz 3 months ago

  • Due date set to 2021-08-09
  • Status changed from In Progress to Feedback

#3 Updated by okurz 3 months ago

https://github.com/os-autoinst/os-autoinst/pull/1730 is merged; we should wait a few days to see whether it has any effect.

#4 Updated by okurz 3 months ago

So in the above PR I added a fix, better logging output and more test coverage. What could we improve in our monitoring or our processes?

#5 Updated by okurz 3 months ago

  • Status changed from Feedback to Resolved

No problems identified after deployment. Suggested follow-up improvement: #96191

#6 Updated by dzedro about 1 month ago

  • Status changed from Resolved to Feedback

These failures are happening "lately every day".

[2021-09-17T08:06:54.202 CEST] [debug] api_call_2 failed, retries left: 2 of 3
[2021-09-17T08:06:57.208 CEST] [debug] api_call_2 failed, retries left: 1 of 3
[2021-09-17T08:07:00.213 CEST] [debug] api_call_2 failed, retries left: 0 of 3
[2021-09-17T08:07:03.214 CEST] [debug] get_children: 503 response: Service Unavailable; URL was http://openqa.suse.de/api/v1/mm/children

https://openqa.suse.de/tests/7146596
https://openqa.suse.de/tests/7146593
https://openqa.suse.de/tests/7146604
https://openqa.suse.de/tests/7146614
https://openqa.suse.de/tests/7146646
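The log lines above come from a fixed-count retry loop. A minimal Python sketch of that pattern follows; `call_with_retries` and its parameters are hypothetical, and the message format only loosely mirrors the log (the exact counting in os-autoinst may differ). With three attempts spaced about 3 s apart, as the timestamps suggest, an outage lasting more than a few seconds still exhausts every attempt, and the error reaches the caller.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(call: Callable[[], T], attempts: int = 3, delay: float = 3.0,
                      sleep: Callable[[float], None] = time.sleep,
                      log: Callable[[str], None] = print) -> T:
    """Try call() up to `attempts` times, sleeping `delay` seconds between tries."""
    for i in range(attempts):
        try:
            return call()
        except Exception:
            left = attempts - i - 1
            log(f"api call failed, retries left: {left} of {attempts}")
            if left == 0:
                # Give up: the caller must treat this as an error,
                # not as "no children to wait for".
                raise
            sleep(delay)
    raise ValueError("attempts must be >= 1")
```

Passing `sleep` and `log` as parameters keeps the sketch testable without real delays; in normal use the defaults (`time.sleep`, `print`) apply.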

#7 Updated by cdywan about 1 month ago

Looks the same indeed. Though I'm surprised to see this after 2 months: did this go unnoticed or did it only start happening again? 🤔️

#8 Updated by dzedro about 1 month ago

cdywan wrote:

Looks the same indeed. Though I'm surprised to see this after 2 months: did this go unnoticed or did it only start happening again? 🤔️

I would say it started to happen again; I didn't see this kind of failure in the last 2 months.
Today there was only one failure on aggregates: https://openqa.suse.de/tests/7170466

#9 Updated by okurz about 1 month ago

  • Copied to action #98940: mmapi calls can still fail despite retries added

#10 Updated by okurz about 1 month ago

  • Due date deleted (2021-08-09)
  • Status changed from Feedback to Resolved

I will try with longer and more retries in #98940
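One common way to get "longer and more retries" is exponential backoff with a cap. The sketch below only computes the delay schedule; `backoff_delays` and the values `attempts=8`, `base=1.0`, `factor=2.0`, `cap=60.0` are illustrative assumptions, not the settings chosen in #98940.

```python
from typing import List


def backoff_delays(attempts: int, base: float = 1.0, factor: float = 2.0,
                   cap: float = 60.0) -> List[float]:
    """Sleep times between `attempts` tries: attempts - 1 gaps, growing by `factor`, capped at `cap`."""
    return [min(cap, base * factor ** i) for i in range(max(attempts - 1, 0))]


if __name__ == "__main__":
    fixed = backoff_delays(3, base=3.0, factor=1.0)  # three tries, 3 s apart
    grown = backoff_delays(8)                        # eight tries, 1 s doubling up to 60 s
    print(fixed, sum(fixed))
    print(grown, sum(grown))
```

Three tries 3 s apart give up after about 6 s of waiting (`[3.0, 3.0]`), while eight capped exponential tries (`[1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 60.0]`) tolerate roughly two minutes (123 s) of unavailability before failing. The cap keeps any single wait bounded so a job does not stall indefinitely on one request.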
