action #96010 (closed)
[qem] test fails in hawk_gui acquiring a lock as the support server ended prematurely after a '503 response: Service Unavailable; URL was http://openqa.suse.de/api/v1/mm/children'
Description
Observation
openQA test in scenario sle-15-SP1-Server-DVD-HA-Incidents-x86_64-qam_ha_hawk_client@64bit fails in hawk_gui
Test suite description
The base test suite is used for job templates defined in YAML documents. It has no settings of its own.
Reproducible
Fails since (at least) Build :20208:fence-agents
Expected result
Last good: :20487:novnc (or more recent)
Further details
Always latest result in this scenario: latest
The test is failing in several steps; the most common is the one shown in this ticket. More research is needed to determine what the problem is.
Updated by okurz about 3 years ago
- Project changed from openQA Tests to openQA Project
- Subject changed from [qem] test fails in hawk_gui to [qem] test fails in hawk_gui acquiring a lock as the support server ended prematurely after a '503 response: Service Unavailable; URL was http://openqa.suse.de/api/v1/mm/children'
- Category changed from Bugs in existing tests to Regressions/Crashes
- Status changed from New to In Progress
- Assignee set to okurz
- Priority changed from Normal to High
- Target version set to Ready
Not seen this before, but here is what I see:
[2021-07-26T12:47:42.719 CEST] [debug] Waiting for 3 jobs to finish
[2021-07-26T12:47:43.734 CEST] [debug] get_children: 503 response: Service Unavailable; URL was http://openqa.suse.de/api/v1/mm/children
[2021-07-26T12:47:43.734 CEST] [debug] Waiting for 0 jobs to finish
So it looks like the worker failed to reach osd and treated that as not needing to wait anymore. I will take a deeper look.
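To illustrate the suspected failure mode, a minimal Perl sketch (function and variable names are illustrative, not the actual os-autoinst code): when the HTTP call fails and yields an empty result, the wait loop cannot distinguish "no data" from "no running children" and exits early.

use strict;
use warnings;

# Illustrative sketch of the suspected bug, not the real mmapi code:
# a failed HTTP call yields an empty hash, which looks exactly like
# "all child jobs are done".
sub wait_for_children_buggy {
    my ($get_children) = @_;        # callback: {job_id => state}, or {} on error
    while (1) {
        my $children = $get_children->() // {};
        my $pending  = grep { $_ ne 'done' } values %$children;
        print "Waiting for $pending jobs to finish\n";
        last if $pending == 0;      # a 503 gives {} => premature exit
        sleep 1;
    }
}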
Updated by okurz about 3 years ago
- Due date set to 2021-08-09
- Status changed from In Progress to Feedback
Updated by okurz about 3 years ago
https://github.com/os-autoinst/os-autoinst/pull/1730 is merged; we should wait some days to see if there is any effect.
Updated by okurz about 3 years ago
So in the above PR I added a fix, better logging output, and more test coverage. What could we do about monitoring or our processes?
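The retry idea, as a hedged sketch (the name api_call_with_retries and its parameters are hypothetical; the actual change is in the PR above): retry the API call a few times before giving up, and make the final failure visible to the caller instead of silently returning an empty result.

# Hypothetical retry wrapper sketching the idea of the PR above;
# the real implementation lives in os-autoinst.
sub api_call_with_retries {
    my ($call, $retries, $delay) = @_;
    $retries //= 3;    # attempts before giving up
    $delay   //= 3;    # seconds between attempts
    for my $left (reverse 0 .. $retries - 1) {
        my $res = eval { $call->() };
        return $res if defined $res;
        print "api call failed, retries left: $left of $retries\n";
        sleep $delay if $left;
    }
    return undef;      # caller must treat this as an error, not as "no children"
}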
Updated by okurz about 3 years ago
- Status changed from Feedback to Resolved
No problems identified after deployment. Suggested follow-up improvement: #96191
Updated by dzedro about 3 years ago
- Status changed from Resolved to Feedback
These failures have lately been happening every day.
[2021-09-17T08:06:54.202 CEST] [debug] api_call_2 failed, retries left: 2 of 3
[2021-09-17T08:06:57.208 CEST] [debug] api_call_2 failed, retries left: 1 of 3
[2021-09-17T08:07:00.213 CEST] [debug] api_call_2 failed, retries left: 0 of 3
[2021-09-17T08:07:03.214 CEST] [debug] get_children: 503 response: Service Unavailable; URL was http://openqa.suse.de/api/v1/mm/children
https://openqa.suse.de/tests/7146596
https://openqa.suse.de/tests/7146593
https://openqa.suse.de/tests/7146604
https://openqa.suse.de/tests/7146614
https://openqa.suse.de/tests/7146646
Updated by livdywan about 3 years ago
Looks the same indeed. Tho I'm surprised to see this after 2 months - did this go unnoticed or did it only start happening again? 🤔️
Updated by dzedro about 3 years ago
cdywan wrote:
Looks the same indeed. Tho I'm surprised to see this after 2 months - did this go unnoticed or did it only start happening again? 🤔️
I would say it started to happen again; I didn't see this kind of failure in the last 2 months.
Today there was only one failure, on aggregates: https://openqa.suse.de/tests/7170466
Updated by okurz about 3 years ago
- Copied to action #98940: mmapi calls can still fail despite retries added
Updated by okurz about 3 years ago
- Due date deleted (2021-08-09)
- Status changed from Feedback to Resolved
I will try with longer and more retries in #98940.
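The log above shows three retries about 3 seconds apart, a total window of roughly 9 seconds, which a sustained 503 easily outlasts. A hedged sketch of what "longer and more retries" could look like (illustrative only; the actual change is tracked in #98940) is exponential backoff, so the total wait window grows long enough to ride out a brief outage:

# Illustrative exponential backoff, not the code from #98940.
sub api_call_with_backoff {
    my ($call, %opt) = @_;
    my $tries = $opt{tries} // 5;    # more attempts than before
    my $delay = $opt{delay} // 3;    # seconds, doubled after each failure
    for my $attempt (1 .. $tries) {
        my $res = eval { $call->() };
        return $res if defined $res;
        last if $attempt == $tries;
        print "api call failed (attempt $attempt of $tries), sleeping ${delay}s\n";
        sleep $delay;
        $delay *= 2;                 # 3s, 6s, 12s, 24s => a ~45s total window
    }
    die "api call failed after $tries attempts\n";
}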