action #96010

closed

[qem] test fails in hawk_gui acquiring a lock as the support server ended prematurely after a '503 response: Service Unavailable; URL was http://openqa.suse.de/api/v1/mm/children'

Added by martinsmac over 2 years ago. Updated over 2 years ago.

Status:
Resolved
Priority:
High
Assignee:
Category:
Regressions/Crashes
Target version:
Start date:
2021-07-26
Due date:
% Done:
0%

Estimated time:

Description

Observation

openQA test in scenario sle-15-SP1-Server-DVD-HA-Incidents-x86_64-qam_ha_hawk_client@64bit fails in hawk_gui

Test suite description

The base test suite is used for job templates defined in YAML documents. It has no settings of its own.

Reproducible

Fails since (at least) Build :20208:fence-agents

Expected result

Last good: :20487:novnc (or more recent)

Further details

Always latest result in this scenario: latest

The test is failing at several steps; the one referenced in this ticket is the most common. More research is needed to determine what the problem is.


Related issues: 1 (0 open, 1 closed)

Copied to openQA Project - action #98940: mmapi calls can still fail despite retries (Resolved, assignee: okurz)

Actions #1

Updated by okurz over 2 years ago

  • Project changed from openQA Tests to openQA Project
  • Subject changed from [qem] test fails in hawk_gui to [qem] test fails in hawk_gui acquiring a lock as the support server ended prematurely after a '503 response: Service Unavailable; URL was http://openqa.suse.de/api/v1/mm/children'
  • Category changed from Bugs in existing tests to Regressions/Crashes
  • Status changed from New to In Progress
  • Assignee set to okurz
  • Priority changed from Normal to High
  • Target version set to Ready

Not seen before, but here is what I see:

[2021-07-26T12:47:42.719 CEST] [debug] Waiting for 3 jobs to finish
[2021-07-26T12:47:43.734 CEST] [debug] get_children: 503 response: Service Unavailable; URL was http://openqa.suse.de/api/v1/mm/children
[2021-07-26T12:47:43.734 CEST] [debug] Waiting for 0 jobs to finish

So it looks like the worker failed to reach osd and treated that as not needing to wait any longer. I will take a deeper look.
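
To illustrate the failure mode (a minimal sketch only, not the code of the eventual fix): a mm API helper should retry on a 503 and fail loudly once the retries are exhausted, instead of returning an empty child list. The helper name, the omitted API-key handling and the assumed JSON shape are illustrative assumptions.

# Minimal sketch, NOT the actual os-autoinst/openQA implementation.
use strict;
use warnings;
use Mojo::UserAgent;

sub get_children_with_retries {    # hypothetical helper name
    my ($base_url, $retries) = @_;
    $retries //= 3;
    my $ua = Mojo::UserAgent->new;
    for my $attempt (1 .. $retries) {
        my $res = $ua->get("$base_url/api/v1/mm/children")->result;
        # assumption: the endpoint answers with a JSON object holding a "jobs" key
        return $res->json->{jobs} if $res->is_success;
        # log and retry instead of treating the failure as "no children to wait for"
        warn sprintf "get_children: %s response: %s; retries left: %d\n",
            $res->code, $res->message, $retries - $attempt;
        sleep 3;
    }
    die "get_children still failing after $retries attempts\n";
}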

Actions #2

Updated by okurz over 2 years ago

  • Due date set to 2021-08-09
  • Status changed from In Progress to Feedback
Actions #3

Updated by okurz over 2 years ago

https://github.com/os-autoinst/os-autoinst/pull/1730 is merged; we should wait some days to see if there is any effect.

Actions #4

Updated by okurz over 2 years ago

In the above PR I added a fix, better logging output, and more test coverage. What could we do about monitoring or our processes?

Actions #5

Updated by okurz over 2 years ago

  • Status changed from Feedback to Resolved

No problems identified after deployment. Suggested follow-up improvement: #96191

Actions #6

Updated by dzedro over 2 years ago

  • Status changed from Resolved to Feedback

These failures are happening lately every day.

[2021-09-17T08:06:54.202 CEST] [debug] api_call_2 failed, retries left: 2 of 3
[2021-09-17T08:06:57.208 CEST] [debug] api_call_2 failed, retries left: 1 of 3
[2021-09-17T08:07:00.213 CEST] [debug] api_call_2 failed, retries left: 0 of 3
[2021-09-17T08:07:03.214 CEST] [debug] get_children: 503 response: Service Unavailable; URL was http://openqa.suse.de/api/v1/mm/children

https://openqa.suse.de/tests/7146596
https://openqa.suse.de/tests/7146593
https://openqa.suse.de/tests/7146604
https://openqa.suse.de/tests/7146614
https://openqa.suse.de/tests/7146646

Actions #7

Updated by livdywan over 2 years ago

Looks the same indeed. Though I'm surprised to see this after 2 months - did this go unnoticed or did it only start happening again? 🤔️

Actions #8

Updated by dzedro over 2 years ago

cdywan wrote:

Looks the same indeed. Though I'm surprised to see this after 2 months - did this go unnoticed or did it only start happening again? 🤔️

I would say it started to happen again; I did not see this kind of failure in the last 2 months.
Today there was only one failure on aggregates: https://openqa.suse.de/tests/7170466

Actions #9

Updated by okurz over 2 years ago

  • Copied to action #98940: mmapi calls can still fail despite retries added
Actions #10

Updated by okurz over 2 years ago

  • Due date deleted (2021-08-09)
  • Status changed from Feedback to Resolved

I will try with longer and more retries in #98940
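
As a rough illustration of what "longer and more retries" could mean (a sketch only, not the change made in #98940; the names and defaults are made up), a generic wrapper with exponential backoff would outlast a short 503 window much better than three retries a few seconds apart:

# Illustrative sketch only; not taken from openQA or #98940.
use strict;
use warnings;

sub call_with_backoff {
    my ($call, %opt) = @_;
    my $tries = $opt{tries} // 5;
    my $delay = $opt{delay} // 3;
    my $err;
    for my $attempt (1 .. $tries) {
        my $result = eval { $call->() };
        return $result unless $@;
        $err = $@;
        warn "attempt $attempt/$tries failed: $err";
        sleep $delay if $attempt < $tries;
        $delay *= 2;    # 3 s, 6 s, 12 s, 24 s ...
    }
    die "all $tries attempts failed, last error: $err";
}

# usage example: call_with_backoff(sub { get_children() }, tries => 5, delay => 3);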
