action #95995

closed

[sporadic][openqa-in-openqa] Test openqa_from_git eventually fails because of a timeout waiting for webui service size:M

Added by ilausuch almost 3 years ago. Updated over 2 years ago.

Status: Resolved
Priority: Urgent
Assignee:
Category: Regressions/Crashes
Target version:
Start date: 2021-07-26
Due date: 2021-08-12
% Done: 0%
Estimated time:
Description

Motivation

In this test run https://openqa.opensuse.org/tests/1857293#step/openqa_webui/33 there is a problem. It seems that the server is not responsive:

# Test died: command 'while ! [ -f nohup.out ]; do sleep 1 ; done && grep -qP "Listening at.*(127.0.0.1|localhost)" <(tail -f -n0 nohup.out) ' timed out at openqa//tests/install/openqa_webui.pm line 68.

https://openqa.opensuse.org/tests/1857293/logfile?filename=openqa_webui-openqa_nohup_out.txt shows

[warn] [AssetPack] Unable to download https://cdnjs.cloudflare.com/ajax/libs/chosen/1.7.0/chosen.css: Connect timeout
Could not find input asset "https://cdnjs.cloudflare.com/ajax/libs/chosen/1.7.0/chosen.css". at /usr/lib/perl5/vendor_perl/5.32.1/Mojolicious/Plugin/AssetPack.pm line 172.

which may or may not be the cause of the problem.
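To make the failure mode easier to reason about, here is a minimal Python sketch of what the timed-out command does: wait for the log file to exist, then look for a "Listening at" line until a deadline passes. The function name `wait_for_listening` and its parameters are hypothetical and not part of openQA. Unlike `tail -f -n0`, this sketch reads the whole file on every poll, and it escapes the dots in the IP address.

```python
import re
import time
from pathlib import Path

# Same idea as the grep pattern in the failing command, with the dots escaped.
LISTENING = re.compile(r"Listening at.*(127\.0\.0\.1|localhost)")


def wait_for_listening(log_path, timeout=60.0, poll_interval=0.5):
    """Return True once the log contains a 'Listening at' line, False on timeout."""
    deadline = time.monotonic() + timeout
    path = Path(log_path)
    while time.monotonic() < deadline:
        # The file may not exist yet, like the `[ -f nohup.out ]` loop.
        if path.is_file() and LISTENING.search(path.read_text(errors="replace")):
            return True
        time.sleep(poll_interval)
    # The server never announced itself, e.g. because it died during startup.
    return False
```

If the web UI aborts during startup, for example while processing assets, no "Listening at" line is ever written, so a wait like this can only end in a timeout and says nothing about the real cause.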

Acceptance Criteria

  • AC 1: The above timeout does not appear again in at least 10 consecutive rounds

Suggestions

Actions #1

Updated by okurz almost 3 years ago

  • Priority changed from Normal to Urgent
  • Target version set to Ready
Actions #2

Updated by okurz almost 3 years ago

  • Tags set to openqa-in-openqa, sporadic, tests
  • Subject changed from [sporadic] Test openqa_from_git eventually fails because of a timeout waiting for webui service to [sporadic][openqa-in-openqa] Test openqa_from_git eventually fails because of a timeout waiting for webui service
Actions #3

Updated by okurz almost 3 years ago

  • Subject changed from [sporadic][openqa-in-openqa] Test openqa_from_git eventually fails because of a timeout waiting for webui service to [sporadic][openqa-in-openqa] Test openqa_from_git eventually fails because of a timeout waiting for webui service size:M
  • Description updated (diff)
  • Status changed from New to Workable
Actions #4

Updated by okurz almost 3 years ago

  • Description updated (diff)
Actions #5

Updated by kraih almost 3 years ago

Looking through the next and previous results (which are all green), I did not see any traces of other asset download errors.

Actions #6

Updated by ilausuch almost 3 years ago

I found another related problem:
https://openqa.opensuse.org/tests/1831843/logfile?filename=openqa_webui-openqa_nohup_out.txt

[warn] [AssetPack] Unable to download https://cdn.datatables.net/1.10.16/css/dataTables.bootstrap4.css: Connect timeout
Could not find input asset "https://cdn.datatables.net/1.10.16/css/dataTables.bootstrap4.css". at /usr/lib/perl5/vendor_perl/5.32.1/Mojolicious/Plugin/AssetPack.pm line 172.

I didn't find any more in the last 427 entries.

Actions #7

Updated by ilausuch over 2 years ago

  • Status changed from Workable to In Progress
  • Assignee set to ilausuch
Actions #8

Updated by openqa_review over 2 years ago

  • Due date set to 2021-08-12

Setting due date based on mean cycle time of SUSE QE Tools

Actions #10

Updated by ilausuch over 2 years ago

The PR is merged. It would be interesting to wait for a CDN failure to verify it, but because this is a sporadic, infrequent problem, it will be very difficult to check whether it is solved in production. However, we can check whether a problem shows up in the same place in the future in spite of the retry. For this reason I think this ticket can be resolved.

Actions #11

Updated by okurz over 2 years ago

https://github.com/os-autoinst/openQA/pull/4094 is merged. This makes it immediately applicable for new tests of openQA-in-openQA. I agree that with this we do not need to test any further but can just react to any potential upcoming alerts. But please keep the point "For regressions: A regression fix is provided, flaws in the design, monitoring, process have been considered" from our https://progress.opensuse.org/projects/qa/wiki#Definition-of-DONE in mind.

Actions #12

Updated by ilausuch over 2 years ago

I would like to cover, from my point of view, the points for regressions:

  • fix is provided: We can consider that it is. The problem comes from the inability to connect to the CDN and fetch the assets. There are some aspects that we cannot control: the availability of the CDN and the internet connection. With this PR we introduced a retry, being optimistic and expecting that the problem will resolve itself.
  • flaws in the design:
    1. In a meeting we considered bundling the assets so that they are downloaded in an earlier step. At the beginning of that conversation the idea of including them in the RPM was floated, but this test does not use the RPM; it uses the git repo. In my opinion, including the assets in the repo is not a good idea because, for instance, it would create a frequent need to update these assets. On the other hand, we could use an internal CDN or cache.
    2. The reported error is confusing. It should be clearer that asset packaging failed, stopping before the following steps and showing a clear error.
  • monitoring: I don't think this is applicable here.
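
For illustration, the two ideas above (retry optimistically, then fail with a clear error) could be sketched in Python as follows. This is not the code from the PR. `fetch_with_retry`, `AssetDownloadError` and all parameters are hypothetical names, and the `fetch` callable is passed in by the caller so that the sketch makes no assumption about the HTTP client.

```python
import time


class AssetDownloadError(RuntimeError):
    """Raised when an asset cannot be fetched after all attempts (hypothetical)."""


def fetch_with_retry(fetch, url, attempts=3, delay=1.0, sleep=time.sleep):
    """Call fetch(url), retrying on connection errors and timeouts."""
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return fetch(url)
        except OSError as err:  # ConnectionError and TimeoutError are OSErrors
            last_error = err
            if attempt < attempts:
                sleep(delay * attempt)  # linear backoff between attempts
    # Name the failing asset explicitly instead of letting a later step time out.
    raise AssetDownloadError(
        f"Unable to download {url} after {attempts} attempts: {last_error}"
    ) from last_error
```

A caller could pass something like `lambda u: urllib.request.urlopen(u, timeout=10).read()` as `fetch`. The retry only helps with short outages; if the CDN stays unreachable, the explicit error at least points at the asset download rather than at the web UI wait.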
Actions #13

Updated by okurz over 2 years ago

  • Status changed from In Progress to Resolved

I agree. Additionally, you as a member of the team found and reported the issue, which means that user impact is not applicable here, so I guess we are good.
