action #95995
closed
[sporadic][openqa-in-openqa] Test openqa_from_git eventually fails because of a timeout waiting for webui service size:M
Added by ilausuch over 3 years ago.
Updated over 3 years ago.
Category:
Regressions/Crashes
Description
Motivation
In this test run https://openqa.opensuse.org/tests/1857293#step/openqa_webui/33 there is a problem. It seems that the server is not responsive:
# Test died: command 'while ! [ -f nohup.out ]; do sleep 1 ; done && grep -qP "Listening at.*(127.0.0.1|localhost)" <(tail -f -n0 nohup.out) ' timed out at openqa//tests/install/openqa_webui.pm line 68.
https://openqa.opensuse.org/tests/1857293/logfile?filename=openqa_webui-openqa_nohup_out.txt shows
[warn] [AssetPack] Unable to download https://cdnjs.cloudflare.com/ajax/libs/chosen/1.7.0/chosen.css: Connect timeout
Could not find input asset "https://cdnjs.cloudflare.com/ajax/libs/chosen/1.7.0/chosen.css". at /usr/lib/perl5/vendor_perl/5.32.1/Mojolicious/Plugin/AssetPack.pm line 172.
which may or may not be causing the problem.
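To tell the two cases apart when going through `nohup.out` logs, something like the following could help. It is only a sketch, not part of the test code: the function name `classify_startup_log` and the result strings are made up for illustration, and the "Listening at" pattern is the one from the failing command above with the dots escaped.

```python
import re
from typing import Iterable

# Pattern taken from the grep in openqa_webui.pm, with literal dots escaped.
LISTENING = re.compile(r"Listening at.*(127\.0\.0\.1|localhost)")
# Message printed by AssetPack in the logs quoted in this ticket.
ASSET_FAILURE = re.compile(r'Could not find input asset "([^"]+)"')


def classify_startup_log(lines: Iterable[str]) -> str:
    """Return the first decisive event found in the log lines.

    Hypothetical helper: reports "listening" when the web UI came up,
    "asset failure: <url>" when asset packaging failed, else "pending".
    """
    for line in lines:
        if LISTENING.search(line):
            return "listening"
        match = ASSET_FAILURE.search(line)
        if match:
            return f"asset failure: {match.group(1)}"
    return "pending"
```

Run against the log of the failing job, this would report the asset failure instead of just timing out while waiting for the "Listening at" line.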
Acceptance Criteria
- AC 1: The above timeout does not appear again in at least 10 consecutive rounds
Suggestions
- Priority changed from Normal to Urgent
- Target version set to Ready
- Tags set to openqa-in-openqa, sporadic, tests
- Subject changed from [sporadic] Test openqa_from_git eventually fails because of a timeout waiting for webui service to [sporadic][openqa-in-openqa] Test openqa_from_git eventually fails because of a timeout waiting for webui service
- Subject changed from [sporadic][openqa-in-openqa] Test openqa_from_git eventually fails because of a timeout waiting for webui service to [sporadic][openqa-in-openqa] Test openqa_from_git eventually fails because of a timeout waiting for webui service size:M
- Description updated (diff)
- Status changed from New to Workable
- Description updated (diff)
Looking through the next and previous results (which are all green), I did not see any traces of other asset download errors.
I found another related problem:
https://openqa.opensuse.org/tests/1831843/logfile?filename=openqa_webui-openqa_nohup_out.txt
[warn] [AssetPack] Unable to download https://cdn.datatables.net/1.10.16/css/dataTables.bootstrap4.css: Connect timeout
Could not find input asset "https://cdn.datatables.net/1.10.16/css/dataTables.bootstrap4.css". at /usr/lib/perl5/vendor_perl/5.32.1/Mojolicious/Plugin/AssetPack.pm line 172.
I didn't find any more in the last 427 entries.
- Status changed from Workable to In Progress
- Assignee set to ilausuch
- Due date set to 2021-08-12
Setting due date based on mean cycle time of SUSE QE Tools
The PR is merged. It would be interesting to wait for another CDN failure to verify it, but because this is a sporadic, infrequent problem, it will be very difficult to confirm in production that it is solved. What we can check is whether a problem shows up in the same place in the future in spite of the retry. For this reason I think this ticket can be resolved.
I would like to cover, from my point of view, the points for regressions:
- fix is provided: We can consider that it is. The problem comes from the inability to connect to the CDN and fetch the assets. There are some aspects that we cannot control: the availability of the CDN and the internet connection. With this PR we introduced a retry, optimistically expecting that the problem will resolve itself.
- flaws in the design:
  - In a meeting we considered bundling the assets so that they are downloaded in an earlier step. At the start of that conversation the idea of including them in the RPM was in the air, but this test does not use the RPM; it uses the git repository. In my opinion, including the assets in the repository is not a good idea because, for instance, it would create a frequent need to update them. On the other hand, we could use an internal CDN or cache.
  - The error shown is confusing. It should be clearer that asset packaging failed, stopping before the next steps and showing a clear error.
- monitoring: I don't think this is applicable here.
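For reference, the general idea of the retry can be sketched as follows. This is not the code from the PR, just a minimal illustration of retrying with a growing delay; the function name `retry`, its parameters, and the default values are all assumptions made for this example.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry(
    operation: Callable[[], T],
    attempts: int = 3,
    delay: float = 1.0,
    backoff: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call operation until it succeeds or attempts are used up.

    Hypothetical helper: only OSError (which includes TimeoutError and
    ConnectionError) is retried; the last failure is re-raised unchanged.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except OSError:
            if attempt == attempts:
                raise  # out of attempts, surface the real error
            sleep(delay)  # wait before the next try
            delay *= backoff  # wait longer each time
    raise AssertionError("unreachable")
```

Re-raising the last error keeps the failure visible if the CDN stays unreachable, which also fits the point above about showing a clear error rather than a later timeout.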
- Status changed from In Progress to Resolved
I agree. Additionally, you as a member of the team found and reported the issue, which means that user impact is not applicable here, so I guess we are good.