action #95995
closed [sporadic][openqa-in-openqa] Test openqa_from_git eventually fails because of a timeout waiting for webui service size:M
Description
Motivation
In this test run https://openqa.opensuse.org/tests/1857293#step/openqa_webui/33 there is a problem. It seems that the server is not responsive:
# Test died: command 'while ! [ -f nohup.out ]; do sleep 1 ; done && grep -qP "Listening at.*(127.0.0.1|localhost)" <(tail -f -n0 nohup.out) ' timed out at openqa//tests/install/openqa_webui.pm line 68.
https://openqa.opensuse.org/tests/1857293/logfile?filename=openqa_webui-openqa_nohup_out.txt shows:
[warn] [AssetPack] Unable to download https://cdnjs.cloudflare.com/ajax/libs/chosen/1.7.0/chosen.css: Connect timeout
Could not find input asset "https://cdnjs.cloudflare.com/ajax/libs/chosen/1.7.0/chosen.css". at /usr/lib/perl5/vendor_perl/5.32.1/Mojolicious/Plugin/AssetPack.pm line 172.
This may or may not be the cause of the problem.
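The failing command first waits for `nohup.out` to exist and then, like `tail -f -n0`, looks only at lines appended afterwards for a `Listening at` message. The step dies when the surrounding timeout expires first. The following is a minimal Python sketch of the same logic, assuming a local log path and a caller-chosen timeout. The helper name `wait_for_webui` and its parameters are hypothetical and not part of the openQA test code:

```python
import re
import time
from pathlib import Path

# Pattern from the test's `grep -qP` (dots escaped here)
LISTENING = re.compile(r"Listening at.*(127\.0\.0\.1|localhost)")


def wait_for_webui(log_path, timeout=60.0, poll=0.5):
    """Return True once a new 'Listening at' line shows up in log_path,
    or False if `timeout` seconds pass first (hypothetical helper)."""
    deadline = time.monotonic() + timeout
    path = Path(log_path)

    # Step 1: wait for the log file to exist (`while ! [ -f nohup.out ]`)
    while not path.exists():
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll)

    with path.open("r") as fh:
        # Step 2: start at the current end of the file, like `tail -n0`
        fh.seek(0, 2)
        partial = ""
        while time.monotonic() < deadline:
            chunk = fh.readline()
            if not chunk:
                time.sleep(poll)  # nothing new yet, poll again
                continue
            partial += chunk
            if not partial.endswith("\n"):
                continue  # incomplete line, wait for the rest
            if LISTENING.search(partial):
                return True
            partial = ""
    return False
```

Because the scan starts at the end of the file, a `Listening at` line written before the check begins is never seen. This matches the behaviour of `tail -n0` in the original command.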
Acceptance Criteria
- AC 1: The above timeout does not appear again in at least 10 consecutive rounds
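AC 1 asks for 10 consecutive clean rounds. With a sporadic failure, a clean streak can also happen by chance, so it helps to know how likely that is. The sketch below assumes independent runs with a constant per-run failure probability `p_fail`. That model is an assumption, and the 5% used below is an illustrative value, not a rate measured in this ticket. Under it, the chance that n runs all pass is (1 - p_fail)^n:

```python
import math


def prob_all_pass(p_fail, runs):
    """Probability that `runs` independent runs all pass although the
    bug still fails each run with probability p_fail."""
    return (1.0 - p_fail) ** runs


def runs_needed(p_fail, confidence=0.95):
    """Smallest number of consecutive passes for which the chance of a
    clean streak despite a failure rate of p_fail is at most 1 - confidence."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_fail))


print(round(prob_all_pass(0.05, 10), 3))  # 0.599
print(runs_needed(0.05))  # 59
```

With a 5% failure rate, 10 clean rounds would still occur about 60% of the time. Roughly 59 clean rounds are needed before such a streak drops below a 5% chance.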
Suggestions
- Crosscheck in a passed test whether the asset connect timeout warning also shows up, so that we do not follow a red herring
- DONE: Check whether the above download URL from the asset definitions works -> the link https://cdnjs.cloudflare.com/ajax/libs/chosen/1.7.0/chosen.css works
- Try to reproduce the problem locally, and use https://progress.opensuse.org/projects/openqatests/wiki/Wiki#Statistical-investigation to collect failure statistics
- Prevent timeout either on low-level, e.g. asset preparation or high-level, e.g. retry within the openQA-in-openQA tests
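The first suggestion, checking whether the connect timeout warning also shows up in passed tests, can be automated by scanning the downloaded `openqa_webui-openqa_nohup_out.txt` logs of passed and failed jobs. The following is a minimal Python sketch. It assumes the logs are available as local text and that the warnings keep the format quoted in this ticket. `failed_asset_downloads` is a hypothetical helper:

```python
import re

# Matches the AssetPack warning format quoted in the nohup.out excerpts
ASSET_WARN = re.compile(r"\[warn\] \[AssetPack\] Unable to download (\S+): (.+)$")


def failed_asset_downloads(log_text):
    """Return (url, reason) pairs for every failed AssetPack download."""
    results = []
    for line in log_text.splitlines():
        match = ASSET_WARN.search(line)
        if match:
            results.append((match.group(1), match.group(2)))
    return results
```

If the same warnings appear in passed jobs, the asset download is probably a red herring. If they appear only in failed jobs, that supports the asset hypothesis.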
Updated by okurz over 3 years ago
- Priority changed from Normal to Urgent
- Target version set to Ready
Updated by okurz over 3 years ago
- Tags set to openqa-in-openqa, sporadic, tests
- Subject changed from [sporadic] Test openqa_from_git eventually fails because of a timeout waiting for webui service to [sporadic][openqa-in-openqa] Test openqa_from_git eventually fails because of a timeout waiting for webui service
Updated by okurz over 3 years ago
- Subject changed from [sporadic][openqa-in-openqa] Test openqa_from_git eventually fails because of a timeout waiting for webui service to [sporadic][openqa-in-openqa] Test openqa_from_git eventually fails because of a timeout waiting for webui service size:M
- Description updated (diff)
- Status changed from New to Workable
Updated by kraih over 3 years ago
Looking through the next and previous results (which are all green), I did not see any traces of other asset download errors.
Updated by ilausuch over 3 years ago
I found another related problem:
https://openqa.opensuse.org/tests/1831843/logfile?filename=openqa_webui-openqa_nohup_out.txt
[warn] [AssetPack] Unable to download https://cdn.datatables.net/1.10.16/css/dataTables.bootstrap4.css: Connect timeout
Could not find input asset "https://cdn.datatables.net/1.10.16/css/dataTables.bootstrap4.css". at /usr/lib/perl5/vendor_perl/5.32.1/Mojolicious/Plugin/AssetPack.pm line 172.
I didn't find any more occurrences in the last 427 entries.
Updated by ilausuch over 3 years ago
- Status changed from Workable to In Progress
- Assignee set to ilausuch
Updated by openqa_review over 3 years ago
- Due date set to 2021-08-12
Setting due date based on mean cycle time of SUSE QE Tools
Updated by ilausuch over 3 years ago
This is the PR https://github.com/os-autoinst/openQA/pull/4094
Updated by ilausuch over 3 years ago
The PR is merged. It would be interesting to wait for a CDN failure to verify the fix, but because this is a sporadic, infrequent problem, it will be very difficult to confirm in production that it is solved. What we can do is check whether a problem shows up in the same place again in the future despite the retry. For this reason I think this ticket can be resolved.
Updated by okurz over 3 years ago
https://github.com/os-autoinst/openQA/pull/4094 is merged. This makes it immediately effective for new openQA-in-openQA tests. I agree that with this we do not need to test any further but can just react to any upcoming alerts. But please keep in mind the point "For regressions: A regression fix is provided, flaws in the design, monitoring, process have been considered" from our https://progress.opensuse.org/projects/qa/wiki#Definition-of-DONE.
Updated by ilausuch over 3 years ago
From my point of view, this is how the regression points are covered:
- fix is provided: Yes, we can consider it provided. The detected problem comes from the inability to connect to the CDN and fetch the assets. Some aspects are outside our control: the availability of the CDN and the internet connection. With this PR we introduced a retry, optimistically expecting the problem to resolve itself.
- flaws in the design:
- In a meeting we considered including the assets to ensure that they are downloaded in an earlier step. Early in the discussion the idea of including them in the RPM came up, but this test does not use the RPM; it uses the git repository. In my opinion, including the assets in the repository is not a good idea because, for instance, it would create a frequent need to update them. On the other hand, we could use an internal CDN or cache.
- The error described is confusing: it should state clearly that asset packaging failed, skip the subsequent steps, and show a clear error message.
- monitoring: I don't think this is applicable here.
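PR 4094 changes openQA's Perl asset handling, and its code is not reproduced here. The following is a generic Python sketch of the two ideas above: retry a transient CDN download, and if it still fails, stop with an error that says plainly that asset preparation failed. `fetch`, `fetch_with_retry` and `AssetDownloadError` are hypothetical names. The exponential backoff between attempts is an assumption that is not stated in this ticket:

```python
import time


class AssetDownloadError(RuntimeError):
    """Raised when an asset still cannot be fetched after all retries."""


def fetch_with_retry(fetch, url, attempts=3, delay=1.0, backoff=2.0, sleep=time.sleep):
    """Call fetch(url) up to `attempts` times, waiting between tries.

    On final failure, raise an error that names the asset explicitly,
    so the caller can stop instead of continuing into later steps."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    last_error = None
    wait = delay
    for attempt in range(1, attempts + 1):
        try:
            return fetch(url)
        except OSError as err:  # covers TimeoutError, e.g. connect timeouts
            last_error = err
            if attempt < attempts:
                sleep(wait)
                wait *= backoff
    raise AssetDownloadError(
        f"asset preparation failed: could not download {url} "
        f"after {attempts} attempts ({last_error})"
    ) from last_error
```

The retry covers the part we cannot control (CDN reachability), while the explicit error addresses the confusing failure mode: the test would fail at asset preparation with a clear message instead of timing out later while waiting for the webui.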
Updated by okurz over 3 years ago
- Status changed from In Progress to Resolved
I agree. Additionally you as a member of the team found and reported the issue which means that the user impact is not applicable here so I guess we are good.