[sporadic][openqa-in-openqa] Test openqa_from_git eventually fails because of a timeout waiting for webui service size:M
In this test run https://openqa.opensuse.org/tests/1857293#step/openqa_webui/33 there is a problem. It seems that the server is not responsive:
# Test died: command 'while ! [ -f nohup.out ]; do sleep 1 ; done && grep -qP "Listening at.*(127.0.0.1|localhost)" <(tail -f -n0 nohup.out) ' timed out at openqa//tests/install/openqa_webui.pm line 68.
[warn] [AssetPack] Unable to download https://cdnjs.cloudflare.com/ajax/libs/chosen/1.7.0/chosen.css: Connect timeout Could not find input asset "https://cdnjs.cloudflare.com/ajax/libs/chosen/1.7.0/chosen.css". at /usr/lib/perl5/vendor_perl/5.32.1/Mojolicious/Plugin/AssetPack.pm line 172.
This warning may or may not be the cause of the problem.
- AC 1: The above timeout does not appear again in at least 10 consecutive rounds
- Crosscheck in a passed test whether the asset connect timeout warning also shows up, to avoid following a "red herring"
- DONE: Check if the above download URL from asset definitions can work -> the link https://cdnjs.cloudflare.com/ajax/libs/chosen/1.7.0/chosen.css works
- Try to reproduce locally and use https://progress.opensuse.org/projects/openqatests/wiki/Wiki#Statistical-investigation to gather failure statistics
- Prevent the timeout either at a low level, e.g. during asset preparation, or at a high level, e.g. with a retry within the openQA-in-openQA tests
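The high-level retry suggested in the last point could look roughly like the sketch below. This is only an illustration under assumptions, not the change that was eventually made: `retry_cmd` is a hypothetical helper name, and the attempt count and delay are arbitrary.

```shell
# Hypothetical helper (not part of openQA): run a command up to
# <attempts> times, sleeping <delay> seconds between failed attempts.
# Usage: retry_cmd <attempts> <delay> <command> [args...]
retry_cmd() {
    attempts=$1
    delay=$2
    shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        if "$@"; then
            return 0
        fi
        echo "attempt $i/$attempts failed: $*" >&2
        # Do not sleep after the final attempt.
        if [ "$i" -lt "$attempts" ]; then
            sleep "$delay"
        fi
        i=$((i + 1))
    done
    return 1
}
```

For example, `retry_cmd 3 10 curl --fail --silent --show-error --output /dev/null https://cdnjs.cloudflare.com/ajax/libs/chosen/1.7.0/chosen.css` would probe the asset URL up to three times before giving up. A bounded retry like this only helps if the CDN outage is shorter than the total retry window.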
#2 Updated by okurz over 1 year ago
- Tags set to openqa-in-openqa, sporadic, tests
- Subject changed from [sporadic] Test openqa_from_git eventually fails because of a timeout waiting for webui service to [sporadic][openqa-in-openqa] Test openqa_from_git eventually fails because of a timeout waiting for webui service
#3 Updated by okurz over 1 year ago
- Subject changed from [sporadic][openqa-in-openqa] Test openqa_from_git eventually fails because of a timeout waiting for webui service to [sporadic][openqa-in-openqa] Test openqa_from_git eventually fails because of a timeout waiting for webui service size:M
- Description updated (diff)
- Status changed from New to Workable
#6 Updated by ilausuch over 1 year ago
I found another related problem:
[warn] [AssetPack] Unable to download https://cdn.datatables.net/1.10.16/css/dataTables.bootstrap4.css: Connect timeout Could not find input asset "https://cdn.datatables.net/1.10.16/css/dataTables.bootstrap4.css". at /usr/lib/perl5/vendor_perl/5.32.1/Mojolicious/Plugin/AssetPack.pm line 172.
I did not find any more occurrences in the last 427 entries.
#10 Updated by ilausuch over 1 year ago
The PR is merged. It would be interesting to wait for a CDN failure to verify it, but because this is a sporadic, infrequent problem, it will be very difficult to check in production whether the problem is solved. However, we can check in the future whether a problem shows up in the same place in spite of the retry. For this reason I think this ticket can be resolved.
#11 Updated by okurz over 1 year ago
https://github.com/os-autoinst/openQA/pull/4094 is merged. This makes it immediately applicable for new tests of openQA-in-openQA. I agree that with this we do not need to test any further but can just react to any potential upcoming alerts. But please keep the point "For regressions: A regression fix is provided, flaws in the design, monitoring, process have been considered" from our https://progress.opensuse.org/projects/qa/wiki#Definition-of-DONE in mind.
#12 Updated by ilausuch over 1 year ago
I would like to cover, from my point of view, the points for regressions:
- fix is provided: We can consider this done. The problem comes from the inability to connect to the CDN and fetch the assets. There are some aspects that we cannot control: the accessibility of the CDN and the internet connection. With this PR we introduced a retry, optimistically expecting that the problem will resolve itself.
- flaws in the design:
  - In a meeting we considered including the assets so that they are downloaded in an earlier step. At the beginning of that conversation the idea of including them in the RPM was in the air, but this scenario does not use the RPM, it uses the git repo. In my opinion, including the assets in the repo is not a good idea because, for instance, it would create a frequent need to update these assets. On the other hand, we could use an internal CDN or cache.
  - The reported error is confusing. It should be clearer that asset packaging failed, skipping the subsequent steps and showing a clear error.
- monitoring: I don't think this is applicable here.
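To illustrate the point about a clearer error, here is a minimal sketch of a wait loop that distinguishes "asset download failed" from "server never started listening". It is not taken from the openQA test code. The helper names `startup_state` and `wait_for_webui` are hypothetical, and the sketch assumes that the "Could not find input asset" message in the log means startup has failed for good.

```shell
# Hypothetical helper: classify the web UI startup log.
# Prints one of: listening, asset_failure, pending
startup_state() {
    log=$1
    if grep -qE 'Listening at.*(127\.0\.0\.1|localhost)' "$log" 2>/dev/null; then
        echo listening
    # Assumption: this message means AssetPack gave up and startup failed.
    elif grep -q 'Could not find input asset' "$log" 2>/dev/null; then
        echo asset_failure
    else
        # Also covers the case where the log file does not exist yet.
        echo pending
    fi
}

# Hypothetical helper: poll the log once per second until the server
# listens (0), an asset failure is seen (2), or <deadline> seconds pass (1).
wait_for_webui() {
    log=$1
    deadline=$2
    elapsed=0
    while [ "$elapsed" -lt "$deadline" ]; do
        case $(startup_state "$log") in
            listening)
                return 0 ;;
            asset_failure)
                echo "web UI startup failed: AssetPack could not fetch an asset" >&2
                return 2 ;;
        esac
        sleep 1
        elapsed=$((elapsed + 1))
    done
    echo "web UI did not start listening within ${deadline}s" >&2
    return 1
}
```

Called as `wait_for_webui nohup.out 300`, this would report the asset failure as soon as it appears in the log instead of running into the generic timeout.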