Matthias Griessmeier  1 day ago
@Anton Smorodskyi if it is this crucial, would it make sense to establish a fallback somewhere/somehow?

Anton Smorodskyi  1 day ago
We have an ansible playbook that allows setting everything up on any random host within half an hour.

Anton Smorodskyi  1 day ago
Besides the setup process itself, which is fully automated, we have two major time-consuming activities:
1. switching the DNS name from one IP to another
2. actually finding this "random host"

Anton Smorodskyi  1 day ago
I think (1) is fully under the control of SUSE IT; we cannot do more than create a Jira SD ticket and escalate.

Anton Smorodskyi  1 day ago
And you cannot do that beforehand :)

Anton Smorodskyi  1 day ago
For (2) we could book some machine somewhere... but I am afraid there is a high probability that in the event of an emergency this host will be unavailable for some random reason :man-shrugging:

Anton Smorodskyi  1 day ago
So I don't mind doing something about (2), but it won't make ME more confident than I am now :) If it makes someone else more confident, I don't mind :)

Anton Smorodskyi  1 day ago
> we have an ansible playbook that allows setting everything up on any random host within half an hour
CORRECTION: fully automated except for one piece of the puzzle, the cloud provider credentials... For this we have a dedicated project, but currently it can propagate only AWS credentials...

Anton Smorodskyi  1 day ago
Shall we have a call about this? Next week would be better...

Oliver Kurz  24 hours ago
I'd say with a single VM you can only achieve limited availability. If you need higher availability, you need to use high-availability services. But maybe you can improve things a little bit by putting the same A records into more zone files or something?

Anton Smorodskyi  24 hours ago
Long story short: without huge changes to the current algorithm it is simply not possible to run two PCW instances with the same configuration in parallel; it would create a mess...

Anton Smorodskyi  24 hours ago
Also, from memory I don't recall any PCW failure where having a cluster of several PCW instances would have changed anything.

Anton Smorodskyi  24 hours ago
For example, in the case of today's problem a cluster would not have improved the situation :man-shrugging:

Anton Smorodskyi  24 hours ago
But +1 to your idea of propagating this DNS record to several different places!

Matthias Griessmeier  14 hours ago
@Anton Smorodskyi wrt:
> it is pretty severe as it will affect ALL public cloud tests and we are near the time of triggering QAM tests ..
This reads pretty severe, and obviously it can happen from time to time, but what do you think about some automatism to detect issues like this before triggering all tests, and stopping the triggering with a warning? @Oliver Kurz I see this as related to the topic we recently discussed in FC with Santiago, wdyt?

Matthias Griessmeier  14 hours ago
(and I bet there is already a ticket about it)

Oliver Kurz  14 hours ago
I don't see a relation yet. Enlighten me. Given that I am not aware of a good process preventing releases on test coverage decrease or when tests are not triggered, I am not convinced it's a good idea to not trigger certain tests at all. However, what could be done is to make tests more dependent on each other, e.g. have one small and quick cloud smoke test and trigger more downstream tests if the first one is successful.

Matthias Griessmeier  14 hours ago
Yes, that's where I see the relation. E.g. trigger a basic test to check connectivity, and if that fails, don't execute hundreds more which will 99% fail as well.
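A minimal sketch of what such a gate could look like in an openQA job group schedule (products/defaults sections omitted; the product and test suite names are made up for illustration). START_AFTER_TEST is the existing openQA setting for chained job dependencies, so the expensive jobs are skipped when the smoke test fails:

```yaml
# Hypothetical openQA job group YAML: a cheap connectivity check gates the costly runs.
scenarios:
  x86_64:
    sle-15-SP5-EC2-x86_64:
      - publiccloud_smoke            # quick login/connectivity check
      - publiccloud_img_proof:
          settings:
            # chained dependency: only starts after publiccloud_smoke passed,
            # and is marked skipped if it fails
            START_AFTER_TEST: publiccloud_smoke
      - publiccloud_containers:
          settings:
            START_AFTER_TEST: publiccloud_smoke
```

The same effect could presumably also be reached by setting START_AFTER_TEST directly in the downstream test suites' settings rather than in the schedule YAML.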
Matthias Griessmeier  14 hours ago
I agree that we should not lose test coverage by not triggering certain tests, but I also see no point in wasting computing resources and engineering resources on reviewing failed tests that could be foreseen to fail, especially in PC where each run costs more money than a "generic" openQA test. (edited)

Oliver Kurz  12 hours ago
True, true. Though I see that all of that can be resolved simply within the openQA test schedules with the current features, so go ahead :slightly_smiling_face: There might be some feature requests coming up regarding reporting further down the road if such hierarchical schedules are used more.

Liv Dywan  12 hours ago
Reminds me of our conversations about making mm (multi-machine) scenarios fail earlier for similar reasons: less time spent investigating symptoms of infra issues. Having a test that fails early and prevents many more from being scheduled is definitely something we can already do.

Oliver Kurz  12 hours ago
Yes, true as well, but again mostly within the domain of test maintainers.

Matthias Griessmeier  12 hours ago
Yes, I agree, not a topic for qe-tools (for now).

Anton Smorodskyi  10 hours ago
> True, true. Though I see that all of that can be resolved simply within the openQA test schedules with the current features, so go ahead :slightly_smiling_face: There might be some feature requests coming up regarding reporting further down the road if such hierarchical schedules are used more.
Yes, true, it can be done on the test side, BUT doing it that way would overcomplicate the whole setup, because currently we cannot set test dependencies among different flavors. That means that just for PC we would need to set up dozens of such smoke tests across the different combinations of flavor/version/arch (see the sketch at the end of this thread). On the other hand, implementing such a feature in the backend would make it transparent to all tests.

Oliver Kurz  10 hours ago
OK, good point. But if it's not possible to set test dependencies among flavors, what do you need flavors for? Maybe we can find an alternative for that?

Anton Smorodskyi  9 hours ago
That is a really good question, Oli, but it leads to a big discussion which I will be happy to have, just not within this Slack thread ;) A huddle / Google Meet or an in-person meeting would suit better.

Liv Dywan  9 hours ago
Maybe you can keep it simple by starting with one flavor and take it from there. If there are clear limitations we can discuss extending the backend.

Oliver Kurz  9 hours ago
@Anton Smorodskyi Sure, I am happy to meet with you and discuss that. Can you please still create a ticket with just a rough explanation of the problem so that we have a place where we can take notes and such? Feel welcome to copy-paste content from this thread into the ticket for context.
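To illustrate the duplication concern above, a sketch (again with hypothetical names) of how the same smoke-test gate would have to be repeated per flavor, given that, as noted in the thread, test dependencies cannot currently cross flavors:

```yaml
# Hypothetical: with dependencies limited to a single flavor, each
# flavor/version/arch combination needs its own copy of the gate.
scenarios:
  x86_64:
    sle-15-SP5-EC2-x86_64:
      - publiccloud_smoke
      - publiccloud_img_proof:
          settings:
            START_AFTER_TEST: publiccloud_smoke
    sle-15-SP5-Azure-Basic-x86_64:
      - publiccloud_smoke          # duplicated gate for the Azure flavor
      - publiccloud_img_proof:
          settings:
            START_AFTER_TEST: publiccloud_smoke
    # ...and likewise for every other flavor/version/arch combination
```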