coordination #9966

Updated by okurz over 4 years ago

## user story 
 As a tester reviewing failed tests in flaky environments (involving network, timeouts, reviewing webpages), I want flaky tests to be retried automatically so that they do not cause false positives. 

 ## acceptance criteria 
 * a test that hits a spurious error and would normally be restarted by hand is restarted automatically 
 * a test that only passes after a retry is reported as a "soft fail" (or another state that is neither "fail" nor "passed") instead of the "fail" it would get without this change 
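
A minimal sketch of how the two criteria above could map to result states, assuming a test can be modelled as a callable that returns `True` on success. The function name `run_with_retry`, the `max_retries` parameter, and the result strings are hypothetical illustrations, not openQA's actual API.

```python
from typing import Callable


def run_with_retry(test: Callable[[], bool], max_retries: int = 2) -> str:
    """Run `test` up to `max_retries` additional times after a failure.

    Returns "passed" if the first attempt succeeds, "softfailed" if a
    later attempt succeeds, and "failed" if every attempt fails.
    All names and result strings here are assumptions for illustration.
    """
    for attempt in range(max_retries + 1):
        if test():
            # A success on a retry is deliberately not reported as a clean pass.
            return "passed" if attempt == 0 else "softfailed"
    return "failed"
```

Reporting a retried success as a separate state keeps the flakiness visible to reviewers instead of hiding it behind a green result.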

 ## implementation ideas 
 * the "retry" behaviour should be implemented at as low a level as possible to save testing time, while still being able to report the retry as a "soft fail" 
 * For every failed job that has neither a label nor a bugref, retrigger it a few times to see whether the failure is sporadic within the same scenario. This is like the existing rescheduling of incomplete jobs, but applied to failed ones 
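
The second idea above can be sketched as a small decision function. This is only an illustration under stated assumptions: the `Job` class, its fields, and the `max_retries` limit are hypothetical and do not reflect openQA's real data model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Job:
    """Hypothetical stand-in for a test job; not openQA's job schema."""
    result: str                   # e.g. "passed", "failed", "incomplete"
    label: Optional[str] = None   # review label set by a human, if any
    bugref: Optional[str] = None  # bug reference attached to the job, if any
    retries: int = 0              # how often this scenario was already retriggered


def should_retrigger(job: Job, max_retries: int = 3) -> bool:
    """Decide whether a failed job is a candidate for an automatic retrigger."""
    if job.result != "failed":
        return False
    # A label or bugref means someone has already reviewed the failure.
    if job.label or job.bugref:
        return False
    # Bound the number of retriggers so a real regression is not retried forever.
    return job.retries < max_retries
```

Bounding the retries trades some extra build cycles for reviewer time, which matches the reasoning in the conversation quoted below.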


 ## further details 
 ### reasoning 
 See the following IRC conversation: 
 ``` 
 <okurz> ancor: wait, shouldn't we try to reproduce it locally, maybe? 
 <ancor> okurz: I'm taking a look to the logs now, but it wouldn't be the first spurious error observed in openQA 
 <okurz> ancor: yes, I know but I like spurious errors to be better handled instead of just "let's retry, waste some time/build cycles, and see if it happens again" :-) 
 <ancor> okurz: I have been usually told than CPU cycles are cheaper than developer ones :-) 
 <ancor> machines don't get bored :-) 
 <okurz> ancor: of course you can restart but let's take it as good intentions for next year to handle spurious errors better :-) of course build cycles are cheaper but I am thinking about a better automatic spurious error detection, e.g. "retry if canditate for spurious" 
 <okurz> ancor: would waste even more build cycles but actually save more develper hours 
 <okurz> ancor: so I would actually like to optimize the time we need to detect spurious errors 
 ... 
 <ancor> okurz: anyway, looking at the logs. It really looks bad "Subprocess failed. Error: RPM failed: error: rpmdb: fsync: Read-only file system" 
 ... 
 ```
