coordination #9966

open

coordination #102915: [saga][epic] Automated classification of failures

[epic] Be more robust about spurious errors

Added by okurz over 8 years ago. Updated 2 months ago.

Status: New
Priority: Low
Assignee: -
Category: Feature requests
Target version:
Start date: 2015-12-18
Due date:
% Done: 0%
Estimated time:

Description

user story

As a tester reviewing failed tests in flaky environments (involving network, timeouts, reviewing webpages), I want flaky tests to be retried automatically so that they do not cause false positives in test results.

acceptance criteria

  • a test with spurious errors that is normally restarted by hand is restarted automatically
  • a test that only passes after a retry is reported as a "soft fail" (or another state that is neither "fail" nor "passed") instead of the "fail" it would be without this change
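
The two criteria above can be illustrated with a minimal sketch. It is not openQA code: the function `run_with_retry`, the `max_attempts` parameter, and the result strings are hypothetical names chosen for illustration. The idea is that a test which passes only after a retry is reported in a distinct state rather than as a plain pass or fail.

```python
from typing import Callable

# Hypothetical result names, loosely following the states named in this ticket.
PASSED = "passed"
SOFTFAILED = "softfailed"
FAILED = "failed"


def run_with_retry(test: Callable[[], bool], max_attempts: int = 3) -> str:
    """Run `test` up to `max_attempts` times and classify the outcome.

    A pass on the first attempt is a normal pass. A pass on a later
    attempt is reported as a soft fail so the retry stays visible.
    If every attempt fails, the result is a regular fail.
    """
    for attempt in range(1, max_attempts + 1):
        if test():
            return PASSED if attempt == 1 else SOFTFAILED
    return FAILED


def make_flaky(failures_before_pass: int) -> Callable[[], bool]:
    """Build a fake test that fails a fixed number of times, then passes."""
    remaining = [failures_before_pass]

    def test() -> bool:
        if remaining[0] > 0:
            remaining[0] -= 1
            return False
        return True

    return test


if __name__ == "__main__":
    print(run_with_retry(make_flaky(0)))   # stable test
    print(run_with_retry(make_flaky(1)))   # passes on the second attempt
    print(run_with_retry(make_flaky(10)))  # never passes within the limit
```

Keeping the retry count bounded means a genuinely broken test still ends up as a "fail" after a limited amount of extra run time.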

implementation ideas

  • the "retry" behaviour should happen at as low a level as possible to save testing time, while still being able to report the retry as a "soft fail"
  • For every failed job that has neither a label nor a bugref, retrigger it a few times to see whether the failure is sporadic within the same scenario. This is like the rescheduling done for incomplete jobs, but applied to failed ones
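
The second idea can be sketched as a simple decision function. This is only an illustration under stated assumptions: the `Job` class, its fields, and the `max_retriggers` limit are hypothetical and do not reflect the actual openQA data model or API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Job:
    """Hypothetical, simplified view of a finished job."""
    result: str                                        # e.g. "passed", "failed", "incomplete"
    labels: List[str] = field(default_factory=list)    # review labels attached to the job
    bugrefs: List[str] = field(default_factory=list)   # bug references attached to the job
    retriggers: int = 0                                # how often this scenario was already retriggered


def should_retrigger(job: Job, max_retriggers: int = 2) -> bool:
    """Decide whether a job should be retriggered automatically.

    Only failed jobs without any label or bugref qualify, and only
    while the (assumed) retrigger limit has not been reached.
    """
    if job.result != "failed":
        return False
    if job.labels or job.bugrefs:
        # Someone already reviewed this failure, so do not retry it.
        return False
    return job.retriggers < max_retriggers


if __name__ == "__main__":
    print(should_retrigger(Job(result="failed")))
    print(should_retrigger(Job(result="failed", bugrefs=["bsc#0000000"])))
    print(should_retrigger(Job(result="failed", retriggers=2)))
```

The bug reference in the usage example is a made-up placeholder. A limit on retriggers keeps a consistently failing scenario from being rescheduled forever.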

further details

reasoning

See IRC conversation:

<okurz> ancor: wait, shouldn't we try to reproduce it locally, maybe?
<ancor> okurz: I'm taking a look to the logs now, but it wouldn't be the first spurious error observed in openQA
<okurz> ancor: yes, I know but I like spurious errors to be better handled instead of just "let's retry, waste some time/build cycles, and see if it happens again" :-)
<ancor> okurz: I have been usually told than CPU cycles are cheaper than developer ones :-)
<ancor> machines don't get bored :-)
<okurz> ancor: of course you can restart but let's take it as good intentions for next year to handle spurious errors better :-) of course build cycles are cheaper but I am thinking about a better automatic spurious error detection, e.g. "retry if canditate for spurious"
<okurz> ancor: would waste even more build cycles but actually save more develper hours
<okurz> ancor: so I would actually like to optimize the time we need to detect spurious errors
...
<ancor> okurz: anyway, looking at the logs. It really looks bad "Subprocess failed. Error: RPM failed: error: rpmdb: fsync: Read-only file system"
...

Related issues 2 (1 open, 1 closed)

Related to openQA Project - action #13242: WDYT: For every job that does not have a label or bugref, retrigger some times to see if it's sporadic. Like rescheduling on incomplete but on failed - Rejected - okurz - 2016-11-25

Copied to openQA Project - action #155731: [brainstorm] Be more robust about spurious errors - New

Actions #1

Updated by RBrownSUSE over 8 years ago

  • Checklist item changed from to [ ] SLE, [ ] Leap, [ ] TW
Actions #2

Updated by okurz over 7 years ago

  • Category set to Enhancement to existing tests
Actions #3

Updated by asmorodskyi almost 7 years ago

  • Subject changed from Be more robust about spurious errors to [tools] Be more robust about spurious errors
Actions #4

Updated by okurz over 4 years ago

  • Checklist item changed from [ ] SLE, [ ] Leap, [ ] TW to
  • Project changed from openQA Tests to openQA Project
  • Subject changed from [tools] Be more robust about spurious errors to Be more robust about spurious errors
  • Category changed from Enhancement to existing tests to Feature requests
  • Priority changed from Normal to Low
  • Target version set to future
Actions #5

Updated by okurz over 4 years ago

  • Related to action #13242: WDYT: For every job that does not have a label or bugref, retrigger some times to see if it's sporadic. Like rescheduling on incomplete but on failed added
Actions #6

Updated by okurz over 4 years ago

  • Description updated (diff)
Actions #7

Updated by okurz 3 months ago

  • Target version changed from future to Ready
Actions #8

Updated by okurz 2 months ago

  • Tracker changed from action to coordination
  • Subject changed from Be more robust about spurious errors to [epic] Be more robust about spurious errors
  • Target version changed from Ready to future
  • Parent task set to #102915
Actions #9

Updated by okurz 2 months ago

  • Copied to action #155731: [brainstorm] Be more robust about spurious errors added