action #19720

Simplify investigation of job failures

Added by okurz over 2 years ago. Updated 25 days ago.

Status: Workable
Start date: 17/12/2019
Priority: Normal
Due date:
Assignee: -
% Done: 100%
Category: Feature requests
Target version: -
Difficulty:
Duration:

Description

motivation

Make job failure investigation easier to save time and ensure we do not miss failures

ideas

  • provide more data in the job logs themselves for https://progress.opensuse.org/projects/openqav3/wiki#Further-decision-steps-working-on-test-issues
  • See https://github.com/okurz/openqa_review/tree/feature/investigate especially https://github.com/okurz/openqa_review/blob/feature/investigate/openqa_review/investigate.py
  • DONE: provide diff of failed job vs. "last good"
  • DONE: git log or diff for test+needle changes -> gh#os-autoinst/openQA#2566 and gh#os-autoinst/openQA#2609 for test diff, gh#os-autoinst/openQA#2625 for needles

    • list of changed files
  • os-autoinst version in vars.json

  • all package changes, e.g. save rpm -qa in file and provide diff and/or changelog

  • diff of test schedule

  • DONE: exclude context in vars.json diff, distinguish change and add/remove -> gh#os-autoinst/openQA#2625

  • DONE: exclude merges from test git log -> gh#os-autoinst/openQA#2625

  • if best needle candidate matches 0% it is most likely not a trivial needle issue

  • The Investigation tab should use CodeMirror to render diffs like we do for test sources or the YAML editor (from #61103)

  • Make "last good" a link to a job instead of plain job ID

  • collapse content of initial rows in investigation tab when content becomes too big, e.g. more than 10 lines

  • In settings table mark origin of settings and changed settings, e.g. for setting "foo" instead of the table row "foo | 1" one could have

    • "foo | 1 (testsuites table)" when the setting comes from the test suites database table, e.g. as opposed to job templates, machines, etc. This would also help when we allow even more sources for settings, e.g. load job templates from test distributions in parallel to database tables
    • update the settings table from vars.json after the job run to include changes, but then show which settings changed since the job was initially created
    • "foo | 1 (+)" when the setting is new in the scenario, with the table row and/or "(+)" in green (as in common colored diffs) and on hover it shows the explanation that this was added, linked to the commit, showing which job it compares against
    • "foo | 1 (<->)" or similar when the setting changed against "last good" where it was e.g. 0, with "(<->)" being a link to the "last good" job, with the table row in different color
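
The settings markers and the package diff from the ideas above could be prototyped roughly as follows. This is only a sketch under assumptions: `classify_settings`, `package_diff` and the "(-)" marker for removed settings are hypothetical and not part of openQA, and it assumes the vars.json of both jobs has already been loaded into dicts and the saved `rpm -qa` output into lists of lines.

```python
import difflib


def classify_settings(last_good, failed):
    """Label settings that differ between "last good" and the failed job.

    Unchanged settings are left out, so the result carries no context.
    """
    labels = {}
    for key in sorted(set(last_good) | set(failed)):
        if key not in last_good:
            labels[key] = "(+)"    # setting is new in the failed job
        elif key not in failed:
            labels[key] = "(-)"    # hypothetical marker: setting was removed
        elif last_good[key] != failed[key]:
            labels[key] = "(<->)"  # value changed against "last good"
    return labels


def package_diff(last_good_pkgs, failed_pkgs):
    """Unified diff of two `rpm -qa` style listings without context lines."""
    return list(difflib.unified_diff(
        sorted(last_good_pkgs), sorted(failed_pkgs),
        "last_good", "failed", n=0, lineterm=""))


if __name__ == "__main__":
    # Made-up example data, not taken from a real job
    print(classify_settings({"foo": "0", "bar": "x"}, {"foo": "1", "bar": "x", "new": "y"}))
    print("\n".join(package_diff(["bash-5.0-1", "zlib-1.2-3"], ["bash-5.1-1", "zlib-1.2-3"])))
```

Sorting both package lists first keeps the diff stable even if `rpm -qa` returns packages in a different order between runs.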

Subtasks

action #61103: Use CodeMirror to render diffs in the Investigation tab Rejected okurz


Related issues

Related to openQA Project - action #41057: [EPIC] Make reviewing results easier New 14/09/2018
Related to openQA Project - action #60560: Self-investigate potential reasons for failures in openQA Resolved 03/12/2019
Related to openQA Project - action #39719: [epic] Detect "known failures" and mark jobs as such Blocked 23/05/2018 31/12/2020

History

#1 Updated by okurz over 2 years ago

  • Priority changed from Normal to Low

The needle git hash is available in vars.json as well as in the output of autoinst-log.txt. Also, https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/3141 can serve as a nice example of how more information can be extracted from within the SUT by looking into details of y2log and popping up with a record_info with more details, so that one does not need to transcribe screenshots or download the logfiles to look into details.

The following can then help to get further details:

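import requests

# Job whose soft-failure details should be read
test_url = 'https://openqa.suse.de/tests/1033327'
# Step details of the test module as recorded by openQA
details = requests.get(test_url + '/file/details-yast2_lan_restart.json').json()
# File name of the first step whose title marks it as a soft failure
soft_failed_name = [step['text'] for step in details if 'Soft Fail' in step.get('title', '')][0]
# Download that file and print its content
out = requests.get(test_url + '/file/' + soft_failed_name).content
print(out.decode('utf-8'))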

-> # Soft Failure:
bsc#992113

#2 Updated by okurz over 2 years ago

asmorodskyi and I used that to add parsing of soft-failure details to openqa-review, see https://github.com/okurz/openqa_review/commit/44119b13454e5bdb8609f99a21426434134035d0 and https://github.com/okurz/openqa_review/commit/4c4bc937fbf6e85e31eb1f2ca42a34e9603e26d3 for details

#3 Updated by okurz about 2 years ago

  • Assignee deleted (okurz)

Since the last update openqa-review has improved: it now parses soft-fail info boxes as well as soft-fail needles and adds reminder comments on tickets for these cases. Some further ideas are mentioned in the description for anyone to pick up.

#4 Updated by okurz 7 months ago

  • Status changed from In Progress to Workable

back to workable as it's not really "In Progress" now.

#5 Updated by okurz 3 months ago

  • Related to action #41057: [EPIC] Make reviewing results easier added

#6 Updated by okurz 2 months ago

  • Description updated (diff)

#7 Updated by okurz 2 months ago

  • Related to action #60560: Self-investigate potential reasons for failures in openQA added

#8 Updated by okurz 2 months ago

  • Status changed from Workable to Feedback
  • Assignee set to okurz
  • Target version set to Current Sprint

#9 Updated by okurz 2 months ago

  • Related to action #39719: [epic] Detect "known failures" and mark jobs as such added

#10 Updated by okurz 2 months ago

  • Description updated (diff)

#11 Updated by okurz 25 days ago

  • Description updated (diff)
  • Status changed from Feedback to Workable
  • Assignee deleted (okurz)
  • Target version deleted (Current Sprint)

Idea from #61103 included. The last change was the HTML table. Some things are crossed off the list. With this I unassign again.

#12 Updated by okurz 25 days ago

  • Description updated (diff)

#13 Updated by okurz 25 days ago

  • Description updated (diff)
