action #29289

Improve handling of needle deprecation and deletion

Added by nicksinger over 2 years ago. Updated 4 months ago.

Status: Workable
Start date: 12/12/2017
Priority: Normal
Due date:
Assignee: -
% Done: 0%
Category: Feature requests
Target version: QA - future
Difficulty:
Duration:

Description

Observation

While checking the latest logwarns I noticed the following behavior in openQA:

[Mon Dec 11 09:27:37 2017] [5339:warn] /var/lib/openqa/share/tests/sle/products/sle/needles/welcome-bsc1058099-20170915.json: No such file or directory
[…]
[Mon Dec 11 09:27:51 2017] [5228:warn] /var/lib/openqa/share/tests/sle/products/sle/needles/welcome-bsc1058099-20170915.json: No such file or directory
[…]
[Mon Dec 11 09:43:56 2017] [11050:warn] /var/lib/openqa/share/tests/sle/products/sle/needles/welcome-bsc1058099-20170915.json: No such file or directory
[Mon Dec 11 09:43:56 2017] [11050:error] Could not parse needle: welcome-bsc1058099-20170915 for sle 15

I guess this happens because somebody viewed a job in the webUI that had used the needle "welcome-bsc1058099-20170915" after that needle had already been deleted.
This code: https://github.com/os-autoinst/openQA/blob/8f30ca8dc7bc0b9bbb48a6507110fa06dd335aff/lib/OpenQA/WebAPI/Controller/Step.pm#L244 then tried to access the needle, hence the warnings and the error message.

Acceptance criteria

  • AC1: Deleted needles do not trigger an error or warning when viewing test details
  • AC2: Needles that were matched during test runs but have been deleted in the meantime are clearly marked as such in the test details

Suggestions

  • Check whether a needle exists on the FS before accessing it
  • Properly handle nonexistent needles (e.g. do not blindly access them if they do not exist)
  • Notify the user somehow that this job used a needle which was deleted meanwhile
    • E.g. show in the needle selection: "99% old_needle_name (deleted meanwhile)"
    • Disable the needle-compare-function if the reference does not exist anymore
  • On top of that, we could keep old needles around while tests are still running (which we effectively do with caching) when a deletion is "requested"
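
The first suggestions above can be sketched as follows. This is only an illustration in Python, not the openQA implementation (the code linked above is Perl); the names NeedleMatch, needle_exists and selection_entry are hypothetical, and the sketch assumes a needle consists of a .json and a .png file with the same base name in one needles directory.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class NeedleMatch:
    """Hypothetical record of a needle match stored with a job result."""
    name: str        # needle name as recorded at test time
    similarity: int  # match percentage recorded at test time


def needle_exists(needles_dir: Path, name: str) -> bool:
    """True only if both the JSON and the PNG of the needle are still on disk."""
    return all((needles_dir / f"{name}.{ext}").is_file() for ext in ("json", "png"))


def selection_entry(needles_dir: Path, match: NeedleMatch) -> dict:
    """Build one entry for the needle selection without opening the needle files."""
    present = needle_exists(needles_dir, match.name)
    label = f"{match.similarity}% {match.name}"
    if not present:
        # Tell the user instead of logging a warning or error.
        label += " (deleted meanwhile)"
    # The compare function is only offered when the reference still exists.
    return {"label": label, "compare_enabled": present}
```

With this shape, a deleted needle produces a labelled, disabled entry rather than a "No such file or directory" warning, which would cover AC1 and AC2 for the needle selection.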

Further details

From nsinger:
For now I'll add these two log messages to the ignore list of logwarn since (99% of the time) they don't indicate a critical error that we have to react to.

From lnussel in #34549 :
I need to retire pattern-minimalx-checked-boo1086058-20180320 and replace it with a correct one.

openQA does not provide a way to do that properly. I'd have to delete it manually from git which would break old test results. Would be better if there was a way to flag the needle so it's no longer used for new matches.
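
One way to picture such a flag, purely as a sketch: the needle file stays in git so old results still render, but a marker excludes it from new matches. The "retired" key in the needle JSON and the function names below are assumptions for illustration; nothing in this ticket says openQA supports such a key.

```python
import json
from pathlib import Path


def is_retired(needle_json: Path) -> bool:
    """Read the hypothetical "retired" flag; a missing flag means the needle is active."""
    data = json.loads(needle_json.read_text())
    return bool(data.get("retired", False))


def needles_for_new_matches(needles_dir: Path) -> list:
    """Names of needles that a newly started job may match against."""
    return sorted(p.stem for p in needles_dir.glob("*.json") if not is_retired(p))
```

Old test results would keep resolving the retired needle by name, while only new jobs skip it.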

A more recent discussion about this:

[13 Jun 2018 09:11:03] <coolo> hands up! how broke tty2-selected needles?
[13 Jun 2018 09:12:09] <okurz> coolo: could be me, link?
[13 Jun 2018 09:12:29] <coolo> okurz: https://openqa.suse.de/group_overview/108 - spot the difference
[13 Jun 2018 09:12:44] <coolo>  delete mode 100644 tty2-selected-20151201.json
[13 Jun 2018 09:12:44] <coolo>  delete mode 100644 tty2-selected-20151201.png
[13 Jun 2018 09:12:49] <coolo> looks related :)
[13 Jun 2018 09:14:19] <okurz> coolo: I can't find the needle tty2-selected-20180613 I created over the webui
[13 Jun 2018 09:14:24] <coolo> so you didn't even do a MR for it?
[13 Jun 2018 09:14:58] <okurz> coolo: it's on osd. It's in a commit but that one has not been pushed
[13 Jun 2018 09:15:08] <okurz> commit df6b97935a364e341f58590dea75486ae99ba704
[13 Jun 2018 09:15:13] <coolo> because you broke the automatic pushing by doing manual commit at the same time 
[13 Jun 2018 09:15:24] <coolo> anyway, if it's on osd it's broken - as breaking QAM
[13 Jun 2018 09:15:51] <okurz> aren't we always commiting "at the same time"?
[13 Jun 2018 09:16:35] <coolo> okurz: not so often from workstations
[13 Jun 2018 09:16:41] <okurz> https://openqa.suse.de/tests/1759358#step/first_boot/3 is where it's used properly. so I guess I broke tests because they were running at the time of merging my delete MR and new needles are only picked up after restarting …
[13 Jun 2018 09:16:42] <c4y> openQA: caasp-2.0-CaaSP-DVD-Incidents-x86_64-Build:7740:file.1528870658-QAM-
[13 Jun 2018 09:16:57] <okurz> hm
[13 Jun 2018 09:17:45] <coolo> but I give your theory a try and restart all fails
[13 Jun 2018 09:18:31] <okurz> ok, pushed as geekotest from osd as workaround. I guess this asks for a long-term feature to improve the handling of how needles reach the system. Can I expect you to handle that? or should I create a ticket with the chat log?
[13 Jun 2018 09:19:19] <coolo> okurz: I'm not following. if you created the needle before you deleted the old one, osd should never had a time where no tty2 needle existed
[13 Jun 2018 09:19:27] <coolo> so I assume you broke it by first deleting and then creating
[13 Jun 2018 09:20:15] <coolo> even though the timestamps of the git commits disagree
[13 Jun 2018 09:20:17] <coolo> but I don't care
[13 Jun 2018 09:23:17] <coolo> http://bugzilla.suse.com/show_bug.cgi?id=1086457#c54 - one more satisfied openqa user :) 
[13 Jun 2018 09:23:17] <c4y> - 1086457: ltp_aiodio_part test failures: BTRFS critical (device vda2): unable to find logical 8820195328 - https://bugzilla.suse.com/show_bug.cgi?id=1086457
[13 Jun 2018 09:31:32] <okurz> coolo: Deleting a needle has an effect immediately: 
[13 Jun 2018 09:31:32] <okurz> Could not open image /var/lib/openqa/cache/tests/sle/products/sle/needles/tty2-selected-20151201.png
[13 Jun 2018 09:31:32] <okurz> [2018-06-13T07:50:55.0141 CEST] [debug] SKIP(tty2-selected-20151201:missing PNG)
[13 Jun 2018 09:31:49] <okurz> in https://openqa.suse.de/tests/1759188/file/autoinst-log.txt but it should not have with proper caching to have a consistent set
[13 Jun 2018 09:32:11] <coolo> okurz: ah, I know
[13 Jun 2018 09:32:24] <okurz> let me pack that in a ticket
[13 Jun 2018 09:32:25] <coolo> w1 synced the old state and started job
[13 Jun 2018 09:32:33] <coolo> old needle was parsed in
[13 Jun 2018 09:32:41] <coolo> w2 synced the new stated
[13 Jun 2018 09:32:54] <coolo> w1 looked for his needle and didn't find it -> but ignored it
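
The sequence described at the end of this log (a job parses its needles, the shared directory changes, the job then cannot find a file it already knows about) would not occur if each job worked on a private snapshot of the needle set. A minimal Python sketch of that idea, assuming a plain directory copy at job start; snapshot_needles and the directory layout are hypothetical, and the actual worker cache may work differently.

```python
import shutil
from pathlib import Path


def snapshot_needles(shared_dir: Path, job_dir: Path) -> Path:
    """Copy the current needle set into a job-private directory and return its path.

    The job reads needles only from the returned directory, so deletions in
    shared_dir after this call cannot affect the running job.
    """
    target = job_dir / "needles"
    shutil.copytree(shared_dir, target)
    return target
```

A full copy costs time and disk space per job; hard links or checking out a pinned git commit would give the same consistency more cheaply, but the copy keeps the sketch short.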

Related issues

Related to openQA Project - action #34549: implement a way to retire needles or "Keep copies of need... Rejected 09/04/2018

History

#1 Updated by nicksinger over 2 years ago

  • Description updated (diff)

#2 Updated by coolo over 2 years ago

  • Subject changed from [tools] Check needles on FS, remember if once deleted and display properly in webui to Check needles on FS, remember if once deleted and display properly in webui
  • Target version set to future

#3 Updated by okurz almost 2 years ago

  • Target version changed from future to future

#4 Updated by okurz 4 months ago

  • Related to action #34549: implement a way to retire needles or "Keep copies of needles while running tests" added

#5 Updated by okurz 4 months ago

  • Description updated (diff)
  • Status changed from New to Workable

#6 Updated by okurz 4 months ago

  • Subject changed from Check needles on FS, remember if once deleted and display properly in webui to Improve handling of needle deprecation and deletion
  • Description updated (diff)

Incorporated content from #34549
