action #10418

worker: do not warn on expected problems

Added by okurz about 4 years ago. Updated 5 days ago.

Status:ResolvedStart date:25/01/2016
Priority:NormalDue date:
Assignee:mkittler% Done:

0%

Category:Concrete Bugs
Target version:Current Sprint
Difficulty:
Duration:

Description

observation

worker log output

worker[5235]: waitpid returned error: No child processes
worker[5235]: can't open /var/lib/openqa/pool/1/testresults/status.json: No such file or directory at /usr/share/openqa/script/../lib/OpenQA/Worker/Jobs.pm line 525.
worker[5235]: can't open /var/lib/openqa/pool/1/testresults/test_order.json: No such file or directory at /usr/share/openqa/script/../lib/OpenQA/Worker/Jobs.pm line 525.

suggestion

improve confusing error messages. worker already died in before in this case with error message in autoinst-log.txt

$ tail -n 30 /var/lib/openqa/testresults/$(ls -t /var/lib/openqa/testresults/ | head -1)/autoinst-log.txt
+++ worker notes +++
start time: 2016-01-25 16:56:35
running on okurz-test-for-openqa.openstack.local.p2.cloud.suse.de:1 (Linux 4.1.12-1-default #1 SMP PREEMPT Thu Oct 29 06:43:42 UTC 2015 (e24bad1) x86_64)
Can't locate /var/lib/openqa/share/tests/sle/lib/susedistribution.pm in @INC
[…]
result: died
uploading vars.json
uploading autoinst-log.txt


Related issues

Related to openQA Project - action #14972: [tools][epic] Improvements on backend to improve better h... New 24/11/2016

History

#1 Updated by okurz about 3 years ago

  • Related to action #14972: [tools][epic] Improvements on backend to improve better handling of stalls added

#2 Updated by szarate about 3 years ago

@okurz, this issue is already 1 year old. Can you explain a bit what a "confusing message" is?

#3 Updated by okurz about 3 years ago

"confusing message"?:

There are log messages in the worker log which are not pointing to the error plus there are messages which are seemingly unrelated

worker[5235]: can't open /var/lib/openqa/pool/1/testresults/status.json: No such file or directory at /usr/share/openqa/script/../lib/OpenQA/Worker/Jobs.pm line 525.

#4 Updated by coolo over 2 years ago

  • Subject changed from improve worker log output in case of early failure to worker: do not warn on expected problems
  • Category changed from Feature requests to Concrete Bugs
  • Target version set to Ready

The missing json files causing warnings/errors caused more than a couple of test developers to get nervous if their instance is fine. So I take that as a bug

#5 Updated by mkittler over 1 year ago

  • Status changed from New to Feedback

After almost 2 years I can not find the line which reads status.json anymore. Instead it now only seems to query the command server. Maybe the error handling could improved here, too but that's a different thing. test_order.json is now also only read conditionally if the job is running according to the status. Hence I suppose that warning shouldn't occur anymore as well (unless that file is really missing while the command server is still up and running).

So I'm not sure whether this ticket is still valid. Do we still see (similar) problems in production?

#6 Updated by okurz 8 months ago

  • Status changed from Feedback to Workable

Let me give you examples from recent jobs on production:

  • https://openqa.suse.de/tests/3213464/file/autoinst-log.txt is a failed job that failed because a needle did not match. After the according message there is a lot of noise, e.g. "[2019-08-06T11:09:14.610 CEST] [debug] isotovideo: unable to inform websocket clients about stopping command server: Request timeout at /usr/bin/isotovideo line 172." and also "[2019-08-06T11:09:15.611 CEST] [error] can_read received kill signal at /usr/lib/os-autoinst/myjsonrpc.pm line 91."
  • https://openqa.suse.de/tests/3213450/file/autoinst-log.txt is a passed job but a lot of errors which are backend specific, e.g. "xterm: fatal IO error 11 (Resource temporarily unavailable) or KillClient on X server ":55829""

The original message seems to have changed in behaviour since the time of reporting, e.g. now we have messages in the worker log like "<13>Aug 2 13:04:53 openqa-worker@10: [warn] [pid:5405] Can't open /var/lib/openqa/pool/10/testresults/result-kdump_and_crash.json for result upload - likely isotovideo could not be started or failed early. Error message: No such file or directory", which isn't the best but probably ok-ish for the time being.

#7 Updated by kraih about 1 month ago

  • Assignee set to kraih

I'll take a look at the cases mentioned.

#8 Updated by kraih about 1 month ago

  • Target version changed from Ready to Current Sprint

#9 Updated by okurz about 1 month ago

#10 Updated by cdywan 9 days ago

Can we clarify the goal of this ticket? The PR has been merged.

#11 Updated by okurz 5 days ago

  • Status changed from Workable to Resolved
  • Assignee changed from kraih to mkittler

I checked current failures looking for any potentially confusing error messages and have not found instances of the aforementioned cases.

E.g. https://openqa.opensuse.org/tests/1219947 is a simple case of "needle failed to match". In https://openqa.opensuse.org/tests/1219947/file/autoinst-log.txt one sees

[2020-03-31T16:00:04.130 CEST] [debug] no candidate needle with tag(s) 'package-conflict-choice' matched

and then a rather clean shutdown of the stack ending with

[2020-03-31T16:18:22.806 CEST] [debug] BACKEND SHUTDOWN 
[2020-03-31T16:18:22.807 CEST] [info] ::: OpenQA::Qemu::Proc::save_state: Saving QEMU state to qemu_state.json
[2020-03-31T16:18:24.858 CEST] [debug] flushing frames
[2020-03-31T16:18:24.906 CEST] [debug] QEMU: qemu-system-x86_64: terminating on signal 15 from pid 7547 (/usr/bin/isotovideo: backen)
[2020-03-31T16:18:24.907 CEST] [debug] sending magic and exit
[2020-03-31T16:18:24.908 CEST] [debug] received magic close
[2020-03-31T16:18:24.926 CEST] [debug] backend process exited: 0
[2020-03-31T16:18:25.927 CEST] [debug] stopping backend process 7547
[2020-03-31T16:18:25.927 CEST] [debug] done with backend process
7510: EXIT 0
[2020-03-31T16:18:25.0998 CEST] [info] Isotovideo exit status: 0
[2020-03-31T16:19:38.0007 CEST] [info] +++ worker notes +++
[2020-03-31T16:19:38.0007 CEST] [info] End time: 2020-03-31 14:19:38
[2020-03-31T16:19:38.0007 CEST] [info] Result: done

also the worker log is clean:

[2020-03-31T15:52:22.0723 CEST] [info] 7510: WORKING 1219947
[2020-03-31T15:52:22.0725 CEST] [info] isotovideo has been started (PID: 7510)
[2020-03-31T16:19:38.0029 CEST] [info] Uploading start_install-btrfs-filesystem-df-mnt.txt
…
[2020-03-31T16:19:40.0642 CEST] [info] Uploading worker-log.txt

As I did not see any changes mentioned by kraih and I am not aware of any related changes by him but the most relevant and recent change is by mkittler I set the assignee to him and set it to resolved now accordingly.

Also available in: Atom PDF