action #10418
closedworker: do not warn on expected problems
0%
Description
observation¶
worker log output
worker[5235]: waitpid returned error: No child processes
worker[5235]: can't open /var/lib/openqa/pool/1/testresults/status.json: No such file or directory at /usr/share/openqa/script/../lib/OpenQA/Worker/Jobs.pm line 525.
worker[5235]: can't open /var/lib/openqa/pool/1/testresults/test_order.json: No such file or directory at /usr/share/openqa/script/../lib/OpenQA/Worker/Jobs.pm line 525.
suggestion¶
improve confusing error messages. worker already died in before in this case with error message in autoinst-log.txt
$ tail -n 30 /var/lib/openqa/testresults/$(ls -t /var/lib/openqa/testresults/ | head -1)/autoinst-log.txt
+++ worker notes +++
start time: 2016-01-25 16:56:35
running on okurz-test-for-openqa.openstack.local.p2.cloud.suse.de:1 (Linux 4.1.12-1-default #1 SMP PREEMPT Thu Oct 29 06:43:42 UTC 2015 (e24bad1) x86_64)
Can't locate /var/lib/openqa/share/tests/sle/lib/susedistribution.pm in @INC
[…]
result: died
uploading vars.json
uploading autoinst-log.txt
Updated by okurz almost 8 years ago
- Related to coordination #14972: [tools][epic] Improvements on backend to improve better handling of stalls added
Updated by szarate almost 8 years ago
@okurz, this issue is already 1 year old. Can you explain a bit what a "confusing message" is?
Updated by okurz almost 8 years ago
"confusing message"?:
There are log messages in the worker log which are not pointing to the error plus there are messages which are seemingly unrelated
worker[5235]: can't open /var/lib/openqa/pool/1/testresults/status.json: No such file or directory at /usr/share/openqa/script/../lib/OpenQA/Worker/Jobs.pm line 525.
Updated by coolo about 7 years ago
- Subject changed from improve worker log output in case of early failure to worker: do not warn on expected problems
- Category changed from Feature requests to Regressions/Crashes
- Target version set to Ready
The missing json files causing warnings/errors caused more than a couple of test developers to get nervous if their instance is fine. So I take that as a bug
Updated by mkittler almost 6 years ago
- Status changed from New to Feedback
After almost 2 years I can not find the line which reads status.json
anymore. Instead it now only seems to query the command server. Maybe the error handling could improved here, too but that's a different thing. test_order.json
is now also only read conditionally if the job is running according to the status. Hence I suppose that warning shouldn't occur anymore as well (unless that file is really missing while the command server is still up and running).
So I'm not sure whether this ticket is still valid. Do we still see (similar) problems in production?
Updated by okurz over 5 years ago
- Status changed from Feedback to Workable
Let me give you examples from recent jobs on production:
- https://openqa.suse.de/tests/3213464/file/autoinst-log.txt is a failed job that failed because a needle did not match. After the according message there is a lot of noise, e.g. "[2019-08-06T11:09:14.610 CEST] [debug] isotovideo: unable to inform websocket clients about stopping command server: Request timeout at /usr/bin/isotovideo line 172." and also "[2019-08-06T11:09:15.611 CEST] [error] can_read received kill signal at /usr/lib/os-autoinst/myjsonrpc.pm line 91."
- https://openqa.suse.de/tests/3213450/file/autoinst-log.txt is a passed job but a lot of errors which are backend specific, e.g. "xterm: fatal IO error 11 (Resource temporarily unavailable) or KillClient on X server ":55829""
The original message seems to have changed in behaviour since the time of reporting, e.g. now we have messages in the worker log like "Aug 2 13:04:53 openqa-worker@10: [warn] [pid:5405] Can't open /var/lib/openqa/pool/10/testresults/result-kdump_and_crash.json for result upload - likely isotovideo could not be started or failed early. Error message: No such file or directory", which isn't the best but probably ok-ish for the time being.
Updated by kraih over 4 years ago
- Target version changed from Ready to Current Sprint
Updated by okurz over 4 years ago
https://github.com/os-autoinst/openQA/pull/2778/commits/05a262463db09fa92aa445922b9a9a3b9b53dcab adresses the message "Can't open ….json for result upload"
Updated by livdywan over 4 years ago
Can we clarify the goal of this ticket? The PR has been merged.
Updated by okurz over 4 years ago
- Status changed from Workable to Resolved
- Assignee changed from kraih to mkittler
I checked current failures looking for any potentially confusing error messages and have not found instances of the aforementioned cases.
E.g. https://openqa.opensuse.org/tests/1219947 is a simple case of "needle failed to match". In https://openqa.opensuse.org/tests/1219947/file/autoinst-log.txt one sees
[2020-03-31T16:00:04.130 CEST] [debug] no candidate needle with tag(s) 'package-conflict-choice' matched
and then a rather clean shutdown of the stack ending with
[2020-03-31T16:18:22.806 CEST] [debug] BACKEND SHUTDOWN
[2020-03-31T16:18:22.807 CEST] [info] ::: OpenQA::Qemu::Proc::save_state: Saving QEMU state to qemu_state.json
[2020-03-31T16:18:24.858 CEST] [debug] flushing frames
[2020-03-31T16:18:24.906 CEST] [debug] QEMU: qemu-system-x86_64: terminating on signal 15 from pid 7547 (/usr/bin/isotovideo: backen)
[2020-03-31T16:18:24.907 CEST] [debug] sending magic and exit
[2020-03-31T16:18:24.908 CEST] [debug] received magic close
[2020-03-31T16:18:24.926 CEST] [debug] backend process exited: 0
[2020-03-31T16:18:25.927 CEST] [debug] stopping backend process 7547
[2020-03-31T16:18:25.927 CEST] [debug] done with backend process
7510: EXIT 0
[2020-03-31T16:18:25.0998 CEST] [info] Isotovideo exit status: 0
[2020-03-31T16:19:38.0007 CEST] [info] +++ worker notes +++
[2020-03-31T16:19:38.0007 CEST] [info] End time: 2020-03-31 14:19:38
[2020-03-31T16:19:38.0007 CEST] [info] Result: done
also the worker log is clean:
[2020-03-31T15:52:22.0723 CEST] [info] 7510: WORKING 1219947
[2020-03-31T15:52:22.0725 CEST] [info] isotovideo has been started (PID: 7510)
[2020-03-31T16:19:38.0029 CEST] [info] Uploading start_install-btrfs-filesystem-df-mnt.txt
…
[2020-03-31T16:19:40.0642 CEST] [info] Uploading worker-log.txt
As I did not see any changes mentioned by kraih and I am not aware of any related changes by him but the most relevant and recent change is by mkittler I set the assignee to him and set it to resolved now accordingly.