action #10418

worker: do not warn on expected problems

Added by okurz about 4 years ago. Updated 7 months ago.

Status: Workable
Start date: 25/01/2016
Priority: Normal
Due date:
Assignee: -
% Done: 0%
Category: Concrete Bugs
Target version: Ready
Difficulty:
Duration:

Description

observation

worker log output

worker[5235]: waitpid returned error: No child processes
worker[5235]: can't open /var/lib/openqa/pool/1/testresults/status.json: No such file or directory at /usr/share/openqa/script/../lib/OpenQA/Worker/Jobs.pm line 525.
worker[5235]: can't open /var/lib/openqa/pool/1/testresults/test_order.json: No such file or directory at /usr/share/openqa/script/../lib/OpenQA/Worker/Jobs.pm line 525.

suggestion

Improve the confusing error messages. In this case the worker had already died earlier, and the actual error message is in autoinst-log.txt:

$ tail -n 30 /var/lib/openqa/testresults/$(ls -t /var/lib/openqa/testresults/ | head -1)/autoinst-log.txt
+++ worker notes +++
start time: 2016-01-25 16:56:35
running on okurz-test-for-openqa.openstack.local.p2.cloud.suse.de:1 (Linux 4.1.12-1-default #1 SMP PREEMPT Thu Oct 29 06:43:42 UTC 2015 (e24bad1) x86_64)
Can't locate /var/lib/openqa/share/tests/sle/lib/susedistribution.pm in @INC
[…]
result: died
uploading vars.json
uploading autoinst-log.txt
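
One way to read the suggestion is sketched below in Python. The worker itself is written in Perl, so this is only an illustration and not openQA code. The helper name `read_optional_result` and the `job_died_early` flag are hypothetical. The idea is that a missing result file is logged at debug level with a pointer to autoinst-log.txt when the job is already known to have died, and only warned about otherwise.

```python
import json
import logging
from pathlib import Path

log = logging.getLogger("worker")


def read_optional_result(path, job_died_early):
    """Return parsed JSON from `path`, or None if the file is missing.

    Hypothetical helper: `job_died_early` is assumed to be known to the
    caller, e.g. because the backend process already exited with an error.
    """
    try:
        with open(path, encoding="utf-8") as fh:
            return json.load(fh)
    except FileNotFoundError:
        if job_died_early:
            # Expected: the test never got far enough to write results.
            log.debug("%s not found; job died early, see autoinst-log.txt",
                      Path(path).name)
        else:
            # Unexpected: the job should have produced this file.
            log.warning("can't open %s: no such file or directory", path)
        return None
```

With this shape, status.json and test_order.json missing after an early death would no longer show up as warnings in the worker log, while a file that goes missing in a job that otherwise ran would still be reported.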


Related issues

Related to openQA Project - action #14972: [tools][epic] Improvements on backend to improve better h... New 24/11/2016

History

#1 Updated by okurz about 3 years ago

  • Related to action #14972: [tools][epic] Improvements on backend to improve better handling of stalls added

#2 Updated by szarate about 3 years ago

@okurz, this issue is already 1 year old. Can you explain a bit what a "confusing message" is?

#3 Updated by okurz about 3 years ago

"confusing message"?:

There are log messages in the worker log that do not point to the actual error, plus messages that are seemingly unrelated, e.g.:

worker[5235]: can't open /var/lib/openqa/pool/1/testresults/status.json: No such file or directory at /usr/share/openqa/script/../lib/OpenQA/Worker/Jobs.pm line 525.

#4 Updated by coolo over 2 years ago

  • Subject changed from improve worker log output in case of early failure to worker: do not warn on expected problems
  • Category changed from Feature requests to Concrete Bugs
  • Target version set to Ready

The warnings/errors caused by the missing JSON files made more than a couple of test developers nervous about whether their instance is fine. So I take that as a bug.

#5 Updated by mkittler about 1 year ago

  • Status changed from New to Feedback

After almost 2 years I cannot find the line which reads status.json anymore. Instead it now only seems to query the command server. Maybe the error handling could be improved here, too, but that's a different thing. test_order.json is now also only read conditionally, if the job is running according to the status. Hence I suppose that warning shouldn't occur anymore either (unless that file is really missing while the command server is still up and running).
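
The conditional read described above could look roughly like the following Python sketch. It is not the actual worker code (which is Perl). The function name `read_test_order` and the `"running"` key in the status dict are placeholders, since the real field names are not given here.

```python
import json
import os


def read_test_order(pool_dir, status):
    """Read test_order.json only while the job is reported as running.

    `status` is assumed to be a dict obtained from some status query;
    the key name "running" is a placeholder, not the real field name.
    """
    if not status.get("running"):
        # Job not running: the file is not expected to exist, so skip it.
        return None
    path = os.path.join(pool_dir, "testresults", "test_order.json")
    # If the file is missing while the job is running, this raises
    # FileNotFoundError, which is the one case still worth reporting.
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)
```

In this sketch a job that is not running never touches the file, so no warning can be produced for it. Only the "really missing" case mentioned in the comment surfaces as an error.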

So I'm not sure whether this ticket is still valid. Do we still see (similar) problems in production?

#6 Updated by okurz 7 months ago

  • Status changed from Feedback to Workable

Let me give you examples from recent jobs on production:

  • https://openqa.suse.de/tests/3213464/file/autoinst-log.txt is a job that failed because a needle did not match. After the corresponding message there is a lot of noise, e.g. "[2019-08-06T11:09:14.610 CEST] [debug] isotovideo: unable to inform websocket clients about stopping command server: Request timeout at /usr/bin/isotovideo line 172." and also "[2019-08-06T11:09:15.611 CEST] [error] can_read received kill signal at /usr/lib/os-autoinst/myjsonrpc.pm line 91."
  • https://openqa.suse.de/tests/3213450/file/autoinst-log.txt is a passed job but with a lot of errors which are backend specific, e.g. "xterm: fatal IO error 11 (Resource temporarily unavailable) or KillClient on X server ":55829""

The original message seems to have changed since the time of reporting, e.g. now we have messages in the worker log like "<13>Aug 2 13:04:53 openqa-worker@10: [warn] [pid:5405] Can't open /var/lib/openqa/pool/10/testresults/result-kdump_and_crash.json for result upload - likely isotovideo could not be started or failed early. Error message: No such file or directory", which isn't the best but is probably ok-ish for the time being.
