action #19332 (closed)
Growing number of fails due to 'skipped' modules
Added by coolo over 7 years ago. Updated about 7 years ago.
Updated by okurz over 7 years ago
I'm not sure what you mean by "nameless fail". What I see in the job example you provided is that all expected modules are mentioned on the details page, and all modules except apache_ssl have a proper status and screenshots. autoinst-log.txt looks complete. It is not obvious why the job ends up as failed, but as we know openQA sets the job result to failed whenever not all modules have passed. Looking into the journal on openqaworker7 I can see:
May 23 20:18:04 openqaworker7 worker[15994]: [INFO] 10336: WORKING 954478
May 23 20:18:06 openqaworker7 qemu-system-x86_64[10347]: looking for plugins in '/usr/lib64/sasl2', failed to open directory, error: No such file or directory
May 23 21:05:14 openqaworker7 worker[15994]: [ERROR] 502 response: Proxy Error (remaining tries: 2)
May 23 21:40:49 openqaworker7 worker[15994]: [ERROR] 502 response: Proxy Error (remaining tries: 2)
and this is also the last content in the journal since then. I was expecting something more obvious, such as a failed request when trying to upload the module details, but at least it shows again that the worker's access over the API is having problems.
Looking at the logs in /var/log/openqa on osd, openQA does not complain that the file is missing or unreadable when trying to read it. The webUI just logs:
[Tue May 23 22:54:22 2017] [22858:debug] reading /var/lib/openqa/testresults/00954/00954478-sle-12-SP2-Server-DVD-Updates-x86_64-Build20170523-5-mau-webserver@64bit/details-apache_ssl.json
but that is a 2-byte file whose whole content is '[]',
so the problem is somewhere within os-autoinst. The API transfer is not really to blame, right?
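A tiny illustration of the failure mode described above, as a sketch only (not openQA code; the file name is taken from the log): the details file parses fine as JSON but decodes to an empty array, so there are simply no module results left to render.

use strict;
use warnings;
use Mojo::JSON 'decode_json';
use Mojo::File 'path';

my $file    = 'details-apache_ssl.json';          # the 2-byte file mentioned above
my $details = decode_json(path($file)->slurp);    # decodes to an empty array ref: []
warn "no step details found in $file\n" unless @$details;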
Updated by coolo over 7 years ago
- Subject changed from Growing number of skipped modules cause nameless fails to Growing number of fails due to 'skipped' modules
Either os-autoinst (30% chance), the worker reading the details (60%), or some stupid webapi problem (10%). It still happens too rarely to make a reliable analysis :(
Updated by coolo over 7 years ago
Another example https://openqa.suse.de/tests/954522
Updated by okurz over 7 years ago
Checked this case again. The JSON file details-bootloader.json is 2 bytes, consisting of only '[]', i.e. a valid but empty JSON array. So I highly suspect basetest::save_test_result is ending up with an empty $result. Should we add some debug logging there?
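A rough sketch of the kind of debug logging suggested here, assuming save_test_result receives the module result and serializes it to JSON; the function body and file naming below are illustrative, not the actual os-autoinst implementation.

use strict;
use warnings;
use Mojo::JSON 'encode_json';

# Illustrative only -- not the real basetest::save_test_result.
sub save_test_result {
    my ($self, $result) = @_;
    my $fn = "details-$self->{name}.json";    # hypothetical file name scheme

    # The proposed debug guard: complain before an empty result is written,
    # so the 2-byte "[]" files can be traced back to where they originate.
    my $empty = !defined $result
        || (ref $result eq 'ARRAY' && !@$result)
        || (ref $result eq 'HASH'  && !%$result);
    warn "save_test_result: empty result for module '$self->{name}'\n" if $empty;

    open(my $fh, '>', $fn) or die "cannot write $fn: $!";
    print $fh encode_json($result // []);
    close $fh;
    return $fn;
}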
Updated by vsvecova over 7 years ago
More occurrences:
https://openqa.suse.de/tests/982230
https://openqa.suse.de/tests/985302
Updated by vsvecova over 7 years ago
More from today:
https://openqa.suse.de/tests/988720
https://openqa.suse.de/tests/988703
Updated by nicksinger over 7 years ago
So here is another case of such a "skipped" module: https://openqa.suse.de/tests/1055564
According to autoinst-log the module itself passed successfully but was not able to publish its results. Why it was unable to publish them is not clear to me, and I couldn't find an obvious cause in the logs.
But what I really don't get is why we don't color the reason for the suite failing in a bright shiny red (obviously the webUI could determine that something went wrong here)?
(Please explain the reasons I may be overlooking.)
Updated by okurz over 7 years ago
nicksinger wrote:
But what I really don't get is why we don't color the reason for the suite failing in a bright shiny red (obviously the webUI could determine that something went wrong here)?
(Please explain the reasons I may be overlooking.)
I don't know what you are referring to with "suite", see http://open.qa/docs/#concepts . Do you mean the overall job result? That is red. Regarding the module result: the webUI simply never gets a status for the module from os-autoinst. What one could do is color all missing job modules when the whole job completes.
Updated by nicksinger over 7 years ago
okurz wrote:
nicksinger wrote:
But what I really don't get is why we don't color the reason for the suite failing in a bright shiny red (obviously the webUI could determine that something went wrong here)?
(Please explain the reasons I may be overlooking.)
I don't know what you are referring to with "suite", see http://open.qa/docs/#concepts . Do you mean the overall job result? That is red. Regarding the module result: the webUI simply never gets a status for the module from os-autoinst. What one could do is color all missing job modules when the whole job completes.
"test suite - a collection of test modules, e.g. "textmode". All test modules within one test suite are run serially" but we referred to the same thing.
Let me rephrase my critique:
- The webUI decides the status of the job based on the factors it knows about
- The webUI decides to mark the job as "failed" because one of those factors didn't match what is required to mark a job as "passed"
- The real reason for this decision is never surfaced
Updated by coolo over 7 years ago
https://openqa.suse.de/tests/1065886 - latest example and I think I know why:
2620:113:80c0:8080:10:160:1:100 - - 19/Jul/2017:10:27:35 +0200 "POST /api/v1/jobs/1065886/status HTTP/1.1" 406 - "-" "Mojolicious (Perl)"
2620:113:80c0:8080:10:160:1:100 - - 19/Jul/2017:10:29:05 +0200 "POST /api/v1/jobs/1065886/status HTTP/1.1" 406 - "-" "Mojolicious (Perl)"
2620:113:80c0:8080:10:160:1:100 - - 19/Jul/2017:10:31:15 +0200 "POST /api/v1/jobs/1065886/status HTTP/1.1" 406 - "-" "Mojolicious (Perl)"
2620:113:80c0:8080:10:160:1:100 - - 19/Jul/2017:10:32:20 +0200 "POST /api/v1/jobs/1065886/status HTTP/1.1" 406 - "-" "Mojolicious (Perl)"
2620:113:80c0:8080:10:160:1:100 - - 19/Jul/2017:10:32:55 +0200 "POST /api/v1/jobs/1065886/status HTTP/1.1" 406 - "-" "Mojolicious (Perl)"
2620:113:80c0:8080:10:160:1:100 - - 19/Jul/2017:10:40:05 +0200 "POST /api/v1/jobs/1065886/status HTTP/1.1" 406 - "-" "Mojolicious (Perl)"
2620:113:80c0:8080:10:160:1:100 - - 19/Jul/2017:10:53:14 +0200 "POST /api/v1/jobs/1065886/artefact HTTP/1.1" 406 - "-" "Mojolicious (Perl)"
The worker gets a 502 and ignores the error.
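A minimal sketch of the missing behaviour, assuming a worker-side upload helper built on Mojo::UserAgent; the function name, retry count and backoff below are assumptions, not the actual openQA worker code. The point is that a non-success response gets retried and ultimately reported instead of being silently dropped.

use strict;
use warnings;
use Mojo::UserAgent;

# Hypothetical helper: do not silently swallow a failed status/artefact upload.
sub upload_with_retries {
    my ($url, $payload, $tries) = @_;
    $tries //= 3;
    my $ua = Mojo::UserAgent->new;
    for my $attempt (1 .. $tries) {
        my $tx = $ua->post($url, json => $payload);
        return 1 if $tx->res->is_success;
        warn sprintf "upload to %s failed: %s (attempt %d/%d)\n",
            $url, $tx->res->code // 'connection error', $attempt, $tries;
        sleep 2**$attempt;    # simple exponential backoff between retries
    }
    die "giving up on upload to $url after $tries attempts\n";
}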
Updated by EDiGiacinto over 7 years ago
- Related to action #20378: [tools]Too many 502 on openqa added
Updated by EDiGiacinto over 7 years ago
Another possible candidate due to congestion.
Updated by coolo over 7 years ago
We should still report errors, possibly in an extra file parallel to autoinst-log.
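One possible shape for that, as a sketch under the assumption that the worker knows the job's result directory; the file name and timestamp format are made up for illustration.

use strict;
use warnings;
use POSIX 'strftime';

# Hypothetical helper: record upload errors in a file parallel to autoinst-log.txt.
sub log_upload_error {
    my ($resultdir, $message) = @_;
    my $file = "$resultdir/upload-errors.txt";    # assumed file name
    open(my $fh, '>>', $file) or die "cannot append to $file: $!";
    print $fh strftime('[%Y-%m-%dT%H:%M:%S] ', localtime), "$message\n";
    close $fh;
}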
Updated by EDiGiacinto about 7 years ago
- Status changed from New to In Progress
- Assignee set to EDiGiacinto
Closing this bug since it might be related to #20378, which is now solved. Please re-open if the problem still persists; if it does and the bug had attachments (such as logs, screenshots, etc.), please provide new ones.
Updated by EDiGiacinto about 7 years ago
- Status changed from In Progress to Resolved