action #19332

Growing number of fails due to 'skipped' modules

Added by coolo about 5 years ago. Updated almost 5 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: Concrete Bugs
Target version: -
Start date: 2017-05-23
Due date:
% Done: 0%
Estimated time:
Difficulty:
Description

I notice very often lately - e.g. https://openqa.suse.de/tests/954478


Related issues

Related to openQA Tests - action #20378: [tools] Too many 502 on openqa (Resolved, 2017-07-18)

History

#1 Updated by okurz about 5 years ago

I'm not sure what you mean by "nameless fail". What I see in the job example you provided is that all expected modules are mentioned on the details page, all modules except apache_ssl have a proper status and screenshots. autoinst-log.txt looks complete. It's not obvious why the job turns out as failed but as we know the openQA computation sets it to fail in the case when not all modules have passed. Looking into the journal on openqaworker7 I can see

May 23 20:18:04 openqaworker7 worker[15994]: [INFO] 10336: WORKING 954478
May 23 20:18:06 openqaworker7 qemu-system-x86_64[10347]: looking for plugins in '/usr/lib64/sasl2', failed to open directory, error: No such file or directory
May 23 21:05:14 openqaworker7 worker[15994]: [ERROR] 502 response: Proxy Error (remaining tries: 2)
May 23 21:40:49 openqaworker7 worker[15994]: [ERROR] 502 response: Proxy Error (remaining tries: 2)

and this is also the last content in the whole journal since then. I was expecting something more obvious about a failed access when trying to upload the modules details but at least it is showing us again that the worker access over the API is having some problems.

Looking at the log in /var/log/openqa on osd, I see that openQA does not complain about the file being missing, or anything similar, when trying to read it. The webUI just says

[Tue May 23 22:54:22 2017] [22858:debug] reading /var/lib/openqa/testresults/00954/00954478-sle-12-SP2-Server-DVD-Updates-x86_64-Build20170523-5-mau-webserver@64bit/details-apache_ssl.json

but it's a 2-byte file whose entire content is [], so the problem seems to be somewhere within os-autoinst rather than in the API transfer. Right?
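To find other affected modules in a job, one could scan the result directory for details files that contain only an empty array. This is a minimal sketch in Python, assuming the details-<module>.json naming seen in the log line above; the function name find_empty_details is hypothetical and not part of openQA.

```python
import json
from pathlib import Path


def find_empty_details(result_dir):
    """Return the names of details-*.json files that parse to an empty list."""
    empty = []
    for path in sorted(Path(result_dir).glob("details-*.json")):
        try:
            content = json.loads(path.read_text())
        except (OSError, json.JSONDecodeError):
            # Unreadable or malformed files are a different failure mode; skip them.
            continue
        if content == []:
            empty.append(path.name)
    return empty
```

Calling find_empty_details() on a job's testresults directory would list candidates such as details-apache_ssl.json without opening each file by hand.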

#2 Updated by coolo about 5 years ago

  • Subject changed from Growing number of skipped modules cause nameless fails to Growing number of fails due to 'skipped' modules

Either os-autoinst (30% chance) or worker reading details (60%) or some stupid webapi problem (10%). It still happens too rarely to make a reliable analysis :(

#4 Updated by okurz about 5 years ago

Checked this case again. The JSON file details-bootloader.json is 2 bytes consisting of only '[]' so a valid empty JSON array. So I highly suspect basetest::save_test_result is ending up with an empty $result. Add some debugging log there?

#5 Updated by coolo about 5 years ago

to annoy Anton? :)

#9 Updated by nicksinger almost 5 years ago

So here is another case of such a "skipped" module: https://openqa.suse.de/tests/1055564
According to autoinst-log, the module itself passed successfully but was not able to publish its results. Why it was unable to publish them is not clear to me, and I couldn't find a clear reason in the logs.

But what I really don't get here is why we don't color the reason for the suite failing (obviously the webui could determine that something went wrong here) in a bright shiny red?
(Please explain reasons I may overlook)

#10 Updated by okurz almost 5 years ago

nicksinger wrote:

But what I really don't get here is why we don't color the reason for the suite failing (obviously the webui could determine that something went wrong here) in a bright shiny red?
(Please explain reasons I may overlook)

I don't know what you are referring to with "suite". See http://open.qa/docs/#concepts . You mean the overall job result? That is red. Regarding the module result: The webui simply never gets a status of the module from os-autoinst. What one could do is to color all missing job modules when the whole job completes.

#11 Updated by nicksinger almost 5 years ago

okurz wrote:

nicksinger wrote:

But what I really don't get here is why we don't color the reason for the suite failing (obviously the webui could determine that something went wrong here) in a bright shiny red?
(Please explain reasons I may overlook)

I don't know what you are referring to with "suite". See http://open.qa/docs/#concepts . You mean the overall job result? That is red. Regarding the module result: The webui simply never gets a status of the module from os-autoinst. What one could do is to color all missing job modules when the whole job completes.

"test suite - a collection of test modules, e.g. "textmode". All test modules within one test suite are run serially" but we referred to the same thing.
Let me rephrase my critique:

  1. The webui decides the status of the job based on factors it knows
  2. The webui decides to mark the job as "failed" because one of the factors didn't match what is required to mark a job as "passed"
  3. The real reason for this decision goes undetected
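The three points above could be addressed by returning the reasons together with the verdict. This is a simplified sketch, not openQA's actual result computation (which knows more states than "passed" and "failed"); summarize_job and its arguments are hypothetical names.

```python
def summarize_job(expected_modules, reported_results):
    """Return (job_result, reasons) for a finished job.

    expected_modules: module names the job was supposed to run, in order.
    reported_results: dict mapping module name to the result the webui received.
    """
    reasons = []
    for name in expected_modules:
        result = reported_results.get(name)
        if result is None:
            # The module never reported a status: this is the "skipped" case.
            reasons.append(f"{name}: no result received")
        elif result != "passed":
            reasons.append(f"{name}: {result}")
    job_result = "failed" if reasons else "passed"
    return job_result, reasons
```

With the reasons list at hand, the webui could highlight the modules that never reported a status instead of only marking the overall job as failed.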

#12 Updated by coolo almost 5 years ago

https://openqa.suse.de/tests/1065886 - latest example and I think I know why:

2620:113:80c0:8080:10:160:1:100 - - 19/Jul/2017:10:27:35 +0200 "POST /api/v1/jobs/1065886/status HTTP/1.1" 406 - "-" "Mojolicious (Perl)"
2620:113:80c0:8080:10:160:1:100 - - 19/Jul/2017:10:29:05 +0200 "POST /api/v1/jobs/1065886/status HTTP/1.1" 406 - "-" "Mojolicious (Perl)"
2620:113:80c0:8080:10:160:1:100 - - 19/Jul/2017:10:31:15 +0200 "POST /api/v1/jobs/1065886/status HTTP/1.1" 406 - "-" "Mojolicious (Perl)"
2620:113:80c0:8080:10:160:1:100 - - 19/Jul/2017:10:32:20 +0200 "POST /api/v1/jobs/1065886/status HTTP/1.1" 406 - "-" "Mojolicious (Perl)"
2620:113:80c0:8080:10:160:1:100 - - 19/Jul/2017:10:32:55 +0200 "POST /api/v1/jobs/1065886/status HTTP/1.1" 406 - "-" "Mojolicious (Perl)"
2620:113:80c0:8080:10:160:1:100 - - 19/Jul/2017:10:40:05 +0200 "POST /api/v1/jobs/1065886/status HTTP/1.1" 406 - "-" "Mojolicious (Perl)"
2620:113:80c0:8080:10:160:1:100 - - 19/Jul/2017:10:53:14 +0200 "POST /api/v1/jobs/1065886/artefact HTTP/1.1" 406 - "-" "Mojolicious (Perl)"

the worker gets a 502 and is ignoring the error.
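To see how often this happens per job, the access log can be tallied by job, endpoint and status code. This is a rough sketch that assumes the log line format quoted above; the regular expression and the function name failed_posts are my own and not part of openQA.

```python
import re
from collections import Counter

# Matches the request and status fields of lines like the ones quoted above.
REQUEST_RE = re.compile(r'"POST /api/v1/jobs/(\d+)/(\w+) HTTP/[\d.]+" (\d{3})')


def failed_posts(lines):
    """Count POSTs answered with a 4xx/5xx status, keyed by (job, endpoint, status)."""
    counts = Counter()
    for line in lines:
        match = REQUEST_RE.search(line)
        if match and int(match.group(3)) >= 400:
            key = (match.group(1), match.group(2), int(match.group(3)))
            counts[key] += 1
    return counts
```

Feeding the log file to failed_posts() line by line shows which status codes the worker actually received for each job.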

#13 Updated by EDiGiacinto almost 5 years ago

#14 Updated by EDiGiacinto almost 5 years ago

another possible candidate due to congestion

#15 Updated by coolo almost 5 years ago

We still should report errors - possibly in an extra file parallel to autoinst-log
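One way this could look is sketched below: every failed upload attempt is appended to a separate file, and the caller learns whether the upload finally succeeded. It is written in Python for illustration only (the real worker is Perl); upload_with_report, the send callable and the error file are all hypothetical.

```python
import time
from pathlib import Path


def upload_with_report(send, payload, error_log, tries=3, delay=0.0):
    """Call send(payload) up to `tries` times and record every failure.

    send is a hypothetical callable that returns an HTTP status code.
    Returns True on a 2xx response, False once all tries are used up.
    """
    for attempt in range(1, tries + 1):
        status = send(payload)
        if 200 <= status < 300:
            return True
        # Append instead of overwrite so earlier failures stay visible.
        with Path(error_log).open("a") as fh:
            fh.write(f"attempt {attempt}/{tries}: HTTP {status}\n")
        if attempt < tries:
            time.sleep(delay)
    return False
```

A False return value gives the caller the chance to mark the job as incomplete rather than silently continuing, and the extra file documents what went wrong.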

#16 Updated by EDiGiacinto almost 5 years ago

  • Status changed from New to In Progress
  • Assignee set to EDiGiacinto

Closing this bug since it might be related to #20378, which is now solved. Please re-open if the problem still persists; if it does and the bug had attachments (such as logs, screenshots, etc.), please provide new ones.

#17 Updated by EDiGiacinto almost 5 years ago

  • Status changed from In Progress to Resolved
