action #75364
closed
openQA Infrastructure (public) - action #64279: [virtualization][OS upgrade] upgrade xen host openqaw5-xen.qa.suse.de
[qac] job incompletes with auto_review:"(?s)Error connecting to VNC server.*openqa.*-xen.*backend died: socket does not exist. Probably your backend instance could not start or died.*"
0%
Description
https://openqa.suse.de/tests/4889195 is incompleted, the log shows:
[2020-10-26T18:40:02.633 CET] [debug] tests/console/snapper_jeos_cli.pm:80 called snapper_jeos_cli::rollback_and_reboot -> tests/console/snapper_jeos_cli.pm:43 called power_action_utils::power_action -> lib/power_action_utils.pm:308 called power_action_utils::assert_shutdown_and_restore_system -> lib/power_action_utils.pm:371 called testapi::select_console
[2020-10-26T18:40:02.633 CET] [debug] <<< testapi::select_console(testapi_console="sut")
/usr/lib/os-autoinst/consoles/vnc_base.pm:62:{
"password" => "nots3cr3t",
"hostname" => "openqaw5-xen.qa.suse.de",
"port" => 5902
}
[2020-10-26T18:40:04.637 CET] [debug] Error connecting to VNC server <openqaw5-xen.qa.suse.de:5902>: IO::Socket::INET: connect: Connection refused
[2020-10-26T18:40:05.638 CET] [debug] Error connecting to VNC server <openqaw5-xen.qa.suse.de:5902>: IO::Socket::INET: connect: Connection refused
[2020-10-26T18:40:06.640 CET] [debug] Error connecting to VNC server <openqaw5-xen.qa.suse.de:5902>: IO::Socket::INET: connect: Connection refused
[2020-10-26T18:40:07.641 CET] [debug] Error connecting to VNC server <openqaw5-xen.qa.suse.de:5902>: IO::Socket::INET: connect: Connection refused
[2020-10-26T18:40:08.642 CET] [debug] Error connecting to VNC server <openqaw5-xen.qa.suse.de:5902>: IO::Socket::INET: connect: Connection refused
[2020-10-26T18:40:09.643 CET] [debug] Error connecting to VNC server <openqaw5-xen.qa.suse.de:5902>: IO::Socket::INET: connect: Connection refused
[2020-10-26T18:40:10.644 CET] [debug] Error connecting to VNC server <openqaw5-xen.qa.suse.de:5902>: IO::Socket::INET: connect: Connection refused
[2020-10-26T18:40:11.646 CET] [debug] Error connecting to VNC server <openqaw5-xen.qa.suse.de:5902>: IO::Socket::INET: connect: Connection refused
[2020-10-26T18:40:12.649 CET] [debug] Backend process died, backend errors are reported below in the following lines:
socket does not exist. Probably your backend instance could not start or died. at /usr/lib/os-autoinst/consoles/VNC.pm line 881.
[2020-10-26T18:40:12.649 CET] [debug] Closing SSH serial connection with openqaw5-xen.qa.suse.de
[2020-10-26T18:40:12.650 CET] [debug] Passing remaining frames to the video encoder
[2020-10-26T18:40:12.679 CET] [debug] Waiting for video encoder to finalize the video
[2020-10-26T18:40:12.679 CET] [debug] The built-in video encoder (pid 18516) terminated
[2020-10-26T18:40:12.679 CET] [debug] SSH disconnect hostname=openqaw5-xen.qa.suse.de,username=root
[2020-10-26T18:40:12.679 CET] [debug] sending magic and exit
[2020-10-26T18:40:12.680 CET] [debug] received magic close
[2020-10-26T18:40:12.681 CET] [debug] THERE IS NOTHING TO READ 15 4 3
[2020-10-26T18:40:12.681 CET] [debug] stopping command server 18416 because test execution ended
[2020-10-26T18:40:12.681 CET] [debug] isotovideo: informing websocket clients before stopping command server: http://127.0.0.1:20133/x4l5yowHjPYG6hIe/broadcast
[2020-10-26T18:40:12.704 CET] [debug] commands process exited: 0
[2020-10-26T18:40:12.709 CET] [debug] backend process exited: 0
[2020-10-26T18:40:12.709 CET] [debug] done with command server
[2020-10-26T18:40:12.709 CET] [debug] stopping autotest process 18422
[2020-10-26T18:40:12.709 CET] [debug] autotest received signal TERM, saving results of current test before exiting
XIO: fatal IO error 11 (Resource temporarily unavailable) on X server ":60663"
after 28807 requests (28807 known processed) with 0 events remaining.
[2020-10-26T18:40:12.711 CET] [debug] Driver backend collected unknown process with pid 18553 and exit status: 1
xterm: fatal IO error 11 (Resource temporarily unavailable) or KillClient on X server ":60663"
[2020-10-26T18:40:12.714 CET] [debug] Driver backend collected unknown process with pid 18558 and exit status: 0
[2020-10-26T18:40:12.714 CET] [debug] Driver backend collected unknown process with pid 18555 and exit status: 84
[2020-10-26T18:40:12.714 CET] [debug] Driver backend collected unknown process with pid 18557 and exit status: 0
[2020-10-26T18:40:12.726 CET] [debug] Driver backend collected unknown process with pid 18532 and exit status: 0
[2020-10-26T18:40:12.727 CET] [debug] [autotest] process exited: 1
[2020-10-26T18:40:12.827 CET] [debug] done with autotest process
[2020-10-26T18:40:12.827 CET] [debug] isotovideo failed
[2020-10-26T18:40:12.828 CET] [debug] stopping backend process 18423
[2020-10-26T18:40:12.828 CET] [debug] done with backend process
18412: EXIT 1
see more details in https://openqa.suse.de/tests/4889195/file/autoinst-log.txt
Updated by okurz about 4 years ago
- Tags set to qac, jeos, xen
- Project changed from openQA Project (public) to openQA Tests (public)
- Subject changed from job incompletes with auto_review:"backend died: socket does not exist. Probably your backend instance could not start or died.*" to [qac] job incompletes with auto_review:"(?s)Error connecting to VNC server.*openqa.*-xen.*backend died: socket does not exist. Probably your backend instance could not start or died.*"
- Category set to Bugs in existing tests
- Assignee set to jlausuch
- Priority changed from Low to High
Maintenance of special worker addendums including the Xen hypervisor host is out of scope for SUSE QA Tools (https://progress.opensuse.org/projects/qa/wiki#Out-of-scope). As the test is about "JeOS" I will assign this to the QAC team.
@Xiaojing_liu I suggest being a bit more specific with the auto_review regex to avoid matching too many generic issues, e.g. if that symptom also appears for other backends or machines.
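To illustrate why the narrower regex matters, here is a minimal sketch in Python that compares the old and new subject-line patterns. The two sample texts are hypothetical stand-ins and not real job output: the "backend died:" prefix is assumed from the ticket subject, and the exact text that auto-review matches against is not shown in this ticket. Python's `re` is used only for illustration; the real auto-review tooling may evaluate the expression differently.

```python
import re

# Old, generic pattern (from the previous ticket subject).
BROAD = r"backend died: socket does not exist. Probably your backend instance could not start or died.*"

# New, narrower pattern (from the current ticket subject).
# (?s) lets "." also match newlines, so the pattern can span several log lines.
NARROW = (
    r"(?s)Error connecting to VNC server.*openqa.*-xen.*"
    r"backend died: socket does not exist. Probably your backend instance could not start or died.*"
)

# Hypothetical sample: VNC failure on the Xen host followed by the backend error.
xen_log = (
    "Error connecting to VNC server <openqaw5-xen.qa.suse.de:5902>: "
    "IO::Socket::INET: connect: Connection refused\n"
    "backend died: socket does not exist. Probably your backend instance "
    "could not start or died. at /usr/lib/os-autoinst/consoles/VNC.pm line 881."
)

# Hypothetical sample: same backend error on some other machine, no Xen VNC failure.
other_log = (
    "backend died: socket does not exist. Probably your backend instance "
    "could not start or died."
)


def matches(pattern: str, text: str) -> bool:
    """Return True if the pattern is found anywhere in the text."""
    return re.search(pattern, text) is not None


for name, text in (("xen", xen_log), ("other", other_log)):
    print(name, "broad:", matches(BROAD, text), "narrow:", matches(NARROW, text))
```

Under these assumptions the broad pattern would label both samples with this ticket, while the narrow one only picks up the Xen VNC case.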
Updated by okurz about 4 years ago
- Has duplicate action #71236: job incompletes with auto_review:"backend died: Error connecting to VNC server <openqaw5-xen.qa.suse.de:5901>: IO::Socket::INET: connect: Connection refused" added
Updated by jlausuch about 4 years ago
What am I supposed to do with this? Just tag the failed test I suppose :)
This looks like the same nature of https://progress.opensuse.org/issues/71236
Updated by okurz about 4 years ago
jlausuch wrote:
What am I supposed to do with this? Just tag the failed test I suppose :)
Well, this is about incomplete jobs so "failed" tests would not really fit. And with the "auto_review" keyword in the subject line there should be no need to manually label builds ("tagging" is for builds). See more about auto-review on https://gitlab.suse.de/openqa/auto-review/ if you are interested
So what I can suggest is a couple of things:
- Prevent the tests from incompleting and turn them into failures instead by making sure that consoles are only activated when they are present. What specifically happened here I do not know. But in the complete test scenario https://openqa.suse.de/tests/latest?arch=x86_64&distri=sle&flavor=JeOS-for-kvm-and-xen&machine=svirt-xen-hvm&test=jeos-filesystem_xenhvm&version=15-SP3 I see only one incomplete, and previous and later tests were fine again, or at least not incompleting. So the issue is likely not that severe.
- Improve the backend that is used here so that the error feedback in case of problems is better than a bare "connection refused" followed by an incomplete job.
- Help to improve how the hypervisor hosts are managed, maintained, monitored and alerting.
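As a rough sketch of the first two points, the idea of probing a console port before activating it, and reporting a clearer message when it is not there, could look like the following. This is plain Python for illustration only and not os-autoinst code; `wait_for_port` and `describe_console_state` are hypothetical helper names, and the attempt count and delay are arbitrary assumptions.

```python
import socket
import time


def wait_for_port(host: str, port: int, attempts: int = 5, delay: float = 1.0) -> bool:
    """Return True once a TCP connection to host:port succeeds.

    Returns False if every attempt fails (refused, timed out, unresolvable).
    """
    for attempt in range(attempts):
        try:
            # The connection is closed again right away; this is only a probe.
            with socket.create_connection((host, port), timeout=delay):
                return True
        except OSError:
            # Sleep between attempts, but not after the last one.
            if attempt < attempts - 1:
                time.sleep(delay)
    return False


def describe_console_state(host: str, port: int, attempts: int = 5, delay: float = 1.0) -> str:
    """Build a human-readable message instead of dying on the first refusal."""
    if wait_for_port(host, port, attempts=attempts, delay=delay):
        return f"VNC port {host}:{port} is reachable"
    return (
        f"VNC port {host}:{port} is not reachable after {attempts} attempts; "
        "the guest or the hypervisor host may be down"
    )


if __name__ == "__main__":
    # Host and port taken from the log in the description.
    print(describe_console_state("openqaw5-xen.qa.suse.de", 5902, attempts=2))
```

A test could then fail with that message as the reason instead of the job ending up incomplete.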
The QE Tools team is happy to offer help but does not have the capacity to improve the "special worker addendums" that are used for tests here themselves.
This looks like the same nature of #71236
Yes, this is why I rejected #71236 as a duplicate of this ticket. But you should not point back to the duplicate ticket, otherwise you are caught in an infinite circle ;)
Updated by cfconrad about 4 years ago
- Priority changed from High to Normal
Set to priority Normal, as later runs didn't show this incomplete behavior anymore.
Updated by mloviska about 4 years ago
- Tags changed from qac, jeos, xen to qac, xen
- Status changed from New to Blocked
- Assignee deleted (jlausuch)
- Parent task set to #64279
First we need to resolve the OS upgrade. Let me set this one as blocked.
Updated by okurz about 4 years ago
Please be aware that #64279 is out of scope of the SUSE QE Tools team, see https://progress.opensuse.org/projects/qa/wiki/Wiki#Out-of-scope .
Updated by jlausuch over 3 years ago
- Status changed from Blocked to Resolved
After the Xen host update done by Martin, we haven't observed this issue anymore.