action #46919

[functional][u][svirt][sporadic] auto_review:"IO::Socket::INET: connect: Connection timed out"

Added by okurz over 2 years ago. Updated over 1 year ago.

Status: Rejected
Priority: Normal
Assignee:
Category: Infrastructure
Target version: SUSE QA - Milestone 30
Start date: 2018-12-01
Due date:
% Done: 0%
Estimated time:
Difficulty: hard

Description

Observation

The job behind https://openqa.suse.de/tests/2426419/file/autoinst-log.txt is incomplete. The log first shows that the connection timed out, followed by a "Use of uninitialized value" warning:

[2019-01-31T13:22:41.310 CET] [debug] Backend process died, backend errors are reported below in the following lines:
Error connecting to host <10.161.145.30>: IO::Socket::INET: connect: Connection timed out
[2019-01-31T13:22:41.311 CET] [debug] Destroying openQA-SUT-1 virtual machine
Use of uninitialized value $libvirt_connector in concatenation (.) or string at /usr/lib/os-autoinst/backend/svirt.pm line 80.

Reproducible

Sporadic

Expected result

  • A host connection timeout should not cause an incomplete job

Further details

Always latest result in this scenario: latest


Related issues

Related to openQA Tests - action #46964: [functional][u][s390x] test fails in the middle of execution (not installation) as incomplete with "half-open socket?" – connection to machine vanished? (Resolved, 2019-02-01)

Related to openQA Project - action #46967: [functional][u][tools] warning in bootloader_zkvm: Calling Net::SSH2::Channel::readline in non-blocking mode is usually a programming error (Resolved, 2019-02-01)

Related to openQA Project - action #44579: [functional][u][svirt] wait_serial call timed out while the SUT was still alive (Rejected, 2018-11-30)

Related to openQA Tests - action #50765: [sle][functional][u] test fails in bootloader - svirt-xen vnc connection refused (Rejected, 2019-04-25)

Related to openQA Project - action #49961: Prevent svirt backend to hang on virsh undefine command causing job timeouts/incompletes (Rejected, 2019-04-03)

Follows openQA Tests - action #44594: [functional][u] test fails in sysstat - regular expression for pidstat test is incorrect (Resolved, 2018-11-30)

Copied to openQA Tests - action #48059: [functional][u][svirt] "Use of uninitialized value" (Resolved, 2018-12-01)

History

#1 Updated by okurz over 2 years ago

  • Related to action #46964: [functional][u][s390x] test fails in the middle of execution (not installation) as incomplete with "half-open socket?" – connection to machine vanished? added

#2 Updated by szarate over 2 years ago

  • Description updated (diff)

#3 Updated by szarate over 2 years ago

  • Subject changed from [functional][u][s390x] "Connection timed out" and "Use of uninitialized value" to [functional][u][svirt] "Connection timed out"

#4 Updated by szarate over 2 years ago

  • Target version changed from future to Milestone 24

The same error occurs for openqaw5-xen.qa.suse.de. My suggestion is to ensure that the code properly handles authentication via fallbacks (if supported by the current version) and, on top of that, actually waits if authentication is not possible.

I will try this tomorrow in combination with my current in-progress ticket (since it is svirt too).

#5 Updated by szarate over 2 years ago

  • Follows action #44594: [functional][u] test fails in sysstat - regular expression for pidstat test is incorrect added

#6 Updated by okurz over 2 years ago

  • Subject changed from [functional][u][svirt] "Connection timed out" to [functional][u][svirt] "Connection timed out" (was: … "Use of uninitialized value")

observed in https://openqa.suse.de/tests/2454567/file/autoinst-log.txt now

adjusted subject to mention the uninitialized variable again to make it easier to discover.

#7 Updated by szarate over 2 years ago

  • Subject changed from [functional][u][svirt] "Connection timed out" (was: … "Use of uninitialized value") to [functional][u][svirt] "Connection timed out"

Could you create a separate ticket for the uninitialized values? In any case, https://github.com/os-autoinst/os-autoinst/pull/1105 and https://github.com/os-autoinst/os-autoinst/pull/1099 took care of those; we should see the changes in the next deploy.

#8 Updated by okurz over 2 years ago

  • Copied to action #48059: [functional][u][svirt] "Use of uninitialized value" added

#9 Updated by szarate over 2 years ago

  • Related to action #46967: [functional][u][tools] warning in bootloader_zkvm: Calling Net::SSH2::Channel::readline in non-blocking mode is usually a programming error added

#10 Updated by szarate over 2 years ago

  • Status changed from Workable to In Progress
  • Assignee set to szarate
  • Target version changed from Milestone 24 to Milestone 23

Picking this one... there are some improvements that could be done far beyond #4

#11 Updated by szarate over 2 years ago

Ok for the time being, filed: https://bugzilla.opensuse.org/show_bug.cgi?id=1126292

currently I'm at: http://phobos.suse.de/tests/1747295/file/autoinst-log.txt

- read 0 total
[2019-02-20T09:26:35.974 CET] [debug] Command's stderr:
error: Failed to start domain openQA-SUT-1
error: internal error: Failed to load module '/usr/lib64/libvirt/storage-file/libvirt_storage_file_fs.so': /usr/lib64/libvirt.so.0: version `LIBVIRT_PRIVATE_5.0.0' not found (required by /usr/lib64/libvirt/storage-file/libvirt_storage_file_fs.so)

That is why I reported the bug. I will install Leap 15.1 on my laptop, set it up as the virsh host and try from there: https://github.com/foursixnine/os-autoinst/commits/wait-for-it

#12 Updated by szarate over 2 years ago

  • Related to action #44579: [functional][u][svirt] wait_serial call timed out while the SUT was still alive added

#13 Updated by szarate over 2 years ago

Currently enabling openSUSE Leap 15.1. However, the repo is not yet published...

#14 Updated by okurz over 2 years ago

https://download.opensuse.org/repositories/devel:/openQA/openSUSE_Leap_15.1/ is there now.

szarate, is this still "In Progress"? Any luck with the setup of svirt workers on your notebook? What do you consider the next step?

#15 Updated by szarate over 2 years ago

For the time being, I had some problems with my local svirt setup which I haven't quite figured out yet, even though according to mkittler's docs it should be pretty straightforward to use the qemu backend. In the meantime, I've added a PoC to support SSH-based auth on the backend itself, and I'm currently looking into how to add it to the SSH consoles so that the jump host doesn't ask for a password anymore.

To answer okurz's question:

Once the SSH auth is done, the next step is to increase the timeout of the SSH connection, as that's the symptom we're trying to fix here: from time to time, when there's heavy load on the network, we see these problems.
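
The "longer timeout plus waiting" idea can be sketched roughly as below. This is only an illustration in Python, not the os-autoinst code (which is Perl); the function name `connect_with_retry`, its parameters and the default values are assumptions made up for this example.

```python
import socket
import time


def connect_with_retry(host, port, attempts=5, timeout=30.0, delay=2.0,
                       connect=socket.create_connection, sleep=time.sleep):
    """Try to open a TCP connection, retrying on timeouts and refusals.

    All names and defaults here are hypothetical; they only illustrate
    the "wait and retry" idea discussed in this ticket.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            # socket.create_connection() raises OSError subclasses such as
            # TimeoutError or ConnectionRefusedError on failure.
            return connect((host, port), timeout=timeout)
        except OSError as error:
            last_error = error
            if attempt < attempts:
                # Wait a little longer after every failed attempt.
                sleep(delay * attempt)
    raise ConnectionError(
        f"giving up on {host}:{port} after {attempts} attempts: {last_error}"
    )
```

The `connect` and `sleep` parameters exist only so the retry logic can be exercised without a real network; a caller would normally leave them at their defaults.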

#16 Updated by szarate over 2 years ago

So, I have a pr open now: https://github.com/os-autoinst/os-autoinst/pull/1131 just for the key authentication part of it... still no luck for the VNC part of it.

As for the timeouts, Matthias... how about suggesting to use mosh instead of ssh for the connections?

#17 Updated by SLindoMansilla over 2 years ago

  • Related to action #48434: [functional][u] test Tumbleweed s390x again added

#18 Updated by SLindoMansilla over 2 years ago

  • Related to deleted (action #48434: [functional][u] test Tumbleweed s390x again)

#19 Updated by mgriessmeier over 2 years ago

  • Target version changed from Milestone 23 to Milestone 24

moving to M24

#20 Updated by SLindoMansilla over 2 years ago

  • Subject changed from [functional][u][svirt] "Connection timed out" to [functional][u][svirt][sporadic] "Connection timed out"

#21 Updated by szarate over 2 years ago

  • Status changed from In Progress to Workable

https://openqa.suse.de/tests/2790262 displays something that could be related:

[2019-04-09T13:25:36.881 CEST] [debug] considering VNC stalled, no update for 4.00 seconds
[2019-04-09T13:25:38.883 CEST] [debug] Error connecting to VNC server <openqaw5-xen.qa.suse.de:5903>: IO::Socket::INET: connect: Connection refused
[2019-04-09T13:25:39.883 CEST] [debug] Error connecting to VNC server <openqaw5-xen.qa.suse.de:5903>: IO::Socket::INET: connect: Connection refused
[2019-04-09T13:25:40.884 CEST] [debug] Error connecting to VNC server <openqaw5-xen.qa.suse.de:5903>: IO::Socket::INET: connect: Connection refused
[2019-04-09T13:25:41.885 CEST] [debug] Error connecting to VNC server <openqaw5-xen.qa.suse.de:5903>: IO::Socket::INET: connect: Connection refused
[2019-04-09T13:25:42.886 CEST] [debug] Error connecting to VNC server <openqaw5-xen.qa.suse.de:5903>: IO::Socket::INET: connect: Connection refused
[2019-04-09T13:25:43.887 CEST] [debug] Error connecting to VNC server <openqaw5-xen.qa.suse.de:5903>: IO::Socket::INET: connect: Connection refused
[2019-04-09T13:25:44.887 CEST] [debug] Error connecting to VNC server <openqaw5-xen.qa.suse.de:5903>: IO::Socket::INET: connect: Connection refused
[2019-04-09T13:25:45.888 CEST] [debug] Error connecting to VNC server <openqaw5-xen.qa.suse.de:5903>: IO::Socket::INET: connect: Connection refused
[2019-04-09T13:25:46.892 CEST] [debug] Backend process died, backend errors are reported below in the following lines:
Error connecting to VNC server <openqaw5-xen.qa.suse.de:5903>: IO::Socket::INET: connect: Connection refused
last frame

PS: Setting to Workable, as I'm not working on this atm

#22 Updated by szarate over 2 years ago

  • Related to action #50765: [sle][functional][u] test fails in bootloader - svirt-xen vnc connection refused added

#23 Updated by SLindoMansilla over 2 years ago

  • Related to action #49961: Prevent svirt backend to hang on virsh undefine command causing job timeouts/incompletes added

#24 Updated by SLindoMansilla over 2 years ago

  • Related to deleted (action #49961: Prevent svirt backend to hang on virsh undefine command causing job timeouts/incompletes)

#25 Updated by SLindoMansilla over 2 years ago

  • Blocks action #49961: Prevent svirt backend to hang on virsh undefine command causing job timeouts/incompletes added

#26 Updated by mgriessmeier over 2 years ago

  • Target version changed from Milestone 24 to Milestone 25

#27 Updated by mgriessmeier about 2 years ago

  • Target version changed from Milestone 25 to Milestone 26

#28 Updated by mgriessmeier about 2 years ago

  • Target version changed from Milestone 26 to Milestone 27

@Santi - what shall we do with this one?

#29 Updated by mgriessmeier about 2 years ago

  • Status changed from Workable to Rejected
  • Target version changed from Milestone 27 to Milestone 28

this won't be looked at again - and since apparently no one cares or the issue doesn't happen anymore - I'm cleaning up old mess

#30 Updated by szarate almost 2 years ago

  • Category changed from Bugs in existing tests to Infrastructure
  • Status changed from Rejected to New
  • Difficulty set to hard

I missed the notification here. I still believe this ticket is valid, as we see it often in different places.

#31 Updated by okurz almost 2 years ago

https://github.com/os-autoinst/os-autoinst/pull/1262 might help with that by turning an incomplete into a fail. If you encounter this again, please check whether it is effective. I tried to follow the link to "latest" from the description but could not find any useful scenario. Could you please link some failed jobs for reference?
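
As a rough illustration of what "turning an incomplete into a fail" means in general, the sketch below catches a connection error and records it as a failed step instead of letting the process die. It does not reproduce what the linked pull request does; `run_step` and the `results` list are hypothetical stand-ins, written in Python for brevity.

```python
def run_step(step, results):
    """Run one test step and record a result instead of crashing.

    `step` is any callable; `results` is a plain list used as a stand-in
    for a real result store. Both are hypothetical.
    """
    try:
        step()
    except OSError as error:
        # Connection problems (timeouts, refusals) are OSError subclasses.
        # Recording them as a failed step keeps the job from ending
        # without any result ("incomplete").
        results.append({"result": "fail", "reason": str(error)})
        return False
    results.append({"result": "ok", "reason": ""})
    return True
```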

#32 Updated by mgriessmeier over 1 year ago

  • Target version changed from Milestone 28 to Milestone 30

needs to be discussed offline

#33 Updated by okurz over 1 year ago

  • Subject changed from [functional][u][svirt][sporadic] "Connection timed out" to [functional][u][svirt][sporadic] auto_review:"IO::Socket::INET: connect: Connection timed out"
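
The `auto_review:"..."` part of the subject carries a search expression intended to be matched against job logs. A minimal sketch of that idea, assuming the quoted text is treated as a regular expression and searched for in the log text; the helper names are hypothetical and this is not the actual review tooling:

```python
import re


def extract_auto_review(subject):
    """Return the quoted expression after 'auto_review:' or None."""
    match = re.search(r'auto_review:"([^"]+)"', subject)
    return match.group(1) if match else None


def log_matches(subject, log_text):
    """True if the subject's auto_review expression is found in the log."""
    pattern = extract_auto_review(subject)
    return bool(pattern) and re.search(pattern, log_text) is not None
```

With the subject of this ticket, the expression would match the "Error connecting to host" line quoted in the description, but not the later "Connection refused" VNC errors.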

#34 Updated by okurz over 1 year ago

  • Blocks deleted (action #49961: Prevent svirt backend to hang on virsh undefine command causing job timeouts/incompletes)

#35 Updated by okurz over 1 year ago

  • Related to action #49961: Prevent svirt backend to hang on virsh undefine command causing job timeouts/incompletes added

#36 Updated by szarate over 1 year ago

  • Status changed from New to Rejected

I haven't seen this for a while
