action #81835

`/dev/sshserial` is broken on generalhw backend

Added by ggardet_arm 3 months ago. Updated 3 months ago.

Status: Resolved
Priority: High
Assignee:
Category: Bugs in existing tests
Target version:
Start date: 2021-01-06
Due date:
% Done: 0%
Estimated time:
Difficulty:

Description

Observation

openQA test in scenario opensuse-Tumbleweed-JeOS-for-RPi-aarch64-jeos-containers@RPi3 fails in
prepare_firstboot

/dev/sshserial is broken on generalhw backend, so all Raspberry Pi tests are red.

Test suite description

JeOS as container host. Test container runtimes (podman and docker) and related tools

Reproducible

Fails since (at least) Build 20210105 (current job)

Expected result

Last good: 20201228 (or more recent)

Further details

Always latest result in this scenario: latest

History

#1 Updated by okurz 3 months ago

  • Target version set to future

Unfortunately, I have no good idea what could have broken this. I assume you have the best chance of fixing it, since you have access to the Raspberry Pi runners.

#2 Updated by ggardet_arm 3 months ago

Latest working log:

GOT GO

[2020-12-29T18:25:54.776 CET] [debug] Snapshots are not supported
[2020-12-29T18:25:54.784 CET] [debug] ||| starting prepare_firstboot tests/jeos/prepare_firstboot.pm
[2020-12-29T18:25:54.787 CET] [debug] tests/jeos/prepare_firstboot.pm:36 called opensusebasetest::select_serial_terminal -> lib/opensusebasetest.pm:1240 called testapi::select_console
[2020-12-29T18:25:54.788 CET] [debug] <<< testapi::select_console(testapi_console="root-ssh")
/usr/lib/os-autoinst/consoles/vnc_base.pm:62:{
  "port" => 45377,
  "hostname" => "localhost",
  "ikvm" => 0
}
[2020-12-29T18:25:55.341 CET] [debug] Connected to Xvnc - PID 25526
icewm PID is 25530
[2020-12-29T18:25:56.359 CET] [debug] Wait for SSH on host 192.168.0.54 (timeout: 240)
xterm PID is 25714
[2020-12-29T18:27:18.455 CET] [debug] <<< backend::baseclass::start_ssh_serial(username="root", password="SECRET", hostname="192.168.0.54")
[2020-12-29T18:27:18.455 CET] [debug] <<< backend::baseclass::new_ssh_connection(hostname="192.168.0.54", username="root", password="SECRET")
[2020-12-29T18:27:18.603 CET] [debug] SSH connection to root@192.168.0.54 established
[2020-12-29T18:27:19.045 CET] [debug] ssh xterm vt: grabbing serial console
[2020-12-29T18:27:19.098 CET] [debug] led state 0 0 0 -261
[2020-12-29T18:27:19.113 CET] [debug] activate_console, console: root-ssh, type: ssh

VS current log:

GOT GO

[2021-01-07T09:33:08.399 CET] [debug] Snapshots are not supported
[2021-01-07T09:33:08.408 CET] [debug] ||| starting prepare_firstboot tests/jeos/prepare_firstboot.pm
[2021-01-07T09:33:08.411 CET] [debug] tests/jeos/prepare_firstboot.pm:36 called opensusebasetest::select_serial_terminal -> lib/opensusebasetest.pm:1242 called testapi::select_console
[2021-01-07T09:33:08.412 CET] [debug] <<< testapi::select_console(testapi_console="root-serial-ssh")
[2021-01-07T09:33:08.414 CET] [debug] Connecting SSH serial console for root@192.168.0.54
[2021-01-07T09:33:08.415 CET] [debug] <<< backend::baseclass::new_ssh_connection(password="SECRET", hostname="192.168.0.54", username="root")
[2021-01-07T09:33:11.560 CET] [debug] Could not connect to root@192.168.0.54, Retrying after some seconds...
[2021-01-07T09:33:24.690 CET] [debug] Could not connect to root@192.168.0.54, Retrying after some seconds...
[2021-01-07T09:33:37.800 CET] [debug] Could not connect to root@192.168.0.54, Retrying after some seconds...
[2021-01-07T09:33:50.930 CET] [debug] Could not connect to root@192.168.0.54, Retrying after some seconds...
[2021-01-07T09:34:04.040 CET] [debug] Could not connect to root@192.168.0.54, Retrying after some seconds...
[2021-01-07T09:34:14.050 CET] [info] ::: basetest::runtest: # Test died: Error connecting to <root@192.168.0.54>: No route to host at /usr/lib/os-autoinst/testapi.pm line 1701.

[2021-01-07T09:34:14.051 CET] [debug] lib/opensusebasetest.pm:1329 called opensusebasetest::select_log_console -> lib/opensusebasetest.pm:450 called testapi::select_console
[2021-01-07T09:34:14.051 CET] [debug] <<< testapi::select_console(testapi_console="log-console", timeout=180)
/usr/lib/os-autoinst/consoles/vnc_base.pm:62:{
  "hostname" => "localhost",
  "port" => 54677,
  "ikvm" => 0
}
[2021-01-07T09:34:14.309 CET] [debug] Connected to Xvnc - PID 17529
icewm PID is 17533
[2021-01-07T09:34:15.327 CET] [debug] Wait for SSH on host 192.168.0.54 (timeout: 240)
xterm PID is 17539
[2021-01-07T09:34:34.417 CET] [debug] led state 0 0 0 -261
[2021-01-07T09:34:34.432 CET] [debug] activate_console, console: log-console, type: ssh
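
The relevant difference between the two logs is the console that gets selected in prepare_firstboot. As a rough illustration only, the following Python sketch pulls the console names out of os-autoinst debug output so two runs can be compared quickly. The helper name `selected_consoles` and the regular expression are assumptions made for this example; they are not part of os-autoinst.

```python
import re

# Matches the debug line that os-autoinst prints when a console is selected,
# e.g.  <<< testapi::select_console(testapi_console="root-ssh")
# (pattern is an assumption derived from the log excerpts in this ticket)
SELECT_RE = re.compile(r'testapi::select_console\(testapi_console="([^"]+)"')


def selected_consoles(log_text):
    """Return the console names passed to select_console, in log order."""
    return SELECT_RE.findall(log_text)


# Two lines copied from the last working log above.
GOOD_LOG = """
[2020-12-29T18:25:54.787 CET] [debug] tests/jeos/prepare_firstboot.pm:36 called opensusebasetest::select_serial_terminal -> lib/opensusebasetest.pm:1240 called testapi::select_console
[2020-12-29T18:25:54.788 CET] [debug] <<< testapi::select_console(testapi_console="root-ssh")
"""

# Two lines copied from the current (failing) log above.
BAD_LOG = """
[2021-01-07T09:33:08.412 CET] [debug] <<< testapi::select_console(testapi_console="root-serial-ssh")
[2021-01-07T09:34:14.051 CET] [debug] <<< testapi::select_console(testapi_console="log-console", timeout=180)
"""

print(selected_consoles(GOOD_LOG))  # ['root-ssh']
print(selected_consoles(BAD_LOG))   # ['root-serial-ssh', 'log-console']
```

With the excerpts above, the working run selects root-ssh, while the failing run goes straight to root-serial-ssh and only reaches log-console after the test has already died.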

#4 Updated by MDoucha 3 months ago

ggardet_arm wrote:

So, it seems the problem comes from: https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/11625

The log says that SSH can't connect to the SUT at all.

You can try swapping $self->select_serial_terminal; for select_console('root-ssh'); to restore the original behavior here:
https://github.com/os-autoinst/os-autoinst-distri-opensuse/blob/master/tests/jeos/prepare_firstboot.pm#L36

But I seriously doubt that it'll work. You'll just get the same error because the SUT appears to be unreachable via network in the first place.

#5 Updated by ggardet_arm 3 months ago

MDoucha wrote:

ggardet_arm wrote:

So, it seems the problem comes from: https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/11625

The log says that SSH can't connect to the SUT at all.

This is expected at this point, as we have just switched on the SUT. The previous behavior was to wait for the SUT to appear on the network, with a timeout.

You can try swapping $self->select_serial_terminal; for select_console('root-ssh'); to restore the original behavior here:
https://github.com/os-autoinst/os-autoinst-distri-opensuse/blob/master/tests/jeos/prepare_firstboot.pm#L36

But I seriously doubt that it'll work. You'll just get the same error because the SUT appears to be unreachable via network in the first place.

It seems you recreated something that already existed, with slightly different behavior. Your patch expects the SUT to be already reachable, which is not yet the case here.
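
To make the difference concrete, here is a minimal Python sketch of "wait for the SUT to appear on the network, with a timeout". It is not the os-autoinst implementation, only an illustration of the idea: the function name `wait_for_port`, its parameters, and the injectable `connect` argument are hypothetical. The default of 240 seconds mirrors the "Wait for SSH on host 192.168.0.54 (timeout: 240)" line in the logs above.

```python
import socket
import time


def wait_for_port(host, port, timeout=240.0, interval=1.0,
                  connect=socket.create_connection):
    """Poll until a TCP connection to (host, port) succeeds.

    Returns True as soon as one connection attempt succeeds and False
    once `timeout` seconds have passed without success. `connect` is
    injectable so the loop can be exercised without a real network.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something (e.g. sshd) is listening.
            with connect((host, port), timeout=interval):
                return True
        except OSError:
            # Covers "No route to host", "Connection refused" and timeouts:
            # the SUT is simply not up yet, so keep polling.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


if __name__ == "__main__":
    # Hypothetical usage: wait for sshd on the SUT before opening a session.
    if wait_for_port("192.168.0.54", 22, timeout=240):
        print("SUT is reachable, SSH can be started")
    else:
        print("SUT did not appear on the network in time")
```

A connection attempt that fails immediately with "No route to host" is treated as "not up yet" rather than as a fatal error, which is the behavior the test relied on right after powering on the SUT.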

#6 Updated by MDoucha 3 months ago

ggardet_arm wrote:

This is expected at this point, as we have just switched on the SUT. The previous behavior was to wait for the SUT to appear on the network, with a timeout.

It seems you recreated something that already existed, with slightly different behavior. Your patch expects the SUT to be already reachable, which is not yet the case here.

It's not safe to call select_serial_terminal before the SUT is fully booted. Several of the console backends behind it require access to an interactive shell on the previous console before they can be activated.

#7 Updated by ggardet_arm 3 months ago

MDoucha wrote:

ggardet_arm wrote:

This is expected at this point, as we have just switched on the SUT. The previous behavior was to wait for the SUT to appear on the network, with a timeout.

It seems you recreated something that already existed, with slightly different behavior. Your patch expects the SUT to be already reachable, which is not yet the case here.

It's not safe to call select_serial_terminal before the SUT is fully booted. Several of the console backends behind it require access to an interactive shell on the previous console before they can be activated.

It worked perfectly fine before your patch.
All the code needed to wait for the SUT has been there for more than a year [0] and is used every day.

#8 Updated by MDoucha 3 months ago

ggardet_arm wrote:

It worked perfectly fine before your patch.
All the code needed to wait for the SUT has been there for more than a year [0] and is used every day.

[0]: https://github.com/os-autoinst/os-autoinst/pull/1304

That is because your code relied on special behavior of the root-ssh console. Select it explicitly with select_console('root-ssh'); and wait for the login prompt. Then you can safely call select_serial_terminal if you want a better console than VNC.

#9 Updated by ggardet_arm 3 months ago

  • Status changed from New to Resolved
  • Assignee set to ggardet_arm
