action #116287
closed[qe-core][s390x] SSH serial terminal connection issues on s390x workers
0%
Description
s390x livepatch tests had a lot of installation failures this month due to SSH serial terminal connection failures. Interestingly enough, the connection failures seem to happen around the same module step. serial_terminal.txt output appears to be out of sync with the terminal because part of the commands and output is missing even though it's listed in the update_kernel module details. The dmesg output in serial0.txt often (but not always) shows some key exchange SSH error followed by output from a completely different job:
Welcome to SUSE Linux Enterprise Server 15 SP2 (s390x) - Kernel 5.3.18-24.83-default (ttysclp0).
eth0: 10.161.145.86 fe80::5054:ff:fe84:f877
susetest login: root
Password:
Last login: Mon Sep 5 10:18:10 from 10.160.0.147
susetest:~ #(B systemctl is-active network
active
susetest:~ #(B systemctl is-active sshd
active
susetest:~ #(B 2022-09-05T10:25:03.604370-04:00 susetest sshd[4272]: error: kex_exchange_identification: Connection closed by remote host
2022-09-05T10:25:04.844743-04:00 susetest sshd[4273]: error: kex_exchange_identification: Connection closed by remote host
[ 107.444474] LTP: starting DI000 (dirty)
[ 107.445525] LTP: starting DS000 (dio_sparse)
[ 107.466125] LTP: starting abort01
[ 107.758318] LTP: starting accept01
12-SP4: https://openqa.suse.de/tests/9438804#step/update_kernel/337
15-SP2: https://openqa.suse.de/tests/9457752#step/update_kernel/337
15-SP3: https://openqa.suse.de/tests/9458645#step/update_kernel/337
15-SP4: https://openqa.suse.de/tests/9455666#step/update_kernel/199
I could not find any such connection failure on SLE-12SP5. Other SLE releases don't support s390x livepatches and KOTD tests don't show this kind of issue. This looks like a kernel bug but I'd like an s390x expert to look at this before I create a Bugzilla ticket. And of course this has exposed logging issues in OpenQA.