action #174196
closedCan't use an undefined value as a HASH reference at /usr/lib/os-autoinst/OpenQA/Isotovideo/CommandHandler.pm line 318. size:S
Added by okurz 7 days ago. Updated 2 days ago.
0%
Description
Observation¶
From
https://openqa.suse.de/tests/16178611/logfile?filename=autoinst-log.txt
[2024-12-11T18:14:06.942976Z] [debug] [pid:118860] no match: -1.5s, best candidate: firstrun-linux-login-20241023 (0.00)
[2024-12-11T18:14:07.060490Z] [debug] [pid:118664] >>> testapi::_check_backend_response: match=emergency-mode,emergency-shell,linux-login timed out after 500 (assert_screen)
[2024-12-11T18:14:07.089099Z] [info] [pid:118664] ::: basetest::runtest: # Test died: no candidate needle with tag(s) 'linux-login, emergency-shell, emergency-mode' matched
[2024-12-11T18:14:07.101392Z] [debug] [pid:118664] lib/bootbasetest.pm:11 called testapi::check_screen
[2024-12-11T18:14:07.101508Z] [debug] [pid:118664] <<< testapi::check_screen(mustmatch="generic-login", timeout=0)
[2024-12-11T18:14:10.660274Z] [debug] [pid:118645] backend process exited: 0
[2024-12-11T18:14:10.660943Z] [warn] [pid:118645] !!! main: Can't use an undefined value as a HASH reference at /usr/lib/os-autoinst/OpenQA/Isotovideo/CommandHandler.pm line 318.
OpenQA::Isotovideo::CommandHandler::_handle_command_check_screen(OpenQA::Isotovideo::CommandHandler=HASH(0x560986098fa8), HASH(0x5609860ba3c0), "check_screen") called at /usr/lib/os-autoinst/OpenQA/Isotovideo/CommandHandler.pm line 82
OpenQA::Isotovideo::CommandHandler::process_command(OpenQA::Isotovideo::CommandHandler=HASH(0x560986098fa8), GLOB(0x56097ed996a8), HASH(0x5609860ba3c0)) called at /usr/lib/os-autoinst/OpenQA/Isotovideo/Runner.pm line 71
OpenQA::Isotovideo::Runner::_read_response(OpenQA::Isotovideo::Runner=HASH(0x56097ee3f720), HASH(0x5609860ba3c0), GLOB(0x56097ed996a8)) called at /usr/lib/os-autoinst/OpenQA/Isotovideo/Runner.pm line 55
OpenQA::Isotovideo::Runner::run(OpenQA::Isotovideo::Runner=HASH(0x56097ee3f720)) called at /usr/bin/isotovideo line 187
eval {...} called at /usr/bin/isotovideo line 178
[2024-12-11T18:14:10.661301Z] [debug] [pid:118645] stopping backend process 118860
[2024-12-11T18:14:10.661369Z] [debug] [pid:118645] done with backend process
[2024-12-11T18:14:10.661432Z] [debug] [pid:118645] stopping command server 118654 because test execution ended through exception
[2024-12-11T18:14:10.861881Z] [debug] [pid:118645] done with command server
[2024-12-11T18:14:10.861961Z] [debug] [pid:118645] stopping autotest process 118664
[2024-12-11T18:14:10.862185Z] [debug] [pid:118664] autotest received signal TERM, saving results of current test before exiting
[2024-12-11T18:14:11.062437Z] [debug] [pid:118645] done with autotest process
118645: EXIT 0
[2024-12-11T18:14:11.117696Z] [info] [pid:82820] Isotovideo exit status: 0
[2024-12-11T18:14:13.167461Z] [info] [pid:82820] +++ worker notes +++
[2024-12-11T18:14:13.167638Z] [info] [pid:82820] End time: 2024-12-11 18:14:13
[2024-12-11T18:14:13.167742Z] [info] [pid:82820] Result: done
[2024-12-11T18:14:13.176030Z] [info] [pid:3961] Uploading autoinst-log.txt
according to fniederwanger "since a couple of days occasionally"
Updated by ph03nix 7 days ago
Ouf of my head: I've seen this issue since a couple of days, could be since last week.
Restart often fixes it, as https://openqa.suse.de/tests/16183465 shows. This test run passes the broken boot_to_desktop
part after two job restarts (16183400 and 16178611).
Updated by okurz 7 days ago
- Subject changed from Can't use an undefined value as a HASH reference at /usr/lib/os-autoinst/OpenQA/Isotovideo/CommandHandler.pm line 318. to Can't use an undefined value as a HASH reference at /usr/lib/os-autoinst/OpenQA/Isotovideo/CommandHandler.pm line 318. size:S
- Description updated (diff)
- Status changed from New to Workable
Updated by mkittler 7 days ago
Judging by the last good test the screen is supposed to show an Ubuntu login prompt. This prompt is only partially shown in the failing cases. This maybe be caused by a problem in the VNC client code or by the SUT not updating the screen correctly.
The system had e.g. 8 minutes (from 2024-12-12T07:32:16 to 2024-12-12T07:40:39) to boot until Test died: no candidate needle with tag(s) 'linux-login, emergency-shell, emergency-mode' matched
was logged. So the mismatch was not caused by an error in os-autoinst declaring the mismatch too soon.
That means all other problems (including the Perl warning) are not really the source of the problem but of course also things that should be handled better.
Updated by mkittler 7 days ago
PR to avoid the distracting Perl warning: https://github.com/os-autoinst/os-autoinst/pull/2587
Note that neither the warning nor the potential backend problems are caused by recent changes. At least I haven't found any relevant commits.
Updated by mkittler 7 days ago
The VNC stall detection logged its usual message:
[2024-12-12T07:32:28.475041Z] [debug] [pid:106660] considering VNC stalled, no update for 4.07 seconds
[2024-12-12T07:32:28.730246Z] [debug] [pid:106660] pointer type 1 0 1024 768 -257
[2024-12-12T07:32:28.730386Z] [debug] [pid:106660] led state 0 1 1 -261
Considering pointer type and led state are logged again the stall detection probably did a re-connect as expected. So the VNC stall was maybe even handled correctly.
Unfortunately there's no video so it isn't possible to double check how the screen might have looked like.
Considering the serial log contains the login prompt¹ this is definitely just an issue of the graphical output or VNC. So it would be possible to workaround it by checking the serial log instead of doing a screen match.
¹
Ubuntu 24.04.1 LTS ubuntu24 hvc0
ubuntu24 login:
Updated by openqa_review 6 days ago
- Due date set to 2024-12-27
Setting due date based on mean cycle time of SUSE QE Tools
Updated by mkittler 6 days ago
- Status changed from In Progress to Resolved
It looks like this scenario is rather new. At least there's not much of a job history. So I checked other jobs:
openqa=> select count(id), array_agg(id), TEST, reason from jobs where t_finished >= '2024-12-01T00:00:00' and reason ilike '%use an undefined value as a HASH reference%CommandHandler%' group by TEST, reason order by count(id) desc;
count | array_agg | test |
reason
-------+---------------------------------------+-----------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------
--------------------------------------------------------------------------
4 | {16124511,16149412,16124478,16169203} | milvus_2.4_on_ubuntu_docker:investigate:retry | isotovideo done: Can't use an undefined value as a HASH refere
nce at /usr/lib/os-autoinst/OpenQA/Isotovideo/CommandHandler.pm line 318.
4 | {16146477,16115357,16116040,16155774} | php-fpm_8_on_ubuntu_docker | isotovideo done: Can't use an undefined value as a HASH refere
nce at /usr/lib/os-autoinst/OpenQA/Isotovideo/CommandHandler.pm line 318.
3 | {16146757,16155885,16129289} | dotnet-aspnet_8.0_on_ubuntu_docker | isotovideo done: Can't use an undefined value as a HASH refere
nce at /usr/lib/os-autoinst/OpenQA/Isotovideo/CommandHandler.pm line 318.
3 | {16129985,16170662,16146004} | openjdk_21_on_ubuntu_docker | isotovideo done: Can't use an undefined value as a HASH refere
nce at /usr/lib/os-autoinst/OpenQA/Isotovideo/CommandHandler.pm line 318.
3 | {16159825,16172333,16171736} | apache-tomcat_9-openjdk21_on_ubuntu_docker | isotovideo done: Can't use an undefined value as a HASH refere
nce at /usr/lib/os-autoinst/OpenQA/Isotovideo/CommandHandler.pm line 318.
3 | {16177674,16178611,16183400} | dotnet-sdk_8.0_on_ubuntu_docker | isotovideo done: Can't use an undefined value as a HASH refere
nce at /usr/lib/os-autoinst/OpenQA/Isotovideo/CommandHandler.pm line 318.
2 | {16131872,16075246} | mariadb_on_ubuntu_docker | isotovideo done: Can't use an undefined value as a HASH refere
nce at /usr/lib/os-autoinst/OpenQA/Isotovideo/CommandHandler.pm line 318.
2 | {16137178,16099323} | apache-pulsar_3.3_on_ubuntu_docker | isotovideo done: Can't use an undefined value as a HASH refere
nce at /usr/lib/os-autoinst/OpenQA/Isotovideo/CommandHandler.pm line 318.
2 | {16163910,16129566} | openjdk-devel_21_on_ubuntu_docker | isotovideo done: Can't use an undefined value as a HASH refere
nce at /usr/lib/os-autoinst/OpenQA/Isotovideo/CommandHandler.pm line 318.
2 | {16122052,16117461} | apache-tomcat_10.1-openjdk11_on_ubuntu_docker | isotovideo done: Can't use an undefined value as a HASH refere
nce at /usr/lib/os-autoinst/OpenQA/Isotovideo/CommandHandler.pm line 318.
2 | {16148378,16156456} | dotnet-runtime_9.0_on_ubuntu_docker | isotovideo done: Can't use an undefined value as a HASH refere
nce at /usr/lib/os-autoinst/OpenQA/Isotovideo/CommandHandler.pm line 318.
2 | {16130318,16148388} | dotnet-sdk_9.0_on_ubuntu_docker | isotovideo done: Can't use an undefined value as a HASH refere
nce at /usr/lib/os-autoinst/OpenQA/Isotovideo/CommandHandler.pm line 318.
2 | {16170165,16160141} | helm_on_ubuntu_docker | isotovideo done: Can't use an undefined value as a HASH refere
nce at /usr/lib/os-autoinst/OpenQA/Isotovideo/CommandHandler.pm line 318.
2 | {16146531,16129077} | python_3.6_on_ubuntu_docker | isotovideo done: Can't use an undefined value as a HASH refere
nce at /usr/lib/os-autoinst/OpenQA/Isotovideo/CommandHandler.pm line 318.
2 | {16129177,16171005} | ruby_2.5_on_ubuntu_docker | isotovideo done: Can't use an undefined value as a HASH refere
nce at /usr/lib/os-autoinst/OpenQA/Isotovideo/CommandHandler.pm line 318.
1 | {16114487} | openjdk_11_on_ubuntu_docker | isotovideo done: Can't use an undefined value as a HASH refere
nce at /usr/lib/os-autoinst/OpenQA/Isotovideo/CommandHandler.pm line 318.
1 | {16128925} | php_8_on_ubuntu_docker | isotovideo done: Can't use an undefined value as a HASH refere
nce at /usr/lib/os-autoinst/OpenQA/Isotovideo/CommandHandler.pm line 318.
1 | {16145555} | postgres_16_on_ubuntu_docker | isotovideo done: Can't use an undefined value as a HASH refere
nce at /usr/lib/os-autoinst/OpenQA/Isotovideo/CommandHandler.pm line 318.
1 | {16129127} | python_3.11_on_ubuntu_docker | isotovideo done: Can't use an undefined value as a HASH refere
nce at /usr/lib/os-autoinst/OpenQA/Isotovideo/CommandHandler.pm line 318.
1 | {16172656} | python_3.12_on_ubuntu_docker | isotovideo done: Can't use an undefined value as a HASH refere
nce at /usr/lib/os-autoinst/OpenQA/Isotovideo/CommandHandler.pm line 318.
1 | {16114387} | rmt_on_ubuntu_docker | isotovideo done: Can't use an undefined value as a HASH refere
nce at /usr/lib/os-autoinst/OpenQA/Isotovideo/CommandHandler.pm line 318.
1 | {16129385} | rust_oldstable_on_ubuntu_docker | isotovideo done: Can't use an undefined value as a HASH refere
nce at /usr/lib/os-autoinst/OpenQA/Isotovideo/CommandHandler.pm line 318.
1 | {16116119} | rust_stable_on_ubuntu_docker | isotovideo done: Can't use an undefined value as a HASH refere
nce at /usr/lib/os-autoinst/OpenQA/Isotovideo/CommandHandler.pm line 318.
1 | {16170465} | sle-12-SP5_image_on_ubuntu | isotovideo done: Can't use an undefined value as a HASH refere
nce at /usr/lib/os-autoinst/OpenQA/Isotovideo/CommandHandler.pm line 318.
1 | {16135334} | 389-ds_2.2_on_ubuntu_docker | isotovideo done: Can't use an undefined value as a HASH refere
nce at /usr/lib/os-autoinst/OpenQA/Isotovideo/CommandHandler.pm line 318.
1 | {16190382} | slem_open_vm_tools | isotovideo done: Can't use an undefined value as a HASH refere
nce at /usr/lib/os-autoinst/OpenQA/Isotovideo/CommandHandler.pm line 318.
1 | {16135734} | apache-tomcat_9-openjdk11_on_ubuntu_docker | isotovideo done: Can't use an undefined value as a HASH refere
nce at /usr/lib/os-autoinst/OpenQA/Isotovideo/CommandHandler.pm line 318.
1 | {16171708} | apache-tomcat_9-openjdk17_on_ubuntu_docker | isotovideo done: Can't use an undefined value as a HASH refere
nce at /usr/lib/os-autoinst/OpenQA/Isotovideo/CommandHandler.pm line 318.
1 | {16172506} | apache-tomcat_9-openjdk21_on_ubuntu_docker:investigate:last_good_tests:bed8ca4bc7379de3ccc01f1d06b68a354f472f93 | isotovideo done: Can't use an undefined value as a HASH refere
nce at /usr/lib/os-autoinst/OpenQA/Isotovideo/CommandHandler.pm line 318.
1 | {16160695} | bci-base_16.0_on_ubuntu_docker | isotovideo done: Can't use an undefined value as a HASH refere
nce at /usr/lib/os-autoinst/OpenQA/Isotovideo/CommandHandler.pm line 318.
1 | {16177977} | bci-init_15.6_on_ubuntu_docker | isotovideo done: Can't use an undefined value as a HASH refere
nce at /usr/lib/os-autoinst/OpenQA/Isotovideo/CommandHandler.pm line 318.
1 | {16123755} | bci-micro_16.0_on_ubuntu_docker | isotovideo done: Can't use an undefined value as a HASH refere
nce at /usr/lib/os-autoinst/OpenQA/Isotovideo/CommandHandler.pm line 318.
1 | {16110432} | combustion | isotovideo done: Can't use an undefined value as a HASH refere
nce at /usr/lib/os-autoinst/OpenQA/Isotovideo/CommandHandler.pm line 318.
1 | {16160292} | cosign_on_ubuntu_docker | isotovideo done: Can't use an undefined value as a HASH refere
nce at /usr/lib/os-autoinst/OpenQA/Isotovideo/CommandHandler.pm line 318.
1 | {16148457} | dotnet-aspnet_9.0_on_ubuntu_docker | isotovideo done: Can't use an undefined value as a HASH refere
nce at /usr/lib/os-autoinst/OpenQA/Isotovideo/CommandHandler.pm line 318.
1 | {16156473} | dotnet-runtime_8.0_on_ubuntu_docker | isotovideo done: Can't use an undefined value as a HASH refere
nce at /usr/lib/os-autoinst/OpenQA/Isotovideo/CommandHandler.pm line 318.
1 | {16148060} | golang_oldstable_on_ubuntu_docker | isotovideo done: Can't use an undefined value as a HASH refere
nce at /usr/lib/os-autoinst/OpenQA/Isotovideo/CommandHandler.pm line 318.
1 | {16150463} | jeos-filesystem | isotovideo done: Can't use an undefined value as a HASH refere
nce at /usr/lib/os-autoinst/OpenQA/Isotovideo/CommandHandler.pm line 318.
1 | {16147493} | nginx_on_ubuntu_docker | isotovideo done: Can't use an undefined value as a HASH refere
nce at /usr/lib/os-autoinst/OpenQA/Isotovideo/CommandHandler.pm line 318.
1 | {16146786} | ollama_0.3_on_ubuntu_docker | isotovideo done: Can't use an undefined value as a HASH refere
nce at /usr/lib/os-autoinst/OpenQA/Isotovideo/CommandHandler.pm line 318.
(40 rows)
So this symptom is happening almost excursively on those "ubuntu_docker" scenarios.
In https://openqa.suse.de/tests/16150463#step/firstrun/1 the underlying issue was not a failed needle match but some backend issue. It lead to the same warning so the warning itself is really meaningless:
corrupted size vs. prev_size
XIO: fatal IO error 11 (Resource temporarily unavailable) on X server ":38023"
after 28808 requests (28651 known processed) with 0 events remaining.
[2024-12-09T20:14:51.408429Z] [debug] [pid:91221] backend process exited: 0
xterm: fatal IO error 11 (Resource temporarily unavailable) or KillClient on X server ":38023"
[2024-12-09T20:14:51.410108Z] [info] [pid:91221] ::: backend::driver::_collect_orphan: Driver backend collected unknown process with pid 91828 and exit status: 1
[2024-12-09T20:14:51.410233Z] [info] [pid:91221] ::: backend::driver::_collect_orphan: Driver backend collected unknown process with pid 91831 and exit status: 0
[2024-12-09T20:14:51.410499Z] [info] [pid:91221] ::: backend::driver::_collect_orphan: Driver backend collected unknown process with pid 91829 and exit status: 84
[2024-12-09T20:14:51.410590Z] [info] [pid:91221] ::: backend::driver::_collect_orphan: Driver backend collected unknown process with pid 91830 and exit status: 0
[2024-12-09T20:14:51.411364Z] [warn] [pid:91221] !!! main: Can't use an undefined value as a HASH reference at /usr/lib/os-autoinst/OpenQA/Isotovideo/CommandHandler.pm line 318.
In https://openqa.suse.de/tests/16190382#step/esxi_open_vm_tools/107 there was also a screen mismatch after the screen was not updated anymore. Here the VNC problems are maybe related to the websocket tunneling for VMWare 7 which still doesn't seem to be fully stable.
I will not look into these different backend problems as part of this ticket. We can create a new ticket to investigate the cases as necessary.
For the "ubunut_docket" scenarios specifically I suggest using the developer mode to pause the test on an assert screen failure and when one catches the problematic state one can try to connect with a different VNC viewer to see whether the problem is SUT-specific or due to os-autoinst's VNC code being stuck. It would also be interesting to see whether os-autoinst's VNC code gets unstuck again when provoking bigger screen changes manually. As a workaround one could also match on the serial output as mentioned in #174196#note-6.
Updated by mkittler 6 days ago · Edited
- Status changed from Resolved to In Progress
It makes actually sense to check how the relevant scenarios behave now after the PR has been merged and it looks the warning was just traded with another one:
openqa=> select count(id), array_agg(id), TEST, reason from jobs where t_finished >= '2024-12-13T00:00:00' and result = 'failed' and TEST ilike '%ubuntu_docker' group by TEST, reason order by count(id) desc;
count | array_agg | test | reason
-------+---------------------+-----------------------------------------------+----------------------------------------------------------------------------------------------------------------------
2 | {16190810,16190864} | golang_oldstable-openssl_on_ubuntu_docker |
2 | {16190759,16190944} | golang_stable-openssl_on_ubuntu_docker |
1 | {16191164} | apache-tomcat_10.1-openjdk11_on_ubuntu_docker | isotovideo done: myjsonrpc: remote end terminated connection, stopping at /usr/lib/os-autoinst/myjsonrpc.pm line 43.
[2024-12-13T10:14:38.125541Z] [debug] [pid:74154] >>> testapi::_check_backend_response: match=emergency-mode,emergency-shell,linux-login timed out after 500 (assert_screen)
[2024-12-13T10:14:38.154388Z] [info] [pid:74154] ::: basetest::runtest: # Test died: no candidate needle with tag(s) 'linux-login, emergency-shell, emergency-mode' matched
[2024-12-13T10:14:38.166268Z] [debug] [pid:74154] lib/bootbasetest.pm:11 called testapi::check_screen
[2024-12-13T10:14:38.166407Z] [debug] [pid:74154] <<< testapi::check_screen(mustmatch="generic-login", timeout=0)
[2024-12-13T10:14:38.206747Z] [debug] [pid:73625] backend process exited: 0
[2024-12-13T10:14:38.207608Z] [warn] [pid:73625] !!! main: myjsonrpc: remote end terminated connection, stopping at /usr/lib/os-autoinst/myjsonrpc.pm line 43.
myjsonrpc::send_json(IO::Pipe::End=GLOB(0x55f836ed8c80), HASH(0x55f836ec6f68)) called at /usr/lib/os-autoinst/backend/driver.pm line 123
backend::driver::_send_json(backend::driver=HASH(0x55f838bddb00), HASH(0x55f836ec6f68)) called at /usr/lib/os-autoinst/OpenQA/Isotovideo/CommandHandler.pm line 406
OpenQA::Isotovideo::CommandHandler::check_asserted_screen(OpenQA::Isotovideo::CommandHandler=HASH(0x55f836ec2f78)) called at /usr/lib/os-autoinst/OpenQA/Isotovideo/Runner.pm line 58
OpenQA::Isotovideo::Runner::run(OpenQA::Isotovideo::Runner=HASH(0x55f82fb33ad0)) called at /usr/bin/isotovideo line 187
eval {...} called at /usr/bin/isotovideo line 178
Updated by mkittler 6 days ago
PR to avoid "myjsonrpc: remote end terminated connection" from showing up as reason: https://github.com/os-autoinst/os-autoinst/pull/2589
I think we normally don't see this error message as reason (as well as the warning we had before) because normally there's already a different reason recorded or the backend only exists after isotovideo exists its loop. In this situation the backend exists before the isotovideo loop finishes because autotest calls bmwqemu::stop_vm();
after the fatal test module failed and that happens before the "tests done" message is sent. Apparently the backend is very quick to respond here. Not sure whether it makes sense to change the architecture so this situation cannot happen in the first place. It would probably be "cleaner" than just avoiding to serialize the error as reason. However, it would also complicate things and potentially introduce other bugs.
Updated by mkittler 2 days ago
- Status changed from Feedback to Resolved
I re-ran the query. I could find jobs running into the problem again (https://openqa.suse.de/tests/16211593, https://openqa.suse.de/tests/16211048, https://openqa.suse.de/tests/16210945 and https://openqa.suse.de/tests/16210260) and this time there's no distracting reason.
About the underlying problem: see the last paragraph on #174196#note-8
With that I'm considering this ticket resolved.