action #182204

closed

[sporadic] Failure on openqa-Tumbleweed-dev-x86_64-Build:TW.36505-openqa_install_multimachine size:S

Added by gpuliti 23 days ago. Updated 8 days ago.

Status: Resolved
Priority: High
Assignee:
Category: Regressions/Crashes
Target version:
Start date: 2025-05-12
Due date:
% Done: 0%
Estimated time:

Description

Observation

openqa-Tumbleweed-dev-x86_64-Build:TW.36505-openqa_install_multimachine@64bit-4G http://openqa.opensuse.org/tests/5053634: Unknown test issue, to be reviewed -> autoinst-log.txt

Name: openqa-Tumbleweed-dev-x86_64-Build:TW.36505-openqa_install_multimachine@64bit-4G
Result: failed
Reason: null
It might be a product bug, an outdated needle, test code needing adaptation or a test infrastructure related problem. Adding a bugref that can be carried over will prevent these mails for this issue. If the carry-over is not sufficient, you may want to create a ticket with auto-review-regex.

Last lines before SUT shutdown:

# --- 8< ---
# [2025-05-11T23:20:38.299860Z] [debug] [pid:11991] isotovideo done
# [2025-05-11T23:20:38.301126Z] [debug] [pid:11991] backend shutdown state: 1
# [2025-05-11T23:20:38.301811Z] [info] [pid:12336] ::: OpenQA::Qemu::Proc::save_state: Saving QEMU state to qemu_state.json
# [2025-05-11T23:20:39.420487Z] [debug] [pid:12336] Passing remaining frames to the video encoder
# frame 3134 fps2.0 q0.0 Lsize    8027kB time :02:10.54 bitrate 503.7kbits/s speed0.0829x    
# video:8006kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.261387%
# [2025-05-11T23:20:40.063506Z] [debug] [pid:12336] Waiting for video encoder to finalize the video
# [2025-05-11T23:20:40.063587Z] [debug] [pid:12336] The external video encoder (pid 12527) terminated
# [2025-05-11T23:20:40.063637Z] [debug] [pid:12336] The built-in video encoder (pid 12528) terminated
# [2025-05-11T23:20:40.064243Z] [debug] [pid:12336] QEMU: qemu-system-x86_64: terminating on signal 15 from pid 12336 (/usr/bin/isotovideo: backen)
# --- >8 ---

Test died on test_running.pm#L8 during test#5053634

# Test died: command 'retry -s 15 -r 120 -- sh -c '
        r=`openqa-cli api jobs test=ping_client | tee /dev/fd/2 |
        jq -r ".jobs | max_by(.id) | if .result != \"none\" then .result else .state end"`;
        echo $r | grep -q "incomplete\|failed" && killall retry;
        echo $r | grep -q "passed"'' failed at /usr/lib/os-autoinst/autotest.pm line 418.
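The status check inside the retry loop above can be sketched in Python as follows. This is only an illustration of the logic in the quoted shell command, not code from the test suite: the function names `latest_job_status` and `classify` and the sample JSON are hypothetical, and the sketch assumes the API response carries a "jobs" list whose entries have "id", "result" and "state" fields, as the jq filter implies.

```python
import json


def latest_job_status(api_response: str) -> str:
    """Mimic the jq filter: pick the job with the highest id and return its
    result, falling back to its state while the result is still "none"."""
    jobs = json.loads(api_response)["jobs"]
    # Unlike jq's max_by, max() raises ValueError on an empty job list.
    latest = max(jobs, key=lambda job: job["id"])
    return latest["result"] if latest["result"] != "none" else latest["state"]


def classify(status: str) -> str:
    """Mimic the two grep calls: substring matches on the status string."""
    if "incomplete" in status or "failed" in status:
        return "abort"    # corresponds to `killall retry`
    if "passed" in status:
        return "success"  # grep -q "passed" succeeds, retry stops
    return "retry"        # anything else: retry sleeps and polls again


# Hypothetical response: an older passed job and a newer one still running.
sample = json.dumps({"jobs": [
    {"id": 1, "result": "passed", "state": "done"},
    {"id": 2, "result": "none", "state": "running"},
]})
print(classify(latest_job_status(sample)))
```

With this sample the newest job is still running, so the loop would poll again; a newest job with result "failed" or "incomplete" would abort the loop instead, which matches the failure reported here.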

Acceptance Criteria

  • AC1: openqa-in-openqa tests are passing consistently

Suggestions

  • Consider a possible regression in either the base OS or (more likely) openQA and try to find an upstream fix, not just test changes

Related issues 1 (0 open, 1 closed)

Related to openQA Project (public) - action #131102: [openQA-in-openQA] test fails in test_running (Resolved, tinita, 2023-06-19)

Actions #1

Updated by gpuliti 23 days ago

  • Description updated (diff)
Actions #2

Updated by gpuliti 23 days ago

  • Description updated (diff)
Actions #3

Updated by okurz 23 days ago

  • Tags changed from alert to alert, reactive work, multi-machine, openqa-in-openqa, sporadic
  • Category set to Regressions/Crashes
  • Priority changed from Normal to High
  • Target version set to Ready
Actions #4

Updated by gpuliti 20 days ago

  • Copied to action #182471: [Alert] Build failed in Jenkins: monitor-openQA_in_openQA-TW - openqa-cli produces "Connect timeout" added
Actions #5

Updated by gpuliti 20 days ago

  • Copied to deleted (action #182471: [Alert] Build failed in Jenkins: monitor-openQA_in_openQA-TW - openqa-cli produces "Connect timeout")
Actions #6

Updated by gpuliti 19 days ago

  • Subject changed from Sporadic failure on openqa-Tumbleweed-dev-x86_64-Build:TW.36505-openqa_install_multimachine to [sporadic] Failure on openqa-Tumbleweed-dev-x86_64-Build:TW.36505-openqa_install_multimachine size:S
  • Description updated (diff)
  • Status changed from New to Workable
Actions #7

Updated by tinita 15 days ago

Just for reference: We can't work on that right now because of infrastructure problems.
Hopefully tomorrow.

Actions #8

Updated by okurz 8 days ago

  • Assignee set to okurz
Actions #9

Updated by okurz 8 days ago

  • Status changed from Workable to Resolved

https://openqa.opensuse.org/tests/5053634#comments shows that all retries passed, and https://openqa.opensuse.org/tests/5053634#next_previous shows that the next 200 jobs passed. With 200 consecutive passes, the probability of this reproducing is below 1.50%, so I consider it not reproducible.
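The comment does not say how the 1.50% figure was derived. One plausible reading, stated here as an assumption, is the "rule of three": after n runs with zero failures, 3/n approximates the 95% upper confidence bound on the failure rate. A minimal sketch (function names are illustrative only):

```python
def rule_of_three(passes: int) -> float:
    """Approximate 95% upper bound on the failure rate after `passes`
    consecutive runs without a single failure."""
    return 3.0 / passes


def exact_upper_bound(passes: int, confidence: float = 0.95) -> float:
    """Exact one-sided bound: the largest p for which seeing `passes`
    passes in a row still has probability 1 - confidence,
    i.e. solve (1 - p) ** passes == 1 - confidence for p."""
    return 1.0 - (1.0 - confidence) ** (1.0 / passes)


n = 200  # number of subsequent passing jobs cited in the comment
print(f"rule of three: {rule_of_three(n):.2%}")
print(f"exact bound:   {exact_upper_bound(n):.2%}")
```

Both values come out at roughly 1.5%, which is consistent with the number quoted above, assuming the runs are independent and the failure rate did not change over that period.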

https://openqa.opensuse.org/tests/5053634#step/test_running/20 shows that two inner openQA jobs had actually been executed. Also, https://openqa.opensuse.org/tests/5053634#step/test_running/3 shows that the jobs were executed but failed.

    assert_script_run qq{retry -s 15 -r 120 -- sh -c '
         r=`openqa-cli api jobs $api_query | tee /dev/fd/2 |
         jq -r ".jobs | max_by(.id) | if .result != \\"none\\" then .result else .state end"`;
         echo \$r | grep -q "incomplete\\|failed" && killall retry;
         echo \$r | grep -q "$success"'}, timeout => 1830;

This command ended on killall retry, which the script runs once the latest inner job reports incomplete or failed. The inner openQA test failed on https://openqa.opensuse.org/tests/5053634/logfile?filename=test_running-autoinst-log.txt#line-2934

[2025-05-11T19:14:33.236886-04:00] [debug] [pid:12717] >>> testapi::_check_backend_response: match=emergency-mode,emergency-shell,linux-login timed out after 500 (assert_screen)
[2025-05-11T19:14:33.461110-04:00] [info] [pid:12717] ::: basetest::runtest: # Test died: no candidate needle with tag(s) 'linux-login, emergency-shell, emergency-mode' matched

The videos from https://openqa.opensuse.org/tests/5053634/file/test_running-testresults.tar.xz showed the inner system just stuck on boot. I don't think there is anything more useful we can do here.

Actions #10

Updated by okurz 8 days ago

  • Related to action #131102: [openQA-in-openQA] test fails in test_running added