Project

General

Profile

Actions

action #126665

closed

Worker did not upload details when running into error size:M

Added by mkittler over 1 year ago. Updated over 1 year ago.

Status:
Resolved
Priority:
Normal
Assignee:
Category:
Regressions/Crashes
Target version:
Start date:
2023-03-27
Due date:
% Done:

0%

Estimated time:

Description

Observation

The job https://openqa.opensuse.org/tests/3193710 was cancelled but supposedly produced results that should be shown in the details tab (which only shows the blank list test modules) but are not present. The video and other logs have been uploaded so it is clear that the test went far enough to produce screenshots.

The worker log is very short and there's also nothing more to see in the journal on the worker host:

[2023-03-27T10:30:37.505926+02:00] [info] Preparing cgroup to start isotovideo
[2023-03-27T10:30:37.506643+02:00] [info] Using cgroup /sys/fs/cgroup/systemd/openqa.slice/openqa-worker.slice/openqa-worker-auto-restart@2.service/3193710
[2023-03-27T10:30:37.506879+02:00] [info] Starting isotovideo container
[2023-03-27T10:30:37.509020+02:00] [info] isotovideo has been started (PID: 20888)
[2023-03-27T10:30:37.509691+02:00] [info] 20888: WORKING 3193710
[2023-03-27T10:32:41.469453+02:00] [info] Uploading video.ogv
[2023-03-27T10:32:41.473856+02:00] [error] REST-API error (POST http://openqa1-opensuse/api/v1/jobs/3193710/status): Connection error: Premature connection close (remaining tries: 59)
[2023-03-27T10:32:41.555177+02:00] [info] Uploading vars.json
[2023-03-27T10:32:41.602247+02:00] [info] Uploading autoinst-log.txt
[2023-03-27T10:32:41.664950+02:00] [info] Uploading worker-log.txt

The test went quite far within the boot module:

[2023-03-27T10:30:41.238096+02:00] [debug] [pid:20904] ||| starting boot /tests/install/boot.pm
…
[2023-03-27T10:30:42.378306+02:00] [debug] [pid:20982] no match: 29.0s, best candidate: gnome-desktop-20190509 (0.00)
[2023-03-27T10:30:43.386089+02:00] [debug] [pid:20904] >>> testapi::_handle_found_needle: found openqa-boot-menu-Tumbleweed-20190329, similarity 1.00 @ 65/11
…
[2023-03-27T10:32:12.827037+02:00] [debug] [pid:20982] no match: -0.4s, best candidate: gnome-desktop-20190509 (0.00)
[2023-03-27T10:32:13.110159+02:00] [debug] [pid:20904] >>> testapi::_check_backend_response: match=openqa-desktop timed out after 500 (assert_screen)
[2023-03-27T10:32:13.193770+02:00] [info] [pid:20904] ::: basetest::runtest: # Test died: no candidate needle with tag(s) 'openqa-desktop' matched
[2023-03-27T10:32:34.992796+02:00] [debug] [pid:20888] isotovideo: test execution will be resumed
[2023-03-27T10:32:34.996854+02:00] [debug] [pid:20904] lib/openQAcoretest.pm:13 called testapi::send_key
[2023-03-27T10:32:34.997097+02:00] [debug] [pid:20904] <<< testapi::send_key(key="ctrl-alt-f3", wait_screen_change=0)
[2023-03-27T10:32:35.400841+02:00] [debug] [pid:20904] lib/openQAcoretest.pm:24 called testapi::enter_cmd
[2023-03-27T10:32:35.401272+02:00] [debug] [pid:20904] <<< testapi::type_string(string="openqa-cli api jobs\n", max_interval=250, wait_screen_change=0, wait_still_screen=0, timeout=30, similarity_level=47)
[2023-03-27T10:32:36.156458+02:00] [debug] [pid:20904] lib/openQAcoretest.pm:26 called openQAcoretest::get_log -> lib/openQAcoretest.pm:8 called testapi::script_run
[2023-03-27T10:32:36.156731+02:00] [debug] [pid:20904] <<< testapi::script_run(cmd="journalctl --pager-end --no-tail --no-pager -u apache2 -u openqa-scheduler -u openqa-websockets -u openqa-webui -u openqa-worker\@1 | tee openqa_services.log", timeout=30, output="", die_on_timeout=-1, quiet=undef)
[2023-03-27T10:32:36.156907+02:00] [debug] [pid:20904] lib/openQAcoretest.pm:26 called openQAcoretest::get_log -> lib/openQAcoretest.pm:8 called testapi::script_run
[2023-03-27T10:32:36.157065+02:00] [debug] [pid:20904] <<< testapi::type_string(string="journalctl --pager-end --no-tail --no-pager -u apache2 -u openqa-scheduler -u openqa-websockets -u openqa-webui -u openqa-worker\@1 | tee openqa_services.log", max_interval=250, wait_screen_change=0, wait_still_screen=0, timeout=30, similarity_level=47)
…

Maybe some upload steps have been skipped because the job has been cancelled in a "bad moment".

Acceptance criteria

  • AC1: Job details are uploaded (unless there's really a severe problem indicated by the log or absence of logs).

Suggestions

  • Check code for hypothesis mentioned at the end of observation and other potential problems.
  • Maybe it did not work because the test was just in the first module and partial progress within the module is not uploaded? Maybe the first module is a specifically problematic in that regard?
Actions #1

Updated by okurz over 1 year ago

  • Tags set to reactive work
  • Target version set to Ready
Actions #2

Updated by livdywan over 1 year ago

  • Subject changed from Worker did not upload details when running into error to Worker did not upload details when running into error size:M
  • Status changed from New to Workable
Actions #3

Updated by tinita over 1 year ago

  • Status changed from Workable to In Progress
  • Assignee set to tinita
Actions #4

Updated by tinita over 1 year ago

  • Status changed from In Progress to Feedback

https://github.com/os-autoinst/openQA/pull/5072 Ensure final status update for cancelled jobs

Actions #5

Updated by tinita over 1 year ago

  • Status changed from Feedback to Resolved

Merged, deployed and tested: https://openqa.opensuse.org/tests/3209792

Actions

Also available in: Atom PDF