Project

General

Profile

action #126665

Worker did not upload details when running into error size:M

Added by mkittler 2 months ago. Updated 2 months ago.

Status:
Resolved
Priority:
Normal
Assignee:
Category:
Concrete Bugs
Target version:
Start date:
2023-03-27
Due date:
% Done:

0%

Estimated time:
Difficulty:

Description

Observation

The job https://openqa.opensuse.org/tests/3193710 was cancelled but supposedly produced results that should be shown in the details tab (which only shows the blank list test modules) but are not present. The video and other logs have been uploaded so it is clear that the test went far enough to produce screenshots.

The worker log is very short and there's also nothing more to see in the journal on the worker host:

[2023-03-27T10:30:37.505926+02:00] [info] Preparing cgroup to start isotovideo
[2023-03-27T10:30:37.506643+02:00] [info] Using cgroup /sys/fs/cgroup/systemd/openqa.slice/openqa-worker.slice/openqa-worker-auto-restart@2.service/3193710
[2023-03-27T10:30:37.506879+02:00] [info] Starting isotovideo container
[2023-03-27T10:30:37.509020+02:00] [info] isotovideo has been started (PID: 20888)
[2023-03-27T10:30:37.509691+02:00] [info] 20888: WORKING 3193710
[2023-03-27T10:32:41.469453+02:00] [info] Uploading video.ogv
[2023-03-27T10:32:41.473856+02:00] [error] REST-API error (POST http://openqa1-opensuse/api/v1/jobs/3193710/status): Connection error: Premature connection close (remaining tries: 59)
[2023-03-27T10:32:41.555177+02:00] [info] Uploading vars.json
[2023-03-27T10:32:41.602247+02:00] [info] Uploading autoinst-log.txt
[2023-03-27T10:32:41.664950+02:00] [info] Uploading worker-log.txt

The test went quite far within the boot module:

[2023-03-27T10:30:41.238096+02:00] [debug] [pid:20904] ||| starting boot /tests/install/boot.pm
…
[2023-03-27T10:30:42.378306+02:00] [debug] [pid:20982] no match: 29.0s, best candidate: gnome-desktop-20190509 (0.00)
[2023-03-27T10:30:43.386089+02:00] [debug] [pid:20904] >>> testapi::_handle_found_needle: found openqa-boot-menu-Tumbleweed-20190329, similarity 1.00 @ 65/11
…
[2023-03-27T10:32:12.827037+02:00] [debug] [pid:20982] no match: -0.4s, best candidate: gnome-desktop-20190509 (0.00)
[2023-03-27T10:32:13.110159+02:00] [debug] [pid:20904] >>> testapi::_check_backend_response: match=openqa-desktop timed out after 500 (assert_screen)
[2023-03-27T10:32:13.193770+02:00] [info] [pid:20904] ::: basetest::runtest: # Test died: no candidate needle with tag(s) 'openqa-desktop' matched
[2023-03-27T10:32:34.992796+02:00] [debug] [pid:20888] isotovideo: test execution will be resumed
[2023-03-27T10:32:34.996854+02:00] [debug] [pid:20904] lib/openQAcoretest.pm:13 called testapi::send_key
[2023-03-27T10:32:34.997097+02:00] [debug] [pid:20904] <<< testapi::send_key(key="ctrl-alt-f3", wait_screen_change=0)
[2023-03-27T10:32:35.400841+02:00] [debug] [pid:20904] lib/openQAcoretest.pm:24 called testapi::enter_cmd
[2023-03-27T10:32:35.401272+02:00] [debug] [pid:20904] <<< testapi::type_string(string="openqa-cli api jobs\n", max_interval=250, wait_screen_change=0, wait_still_screen=0, timeout=30, similarity_level=47)
[2023-03-27T10:32:36.156458+02:00] [debug] [pid:20904] lib/openQAcoretest.pm:26 called openQAcoretest::get_log -> lib/openQAcoretest.pm:8 called testapi::script_run
[2023-03-27T10:32:36.156731+02:00] [debug] [pid:20904] <<< testapi::script_run(cmd="journalctl --pager-end --no-tail --no-pager -u apache2 -u openqa-scheduler -u openqa-websockets -u openqa-webui -u openqa-worker\@1 | tee openqa_services.log", timeout=30, output="", die_on_timeout=-1, quiet=undef)
[2023-03-27T10:32:36.156907+02:00] [debug] [pid:20904] lib/openQAcoretest.pm:26 called openQAcoretest::get_log -> lib/openQAcoretest.pm:8 called testapi::script_run
[2023-03-27T10:32:36.157065+02:00] [debug] [pid:20904] <<< testapi::type_string(string="journalctl --pager-end --no-tail --no-pager -u apache2 -u openqa-scheduler -u openqa-websockets -u openqa-webui -u openqa-worker\@1 | tee openqa_services.log", max_interval=250, wait_screen_change=0, wait_still_screen=0, timeout=30, similarity_level=47)
…

Maybe some upload steps have been skipped because the job has been cancelled in a "bad moment".

Acceptance criteria

  • AC1: Job details are uploaded (unless there's really a severe problem indicated by the log or absence of logs).

Suggestions

  • Check code for hypothesis mentioned at the end of observation and other potential problems.
  • Maybe it did not work because the test was just in the first module and partial progress within the module is not uploaded? Maybe the first module is a specifically problematic in that regard?

History

#1 Updated by okurz 2 months ago

  • Tags set to reactive work
  • Target version set to Ready

#2 Updated by cdywan 2 months ago

  • Subject changed from Worker did not upload details when running into error to Worker did not upload details when running into error size:M
  • Status changed from New to Workable

#3 Updated by tinita 2 months ago

  • Status changed from Workable to In Progress
  • Assignee set to tinita

#4 Updated by tinita 2 months ago

  • Status changed from In Progress to Feedback

https://github.com/os-autoinst/openQA/pull/5072 Ensure final status update for cancelled jobs

#5 Updated by tinita 2 months ago

  • Status changed from Feedback to Resolved

Merged, deployed and tested: https://openqa.opensuse.org/tests/3209792

Also available in: Atom PDF