action #28714: [tools] Investigate why sporadically job is set to scalar value of the reference instead of the reference itself. - openQA Project (public) - openSUSE Project Management Tool

closed

Added by EDiGiacinto over 7 years ago. Updated about 6 years ago.

Status: Resolved
Priority: Low
Assignee: mkittler
Category: Feature requests
Target version: Current Sprint
Start date: 2017-12-01
Due date:
% Done:
Estimated time:

Description

It seems that under certain conditions (possibly when the websocket connection goes down) the worker sets the job to an invalid value.

Logs of this happening can be seen in #28355. Currently we avoid it by not starting invalid jobs, but this should not happen in the first place: the job goes from assigned back to scheduled, which can cause problems, e.g. with MM clusters.

ACs:

Investigate, verify that it still happens and fix it properly, as #28355 is only a workaround

Related issues 2 (0 open — 2 closed)

Updated by EDiGiacinto over 7 years ago

Related to action #28355: [tools][bonus][Sprint 201711.2] Worker loop dies during job setup added

Updated by EDiGiacinto over 7 years ago

Description updated (diff)

Updated by EDiGiacinto almost 7 years ago

Related to coordination #32851: [tools][EPIC] Scheduling redesign added

Updated by EDiGiacinto almost 7 years ago

Description updated (diff)

Updated by EDiGiacinto almost 7 years ago

Description updated (diff)
Category set to 122
Priority changed from Normal to Low
Target version set to Ready

Setting this to low priority and putting it in the ready queue, as we have a workaround for it. Still, this is a bit scary: it can become a real problem (mostly for MM tests, as jumping back from assigned to scheduled makes things more complex), and the workaround hides it from the logs.

Updated by szarate almost 7 years ago

So this seems to be happening:

Apr 05 10:21:48 QA-Power8-4-kvm worker[41841]: [info] quit due to signal TERM
Apr 05 10:21:48 QA-Power8-4-kvm worker[41841]: Mojo::Reactor::Poll: Timer failed: Can't use string ("HASH(0xaf90080)") as a HASH ref while "strict refs" in use at /usr/share/openqa/script/../lib/OpenQA/Worker/Jobs.pm line 151.
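For illustration, this is how Perl produces that message in general: once a hash reference has been turned into a string, dereferencing the string under "strict refs" dies. The snippet is a minimal sketch and not the worker code. The $job layout and the is_valid_job helper are hypothetical, and the ticket does not establish where the stringification actually happens.

```perl
use strict;
use warnings;

# Hypothetical job structure; the real worker's $job layout is not shown in this ticket.
my $job = {id => 42, settings => {GENERAL_HW_CMD_DIR => '/tmp'}};

# Stringifying a reference (string interpolation, use as a hash key, naive
# serialization) yields text like "HASH(0x...)" and loses the link to the data.
my $stringified = "$job";

# Dereferencing that string under "strict refs" dies with the logged message.
my $ok    = eval { my $settings = $stringified->{settings}; 1 };
my $error = $ok ? '' : $@;
print "error: $error";

# Hypothetical guard in the spirit of the workaround: accept only real hash references.
sub is_valid_job {
    my ($candidate) = @_;
    return ref($candidate) eq 'HASH' && defined $candidate->{id};
}

print 'reference valid: ',   (is_valid_job($job)         ? 'yes' : 'no'), "\n";
print 'stringified valid: ', (is_valid_job($stringified) ? 'yes' : 'no'), "\n";
```

A guard of this kind only prevents the crash. It does not explain how the string ends up in the job variable, which is what this ticket is about.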

Updated by szarate almost 7 years ago

@mudler's theory is that the string itself is somehow coming from the webUI.

Updated by EDiGiacinto almost 7 years ago

For reference: https://openqa.suse.de/tests/1587182

Updated by EDiGiacinto almost 7 years ago

PR opened with a temporary workaround: https://github.com/os-autoinst/openQA/pull/1618

#10

Updated by EDiGiacinto over 6 years ago

Just saw this again:

Aug 01 12:03:04 openqaworker12 worker[26285]: Mojo::Reactor::Poll: I/O watcher failed: Can't use string ("HASH(0x9584728)") as a HASH ref while "strict refs" in use at /usr/share/openqa/script/../lib/OpenQA/Worker/Jobs.pm line 522.

This happened while I was testing new scheduler changes, but eventually the jobs went back to scheduled.

#11

Updated by EDiGiacinto over 6 years ago

Happened once again:

Aug 15 18:56:30 openqaworker6 worker[13360]: [error] Unable to upgrade connection for host "openqa.suse.de" to WebSocket: [no code]. proxy_wstunnel enabled?
Aug 15 18:56:08 openqaworker6 worker[13360]: [error] Unable to upgrade connection for host "openqa.suse.de" to WebSocket: [no code]. proxy_wstunnel enabled?
Aug 15 18:55:49 openqaworker6 worker[13360]: [error] Unable to upgrade connection for host "openqa.suse.de" to WebSocket: [no code]. proxy_wstunnel enabled?
Aug 15 18:55:31 openqaworker6 worker[13360]: [error] Unable to upgrade connection for host "openqa.suse.de" to WebSocket: [no code]. proxy_wstunnel enabled?
Aug 15 18:55:12 openqaworker6 worker[13360]: [error] Unable to upgrade connection for host "openqa.suse.de" to WebSocket: [no code]. proxy_wstunnel enabled?
Aug 14 17:37:13 openqaworker6 worker[13360]: Mojo::Reactor::Poll: I/O watcher failed: Can't use string ("HASH(0x9ec9610)") as a HASH ref while "strict refs" in use at /usr/share/openqa/script/../lib/OpenQA/Worker/Jobs.pm line 522.

#12

Updated by mkittler about 6 years ago

Status changed from New to In Progress
Assignee set to mkittler
Target version changed from Ready to 445

Note that line 522 is now 529:

sub start_job {
    my ($host) = @_;

    return _reset_state unless verify_job;
    # block the job from having dangerous settings (isotovideo specific though)
    # it needs to come from worker_settings
->  delete $job->{settings}->{GENERAL_HW_CMD_DIR};
    # add_log_channel('worker', path => 'worker-log.txt', level => $worker_settings->{LOG_LEVEL} // 'info');

    # update settings with worker-specific stuff
    copy_job_settings($job, $worker_settings);

#13

Updated by mkittler about 6 years ago

PR: https://github.com/os-autoinst/openQA/pull/1917

#14

Updated by mkittler about 6 years ago

Target version changed from 445 to Current Sprint

#15

Updated by mkittler about 6 years ago

Status changed from In Progress to Resolved

Not sure whether we still see this in production. If we observe it again, we can reopen the ticket. The PR has been merged, so we should at least have somewhat better debug output.
