action #41048

closed

job_networks not reliably deleted

Added by coolo about 6 years ago. Updated about 6 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: Regressions/Crashes
Target version:
Start date: 2018-09-14
Due date:
% Done: 0%
Estimated time:

Description

Since commit ac9f540ef945520ae209e0298f3a37487ec71eb9, jobs in state done are not supposed to have a job_networks entry anymore,
so these leftover rows need to be looked at:

openqa=> delete from job_networks where job_id in (select id from jobs where id in (select job_id from job_networks) and state='done');
DELETE 348
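
A minimal sketch of the same cleanup against a toy database, in case it helps to count the stale rows before deleting them. The schema below is a simplified assumption (the real openQA tables have more columns), and `count_stale` and `delete_stale` are hypothetical helper names. The inner `select job_id from job_networks` of the original statement is dropped here because the outer `job_id in (...)` already restricts the delete to rows in job_networks.

```python
import sqlite3

# Hypothetical, simplified schema: only the columns needed for the check.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE jobs (id INTEGER PRIMARY KEY, state TEXT NOT NULL);
    CREATE TABLE job_networks (job_id INTEGER NOT NULL, name TEXT NOT NULL, vlan INTEGER NOT NULL);
    INSERT INTO jobs VALUES (1, 'done'), (2, 'running'), (3, 'done');
    INSERT INTO job_networks VALUES (1, 'fixed', 10), (2, 'fixed', 11);
""")

# Jobs that are finished and therefore should not own a network anymore.
DONE_JOBS = "SELECT id FROM jobs WHERE state = 'done'"


def count_stale(conn):
    """Count job_networks rows that still belong to a done job."""
    query = f"SELECT COUNT(*) FROM job_networks WHERE job_id IN ({DONE_JOBS})"
    return conn.execute(query).fetchone()[0]


def delete_stale(conn):
    """Delete the stale rows in one transaction and return how many were removed."""
    with conn:
        cur = conn.execute(f"DELETE FROM job_networks WHERE job_id IN ({DONE_JOBS})")
    return cur.rowcount


stale_before = count_stale(conn)
deleted = delete_stale(conn)
remaining = conn.execute("SELECT job_id FROM job_networks").fetchall()
```

Only the row of the running job should survive; job 3 is done but never had a network, so it does not affect the count.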
Actions #1

Updated by coolo about 6 years ago

  • Target version changed from Ready to Current Sprint
  • Difficulty set to medium
Actions #2

Updated by coolo about 6 years ago

  • Assignee set to coolo

Looking into this. All problematic jobs right now were incompletes.

Actions #3

Updated by coolo about 6 years ago

The last chunk happened during restart of the webui:

Sep 17 11:48:59 openqaworker3 worker[28323]: [info] 30572: WORKING 2061750
Sep 17 11:49:02 openqaworker3 qemu-system-x86_64[30591]: looking for plugins in '/usr/lib64/sasl2', failed to open directory, error: No such file or directory
Sep 17 11:58:45 openqaworker3 worker[28323]: [error] Unable to upgrade connection for host "openqa.suse.de" to WebSocket: [no code]. Apache modules proxy_wstunnel and rewrite enabled?
Sep 17 11:58:49 openqaworker3 worker[28323]: [error] Connection error: Connection refused (remaining tries: 2)
Sep 17 11:58:54 openqaworker3 worker[28323]: [error] Connection error: Connection refused (remaining tries: 1)
Sep 17 11:58:55 openqaworker3 worker[28323]: [error] Unable to upgrade connection for host "openqa.suse.de" to WebSocket: [no code]. Apache modules proxy_wstunnel and rewrite enabled?
Sep 17 11:58:59 openqaworker3 worker[28323]: [error] Connection error: Connection refused (remaining tries: 0)
Sep 17 11:58:59 openqaworker3 worker[28323]: [error] Job aborted because web UI doesn't accept updates anymore (likely considers this job dead)
Sep 17 11:59:00 openqaworker3 worker[28323]: [error] Connection error: Connection refused (remaining tries: 2)
Sep 17 11:59:05 openqaworker3 worker[28323]: [error] Unable to upgrade connection for host "openqa.suse.de" to WebSocket: [no code]. Apache modules proxy_wstunnel and rewrite enabled?
Sep 17 11:59:05 openqaworker3 worker[28323]: [error] Connection error: Connection refused (remaining tries: 1)
Sep 17 11:59:09 openqaworker3 worker[28323]: [info] registering worker openqaworker3 version 13 with openQA openqa.suse.de using protocol version [1]
Sep 17 11:59:09 openqaworker3 worker[28323]: [error] unable to connect to host openqa.suse.de, retry in 10s
Sep 17 11:59:10 openqaworker3 worker[28323]: [error] Connection error: Connection refused (remaining tries: 0)

So the worker didn't even abort itself; this must be dead job detection then.

Actions #4

Updated by coolo about 6 years ago

If it were dead job detection, we would see the log_warning 'Dead job... in the logs, and we don't.
Plus, everything that sets the result to incomplete goes through the done() call, which deletes the networks. Puzzling.
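
To make the expected invariant concrete, here is a rough sketch of a `done()` helper that marks a job as done and drops its network rows in the same transaction. This is not openQA's Perl implementation; the schema, the function signature, and the sample data are assumptions made up for illustration.

```python
import sqlite3


def done(conn, job_id, result):
    """Hypothetical stand-in for done(): finish the job and release its networks."""
    with conn:  # both statements commit together or not at all
        conn.execute(
            "UPDATE jobs SET state = 'done', result = ? WHERE id = ?",
            (result, job_id),
        )
        conn.execute("DELETE FROM job_networks WHERE job_id = ?", (job_id,))


# Toy data: one running job that owns one network row.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE jobs (id INTEGER PRIMARY KEY, state TEXT, result TEXT);
    CREATE TABLE job_networks (job_id INTEGER, name TEXT, vlan INTEGER);
    INSERT INTO jobs VALUES (2065142, 'running', 'none');
    INSERT INTO job_networks VALUES (2065142, 'fixed', 10);
""")

done(conn, 2065142, "incomplete")

job_row = conn.execute("SELECT state, result FROM jobs WHERE id = 2065142").fetchone()
networks_left = conn.execute(
    "SELECT COUNT(*) FROM job_networks WHERE job_id = 2065142"
).fetchone()[0]
```

If every path to incomplete went through a helper like this, no done job could keep a job_networks row, which is why the leftover rows are puzzling.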

Actions #5

Updated by coolo about 6 years ago

  • Subject changed from job_networks not reliabled deleted to job_networks not reliably deleted
[2018-09-17T17:56:23.0342 CEST] [debug] [pid:20240] enqueuing abort for 2065142 621
[2018-09-17T17:56:30.0279 CEST] [debug] [DBIx debug] Took 0.00421119 seconds executed: UPDATE job_modules SET result = ? WHERE ( ( job_id = ? AND result = ? ) ): 'none', '2065142', 'running'.
[2018-09-17T17:56:30.0282 CEST] [debug] [DBIx debug] Took 0.00050807 seconds executed: UPDATE jobs SET result = ?, state = ?, t_finished = ?, t_updated = ? WHERE ( id = ? ): 'incomplete', 'done', '2018-09-17 15:56:30', '2018-09-17 15:56:30', '2065142'.

No mention of job_networks.
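
The same check can be scripted rather than done by eye. The sketch below scans the `[DBIx debug]` lines quoted above for statements that mention a given job ID and lists the tables they write to. The regular expression and the `tables_touched` helper are assumptions for illustration and only cover UPDATE, DELETE and INSERT statements in the format shown here.

```python
import re

# The three log lines quoted in this comment, verbatim.
LOG = """\
[2018-09-17T17:56:23.0342 CEST] [debug] [pid:20240] enqueuing abort for 2065142 621
[2018-09-17T17:56:30.0279 CEST] [debug] [DBIx debug] Took 0.00421119 seconds executed: UPDATE job_modules SET result = ? WHERE ( ( job_id = ? AND result = ? ) ): 'none', '2065142', 'running'.
[2018-09-17T17:56:30.0282 CEST] [debug] [DBIx debug] Took 0.00050807 seconds executed: UPDATE jobs SET result = ?, state = ?, t_finished = ?, t_updated = ? WHERE ( id = ? ): 'incomplete', 'done', '2018-09-17 15:56:30', '2018-09-17 15:56:30', '2065142'.
"""

# Matches the verb and target table of a write statement after "executed: ".
STATEMENT = re.compile(
    r"executed: (?P<verb>UPDATE|DELETE|INSERT) (?:FROM |INTO )?(?P<table>\w+)"
)


def tables_touched(log, job_id):
    """Return (verb, table) pairs for write statements whose line mentions job_id."""
    touched = []
    for line in log.splitlines():
        if job_id not in line:
            continue
        match = STATEMENT.search(line)
        if match:
            touched.append((match.group("verb"), match.group("table")))
    return touched


touched = tables_touched(LOG, "2065142")
mentions_job_networks = any(table == "job_networks" for _, table in touched)
```

For the excerpt above this finds writes to job_modules and jobs only, and no statement against job_networks.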

Actions #6

Updated by coolo about 6 years ago

  • Status changed from New to Resolved
Actions #7

Updated by szarate about 6 years ago

  • Target version changed from Current Sprint to Done