action #41048

job_networks not reliably deleted

Added by coolo over 1 year ago. Updated over 1 year ago.

Status: Resolved
Start date: 14/09/2018
Priority: Normal
Due date:
Assignee: coolo
% Done: 0%
Category: Concrete Bugs
Target version: Done
Difficulty: medium
Duration:

Description

Done jobs are not supposed to have a job_networks entry after ac9f540ef945520ae209e0298f3a37487ec71eb9,
so this needs to be looked at:

openqa=> delete from job_networks where job_id in (select id from jobs where id in (select job_id from job_networks) and state='done');
DELETE 348
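
Before deleting, it can be useful to list which done jobs still own a job_networks row. The following is only a minimal sketch: it uses an in-memory SQLite database as a stand-in for the openQA PostgreSQL database, and only the names that appear in the query above (jobs.id, jobs.state, job_networks.job_id) come from this ticket. The function name, the `name` column and the sample rows are made up for illustration.

```python
import sqlite3


def done_jobs_with_networks(conn):
    """Return ids of done jobs that still have a job_networks row."""
    rows = conn.execute(
        """
        SELECT DISTINCT j.id
        FROM jobs j
        JOIN job_networks n ON n.job_id = j.id
        WHERE j.state = 'done'
        ORDER BY j.id
        """
    ).fetchall()
    return [r[0] for r in rows]


# Hypothetical, heavily reduced schema and sample data (assumption, not the real tables).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE jobs (id INTEGER PRIMARY KEY, state TEXT);
    CREATE TABLE job_networks (job_id INTEGER, name TEXT);
    INSERT INTO jobs VALUES (1, 'done'), (2, 'running'), (3, 'done');
    INSERT INTO job_networks VALUES (1, 'fixed'), (2, 'fixed');
    """
)

# Job 1 is done but still has a network entry; job 2 is running, job 3 has no entry.
leftovers = done_jobs_with_networks(conn)
print(leftovers)
```

The join replaces the nested `in (select ...)` of the query above; both select the same set of job ids, the join just makes it easier to inspect the rows before running the delete.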

History

#1 Updated by coolo over 1 year ago

  • Target version changed from Ready to Current Sprint
  • Difficulty set to medium

#2 Updated by coolo over 1 year ago

  • Assignee set to coolo

Looking into this. All problematic jobs right now were incompletes.

#3 Updated by coolo over 1 year ago

The last chunk happened during restart of the webui:

Sep 17 11:48:59 openqaworker3 worker[28323]: [info] 30572: WORKING 2061750
Sep 17 11:49:02 openqaworker3 qemu-system-x86_64[30591]: looking for plugins in '/usr/lib64/sasl2', failed to open directory, error: No such file or directory
Sep 17 11:58:45 openqaworker3 worker[28323]: [error] Unable to upgrade connection for host "openqa.suse.de" to WebSocket: [no code]. Apache modules proxy_wstunnel and rewrite enabled?
Sep 17 11:58:49 openqaworker3 worker[28323]: [error] Connection error: Connection refused (remaining tries: 2)
Sep 17 11:58:54 openqaworker3 worker[28323]: [error] Connection error: Connection refused (remaining tries: 1)
Sep 17 11:58:55 openqaworker3 worker[28323]: [error] Unable to upgrade connection for host "openqa.suse.de" to WebSocket: [no code]. Apache modules proxy_wstunnel and rewrite enabled?
Sep 17 11:58:59 openqaworker3 worker[28323]: [error] Connection error: Connection refused (remaining tries: 0)
Sep 17 11:58:59 openqaworker3 worker[28323]: [error] Job aborted because web UI doesn't accept updates anymore (likely considers this job dead)
Sep 17 11:59:00 openqaworker3 worker[28323]: [error] Connection error: Connection refused (remaining tries: 2)
Sep 17 11:59:05 openqaworker3 worker[28323]: [error] Unable to upgrade connection for host "openqa.suse.de" to WebSocket: [no code]. Apache modules proxy_wstunnel and rewrite enabled?
Sep 17 11:59:05 openqaworker3 worker[28323]: [error] Connection error: Connection refused (remaining tries: 1)
Sep 17 11:59:09 openqaworker3 worker[28323]: [info] registering worker openqaworker3 version 13 with openQA openqa.suse.de using protocol version [1]
Sep 17 11:59:09 openqaworker3 worker[28323]: [error] unable to connect to host openqa.suse.de, retry in 10s
Sep 17 11:59:10 openqaworker3 worker[28323]: [error] Connection error: Connection refused (remaining tries: 0)

So the worker didn't even abort itself; this must be dead job detection then.

#4 Updated by coolo over 1 year ago

If it were, we should see the log_warning 'Dead job... message, and we don't.
Plus, everything that sets a result to incomplete uses the done() call, which deletes the networks. Puzzling.

#5 Updated by coolo over 1 year ago

  • Subject changed from job_networks not reliabled deleted to job_networks not reliably deleted

[2018-09-17T17:56:23.0342 CEST] [debug] [pid:20240] enqueuing abort for 2065142 621
[2018-09-17T17:56:30.0279 CEST] [debug] [DBIx debug] Took 0.00421119 seconds executed: UPDATE job_modules SET result = ? WHERE ( ( job_id = ? AND result = ? ) ): 'none', '2065142', 'running'.
[2018-09-17T17:56:30.0282 CEST] [debug] [DBIx debug] Took 0.00050807 seconds executed: UPDATE jobs SET result = ?, state = ?, t_finished = ?, t_updated = ? WHERE ( id = ? ): 'incomplete', 'done', '2018-09-17 15:56:30', '2018-09-17 15:56:30', '2065142'.

No mention of job_networks.
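
The check done by eye on the excerpt above could be sketched as a small script that reports jobs set to done without a logged delete on job_networks. This is a rough sketch under assumptions: it relies on the `[DBIx debug] ... executed: <SQL>: <bind values>.` layout seen in the excerpt, and the exact wording of a job_networks delete statement in the log is a guess, since no such line appears in this ticket. The function name and regular expression are illustrative and not part of openQA.

```python
import re

# Matches the "executed: <SQL>: <bind values>." part of the DBIx debug lines above.
EXECUTED = re.compile(r"executed: (?P<sql>.+?): (?P<binds>'.*)\.$")


def jobs_done_without_network_delete(lines):
    """Return job ids set to 'done' for which no DELETE on job_networks was logged."""
    done, cleaned = set(), set()
    for line in lines:
        m = EXECUTED.search(line.strip())
        if not m:
            continue
        sql = m.group("sql")
        binds = re.findall(r"'([^']*)'", m.group("binds"))
        if sql.startswith("UPDATE jobs SET") and "done" in binds:
            done.add(binds[-1])  # the id is the last bind value (WHERE id = ?)
        elif sql.startswith("DELETE FROM job_networks"):
            # Assumed statement shape; not confirmed by the log in this ticket.
            cleaned.update(binds)
    return sorted(done - cleaned)


# The two DBIx debug lines from the excerpt above, copied verbatim.
log = [
    "[2018-09-17T17:56:30.0279 CEST] [debug] [DBIx debug] Took 0.00421119 seconds executed: UPDATE job_modules SET result = ? WHERE ( ( job_id = ? AND result = ? ) ): 'none', '2065142', 'running'.",
    "[2018-09-17T17:56:30.0282 CEST] [debug] [DBIx debug] Took 0.00050807 seconds executed: UPDATE jobs SET result = ?, state = ?, t_finished = ?, t_updated = ? WHERE ( id = ? ): 'incomplete', 'done', '2018-09-17 15:56:30', '2018-09-17 15:56:30', '2065142'.",
]

suspects = jobs_done_without_network_delete(log)
print(suspects)
```

Run over a full debug log, a non-empty result would point at the code paths that set a job to done without going through the network cleanup.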

#6 Updated by coolo over 1 year ago

  • Status changed from New to Resolved

#7 Updated by szarate over 1 year ago

  • Target version changed from Current Sprint to Done
