action #108476

closed

coordination #103962: [saga][epic] Easy multi-machine handling: MM-tests as first-class citizens

Sibling jobs with START_DIRECTLY_AFTER_TEST are all cancelled

Added by tonyyuan about 2 years ago. Updated almost 2 years ago.

Status:
Resolved
Priority:
Low
Assignee:
Category:
Regressions/Crashes
Target version:
Start date:
2022-03-17
Due date:
% Done:

0%

Estimated time:

Description

I updated openQA recently and found an issue:
Multiple child jobs start directly after a parent job. If one child fails or is cancelled, all sibling jobs are cancelled as well.
Here is the jobs overview: http://openqa.qam.suse.cz/tests/overview?groupid=160&build=tony%3Atest&distri=sle&version=12-SP5
Here is dependency: http://openqa.qam.suse.cz/tests/38178#dependencies

The qam_virt_install_host_xen and qam-xen-install jobs use YAML schedule files; the others use main.pm.
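For context, a directly chained child declares its predecessor via the START_DIRECTLY_AFTER_TEST setting in its test suite; the value below is only illustrative, based on the job names in this cluster:

```
# child test suite settings (illustrative; see the openQA documentation on job dependencies)
START_DIRECTLY_AFTER_TEST=qam_virt_install_host_xen
```

Directly chained children run on the same worker right after the parent, which is why a whole queue of siblings sits on one worker here.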



Actions #1

Updated by tonyyuan about 2 years ago

Worker log: journalctl -u openqa-worker@9.service -xn 15000

Mar 14 11:27:59 void worker[4955]: [info] Accepting job 38178 from queue
Mar 14 11:27:59 void worker[4955]: [info] +++ setup notes +++
Mar 14 11:27:59 void worker[4955]: [info] Running on void:9 (Linux 5.3.18-150300.59.54-default #1 SMP Sat Mar 5 10:00:50 UTC 2022 (1d0fa95) x86_64)
Mar 14 11:27:59 void worker[4955]: [info] Preparing cgroup to start isotovideo
Mar 14 11:27:59 void worker[4955]: [info] Using cgroup /sys/fs/cgroup/systemd/openqa.slice/openqa-worker.slice/openqa-worker@9.service/38178
Mar 14 11:27:59 void worker[4955]: [info] Starting isotovideo container
Mar 14 11:27:59 void worker[21982]: [info] 21982: WORKING 38178
Mar 14 11:27:59 void worker[4955]: [info] isotovideo has been started (PID: 21982)
Mar 14 11:28:59 void worker[4955]: [warn] Websocket connection to http://void.qam.suse.cz/api/v1/ws/16 finished by remote side with code 1006, no reason - trying again in 10 seconds
Mar 14 11:29:09 void worker[4955]: [info] Registering with openQA http://void.qam.suse.cz
Mar 14 11:29:09 void worker[4955]: [info] Establishing ws connection via ws://void.qam.suse.cz/api/v1/ws/16
...

Mar 14 13:14:21 void worker[4955]: [info] Establishing ws connection via ws://void.qam.suse.cz/api/v1/ws/16
Mar 14 13:14:21 void worker[4955]: [info] Registered and connected via websockets with openQA host http://void.qam.suse.cz and worker ID 16
Mar 14 13:15:21 void worker[4955]: [warn] Websocket connection to http://void.qam.suse.cz/api/v1/ws/16 finished by remote side with code 1006, no reason - trying again in 10 seconds
Mar 14 13:15:31 void worker[4955]: [info] Registering with openQA http://void.qam.suse.cz
Mar 14 13:15:31 void worker[4955]: [info] Establishing ws connection via ws://void.qam.suse.cz/api/v1/ws/16
Mar 14 13:15:31 void worker[4955]: [info] Registered and connected via websockets with openQA host http://void.qam.suse.cz and worker ID 16
Mar 14 13:16:31 void worker[4955]: [warn] Websocket connection to http://void.qam.suse.cz/api/v1/ws/16 finished by remote side with code 1006, no reason - trying again in 10 seconds
Mar 14 13:16:41 void worker[4955]: [info] Registering with openQA http://void.qam.suse.cz
Mar 14 13:16:41 void worker[4955]: [info] Establishing ws connection via ws://void.qam.suse.cz/api/v1/ws/16
Mar 14 13:16:41 void worker[4955]: [info] Registered and connected via websockets with openQA host http://void.qam.suse.cz and worker ID 16
Mar 14 13:17:37 void worker[4955]: [info] Isotovideo exit status: 1
Mar 14 13:17:37 void worker[4955]: [info] +++ worker notes +++
Mar 14 13:17:37 void worker[4955]: [info] End time: 2022-03-14 12:17:37
Mar 14 13:17:37 void worker[4955]: [info] Result: died
Mar 14 13:17:37 void worker[25388]: [info] Uploading smoketest-dmesg.txt
Mar 14 13:17:37 void worker[25388]: [info] Uploading smoketest-spectre-meltdown-checker-localhost.txt
Mar 14 13:17:37 void worker[25388]: [info] Uploading smoketest-spectre-meltdown-checker-sles11sp4HVMx32.txt
Mar 14 13:17:37 void worker[25388]: [info] Uploading smoketest-spectre-meltdown-checker-sles11sp4PVx64.txt
Mar 14 13:17:37 void worker[25388]: [info] Uploading smoketest-spectre-meltdown-checker-sles12sp3PV.txt
Mar 14 13:17:37 void worker[25388]: [info] Uploading smoketest-spectre-meltdown-checker-sles12sp4HVM.txt
Mar 14 13:17:37 void worker[25388]: [info] Uploading smoketest-spectre-meltdown-checker-sles12sp5HVM.txt
Mar 14 13:17:37 void worker[25388]: [info] Uploading smoketest-spectre-meltdown-checker-sles12sp5PV.txt
Mar 14 13:17:37 void worker[25388]: [info] Uploading smoketest-spectre-meltdown-checker-sles15HVM.txt
Mar 14 13:17:37 void worker[25388]: [info] Uploading smoketest-spectre-meltdown-checker-sles15sp1PV.txt
Mar 14 13:17:37 void worker[25388]: [info] Uploading smoketest-spectre-meltdown-checker-sles15sp2HVM.txt
Mar 14 13:17:37 void worker[25388]: [info] Uploading smoketest-spectre-meltdown-checker-sles15sp2PV.txt
Mar 14 13:17:37 void worker[25388]: [info] Uploading smoketest-spectre-meltdown-checker-sles15sp3HVM.txt
Mar 14 13:17:37 void worker[25388]: [info] Uploading smoketest-spectre-meltdown-checker-sles15sp3PV.txt
Mar 14 13:17:37 void worker[25388]: [info] Uploading stresstest-stresstest-sles15sp2HVM.txt
Mar 14 13:17:37 void worker[25388]: [info] Uploading stresstest-stresstest-sles15sp3PV.txt
Mar 14 13:17:37 void worker[25388]: [info] Uploading vars.json
Mar 14 13:17:37 void worker[25388]: [info] Uploading autoinst-log.txt
Mar 14 13:17:37 void worker[25388]: [info] Uploading worker-log.txt
Mar 14 13:17:37 void worker[25388]: [info] Uploading serial0.txt
Mar 14 13:17:37 void worker[25388]: [info] Uploading video_time.vtt
Mar 14 13:17:37 void worker[25388]: [info] Uploading serial_terminal.txt
Mar 14 13:17:39 void worker[4955]: [info] Will cancel job 38171 later as requested by the web UI
Mar 14 13:17:39 void worker[4955]: [info] Will cancel job 38172 later as requested by the web UI
Mar 14 13:17:39 void worker[4955]: [info] Will cancel job 38173 later as requested by the web UI
Mar 14 13:17:39 void worker[4955]: [info] Will cancel job 38174 later as requested by the web UI
Mar 14 13:17:39 void worker[4955]: [info] Will cancel job 38175 later as requested by the web UI
Mar 14 13:17:39 void worker[4955]: [info] Will cancel job 38179 later as requested by the web UI
Mar 14 13:17:39 void worker[4955]: [info] Will cancel job 38180 later as requested by the web UI
Mar 14 13:17:39 void worker[4955]: [info] Skipping job 38179 from queue (parent failed with result died)
Mar 14 13:17:39 void worker[4955]: [info] Skipping job 38180 from queue (parent failed with result skipped)
Mar 14 13:17:39 void worker[4955]: [info] Skipping job 38176 from queue (parent failed with result skipped)
Mar 14 13:17:39 void worker[4955]: [info] Skipping job 38171 from queue (parent failed with result skipped)
Mar 14 13:17:39 void worker[4955]: [info] Skipping job 38172 from queue (parent failed with result skipped)
Mar 14 13:17:39 void worker[4955]: [info] Skipping job 38173 from queue (parent failed with result skipped)
Mar 14 13:17:39 void worker[4955]: [info] Skipping job 38174 from queue (parent failed with result skipped)
Mar 14 13:17:39 void worker[4955]: [info] Skipping job 38175 from queue (parent failed with result skipped)
Mar 14 13:19:21 void worker[4955]: [warn] Websocket connection to http://void.qam.suse.cz/api/v1/ws/16 finished by remote side with code 1006, no reason - trying again in 10 seconds
Mar 14 13:19:31 void worker[4955]: [info] Establishing ws connection via ws://void.qam.suse.cz/api/v1/ws/16
Mar 14 13:19:31 void worker[4955]: [info] Registered and connected via websockets with openQA host http://void.qam.suse.cz and worker ID 16
Mar 14 13:20:31 void worker[4955]: [warn] Websocket connection to http://void.qam.suse.cz/api/v1/ws/16 finished by remote side with code 1006, no reason - trying again in 10 seconds
Mar 14 13:20:41 void worker[4955]: [info] Registering with openQA http://void.qam.suse.cz
Mar 14 13:20:41 void worker[4955]: [info] Establishing ws connection via ws://void.qam.suse.cz/api/v1/ws/16
Mar 14 13:20:41 void worker[4955]: [info] Registered and connected via websockets with openQA host http://void.qam.suse.cz and worker ID 16
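The skip decisions can be pulled out of such a log with a simple grep (worker-log.txt stands in for wherever the log was saved):

```shell
# List which jobs the worker skipped and the reason it recorded for each
grep -oE 'Skipping job [0-9]+ from queue \([^)]*\)' worker-log.txt
```

The recorded reason distinguishes skips caused by a parent's result from skips caused by an explicit cancel command from the web UI.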

Actions #2

Updated by okurz about 2 years ago

  • Category set to Support
  • Priority changed from Normal to Low
  • Target version set to Ready
Actions #3

Updated by mkittler about 2 years ago

  • Assignee set to mkittler

Looks like there were connection problems when running http://openqa.qam.suse.cz/tests/38178#dependencies. If those happen for too long while picking the next job of the chain, these jobs would be cancelled. However, here it seems not that simple: the reason for cancelling the jobs seems to be that the worker thinks the web UI actually wanted to cancel those jobs. I'm not sure why that would be the case; maybe it's a bug in the stale job detection. It would be good to have the web UI logs from that time as well.

Actions #4

Updated by mkittler about 2 years ago

  • Status changed from New to In Progress

I think there are two things that have gone wrong here (besides the connection problems):

  1. The "Will cancel job 38171 later as requested by the web UI" log messages indicate that the stale job detection tried cancelling jobs too aggressively here.
  2. The previous point is, however, not the reason the jobs were skipped. If it were, we would see e.g. "Skipping job 38179 from queue (web UI sent command cancel)", but instead we see "Skipping job 38179 from queue (parent failed with result died)". The previous job's result is wrongly considered here although they're only siblings. (Only the result of the parent job 38170 should be considered.) Maybe there's a bug in the chain processing within the worker code.

The first point is hard to investigate without web UI logs so I'm focusing on the second point for now.

Actions #5

Updated by mkittler about 2 years ago

Point 2 is not a recent regression; the skipping logic simply behaves incorrectly in certain cases.

Actions #6

Updated by tonyyuan about 2 years ago

Hello,
I don't know where the "web UI logs" are located. I wonder whether web UI logging is enabled by default. Can you tell me how to enable/collect the "web UI logs" so I can help you get them?

Actions #7

Updated by okurz about 2 years ago

  • Category changed from Support to Regressions/Crashes

As explained by mkittler, this affects multiple people.

tonyyuan wrote:

Hello,
I don't know where the "web UI logs" are located. I wonder whether web UI logging is enabled by default. Can you tell me how to enable/collect the "web UI logs" so I can help you get them?

sudo journalctl -e -u openqa-webui should help.

Actions #8

Updated by tonyyuan about 2 years ago

okurz wrote:

As explained by mkittler this affects multiple people.

tonyyuan wrote:

Hello,
I don't know where the "web UI logs" are located. I wonder whether web UI logging is enabled by default. Can you tell me how to enable/collect the "web UI logs" so I can help you get them?

sudo journalctl -e -u openqa-webui should help.

This is all I got from around Mar 14:

Mar 09 11:36:53 void openqa-webui-daemon[5469]: [info] Worker 5469 started
Mar 09 11:36:53 void openqa-webui-daemon[4985]: [info] Creating process id file "/tmp/prefork.pid"
Mar 09 11:36:53 void openqa-webui-daemon[5470]: [info] Worker 5470 started
Mar 14 17:14:50 void openqa-webui-daemon[4985]: [info] Worker 5441 stopped
Mar 14 17:14:50 void openqa-webui-daemon[29218]: [info] Worker 29218 started
Mar 15 09:24:05 void openqa-webui-daemon[4985]: [info] Worker 5469 stopped
Mar 15 09:24:05 void openqa-webui-daemon[6966]: [info] Worker 6966 started

Actions #9

Updated by mkittler about 2 years ago

PR to address point 2: https://github.com/os-autoinst/openQA/pull/4571

The logic is still flawed, see last paragraph of the PR description.

Actions #10

Updated by mkittler about 2 years ago

  • Status changed from In Progress to Feedback
  • Parent task set to #103962

The PR should now be complete. It has unit tests and I've also tested different cases manually with the full stack.

The mentioned web UI logs aren't sufficient. I suppose they are only from the top-level service (which does the preforking). For now I'd just leave it at solving problem 2 (from #108476#note-4) and see where that leaves us.
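To capture more than the preforking messages, the logs of all web-UI-related services around the incident could be collected in one go; the unit names below assume a default openQA installation, and the time range is an example:

```shell
# Gather logs from the openQA web UI and its companion services for a time window
sudo journalctl --since "2022-03-14 11:00" --until "2022-03-14 14:00" \
    -u openqa-webui -u openqa-scheduler -u openqa-websockets -u openqa-gru
```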

Actions #11

Updated by mkittler about 2 years ago

The PR has been merged and deployed on OSD. Try upgrading your instance (web UI and workers) to test it.

Actions #12

Updated by tonyyuan almost 2 years ago

Hello,

It looks like this issue is not fixed. I manually cancelled job 39739, and its sibling job 39740 was cancelled too.
http://openqa.qam.suse.cz/tests/overview?version=15-SP3&groupid=156&build=%3Asplit%3Axen%3Atony&distri=sle

openQA-4.6.1648473912.da11be75c-lp153.4875.1.noarch
os-autoinst-4.6.1648584869.9383ae9d-lp153.1177.1.x86_64

journalctl -u openqa-worker@5.service -xn 500

Mar 31 14:58:10 void worker[14497]: [info] Accepting job 39739 from queue
Mar 31 14:58:10 void worker[14497]: [info] +++ setup notes +++
Mar 31 14:58:10 void worker[14497]: [info] Running on void:5 (Linux 5.3.18-150300.59.54-default #1 SMP Sat Mar 5 10:00:50 UTC 2022 (1d0fa95) x86_64)
Mar 31 14:58:10 void worker[14497]: [info] Preparing cgroup to start isotovideo
Mar 31 14:58:10 void worker[14497]: [info] Using cgroup /sys/fs/cgroup/systemd/openqa.slice/openqa-worker.slice/openqa-worker@5.service/39739
Mar 31 14:58:10 void worker[14497]: [info] Starting isotovideo container
Mar 31 14:58:10 void worker[14497]: [info] isotovideo has been started (PID: 31837)
Mar 31 14:58:10 void worker[31837]: [info] 31837: WORKING 39739
Mar 31 15:21:30 void worker[14497]: [info] Isotovideo exit status: 1
Mar 31 15:21:30 void worker[14497]: [info] +++ worker notes +++
Mar 31 15:21:30 void worker[14497]: [info] End time: 2022-03-31 13:21:30
Mar 31 15:21:30 void worker[14497]: [info] Result: cancel
Mar 31 15:21:30 void worker[14497]: [info] Will cancel job 39740 later as requested by the web UI
Mar 31 15:21:30 void worker[563]: [info] Uploading vars.json
Mar 31 15:21:30 void worker[563]: [error] REST-API error (POST http://void.qam.suse.cz/api/v1/jobs/39739/status): Connection error: Premature connection close (remaining tries: 59)
Mar 31 15:21:30 void worker[563]: [info] Uploading autoinst-log.txt
Mar 31 15:21:30 void worker[563]: [info] Uploading worker-log.txt
Mar 31 15:21:30 void worker[563]: [info] Uploading serial0.txt
Mar 31 15:21:30 void worker[563]: [info] Uploading video_time.vtt
Mar 31 15:21:30 void worker[563]: [info] Uploading serial_terminal.txt
Mar 31 15:21:37 void worker[14497]: [info] Skipping job 39740 from queue (web UI sent command cancel)

Actions #13

Updated by mkittler almost 2 years ago

I tested cancelling locally before even merging my latest changes and it worked as expected (directly chained siblings are not cancelled). I'll re-do the test locally.

However, before I waste too much time on needless investigation: are you absolutely certain that the worker (and the openQA-worker package) has also been updated? Both the web UI and the worker need to be up to date.

Actions #14

Updated by mkittler almost 2 years ago

It works fine here: in this example, directly-chained-03-child still runs after directly-chained-02-child has been cancelled while it was running; only its child is cancelled. Before that, I cancelled directly-chained-01-child while directly-chained-parent was still running, and it still attempted to run directly-chained-02-child even though directly-chained-01-child hadn't been started.

This should cover your case and actually almost all possible cases (cancellation of currently running job, cancellation of job ahead in the queue, siblings are still executed but children skipped). (I've also tested cancelling a job before the jobs are assigned to a worker and it works as well.)


In your case, "Mar 31 15:21:30 void worker[14497]: [info] Will cancel job 39740 later as requested by the web UI" and "Mar 31 15:21:37 void worker[14497]: [info] Skipping job 39740 from queue (web UI sent command cancel)" are logged. That means the web UI somehow cancels the job explicitly, and the cancellation is not the result of a bug in the worker's queue handling. My last changes only fixed the latter, so there's a good chance that something is still broken on the web UI side. In #108476#note-4 I already suspected the stale job detection. However, here it doesn't look like connection issues are involved (except for "Mar 31 15:21:30 void worker[563]: [error] REST-API error (POST http://void.qam.suse.cz/api/v1/jobs/39739/status): Connection error: Premature connection close (remaining tries: 59)", which only happened after the cancellation). So I'm not sure yet why it doesn't work in your case. Maybe the audit log could give some clues, but I cannot access it on your instance.
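For reproducing such cancellations without clicking through the web UI, a job can also be cancelled through the REST API, e.g. with openqa-cli; the host and job ID below are placeholders taken from this report:

```shell
# Cancel a single job via the openQA REST API (requires configured API keys)
openqa-cli api --host http://openqa.qam.suse.cz -X POST jobs/39739/cancel
```

Watching the worker log while doing this shows directly whether siblings get skipped as a consequence.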

Actions #15

Updated by mkittler almost 2 years ago

When cloning your exact cluster I can reproduce it:

Here I've been cancelling the next job in the queue, and the current job even turned out parallel_failed, which is quite strange (and definitely a bug, since there are no parallel dependencies here).

The logs look very similar to yours:

[debug] [pid:32378] Upload concluded (at bootloader_start)
[debug] [pid:32378] REST-API call: POST http://localhost:9526/api/v1/jobs/2895/status
[debug] [pid:32378] Upload concluded (at bootloader_start)
[debug] [pid:32378] REST-API call: POST http://localhost:9526/api/v1/jobs/2895/status
[debug] [pid:32378] Upload concluded (at bootloader_start)
[debug] [pid:32378] REST-API call: POST http://localhost:9526/api/v1/jobs/2895/status
[debug] [pid:32378] Upload concluded (at bootloader_start)
[debug] [pid:32378] REST-API call: POST http://localhost:9526/api/v1/jobs/2895/status
[debug] [pid:32378] Upload concluded (at bootloader_start)
[info] [pid:32378] Will cancel job 2896 later as requested by the web UI
[debug] [pid:32378] Stopping job 2895 from http://localhost:9526: 00002895-opensuse-Tumbleweed-DVD-x86_64-Build20220322-qam-xen-final@64bit-ipmi - reason: cancel
[debug] [pid:32378] REST-API call: POST http://localhost:9526/api/v1/jobs/2895/status
[info] [pid:32378] Will cancel job 2898 later as requested by the web UI
[info] [pid:32378] Will cancel job 2899 later as requested by the web UI
[info] [pid:32378] Will cancel job 2900 later as requested by the web UI
[info] [pid:32378] Will cancel job 2901 later as requested by the web UI
[info] [pid:32378] Will cancel job 2903 later as requested by the web UI
[info] [pid:32378] Will cancel job 2905 later as requested by the web UI
[debug] [pid:32378] Announcing job termination (due to cancel) to command server via http://localhost:20013/0lllCAxPrUT5JLcM/broadcast
[info] [pid:32378] Isotovideo exit status: 1
[info] [pid:32378] +++ worker notes +++
[info] [pid:32378] End time: 2022-03-31 16:59:38
[info] [pid:32378] Result: cancel

So the culprit is definitely the web UI, but I can also exclude connection issues and the stale job detection. The audit log shows the expected job_cancel event (by my user) and command_enqueue events corresponding to the unexpected cancellations (by the system user). I'll continue the investigation tomorrow. I suspect that the very last job in the chain could be the cause. (It is a regularly chained job and thus not part of the directly chained cluster.)

Actions #16

Updated by tonyyuan almost 2 years ago

I could still reproduce the issues this morning. worker version: openQA-worker-4.6.1648720710.9f444c83a-lp153.4879.1.noarch

Apr 01 08:43:21 void worker[23777]: [info] Uploading serial0.txt
Apr 01 08:43:21 void worker[23777]: [info] Uploading video_time.vtt
Apr 01 08:43:21 void worker[23777]: [info] Uploading serial_terminal.txt
Apr 01 08:43:22 void worker[10846]: [info] Accepting job 39770 from queue
Apr 01 08:43:22 void worker[10846]: [info] +++ setup notes +++
Apr 01 08:43:22 void worker[10846]: [info] Running on void:11 (Linux 5.3.18-150300.59.54-default #1 SMP Sat Mar 5 10:00:50 UTC 2022 (1d0fa95) x86_64)
Apr 01 08:43:22 void worker[10846]: [info] Preparing cgroup to start isotovideo
Apr 01 08:43:22 void worker[10846]: [info] Using cgroup /sys/fs/cgroup/systemd/openqa.slice/openqa-worker.slice/openqa-worker@11.service/39770
Apr 01 08:43:22 void worker[10846]: [info] Starting isotovideo container
Apr 01 08:43:22 void worker[10846]: [info] isotovideo has been started (PID: 23784)
Apr 01 08:43:22 void worker[23784]: [info] 23784: WORKING 39770
Apr 01 08:44:54 void worker[10846]: [info] Isotovideo exit status: 1
Apr 01 08:44:54 void worker[10846]: [info] +++ worker notes +++
Apr 01 08:44:54 void worker[10846]: [info] End time: 2022-04-01 06:44:54
Apr 01 08:44:54 void worker[10846]: [info] Result: cancel
Apr 01 08:44:54 void worker[23940]: [info] Uploading vars.json
Apr 01 08:44:54 void worker[10846]: [info] Will cancel job 39763 later as requested by the web UI
Apr 01 08:44:54 void worker[10846]: [info] Will cancel job 39764 later as requested by the web UI
Apr 01 08:44:54 void worker[10846]: [info] Will cancel job 39765 later as requested by the web UI
Apr 01 08:44:54 void worker[10846]: [info] Will cancel job 39766 later as requested by the web UI
Apr 01 08:44:54 void worker[10846]: [info] Will cancel job 39767 later as requested by the web UI
Apr 01 08:44:54 void worker[10846]: [info] Will cancel job 39771 later as requested by the web UI
Apr 01 08:44:54 void worker[23940]: [error] REST-API error (POST http://void.qam.suse.cz/api/v1/jobs/39770/status): Connection error: Premature connection close (remaining tries: 59)
Apr 01 08:44:54 void worker[23940]: [info] Uploading autoinst-log.txt
Apr 01 08:44:54 void worker[23940]: [info] Uploading worker-log.txt
Apr 01 08:44:54 void worker[23940]: [info] Uploading serial0.txt
Apr 01 08:44:54 void worker[23940]: [info] Uploading video_time.vtt
Apr 01 08:44:54 void worker[10846]: [info] Skipping job 39771 from queue (web UI sent command cancel)
Apr 01 08:44:54 void worker[10846]: [info] Skipping job 39763 from queue (web UI sent command cancel)
Apr 01 08:44:54 void worker[10846]: [info] Skipping job 39764 from queue (web UI sent command cancel)
Apr 01 08:44:54 void worker[10846]: [info] Skipping job 39765 from queue (web UI sent command cancel)
Apr 01 08:44:54 void worker[10846]: [info] Skipping job 39766 from queue (web UI sent command cancel)
Apr 01 08:44:54 void worker[10846]: [info] Skipping job 39767 from queue (web UI sent command cancel)

Actions #17

Updated by tonyyuan almost 2 years ago

One thing should be noted: this issue might have been introduced by recent changes to openQA, because I had never encountered it before we upgraded our openQA instance about 3 weeks ago.

Actions #18

Updated by mkittler almost 2 years ago

I could still reproduce the issues this morning. worker version: openQA-worker-4.6.1648720710.9f444c83a-lp153.4879.1.noarch

Thanks for the info. I've investigated the problem nevertheless, and I can even reproduce it locally now. It is definitely only a web UI issue at this point, as it can be reproduced without even starting a worker. When the last regularly chained job is left out, it can no longer be reproduced. So that kind of dependency must be the corner case the code currently doesn't handle well.

Actions #19

Updated by mkittler almost 2 years ago

Draft PR that should fix it: https://github.com/os-autoinst/openQA/pull/4590

One thing should be noted: this issue might be brought in by some recent changes to openQA because I had never encountered it before we upgraded openQA instance about 3 weeks ago.

Maybe some code changes altered the internal order of dependency traversal, which in turn affected the cancellation behavior. However, it looks like it has been broken like this for quite a while.

Actions #20

Updated by mkittler almost 2 years ago

The PR has been merged but it hasn't been deployed on OSD yet. You can still try it on your own instance.

Actions #21

Updated by mkittler almost 2 years ago

The changes have been deployed on OSD since 05.04.22 07:23 CEST.

Actions #22

Updated by tonyyuan almost 2 years ago

Hello,

I just updated my openQA instance and worker, and I could not reproduce the issue any more. I think this issue is fixed!

Thanks!

Actions #23

Updated by mkittler almost 2 years ago

  • Status changed from Feedback to Resolved

Glad to hear it. I'm marking the issue as resolved then.
