Project

General

Profile

action #58700

It seems we have "Test schedule has changed, reloading test_order.json" in *every* job now, I doubt the schedule actually changes

Added by okurz almost 2 years ago. Updated about 1 year ago.

Status:
Resolved
Priority:
Normal
Assignee:
Category:
Concrete Bugs
Target version:
Start date:
2019-10-25
Due date:
% Done:

0%

Estimated time:
Difficulty:

Description

Observation

See the worker log from every job. The message "Test schedule has changed, reloading test_order.json" is repeated very often when we would expect to

Acceptance criteria

Suggestions

History

#1 Updated by tinita almost 2 years ago

  • Status changed from New to In Progress
  • Assignee set to tinita

I see the message appearing once in every job.
I think the log message should just be written when the size or mtime really changed.
That's an easy fix.

#2 Updated by tinita almost 2 years ago

  • % Done changed from 0 to 100

#3 Updated by tinita almost 2 years ago

  • Status changed from In Progress to Feedback

#4 Updated by tinita almost 2 years ago

PR was merged

#7 Updated by MDoucha almost 2 years ago

FYI, this is not a bug. The log message was printed in every job on purpose to debug misconfigured workers. Specifically the exact kind of misconfiguration which has been blocking https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/8329 for the past month.

Edit: And yes, the test schedule does change when the log line gets printed the first time. From empty to something.

#8 Updated by cdywan almost 2 years ago

MDoucha wrote:

FYI, this is not a bug. The log message was printed in every job on purpose to debug misconfigured workers. Specifically the exact kind of misconfiguration which has been blocking https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/8329 for the past month.

Edit: And yes, the test schedule does change when the log line gets printed the first time. From empty to something.

Would you mind clarifying the use case here? How is the message being used for debugging?

The message, the way I read it, suggests the schedule changed, which implies it's not the usual schedule but it was explicitly updated. That's also what was discussed when the code was being worked on. If that were shown in every test regardless it would seem to defeat the point?

#9 Updated by tinita almost 2 years ago

  • Status changed from Feedback to Workable
  • % Done changed from 100 to 0

#10 Updated by tinita almost 2 years ago

  • Assignee deleted (tinita)

#11 Updated by mkittler over 1 year ago

  • Status changed from Workable to New

Considering the discussion is isn't clear what's left to be done.

#12 Updated by okurz over 1 year ago

  • Description updated (diff)
  • Status changed from New to Workable

I hope I could provide sufficient details that you also consider it "Workable" :)

#13 Updated by Xiaojing_liu over 1 year ago

May I have a try? I am interested in this feature, although I am not familiar with this.

#14 Updated by Xiaojing_liu over 1 year ago

  • Status changed from Workable to In Progress
  • Assignee set to Xiaojing_liu
  • Target version set to Current Sprint

#15 Updated by Xiaojing_liu over 1 year ago

I did some tests about this issue, here is test report:
For now we check that if the test_order has been changed by checking if the current_test_module exists or the file test_order.json modify time/size is different from before value. There are some reasons that maybe we should not use current_test_module

  1. the current test name is null between a test module finish and another test module start. In some case, the process can last more than 1 min. Here is an example: The message in os-autoinst.log
[2020-05-28T10:37:47.903 CST] [debug] ||| finished zypper_up update at 2020-05-28 02:37:47 (42 s)
[2020-05-28T10:37:47.904 CST] [debug] Creating a VM snapshot lastgood
[2020-05-28T10:37:47.905 CST] [debug] Saving snapshot (Current VM state is running).
...
[2020-05-28T10:38:25.585 CST] [debug] Snapshot complete
[2020-05-28T10:38:25.640 CST] [debug] ||| starting system_prepare tests/console/system_prepare.pm

Before ||| finished, the function done is called and it will set the current_test null.
here is the content of autoinst-status.json

[2020-05-28T10:38:24.0133 CST] [error] the test_status is {
  'status' => 'running',
  'test_execution_paused' => 0,
  'current_test' => undef
}

So from 10:37:47.904 to 10:38:25.640, the current_test is null, the log Test schedule has changed will be printed many times.

  1. When all test modules have been finished, the current_test will be set null, but there is still other mission, e.g. creating qcow file. During this process, the current_test does not exists, and the log Test schedule has changed... will be printed many times. Here is an example:
[2020-05-27T17:50:26.901 CST] [debug] ||| finished shutdown_ltp kernel at 2020-05-27 09:50:26 (26 s)
[2020-05-27T17:50:26.903 CST] [debug] stopping autotest process 24786
[2020-05-27T17:50:26.909 CST] [debug] [autotest] process exited: 0
[2020-05-27T17:50:26.909 CST] [debug] done with autotest process
[2020-05-27T17:50:26.910 CST] [debug] terminating command server 24783 because test execution ended
[2020-05-27T17:50:26.910 CST] [debug] isotovideo: informing websocket clients before stopping command server: http://127.0.0.1:20013/4_8v1amG6lqwzlWH/broadcast
[2020-05-27T17:50:26.924 CST] [debug] commands process exited: 0
[2020-05-27T17:50:26.924 CST] [debug] done with command server
[2020-05-27T17:50:26.924 CST] [debug] isotovideo done
[2020-05-27T17:50:26.925 CST] [debug] BACKEND SHUTDOWN 1
[2020-05-27T17:50:26.926 CST] [info] ::: OpenQA::Qemu::Proc::save_state: Saving QEMU state to qemu_state.json
[2020-05-27T17:50:28.964 CST] [debug] flushing frames
[2020-05-27T17:50:28.991 CST] [debug] QEMU: qemu-system-x86_64: terminating on signal 15 from pid 24790 (/usr/bin/isotovideo: backen)
[2020-05-27T17:50:28.993 CST] [debug] sending magic and exit
[2020-05-27T17:50:28.993 CST] [debug] received magic close
[2020-05-27T17:50:29.026 CST] [debug] backend process exited: 0
[2020-05-27T17:50:29.027 CST] [info] ::: backend::qemu::do_extract_assets: Extracting (?^:^hd0$)
[2020-05-27T17:50:29.027 CST] [debug] running nice ionice qemu-img convert -c -O qcow2 /var/lib/openqa/pool/1/raid/hd0-overlay1 assets_public/sle-12-SP5-x86_64-4.12.14-212.1.gd51ffe9-Server-DVD-Incidents-Kernel@64bit-with-ltp.qcow2
[2020-05-27T17:53:47.656 CST] [debug] running qemu-img info --output=json assets_public/sle-12-SP5-x86_64-4.12.14-212.1.gd51ffe9-Server-DVD-Incidents-Kernel@64bit-with-ltp.qcow2
[2020-05-27T17:53:47.676 CST] [debug] stopping backend process 24790
[2020-05-27T17:53:47.676 CST] [debug] done with backend process

And the log in worker-log

[2020-05-27T17:52:21.0689 CST] [error] the test_status is {
  'test_execution_paused' => 0,
  'status' => 'finished',
  'current_test' => ''
}

In above examples, the test schedule has not changed. Because of the current_test_module does not exist, there are many logs. And there is one problem, in these situations, the openQA will update the job modules again and again, because openQA will update job modules when test_order is not empty.

#17 Updated by Xiaojing_liu over 1 year ago

  • Status changed from In Progress to Feedback

PR has been merged

#18 Updated by tinita over 1 year ago

  • Status changed from Feedback to In Progress
  • Assignee changed from Xiaojing_liu to tinita

#19 Updated by tinita over 1 year ago

Another attempt at fixing this: https://github.com/os-autoinst/openQA/pull/3180

#20 Updated by tinita over 1 year ago

  • Status changed from In Progress to Feedback

#22 Updated by tinita over 1 year ago

Created a related PR to avoid unnecessary schedule reloads: https://github.com/os-autoinst/openQA/pull/3183

#23 Updated by tinita about 1 year ago

  • Status changed from Feedback to Resolved

Please reopen if necessary.

Also available in: Atom PDF