action #58700
closedIt seems we have "Test schedule has changed, reloading test_order.json" in *every* job now, I doubt the schedule actually changes
0%
Description
Observation¶
See the worker log from every job. The message "Test schedule has changed, reloading test_order.json" is repeated very often when we would expect to
Acceptance criteria¶
- AC1: The test order is only re-evaluated if necessary
- AC2: Any logging output corresponds to actual actions, i.e. message "Test schedule has changed, reloading test_order.json" when test schedule has actually changed, e.g. worker log from https://openqa.suse.de/tests/latest?arch=x86_64&distri=sle&flavor=Server-DVD-Incidents-Kernel&machine=64bit&test=ltp_pty&version=15 only shows a single (or limited) "Test schedule has changed…"
- AC3: the test order is not re-evaluated for static schedules, e.g. worker log from https://openqa.opensuse.org/tests/latest?arch=x86_64&distri=opensuse&flavor=DVD&machine=64bit&test=textmode&version=Tumbleweed does not show "Test schedule has changed, reloading test_order.json"
Suggestions¶
- Research what was the original idea of #53948 and https://github.com/os-autoinst/openQA/pull/2302 together with https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/8329 and the approaches of https://github.com/os-autoinst/openQA/pull/2439 and https://github.com/os-autoinst/openQA/pull/2445 trying to fix the problem
- Find a good test case based on the openQA scenarios mentioned above
- Ensure there is sufficient unit test coverage within os-autoinst for the feature
- Fix the real problem :)
Updated by tinita about 5 years ago
- Status changed from New to In Progress
- Assignee set to tinita
I see the message appearing once in every job.
I think the log message should just be written when the size or mtime really changed.
That's an easy fix.
Updated by tinita about 5 years ago
- % Done changed from 0 to 100
Updated by okurz about 5 years ago
Updated by tinita about 5 years ago
Updated by MDoucha about 5 years ago
FYI, this is not a bug. The log message was printed in every job on purpose to debug misconfigured workers. Specifically the exact kind of misconfiguration which has been blocking https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/8329 for the past month.
Edit: And yes, the test schedule does change when the log line gets printed the first time. From empty to something.
Updated by livdywan about 5 years ago
MDoucha wrote:
FYI, this is not a bug. The log message was printed in every job on purpose to debug misconfigured workers. Specifically the exact kind of misconfiguration which has been blocking https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/8329 for the past month.
Edit: And yes, the test schedule does change when the log line gets printed the first time. From empty to something.
Would you mind clarifying the use case here? How is the message being used for debugging?
The message, the way I read it, suggests the schedule changed, which implies it's not the usual schedule but it was explicitly updated. That's also what was discussed when the code was being worked on. If that were shown in every test regardless it would seem to defeat the point?
Updated by tinita about 5 years ago
- Status changed from Feedback to Workable
- % Done changed from 100 to 0
Updated by mkittler over 4 years ago
- Status changed from Workable to New
Considering the discussion is isn't clear what's left to be done.
Updated by okurz over 4 years ago
- Description updated (diff)
- Status changed from New to Workable
I hope I could provide sufficient details that you also consider it "Workable" :)
Updated by Xiaojing_liu over 4 years ago
May I have a try? I am interested in this feature, although I am not familiar with this.
Updated by Xiaojing_liu over 4 years ago
- Status changed from Workable to In Progress
- Assignee set to Xiaojing_liu
- Target version set to Current Sprint
Updated by Xiaojing_liu over 4 years ago
I did some tests about this issue, here is test report:
For now we check that if the test_order has been changed by checking if the current_test_module
exists or the file test_order.json
modify time/size is different from before value. There are some reasons that maybe we should not use current_test_module
- the current test name is null between a test module finish and another test module start. In some case, the process can last more than 1 min. Here is an example: The message in os-autoinst.log
[0m[37m[2020-05-28T10:37:47.903 CST] [debug] ||| finished zypper_up update at 2020-05-28 02:37:47 (42 s)
[0m[37m[2020-05-28T10:37:47.904 CST] [debug] Creating a VM snapshot lastgood
[0m[37m[2020-05-28T10:37:47.905 CST] [debug] Saving snapshot (Current VM state is running).
...
[0m[37m[2020-05-28T10:38:25.585 CST] [debug] Snapshot complete
[0m[1;34m[2020-05-28T10:38:25.640 CST] [debug] ||| starting system_prepare tests/console/system_prepare.pm
Before ||| finished
, the function done
is called and it will set the current_test null.
here is the content of autoinst-status.json
[2020-05-28T10:38:24.0133 CST] [error] the test_status is {
'status' => 'running',
'test_execution_paused' => 0,
'current_test' => undef
}
So from 10:37:47.904
to 10:38:25.640
, the current_test is null, the log Test schedule has changed
will be printed many times.
- When all test modules have been finished, the current_test will be set null, but there is still other mission, e.g. creating qcow file. During this process, the current_test does not exists, and the log
Test schedule has changed...
will be printed many times. Here is an example:
[0m[37m[2020-05-27T17:50:26.901 CST] [debug] ||| finished shutdown_ltp kernel at 2020-05-27 09:50:26 (26 s)
[0m[37m[2020-05-27T17:50:26.903 CST] [debug] stopping autotest process 24786
[0m[37m[2020-05-27T17:50:26.909 CST] [debug] [autotest] process exited: 0
[0m[37m[2020-05-27T17:50:26.909 CST] [debug] done with autotest process
[0m[37m[2020-05-27T17:50:26.910 CST] [debug] terminating command server 24783 because test execution ended
[0m[37m[2020-05-27T17:50:26.910 CST] [debug] isotovideo: informing websocket clients before stopping command server: http://127.0.0.1:20013/4_8v1amG6lqwzlWH/broadcast
[0m[37m[2020-05-27T17:50:26.924 CST] [debug] commands process exited: 0
[0m[37m[2020-05-27T17:50:26.924 CST] [debug] done with command server
[0m[37m[2020-05-27T17:50:26.924 CST] [debug] isotovideo done
[0m[37m[2020-05-27T17:50:26.925 CST] [debug] BACKEND SHUTDOWN 1
[0m[33m[2020-05-27T17:50:26.926 CST] [info] ::: OpenQA::Qemu::Proc::save_state: Saving QEMU state to qemu_state.json
[0m[37m[2020-05-27T17:50:28.964 CST] [debug] flushing frames
[0m[37m[2020-05-27T17:50:28.991 CST] [debug] QEMU: qemu-system-x86_64: terminating on signal 15 from pid 24790 (/usr/bin/isotovideo: backen)
[0m[37m[2020-05-27T17:50:28.993 CST] [debug] sending magic and exit
[0m[37m[2020-05-27T17:50:28.993 CST] [debug] received magic close
[0m[37m[2020-05-27T17:50:29.026 CST] [debug] backend process exited: 0
[0m[33m[2020-05-27T17:50:29.027 CST] [info] ::: backend::qemu::do_extract_assets: Extracting (?^:^hd0$)
[0m[37m[2020-05-27T17:50:29.027 CST] [debug] running nice ionice qemu-img convert -c -O qcow2 /var/lib/openqa/pool/1/raid/hd0-overlay1 assets_public/sle-12-SP5-x86_64-4.12.14-212.1.gd51ffe9-Server-DVD-Incidents-Kernel@64bit-with-ltp.qcow2
[0m[37m[2020-05-27T17:53:47.656 CST] [debug] running qemu-img info --output=json assets_public/sle-12-SP5-x86_64-4.12.14-212.1.gd51ffe9-Server-DVD-Incidents-Kernel@64bit-with-ltp.qcow2
[0m[37m[2020-05-27T17:53:47.676 CST] [debug] stopping backend process 24790
[0m[37m[2020-05-27T17:53:47.676 CST] [debug] done with backend process
And the log in worker-log
[2020-05-27T17:52:21.0689 CST] [error] the test_status is {
'test_execution_paused' => 0,
'status' => 'finished',
'current_test' => ''
}
In above examples, the test schedule has not changed. Because of the current_test_module does not exist, there are many logs. And there is one problem, in these situations, the openQA will update the job modules again and again, because openQA will update job modules when test_order is not empty.
Updated by Xiaojing_liu over 4 years ago
Updated by Xiaojing_liu over 4 years ago
- Status changed from In Progress to Feedback
PR has been merged
Updated by tinita over 4 years ago
- Status changed from Feedback to In Progress
- Assignee changed from Xiaojing_liu to tinita
Updated by tinita over 4 years ago
Another attempt at fixing this: https://github.com/os-autoinst/openQA/pull/3180
Updated by tinita over 4 years ago
Updated by tinita over 4 years ago
Created a related PR to avoid unnecessary schedule reloads: https://github.com/os-autoinst/openQA/pull/3183
Updated by tinita over 4 years ago
- Status changed from Feedback to Resolved
Please reopen if necessary.