action #31351

closed

[functional][u][medium] force_cron_run does not actually run any crons (occasionally)

Added by StefanBruens over 6 years ago. Updated over 6 years ago.

Status: Resolved
Priority: High
Assignee:
Category: Bugs in existing tests
Target version: SUSE QA - Milestone 17
Start date: 2018-02-03
Due date: 2018-07-03
% Done: 0%
Estimated time:
Difficulty: medium

Description

Observation

The openQA test in scenario opensuse-Tumbleweed-DVD-aarch64-lxde@aarch64
should run the cron jobs in force_cron_run, but does not run any.

For an unknown reason, ${RUN} in /usr/lib/cron/run-crons is empty.

Reproducible

The circumstances under which the crons do run are not yet known; e.g. for xfce,
using the same snapshot, the cron scripts are run:
https://openqa.opensuse.org/tests/600008#step/force_cron_run/6

Expected result

Compare with:
https://openqa.opensuse.org/tests/600008#step/force_cron_run/6
which has

STATUS=
'[' '!' -z ' daily' ']'
...
echo 'running daily cronjob scripts'

The ' daily' is the value of the ${RUN} variable.

Last good: 20180201 (or more recent)

Further details

Always latest result in this scenario: latest


Related issues 11 (0 open, 11 closed)

Related to openQA Tests - action #25554: [functional][bsc#1063638][u] soft fail in force_cron_run is too strict (Resolved, zluo, 2017-09-25, 2018-05-22)
Related to openQA Tests - coordination #35302: [qe-core][opensuse][functional][epic][sporadic] Various unstable tests on o3 (Resolved, SLindoMansilla, 2018-04-26)
Related to openQA Tests - action #36304: [opensuse][functional]sporadic][u] chromium test is unstable (Resolved, zluo, 2018-05-17, 2018-09-25)
Related to openQA Tests - action #38090: [sle][functional] Do not mask systemd timers on system_performance scenarios (Resolved, okurz, 2018-07-02)
Related to openQA Tests - action #38093: [sle][functional] Rename the module force_cron_run, it is no more only for cron jobs (Resolved, okurz, 2018-07-02)
Related to openQA Tests - action #18558: [sle][functional][u][hard][investigation] Snapper tests run too long (Resolved, SLindoMansilla, 2017-04-13, 2018-07-03)
Related to openQA Tests - action #38228: [sle12sp4][functional][u][fast] test fails in force_scheduled_tasks - need wait more time than 90s for the cron jobs finish (Resolved, okurz, 2018-07-05, 2018-07-17)
Related to openQA Tests - action #41459: [sle][functional][u] Explicit test module for btrfs snapshots cleanup performance (Rejected, mgriessmeier, 2018-08-01)
Has duplicate openQA Tests - action #37916: [sle][functional][u] test fails in gnome_control_center - control center doesn't show up (Rejected, 2018-06-27)
Blocks openQA Tests - action #37354: [opensuse][functional][u][sporadic][medium] test fails in desktop_runner is unstable (Resolved, okurz, 2018-06-14, 2018-07-17)
Blocks openQA Tests - action #37662: [opensuse][functional][u] test fails in multi_users_dm (Resolved, okurz, 2018-06-22, 2018-08-14)

Actions #1

Updated by StefanBruens over 6 years ago

  • Related to action #25554: [functional][bsc#1063638][u] soft fail in force_cron_run is too strict added
Actions #2

Updated by okurz over 6 years ago

  • Subject changed from force_cron_run does not actually run any crons (occasionally) to [functional]force_cron_run does not actually run any crons (occasionally)
  • Target version set to Milestone 16
Actions #3

Updated by okurz over 6 years ago

  • Subject changed from [functional]force_cron_run does not actually run any crons (occasionally) to [functional][u]force_cron_run does not actually run any crons (occasionally)
  • Due date set to 2018-05-08
Actions #4

Updated by okurz over 6 years ago

  • Related to action #34210: [functional][u][medium]updates_packagekit_gpk restarts the updater several times added
Actions #5

Updated by okurz over 6 years ago

  • Related to deleted (action #34210: [functional][u][medium]updates_packagekit_gpk restarts the updater several times)
Actions #6

Updated by mgriessmeier over 6 years ago

  • Subject changed from [functional][u]force_cron_run does not actually run any crons (occasionally) to [functional][u][medium] force_cron_run does not actually run any crons (occasionally)
  • Status changed from New to Workable
Actions #7

Updated by cwh over 6 years ago

  • Difficulty set to medium
Actions #8

Updated by zluo over 6 years ago

  • Assignee set to zluo

will check this problem

Actions #9

Updated by zluo over 6 years ago

  • Status changed from Workable to In Progress

Checking first, since it is an occasional issue:

https://openqa.opensuse.org/tests/664936

Actions #11

Updated by zluo over 6 years ago

  • Status changed from In Progress to Resolved

Checked tests from 2 months ago and recent test runs; they don't show any issue with crons on o3.

Actions #12

Updated by StefanBruens over 6 years ago

  • Status changed from Resolved to Workable

Have you even read the original issue description?

Both test runs you linked to show no sign of any started cron jobs. If the jobs were started, there would be a line stating "Running cronjob scripts".
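
One way to tell the two cases apart is to search the `bash -x` trace for that message. This is only a minimal sketch and not part of the test suite: the function name `crons_started` is hypothetical, and the pattern assumes the message wording quoted in the issue description (`running daily cronjob scripts`).

```shell
# Hypothetical helper (not from os-autoinst-distri-opensuse).
# Reads a `bash -x /usr/lib/cron/run-crons` trace on stdin and succeeds
# only if at least one "running ... cronjob scripts" line is present.
crons_started() {
    # Assumed wording, e.g. "running daily cronjob scripts"; the period
    # name (daily, weekly, ...) is optional in the pattern.
    grep -Eiq 'running( [a-z]+)? cronjob scripts'
}

# Possible use on a test machine (left as a comment, since it runs the crons):
#   bash -x /usr/lib/cron/run-crons 2>&1 | crons_started || echo "no cron jobs were run"
```

An empty `${RUN}` would then show up as a non-zero exit status instead of a silently passing step.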

Actions #13

Updated by zluo over 6 years ago

  • Status changed from Workable to In Progress
Actions #14

Updated by mgriessmeier over 6 years ago

  • Due date changed from 2018-05-08 to 2018-05-22
Actions #15

Updated by zluo over 6 years ago

increase timeout to 120:

sub run {
    select_console 'root-console';
    # show dmesg output in console during cron run
    assert_script_run "dmesg -n 7";
    # Make sure there's no load before we trigger one via cron.
    settle_load;
    my $before = time;
    assert_script_run "bash -x /usr/lib/cron/run-crons", 1000;
    record_soft_failure 'bsc#1063638 - review I/O scheduling parameters of btrfsmaintenance' if (time - $before) > 120 && get_var('SOFTFAIL_BSC1063638');
    sleep 3;    # some head room for the load average to rise
    settle_load;
    # return dmesg output to normal
    assert_script_run "dmesg -n 1";
}

got expected results (6 times):

http://e13.suse.de/tests/2448#step/force_cron_run/6

More test runs triggered now.

Actions #16

Updated by zluo over 6 years ago

checked cron job rules:

## Type:         string
## Default:      ""
#
# At which time cron.daily should start. Default is 15 minutes after booting
# the system. Example setting would be "14:00".
# Due to the fact that cron script runs only every 15 minutes,
# it will only run on xx:00, xx:15, xx:30, xx:45, not at the accurate time
# you set.
DAILY_TIME=""
## Type:         integer
## Default:      5
#
# Maximum days not running when using a fixed time set in DAILY_TIME.
# 0 to skip this. This is for users who will power off their system.
#
# There is a fixed max. of 14 days set,  if you want to override this
# change MAX_NOT_RUN_FORCE in /usr/lib/cron/run-crons
MAX_NOT_RUN="5"

--

this is the problem I'm afraid...
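
To see which values are in effect on a test machine, the settings can be read back from the sysconfig file. This is a minimal sketch: it assumes the excerpt above comes from `/etc/sysconfig/cron` (the excerpt itself does not name the file), and `sysconfig_value` is a hypothetical helper.

```shell
# Hypothetical helper: print the value of KEY="value" from a sysconfig-style file.
#   $1 = file (assumed here to be /etc/sysconfig/cron), $2 = key
sysconfig_value() {
    # Commented lines start with '#' and do not match; if the key is
    # assigned more than once, the last assignment is printed.
    sed -n "s/^$2=\"\(.*\)\"[[:space:]]*\$/\1/p" "$1" | tail -n 1
}

# Possible use (path is an assumption):
#   sysconfig_value /etc/sysconfig/cron DAILY_TIME
#   sysconfig_value /etc/sysconfig/cron MAX_NOT_RUN
```

An empty DAILY_TIME means the default described in the comment applies, i.e. cron.daily is not due until 15 minutes after boot.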

Actions #17

Updated by StefanBruens over 6 years ago

There is another aspect here:
The check was added, AFAIK, to "flush" any btrfs maintenance jobs, to keep them from influencing any jobs that run later.

Now, the btrfs cron jobs have been migrated to systemd timers:

$> systemctl list-timers
NEXT                          LEFT                LAST                          PASSED             UNIT                         ACTIVATES
Thu 2018-05-17 19:00:00 CEST  59min left          Thu 2018-05-17 18:00:03 CEST  16s ago            snapper-timeline.timer       snapper-timeline.service
Fri 2018-05-18 00:00:00 CEST  5h 59min left       Thu 2018-05-17 00:00:05 CEST  18h ago            logrotate.timer              logrotate.service
Fri 2018-05-18 00:05:01 CEST  6h left             Thu 2018-05-17 01:59:50 CEST  16h ago            backup-sysconfig.timer       backup-sysconfig.service
Fri 2018-05-18 00:10:57 CEST  6h left             Thu 2018-05-17 00:20:53 CEST  17h ago            backup-rpmdb.timer           backup-rpmdb.service
Fri 2018-05-18 01:32:24 CEST  7h left             Thu 2018-05-17 01:11:32 CEST  16h ago            check-battery.timer          check-battery.service
Fri 2018-05-18 10:15:22 CEST  16h left            Thu 2018-05-17 10:15:22 CEST  7h ago             snapper-cleanup.timer        snapper-cleanup.service
Fri 2018-05-18 10:20:22 CEST  16h left            Thu 2018-05-17 10:20:22 CEST  7h ago             systemd-tmpfiles-clean.timer systemd-tmpfiles-clean.service
Mon 2018-05-21 00:00:00 CEST  3 days left         Mon 2018-05-14 00:00:08 CEST  3 days ago         btrfs-balance.timer          btrfs-balance.service
Mon 2018-05-21 00:00:00 CEST  3 days left         Mon 2018-05-14 00:00:08 CEST  3 days ago         btrfs-trim.timer             btrfs-trim.service
Mon 2018-05-21 00:00:00 CEST  3 days left         Mon 2018-05-14 00:00:08 CEST  3 days ago         fstrim.timer                 fstrim.service
Fri 2018-06-01 00:00:00 CEST  2 weeks 0 days left Tue 2018-05-01 00:00:08 CEST  2 weeks 2 days ago btrfs-scrub.timer            btrfs-scrub.service

It should be comparatively easy to check for any soon-expiring timers (LEFT column), and trigger the corresponding services immediately. Hopefully, retriggering the services a short time after will be cheap.
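
A rough sketch of the triggering part, assuming the column layout shown above (UNIT and ACTIVATES as the last two columns). The function name `timer_services` is hypothetical, and filtering by the LEFT column is deliberately left out because its width varies ("59min left", "2 weeks 0 days left").

```shell
# Hypothetical helper: reads `systemctl list-timers` output on stdin and
# prints the service named in the ACTIVATES column of every timer row.
timer_services() {
    # Header, blank and summary lines are skipped because their
    # second-to-last field does not end in ".timer".
    awk 'NF >= 2 && $(NF-1) ~ /\.timer$/ { print $NF }'
}

# Possible use on a test machine (left as a comment, since it starts services):
#   systemctl list-timers --no-pager | timer_services | xargs -r systemctl start
```

Starting every listed service is the bluntest variant; selecting only the timers that are about to expire would need extra parsing of LEFT.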

Actions #18

Updated by zluo over 6 years ago

I ran "bash -x /usr/lib/cron/run-crons" on a freshly rebooted SLES 15. No cron job got executed.

As I see it, this command can work (it still depends on system timers), but it does not necessarily work the way we assumed for executing cron jobs on a system with a very short uptime.

I'll ask coolo about this. We might think about running each cron job separately. But first I want to make sure about:

Summary: Avoid suprises later and run the cron jobs explicitly
Actions #19

Updated by zluo over 6 years ago

http://e13.suse.de/tests/2518#step/force_cron_run/6

my $run_daily_cronjobs = 'for file in /etc/cron.daily/*; do echo execute daily cron jobs: $file; $file; done';
assert_script_run "$run_daily_cronjobs", 1000;
Actions #21

Updated by StefanBruens over 6 years ago

And once a week and once a month, QA jobs still fail, as the weekly/monthly maintenance jobs suddenly run ...

Actions #22

Updated by mgriessmeier over 6 years ago

  • Due date changed from 2018-05-22 to 2018-06-05
Actions #23

Updated by zluo over 6 years ago

updated:

find /etc/cron.{hourly,daily,weekly,monthly} -type f -executable -exec echo run cron job: {} \; -exec {} \;
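
The same idea can be written as a plain loop that does not rely on brace expansion or GNU find's `-executable`. This is only a sketch: `run_cron_dirs` and its optional base-directory argument are hypothetical and are there so the loop can be tried against a scratch directory instead of /etc.

```shell
# Hypothetical helper: run every executable regular file in
# cron.{hourly,daily,weekly,monthly} below a base directory (default /etc).
run_cron_dirs() {
    base="${1:-/etc}"
    for period in hourly daily weekly monthly; do
        dir="$base/cron.$period"
        [ -d "$dir" ] || continue
        for job in "$dir"/*; do
            # Skip non-files and files without the execute bit.
            [ -f "$job" ] && [ -x "$job" ] || continue
            echo "run cron job: $job"
            # Report a failing job but keep going with the remaining ones.
            "$job" || echo "cron job failed: $job (exit $?)"
        done
    done
}
```

Unlike the find one-liner, this reports a failing job and continues, which may or may not be what the test wants.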

Actions #24

Updated by zluo over 6 years ago

info:
Customers need to provide their own scripts for weekly and monthly; we don't provide packages for them. For hourly we have cronie-anacron, which needs to be installed separately. systemd timers will replace cron jobs anyway. We can now check systemctl list-timers for details, but the question is what we should do at the moment.

coolo suggests keeping the call to run-cron-jobs and, on top of that, disabling all systemd timers.
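
A dry-run sketch of the "disable all systemd timers" part. It assumes masking is the chosen way to disable them (the follow-up ticket #38090 speaks of masking) and that UNIT is the second-to-last column of `systemctl list-timers`; `mask_timer_commands` is a hypothetical name.

```shell
# Hypothetical helper: reads `systemctl list-timers` output on stdin and
# prints one `systemctl mask --now` command per timer, without running it.
mask_timer_commands() {
    awk 'NF >= 2 && $(NF-1) ~ /\.timer$/ { print "systemctl mask --now " $(NF-1) }'
}

# Possible use on a test machine: review the commands first, then pipe them to sh.
#   systemctl list-timers --all --no-pager | mask_timer_commands
#   systemctl list-timers --all --no-pager | mask_timer_commands | sh
```

Printing the commands instead of running them makes it easy to exclude timers that a scenario still needs.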

Actions #25

Updated by zluo over 6 years ago

PR updated now.

Actions #26

Updated by mgriessmeier over 6 years ago

  • Due date changed from 2018-06-05 to 2018-06-19
  • Status changed from In Progress to Feedback
  • Target version changed from Milestone 16 to Milestone 17

PR not merged yet, setting to feedback

Actions #27

Updated by okurz over 6 years ago

  • Target version changed from Milestone 17 to Milestone 17
Actions #28

Updated by mgriessmeier over 6 years ago

  • Due date changed from 2018-06-19 to 2018-07-03

discussion in PR is ongoing - moving

Actions #29

Updated by SLindoMansilla over 6 years ago

  • Related to coordination #35302: [qe-core][opensuse][functional][epic][sporadic] Various unstable tests on o3 added
Actions #30

Updated by SLindoMansilla over 6 years ago

  • Blocks action #37354: [opensuse][functional][u][sporadic][medium] test fails in desktop_runner is unstable added
Actions #31

Updated by okurz over 6 years ago

  • Has duplicate action #37916: [sle][functional][u] test fails in gnome_control_center - control center doesn't show up added
Actions #32

Updated by okurz over 6 years ago

  • Related to action #36304: [opensuse][functional]sporadic][u] chromium test is unstable added
Actions #33

Updated by okurz over 6 years ago

  • Priority changed from Normal to High

So now more and more tasks depend on this, bumping prio

Actions #34

Updated by zluo over 6 years ago

  • Status changed from Feedback to In Progress

working again on this issue

Actions #35

Updated by riafarov over 6 years ago

  • Blocks action #36730: [sle][functional][y][medium] no route to the host 10.0.2.1 iscsi_client added
Actions #36

Updated by okurz over 6 years ago

  • Blocks action #37662: [opensuse][functional][u] test fails in multi_users_dm added
Actions #39

Updated by SLindoMansilla over 6 years ago

PR merged, creating two tickets to handle:

Actions #40

Updated by SLindoMansilla over 6 years ago

  • Related to action #38090: [sle][functional] Do not mask systemd timers on system_performance scenarios added
Actions #41

Updated by SLindoMansilla over 6 years ago

  • Related to action #38093: [sle][functional] Rename the module force_cron_run, it is no more only for cron jobs added
Actions #42

Updated by SLindoMansilla over 6 years ago

  • Status changed from In Progress to Resolved

Now that the next steps are properly defined in follow-up tickets (following agile management), I can consider this planned ticket as resolved.

Next tasks (that were not planned for this sprint) are ready to be refined and planned:

Actions #43

Updated by SLindoMansilla over 6 years ago

  • Related to action #18558: [sle][functional][u][hard][investigation] Snapper tests run too long added
Actions #44

Updated by okurz over 6 years ago

  • Related to action #38228: [sle12sp4][functional][u][fast] test fails in force_scheduled_tasks - need wait more time than 90s for the cron jobs finish added
Actions #45

Updated by riafarov about 6 years ago

  • Blocks deleted (action #36730: [sle][functional][y][medium] no route to the host 10.0.2.1 iscsi_client )
Actions #46

Updated by okurz about 6 years ago

  • Related to action #41459: [sle][functional][u] Explicit test module for btrfs snapshots cleanup performance added