Project

General

Profile

action #46682

[functional][y] lvm test need more than 2 hours to run - optimize select_patterns_and_packages

Added by michel_mno over 2 years ago. Updated over 2 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
Category:
Bugs in existing tests
Target version:
SUSE QA - Milestone 23
Start date:
2019-01-25
Due date:
2019-02-26
% Done:

0%

Estimated time:
5.00 h
Difficulty:

Description

lvm test need more than 2 hours to run for ppc64le as per
https://openqa.opensuse.org/tests/837890

by increasing MAX_JOB_TIME the test completed as per
https://openqa.opensuse.org/tests/837707 (it lasted 2h23)

Acceptance criteria

  1. select_patterns_and_packages doesn't perform wasteful actions

Suggestions

In case we have "default" used for patterns, we can use early return if other patterns were already processed. In other scenarios this optimization is not possible as we have to go through the list of all patterns in order to deselect unwanted ones.


Related issues

Related to openQA Tests - coordination #47192: [sle][functional][y][epic] some openQA tests in staging take more than 50 minutesResolved2019-02-062019-06-04

Related to openQA Tests - action #47453: [qam][functional][y][fast] test fails in first_boot Resolved2019-02-122019-02-26

Related to openQA Tests - action #47924: [qam][functional][y][fast] wrong patterns in installation_overview on SLE 15Resolved2019-02-142019-02-26

History

#1 Updated by michel_mno over 2 years ago

I bypassed the problem adding MAX_JOB_TIME=9000 in o3 for LVM test.

For investigation captured here the time for each subtask:

$grep -E ' finished ' ~/Desktop/autoinst-log.txt 
[2019-01-25T09:20:18.074 UTC] [debug] ||| finished isosize installation at 2019-01-25 09:20:18 (1 s)
[2019-01-25T09:23:19.501 UTC] [debug] ||| finished bootloader installation at 2019-01-25 09:23:19 (181 s)
[2019-01-25T09:24:40.974 UTC] [debug] ||| finished welcome installation at 2019-01-25 09:24:40 (81 s)
[2019-01-25T09:24:50.596 UTC] [debug] ||| finished online_repos installation at 2019-01-25 09:24:50 (10 s)
[2019-01-25T09:24:52.014 UTC] [debug] ||| finished installation_mode installation at 2019-01-25 09:24:51 (1 s)
[2019-01-25T09:25:23.700 UTC] [debug] ||| finished logpackages installation at 2019-01-25 09:25:23 (32 s)
[2019-01-25T09:25:27.575 UTC] [debug] ||| finished system_role installation at 2019-01-25 09:25:27 (4 s)
[2019-01-25T09:25:29.376 UTC] [debug] ||| finished partitioning installation at 2019-01-25 09:25:29 (2 s)
[2019-01-25T09:25:33.413 UTC] [debug] ||| finished partitioning_lvm installation at 2019-01-25 09:25:33 (4 s)
[2019-01-25T09:25:48.934 UTC] [debug] ||| finished partitioning_finish installation at 2019-01-25 09:25:48 (15 s)
[2019-01-25T09:25:50.135 UTC] [debug] ||| finished installer_timezone installation at 2019-01-25 09:25:50 (2 s)
[2019-01-25T09:26:03.452 UTC] [debug] ||| finished user_settings installation at 2019-01-25 09:26:03 (13 s)
[2019-01-25T09:26:07.344 UTC] [debug] ||| finished installation_overview_before installation at 2019-01-25 09:26:07 (4 s)
[2019-01-25T09:58:42.708 UTC] [debug] ||| finished select_patterns_and_packages installation at 2019-01-25 09:58:42 (1955 s)
[2019-01-25T09:58:44.200 UTC] [debug] ||| finished installation_overview installation at 2019-01-25 09:58:44 (2 s)
[2019-01-25T09:59:43.823 UTC] [debug] ||| finished disable_grub_timeout installation at 2019-01-25 09:59:43 (59 s)
[2019-01-25T10:00:14.875 UTC] [debug] ||| finished start_install installation at 2019-01-25 10:00:14 (31 s)
[2019-01-25T10:07:03.554 UTC] [debug] ||| finished await_install installation at 2019-01-25 10:07:03 (409 s)
[2019-01-25T10:12:23.392 UTC] [debug] ||| finished logs_from_installation_system installation at 2019-01-25 10:12:23 (320 s)
[2019-01-25T10:12:24.809 UTC] [debug] ||| finished reboot_after_installation installation at 2019-01-25 10:12:24 (1 s)
[2019-01-25T10:12:36.544 UTC] [debug] ||| finished grub_test installation at 2019-01-25 10:12:36 (12 s)
[2019-01-25T10:13:29.319 UTC] [debug] ||| finished first_boot installation at 2019-01-25 10:13:29 (53 s)
[2019-01-25T10:16:57.743 UTC] [debug] ||| finished zypper_clear_repos update at 2019-01-25 10:16:57 (73 s)
[2019-01-25T10:17:47.011 UTC] [debug] ||| finished zypper_ar console at 2019-01-25 10:17:47 (50 s)
[2019-01-25T10:18:15.012 UTC] [debug] ||| finished zypper_ref console at 2019-01-25 10:18:15 (28 s)
[2019-01-25T10:21:31.870 UTC] [debug] ||| finished zypper_up update at 2019-01-25 10:21:31 (63 s)
[2019-01-25T10:24:25.243 UTC] [debug] ||| finished system_prepare console at 2019-01-25 10:24:25 (38 s)
[2019-01-25T10:26:50.314 UTC] [debug] ||| finished check_network console at 2019-01-25 10:26:50 (13 s)
[2019-01-25T10:27:13.899 UTC] [debug] ||| finished system_state console at 2019-01-25 10:27:13 (23 s)
[2019-01-25T10:28:06.233 UTC] [debug] ||| finished prepare_test_data console at 2019-01-25 10:28:06 (53 s)
[2019-01-25T10:32:22.458 UTC] [debug] ||| finished consoletest_setup console at 2019-01-25 10:32:22 (122 s)
[2019-01-25T10:48:04.636 UTC] [debug] ||| finished keymap_or_locale locale at 2019-01-25 10:48:04 (808 s)
[2019-01-25T10:57:33.338 UTC] [debug] ||| finished force_scheduled_tasks console at 2019-01-25 10:57:33 (436 s)
[2019-01-25T11:00:43.365 UTC] [debug] ||| finished textinfo console at 2019-01-25 11:00:43 (61 s)
[2019-01-25T11:02:04.615 UTC] [debug] ||| finished hostname console at 2019-01-25 11:02:04 (81 s)
[2019-01-25T11:04:25.019 UTC] [debug] ||| finished xorg_vt console at 2019-01-25 11:04:25 (14 s)
[2019-01-25T11:04:39.459 UTC] [debug] ||| finished zypper_lr console at 2019-01-25 11:04:39 (14 s)
[2019-01-25T11:05:03.211 UTC] [debug] ||| finished ncurses console at 2019-01-25 11:05:03 (24 s)
[2019-01-25T11:06:36.995 UTC] [debug] ||| finished yast2_lan console at 2019-01-25 11:06:36 (93 s)
[2019-01-25T11:07:04.267 UTC] [debug] ||| finished curl_https console at 2019-01-25 11:07:04 (28 s)
[2019-01-25T11:09:45.169 UTC] [debug] ||| finished salt console at 2019-01-25 11:09:45 (161 s)
[2019-01-25T11:10:24.689 UTC] [debug] ||| finished zypper_in console at 2019-01-25 11:10:24 (39 s)
[2019-01-25T11:12:11.560 UTC] [debug] ||| finished yast2_i console at 2019-01-25 11:12:11 (107 s)
[2019-01-25T11:12:47.003 UTC] [debug] ||| finished yast2_bootloader console at 2019-01-25 11:12:46 (35 s)
[2019-01-25T11:13:00.915 UTC] [debug] ||| finished vim console at 2019-01-25 11:13:00 (14 s)
[2019-01-25T11:13:10.382 UTC] [debug] ||| finished firewall_enabled console at 2019-01-25 11:13:10 (10 s)
[2019-01-25T11:19:38.404 UTC] [debug] ||| finished sshd console at 2019-01-25 11:19:38 (388 s)
[2019-01-25T11:22:12.291 UTC] [debug] ||| finished ssh_cleanup console at 2019-01-25 11:22:12 (17 s)
[2019-01-25T11:26:01.513 UTC] [debug] ||| finished sshfs console at 2019-01-25 11:26:01 (94 s)
[2019-01-25T11:26:19.995 UTC] [debug] ||| finished mtab console at 2019-01-25 11:26:19 (18 s)
[2019-01-25T11:39:55.813 UTC] [debug] ||| finished consoletest_finish console at 2019-01-25 11:39:55 (816 s)

#2 Updated by michel_mno over 2 years ago

compare in autoinst-log.txt the subtests working time of two snapshots (20190110 versus 20190121)
https://openqa.opensuse.org/tests/830336/file/autoinst-log.txt
https://openqa.opensuse.org/tests/837707/file/autoinst-log.txt
and identify two that drastically increased:
keymap_or_locale locale from (67 s) to (808 s)
consoletest_finish console from (79 s) to (816 s)

#3 Updated by okurz over 2 years ago

  • Subject changed from lvm test need more than 2 hours to run to [functional][u] lvm test need more than 2 hours to run
  • Category set to Bugs in existing tests

good observation. I can see in https://openqa.opensuse.org/tests/837890#step/keymap_or_locale/4 that there is a repetition in detection of the displaymanager/screenshot/lock-screen . Well, that sounds like https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/6552 . I suggest you talk to ggardet or investigate further in this direction.

#4 Updated by okurz over 2 years ago

  • Status changed from New to In Progress
  • Assignee set to okurz

Hm, I think I know what's going on: In https://openqa.opensuse.org/tests/839191#step/keymap_or_locale/5 you can see that the test looks for the "desktop-runner" which never opens up until https://openqa.opensuse.org/tests/839191#step/keymap_or_locale/21 . The reason is that the wrong hotkey "alt-f2" instead of "super-space" is triggered on icewm. https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/6649 might fix that. I will try.

bash <(curl -s https://raw.githubusercontent.com/okurz/scripts/master/openqa-clone-custom-git-refspec) \
  https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/6649 \
  https://openqa.opensuse.org/tests/839191

-> https://openqa.opensuse.org/tests/840629

ugh snapshot vanished again. Need to retrigger based on last

bash <(curl -s https://raw.githubusercontent.com/okurz/scripts/master/openqa-clone-custom-git-refspec) \
  https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/6649 \
  https://openqa.opensuse.org/tests/839674

https://openqa.opensuse.org/tests/840814 passed. However it took 1:54h still. However, https://openqa.opensuse.org/tests/840814/file/autoinst-log.txt shows that keymap_or_locale took 63s, so back to the previous good value and consoletest_finish within 81s.

By far the biggest time contributor is now

$ curl -s https://openqa.opensuse.org/tests/840814/file/autoinst-log.txt | sed -n 's/^.*||| finished \([a-z_]*\).* (\([0-9]* s\))/\2 \1/p' | sort -n | tail -n 3
388 s sshd
392 s await_install
1997 s select_patterns_and_packages

older jobs were certainly taking less time, e.g. see https://openqa.opensuse.org/tests/837890#next_previous however I could not find useful logs from older jobs anymore so that we could compare individual module runtimes. I guess partially we have simply more modules scheduled but I am sure also we have some inefficiencies to make tests more robust. For now I suggest to simply increase MAX_JOB_TIME as planned and we can plan individual tickets as follow-up to reduce the runtime by either scheduling some test modules elsewhere or optimizing runtime within each module.

#5 Updated by okurz over 2 years ago

  • Status changed from In Progress to Feedback

#6 Updated by okurz over 2 years ago

PR merged. Test triggered for validation: https://openqa.opensuse.org/tests/842682 -> passed

#7 Updated by okurz over 2 years ago

  • Subject changed from [functional][u] lvm test need more than 2 hours to run to [functional][y] lvm test need more than 2 hours to run
  • Due date set to 2019-02-26
  • Status changed from Feedback to Workable
  • Assignee deleted (okurz)
  • Target version set to Milestone 23

riafarov can you comment on the last part mentioned in #46682#note-4 regarding needing more than 30m to cover "select_patterns_and_packages"?

#8 Updated by riafarov over 2 years ago

select_patterns_and_packages is quite expensive test module to run as to get required setup we have to go through the full list of patterns. I cannot access any run where it took 30 minutes, but in other installations it still takes around 15 minutes. One of the optimizations we could introduce is to handle "default" case, once we are done with selecting other mentioned patters we can simply return instead of going through each test module and leave it as it is. I would propose to introduce this improvement as part of this ticket.

#9 Updated by riafarov over 2 years ago

  • Description updated (diff)

#10 Updated by okurz over 2 years ago

That sounds like a good idea. I suggest to also crosscheck with older tests, maybe SLE ones where we still have autoinst-log.txt and see if the runtime changed?

#11 Updated by okurz over 2 years ago

  • Related to coordination #47192: [sle][functional][y][epic] some openQA tests in staging take more than 50 minutes added

#12 Updated by okurz over 2 years ago

  • Subject changed from [functional][y] lvm test need more than 2 hours to run to [functional][y] lvm test need more than 2 hours to run - optimize select_patterns_and_packages

#13 Updated by riafarov over 2 years ago

  • Estimated time set to 5.00 h

@jrivera has a good idea that in case of default not provided in the PATTERNS setting we still can optimize by e.g. unselecting all patterns as a first step (similar approach is used in allpatterns test suites).

#14 Updated by JERiveraMoya over 2 years ago

  • Assignee set to JERiveraMoya

#15 Updated by JERiveraMoya over 2 years ago

  • Status changed from Workable to In Progress

#16 Updated by JERiveraMoya over 2 years ago

  • Status changed from In Progress to Feedback

#18 Updated by riafarov over 2 years ago

https://progress.opensuse.org/issues/47453
This one broke. It's actually who has confused you, as I've said that we can do early return once we have processed all patterns (even not default), which is not the case. So we need to put the line you've provided originally when we checked default variable.

#19 Updated by JERiveraMoya over 2 years ago

  • Status changed from In Progress to Feedback

#20 Updated by bfilho over 2 years ago

  • Related to action #47453: [qam][functional][y][fast] test fails in first_boot added

#22 Updated by JERiveraMoya over 2 years ago

  • Status changed from In Progress to Feedback

#23 Updated by JERiveraMoya over 2 years ago

  • Status changed from Feedback to In Progress

#24 Updated by JERiveraMoya over 2 years ago

  • Status changed from In Progress to Resolved

#25 Updated by bfilho over 2 years ago

  • Related to action #47924: [qam][functional][y][fast] wrong patterns in installation_overview on SLE 15 added

#26 Updated by JERiveraMoya over 2 years ago

  • Status changed from Resolved to In Progress

#27 Updated by JERiveraMoya over 2 years ago

There is some change in behavior of the product, and probably not a break but re-opening the ticket just in case: https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/6732#issuecomment-463930644. It breaks indeed some qam test but it's about how the current change affects some previous "kind of wrong" needling. Recreating needles to fix it.

#29 Updated by JERiveraMoya over 2 years ago

  • Status changed from In Progress to Resolved

All breaks addressed.

#30 Updated by okurz over 2 years ago

latest ppc64le took only 1:24h in https://openqa.opensuse.org/tests/854705 . That's a great achievement.

$ curl -s https://openqa.opensuse.org/tests/853830/file/autoinst-log.txt | sed -n 's/^.*||| finished \([a-z_]*\).* (\([0-9]* s\))/\2 \1/p' | sort -n | tail -n 5
126 s select_patterns_and_packages
141 s bootloader
228 s force_scheduled_tasks
275 s logs_from_installation_system
590 s await_install

Now that I call an enhancement :)

Also available in: Atom PDF