action #20134

[aarch64] openQA takes >9s to match needles (slow arm worker?)

Added by algraf about 4 years ago. Updated about 4 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: Support
Target version: -
Start date: 2017-06-28
Due date:
% Done: 0%
Estimated time:
Difficulty:

Description

While trying to figure out why the grub2 snapshot test fails, we realized that openQA needs a whopping 9.3s to determine whether we're inside the grub menu. Given that grub only has a 10s timeout, we're already way past the point of influencing anything by the time we want to type a key.

Each variant takes <1s to match, but the grub screen has about 10 different possible needle variants. These matches all happen sequentially, so in total we need almost 10s. If we ran them in parallel, we could easily cut that down to 1-2s, giving us enough time to react to the matching result.

Given that we have 48 or even 96 (slow) cores on these systems, parallelizing the matching is the only reasonable thing to do.
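To illustrate the idea, here is a minimal sketch in Python. It is not openQA's code: `match_needle` is a hypothetical stand-in that only sleeps to simulate the cost of one needle comparison, and the needle names and the 0.05s cost are made up for the example. It only shows that independent matches of similar cost finish in roughly the time of one match when run concurrently instead of one after another.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def match_needle(name, cost=0.05):
    """Hypothetical stand-in for one needle comparison.

    Sleeps to simulate the matching cost and pretends that only the
    needle called "grub2-menu" matches the current screen.
    """
    time.sleep(cost)
    return name, name == "grub2-menu"


def match_sequential(needles, cost=0.05):
    # One needle after another: total time is roughly len(needles) * cost.
    return [match_needle(n, cost) for n in needles]


def match_parallel(needles, cost=0.05, workers=None):
    # One worker per needle by default: total time is roughly one cost.
    with ThreadPoolExecutor(max_workers=workers or len(needles)) as pool:
        # pool.map returns results in the same order as the input.
        return list(pool.map(lambda n: match_needle(n, cost), needles))


# Ten made-up needle variants, mirroring the "about 10" in the description.
needles = [f"grub2-variant-{i}" for i in range(9)] + ["grub2-menu"]

start = time.perf_counter()
seq = match_sequential(needles)
seq_time = time.perf_counter() - start

start = time.perf_counter()
par = match_parallel(needles)
par_time = time.perf_counter() - start

print(f"sequential: {seq_time:.2f}s, parallel: {par_time:.2f}s")
```

Whether a real implementation would see the same gain depends on how the actual matching code shares memory and CPU caches between cores, which this sketch does not model.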

I quickly talked to Santiago about it and he indicated that he'll take a look at parallelizing OpenCV matching.

History

#2 Updated by coolo about 4 years ago

  • Assignee deleted (szarate)

There are about 20 things that need to be sorted out before that. It might be important for you, but that doesn't mean you should 'talk to Santiago'.

#3 Updated by okurz about 4 years ago

If you have too many needles to match for the grub screen, make sure your needles and tests repos are up to date, as I invested some time to make that faster by at least not needing to check as many needles. URLs to failures, please.

#4 Updated by algraf about 4 years ago

okurz wrote:

If you have too many needles to match for the grub screen, make sure your needles and tests repos are up to date, as I invested some time to make that faster by at least not needing to check as many needles. URLs to failures, please.

Sure, I can definitely give you a URL:

http://c166.arch.suse.de/tests/5216#step/grub_test/14

This is on the local test node that Nick gave me - I don't know how up to date the needles are on there. But it seems like the reboot timeouts in gnome are also related to it.

#5 Updated by okurz about 4 years ago

  • Status changed from New to In Progress

From the test I can see there are many unrelated needles (jeos, sap, …). That is the state from 2 months ago. I definitely suggest updating the git repos, which are probably in /var/lib/openqa/share/tests/sle/ and /var/lib/openqa/share/tests/sle/products/sle/needles/. That should help a lot.

#6 Updated by coolo about 4 years ago

But OpenCV is definitely very slow on this hardware - no surprise, given that it's Intel's software :)

tinycv::search_needle 164x219x153 0.557

#7 Updated by algraf about 4 years ago

okurz wrote:

From the test I can see there are many unrelated needles (jeos, sap, …). That is the state from 2 months ago. I definitely suggest updating the git repos, which are probably in /var/lib/openqa/share/tests/sle/ and /var/lib/openqa/share/tests/sle/products/sle/needles/. That should help a lot.

With the update, I'm still seeing issues, for example here: http://c166.arch.suse.de/tests/5247#step/addon_products_sle/5

16:50:41.6866 3953 WARNING: check_asserted_screen took 17.05 seconds - make your needles more specific

#8 Updated by okurz about 4 years ago

  • Subject changed from OpenQA takes >9s to match needles to [aarch64] openQA takes >9s to match needles (slow arm worker?)
  • Category changed from 132 to Support
  • Status changed from In Progress to Feedback
  • Assignee set to okurz

Yes, this is now back to "nothing wrong with the test or needles" but rather something about the host. We see this "stall detected" only very rarely on our aarch64 or ppc64le workers. Retrying works for "us". If it happens every time, then the hardware is hindered in its performance.

#9 Updated by okurz about 4 years ago

  • Status changed from Feedback to Resolved

I guess this is resolved as far as this "support ticket" goes.
