action #56822: Greyscaling of needle matches can produce false positives - openQA Project (public) - openSUSE Project Management Tool

action #56822

open

Greyscaling of needle matches can produce false positives

Added by AdamWill over 5 years ago. Updated over 4 years ago.

Status: New
Priority: Low
Assignee:
Category: Regressions/Crashes
Target version: QA (public) - future
Start date: 2019-09-11
Due date:
% Done:
Estimated time:

Description

So, check this out:

https://openqa.fedoraproject.org/tests/448886#step/disk_guided_empty/4

There's a needle match there that clearly isn't a match. The red text is completely different from the needle.

I'm pretty sure what's going on here is that the color depth reduction and greyscaling that os-autoinst does when matching combine to make the background and the text the same color. So any text in that color will match the needle, as will no text at all, just the blue background (I've seen that happen in another test).

This kinda means we're stuck, beyond convincing the installer team to change the gradient or the text color or something - we just can't reliably create needles for this text. We've tried including or not including bits of the white keyboard layout indicator below the text to make this influence the image processing, but it doesn't seem to be reliable enough.

Updated by coolo over 5 years ago

Red on Blue is your new thing, eh? :)

Updated by AdamWill over 5 years ago

unicode U+1F941

(bug report: redmine chokes on unicode characters)

Updated by coolo over 5 years ago

The only option I see is supporting 'HD needles' where we apply a different matching algorithm. E.g. we'd need to check the needles in RGB space, which means 3 times as much memory used (which might be acceptable) and 3 times as slow lookup (which would be painful for all needles). An alternative would be a cascading lookup - match the green channel, if it's close enough, match the others.
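A cascading lookup of this kind might look roughly like the sketch below. It is plain Python on nested lists, compares a needle against an equally sized scene patch at an already known position, and uses made-up names (`channel_mse`, `cascading_match`, `max_mse`) that are not part of os-autoinst.

```python
def channel_mse(needle, patch, channel):
    """Mean squared error of one colour channel between two equally sized images.

    Images are lists of rows; each pixel is an (R, G, B) tuple.
    """
    total = 0
    count = 0
    for needle_row, patch_row in zip(needle, patch):
        for needle_px, patch_px in zip(needle_row, patch_row):
            diff = needle_px[channel] - patch_px[channel]
            total += diff * diff
            count += 1
    return total / count


def cascading_match(needle, patch, max_mse=25.0):
    """Check green first; only look at red and blue if green is close enough.

    max_mse is an arbitrary illustrative threshold.
    """
    for channel in (1, 0, 2):  # G, then R, then B
        if channel_mse(needle, patch, channel) > max_mse:
            return False  # early exit, so most non-matches cost one channel only
    return True
```

Most candidates would be rejected on the green channel alone, so the extra cost of the red and blue comparisons is only paid for near-matches.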

Stefan, I added you as watcher as you're about the only one who is into this code - what do you think?

Updated by StefanBruens over 5 years ago

Another option may be specifying a custom color transform matrix. Currently it uses the transform from:

https://docs.opencv.org/4.1.0/de/d25/imgproc_color_conversions.html

Y ← 0.299⋅R + 0.587⋅G + 0.114⋅B

For the "red text on blue", this is the worst possible choice, almost any other coefficients keep some of the contrast, but that depends on the color scheme. E.g. extracting only the green channel (Y ← 0.0⋅R + 1.0⋅G + 0.0⋅B) would give bad results for openSUSE, as white/green/black is reduced to 2 levels (white/white/black).
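To make the effect concrete, the sketch below applies both sets of coefficients to a few colours. The colour values are hypothetical, picked only to show the collapse; they are not sampled from the Fedora installer or from openSUSE artwork.

```python
LUMA = (0.299, 0.587, 0.114)   # OpenCV RGB-to-gray weights quoted above
GREEN_ONLY = (0.0, 1.0, 0.0)


def to_gray(rgb, weights=LUMA):
    """Mix one (R, G, B) pixel, 0-255 per channel, into a single gray value."""
    return sum(w * c for w, c in zip(weights, rgb))


# Hypothetical colours, chosen only for illustration.
red_text = (200, 0, 0)
blue_background = (0, 60, 216)
white, green, black = (255, 255, 255), (0, 255, 0), (0, 0, 0)

print("luma, red vs blue:      ", to_gray(red_text), to_gray(blue_background))
print("green only, red vs blue:",
      to_gray(red_text, GREEN_ONLY), to_gray(blue_background, GREEN_ONLY))
print("green only, white/green/black:",
      [to_gray(c, GREEN_ONLY) for c in (white, green, black)])
```

With these assumed colours the two luma values differ by less than one gray level, while the green-only mix keeps red and blue apart but maps white and pure green to the same value.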

The color transform could be per needle. This would incur some small additional cost (we may have to generate several "gray" versions of the current screen), but this is IMHO negligible compared with the 2D-DFT we run on each screen per needle.
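One way to keep the extra cost small would be to cache one gray version of the screen per distinct set of coefficients, as in the sketch below. The `gray_weights` field and the `GrayCache` class are hypothetical; needles have no such field today.

```python
def to_gray_image(image, weights):
    """Mix every (R, G, B) pixel of an image (list of rows) into one gray value."""
    return [[sum(w * c for w, c in zip(weights, px)) for px in row] for row in image]


class GrayCache:
    """Keeps one gray version of the current screen per distinct weight triple."""

    def __init__(self, screen):
        self.screen = screen
        self._versions = {}

    def gray_for(self, weights):
        if weights not in self._versions:
            self._versions[weights] = to_gray_image(self.screen, weights)
        return self._versions[weights]


DEFAULT_WEIGHTS = (0.299, 0.587, 0.114)

# Hypothetical needle records; "gray_weights" is not an existing needle field.
needles = [
    {"name": "plain", "gray_weights": DEFAULT_WEIGHTS},
    {"name": "red-on-blue", "gray_weights": (1.0, 0.0, 0.0)},
    {"name": "another-plain", "gray_weights": DEFAULT_WEIGHTS},
]

screen = [[(200, 0, 0), (0, 60, 216)]]
cache = GrayCache(screen)
for needle in needles:
    gray_screen = cache.gray_for(needle["gray_weights"])
    # ... run the usual gray template match of this needle against gray_screen ...
```

Three needles here lead to only two conversions of the screen, because two of them share the default weights.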

Updated by coolo over 5 years ago

While I accept this is a valid option, I don't think it's user-friendly ('are you more red or more green?'). If anything, we would have to calculate a transformation from the needle areas, but that sounds painful to do, on top of the debugging fun of having configurable gray scales being calculated.

I can't really think of any downsides of checking G->R->B. We would get tons of false positives when looking at the green channel only, true, but not in the cases that are important: white on black (and other gray levels) would look the same in the green channel. And I would do this only on a boolean flag (or if the matching rate is > 99%, marking HD needles). But yes, in this case we'd even have 4 gray versions.

Or we double the fun for all and apply a second, very different colour transformation whenever the standard one matches. Duplicating the effort only for needles that match isn't as bad.

Updated by okurz over 5 years ago

Hm, we could use a simple plain average

Y ← (R + G + B)/3 ~ 0.333⋅R + 0.333⋅G + 0.333⋅B

which would be more "fair" for computers to use but would appear more washed out to humans, as the default OpenCV formula relies on humans' physiological sensitivity to green. Would this be an option?

On the other hand (I could not yet find the corresponding source code), we could also use a proper desaturation, which would generate "greyscale representations" with higher contrast (for both computers and humans).

Updated by StefanBruens over 5 years ago

Another cheap option is to keep the template matching in gray, but do the distance/error calculation in RGB - the position of the needle is correct, as there are enough elements which have to line up.

I think this is very often the case - we use colors for highlighting or denoting state. These are sensitive areas, where a slight mismatch often is critical.
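The idea can be sketched as follows. A brute-force gray search stands in for the real template matching (which, as noted earlier in the thread, is DFT-based), and all function names and the `max_rgb_mse` threshold are illustrative assumptions.

```python
def gray(px, weights=(0.299, 0.587, 0.114)):
    return sum(w * c for w, c in zip(weights, px))


def crop(scene, top, left, height, width):
    return [row[left:left + width] for row in scene[top:top + height]]


def mse(a, b, value):
    """Mean squared error between two equally sized images.

    `value` maps a pixel to a tuple of numbers to compare.
    """
    total, count = 0.0, 0
    for row_a, row_b in zip(a, b):
        for px_a, px_b in zip(row_a, row_b):
            for va, vb in zip(value(px_a), value(px_b)):
                total += (va - vb) ** 2
                count += 1
    return total / count


def locate_in_gray(scene, needle):
    """Brute-force search for the offset with the lowest gray MSE."""
    h, w = len(needle), len(needle[0])
    best = None
    for top in range(len(scene) - h + 1):
        for left in range(len(scene[0]) - w + 1):
            err = mse(crop(scene, top, left, h, w), needle, lambda p: (gray(p),))
            if best is None or err < best[0]:
                best = (err, top, left)
    return best


def match(scene, needle, max_rgb_mse=25.0):
    """Find the position in gray, then judge the match quality in RGB."""
    gray_err, top, left = locate_in_gray(scene, needle)
    patch = crop(scene, top, left, len(needle), len(needle[0]))
    rgb_err = mse(patch, needle, lambda p: p)
    return {"top": top, "left": left, "gray_mse": gray_err,
            "rgb_mse": rgb_err, "ok": rgb_err <= max_rgb_mse}
```

For a needle made of a white marker next to red text, a scene that shows the white marker next to plain blue would still be located via the marker, but the RGB error at that position would be large enough to reject it.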

Updated by StefanBruens over 5 years ago

okurz wrote:

> Hm, we could use a simple plain average
>
> Y ← (R + G + B)/3 ~ 0.333⋅R + 0.333⋅G + 0.333⋅B
>
> which would be more "fair" for computers to use but would appear more washed out to humans, as the default OpenCV formula relies on humans' physiological sensitivity to green. Would this be an option?
>
> On the other hand (I could not yet find the corresponding source code), we could also use a proper desaturation, which would generate "greyscale representations" with higher contrast (for both computers and humans).

Does not work. When you go from 3 channels to 1, you lose information. I tried the various desaturation (channel mixer) and component extraction options (HSL, HSV, YCbCr) in GIMP; no matter which one you choose, there is always an area which completely loses contrast.
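For the linear mixes at least, the loss is easy to demonstrate: for any set of weights one can construct two clearly different colours that end up with the same gray value. The sketch below does this for a pure red and a pure blue; `colliding_pair` is a made-up helper and it assumes the red and blue weights are both non-zero. It does not cover the non-linear HSL/HSV options.

```python
def gray(px, weights):
    return sum(w * c for w, c in zip(weights, px))


def colliding_pair(weights, red_level=80.0):
    """Return a pure red and a pure blue that mix to the same gray value.

    Assumes the red and blue weights are both non-zero.
    """
    w_red, _, w_blue = weights
    blue_level = red_level * w_red / w_blue
    return (red_level, 0.0, 0.0), (0.0, 0.0, blue_level)


for name, weights in {
    "opencv luma": (0.299, 0.587, 0.114),
    "plain average": (1 / 3, 1 / 3, 1 / 3),
}.items():
    red, blue = colliding_pair(weights)
    print(name, red, blue, gray(red, weights), gray(blue, weights))
```

Changing the weights only moves the blind spot to a different pair of colours.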

Updated by coolo over 5 years ago

Right. You'd need an edge detection built into the grayscaling. While technically rather cheap (a 3x3 filter can do it), it will add details that aren't there in other cases.

The comment about doing the error calculation in RGB might help to avoid false positives, but as we grayscale needles too, you can get very surprising results: the search will match gray areas on gray areas, and depending on where it found the 'best' position, it could skip valid matches.

And given that we currently do the matching on a blurry 16-level gray scale, I wonder how many needles we'd miss :)

For reference:
What humans see: https://github.com/coolo/drunken-adventure/blob/master/osc2017-1/xterm-started-3.png
What openQA sees: https://github.com/coolo/drunken-adventure/blob/master/osc2017-1/xterm-started-4.png

I'm all for making openQA less blind, but we'd need to make the glasses sharper step by step to avoid a complete reneedling overnight :)

#10

Updated by StefanBruens over 5 years ago

coolo wrote:

> Right. You'd need an edge detection built into the grayscaling. While technically rather cheap (a 3x3 filter can do it), it will add details that aren't there in other cases.
>
> The comment about doing the error calculation in RGB might help to avoid false positives, but as we grayscale needles too, you can get very surprising results: the search will match gray areas on gray areas, and depending on where it found the 'best' position, it could skip valid matches.

The only case where it would discard a valid match is when the difference only exists in some color channels, but is this really a valid match? See Adam's false positive case here.

If there are multiple needles which only differ before grayscale conversion, each search would return the correct position but would annotate the bad one with a low match quality.

If a needle has multiple candidate positions of the same quality in a scene, all but the nearest one (relative to the original position) are already discarded today.

> And given that we currently do the matching on a blurry 16-level gray scale, I wonder how many needles we'd miss :)
>
> For reference:
> What humans see: https://github.com/coolo/drunken-adventure/blob/master/osc2017-1/xterm-started-3.png
> What openQA sees: https://github.com/coolo/drunken-adventure/blob/master/osc2017-1/xterm-started-4.png
>
> I'm all for making openQA less blind, but we'd need to make the glasses sharper step by step to avoid a complete reneedling overnight :)

In case needle and scene match exactly, there is obviously no difference between comparing 1 or 3 channels.

Of course one can construct cases where the mixing and quantization suppress some differences which are retained in the 3-channel case, but as long as quantization is also applied to the individual channels, the result is very similar. After all, blurring and taking the difference are linear operations, and it does not matter whether they are applied to the gray channel or to each channel individually. The order of mixing and squaring only matters when the signs of the errors in the individual channels differ, i.e. changes in lightness are treated the same by both methods; only changes in hue/saturation become more relevant with RGB-based error calculation.
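The relation between the two error measures can be written down for a single pixel. The sketch assumes non-negative channel weights $w_c$ that sum to 1 (true for the OpenCV coefficients), per-channel differences $e_c$ between scene and needle, and an RGB error that is weighted with the same $w_c$; that weighting is an assumption made for illustration, not necessarily how an implementation would normalise it.

```latex
E_{\text{gray}} = \Bigl(\sum_{c \in \{R,G,B\}} w_c\, e_c\Bigr)^{2},
\qquad
E_{\text{RGB}} = \sum_{c \in \{R,G,B\}} w_c\, e_c^{2}

E_{\text{RGB}} - E_{\text{gray}}
  = \sum_{c} w_c \Bigl(e_c - \sum_{c'} w_{c'}\, e_{c'}\Bigr)^{2} \;\ge\; 0
```

Under these assumptions the RGB error is never smaller than the gray error. The two are equal when all three channel errors are the same (a pure lightness change), and the gap is largest when the channel errors have opposite signs and cancel in the gray mix.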

I.e. the chance of introducing new false positives is exactly zero, the chance of introducing new false negatives is IMHO very low.

#11

Updated by coolo over 5 years ago

We're talking about two different things. I have no doubt that your suggestion fixes the issue at hand and avoids false positives.

My thoughts are going beyond the problem at hand. In a scenario where we have to find needles in a blue on red scenario, we're basically looking for grey on grey, and the best matches matrix will be all ones()*X. And we don't look at each of them, so whether we find the needle is random, no matter how you calculate the delta.
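The flat score map can be reproduced with a toy example. The sketch below slides a one-row needle over a one-row scene and records the gray error at every offset; the colours are hypothetical and the brute-force search only stands in for the real matcher.

```python
def gray(px, weights=(0.299, 0.587, 0.114)):
    return sum(w * c for w, c in zip(weights, px))


def gray_error_map(scene_row, needle_row):
    """Gray MSE of a one-row needle at every horizontal offset of a one-row scene."""
    width = len(needle_row)
    errors = []
    for left in range(len(scene_row) - width + 1):
        window = scene_row[left:left + width]
        squared = [(gray(a) - gray(b)) ** 2 for a, b in zip(window, needle_row)]
        errors.append(sum(squared) / width)
    return errors


# Hypothetical colours with almost identical gray values.
RED, BLUE = (200, 0, 0), (0, 60, 216)
scene = [BLUE, BLUE, RED, BLUE, BLUE]
needle = [RED, BLUE]
errors = gray_error_map(scene, needle)
print(errors)
```

All offsets score almost identically here, so once blurring and quantization are added there is nothing left in the gray data to single out the correct position.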

And about introducing the change: So far we blur the grayscale image and do the delta on that. If we start calculating on RGB, I would also start comparing originals (it would also be faster as we can avoid blurring 2 more channels). And that will introduce false negatives as our needles are outdated but still match within the current algorithm.

#12

Updated by okurz almost 5 years ago

Priority changed from Normal to Low

#13

Updated by okurz over 4 years ago

Target version set to future
