coordination #55364
closedcoordination #80142: [saga][epic] Scale out: Redundant/load-balancing deployments of openQA, easy containers, containers on kubernetes
[epic] Let's make codecov reports reliable
Description
Observation¶
codecov reports often report about coverage changes which are obviously not related to the actual changes of a PR, e.g. when documentation is changed or as in https://github.com/os-autoinst/openQA/pull/2253#issuecomment-520042072 . It seems some of our tests – maybe the full stack test – introduced a flakyness in coverage.
Suggestions¶
We should try to make the reports more reliable and hence less annoying with big red color when it is not helpful.
Let's try to collect our observations here:
- https://github.com/os-autoinst/openQA/pull/2253#issuecomment-520042072
- lib/OpenQA/Worker/Settings.pm 84.9% (-13.21%) arrow_down
lib/OpenQA/Worker/WebUIConnection.pm 84.42% (+2.51%) arrow_up-> https://github.com/os-autoinst/openQA/pull/3718
- https://github.com/os-autoinst/openQA/pull/2135#issuecomment-506641541
- lib/OpenQA/WebSockets.pm 88.4% (-5.8%) arrow_down
- lib/OpenQA/Scheduler/Model/Jobs.pm 86.11% (-2.78%) arrow_down
- lib/OpenQA/Worker/Commands.pm 81.17% (-2.36%) arrow_down
- lib/OpenQA/Utils.pm 95.81% (-0.2%) arrow_down
- https://github.com/os-autoinst/openQA/pull/2161#issuecomment-513477665
- lib/OpenQA/Setup.pm 90.83% (ø) arrow_up
- lib/OpenQA/Worker/Settings.pm 84.31% (-13.73%) arrow_down
- lib/OpenQA/Worker/Engines/isotovideo.pm 94.66% (+0.48%) arrow_up
- https://github.com/os-autoinst/openQA/pull/2161#issuecomment-513477665
- lib/OpenQA/Setup.pm 90.83% (ø) arrow_up
- lib/OpenQA/Worker/Settings.pm 84.31% (-13.73%) arrow_down
- lib/OpenQA/Worker/Engines/isotovideo.pm 94.66% (+0.48%) arrow_up
- https://codecov.io/gh/os-autoinst/openQA/pull/2258/changes
- https://github.com/os-autoinst/openQA/pull/2920#issuecomment-611509627
OpenQA::Schema::Result::Jobs::done
is only invoked in some testruns with$args{newbuild}
- https://codecov.io/gh/os-autoinst/openQA/pull/3276/changes from https://github.com/os-autoinst/openQA/pull/3276#issuecomment-694707108
- t/lib/OpenQA/SeleniumTest.pm:371
- t/34-developer_mode-unit.t:100-102
Other ideas for improving¶
- Mark individual sections as required to have 100% code coverage (how to do that?)
- Aim for 100% statement coverage at least
- Include coverage for javascript
- Check coverage for perl template code
Updated by okurz over 5 years ago
It also seems like we are getting duplicate codecov reports which I understand can happen if both an old bot are activated along with the github app? As the bot runs under the account "coolo" we can look into this with him.
Updated by okurz over 5 years ago
- Copied to action #55643: Prevent duplicate codecov reports added
Updated by andriinikitin about 5 years ago
I have strong impression that this happens when PR is not rebased to latest master.
I.e. codecov compares coverage stats in PR with current coverage on master, which will look incorrect if base of PR is behind several commits
Updated by okurz about 5 years ago
yes, I thought the same but it still seems as if the "usual suspects" crop up, e.g. lib/OpenQA/Worker/Settings.pm
even though we hardly ever touch it.
Updated by okurz about 5 years ago
- Related to action #57050: Turn off travis added
Updated by okurz about 5 years ago
@andriinikitin I assume this is closely related to your work on circle CI setup. Can you comment on what are your recent experiences regarding codecov reliability?
Updated by andriinikitin almost 5 years ago
okurz wrote:
@andriinikitin I assume this is closely related to your work on circle CI setup. Can you comment on what are your recent experiences regarding codecov reliability?
I believe few lines are unstable, but overall I don't see it as a big problem.
Updated by okurz over 4 years ago
- Status changed from New to Resolved
- Assignee set to andriinikitin
@andriinikitin thanks for your assessment. I personally trust coverage reports for openQA as well as os-autoinst lately so I guess we can call this done, probably mainly thanks to andriinikitin, hence assigning to him.
Updated by okurz over 4 years ago
- Status changed from Resolved to Workable
- Assignee deleted (
andriinikitin)
Actually I found out that recent pull requests seem to report often the same flaky, changing coverage diff, e.g. https://github.com/os-autoinst/openQA/pull/2782#issuecomment-591963100 :
- lib/OpenQA/Worker/Settings.pm 84.9% (-13.21%)
- lib/OpenQA/Utils.pm 96.84% (-0.23%)
-> https://codecov.io/gh/os-autoinst/openQA/pull/2782/changes
https://github.com/os-autoinst/openQA/pull/2783#issuecomment-592008196 :
- lib/OpenQA/Worker/Settings.pm 84.9% (-13.21%)
- lib/OpenQA/Scheduler/Model/Jobs.pm 91.49% (+2.83%)
-> https://codecov.io/gh/os-autoinst/openQA/pull/2783/changes
that is
https://github.com/os-autoinst/openQA/blob/master/lib/OpenQA/Worker/Settings.pm#L100
which should have been "uncoverable".
and https://github.com/os-autoinst/openQA/blob/master/lib/OpenQA/Scheduler/Model/Jobs.pm#L305 and following.
and https://github.com/os-autoinst/openQA/blob/master/lib/OpenQA/Utils.pm#L807
I guess for the latest at least it depends on the full stack test if that part is actually executed based on timing behaviour. Probably a more level or unit test to cover the lines explicitly can help.
Updated by okurz over 4 years ago
- Status changed from Workable to In Progress
- Assignee set to okurz
https://github.com/os-autoinst/openQA/pull/2813 to cover https://codecov.io/gh/os-autoinst/openQA/pull/2765/changes about lib/OpenQA/Worker/Job.pm:315 which wasn't even found by me earlier.
Updated by okurz over 4 years ago
https://github.com/os-autoinst/openQA/pull/2826 to cover uncovered parts of _pick_siblings_of_running in lib/OpenQA/Scheduler/Model/Jobs.pm
Updated by okurz over 4 years ago
- Status changed from In Progress to Feedback
PR merged, next stop: https://codecov.io/gh/os-autoinst/openQA/pull/2811/changes with lib/OpenQA/Worker/WebUIConnection.pm and lib/OpenQA/WebAPI/Controller/API/V1/Job.pm
But I could also wait for more results from next pull requests.
Updated by andriinikitin over 4 years ago
Not sure if it is obvious, but will mention anyway: my understanding is that there is some race condition on end of some of test(s), when Worker shutdown doesn't always happen in time, so codecov stats don't always reported before test ends.
Updated by okurz over 4 years ago
yes, good observation. What you mentioned is one option but I think the full stack tests that currently shows the coverage problems is doing the right thing on top level and actually pretty stable. On lower level we have different code paths that are there to provide robustness for different timing and network behaviour so I think we are better off covering these code branches with explicit low-level tests. As an alternative we could also run the full stack tests "often enough", e.g. with make variables STABILITY_TEST=1 RETRY=3
in the hope we at least once cover all lines but that still seems risky.
https://github.com/os-autoinst/openQA/pull/2815 is another PR, merged.
https://github.com/os-autoinst/openQA/pull/2835 was another PR for more coverage that is now merged. Let's see what the next PRs will tell.
Updated by livdywan over 4 years ago
- https://github.com/os-autoinst/openQA/pull/2920#issuecomment-611509627
OpenQA::Schema::Result::Jobs::done
is only invoked in some testruns with$args{newbuild}
I gather I'm currently improving coverage in that area.
Updated by okurz over 4 years ago
- Status changed from Feedback to Workable
- Assignee deleted (
okurz) - Target version set to Ready
The unreliability has decreased but there are still some areas that seem to report different results from time to time, e.g. see https://github.com/os-autoinst/openQA/pull/3142#issuecomment-656651494 mentioning lib/OpenQA/WebAPI/Plugin/Helpers.pm despite the PR having clearly no related changes. https://github.com/os-autoinst/openQA/pull/2897 is still the pending change planned to increase coverage.
Updated by mkittler about 4 years ago
Seems like https://github.com/os-autoinst/openQA/pull/3399 didn't make this easier. Now we also see changes in code coverage within the test code itself, e.g. https://codecov.io/gh/os-autoinst/openQA/pull/3423/changes.
Loops like
for (0 .. 5) {
$ret = $driver->find_element($element)->get_text();
return $ret if &$test_break($ret);
sleep .1;
}
within our test code are hard to make stable. The sleep
may or may not be executed depending on the timing.
Updated by okurz about 4 years ago
I do not think the inclusion of test code is a problem but actually helps. We should be able to specify individual lines with "uncoverable statements" and introduce reliable coverage for all other places. And unreliable test code can likely be linked to unreliable production code coverage.
Updated by okurz about 4 years ago
- Subject changed from Let's make codecov reports reliable to [epic] Let's make codecov reports reliable
- Status changed from Workable to Blocked
- Assignee set to okurz
- Difficulty set to hard
Forced cdywan to discuss this with me ;) We agreed that it's actually not "Workable" for him and I should rework this, e.g. as an epic. There is already a subtask. That is not really the only task but we are not progressing anyway so I will block this epic based on the subtask.
Updated by szarate about 4 years ago
- Tracker changed from action to coordination
- Status changed from Blocked to New
- Difficulty deleted (
hard)
Updated by szarate about 4 years ago
See for the reason of tracker change: http://mailman.suse.de/mailman/private/qa-sle/2020-October/002722.html
Updated by okurz about 4 years ago
- Status changed from New to Blocked
- Difficulty set to hard
Updated by okurz almost 4 years ago
I have looked into recent codecov reports on https://app.codecov.io/gh/os-autoinst/openQA/pulls?page=1&state=open&order=-pullid and often I found two files with unreliable coverage results: t/05-scheduler-full.t and t/lib/OpenQA/Test/Utils.pm , created two new subtasks.
There is also https://app.codecov.io/gh/os-autoinst/openQA/compare/3555/changes showing uncovered lines in t/lib/OpenQA/SeleniumTest.pm still which I addressed in #78043
https://codecov.io/gh/os-autoinst/openQA/pull/3467/changes shows in lib/OpenQA/WebSockets.pm
97 = "Unable to assign job to worker $worker_id: the worker has not established a ws connection";
98 return $result;
-> #80274
Updated by okurz about 3 years ago
- Status changed from Blocked to Resolved
I think we finally managed – at least for now – to have consistent, trustworthy codecov reports \o/