Project

General

Profile

coordination #55364

coordination #80142: [saga][epic] Scale out: Redundant/load-balancing deployments of openQA, easy containers, containers on kubernetes

[epic] Let's make codecov reports reliable

Added by okurz almost 2 years ago. Updated 3 days ago.

Status:
Blocked
Priority:
Normal
Assignee:
Category:
Organisational
Target version:
Start date:
2020-09-24
Due date:
2021-08-10
% Done:

86%

Estimated time:
(Total: 0.00 h)
Difficulty:
hard

Description

Observation

codecov reports often report about coverage changes which are obviously not related to the actual changes of a PR, e.g. when documentation is changed or as in https://github.com/os-autoinst/openQA/pull/2253#issuecomment-520042072 . It seems some of our tests – maybe the full stack test – introduced a flakyness in coverage.

Suggestions

We should try to make the reports more reliable and hence less annoying with big red color when it is not helpful.
Let's try to collect our observations here:

Other ideas for improving

  • Mark individual sections as required to have 100% code coverage (how to do that?)
  • Aim for 100% statement coverage at least
  • Include coverage for javascript
  • Check coverage for perl template code

Subtasks

action #71857: flaky/unstable/sporadic test coverage from t/34-developer_mode-unit.tResolvedcdywan

action #78043: unstable/flaky/inconsistent statement coverage in t/lib/OpenQA/SeleniumTestResolvedokurz

action #80268: Fix flaky coverage - t/05-scheduler-full.tResolvedokurz

action #80274: Fix flaky coverage - t/lib/OpenQA/Test/Utils.pm size:MFeedbackkraih

action #80298: Fix flaky coverage - lib/OpenQA/WebSockets.pmResolvedokurz

action #88594: codecov reports for individual files yield "forbidden" and files can not be shownResolvedokurz

action #89899: Fix flaky coverage - t/ui/27-plugin_obs_rsync_status_details.tResolvedkraih


Related issues

Related to openQA Project - action #57050: Turn off travisResolved2019-09-30

Copied to openQA Project - action #55643: Prevent duplicate codecov reportsResolved2019-08-12

History

#1 Updated by okurz almost 2 years ago

  • Description updated (diff)

#2 Updated by okurz almost 2 years ago

  • Description updated (diff)

#3 Updated by okurz almost 2 years ago

It also seems like we are getting duplicate codecov reports which I understand can happen if both an old bot are activated along with the github app? As the bot runs under the account "coolo" we can look into this with him.

#4 Updated by okurz almost 2 years ago

  • Copied to action #55643: Prevent duplicate codecov reports added

#5 Updated by andriinikitin almost 2 years ago

I have strong impression that this happens when PR is not rebased to latest master.
I.e. codecov compares coverage stats in PR with current coverage on master, which will look incorrect if base of PR is behind several commits

#6 Updated by okurz almost 2 years ago

yes, I thought the same but it still seems as if the "usual suspects" crop up, e.g. lib/OpenQA/Worker/Settings.pm even though we hardly ever touch it.

#7 Updated by okurz over 1 year ago

#8 Updated by okurz over 1 year ago

andriinikitin I assume this is closely related to your work on circle CI setup. Can you comment on what are your recent experiences regarding codecov reliability?

#9 Updated by andriinikitin over 1 year ago

okurz wrote:

andriinikitin I assume this is closely related to your work on circle CI setup. Can you comment on what are your recent experiences regarding codecov reliability?

I believe few lines are unstable, but overall I don't see it as a big problem.

#10 Updated by okurz over 1 year ago

  • Status changed from New to Resolved
  • Assignee set to andriinikitin

andriinikitin thanks for your assessment. I personally trust coverage reports for openQA as well as os-autoinst lately so I guess we can call this done, probably mainly thanks to andriinikitin, hence assigning to him.

#11 Updated by okurz over 1 year ago

  • Status changed from Resolved to Workable
  • Assignee deleted (andriinikitin)

Actually I found out that recent pull requests seem to report often the same flaky, changing coverage diff, e.g. https://github.com/os-autoinst/openQA/pull/2782#issuecomment-591963100 :

  • lib/OpenQA/Worker/Settings.pm 84.9% (-13.21%)
  • lib/OpenQA/Utils.pm 96.84% (-0.23%)

-> https://codecov.io/gh/os-autoinst/openQA/pull/2782/changes

https://github.com/os-autoinst/openQA/pull/2783#issuecomment-592008196 :

  • lib/OpenQA/Worker/Settings.pm 84.9% (-13.21%)
  • lib/OpenQA/Scheduler/Model/Jobs.pm 91.49% (+2.83%)

-> https://codecov.io/gh/os-autoinst/openQA/pull/2783/changes

that is
https://github.com/os-autoinst/openQA/blob/master/lib/OpenQA/Worker/Settings.pm#L100
which should have been "uncoverable".

and https://github.com/os-autoinst/openQA/blob/master/lib/OpenQA/Scheduler/Model/Jobs.pm#L305 and following.

and https://github.com/os-autoinst/openQA/blob/master/lib/OpenQA/Utils.pm#L807

I guess for the latest at least it depends on the full stack test if that part is actually executed based on timing behaviour. Probably a more level or unit test to cover the lines explicitly can help.

#12 Updated by okurz over 1 year ago

  • Status changed from Workable to In Progress
  • Assignee set to okurz

https://github.com/os-autoinst/openQA/pull/2813 to cover https://codecov.io/gh/os-autoinst/openQA/pull/2765/changes about lib/OpenQA/Worker/Job.pm:315 which wasn't even found by me earlier.

#13 Updated by okurz over 1 year ago

https://github.com/os-autoinst/openQA/pull/2826 to cover uncovered parts of _pick_siblings_of_running in lib/OpenQA/Scheduler/Model/Jobs.pm

#14 Updated by okurz over 1 year ago

  • Status changed from In Progress to Feedback

PR merged, next stop: https://codecov.io/gh/os-autoinst/openQA/pull/2811/changes with lib/OpenQA/Worker/WebUIConnection.pm and lib/OpenQA/WebAPI/Controller/API/V1/Job.pm

But I could also wait for more results from next pull requests.

#15 Updated by andriinikitin over 1 year ago

Not sure if it is obvious, but will mention anyway: my understanding is that there is some race condition on end of some of test(s), when Worker shutdown doesn't always happen in time, so codecov stats don't always reported before test ends.

#16 Updated by okurz over 1 year ago

yes, good observation. What you mentioned is one option but I think the full stack tests that currently shows the coverage problems is doing the right thing on top level and actually pretty stable. On lower level we have different code paths that are there to provide robustness for different timing and network behaviour so I think we are better off covering these code branches with explicit low-level tests. As an alternative we could also run the full stack tests "often enough", e.g. with make variables STABILITY_TEST=1 RETRY=3 in the hope we at least once cover all lines but that still seems risky.

https://github.com/os-autoinst/openQA/pull/2815 is another PR, merged.

https://github.com/os-autoinst/openQA/pull/2835 was another PR for more coverage that is now merged. Let's see what the next PRs will tell.

#17 Updated by mkittler over 1 year ago

  • Description updated (diff)

#18 Updated by cdywan about 1 year ago

I gather I'm currently improving coverage in that area.

#19 Updated by okurz about 1 year ago

  • Status changed from Feedback to Workable
  • Assignee deleted (okurz)
  • Target version set to Ready

The unreliability has decreased but there are still some areas that seem to report different results from time to time, e.g. see https://github.com/os-autoinst/openQA/pull/3142#issuecomment-656651494 mentioning lib/OpenQA/WebAPI/Plugin/Helpers.pm despite the PR having clearly no related changes. https://github.com/os-autoinst/openQA/pull/2897 is still the pending change planned to increase coverage.

#20 Updated by okurz 11 months ago

  • Description updated (diff)

#21 Updated by okurz 10 months ago

  • Description updated (diff)

#22 Updated by mkittler 10 months ago

Seems like https://github.com/os-autoinst/openQA/pull/3399 didn't make this easier. Now we also see changes in code coverage within the test code itself, e.g. https://codecov.io/gh/os-autoinst/openQA/pull/3423/changes.

Loops like

    for (0 .. 5) {
        $ret = $driver->find_element($element)->get_text();
        return $ret if &$test_break($ret);
        sleep .1;
    }

within our test code are hard to make stable. The sleep may or may not be executed depending on the timing.

#23 Updated by okurz 10 months ago

I do not think the inclusion of test code is a problem but actually helps. We should be able to specify individual lines with "uncoverable statements" and introduce reliable coverage for all other places. And unreliable test code can likely be linked to unreliable production code coverage.

#24 Updated by okurz 10 months ago

  • Subject changed from Let's make codecov reports reliable to [epic] Let's make codecov reports reliable
  • Status changed from Workable to Blocked
  • Assignee set to okurz
  • Difficulty set to hard

Forced cdywan to discuss this with me ;) We agreed that it's actually not "Workable" for him and I should rework this, e.g. as an epic. There is already a subtask. That is not really the only task but we are not progressing anyway so I will block this epic based on the subtask.

#25 Updated by szarate 10 months ago

  • Tracker changed from action to coordination
  • Status changed from Blocked to New
  • Difficulty deleted (hard)

#27 Updated by okurz 10 months ago

  • Status changed from New to Blocked
  • Difficulty set to hard

#28 Updated by okurz 8 months ago

I have looked into recent codecov reports on https://app.codecov.io/gh/os-autoinst/openQA/pulls?page=1&state=open&order=-pullid and often I found two files with unreliable coverage results: t/05-scheduler-full.t and t/lib/OpenQA/Test/Utils.pm , created two new subtasks.

There is also https://app.codecov.io/gh/os-autoinst/openQA/compare/3555/changes showing uncovered lines in t/lib/OpenQA/SeleniumTest.pm still which I addressed in #78043

https://codecov.io/gh/os-autoinst/openQA/pull/3467/changes shows in lib/OpenQA/WebSockets.pm

97        = "Unable to assign job to worker $worker_id: the worker has not established a ws connection";
98      return $result;

-> #80274

#29 Updated by okurz 8 months ago

  • Estimated time set to 39719.00 h

#30 Updated by okurz 8 months ago

  • Estimated time deleted (39719.00 h)

#31 Updated by okurz 5 months ago

  • Description updated (diff)

#32 Updated by okurz 3 months ago

  • Parent task set to #80142

Also available in: Atom PDF