Project

General

Profile

Actions

coordination #55364

closed

coordination #80142: [saga][epic] Scale out: Redundant/load-balancing deployments of openQA, easy containers, containers on kubernetes

[epic] Let's make codecov reports reliable

Added by okurz about 5 years ago. Updated about 3 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
Category:
Organisational
Target version:
Start date:
2020-09-24
Due date:
% Done:

100%

Estimated time:
(Total: 0.00 h)

Description

Observation

codecov reports often report about coverage changes which are obviously not related to the actual changes of a PR, e.g. when documentation is changed or as in https://github.com/os-autoinst/openQA/pull/2253#issuecomment-520042072 . It seems some of our tests – maybe the full stack test – introduced a flakyness in coverage.

Suggestions

We should try to make the reports more reliable and hence less annoying with big red color when it is not helpful.
Let's try to collect our observations here:

Other ideas for improving

  • Mark individual sections as required to have 100% code coverage (how to do that?)
  • Aim for 100% statement coverage at least
  • Include coverage for javascript
  • Check coverage for perl template code

Subtasks 7 (0 open7 closed)

action #71857: flaky/unstable/sporadic test coverage from t/34-developer_mode-unit.tResolvedlivdywan2020-09-24

Actions
action #78043: unstable/flaky/inconsistent statement coverage in t/lib/OpenQA/SeleniumTestResolvedokurz2020-11-16

Actions
action #80268: Fix flaky coverage - t/05-scheduler-full.tResolvedokurz2020-11-24

Actions
action #80274: Fix flaky coverage - t/lib/OpenQA/Test/Utils.pm size:MResolvedkraih2020-11-24

Actions
action #80298: Fix flaky coverage - lib/OpenQA/WebSockets.pmResolvedokurz2020-11-24

Actions
action #88594: codecov reports for individual files yield "forbidden" and files can not be shownResolvedokurz2021-02-15

Actions
action #89899: Fix flaky coverage - t/ui/27-plugin_obs_rsync_status_details.tResolvedkraih

Actions

Related issues 2 (0 open2 closed)

Related to openQA Project - action #57050: Turn off travisResolvedokurz2019-09-30

Actions
Copied to openQA Project - action #55643: Prevent duplicate codecov reportsResolvedokurz2019-08-12

Actions
Actions #1

Updated by okurz about 5 years ago

  • Description updated (diff)
Actions #2

Updated by okurz about 5 years ago

  • Description updated (diff)
Actions #3

Updated by okurz about 5 years ago

It also seems like we are getting duplicate codecov reports which I understand can happen if both an old bot are activated along with the github app? As the bot runs under the account "coolo" we can look into this with him.

Actions #4

Updated by okurz about 5 years ago

  • Copied to action #55643: Prevent duplicate codecov reports added
Actions #5

Updated by andriinikitin about 5 years ago

I have strong impression that this happens when PR is not rebased to latest master.
I.e. codecov compares coverage stats in PR with current coverage on master, which will look incorrect if base of PR is behind several commits

Actions #6

Updated by okurz about 5 years ago

yes, I thought the same but it still seems as if the "usual suspects" crop up, e.g. lib/OpenQA/Worker/Settings.pm even though we hardly ever touch it.

Actions #7

Updated by okurz almost 5 years ago

Actions #8

Updated by okurz almost 5 years ago

@andriinikitin I assume this is closely related to your work on circle CI setup. Can you comment on what are your recent experiences regarding codecov reliability?

Actions #9

Updated by andriinikitin almost 5 years ago

okurz wrote:

@andriinikitin I assume this is closely related to your work on circle CI setup. Can you comment on what are your recent experiences regarding codecov reliability?

I believe few lines are unstable, but overall I don't see it as a big problem.

Actions #10

Updated by okurz over 4 years ago

  • Status changed from New to Resolved
  • Assignee set to andriinikitin

@andriinikitin thanks for your assessment. I personally trust coverage reports for openQA as well as os-autoinst lately so I guess we can call this done, probably mainly thanks to andriinikitin, hence assigning to him.

Actions #11

Updated by okurz over 4 years ago

  • Status changed from Resolved to Workable
  • Assignee deleted (andriinikitin)

Actually I found out that recent pull requests seem to report often the same flaky, changing coverage diff, e.g. https://github.com/os-autoinst/openQA/pull/2782#issuecomment-591963100 :

  • lib/OpenQA/Worker/Settings.pm 84.9% (-13.21%)
  • lib/OpenQA/Utils.pm 96.84% (-0.23%)

-> https://codecov.io/gh/os-autoinst/openQA/pull/2782/changes

https://github.com/os-autoinst/openQA/pull/2783#issuecomment-592008196 :

  • lib/OpenQA/Worker/Settings.pm 84.9% (-13.21%)
  • lib/OpenQA/Scheduler/Model/Jobs.pm 91.49% (+2.83%)

-> https://codecov.io/gh/os-autoinst/openQA/pull/2783/changes

that is
https://github.com/os-autoinst/openQA/blob/master/lib/OpenQA/Worker/Settings.pm#L100
which should have been "uncoverable".

and https://github.com/os-autoinst/openQA/blob/master/lib/OpenQA/Scheduler/Model/Jobs.pm#L305 and following.

and https://github.com/os-autoinst/openQA/blob/master/lib/OpenQA/Utils.pm#L807

I guess for the latest at least it depends on the full stack test if that part is actually executed based on timing behaviour. Probably a more level or unit test to cover the lines explicitly can help.

Actions #12

Updated by okurz over 4 years ago

  • Status changed from Workable to In Progress
  • Assignee set to okurz

https://github.com/os-autoinst/openQA/pull/2813 to cover https://codecov.io/gh/os-autoinst/openQA/pull/2765/changes about lib/OpenQA/Worker/Job.pm:315 which wasn't even found by me earlier.

Actions #13

Updated by okurz over 4 years ago

https://github.com/os-autoinst/openQA/pull/2826 to cover uncovered parts of _pick_siblings_of_running in lib/OpenQA/Scheduler/Model/Jobs.pm

Actions #14

Updated by okurz over 4 years ago

  • Status changed from In Progress to Feedback

PR merged, next stop: https://codecov.io/gh/os-autoinst/openQA/pull/2811/changes with lib/OpenQA/Worker/WebUIConnection.pm and lib/OpenQA/WebAPI/Controller/API/V1/Job.pm

But I could also wait for more results from next pull requests.

Actions #15

Updated by andriinikitin over 4 years ago

Not sure if it is obvious, but will mention anyway: my understanding is that there is some race condition on end of some of test(s), when Worker shutdown doesn't always happen in time, so codecov stats don't always reported before test ends.

Actions #16

Updated by okurz over 4 years ago

yes, good observation. What you mentioned is one option but I think the full stack tests that currently shows the coverage problems is doing the right thing on top level and actually pretty stable. On lower level we have different code paths that are there to provide robustness for different timing and network behaviour so I think we are better off covering these code branches with explicit low-level tests. As an alternative we could also run the full stack tests "often enough", e.g. with make variables STABILITY_TEST=1 RETRY=3 in the hope we at least once cover all lines but that still seems risky.

https://github.com/os-autoinst/openQA/pull/2815 is another PR, merged.

https://github.com/os-autoinst/openQA/pull/2835 was another PR for more coverage that is now merged. Let's see what the next PRs will tell.

Actions #17

Updated by mkittler over 4 years ago

  • Description updated (diff)
Actions #18

Updated by livdywan over 4 years ago

I gather I'm currently improving coverage in that area.

Actions #19

Updated by okurz about 4 years ago

  • Status changed from Feedback to Workable
  • Assignee deleted (okurz)
  • Target version set to Ready

The unreliability has decreased but there are still some areas that seem to report different results from time to time, e.g. see https://github.com/os-autoinst/openQA/pull/3142#issuecomment-656651494 mentioning lib/OpenQA/WebAPI/Plugin/Helpers.pm despite the PR having clearly no related changes. https://github.com/os-autoinst/openQA/pull/2897 is still the pending change planned to increase coverage.

Actions #20

Updated by okurz almost 4 years ago

  • Description updated (diff)
Actions #21

Updated by okurz almost 4 years ago

  • Description updated (diff)
Actions #22

Updated by mkittler almost 4 years ago

Seems like https://github.com/os-autoinst/openQA/pull/3399 didn't make this easier. Now we also see changes in code coverage within the test code itself, e.g. https://codecov.io/gh/os-autoinst/openQA/pull/3423/changes.

Loops like

    for (0 .. 5) {
        $ret = $driver->find_element($element)->get_text();
        return $ret if &$test_break($ret);
        sleep .1;
    }

within our test code are hard to make stable. The sleep may or may not be executed depending on the timing.

Actions #23

Updated by okurz almost 4 years ago

I do not think the inclusion of test code is a problem but actually helps. We should be able to specify individual lines with "uncoverable statements" and introduce reliable coverage for all other places. And unreliable test code can likely be linked to unreliable production code coverage.

Actions #24

Updated by okurz almost 4 years ago

  • Subject changed from Let's make codecov reports reliable to [epic] Let's make codecov reports reliable
  • Status changed from Workable to Blocked
  • Assignee set to okurz
  • Difficulty set to hard

Forced cdywan to discuss this with me ;) We agreed that it's actually not "Workable" for him and I should rework this, e.g. as an epic. There is already a subtask. That is not really the only task but we are not progressing anyway so I will block this epic based on the subtask.

Actions #25

Updated by szarate almost 4 years ago

  • Tracker changed from action to coordination
  • Status changed from Blocked to New
  • Difficulty deleted (hard)
Actions #27

Updated by okurz almost 4 years ago

  • Status changed from New to Blocked
  • Difficulty set to hard
Actions #28

Updated by okurz almost 4 years ago

I have looked into recent codecov reports on https://app.codecov.io/gh/os-autoinst/openQA/pulls?page=1&state=open&order=-pullid and often I found two files with unreliable coverage results: t/05-scheduler-full.t and t/lib/OpenQA/Test/Utils.pm , created two new subtasks.

There is also https://app.codecov.io/gh/os-autoinst/openQA/compare/3555/changes showing uncovered lines in t/lib/OpenQA/SeleniumTest.pm still which I addressed in #78043

https://codecov.io/gh/os-autoinst/openQA/pull/3467/changes shows in lib/OpenQA/WebSockets.pm

97        = "Unable to assign job to worker $worker_id: the worker has not established a ws connection";
98      return $result;

-> #80274

Actions #29

Updated by okurz almost 4 years ago

  • Estimated time set to 39719.00 h
Actions #30

Updated by okurz almost 4 years ago

  • Estimated time deleted (39719.00 h)
Actions #31

Updated by okurz over 3 years ago

  • Description updated (diff)
Actions #32

Updated by okurz over 3 years ago

  • Parent task set to #80142
Actions #33

Updated by okurz about 3 years ago

  • Status changed from Blocked to Resolved

I think we finally managed – at least for now – to have consistent, trustworthy codecov reports \o/

Actions

Also available in: Atom PDF