Project

General

Profile

action #80274

coordination #55364: [epic] Let's make codecov reports reliable

Fix flaky coverage - t/lib/OpenQA/Test/Utils.pm

Added by okurz about 2 months ago. Updated 11 days ago.

Status:
In Progress
Priority:
Normal
Assignee:
Category:
Feature requests
Target version:
Start date:
2020-11-24
Due date:
2021-01-23
% Done:

0%

Estimated time:
Difficulty:

Description

Motivation

See #55364 : codecov reports often report about coverage changes which are obviously not related to the actual changes of a PR, e.g. when documentation is changed. We can already trust our coverage analysis more but should have only coverage changes reported for actual changes we introduced in a pull request.

Acceptance criteria

  • AC1: t/lib/OpenQA/Test/Utils.pm does not appear anymore as changing code coverage in unrelated changes

Suggestions

  • Try to reproduce locally with rm -rf cover_db/ && make coverage KEEP_DB=1 TESTS=t/05-scheduler-full.t SCHEDULER_FULLSTACK=1
  • check coverage details in generated html report, e.g. call firefox cover_db/coverage.html
  • Fix uncovered lines with "uncoverable" statements, see previous commits adding these comments or look into https://metacpan.org/pod/Devel::Cover#UNCOVERABLE-CRITERIA or other means
  • retry multiple times to check for flakyness

Related issues

Copied from openQA Project - action #80268: Fix flaky coverage - t/05-scheduler-full.tResolved2020-11-24

Copied to openQA Project - action #80298: Fix flaky coverage - lib/OpenQA/WebSockets.pmResolved2020-11-24

History

#1 Updated by okurz about 2 months ago

  • Copied from action #80268: Fix flaky coverage - t/05-scheduler-full.t added

#2 Updated by okurz about 2 months ago

  • Status changed from In Progress to Workable
  • Assignee deleted (okurz)

reports like https://app.codecov.io/gh/os-autoinst/openQA/compare/3567/changes look like we are missing coverage reports from some processes. Check if we need to add something like Devel::Cover::report() if Devel::Cover->can('report'); in some more cases.

#3 Updated by okurz about 2 months ago

  • Copied to action #80298: Fix flaky coverage - lib/OpenQA/WebSockets.pm added

#4 Updated by okurz about 2 months ago

https://codecov.io/gh/os-autoinst/openQA/pull/3606/changes shows more unreliable coverage changes:

In t/lib/OpenQA/Test/Utils.pm the method sub setup_worker shows completely missing coverage now and sub unstable_worker partial coverage:

sub setup_worker {
    my ($worker, $host) = @_;
    $worker->settings->webui_hosts([]);
    $worker->settings->webui_host_specific_settings({});
    push(@{$worker->settings->webui_hosts}, $host);
    $worker->settings->webui_host_specific_settings->{$host} = {};
    $worker->log_setup_info;
…
sub unstable_worker {
…
        _setup_sub_process 'openqa-worker-unstable';
        my $worker = OpenQA::Worker->new(
        setup_worker($worker, $host);
        $worker->init();
        if ($ticks < 0) {
            Mojo::IOLoop->singleton->one_tick for (0 .. $ticks);
        Devel::Cover::report() if Devel::Cover->can('report');

so maybe the method was called but we could not successfully finish the last line which could explain the missing report.

#5 Updated by okurz about 1 month ago

  • Description updated (diff)

#6 Updated by Xiaojing_liu 20 days ago

I could reproduce this in my local. But I am not sure how to fix it. Could we simply use "uncoverable" to the flaky part, such as sub setup_worker and unstable_worker? or we should find out why sometimes these codes are not covered? I did not find out why this happened.

#7 Updated by cdywan 19 days ago

Commands used:

make coverage SCHEDULER_FULLSTACK=1 TESTS=t/05-scheduler-full.t | grep '^t/05-scheduler-full'
DIE_ON_FAIL=1 make test-with-database SCHEDULER_FULLSTACK=1 TESTS=t/05-scheduler-full.t

So I think the problem here is that the sigchild handler never completes. A simple note I added wasn't even showing. There's existing code checking if the worker's available via the Workers table but that's not useful for this purpose. The function at best makes it to the initial IOLoop invocation, before it gets to the explicit Devel::Cover::report call.
Having written down all of that I realize this is with $ticks=-1 which means the IOLoop blocks forever...🤦 Always forcing n ticks doesn't seem to help much, but moving up the report call at least gets coverage up to 55.4%.

https://github.com/os-autoinst/openQA/pull/3655 has some of my attempts at improving this, and although I'm not making a lot of headway here maybe someone else will have more ideas

#8 Updated by cdywan 19 days ago

  • Description updated (diff)

#9 Updated by cdywan 16 days ago

  • Status changed from Workable to In Progress
  • Assignee set to cdywan

I should probably take this ticket

#10 Updated by openqa_review 11 days ago

  • Due date set to 2021-01-23

Setting due date based on mean cycle time of SUSE QE Tools

Also available in: Atom PDF