action #69346

flaky/unstable os-autoinst test "22-svirt.t"

Added by okurz 6 months ago. Updated about 2 months ago.

Status: Resolved
Priority: Low
Assignee:
Category: Concrete Bugs
Target version:
Start date: 2020-07-25
Due date:
% Done: 0%
Estimated time:
Difficulty: easy

Description

Observation

https://travis-ci.org/github/os-autoinst/os-autoinst/builds/711450994#L1527 on master:

    #   Failed test 'Ensure run_ssh_cmd(keep_open => 0) uses a new SSH connection'
    #   at ./22-svirt.t line 192.
    #          got: '6'
    #     expected: '5'
    # Looks like you failed 1 test of 36.
#   Failed test 'SSH usage in svirt'
#   at ./22-svirt.t line 212.
# Looks like you failed 1 test of 8.

Acceptance criteria

  • AC1: test is stable locally and on travis CI

Suggestions

  • Run the test locally and check whether the problem reproduces (takes only a second to execute): for i in {1..100}; do prove -I. t/22-svirt.t || break; done
  • Apply a fix that is either verifiable locally or validated just by travis CI results
  • Optional: if the problem cannot be reproduced locally, we could also exclude the test from the travis CI runs
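
The loop in the first suggestion stops at the first failure. For a rare flaky failure it can be more informative to count how often it fails over a fixed number of runs. The following is only a minimal sketch of that idea; the function name count_failures is made up for illustration and is not part of os-autoinst.

```shell
#!/bin/sh
# Run a command N times and report how many of the runs failed.
# Usage: count_failures N command [args...]
# (count_failures is a hypothetical helper, not part of os-autoinst.)
count_failures() {
    runs=$1
    shift
    failures=0
    i=1
    while [ "$i" -le "$runs" ]; do
        # Discard the command's output; only the exit status matters here.
        if ! "$@" >/dev/null 2>&1; then
            failures=$((failures + 1))
        fi
        i=$((i + 1))
    done
    echo "$failures/$runs runs failed"
}

# Example invocation from an os-autoinst checkout (not executed here):
#   count_failures 100 prove -I. t/22-svirt.t
```

Unlike the "|| break" variant, this keeps going after a failure, so it gives a rough failure rate rather than just the first occurrence.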

History

#1 Updated by okurz 6 months ago

  • Description updated (diff)
  • Difficulty set to easy

#2 Updated by okurz 6 months ago

  • Description updated (diff)

#3 Updated by mkittler 6 months ago

  • Assignee set to mkittler

I could not reproduce the issue locally after 1000 runs. If it is a race condition it seems to be quite sticky to a certain outcome. Maybe a timeout is just set too low.

#4 Updated by mkittler 6 months ago

  • Assignee deleted (mkittler)

When reading the code I only noticed that the parameters for actual/expected are swapped: https://github.com/os-autoinst/os-autoinst/pull/1494

So there's actually one connection too few. This can be provoked by passing keep_open => 1 instead of => 0. I don't see any race conditions or timeouts within the code, so I'm not sure how to fix this, especially since it cannot be reproduced locally.
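
To illustrate why swapped parameters turn the reported "got: 6, expected: 5" into "one connection too few", here is a small sketch. The is function below is a hypothetical shell stand-in that only mimics the argument order of Test::More's is($got, $expected, $name); it is not the code from 22-svirt.t, and the variable names are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper mimicking the argument order of Test::More's is():
# first the value actually obtained, then the expected value, then a name.
is() {
    got=$1
    expected=$2
    name=$3
    if [ "$got" = "$expected" ]; then
        echo "ok - $name"
    else
        echo "not ok - $name"
        echo "#          got: '$got'"
        echo "#     expected: '$expected'"
        return 1
    fi
}

actual_connections=5   # illustrative: what the code under test produced
wanted_connections=6   # illustrative: what the test intended to require

# Swapped arguments: the report claims got 6, expected 5.
is "$wanted_connections" "$actual_connections" "swapped arguments" || true

# Correct order: the report shows got 5, expected 6, i.e. one connection too few.
is "$actual_connections" "$wanted_connections" "correct order" || true
```

With the arguments swapped the test still fails in the same situations; only the diagnostic output is misleading about which value came from the code under test.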

#5 Updated by okurz 6 months ago

  • Status changed from Workable to Resolved
  • Assignee set to okurz

Also, while this happened multiple times over the last week or so, I have not seen it again since. Glad you found at least something to fix :) I guess we can call it "Resolved" then, because work has been done even though no actual "fix" was applied. I will know where to find the ticket if I see that again. Thanks.

#6 Updated by okurz 2 months ago

  • Status changed from Resolved to Workable
  • Assignee changed from okurz to cfconrad
  • Priority changed from High to Low
  • Target version changed from Ready to future

#7 Updated by cfconrad 2 months ago

  • Status changed from Workable to Feedback
  • Assignee changed from cfconrad to okurz

Can we give this change a try: https://github.com/os-autoinst/os-autoinst/pull/1568

okurz, I guess you have better ways to monitor whether the test still fails sometimes, so I am reassigning this to you. Feel free to throw it back if the change doesn't solve it.

#8 Updated by okurz 2 months ago

  • Target version changed from future to Ready

Sure, can do. Your PR is still open and Martchus has already commented on it with just tiny remarks. I assume you will still follow up on these. Then I am happy to monitor how it behaves.

#9 Updated by okurz about 2 months ago

  • Status changed from Feedback to Resolved

So the tests in the PR as well as in master were fine. As the original problem did not really happen often anyway, I will just set this to "Resolved". We can hopefully find our way back to this ticket in case we see the test module failing again :)
