action #156067

closed

[alert] test fails in setup_multimachine

Added by jbaier_cz 10 months ago. Updated 10 months ago.

Status:
Resolved
Priority:
Urgent
Assignee:
Category:
Bugs in existing tests
Start date:
2024-02-26
Due date:
2024-03-12
% Done:
0%

Estimated time:
Difficulty:

Description

Observation

This job is scheduled via openqa-schedule-mm-ping-test, see #156052

openQA test in scenario sle-15-SP5-Server-DVD-Updates-x86_64-ping_client@64bit fails in setup_multimachine

Test died: ping with packet size 100 failed, problems with MTU size are expected. If it is multi-machine job, it can be GRE tunnel setup issue. at sle/lib/utils.pm line 1943.
    utils::script_retry("ping -M do -s 100 -c 1 server", "retry", 3, "delay", 5, "fail_message", "ping with packet size 100 failed, problems with MTU size are "...) called at sle/lib/utils.pm line 2910
    utils::ping_size_check("server") called at sle/tests/network/setup_multimachine.pm line 57
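The trace above shows the check being retried 3 times with a delay of 5 before the test dies with the fail message. Below is a minimal Python sketch of that retry pattern, not the actual Perl implementation in utils.pm. The names run_with_retry, default_runner and ping_size_cmd are hypothetical, and the injectable runner and sleep parameters are assumptions added only so the logic can be exercised without a network.

```python
import subprocess
import time
from typing import Callable, List, Sequence


def default_runner(cmd: Sequence[str]) -> int:
    """Run cmd and return its exit code (0 means success)."""
    return subprocess.run(cmd, capture_output=True).returncode


def run_with_retry(
    cmd: Sequence[str],
    retry: int = 3,
    delay: float = 5.0,
    fail_message: str = "command failed",
    runner: Callable[[Sequence[str]], int] = default_runner,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Try cmd up to `retry` times, pausing `delay` seconds between attempts.

    Returns the 1-based attempt number that succeeded and raises
    RuntimeError with `fail_message` once all attempts are used up.
    """
    for attempt in range(1, retry + 1):
        if runner(cmd) == 0:
            return attempt
        if attempt < retry:
            sleep(delay)  # no pause after the final failed attempt
    raise RuntimeError(fail_message)


def ping_size_cmd(host: str, size: int = 100) -> List[str]:
    """Build the ping call seen in the trace (hypothetical helper)."""
    # -M do: prohibit fragmentation, -s: payload size in bytes, -c 1: one probe
    return ["ping", "-M", "do", "-s", str(size), "-c", "1", host]
```

Calling run_with_retry(ping_size_cmd("server"), fail_message="ping with packet size 100 failed") would mirror the parameters visible in the trace. Whether the real script_retry pauses in exactly the same places is not confirmed by this ticket.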

Reproducible

Fails since (at least) Build 2024-02-26T10:08+00:00

Expected result

Last good: 2024-02-26T09:11+00:00 (or more recent)

Further details

Always latest result in this scenario: latest


Related issues 2 (0 open, 2 closed)

Related to openQA Project (public) - action #156052: [alert] Scripts CI pipeline failing after logging multiple Job state of job ID 13603796: running, waiting size:S (Resolved, mkittler, 2024-02-26, 2024-03-13)

Related to openQA Project (public) - action #155170: [openqa-in-openqa] [sporadic] test fails in test_running: parallel_failed size:M (Resolved, ybonatakis, 2024-02-08, 2024-02-29)

Actions #1

Updated by jbaier_cz 10 months ago

  • Related to action #156052: [alert] Scripts CI pipeline failing after logging multiple Job state of job ID 13603796: running, waiting size:S added
Actions #2

Updated by mkittler 10 months ago

  • Status changed from New to In Progress
  • Assignee set to mkittler
Actions #3

Updated by mkittler 10 months ago

  • Status changed from In Progress to Feedback

The network is completely unreachable within the SUT: ping: connect: Network is unreachable

The most likely culprit is https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/18713 which was merged 6 hours ago. The first failure is also from 6 hours ago. The last good still shows the root console and the first bad already shows the use of the serial console.

I'll reopen #155170 as this should supposedly be handled as part of the original ticket introducing the change.
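The comment above separates "Network is unreachable" from the MTU problem suggested by the test's fail message. A rough sketch of how ping output could be sorted into those two cases follows. The function name and the category labels are hypothetical, and the "message too long" and "frag needed" patterns are assumptions based on typical iputils ping output, not taken from this job's logs.

```python
def classify_ping_failure(output: str) -> str:
    """Map ping error output to a coarse failure category.

    The "mtu" patterns are assumed from common iputils messages;
    only the "Network is unreachable" text appears in this ticket.
    """
    text = output.lower()
    if "network is unreachable" in text:
        # No usable route or interface at all: not an MTU/GRE sizing problem
        return "no-route"
    if "message too long" in text or "frag needed" in text:
        # The packet exceeded the path MTU while fragmentation was prohibited
        return "mtu"
    return "unknown"
```

With the error quoted in the comment, classify_ping_failure("ping: connect: Network is unreachable") falls into the "no-route" category, which is consistent with the conclusion that the regression was not an MTU issue.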

Actions #4

Updated by mkittler 10 months ago

  • Related to action #155170: [openqa-in-openqa] [sporadic] test fails in test_running: parallel_failed size:M added
Actions #6

Updated by mkittler 10 months ago

  • Status changed from Feedback to In Progress
Actions #7

Updated by openqa_review 10 months ago

  • Due date set to 2024-03-12

Setting due date based on mean cycle time of SUSE QE Tools

Actions #8

Updated by mkittler 10 months ago

  • Status changed from In Progress to Resolved

The scenario looks good again and Antonios mentioned in the chat that he restarted relevant jobs.

Actions #9

Updated by livdywan 10 months ago

  • Status changed from Resolved to Workable
Actions #10

Updated by mkittler 10 months ago

  • Status changed from Workable to Resolved

This issue was about a concrete regression caused by https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/18713.

The new issue is something different. It seems to be sporadic, considering all newer jobs are passing. Also, judging by the logs, it looks like a sporadic networking issue (which is a known problem for our MM jobs). The fail ratio in that scenario doesn't seem high enough to investigate further, but if you nevertheless want to investigate, I suggest we create a new ticket.
