action #163745
[tools] tests on worker31 time out on yast2 firewall services add zone=EXT service=service:target
Status: Closed
Description
Observation
openQA test in scenario sle-15-SP4-Server-DVD-HA-Incidents-x86_64-qam_ha_priorityfencing_supportserver@64bit fails in setup
See also https://openqa.suse.de/tests/14884549#step/setup/43
Test suite description
The base test suite is used for job templates defined in YAML documents. It has no settings of its own.
Reproducible
Fails since (at least) Build :32329:nftables (current job)
Expected result
Last good: :34692:apache2 (or more recent)
Further details
Always latest result in this scenario: latest
Mitigations
- Following "Take machines out of salt-controlled production" (verification sketch below):
sudo salt-key -y -d worker31.oqa.prg2.suse.org
sudo systemctl disable --now telegraf $(systemctl list-units | grep openqa-worker-auto-restart | cut -d . -f 1 | xargs)
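A quick way to verify the mitigation took effect (a sketch, not part of the original mitigation; assumes access to the salt master and to worker31):
# on the salt master: the key should no longer be listed as accepted
sudo salt-key -L | grep worker31 || echo "worker31 key removed"
# on worker31: no openqa-worker units should remain active
systemctl list-units --state=active | grep openqa-worker || echo "no active openqa-worker units"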
Updated by slo-gin 4 months ago
This ticket was set to High priority but was not updated within the SLO period. Please consider picking it up or setting it to the next lower priority.
Updated by livdywan 4 months ago
- Related to action #162293: SMART errors on bootup of worker31, worker32 and worker34 size:M added
Updated by okurz 4 months ago
- Tags set to infra, worker31
- Project changed from openQA Tests (public) to openQA Infrastructure (public)
- Subject changed from tests on worker31 time out on yast2 firewall services add zone=EXT service=service:target to [tools] tests on worker31 time out on yast2 firewall services add zone=EXT service=service:target
- Category changed from Bugs in existing tests to Regressions/Crashes
- Target version set to Ready
Updated by okurz 4 months ago
- Status changed from New to Resolved
- Assignee set to okurz
https://openqa.suse.de/tests/latest?arch=x86_64&distri=sle&flavor=Server-DVD-HA-Incidents&machine=64bit&test=qam_ha_priorityfencing_supportserver&version=15-SP4#next_previous shows more than 100 jobs passing, so we assume that worker31 should not even have been in production at the time of the failure due to #162293. Maybe the machine was partially up and destroying jobs unnoticed, leading to the reported error. Anyway, we are good by now.
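For reference, the same check can also be scripted against the openQA API instead of the web UI; a sketch, assuming openqa-cli and jq are installed and the query parameters mirror the scenario above:
# list the latest jobs of the scenario and summarize their results by outcome
openqa-cli api --host https://openqa.suse.de jobs \
  arch=x86_64 distri=sle flavor=Server-DVD-HA-Incidents machine=64bit \
  test=qam_ha_priorityfencing_supportserver version=15-SP4 latest=1 \
  | jq '[.jobs[].result] | group_by(.) | map({result: .[0], count: length})'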