action #178576

open

coordination #161414: [epic] Improved salt based infrastructure management

Workers unresponsive in salt pipelines including openqa-piworker, sapworker1 and monitor size:S

Added by livdywan 22 days ago. Updated about 11 hours ago.

Status: In Progress
Priority: Urgent
Assignee:
Category: Regressions/Crashes
Start date: 2025-03-07
Due date: 2025-04-04 (Due in 3 days)
% Done: 0%
Estimated time:
Description

Observation

See https://gitlab.suse.de/openqa/salt-pillars-openqa/-/jobs/3959809

openqa-piworker.qe.nue2.suse.org:
    Minion did not return. [No response]
    The minions may not have all finished running and any remaining minions will return upon completion. To look up the return data for this job later, run the following command:

    salt-run jobs.lookup_jid 20250310104340880948
sapworker1.qe.nue2.suse.org:
    Minion did not return. [No response]
    The minions may not have all finished running and any remaining minions will return upon completion. To look up the return data for this job later, run the following command:

    salt-run jobs.lookup_jid 20250310104340880948
monitor.qe.nue2.suse.org:
    Minion did not return. [No response]
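
A minimal sketch for rechecking the three minions from the job output above, assuming it is run on the salt master with sudo rights. The `DRY_RUN` variable and the `run` helper are hypothetical, added only so the commands can be previewed before anything is executed; `salt-run jobs.lookup_jid` is taken from the log, and `salt -L ... test.ping` is the standard list-targeted ping.

```shell
#!/bin/sh
# Hypothetical helper: print the command when DRY_RUN=1 (the default),
# execute it otherwise.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Minions reported as "Minion did not return" in the job output.
hosts="openqa-piworker.qe.nue2.suse.org,sapworker1.qe.nue2.suse.org,monitor.qe.nue2.suse.org"

# Look up any late return data for the job ID shown in the log.
run sudo salt-run jobs.lookup_jid 20250310104340880948

# Ping only the affected minions (-L takes a comma-separated list).
run sudo salt -L "$hosts" test.ping
```

With the default `DRY_RUN=1` the script only prints the two commands; setting `DRY_RUN=0` would execute them on the master.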

Suggestions

  • This seems to be reproducible

Rollback actions

  • Add back to salt and production: diesel, petrol, monitor, sapworker1, openqa-piworker

    for i in diesel.qe.nue2.suse.org petrol.qe.nue2.suse.org monitor.qe.nue2.suse.org sapworker1.qe.nue2.suse.org openqa-piworker.qe.nue2.suse.org ; do sudo salt-key -y -a $i; done
  • Remove the "Systemd services" silence from https://monitor.qa.suse.de/alerting/silences
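
Once the rollback has been applied, one way to confirm that all five keys are accepted again could look like the sketch below. It assumes that `salt-key -l accepted` prints one minion ID per line under an "Accepted Keys:" header; the `missing_keys` helper is hypothetical and not part of salt.

```shell
#!/bin/sh
# Hosts named in the rollback action above.
hosts="diesel.qe.nue2.suse.org petrol.qe.nue2.suse.org monitor.qe.nue2.suse.org sapworker1.qe.nue2.suse.org openqa-piworker.qe.nue2.suse.org"

# Hypothetical helper: read a key listing on stdin and print every host
# from $hosts that does not appear in it as a whole line.
missing_keys() {
    listing=$(cat)
    for h in $hosts; do
        printf '%s\n' "$listing" | grep -qxF "$h" || echo "$h"
    done
}

# Assumed usage on the salt master:
#   sudo salt-key -l accepted | missing_keys
```

Empty output would mean every host from the rollback list is among the accepted keys; any hostname printed still needs `salt-key -a`.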

Related issues 1 (1 open, 0 closed)

Copied to openQA Infrastructure (public) - action #179302: Better monitoring for correct MTU size limits (New, 2025-03-07)
