action #99117

malbec 🍷️ is not reachable via ssh or ipmi

Added by cdywan 4 months ago. Updated 4 months ago.


  • alert [Alerting] malbec: host up alert triggered at 13:41 CEST today
  • ssh isn't responsive.
  • ipmi says "Unable to establish IPMI v2 / RMCP+ session"
  • chassis reboot fails with Unable to establish IPMI v2 / RMCP+ session
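The ssh symptom above can be checked from a script. The following is only a minimal sketch, assuming that "ssh isn't responsive" can be approximated by a TCP connect to port 22; the function name `tcp_port_open` and the 5 second timeout are hypothetical choices, not part of any monitoring setup mentioned in this ticket. It does not cover IPMI, which uses a different protocol.

```python
import socket


def tcp_port_open(host, port=22, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the name and tries each returned address
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # covers refused connections, timeouts and name resolution failures
        return False
```

For example, `tcp_port_open("malbec.arch")` would return False while the host is down or unreachable.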


ipmitool -4 -I lanplus -C 3 -H -P $password


#1 Updated by cdywan 4 months ago

  • Project changed from QA to openQA Infrastructure

#2 Updated by okurz 4 months ago

I can confirm that IPMI does not work. ipmi-fsp1-malbec.arch yields "Error: Unable to establish IPMI v2 / RMCP+ session". I suggest reporting an EngInfra ticket.

#3 Updated by cdywan 4 months ago

  • Status changed from New to Feedback

Somehow malbec came back at 3:38 CEST yesterday, and it looks fine.

#4 Updated by okurz 4 months ago

  • Due date set to 2021-10-07
  • Assignee set to okurz
  • Priority changed from Urgent to High

Please only use "Feedback" with an assignee. Otherwise tickets tend to stay around for ages. mgriessmeier mentioned that gschlotter from EngInfra reported "undefined network issues" in the past days. Let's assume that was it.

I could log in over ssh with ssh malbec.arch and verify that openQA tests are running. Also says it's fine, no malbec-related alerts on, says that malbec is up and working on jobs. The history of , , , looks very much ok with the exception of some incomplete jobs like stating an oddly specific reason "Reason: api failure: Failed to register at - 503 response: <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN"> 503 Service Unavailable Service Unavailable The server is temporarily unable to service your request due to maintenance downtime … ".

I called WORKER=malbec failed_since=2021-09-23 openqa-advanced-retrigger-jobs

to handle the incompletes on this host. This retriggered some tests and we should be good.

I wonder, can we better handle restarting jobs with such reason?
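One possible direction for the question above, purely as a sketch: classify the reason string of an incomplete job and treat registration failures with a 503 response as transient, so that such jobs could be restarted automatically. The pattern list and the function name `is_transient_failure` are hypothetical and derived only from the single reason quoted in this comment; this is not how openQA or openqa-advanced-retrigger-jobs actually decides.

```python
import re

# Assumption: illustrative patterns only, based on the one reason string
# quoted in this ticket; not openQA's actual retry rules.
TRANSIENT_PATTERNS = [
    re.compile(r"api failure: Failed to register .*503", re.DOTALL),
]


def is_transient_failure(reason):
    """Return True if the job reason matches a known transient failure."""
    return any(pattern.search(reason) for pattern in TRANSIENT_PATTERNS)
```

A caller would restart only those incomplete jobs for which `is_transient_failure(reason)` is true and leave all others for manual review.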

EDIT: Also IPMI does not work. Created EngInfra ticket for that:

#5 Updated by okurz 4 months ago

  • Status changed from Feedback to Blocked

#6 Updated by cdywan 4 months ago

okurz wrote:


I can't access that. I guess the automatic addition of team members still isn't working?

#7 Updated by okurz 4 months ago

Yes, not working. And that's not even expected, because the solution was only discussed for the automatically created tickets. I now added (which is now a user) via the "Share" button, and I have seen the email confirmation. I also created to ask about the general procedure for efficient workflows.

#8 Updated by okurz 4 months ago

  • Description updated (diff)
  • Due date changed from 2021-10-07 to 2021-10-21
  • Priority changed from High to Normal

gschlotter responded. Seems like IPv6 DNS resolution fails. Workaround that works:
ipmitool -4 -I lanplus -C 3 -H -P $password

Documented workaround in ticket description, lowering prio.
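To see why forcing IPv4 with -4 helps, it can be useful to compare what a host name resolves to per address family. The sketch below is an assumption about how one might check this from Python using the standard resolver; the function name `resolve_by_family` is hypothetical, and the result depends on the resolver configuration of the machine running it.

```python
import socket


def resolve_by_family(host):
    """Return the addresses a host name resolves to, split by IP version."""
    results = {}
    for label, family in (("IPv4", socket.AF_INET), ("IPv6", socket.AF_INET6)):
        try:
            infos = socket.getaddrinfo(host, None, family, socket.SOCK_STREAM)
            # each entry is (family, type, proto, canonname, sockaddr);
            # sockaddr[0] is the address string
            results[label] = sorted({info[4][0] for info in infos})
        except socket.gaierror:
            # resolution failed for this address family
            results[label] = []
    return results
```

Calling `resolve_by_family("ipmi-fsp1-malbec.arch")` from a machine inside the network would show whether the IPv4 and IPv6 lookups both return addresses. An empty or wrong "IPv6" list would be consistent with the failure described above.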

#9 Updated by okurz 4 months ago

  • Status changed from Blocked to Resolved

The EngInfra ticket was resolved. trenninger fixed the IPv4/IPv6 DNS entries in the arch network, so IPMI access to malbec now works (again) over both IPv4 and IPv6. I confirmed. No further changes needed in salt pillars.
