action #18144

[tools] restart ipmi management controller before every ipmi job

Added by RBrownSUSE over 4 years ago. Updated over 4 years ago.

Status: Resolved
Priority: Urgent
Assignee:
Category: -
Target version:
Start date: 2017-03-29
Due date:
% Done: 100%
Estimated time: (Total: 0.00 h)
Difficulty:

Description

IPMI cards are unreliable, but seem to be quite reliable if you restart them.

Restarting the card with "ipmitool mc reset hot" before running a test would be a sensible option to keep them reliable.
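As a rough illustration of the proposal, the pre-job reset could be sketched as a small wrapper around `ipmitool`. This is not code from os-autoinst. The function names, the `lanplus` interface choice, the 60 second timeout and the credential parameters are all assumptions made for the example. The description writes the reset argument as `hot`, while the ipmitool manual lists `warm` and `cold`, so the sketch accepts only those two and defaults to `warm` as a guess at what was meant.

```python
import subprocess
from typing import Callable, List

# Reset types listed in the ipmitool manual for "mc reset".
VALID_RESET_KINDS = ("warm", "cold")


def build_mc_reset_command(host: str, user: str, password: str,
                           kind: str = "warm") -> List[str]:
    """Build the ipmitool argv that resets the management controller.

    The default of "warm" is an assumption; the issue does not say
    which reset type was intended.
    """
    if kind not in VALID_RESET_KINDS:
        raise ValueError(f"unsupported reset kind: {kind!r}")
    return [
        "ipmitool", "-I", "lanplus",
        "-H", host, "-U", user, "-P", password,
        "mc", "reset", kind,
    ]


def reset_mc_before_job(host: str, user: str, password: str,
                        kind: str = "warm",
                        run: Callable = subprocess.run) -> bool:
    """Run the reset and report whether ipmitool exited successfully.

    `run` is injectable so the wrapper can be exercised without hardware.
    """
    cmd = build_mc_reset_command(host, user, password, kind)
    result = run(cmd, capture_output=True, text=True, timeout=60)
    return result.returncode == 0
```

A caller would invoke `reset_mc_before_job(...)` once before starting the job and would likely need to wait for the controller to come back before opening a console; how long that takes depends on the hardware and is not covered here.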


Subtasks

action #18826: [tools] Investigate serial over lan disconnects for ipmi (Resolved, RBrownSUSE)


Related issues

Related to openQA Infrastructure - action #17816: [functional][u][ipmi] Error: Unable to establish IPMI v2 / RMCP+ session (Rejected, 2017-03-20)

Related to openQA Tests - action #13914: [qe-core][functional][ipmi] wait_serial does not get expected output because ipmi console connection is closed (New, 2016-09-27)

History

#1 Updated by okurz over 4 years ago

  • Related to action #17816: [functional][u][ipmi] Error: Unable to establish IPMI v2 / RMCP+ session added

#2 Updated by okurz over 4 years ago

  • Related to action #13914: [qe-core][functional][ipmi] wait_serial does not get expected output because ipmi console connection is closed added

#3 Updated by RBrownSUSE over 4 years ago

  • Assignee set to RBrownSUSE
  • Priority changed from Normal to Urgent
  • Target version set to Milestone 7

Evaluated as important for Milestone 7

#4 Updated by RBrownSUSE over 4 years ago

Still investigating, https://github.com/os-autoinst/os-autoinst/pull/767 should let os-autoinst get us more helpful info

#5 Updated by okurz over 4 years ago

the PR should be in. Is https://openqa.suse.de/tests/901944#step/textinfo/3 related to that maybe?

#6 Updated by RBrownSUSE over 4 years ago

> the PR should be in

No, who are you to say what should or shouldn't be in?

> Is https://openqa.suse.de/tests/901944#step/textinfo/3 related to that maybe?

No, but thank you for your opinion

#7 Updated by okurz over 4 years ago

RBrownSUSE wrote:

> > the PR should be in
>
> No, who are you to say what should or shouldn't be in?

"should be in" as in: "Hm, depending on what I heard when the installation was last deployed the content of the pull request is probably already effective on the corresponding workers."

#9 Updated by coolo over 4 years ago

Could you guys please stop abusing this issue? I already removed okurz's tag - see #6

#10 Updated by RBrownSUSE over 4 years ago

  • Status changed from New to Resolved

Serial disconnect issues resolved by https://github.com/os-autoinst/os-autoinst/pull/777

Serial keep alive workaround removed as no longer beneficial with auto reconnect

No iKVM issues reported since regular nightly restart of the card, so no evidence that the whole mc controller needs to be restarted on every job
