action #18144
[tools] restart ipmi management controller before every ipmi job
100%
Description
IPMI cards are unreliable, but seem to be quite reliable if you restart them before running a test.
Restarting the card with "ipmitool mc reset hot" before each job would be a sensible way to keep them reliable.
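A minimal sketch of what such a pre-job reset could look like, assuming the worker shells out to ipmitool over the lanplus interface. The function names, host and credentials are hypothetical and not taken from os-autoinst. The reset type is passed through unchanged: the description writes "hot", while the ipmitool manual page documents "warm" and "cold", so check which value your ipmitool accepts.

```python
import subprocess
from typing import Callable, List


def build_mc_reset_command(host: str, user: str, password: str,
                           reset_type: str) -> List[str]:
    """Build the ipmitool argument list for a management controller reset.

    Hypothetical helper: reset_type is not validated here and is handed
    to ipmitool as-is.
    """
    return [
        "ipmitool",
        "-I", "lanplus",   # IPMI v2 / RMCP+ interface
        "-H", host,
        "-U", user,
        "-P", password,
        "mc", "reset", reset_type,
    ]


def reset_mc_before_job(host: str, user: str, password: str,
                        reset_type: str,
                        run: Callable = subprocess.run) -> bool:
    """Run the reset and report whether ipmitool exited with status 0.

    The run parameter exists only so the call can be replaced in tests.
    """
    cmd = build_mc_reset_command(host, user, password, reset_type)
    result = run(cmd, capture_output=True, text=True, timeout=60)
    return result.returncode == 0
```

A job wrapper would call reset_mc_before_job(...) first and then wait until the controller answers again before opening the console, since the card is unreachable while it restarts.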
History
#1
Updated by okurz about 6 years ago
- Related to action #17816: [functional][u][ipmi] Error: Unable to establish IPMI v2 / RMCP+ session added
#2
Updated by okurz about 6 years ago
- Related to action #13914: [qe-core][functional][ipmi] wait_serial does not get expected output because ipmi console connection is closed added
#3
Updated by RBrownSUSE about 6 years ago
- Assignee set to RBrownSUSE
- Priority changed from Normal to Urgent
- Target version set to Milestone 7
Evaluated as important for Milestone 7
#4
Updated by RBrownSUSE about 6 years ago
Still investigating. https://github.com/os-autoinst/os-autoinst/pull/767 should let os-autoinst give us more helpful information.
#5
Updated by okurz about 6 years ago
the PR should be in. Is https://openqa.suse.de/tests/901944#step/textinfo/3 related to that maybe?
#6
Updated by RBrownSUSE about 6 years ago
> the PR should be in
No, who are you to say what should or shouldn't be in?
> Is https://openqa.suse.de/tests/901944#step/textinfo/3 related to that maybe?
No, but thank you for your opinion
#7
Updated by okurz about 6 years ago
RBrownSUSE wrote:
> > the PR should be in
> No, who are you to say what should or shouldn't be in?
"should be in" as in: "Hm, based on what I heard about when the installation was last deployed, the content of the pull request is probably already in effect on the corresponding workers."
#8
Updated by SLindoMansilla about 6 years ago
It also happens here: https://openqa.suse.de/tests/910032#step/textinfo/3
#9
Updated by coolo about 6 years ago
Could you guys please stop abusing this issue? I already removed okurz's tag - see #6
#10
Updated by RBrownSUSE about 6 years ago
- Status changed from New to Resolved
Serial disconnect issues resolved by https://github.com/os-autoinst/os-autoinst/pull/777
Serial keep alive workaround removed as no longer beneficial with auto reconnect
No iKVM issues have been reported since the regular nightly restart of the card was set up, so there is no evidence that the whole management controller needs to be restarted on every job