action #23368

IPMI worker openqaw1:4 cannot connect to its IPMI machine.

Added by xlai almost 5 years ago. Updated almost 5 years ago.

Status: Resolved
Priority: High
Assignee:
Category: Infrastructure
Target version: -
Start date: 2017-08-15
Due date:
% Done: 0%
Estimated time:
Difficulty:

Description

The hostname of the SUT bound to this worker cannot be resolved to an IP address.

Key log:
DIE ipmitool -I lanplus -H openqa4-sp.qa.suse.de -U admin -P qatesting mc guid: Address lookup for openqa4-sp.qa.suse.de failed
Could not open socket!
Error: Unable to establish IPMI v2 / RMCP+ session at /usr/lib/os-autoinst/backend/ipmi.pm line 62.
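
The "Address lookup ... failed" message in the log points at name resolution rather than at IPMI itself. A minimal sketch of how that could be checked independently of ipmitool is shown here; the helper name `resolve_host` is hypothetical and not part of os-autoinst, and the hostname is simply the one from the log above.

```python
from __future__ import annotations

import socket


def resolve_host(hostname: str) -> list[str]:
    """Return the sorted unique IP addresses for hostname, or [] if lookup fails."""
    try:
        infos = socket.getaddrinfo(hostname, None)
    except socket.gaierror:
        # Name resolution failed, which matches the "Address lookup failed" case.
        return []
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


if __name__ == "__main__":
    host = "openqa4-sp.qa.suse.de"  # hostname taken from the key log
    addresses = resolve_host(host)
    if addresses:
        print(f"{host} resolves to {', '.join(addresses)}")
    else:
        print(f"Address lookup for {host} failed; check DNS or the worker configuration")
```

If the lookup returns nothing, the ipmitool call cannot succeed either, so the worker configuration or DNS entry is the first thing to inspect.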

Job link:
https://openqa.suse.de/tests/1108863


Related issues

Related to openQA Tests - action #23514: [labs][64bit-ipmi_debug worker] SLE15 shows interface selection (because 2 NICs are connected?) - Resolved - 2017-08-22

History

#1 Updated by okurz almost 5 years ago

  • Category set to Infrastructure
  • Status changed from New to Feedback
  • Assignee changed from mgriessmeier to okurz

This was a typo introduced by nicksinger that caused the wrong IPMI host to be addressed -> https://gitlab.suse.de/openqa/salt-pillars-openqa/merge_requests/46

https://openqa.suse.de/tests/1114208#live is running on openqaw1:4 now. Is it working as expected?

#2 Updated by xlai almost 5 years ago

Yes, the reported issue is fixed. The SUT can now be reached via ipmitool.

However, jobs on this worker still fail when installing the host via PXE, see job https://openqa.suse.de/tests/1114208. Please temporarily disable the worker.

#3 Updated by okurz almost 5 years ago

Sorry, I did not disable the worker myself, but anyone can request it with a merge request on salt pillars -> https://gitlab.suse.de/openqa/salt-pillars-openqa
But if there is a problem with the worker, then IMHO we should be more explicit: make sure we have an issue and that someone is working on it.

#4 Updated by nicksinger almost 5 years ago

For now openqaw1:3 and openqaw1:4 have 64bit-ipmi_debug as WORKER_CLASS to assign jobs specifically to them.
I've also tried to investigate this issue further and I'm pretty certain that it's no longer a network issue.
What I found out is that the linked test (https://openqa.suse.de/tests/1114208) definitely tries to access media on OSD which is no longer available (e.g. UPGRADE_REPO=ftp://openqa.suse.de/SLE-12-SP3-Server-DVD-x86_64-Build0473-Media1).
What I've noticed is that non-existent assets cause exactly this behavior:

This test worked before: https://openqa.suse.de/tests/1058835
This test uses exactly the same assets but they don't exist anymore: https://openqa.suse.de/tests/1123603

As you can see, it behaves exactly the same as in xlai's linked test.
The real physical machine shows a network interface selection which is for unknown reasons not visible in the SOL session and therefore not visible in the openqa screenshots/live view.

xlai: I'd kindly ask you to try to fix the asset paths (e.g. use GM instead of "Build0473") and to create a new ticket for this, since the initial issue with a wrongly configured DNS is already resolved.

#5 Updated by xlai almost 5 years ago

  • Category deleted (Infrastructure)
  • Status changed from Feedback to New
  • Assignee changed from okurz to mgriessmeier

nicksinger wrote:

> For now openqaw1:3 and openqaw1:4 have 64bit-ipmi_debug as WORKER_CLASS to assign jobs specifically to them.

Thanks for this, now we will not be disturbed any more :).

> I've also tried to investigate this issue further and I'm pretty certain that it's no longer a network issue.
> What I found out is that the linked test (https://openqa.suse.de/tests/1114208) definitely tries to access media on OSD which is no longer available (e.g. UPGRADE_REPO=ftp://openqa.suse.de/SLE-12-SP3-Server-DVD-x86_64-Build0473-Media1).
> What I've noticed is that non-existent assets cause exactly this behavior:
>
> This test worked before: https://openqa.suse.de/tests/1058835
> This test uses exactly the same assets but they don't exist anymore: https://openqa.suse.de/tests/1123603
>
> As you can see, it behaves exactly the same as in xlai's linked test.

The root cause here is that the installation media cannot be accessed via openqa.suse.de; it should be accessed via http://openqa.suse.de. The upgrade_repo is not used here at all (it is only used in later test steps, after the installation succeeds). I have told oliver about it and will fix it in the openQA boot_from_pxe test code.

> The real physical machine shows a network interface selection which is for unknown reasons not visible in the SOL session and therefore not visible in the openqa screenshots/live view.

Yes, I also believe this is an issue.

> xlai: I'd kindly ask you to try to fix the assets paths (e.g. use GM instead of "Build0473") and create a new ticket for this since the initial issue with a wrongly configured DNS is already resolved.

I will open a new ticket.
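
The root cause named in this comment (media addressed as openqa.suse.de rather than http://openqa.suse.de) can be illustrated with a small sketch. This is only an illustration of the idea, not the actual boot_from_pxe change, which is not shown in this ticket; the function name `with_http_scheme` and the example path are hypothetical.

```python
def with_http_scheme(location: str) -> str:
    """Prefix location with http:// unless it already names a scheme."""
    if "://" in location:
        # An explicit scheme (http, ftp, ...) is left untouched.
        return location
    return "http://" + location.lstrip("/")


if __name__ == "__main__":
    # Hypothetical media path, used only to show the transformation.
    print(with_http_scheme("openqa.suse.de/assets/repo/example"))
    print(with_http_scheme("ftp://openqa.suse.de/example"))
```

Leaving locations with an explicit scheme unchanged keeps settings such as the ftp:// UPGRADE_REPO value intact while still making bare host names usable by an installer that expects a full URL.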

#6 Updated by xlai almost 5 years ago

  • Related to action #23514: [labs][64bit-ipmi_debug worker] SLE15 shows interface selection (because 2 NICs are connected?) added

#7 Updated by xlai almost 5 years ago

  • Status changed from New to Resolved

The original access issue is fixed. A new ticket, #23514, has been created to follow up on the remaining issue seen when using the machine as a real IPMI worker.

#8 Updated by okurz almost 5 years ago

  • Category set to Infrastructure

#9 Updated by okurz almost 5 years ago

  • Assignee changed from mgriessmeier to okurz

#10 Updated by nicksinger almost 5 years ago

  • Copied to action #23724: [infrastructure][ipmi] openQA is unable to reconnect to quinn (openQA ipmi worker) added

#11 Updated by nicksinger almost 5 years ago

  • Copied to deleted (action #23724: [infrastructure][ipmi] openQA is unable to reconnect to quinn (openQA ipmi worker))
