action #23368

closed

IPMI worker openqaw1:4 cannot connect to the IPMI machine.

Added by xlai over 6 years ago. Updated over 6 years ago.

Status: Resolved
Priority: High
Assignee:
Category: Infrastructure
Target version: -
Start date: 2017-08-15
Due date:
% Done: 0%
Estimated time:
Difficulty:

Description

The hostname of the SUT bound to this worker cannot be resolved to an IP address.

Key log:
DIE ipmitool -I lanplus -H openqa4-sp.qa.suse.de -U admin -P qatesting mc guid: Address lookup for openqa4-sp.qa.suse.de failed
Could not open socket!
Error: Unable to establish IPMI v2 / RMCP+ session at /usr/lib/os-autoinst/backend/ipmi.pm line 62.
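The "Address lookup ... failed" line in the log above points at name resolution rather than at IPMI itself. A minimal sketch of a pre-check that could be run on the worker host before calling ipmitool is shown below. The helper name `resolve_host` is hypothetical and is not part of os-autoinst; it only uses the Python standard library.

```python
from __future__ import annotations

import socket


def resolve_host(hostname: str) -> list[str]:
    """Return the sorted unique IP addresses for hostname.

    An empty list means the lookup failed, which is the situation
    ipmitool reports as "Address lookup for <host> failed".
    """
    try:
        infos = socket.getaddrinfo(hostname, None)
    except socket.gaierror:
        # Name could not be resolved (unknown host, DNS unreachable, ...).
        return []
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the address string for both IPv4 and IPv6.
    return sorted({info[4][0] for info in infos})
```

For example, `resolve_host("openqa4-sp.qa.suse.de")` returning an empty list on the worker would be consistent with the failure reported here, whereas a non-empty list would suggest looking at the IPMI session setup instead.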

Job link:
https://openqa.suse.de/tests/1108863


Related issues 1 (0 open, 1 closed)

Related to openQA Tests - action #23514: [labs][64bit-ipmi_debug worker] SLE15 shows interface selection (because 2 NICs are connected?) - Resolved - nicksinger - 2017-08-22

Actions #1

Updated by okurz over 6 years ago

  • Category set to Infrastructure
  • Status changed from New to Feedback
  • Assignee changed from mgriessmeier to okurz

This was a typo introduced by nicksinger that caused the wrong IPMI host to be addressed -> https://gitlab.suse.de/openqa/salt-pillars-openqa/merge_requests/46

Is https://openqa.suse.de/tests/1114208#live now working on openqaw1:4 as expected?

Actions #2

Updated by xlai over 6 years ago

Yes, the reported issue is fixed. The SUT can now be reached via ipmitool.

However, this worker still fails jobs when installing the host via PXE; see job https://openqa.suse.de/tests/1114208. Please temporarily disable the worker.

Actions #3

Updated by okurz over 6 years ago

Sorry, I did not disable the worker myself, but anyone can request it with a merge request on salt pillars -> https://gitlab.suse.de/openqa/salt-pillars-openqa
But when there is a problem like that, IMHO we should be more explicit -> make sure we have an issue and someone is working on it

Actions #4

Updated by nicksinger over 6 years ago

For now openqaw1:3 and openqaw1:4 have 64bit-ipmi_debug as WORKER_CLASS so that jobs can be assigned specifically to them.
I've also tried to investigate this issue further and I'm pretty certain that it's no longer a network issue.
What I found out is that the linked test (https://openqa.suse.de/tests/1114208) definitely tries to access media on OSD which are no longer available (e.g. UPGRADE_REPO=ftp://openqa.suse.de/SLE-12-SP3-Server-DVD-x86_64-Build0473-Media1).
What I've noticed is that non-existent assets cause exactly this behavior:

This test worked before: https://openqa.suse.de/tests/1058835
This test uses exactly the same assets but they don't exist anymore: https://openqa.suse.de/tests/1123603

As you can see, it behaves exactly the same as in xlai's linked test.
The real physical machine shows a network interface selection which, for unknown reasons, is not visible in the SOL session and therefore not visible in the openQA screenshots/live view.

@xlai: I'd kindly ask you to try to fix the asset paths (e.g. use GM instead of "Build0473") and create a new ticket for this, since the initial issue with a wrongly configured DNS is already resolved.
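To illustrate the WORKER_CLASS assignment mentioned at the start of this comment, here is a minimal sketch that lists which worker instances carry a given class. It assumes the usual workers.ini layout with one numbered section per instance; the sample contents and the helper name `instances_with_class` are hypothetical, and the real configuration for these workers is generated from the salt pillars repository.

```python
import configparser

# Hypothetical sample; not the actual configuration of openqaw1.
SAMPLE_WORKERS_INI = """
[global]
HOST = openqa.suse.de

[3]
WORKER_CLASS = 64bit-ipmi_debug

[4]
WORKER_CLASS = 64bit-ipmi_debug

[5]
WORKER_CLASS = qemu_x86_64
"""


def instances_with_class(ini_text, worker_class):
    """Return the instance numbers whose WORKER_CLASS includes worker_class."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    matches = []
    for name in parser.sections():
        if not name.isdigit():
            continue  # skip [global] and any other non-instance section
        raw = parser[name].get("WORKER_CLASS", fallback="")
        # WORKER_CLASS may hold several comma-separated classes.
        classes = [item.strip() for item in raw.split(",")]
        if worker_class in classes:
            matches.append(int(name))
    return sorted(matches)
```

With the sample above, `instances_with_class(SAMPLE_WORKERS_INI, "64bit-ipmi_debug")` selects instances 3 and 4, so only jobs that request that class would be scheduled on them.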

Actions #5

Updated by xlai over 6 years ago

  • Category deleted (Infrastructure)
  • Status changed from Feedback to New
  • Assignee changed from okurz to mgriessmeier

nicksinger wrote:

> For now openqaw1:3 and openqaw1:4 have 64bit-ipmi_debug as WORKER_CLASS so that jobs can be assigned specifically to them.

Thanks for this; we will not be disturbed any more :).

> I've also tried to investigate this issue further and I'm pretty certain that it's no longer a network issue.
> What I found out is that the linked test (https://openqa.suse.de/tests/1114208) definitely tries to access media on OSD which are no longer available (e.g. UPGRADE_REPO=ftp://openqa.suse.de/SLE-12-SP3-Server-DVD-x86_64-Build0473-Media1).
> What I've noticed is that non-existent assets cause exactly this behavior:
>
> This test worked before: https://openqa.suse.de/tests/1058835
> This test uses exactly the same assets but they don't exist anymore: https://openqa.suse.de/tests/1123603
>
> As you can see, it behaves exactly the same as in xlai's linked test.

The root cause here is that the installation media cannot be accessed via openqa.suse.de; it should be accessed via http://openqa.suse.de. The UPGRADE_REPO is not used here at all (it is only used in later test steps, after the installation succeeds). I have told Oliver about it and will fix it in the openQA boot_from_pxe test code.

> The real physical machine shows a network interface selection which, for unknown reasons, is not visible in the SOL session and therefore not visible in the openQA screenshots/live view.

Yes, I also believe this to be an issue.

> @xlai: I'd kindly ask you to try to fix the asset paths (e.g. use GM instead of "Build0473") and create a new ticket for this, since the initial issue with a wrongly configured DNS is already resolved.

I will open a new ticket.
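As a rough illustration of pointing an installation source at http://openqa.suse.de, the sketch below swaps only the scheme of a URL and leaves host and path untouched. This is not the boot_from_pxe fix itself (that test code is Perl); the helper name `with_scheme` is hypothetical, the ftp:// input is taken from the UPGRADE_REPO example only to have a concrete URL, and whether the same path is actually served over HTTP is an assumption the thread does not confirm.

```python
from urllib.parse import urlsplit, urlunsplit


def with_scheme(url, scheme="http"):
    """Return url with its scheme replaced; host, path and query are kept."""
    parts = urlsplit(url)
    # SplitResult is a named tuple, so _replace returns a modified copy.
    return urlunsplit(parts._replace(scheme=scheme))


# Illustrative input only, copied from the UPGRADE_REPO example in this thread.
example = "ftp://openqa.suse.de/SLE-12-SP3-Server-DVD-x86_64-Build0473-Media1"
converted = with_scheme(example)
```

Parsing the URL instead of doing a string replace avoids accidentally rewriting an "ftp" that appears later in the path.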

Actions #6

Updated by xlai over 6 years ago

  • Related to action #23514: [labs][64bit-ipmi_debug worker] SLE15 shows interface selection (because 2 NICs are connected?) added

Actions #7

Updated by xlai over 6 years ago

  • Status changed from New to Resolved

The original access issue is fixed. I have created a new ticket, #23514, to track the issue seen when using this machine as a real IPMI worker.

Actions #8

Updated by okurz over 6 years ago

  • Category set to Infrastructure

Actions #9

Updated by okurz over 6 years ago

  • Assignee changed from mgriessmeier to okurz

Actions #10

Updated by nicksinger over 6 years ago

  • Copied to action #23724: [infrastructure][ipmi] openQA is unable to reconnect to quinn (openQA ipmi worker) added

Actions #11

Updated by nicksinger over 6 years ago

  • Copied to deleted (action #23724: [infrastructure][ipmi] openQA is unable to reconnect to quinn (openQA ipmi worker))