action #121306

closed

[virtualization][hyperv] worker7-hyperv.oqa.suse.de can not be reached

Added by rcai over 1 year ago. Updated over 1 year ago.

Status:
Resolved
Priority:
Immediate
Assignee:
-
Category:
-
Target version:
Start date:
2022-12-01
Due date:
% Done:

100%

Estimated time:

Description

##Background
VLAN Migration
openqaw7-hyperv.qa.suse.de(10.162.0.101) -> worker7-hyperv.oqa.suse.de(10.137.10.7)

I was debugging the Windows network card and disabled ExternalVirtualSwitch.
Since then the host can no longer be pinged:
ping worker7-hyperv.oqa.suse.de
PING worker7-hyperv.oqa.suse.de (10.137.10.7) 56(84) bytes of data.
From 10.136.0.1 (10.136.0.1) icmp_seq=1 Destination Host Unreachable
From 10.136.0.1 (10.136.0.1) icmp_seq=2 Destination Host Unreachable
From 10.136.0.1 (10.136.0.1) icmp_seq=3 Destination Host Unreachable
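
For reading the output above: the error lines come from 10.136.0.1, not from the target 10.137.10.7, which usually means a gateway on the path could not reach the host on its segment. A minimal sketch that pulls this out of a ping transcript follows; the function name `summarize_ping` and the regular expressions are my own assumptions and only cover the Linux `ping` line formats seen in this ticket.

```python
import re

# Matches error lines in both forms seen in this ticket:
#   From 10.136.0.1 (10.136.0.1) icmp_seq=1 Destination Host Unreachable
#   From 10.136.0.1 icmp_seq=1 Destination Host Unreachable
UNREACHABLE = re.compile(
    r"^From (?P<reporter>\S+)(?: \(\S+\))? icmp_seq=(?P<seq>\d+) "
    r"Destination Host Unreachable$"
)
# Matches the header line: PING <name> (<ip>) 56(84) bytes of data.
HEADER = re.compile(r"^PING (?P<name>\S+) \((?P<ip>[^)]+)\)")


def summarize_ping(output: str) -> dict:
    """Return the target IP, the number of echo replies, and who reported errors."""
    target_ip = None
    reporters = set()
    replies = 0
    for raw in output.splitlines():
        line = raw.strip()
        header = HEADER.match(line)
        if header:
            target_ip = header.group("ip")
            continue
        err = UNREACHABLE.match(line)
        if err:
            reporters.add(err.group("reporter"))
        elif " bytes from " in line:
            # Successful echo replies look like "64 bytes from <host>: ..."
            replies += 1
    return {
        "target_ip": target_ip,
        "replies": replies,
        "unreachable_reported_by": sorted(reporters),
    }


sample = """\
PING worker7-hyperv.oqa.suse.de (10.137.10.7) 56(84) bytes of data.
From 10.136.0.1 (10.136.0.1) icmp_seq=1 Destination Host Unreachable
From 10.136.0.1 (10.136.0.1) icmp_seq=2 Destination Host Unreachable
From 10.136.0.1 (10.136.0.1) icmp_seq=3 Destination Host Unreachable
"""
summary = summarize_ping(sample)
print(summary)
```

If `unreachable_reported_by` contains an address other than the target and `replies` is 0, DNS resolution worked and the failure is further along the path, which fits the disabled virtual switch on the host itself.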

##Before VLAN Migration
Web access: https://sp.openqaw7-hyperv.qa.suse.de
I could access the Windows desktop and change the configuration, including the network configuration.
That made it easy to enable the switch again.

##After VLAN Migration
As you know, the machine can now only be accessed through the jumpy server:
ssh -4 jumpy@qe-xxx.suse.de -- ipmitool -I lanplus -H worker7-hyperv.qe-xxx-xx -U ADMIN -P xxxxxxx sol activate
However, I cannot access the Windows desktop this way.
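
This does not give desktop access, but the same jump-host path could at least be used to check whether the machine is powered on. A minimal sketch, assuming the BMC accepts the same options as the SOL command above: `chassis power status` is a standard ipmitool subcommand, while the helper name `ipmi_power_status_argv` is hypothetical and the host names and password are the elided placeholders from this ticket, not real values.

```python
import shlex


def ipmi_power_status_argv(jump_host, bmc_host, user, password):
    """Build the argv for `ipmitool ... chassis power status` run via the jump host."""
    return [
        "ssh", "-4", jump_host, "--",
        "ipmitool", "-I", "lanplus",
        "-H", bmc_host, "-U", user, "-P", password,
        "chassis", "power", "status",
    ]


# Placeholders copied from the command above; the real values are elided there.
argv = ipmi_power_status_argv(
    "jumpy@qe-xxx.suse.de", "worker7-hyperv.qe-xxx-xx", "ADMIN", "xxxxxxx"
)
print(shlex.join(argv))
# To actually run it from a machine with jump-host access:
#   subprocess.run(argv, check=True)
```

Passing the password with `-P` exposes it in the process list, so this is only meant for a quick manual check.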

##Expected
It would be better to still have access to the Windows desktop after the VLAN migration; it is sometimes helpful for debugging the environment.

##Impact
This affects the jobs of build 52.4 running on this server (5 jobs).


Files

worker7-hyperv.png (70.7 KB), nanzhang, 2022-12-05 13:20
Actions #1

Updated by rcai over 1 year ago

  • Copied from action #116344: openqaw9-hyperv.qa.suse.de (flexo.qa.suse.cz) can not be reached size:M added
Actions #2

Updated by rcai over 1 year ago

  • Copied from deleted (action #116344: openqaw9-hyperv.qa.suse.de (flexo.qa.suse.cz) can not be reached size:M)
Actions #3

Updated by rcai over 1 year ago

  • Start date changed from 2022-09-08 to 2022-12-01
Actions #6

Updated by okurz over 1 year ago

  • Subject changed from worker7-hyperv.oqa.suse.de can not be reached to [virtualization] worker7-hyperv.oqa.suse.de can not be reached
  • Assignee set to nanzhang
  • Target version deleted (Ready)

nanzhang is the maintainer and managing the migration

Actions #7

Updated by okurz over 1 year ago

  • Target version set to future
Actions #8

Updated by rcai over 1 year ago

Applied for access to the server (via the jump server qe-jumpy.suse.de).

Submitted MR: https://gitlab.suse.de/rcai/salt/-/merge_requests/1

Actions #9

Updated by rcai over 1 year ago

Merged https://gitlab.suse.de/OPS-Service/salt/-/merge_requests/2926.
Nan now has access to the server worker7-hyperv.oqa.suse.de.

Actions #10

Updated by nanzhang over 1 year ago

Actually, I can't do anything about this problem, as I can't see anything in the Remote Console Preview of the IPMI web console (see screenshot worker7-hyperv.png). The host also can't be pinged:

$ ping worker7-hyperv.oqa.suse.de
PING worker7-hyperv.oqa.suse.de (10.137.10.7) 56(84) bytes of data.
From 10.136.0.1 icmp_seq=1 Destination Host Unreachable
From 10.136.0.1 icmp_seq=2 Destination Host Unreachable
From 10.136.0.1 icmp_seq=3 Destination Host Unreachable

Actions #11

Updated by rcai over 1 year ago

  • Subject changed from [virtualization] worker7-hyperv.oqa.suse.de can not be reached to [virtualization][hyperv] worker7-hyperv.oqa.suse.de can not be reached
Actions #12

Updated by rcai over 1 year ago

Actions #13

Updated by rcai over 1 year ago

  • % Done changed from 0 to 20
Actions #14

Updated by rcai over 1 year ago

  • Status changed from New to In Progress
Actions #15

Updated by xlai over 1 year ago

  • Priority changed from Urgent to Immediate

nanzhang wrote:

Actually, I can't do anything about this problem, as I can't see anything in the Remote Console Preview of the IPMI web console (see screenshot worker7-hyperv.png). The host also can't be pinged:

$ ping worker7-hyperv.oqa.suse.de
PING worker7-hyperv.oqa.suse.de (10.137.10.7) 56(84) bytes of data.
From 10.136.0.1 icmp_seq=1 Destination Host Unreachable
From 10.136.0.1 icmp_seq=2 Destination Host Unreachable
From 10.136.0.1 icmp_seq=3 Destination Host Unreachable

@okurz Hi Oliver, this issue is blocking all hyperv tests, especially for the current 15sp5 beta2 milestone. It needs to be fixed ASAP.

In my understanding, this is an OSD infra issue caused by the security zone migration. Why does the tools team not fix it? Could you please explain the reasons? Besides, if you think Nan has the relevant infra permissions, we can do it ourselves. But could you please give him some guidance? It seems stuck now; he can't do anything...

Actions #16

Updated by xlai over 1 year ago

  • Status changed from In Progress to Feedback
  • Assignee changed from nanzhang to okurz
Actions #17

Updated by rcai over 1 year ago

Actions #18

Updated by xlai over 1 year ago

  • Status changed from Feedback to Resolved
  • Assignee deleted (okurz)
Actions #19

Updated by okurz over 1 year ago

xlai wrote:

[…] In my understanding, this is an OSD infra issue caused by the security zone migration. Why does the tools team not fix it? Could you please explain the reasons? Besides, if you think Nan has the relevant infra permissions, we can do it ourselves. But could you please give him some guidance? It seems stuck now; he can't do anything...

Just to make sure this is answered completely: The SUSE QE tools team does not maintain the machine worker7-hyperv.oqa.suse.de. This is also reflected in https://racktables.nue.suse.com/index.php?page=object&tab=default&object_id=10407 which gives as contact person "qa-apac2@suse.de", not "osd-admins@suse.de".

This is also explained in https://progress.opensuse.org/projects/qa/wiki/Tools#Out-of-scope in the point:

Maintenance of special worker addendums needed for tests, e.g. external hypervisor hosts for s390x, powerVM, xen, hyperv, IPMI, VMWare (Clarification: We maintain the code for all backends but we are no experts in specific domains. So we always try to help but it's a case by case decision based on what we realistically can provide based on our competence. We can't be expected to be experts in everything and also we are limited in what we can actually test.)

So let's explain that with examples:

  • The SUSE QE Tools team does not monitor the machine so the team won't immediately realize if the machine is not reachable or otherwise misbehaving
  • The SUSE QE Tools team can react to problems when they are brought up, as happened in this ticket. The team tries to help as far as its competences and capabilities go. Frustration often comes from the expectation that people are more experienced than they actually are. It is often better to expect that the ones "on the other side" are overwhelmed, inexperienced, already occupied with other stories and more. You know, mere humans ;)
  • The SUSE QE Tools team does not ensure that the machine is properly updated or upgraded, especially because it is "non-standard": it is not an openSUSE Leap system, for which we have good automation

One more suggestion: If your workflows rely on the presence of a hyperv host so much that any problem needs to be fixed ASAP, then I strongly suggest you build up redundancy. openQA is very good at providing worker redundancy. This is why hardly anybody cares if a single openQA worker machine goes down and is unavailable, even for months, while the problem is being worked on.

Actions #20

Updated by rcai over 1 year ago

  • % Done changed from 20 to 100