action #121306

[virtualization][hyperv] worker7-hyperv.oqa.suse.de can not be reached

Added by rcai 2 months ago. Updated about 1 month ago.

Status:
Resolved
Priority:
Immediate
Assignee:
-
Target version:
Start date:
2022-12-01
Due date:
% Done:

100%

Estimated time:

Description

##Background
VLAN Migration
openqaw7-hyperv.qa.suse.de(10.162.0.101) -> worker7-hyperv.oqa.suse.de(10.137.10.7)

I was debugging the Windows network card and disabled ExternalVirtualSwitch.
After that I could no longer ping the host:
ping worker7-hyperv.oqa.suse.de
PING worker7-hyperv.oqa.suse.de (10.137.10.7) 56(84) bytes of data.
From 10.136.0.1 (10.136.0.1) icmp_seq=1 Destination Host Unreachable
From 10.136.0.1 (10.136.0.1) icmp_seq=2 Destination Host Unreachable
From 10.136.0.1 (10.136.0.1) icmp_seq=3 Destination Host Unreachable
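
The replies above come from 10.136.0.1, not from the host itself, which usually suggests that this hop could not reach 10.137.10.7 on its segment. A minimal sketch for summarising such output is shown here; the function names `classify_ping` and `reporting_hops` are made up for illustration and are not part of any existing tool.

```shell
# Sketch only: both helpers read captured ping output on stdin.

# classify_ping prints one word:
#   reachable    at least one echo reply ("bytes from") was seen
#   unreachable  a hop reported "Destination Host Unreachable"
#   no-reply     neither was seen (for example silent packet loss)
classify_ping() (
    output=$(cat)
    case $output in
        *"bytes from"*) echo reachable ;;
        *"Destination Host Unreachable"*) echo unreachable ;;
        *) echo no-reply ;;
    esac
)

# reporting_hops prints the unique addresses that sent the
# "Destination Host Unreachable" messages (second field of each line).
reporting_hops() {
    awk '/Destination Host Unreachable/ { print $2 }' | sort -u
}
```

For example, `ping -c 3 worker7-hyperv.oqa.suse.de | classify_ping` could be run from a machine that is expected to reach the worker.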

##Before VLAN Migration
Web access: https://sp.openqaw7-hyperv.qa.suse.de
I could access the Windows desktop and change the configuration, including the network configuration.
That made it easy to enable the virtual switch again.

##After VLAN Migration
As you know, the machine can now only be accessed through the jumpy server:
ssh -4 jumpy@qe-xxx.suse.de -- ipmitool -I lanplus -H worker7-hyperv.qe-xxx-xx -U ADMIN -P xxxxxxx sol activate
However, I cannot access the Windows desktop with this method.
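
Besides `sol activate`, other ipmitool subcommands such as `chassis power status` can be sent through the same jump host to check whether the machine is powered on at all. The sketch below only prints the command instead of running it. The helper name `jump_ipmi_cmd` and the host names in the example are hypothetical, and the use of `-E` assumes that the `IPMI_PASSWORD` environment variable is set where ipmitool runs, so that the password does not appear on the command line.

```shell
# Sketch only: build (but do not execute) the jump-host ipmitool command.
# Usage: jump_ipmi_cmd JUMP_HOST BMC_HOST USER SUBCOMMAND...
# -E makes ipmitool read the password from the IPMI_PASSWORD
# environment variable instead of taking it via -P.
jump_ipmi_cmd() (
    jump=$1
    bmc=$2
    user=$3
    shift 3
    printf 'ssh -4 %s -- ipmitool -I lanplus -H %s -U %s -E %s\n' \
        "$jump" "$bmc" "$user" "$*"
)
```

For example, `jump_ipmi_cmd jumpy@jump.example bmc.example ADMIN chassis power status` prints a command line that can be reviewed before it is run by hand.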

##Expected
It would be better to still be able to access the Windows desktop after the VLAN migration, as this helps when debugging the environment.

##Impact
It affects the jobs of build 52.4 running on this server (5 jobs).

worker7-hyperv.png (70.7 KB) nanzhang, 2022-12-05 13:20

History

#1 Updated by rcai 2 months ago

  • Copied from action #116344: openqaw9-hyperv.qa.suse.de (flexo.qa.suse.cz) can not be reached size:M added

#2 Updated by rcai 2 months ago

  • Copied from deleted (action #116344: openqaw9-hyperv.qa.suse.de (flexo.qa.suse.cz) can not be reached size:M)

#3 Updated by rcai 2 months ago

  • Start date changed from 2022-09-08 to 2022-12-01

#6 Updated by okurz 2 months ago

  • Subject changed from worker7-hyperv.oqa.suse.de can not be reached to [virtualization] worker7-hyperv.oqa.suse.de can not be reached
  • Assignee set to nanzhang
  • Target version deleted (Ready)

nanzhang is the maintainer and managing the migration

#7 Updated by okurz 2 months ago

  • Target version set to future

#8 Updated by rcai about 2 months ago

Applied for access to the server (via the jump server qe-jumpy.suse.de).

Submitted MR: https://gitlab.suse.de/rcai/salt/-/merge_requests/1

#9 Updated by rcai about 2 months ago

Merged https://gitlab.suse.de/OPS-Service/salt/-/merge_requests/2926.
Nan now has access to the server worker7-hyperv.oqa.suse.de.

#10 Updated by nanzhang about 2 months ago

Actually, I can't do anything about this problem, as I can't see anything in the Remote Console Preview of the IPMI web console (see screenshot worker7-hyperv.png). The host also can't be pinged.

$ ping worker7-hyperv.oqa.suse.de
PING worker7-hyperv.oqa.suse.de (10.137.10.7) 56(84) bytes of data.
From 10.136.0.1 icmp_seq=1 Destination Host Unreachable
From 10.136.0.1 icmp_seq=2 Destination Host Unreachable
From 10.136.0.1 icmp_seq=3 Destination Host Unreachable

#11 Updated by rcai about 2 months ago

  • Subject changed from [virtualization] worker7-hyperv.oqa.suse.de can not be reached to [virtualization][hyperv] worker7-hyperv.oqa.suse.de can not be reached

#12 Updated by rcai about 2 months ago

#13 Updated by rcai about 2 months ago

  • % Done changed from 0 to 20

#14 Updated by rcai about 2 months ago

  • Status changed from New to In Progress

#15 Updated by xlai about 2 months ago

  • Priority changed from Urgent to Immediate

nanzhang wrote:

Actually, I can't do anything about this problem, as I can't see anything in the Remote Console Preview of the IPMI web console (see screenshot worker7-hyperv.png). The host also can't be pinged.

$ ping worker7-hyperv.oqa.suse.de
PING worker7-hyperv.oqa.suse.de (10.137.10.7) 56(84) bytes of data.
From 10.136.0.1 icmp_seq=1 Destination Host Unreachable
From 10.136.0.1 icmp_seq=2 Destination Host Unreachable
From 10.136.0.1 icmp_seq=3 Destination Host Unreachable

okurz Hi Oliver, this issue is blocking all hyperv tests, especially for the current 15sp5 beta2 milestone. It needs to be fixed ASAP.

In my understanding, this is an OSD infra issue after the security zone migration. Why does the tools team not fix it? Could you please explain the reasons? Besides, if you think Nan has the relevant infra permissions, we can fix it ourselves. But could you please give him some guidance? It seems stuck now. He can do nothing...

#16 Updated by xlai about 2 months ago

  • Status changed from In Progress to Feedback
  • Assignee changed from nanzhang to okurz

#17 Updated by rcai about 2 months ago

#18 Updated by xlai about 2 months ago

  • Status changed from Feedback to Resolved
  • Assignee deleted (okurz)

#19 Updated by okurz about 2 months ago

xlai wrote:

[…] In my understanding, this is an OSD infra issue after the security zone migration. Why does the tools team not fix it? Could you please explain the reasons? Besides, if you think Nan has the relevant infra permissions, we can fix it ourselves. But could you please give him some guidance? It seems stuck now. He can do nothing...

Just to make sure this is answered completely: The SUSE QE tools team does not maintain the machine worker7-hyperv.oqa.suse.de. This is also reflected in https://racktables.nue.suse.com/index.php?page=object&tab=default&object_id=10407 which gives as contact person "qa-apac2@suse.de", not "osd-admins@suse.de".

This is also explained in https://progress.opensuse.org/projects/qa/wiki/Tools#Out-of-scope in the point:

Maintenance of special worker addendums needed for tests, e.g. external hypervisor hosts for s390x, powerVM, xen, hyperv, IPMI, VMWare (Clarification: We maintain the code for all backends but we are no experts in specific domains. So we always try to help but it's a case by case decision based on what we realistically can provide based on our competence. We can't be expected to be experts in everything and also we are limited in what we can actually test.)

So let's explain that in examples:

  • The SUSE QE Tools team does not monitor the machine so the team won't immediately realize if the machine is not reachable or otherwise misbehaving
  • The SUSE QE Tools team can react to problems if they are brought up, as happened in this ticket. The team tries to help as far as their competences and capabilities go. Often frustration comes from the expectation that people would be more experienced than they actually are. It is often better to expect that the ones "on the other side" are overwhelmed, inexperienced, already pre-occupied with other stories and more. You know, mere humans ;)
  • The SUSE QE Tools team does not ensure that the machine is properly updated or upgraded, especially because it's "non-standard": it is not an openSUSE Leap system, for which we have good automation

One more suggestion: If your workflows rely on the presence of "a hyperv" host so much that any problem needs to be fixed ASAP, then I strongly suggest you build up redundancy. openQA is very good at providing worker redundancy. This is why mostly nobody cares if a single openQA worker machine goes down and is unavailable even for months while the problem is being worked on.

#20 Updated by rcai about 1 month ago

  • % Done changed from 20 to 100
