action #97745

open

[virtualization][hyperv] ensure_serialdev_permissions fails for hyperv

Added by JERiveraMoya over 2 years ago. Updated over 1 year ago.

Status: New
Priority: Normal
Assignee: -
Category: -
Target version:
Start date: 2021-08-31
Due date:
% Done: 0%
Estimated time:

Description

We noticed that ensure_serialdev_permissions fails for Hyper-V:
https://openqa.suse.de/tests/6964523
We also found another failure, but it is probably caused by the preparation test not being scheduled there:
https://openqa.suse.de/tests/6964495#step/validate_lvm_raid1/11

No problems were seen 13 days ago with the same HYPERV_VERSION:2019: https://openqa.suse.de/tests/6888660#step/system_prepare/9


Related issues: 5 (2 open, 3 closed)

Related to openQA Tests - action #102236: [qe-core] test fails in system_prepare - change of permissions fails (Blocked, szarate)
Related to qe-yam - action #105536: Test module perform_installation times out on svirt-hyperv-uefi backend (Rejected, 2022-01-26)
Related to qe-yam - action #105521: svirt-hyperv backend seems to have trouble with getting script output (Rejected, 2022-01-26)
Related to openQA Tests - action #40715: [hyperv] Hyper-V 2012 R2 serial console unstable (Resolved, 2018-09-07)
Related to openQA Tests - action #107302: [qe-core] Work around serial console problems in Hyper-V (Workable, szarate, 2022-02-22)

Actions #1

Updated by okurz over 2 years ago

  • Category set to Regressions/Crashes
  • Priority changed from Normal to High
  • Target version set to Ready

I suggest carefully cross-checking which settings have changed since the last good job. I consider a regression in os-autoinst unlikely. If we do not find problems in os-autoinst, we should delegate to "QE Container & Public Cloud".
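A minimal sketch of that cross-check, not an existing openQA tool: it assumes the settings of the last good job and of the failing job have already been fetched as flat key/value mappings (for example the settings of each job as returned by the openQA API, which is an assumption here) and simply lists every setting whose value differs.

```python
from typing import Dict, Optional, Tuple


def diff_settings(
    good: Dict[str, str], bad: Dict[str, str]
) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Return {setting: (value_in_good_job, value_in_bad_job)} for every differing setting.

    A setting that exists in only one of the two jobs is reported with None
    on the side where it is missing.
    """
    return {
        key: (good.get(key), bad.get(key))
        for key in sorted(set(good) | set(bad))
        if good.get(key) != bad.get(key)
    }
```

For example, comparing two jobs that share HYPERV_VERSION but differ in VIRSH_HOSTNAME would report only VIRSH_HOSTNAME; any concrete host values used with it are placeholders, not data from this ticket.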

Actions #2

Updated by okurz over 2 years ago

  • Project changed from openQA Project to 190
  • Category deleted (Regressions/Crashes)

According to https://chat.suse.de/channel/testing?msg=L7YuMy2PQwPtMWt9r, mloviska and pdostal know more. mloviska said:

"The VIRSH_HOSTNAME="win2k19.qa.suse.cz" test uses the new Hyper-V host, which still has some configuration issues; it would be better to use the old one for now. Not really sure why it has not reconnected the socat, though. Pavel Dostál, do the migrated VMs use serial, please?"

Actions #3

Updated by okurz over 2 years ago

  • Target version deleted (Ready)
Actions #4

Updated by mloviska over 2 years ago

  • Project changed from 190 to 208
  • Status changed from New to In Progress
  • Assignee set to mloviska

Worked for me: http://kepler.suse.cz/tests/6681
But let's keep this one open to see whether it occurs again.

Actions #5

Updated by mloviska over 2 years ago

  • Status changed from In Progress to Feedback
  • Priority changed from High to Normal
Actions #6

Updated by ilausuch over 2 years ago

Can we consider that this problem is fixed and close the ticket?

Actions #7

Updated by JERiveraMoya over 2 years ago

In the latest build we could only find this one; not sure if it is related: https://openqa.suse.de/tests/7403395#step/validate_lvm_raid1/11
For now, we think it could be this: https://progress.opensuse.org/issues/100970

Actions #8

Updated by JERiveraMoya over 2 years ago

It is related; additional modules do not make any difference: https://openqa.suse.de/tests/7476991

Actions #9

Updated by mloviska over 2 years ago

I truly have no idea what is happening here. The other jobs are passing, so the serial line does not seem to get reset and the connection seems to be active.
With lvm+RAID1@svirt-hyperv-uefi I have noticed the same behaviour regarding the serial console. On Hyper-V, the serial line is provided through named pipes forwarded over TCP/IP. While my jobs were running, I saw both connections active for as long as the jobs ran.
I have cloned the same problematic job on both Hyper-V servers:

In both cases the serial connection is active, hence the message "attaching console,wait ...connected!", but after a while it seems to freeze.
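"Connected but frozen" can be told apart from "not connected" with a small liveness probe. The sketch below is hypothetical and not part of os-autoinst: it assumes the serial console is reachable as a plain TCP endpoint (host and port are placeholders) and that a logged-in shell is running on it. It sends a command whose output differs from its echoed input, so a terminal that only echoes keystrokes is not mistaken for a working shell.

```python
import socket
import time

# Hypothetical probe: the shell expands $((40+2)), so "PROBE_42" only shows up
# in real command output, never in the echoed input line.
PROBE_CMD = b"echo PROBE_$((40+2))\n"
PROBE_EXPECT = b"PROBE_42"


def probe_serial(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a shell behind the TCP serial endpoint answers in time.

    Returns False when the port refuses connections (e.g. lost proxy
    configuration), when the peer closes or resets the connection, and when
    the connection is up but nothing useful comes back (the "freeze" case).
    """
    deadline = time.monotonic() + timeout
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(PROBE_CMD)
            buf = b""
            while PROBE_EXPECT not in buf:
                remaining = deadline - time.monotonic()
                if remaining <= 0:
                    return False  # connected but silent: looks frozen
                sock.settimeout(remaining)
                chunk = sock.recv(4096)
                if not chunk:
                    return False  # peer closed the connection
                buf += chunk
            return True
    except OSError:  # connection refused, reset, or recv timed out
        return False
```

Returning a plain boolean keeps the probe usable both as a pre-check before a test module starts and as a diagnostic after a failure.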

Actions #10

Updated by JERiveraMoya over 2 years ago

Thanks for taking a look. We also found this one in a new build. It is a bit different because it ended up in the graphical system after installation, but we identified the same problem when switching to the root console: https://openqa.suse.de/tests/7591216#step/integration_services/1

Actions #11

Updated by mloviska over 2 years ago

JERiveraMoya wrote:

Thanks for taking a look. We also found this one in a new build. It is a bit different because it ended up in the graphical system after installation, but we identified the same problem when switching to the root console: https://openqa.suse.de/tests/7591216#step/integration_services/1

Hyper-V just created a snapshot; I am not sure whether that can have any effect on GNOME. It does not really seem to be a problem related to the hypervisor itself.

Actions #12

Updated by openqa_review over 2 years ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: lvm+RAID1@svirt-hyperv-uefi
https://openqa.suse.de/tests/7656832

To prevent further reminder comments one of the following options should be followed:

  1. The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
  2. The openQA job group is moved to "Released" or "EOL" (End-of-Life)
  3. The bugref in the openQA scenario is removed or replaced, e.g. label:wontfix:boo1234
Actions #13

Updated by openqa_review over 2 years ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: lvm+RAID1@svirt-hyperv-uefi
https://openqa.suse.de/tests/7793640

To prevent further reminder comments one of the following options should be followed:

  1. The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
  2. The openQA job group is moved to "Released" or "EOL" (End-of-Life)
  3. The bugref in the openQA scenario is removed or replaced, e.g. label:wontfix:boo1234
Actions #14

Updated by openqa_review over 2 years ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: lvm+RAID1@svirt-hyperv2016
https://openqa.suse.de/tests/7924467

To prevent further reminder comments one of the following options should be followed:

  1. The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
  2. The openQA job group is moved to "Released" or "EOL" (End-of-Life)
  3. The bugref in the openQA scenario is removed or replaced, e.g. label:wontfix:boo1234
Actions #15

Updated by JERiveraMoya over 2 years ago

  • Related to action #102236: [qe-core] test fails in system_prepare - change of permissions fails added
Actions #16

Updated by apappas over 2 years ago

As Joaquín linked, this happens in the functional group too. From what I can tell, it seems intermittent. I have made a build group to force the issue to appear, but I cannot reliably trigger it.

Each time it happened, the bootloader module warned about a boot parameter mismatch, even though the video shows no obvious mismatch. The installation proceeded normally and then the test failed at the first assert_script_run, which happens to be in the ensure_serialdev_permissions module.

Here is the "normal" run with the failures https://openqa.suse.de/tests/latest?arch=x86_64&distri=sle&flavor=Online&machine=svirt-hyperv&test=default&version=15-SP4#next_previous

Actions #18

Updated by pdostal over 2 years ago

I don't know much about Hyper-V svirt backend but I noticed this:
1) The Named Pipe TCP Proxy tends to forget its configuration (probably when it is reopened).
2) When a test is running and someone reuses its serial port number, both tests break.
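Point 2 amounts to a missing exclusivity rule: one proxy port must belong to one job at a time. A minimal in-process sketch of that rule is shown below; the class and its methods are hypothetical, and a real setup would need the registry in some place shared by all workers, which this sketch does not attempt.

```python
import threading
from typing import Dict, Optional


class SerialPortRegistry:
    """Hypothetical registry mapping proxy TCP ports to the openQA job using them.

    Claiming a port that another job still holds raises instead of silently
    sharing it, which is the situation that breaks both jobs.
    """

    def __init__(self) -> None:
        self._owners: Dict[int, int] = {}
        self._lock = threading.Lock()

    def claim(self, port: int, job_id: int) -> None:
        with self._lock:
            owner = self._owners.get(port)
            if owner is not None and owner != job_id:
                raise RuntimeError(f"port {port} is already used by job {owner}")
            self._owners[port] = job_id  # re-claiming by the same job is a no-op

    def release(self, port: int, job_id: int) -> None:
        with self._lock:
            # Only the owning job may free the port.
            if self._owners.get(port) == job_id:
                del self._owners[port]

    def owner(self, port: int) -> Optional[int]:
        with self._lock:
            return self._owners.get(port)
```

Failing loudly on the second claim turns a silent corruption of two runs into one clear error in the job that started later.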

Actions #20

Updated by openqa_review about 2 years ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: minimal+base_yast@svirt-hyperv
https://openqa.suse.de/tests/8010746

To prevent further reminder comments one of the following options should be followed:

  1. The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
  2. The openQA job group is moved to "Released" or "EOL" (End-of-Life)
  3. The bugref in the openQA scenario is removed or replaced, e.g. label:wontfix:boo1234
Actions #21

Updated by jlausuch about 2 years ago

  • Project changed from 208 to openQA Tests
  • Assignee deleted (mloviska)

Moving this out of the JeOS project; judging by this comment, it is more related to our infrastructure than to JeOS itself.

Actions #22

Updated by okurz about 2 years ago

  • Category set to Bugs in existing tests
Actions #23

Updated by maritawerner about 2 years ago

Oli, I am moving the ticket to the tools team; if that is not correct, please reassign it to the right team.

Actions #24

Updated by maritawerner about 2 years ago

  • Project changed from openQA Tests to openQA Infrastructure
  • Category deleted (Bugs in existing tests)
Actions #25

Updated by okurz about 2 years ago

  • Target version set to Ready

Given the recurring reminder comments, the issue seems to still be present, so we should look into recent occurrences and see what we can do.

Actions #26

Updated by okurz about 2 years ago

  • Status changed from Feedback to New
Actions #27

Updated by openqa_review about 2 years ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: lvm+RAID1@svirt-hyperv-uefi
https://openqa.suse.de/tests/8162314

To prevent further reminder comments one of the following options should be followed:

  1. The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
  2. The openQA job group is moved to "Released" or "EOL" (End-of-Life)
  3. The bugref in the openQA scenario is removed or replaced, e.g. label:wontfix:boo1234
Actions #28

Updated by okurz about 2 years ago

  • Assignee set to okurz

This looks related to the Windows server serial port forwarding instability issues mentioned elsewhere. I need to look up the corresponding ticket references and find out who can work on that, using socat or something similar as a replacement.

Actions #29

Updated by okurz about 2 years ago

  • Related to action #105536: Test module perform_installation times out on svirt-hyperv-uefi backend added
Actions #30

Updated by okurz about 2 years ago

  • Related to action #105521: svirt-hyperv backend seems to have trouble with getting script output added
Actions #34

Updated by okurz about 2 years ago

  • Related to action #40715: [hyperv] Hyper-V 2012 R2 serial console unstable added
Actions #35

Updated by okurz about 2 years ago

  • Subject changed from ensure_serialdev_permissions fails for hyperv to [virtualization][hyperv] ensure_serialdev_permissions fails for hyperv
  • Assignee changed from okurz to xlai
  • Target version deleted (Ready)

@xlai I have linked related tickets for this story. I think the problems mentioned in this ticket also boil down to the unstable solution of forwarding the serial port from within the Windows server. I recommend you look into the proposed "socat" solution we discussed lately. Assigning to you and team scope "[virtualization]" for followup.

Actions #36

Updated by xlai about 2 years ago

  • Assignee changed from xlai to jstehlik

okurz wrote:

@xlai I have linked related tickets for this story. I think the problems mentioned in this ticket also boil down to the unstable solution of forwarding the serial port from within the Windows server. I recommend you look into the proposed "socat" solution we discussed lately. Assigning to you and team scope "[virtualization]" for followup.

@okurz I have the same feeling that this is related to the unstable serial forwarding issue. I can have a look, but this is not an area I am an expert in. And Nan, who is responsible for Hyper-V in the VT team, is too busy fulfilling test and automation requirements and won't have time for this for a very long time.

I thought this was the tools team's scope, and @jstehlik said he would talk with you further about the socat implementation. Jan, right?

Actions #37

Updated by okurz about 2 years ago

  • Target version set to future
Actions #38

Updated by okurz about 2 years ago

  • Related to action #107302: [qe-core] Work around serial console problems in Hyper-V added
Actions #39

Updated by okurz about 2 years ago

xlai wrote:

okurz wrote:

@xlai I have linked related tickets for this story. I think the problems mentioned in this ticket also boil down to the unstable solution of forwarding the serial port from within the Windows server. I recommend you look into the proposed "socat" solution we discussed lately. Assigning to you and team scope "[virtualization]" for followup.

@okurz I have the same feeling that this is related to the unstable serial forwarding issue. I can have a look, but this is not an area I am an expert in. And Nan, who is responsible for Hyper-V in the VT team, is too busy fulfilling test and automation requirements and won't have time for this for a very long time.

I thought this was the tools team's scope, and @jstehlik said he would talk with you further about the socat implementation. Jan, right?

Yes, the tools team can do everything ;) Well, we have to be realistic about what to expect. You have 7 members in your team, with your domain being virtualization including Hyper-V. We currently have 5 FTE plus 3 part-time workers, most of us hired to do software development plus hardware maintenance. We are already stretching our competences by taking over maintainership for backends that none of us has developed. We have little or no experience administering anything on Windows servers. I am sure your team will benefit if you build up the necessary competence to solve problems related to services on the Windows host. As I already explained in #105473#note-4, I think ensuring the necessary requirements from within the test automation code can have multiple benefits and would likely solve the problem and stabilize the setup. As you noted that currently only one person is responsible for Hyper-V, I suggest building up the competence within the team, so that we do not again end up in a situation where a single person leaving the team causes such damage, as happened previously when the members of the virtualization team who built the solutions you rely upon left. Of course you can decide on your own how to set priorities and plans for the individual tasks, since you mentioned that Nan is currently busy with other tasks.

If you consider other tasks, e.g. VMware-related backend implementations, less important than this task, then of course we can try to free up capacity to work on Hyper-V-related topics instead.

Actions #40

Updated by xlai about 2 years ago

  • Assignee deleted (jstehlik)

@okurz VMware 7.0 svirt backend support definitely has a much higher priority than this, because it has high and increasing business value.

Based on my understanding, the issue in this ticket is likely related to the serial console handling of the VM, which in turn is likely related to broken NPTP (Named Pipe TCP Proxy) settings on the Windows server. Windows Server 2019 can persistently store at most 9 ports in the NPTP configuration; once that limit is exceeded, all settings are lost. These settings set up the serial console redirection for the VMs, so if the configuration is lost, all openQA svirt-hyperv jobs will fail.
To solve the problem (at least for the virtualization tests), either manually restore the NPTP settings or, as I proposed in https://progress.opensuse.org/issues/105473#note-9, add code to the tests that checks and corrects the NPTP settings. The virtualization team can work on this test code enhancement, but for us this is a normal-priority issue and we can only work on it after other high-priority tasks are done. Of course, anyone interested in having this done earlier is welcome to contribute.
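The "check and correct" idea can be sketched as a pure reconcile step. This is not the proposal linked in the comment above and does not talk to NPTP: it assumes the current and desired configuration are available as TCP port to named-pipe mappings (the data format and action names are hypothetical) and computes which entries to remove and add. It also refuses desired configurations above the 9-port limit stated above, since those would wipe all settings.

```python
from typing import Dict, List, Tuple

# Limit as stated in the comment above: Windows Server 2019 persists at most
# 9 NPTP port entries; exceeding it loses all settings.
MAX_PERSISTED_PORTS = 9


def plan_nptp_fix(current: Dict[int, str], expected: Dict[int, str]) -> List[Tuple[str, int, str]]:
    """Return ordered (action, port, pipe) steps turning `current` into `expected`.

    Mappings are TCP port -> named pipe path. The format and the "remove"/"add"
    actions are hypothetical; applying them is left to whatever tool manages NPTP.
    """
    if len(expected) > MAX_PERSISTED_PORTS:
        raise ValueError(
            f"{len(expected)} ports requested, but only {MAX_PERSISTED_PORTS} persist"
        )
    steps: List[Tuple[str, int, str]] = []
    # Drop stale or wrong entries first so the add step never exceeds the limit.
    for port, pipe in sorted(current.items()):
        if expected.get(port) != pipe:
            steps.append(("remove", port, pipe))
    for port, pipe in sorted(expected.items()):
        if current.get(port) != pipe:
            steps.append(("add", port, pipe))
    return steps
```

After the configuration has been lost, `current` is empty and the plan re-adds every expected mapping; when nothing drifted, the plan is empty and the test can continue without touching the Windows host.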

Actions #41

Updated by openqa_review about 2 years ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: lvm+RAID1@svirt-hyperv-uefi
https://openqa.suse.de/tests/8292191

To prevent further reminder comments one of the following options should be followed:

  1. The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
  2. The openQA job group is moved to "Released" or "EOL" (End-of-Life)
  3. The bugref in the openQA scenario is removed or replaced, e.g. label:wontfix:boo1234

Expect the next reminder at the earliest in 28 days if nothing changes in this ticket.

Actions #43

Updated by openqa_review about 2 years ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: minimal+base_yast@svirt-hyperv
https://openqa.suse.de/tests/8496177#step/system_prepare/1

To prevent further reminder comments one of the following options should be followed:

  1. The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
  2. The openQA job group is moved to "Released" or "EOL" (End-of-Life)
  3. The bugref in the openQA scenario is removed or replaced, e.g. label:wontfix:boo1234

Expect the next reminder at the earliest in 28 days if nothing changes in this ticket.

Actions #44

Updated by openqa_review almost 2 years ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: minimal+base_yast@svirt-hyperv
https://openqa.suse.de/tests/8570379#step/system_prepare/1

To prevent further reminder comments one of the following options should be followed:

  1. The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
  2. The openQA job group is moved to "Released" or "EOL" (End-of-Life)
  3. The bugref in the openQA scenario is removed or replaced, e.g. label:wontfix:boo1234

Expect the next reminder at the earliest in 28 days if nothing changes in this ticket.

Actions #45

Updated by openqa_review almost 2 years ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: lvm+RAID1@svirt-hyperv
https://openqa.suse.de/tests/8752318#step/validate_lvm_raid1/1

To prevent further reminder comments one of the following options should be followed:

  1. The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
  2. The openQA job group is moved to "Released" or "EOL" (End-of-Life)
  3. The bugref in the openQA scenario is removed or replaced, e.g. label:wontfix:boo1234

Expect the next reminder at the earliest in 56 days if nothing changes in this ticket.

Actions #46

Updated by openqa_review over 1 year ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: minimal+base_yast@svirt-hyperv
https://openqa.suse.de/tests/9340996#step/first_boot/1

To prevent further reminder comments one of the following options should be followed:

  1. The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
  2. The openQA job group is moved to "Released" or "EOL" (End-of-Life)
  3. The bugref in the openQA scenario is removed or replaced, e.g. label:wontfix:boo1234

Expect the next reminder at the earliest in 112 days if nothing changes in this ticket.
