action #97745
open[virtualization][hyperv] ensure_serialdev_permissions fails for hyperv
Description
We noticed that ensure_serialdev_permissions fails for hyperv
https://openqa.suse.de/tests/6964523.
We collected another failure, but it is probably caused by the preparation test not being scheduled there:
https://openqa.suse.de/tests/6964495#step/validate_lvm_raid1/11
No problems were seen 13 days ago in https://openqa.suse.de/tests/6888660#step/system_prepare/9 using the same HYPERV_VERSION:2019
Updated by okurz over 3 years ago
- Category set to Regressions/Crashes
- Priority changed from Normal to High
- Target version set to Ready
I suggest carefully cross-checking which settings have changed since the last good run. I consider regressions from os-autoinst unlikely. If we do not find problems in os-autoinst, we should delegate to "QE Container & Public Cloud".
Updated by okurz over 3 years ago
- Project changed from openQA Project (public) to Containers and images
- Category deleted (Regressions/Crashes)
According to https://chat.suse.de/channel/testing?msg=L7YuMy2PQwPtMWt9r mloviska and pdostal know more, mloviska said:
"The VIRSH_HOSTNAME="win2k19.qa.suse.cz" test uses the new hyperv, which still kinda has some configuration issues; it would be better to use the old one for now. Not really sure why it has not reconnected the socat though. Pavel Dostál, do the migrated VMs use serial, please?"
Updated by mloviska over 3 years ago
- Project changed from Containers and images to 208
- Status changed from New to In Progress
- Assignee set to mloviska
Worked for me http://kepler.suse.cz/tests/6681
But let's keep this one open to see whether it occurs again.
Updated by mloviska over 3 years ago
- Status changed from In Progress to Feedback
- Priority changed from High to Normal
Updated by ilausuch over 3 years ago
Can we consider that this problem is fixed and close the ticket?
Updated by JERiveraMoya over 3 years ago
In the latest build we could only find this one, not sure if it is related: https://openqa.suse.de/tests/7403395#step/validate_lvm_raid1/11
For now we think it could be this: https://progress.opensuse.org/issues/100970
Updated by JERiveraMoya about 3 years ago
It is related, additional modules do not make any difference: https://openqa.suse.de/tests/7476991
Updated by mloviska about 3 years ago
I truly have no idea what is happening here. The other jobs are passing, so the serial line does not seem to get reset and the connection seems to stay active.
With lvm+RAID1@svirt-hyperv-uefi I have noticed the same behaviour regarding the serial console. On hyperv, the serial line is a named pipe forwarded over TCP/IP. While my jobs were running, I saw both connections active for as long as the jobs ran.
I have cloned the same problematic job on both hyperv servers:
In both cases the serial connection is active, hence the message "attaching console,wait ...connected!", but after a while it seems to freeze.
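The difference between "connected" and "still delivering data" described above can be checked with a small probe. This is only a minimal sketch, assuming the serial line is reachable as a plain TCP endpoint; the function name `probe_serial` and its return labels are hypothetical and not part of os-autoinst. A console that is legitimately idle will also show up as "silent", and attaching a second client to a port that a running job uses may disturb that job, so this would only make sense on an otherwise unused port.

```python
import socket


def probe_serial(host, port, timeout=5.0):
    """Classify a TCP-forwarded serial endpoint.

    Returns 'unreachable' if no connection can be made, 'closed' if the peer
    closes the connection, 'silent' if connected but no bytes arrive within
    the timeout, and 'alive' if at least one byte was received.
    """
    try:
        conn = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        # Refused, unreachable host, or the connect attempt timed out.
        return "unreachable"
    with conn:
        conn.settimeout(timeout)
        try:
            data = conn.recv(1024)
        except socket.timeout:
            # The TCP session exists, but nothing is being forwarded.
            return "silent"
        except OSError:
            return "closed"
        # An empty read means the peer shut the connection down.
        return "alive" if data else "closed"
```

A "silent" result while the guest is known to be printing to its serial console would match the freeze described in this comment.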
Updated by JERiveraMoya about 3 years ago
Thanks for taking a look. We also found this one in the new build, which is a bit different because it ended up in a graphical system after installation, but we identified the same problem when switching to the root console: https://openqa.suse.de/tests/7591216#step/integration_services/1
Updated by mloviska about 3 years ago
JERiveraMoya wrote:
Thanks for taking a look. We also found this one in the new build, which is a bit different because it ended up in a graphical system after installation, but we identified the same problem when switching to the root console: https://openqa.suse.de/tests/7591216#step/integration_services/1
HyperV just created a snapshot, not sure if that can have any effect on gnome. It does not really seem to be a problem related to the hypervisor itself.
Updated by openqa_review about 3 years ago
This is an autogenerated message for openQA integration by the openqa_review script:
This bug is still referenced in a failing openQA test: lvm+RAID1@svirt-hyperv-uefi
https://openqa.suse.de/tests/7656832
To prevent further reminder comments one of the following options should be followed:
- The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
- The openQA job group is moved to "Released" or "EOL" (End-of-Life)
- The bugref in the openQA scenario is removed or replaced, e.g.
label:wontfix:boo1234
Updated by openqa_review about 3 years ago
This is an autogenerated message for openQA integration by the openqa_review script:
This bug is still referenced in a failing openQA test: lvm+RAID1@svirt-hyperv-uefi
https://openqa.suse.de/tests/7793640
Updated by openqa_review about 3 years ago
This is an autogenerated message for openQA integration by the openqa_review script:
This bug is still referenced in a failing openQA test: lvm+RAID1@svirt-hyperv2016
https://openqa.suse.de/tests/7924467
Updated by JERiveraMoya about 3 years ago
- Related to action #102236: [qe-core] test fails in system_prepare - change of permissions fails added
Updated by apappas about 3 years ago
As Joaquín linked, this happens in the functional group too. From what I can tell, it seems intermittent. I have made a build group to force the issue to appear, but I cannot reliably trigger it.
The times it happened, the bootloader module warned that there was a boot parameter mismatch even though the video shows no obvious mismatch. The installation proceeded normally and then the test failed at the first assert_script_run, which happens to be in the ensure_serialdev_permissions module.
Here is the "normal" run with the failures https://openqa.suse.de/tests/latest?arch=x86_64&distri=sle&flavor=Online&machine=svirt-hyperv&test=default&version=15-SP4#next_previous
Updated by JERiveraMoya about 3 years ago
I took the time to re-trigger jobs for both hyperv machines in order to compare them in the latest product validation. This was the result:
https://openqa.suse.de/tests/overview?arch=&flavor=&machine=svirt-hyperv%2Csvirt-hyperv-uefi%2Csvirt-hyperv2016%2Csvirt-hyperv2016-uefi&test=&modules=&module_re=&distri=sle&version=15-SP4&build=79.1&groupid=129#
Updated by pdostal about 3 years ago
I don't know much about the Hyper-V svirt backend, but I noticed this:
1) The Named Pipe TCP Proxy tends to forget its configuration (probably when it's reopened).
2) When a test is running and someone reuses the serial port number, it will destroy both tests.
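The port reuse problem in point 2 could be caught before a job starts by checking the planned assignments for duplicates. The following is a minimal sketch, assuming the mapping of job to serial port is available as a plain dictionary; `find_port_conflicts` and the job identifiers are hypothetical names for illustration, not an existing openQA API.

```python
from collections import defaultdict


def find_port_conflicts(assignments):
    """Return {port: [job ids]} for every serial port claimed by more than one job.

    `assignments` maps a job identifier to the serial port number it uses.
    """
    jobs_by_port = defaultdict(list)
    for job_id, port in assignments.items():
        jobs_by_port[port].append(job_id)
    # Keep only ports that are shared, with job ids sorted for stable output.
    return {
        port: sorted(jobs)
        for port, jobs in jobs_by_port.items()
        if len(jobs) > 1
    }
```

An empty result means no two jobs share a port; any entry names the jobs that would interfere with each other.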
Updated by szarate about 3 years ago
- Related to action #103863: [JeOS 15-SP3 QU2 hyperv] - test fails in firstrun - setterm -blank 0' timed out added
Updated by openqa_review almost 3 years ago
This is an autogenerated message for openQA integration by the openqa_review script:
This bug is still referenced in a failing openQA test: minimal+base_yast@svirt-hyperv
https://openqa.suse.de/tests/8010746
Updated by jlausuch almost 3 years ago
- Project changed from 208 to openQA Tests (public)
- Assignee deleted (mloviska)
Moving out of the JeOS project; looking at this comment, this is more related to our infra than to JeOS itself.
Updated by maritawerner almost 3 years ago
Oli, I am moving the ticket to the tools team; if that is not correct, please reassign it to the right team.
Updated by maritawerner almost 3 years ago
- Project changed from openQA Tests (public) to openQA Infrastructure (public)
- Category deleted (Bugs in existing tests)
Updated by okurz almost 3 years ago
- Target version set to Ready
Given that we have recurring reminder comments, the issue still seems to be present, so we should look into recent occurrences and see what we can do.
Updated by openqa_review almost 3 years ago
This is an autogenerated message for openQA integration by the openqa_review script:
This bug is still referenced in a failing openQA test: lvm+RAID1@svirt-hyperv-uefi
https://openqa.suse.de/tests/8162314
Updated by okurz almost 3 years ago
- Assignee set to okurz
This looks related to the Windows server serial port forwarding instability issues mentioned elsewhere. I need to look up the corresponding ticket references and find out who can work on that, using socat or something similar as a replacement.
Updated by okurz almost 3 years ago
- Related to action #105536: Test module perform_installation times out on svirt-hyperv-uefi backend added
Updated by okurz almost 3 years ago
- Related to action #105521: svirt-hyperv backend seems to have trouble with getting script output added
Updated by okurz almost 3 years ago
- Related to coordination #95422: [MinimalVM][epic] separate hyperv from svirt backend added
Updated by okurz almost 3 years ago
- Related to action #40715: [hyperv] Hyper-V 2012 R2 serial console unstable added
Updated by okurz almost 3 years ago
- Subject changed from ensure_serialdev_permissions fails for hyperv to [virtualization][hyperv] ensure_serialdev_permissions fails for hyperv
- Assignee changed from okurz to xlai
- Target version deleted (Ready)
@xlai I have linked related tickets for this story. I think the problems mentioned in this ticket also boil down to the unstable solution of forwarding the serial port from within the Windows server. I recommend you look into the proposed "socat" solution we discussed lately. Assigning to you and team scope "[virtualization]" for followup.
Updated by xlai almost 3 years ago
- Assignee changed from xlai to jstehlik
okurz wrote:
@xlai I have linked related tickets for this story. I think the problems mentioned in this ticket also boil down to the unstable solution of forwarding the serial port from within the Windows server. I recommend you look into the proposed "socat" solution we discussed lately. Assigning to you and team scope "[virtualization]" for followup.
@okurz I have the same feeling that this is related to the unstable serial forwarding issue. I can have a look, but this is not an area that I am an expert in. And Nan, who is responsible for hyperv in the VT team, is too busy fulfilling test and automation requirements and won't have time for this for a very long time.
I thought this was in the tools team's scope, and @jstehlik said he would talk further with you about the socat implementation. Jan, right?
Updated by okurz almost 3 years ago
- Related to action #107302: [qe-core] Work around serial console problems in Hyper-V added
Updated by okurz almost 3 years ago
xlai wrote:
okurz wrote:
@xlai I have linked related tickets for this story. I think the problems mentioned in this ticket also boil down to the unstable solution of forwarding the serial port from within the Windows server. I recommend you look into the proposed "socat" solution we discussed lately. Assigning to you and team scope "[virtualization]" for followup.
@okurz I have the same feeling that this is related to the unstable serial forwarding issue. I can have a look, but this is not an area that I am an expert in. And Nan, who is responsible for hyperv in the VT team, is too busy fulfilling test and automation requirements and won't have time for this for a very long time.
I thought this was in the tools team's scope, and @jstehlik said he would talk further with you about the socat implementation. Jan, right?
Yes, the tools team can do everything ;) Well, we have to be realistic about what to expect. You have 7 members in your team, with your domain being virtualization including HyperV. We currently have 5 FTE + 3 part-time workers, most of us hired to do software development plus hardware maintenance. We are already stretching our competences by taking over maintainership for backends that none of us has developed. We have no or very little experience with administering anything on Windows servers. I am sure your team will benefit if you build up the necessary competence to solve problems related to services on the Windows host. As I already explained in #105473#note-4, I think ensuring the necessary requirements from within the test automation code can have multiple benefits and would likely solve the problem and stabilize the setup. As you noted that currently one person is responsible for HyperV, I suggest building up the competence within the team so that you do not again run into the situation where a single person leaving causes such damage, as happened previously with the members of the virtualization team who built the solutions that you rely upon. Of course you can decide on your own how you set priorities and plans for the individual tasks, as you mentioned that Nan is currently busy with other tasks.
If you consider other tasks, e.g. VMWare related backend implementations, less important than this task here, of course we can try to free up capacity to work on HyperV related topics instead.
Updated by xlai almost 3 years ago
- Assignee deleted (jstehlik)
@okurz Vmware 7.0 svirt backend support definitely has much higher priority than this because it has high and increasing business value.
Based on my understanding, the issue in this ticket is likely related to the serial console handling of the VM, which in turn is likely related to broken NPTP settings on the Windows server. Windows 2019 can support persistent configuration of at most 9 ports in NPTP; once that number is exceeded, all settings are lost. This setting sets up serial console redirection for the VM, so if the configuration is lost, all openQA svirt-hyperv jobs will fail.
To solve the problem (at least for virtualization tests), either the NPTP settings are recovered manually or, as I suggested in https://progress.opensuse.org/issues/105473#note-9, a check is added to the test code that verifies and corrects the NPTP settings. The virtualization team can work on this test code enhancement. But for us this is a normal-priority issue, and we can only work on it after other high-priority tasks are done. Of course, anyone interested in having this done earlier can contribute too.
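The proposed "check and correct" step could start from a comparison like the one below. This is a minimal sketch under stated assumptions: how the currently configured ports are read from NPTP on the Windows host is not covered here, the function name `check_nptp_config` is hypothetical, and the limit of 9 is taken from this comment rather than from any NPTP documentation.

```python
# Limit of persistently configured ports as reported in this ticket
# for Windows 2019; treat it as an assumption, not a documented value.
MAX_PERSISTENT_PORTS = 9


def check_nptp_config(configured, required, limit=MAX_PERSISTENT_PORTS):
    """Compare the configured serial ports with the ports the jobs need.

    Returns a dict with:
      'missing'    - required ports that are not configured, sorted
      'over_limit' - True if adding the missing ports would exceed `limit`
    """
    configured = set(configured)
    required = set(required)
    return {
        "missing": sorted(required - configured),
        "over_limit": len(configured | required) > limit,
    }
```

If `missing` is non-empty, the test code could restore those entries before starting the VM; if `over_limit` is true, adding them would cross the reported threshold at which all settings are lost, so an unused entry would have to be removed first.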
Updated by openqa_review almost 3 years ago
This is an autogenerated message for openQA integration by the openqa_review script:
This bug is still referenced in a failing openQA test: lvm+RAID1@svirt-hyperv-uefi
https://openqa.suse.de/tests/8292191
Expect the next reminder at the earliest in 28 days if nothing changes in this ticket.
Updated by openqa_review almost 3 years ago
This is an autogenerated message for openQA integration by the openqa_review script:
This bug is still referenced in a failing openQA test: minimal+base_yast@svirt-hyperv
https://openqa.suse.de/tests/8496177#step/system_prepare/1
Expect the next reminder at the earliest in 28 days if nothing changes in this ticket.
Updated by openqa_review over 2 years ago
This is an autogenerated message for openQA integration by the openqa_review script:
This bug is still referenced in a failing openQA test: minimal+base_yast@svirt-hyperv
https://openqa.suse.de/tests/8570379#step/system_prepare/1
Expect the next reminder at the earliest in 28 days if nothing changes in this ticket.
Updated by openqa_review over 2 years ago
This is an autogenerated message for openQA integration by the openqa_review script:
This bug is still referenced in a failing openQA test: lvm+RAID1@svirt-hyperv
https://openqa.suse.de/tests/8752318#step/validate_lvm_raid1/1
Expect the next reminder at the earliest in 56 days if nothing changes in this ticket.
Updated by openqa_review over 2 years ago
This is an autogenerated message for openQA integration by the openqa_review script:
This bug is still referenced in a failing openQA test: minimal+base_yast@svirt-hyperv
https://openqa.suse.de/tests/9340996#step/first_boot/1
Expect the next reminder at the earliest in 112 days if nothing changes in this ticket.