action #55100

closed

[hyperv] Need to delete ISO with issue when checksum does not match for ISO happens

Added by xlai over 4 years ago. Updated over 4 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: Enhancement to existing tests
Target version: -
Start date: 2019-08-05
Due date:
% Done: 0%
Estimated time:
Difficulty:

Description

All VMware and Hyper-V jobs in the virtualization job group fail with a similar error: https://openqa.suse.de/tests/3204511#step/welcome/10

Need to find out why the checksum does not match, and fix it.

#1

Updated by xlai over 4 years ago

  • Assignee set to xlai
#2

Updated by xlai over 4 years ago

  • Subject changed from [vmware&hyperv] Checksum does not match for ISO to [hyperv] Checksum does not match for ISO
#3

Updated by xlai over 4 years ago

One correction to the description:

VMware tests are not affected; only Hyper-V tests are.

#4

Updated by xlai over 4 years ago

SSH to the Windows Server machine ("ssh -X Administrator@openqaw7-hyperv.qa.suse.de") and manually check the checksum with "sha256sum C:\cache\SLE-12-SP5-Server-DVD-x86_64-Build0261-Media1.iso": the checksum is wrong, so the failure is expected. My guess is that, for whatever reason, the downloaded ISO file is incomplete or corrupted.

Workaround:

  • manually remove the ISO file from the Windows Server machine openqaw7-hyperv.qa.suse.de with "DEL C:\cache\[iso file]"
  • mount the openQA NFS share with "mount \\openqa.suse.de\var\lib\openqa\share\factory N:"
  • copy the ISO file to the cache with "copy /Z /Y N:\iso\[iso file] C:\cache\"
  • make sure the checksum is correct by comparing the output of "sha256sum C:\cache\[iso file]" with the CHECKSUM_ISO setting of the openQA test suite
  • repeat the above steps until the checksum is correct

TODO: Automation enhancement:
Remove the downloaded ISO when the checksum check fails.
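The check this TODO asks for can be sketched as follows. This is a minimal Python illustration of the logic only, not the actual os-autoinst-distri-opensuse code (the real test drives Windows commands over SSH); the helper names `sha256_of` and `verify_or_remove` are hypothetical, and `expected` stands for the value of the CHECKSUM_ISO setting.

```python
import hashlib
import os


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks to keep memory low."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_or_remove(path: str, expected: str) -> bool:
    """Hypothetical helper: keep the cached ISO only if its checksum matches.

    Returns True if the file matches `expected` (e.g. the CHECKSUM_ISO value).
    Otherwise deletes the file, so the next run copies it again, and returns False.
    """
    if not os.path.exists(path):
        return False
    if sha256_of(path) == expected.strip().lower():
        return True
    # Broken or incomplete copy: remove it instead of reusing it on the next run.
    os.remove(path)
    return False
```

On a mismatch the file is gone, so a later "if not exist" check no longer finds a stale ISO and the copy is attempted again.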

#5

Updated by xlai over 4 years ago

  • Category set to Enhancement to existing tests

Successful job after recovering the ISO: https://openqa.nue.suse.com/tests/3238057#

So I changed this ticket's category to 'Enhancement to existing tests' and leave the enhancement to the incoming owner of the Hyper-V tests.

#6

Updated by xlai over 4 years ago

  • Subject changed from [hyperv] Checksum does not match for ISO to [hyperv] Need to delete ISO with issue when checksum does not match for ISO happens
#7

Updated by xlai over 4 years ago

It keeps reproducing for new build runs. The root cause is that copying the new build's ISO is too slow and fails on the first try, while later tries only check whether the needed ISO already exists in the cache and, if it does, skip the copy. So the incomplete ISO is used from then on.

To avoid this, we need the code change suggested earlier, which deletes the incomplete ISO when the copy fails. If the disk were faster, this issue should not happen either.
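The failure mode comes from the "if not exist ... ( copy ... )" guard: an interrupted copy leaves a partial file under the final name, and every later run skips the copy. Besides deleting the broken file, a common alternative (not what this ticket implemented) is to copy under a temporary name and rename only after the copy succeeds. A minimal Python sketch, assuming a hypothetical `cache_iso` helper and a `.part` suffix for in-progress copies:

```python
import os
import shutil


def cache_iso(src: str, cache_dir: str) -> str:
    """Hypothetical helper: copy src into cache_dir without ever exposing a partial file.

    The data is written to '<name>.part' first and only renamed to '<name>'
    once the copy has completed without raising an error.
    """
    final = os.path.join(cache_dir, os.path.basename(src))
    if os.path.exists(final):
        # Same idea as the 'if not exist' check, but now 'final' can only
        # exist if an earlier copy ran to completion.
        return final
    tmp = final + ".part"
    try:
        shutil.copyfile(src, tmp)
        os.replace(tmp, final)  # rename only after the copy succeeded
    except BaseException:
        # Do not leave a half-written file behind for the next run.
        if os.path.exists(tmp):
            os.remove(tmp)
        raise
    return final
```

A completed copy is not necessarily a correct one, so this pattern complements a checksum check rather than replacing it.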

Failing job https://openqa.nue.suse.com/tests/3243007 and the key log lines:

31% copied 0 file(s) copied.
[2019-08-13T19:47:01.878 CEST] [debug] Command's stderr:
The semaphore timeout period has expired.
[2019-08-13T19:47:01.878 CEST] [debug] Command executed: 'if not exist C:\cache\SLE-12-SP5-Server-DVD-x86_64-Build0265-Media1.iso ( copy /Z /Y N:\iso\SLE-12-SP5-Server-DVD-x86_64-Build0265-Media1.iso C:\cache\ )', ret=1
[2019-08-13T19:47:02.123 CEST] [debug] Command on Hyper-V returned: 1
[2019-08-13T19:47:02.123 CEST] [debug] /var/lib/openqa/cache/openqa.suse.de/tests/sle/tests/installation/bootloader_hyperv.pm:26 called testapi::console
[2019-08-13T19:47:02.123 CEST] [debug] <<< testapi::console(testapi_console='svirt')
[2019-08-13T19:47:02.123 CEST] [debug] /var/lib/openqa/cache/openqa.suse.de/tests/sle/tests/installation/bootloader_hyperv.pm:26 called backend::console_proxy::ANON
[2019-08-13T19:47:02.123 CEST] [debug] <<< backend::console_proxy::ANON(wrapped_call={
'function' => 'run_cmd',
'args' => [
'if not exist C:\cache\SLE-12-SP5-Server-DVD-x86_64-Build0265-Media1.iso ( copy /Z /Y N:\iso\fixed\SLE-12-SP5-Server-DVD-x86_64-Build0265-Media1.iso C:\cache\ )'
],
'console' => 'svirt'
})
[2019-08-13T19:47:02.125 CEST] [debug] <<< backend::svirt::run_cmd(Net::SSH2=SCALAR(0x564088cb1898)='if not exist C:\cache\SLE-12-SP5-Server-DVD-x86_64-Build0265-Media1.iso ( copy /Z /Y N:\iso\fixed\SLE-12-SP5-Server-DVD-x86_64-Build0265-Media1.iso C:\cache\ )')
[2019-08-13T19:47:02.141 CEST] [debug] Command executed: 'if not exist C:\cache\SLE-12-SP5-Server-DVD-x86_64-Build0265-Media1.iso ( copy /Z /Y N:\iso\fixed\SLE-12-SP5-Server-DVD-x86_64-Build0265-Media1.iso C:\cache\ )', ret=0
[2019-08-13T19:47:02.347 CEST] [debug] Command on Hyper-V returned: 0

#8

Updated by xlai over 4 years ago

I found a way to avoid the semaphore timeout error shown above: disabling the firewall. With the firewall disabled, the copy no longer fails, so the automation enhancement to delete an ISO with a checksum error is no longer urgent. I will leave it to the newcomer replacing michal.

Commands to disable the firewall on Windows:

To turn it off:
NetSh Advfirewall set allprofiles state off
To check the status of Windows Firewall:
Netsh Advfirewall show allprofiles

#9

Updated by xlai over 4 years ago

Although the firewall is now disabled on the Hyper-V servers, download issues still happen from time to time: build 283 passed while build 263 hit the issue. So I made the code enhancement that lets the test delete a broken ISO, so that a rerun or a new test can start the download again without manually deleting the ISO on the Windows server.

PR https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/8221

#10

Updated by xlai over 4 years ago

  • Status changed from New to Resolved
#11

Updated by okurz over 4 years ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: textmode_svirt@svirt-hyperv2012r2-uefi
https://openqa.suse.de/tests/3449500

To prevent further reminder comments one of the following options should be followed:

  1. The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
  2. The openQA job group is moved to "Released"
  3. The label in the openQA scenario is removed