action #60662
closed[SLE][Migration][SLE15SP2][Regression] Some regression tests fails on ppc64le - /dev/hvc0: permission denied
100%
Description
Observation¶
openQA test in scenario sle-15-SP2-Regression-on-Migration-from-SLE15-SPX-to-SLE15-SP2-ppc64le-online_sles15_pscc_Addons-basesys-srv-desk-dev-contm-lgm-pcm-wsm_all_full@ppc64le fails in
curl_https
Test suite description¶
Reproducible¶
Fails since (at least) Build 80.5
Further details¶
Always latest result in this scenario: latest
Updated by hjluo over 4 years ago
- Status changed from New to In Progress
- % Done changed from 0 to 20
rerun it in 108.1 to check it it still hit this issue.
https://openqa.suse.de/tests/3739548 this case failed at zypper migration
reason: proxy scc is not working during zypper migration
Updated by hjluo over 4 years ago
- Status changed from In Progress to Blocked
- % Done changed from 20 to 30
migration was not completed and can't reach curl_https module.
Updated by hjluo over 4 years ago
another one: https://openqa.suse.de/tests/3739608
Updated by hjluo over 4 years ago
skip module install_service and check_upgraded_service and rerun https://openqa.suse.de/tests/375091
Updated by hjluo over 4 years ago
- % Done changed from 30 to 40
it looks like the connection that curl to https://eu.httpbin.org was timedout sometimes.
curl -f -v https://eu.httpbin.org/get failed: script timeout: curl -f -v https://httpbin.org/get 2>&1 at /usr/lib/os-autoinst/testapi.pm line 1091.
[2019-12-04T08:45:27.131 CET] [debug] Maybe the network is busy. Retry: 7 of 7
[2019-12-04T08:45:27.131 CET] [debug] curl failed (with retries) at /var/lib/openqa/cache/openqa.suse.de/tests/sle/tests/console/curl_https.pm line 36.
curl_https-dmesg.log:[ 0.000000] Kernel command line: BOOT_IMAGE=/boot/vmlinux-5.3.13-1-default root=UUID=567b5484-b2b4-40b4-85a1-b19d7550fc9a plymouth.ignore-serial-consoles console=hvc0 console=tty splash=silent showopts crashkernel=640M
Updated by hjluo over 4 years ago
now it passed on 108.1 build: https://openqa.suse.de/tests/3750916#step/curl_https/4
Updated by hjluo over 4 years ago
try console=tty console=hvc0 in kernel boot parameters
Updated by tinawang123 over 4 years ago
Related test case:
https://openqa.suse.de/tests/3767728#step/curl_https/5
Updated by hjluo over 4 years ago
- % Done changed from 40 to 50
now it was blocked by patching it.
https://openqa.suse.de/tests/3783366#step/zypper_migration/21
https://openqa.suse.de/tests/3787514#step/zypper_migration/21
Updated by leli over 4 years ago
- Subject changed from [SLE][Migration][SLE15SP2]test fails in curl_https - permission denied to [SLE][Migration][SLE15SP2][Regression]test fails in curl_https - permission denied
Updated by hjluo about 4 years ago
the test case name is online_sles15_pscc_lp-basesys-srv-desk-dev-contm-lgm-wsm_all_full on ppc64le. On latest build 132.1 I saw this case passed.
and in 132.1 blocked at zypper migration for poo#63184
Updated by hjluo about 4 years ago
- Status changed from Blocked to Resolved
- % Done changed from 50 to 100
till 142.1 didn't hit this issue so we'll close this by now and will reopen it once the test case hit this case again.
Updated by tinawang123 about 4 years ago
- Status changed from Resolved to In Progress
Updated by hjluo about 4 years ago
- % Done changed from 100 to 60
actually it's not the permission issue, just because the shell output is not fully returned.
Updated by hjluo about 4 years ago
verify run with enlarging timeout https://openqa.nue.suse.com/tests/3975555 passed.
and the previous failed one is: http://openqa.suse.de/tests/3971265
Updated by hjluo about 4 years ago
- Subject changed from [SLE][Migration][SLE15SP2][Regression]test fails in curl_https - permission denied to [SLE][Migration][SLE15SP2][Regression] Some regression tests fails on ppc64le - /dev/hvc0: permission denied
This kind of issue includes curl_https,salt,mtab, zypper_lifecycle,keymap_or_locale_x11,sshxterm
Updated by coolgw about 4 years ago
- Related to action #60935: [SLE][Migration][SLE15SP2][Regression]test fails in salt - salt-run timeout added
Updated by okurz about 4 years ago
This is an autogenerated message for openQA integration by the openqa_review script:
This bug is still referenced in a failing openQA test: offline_sles15sp1_pscc_lp-basesys-srv-desk-dev-contm-lgm-py2-tsm-wsm_all_full
https://openqa.suse.de/tests/4042033
To prevent further reminder comments one of the following options should be followed:
- The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
- The openQA job group is moved to "Released"
- The label in the openQA scenario is removed
Updated by hjluo about 4 years ago
- % Done changed from 60 to 70
Updated by hjluo about 4 years ago
in the logfile we can see the console wait time of 3o is not enough
https://openqa.nue.suse.com/tests/4042033/file/autoinst-log.txt
[2020-03-26T16:02:31.936 CET] [debug] tests/console/curl_https.pm:24 called testapi::select_console -> lib/susedistribution.pm:884 called testapi::assert_screen
[2020-03-26T16:02:31.936 CET] [debug] <<< testapi::assert_screen(mustmatch="user-console", no_wait=1, timeout=30)
[2020-03-26T16:02:32.050 CET] [debug] >>> testapi::_handle_found_needle: found user-console-20191207, similarity 1.00 @ 145/210
[2020-03-26T16:02:32.050 CET] [debug] tests/console/curl_https.pm:29 called (eval) -> tests/console/curl_https.pm:30 called testapi::validate_script_output
[2020-03-26T16:02:32.050 CET] [debug] <<< testapi::type_string(string="cat - > /tmp/script4V9zF.sh;\n", max_interval=250, wait_screen_changes=0, wait_still_screen=0, timeout=30, similarity_level=47)
[2020-03-26T16:02:35.132 CET] [debug] tests/console/curl_https.pm:29 called (eval) -> tests/console/curl_https.pm:30 called testapi::validate_script_output
[2020-03-26T16:02:35.133 CET] [debug] <<< testapi::type_string(string="curl -f -v https://httpbin.org/get 2>&1\n", max_interval=250, wait_screen_changes=0, wait_still_screen=0, timeout=30, similarity_level=47)
[2020-03-26T16:02:39.327 CET] [debug] tests/console/curl_https.pm:29 called (eval) -> tests/console/curl_https.pm:30 called testapi::validate_script_output
[2020-03-26T16:02:39.328 CET] [debug] <<< testapi::send_key(key="ctrl-d", wait_screen_change=0, do_wait=0)
[2020-03-26T16:02:39.929 CET] [debug] tests/console/curl_https.pm:29 called (eval) -> tests/console/curl_https.pm:30 called testapi::validate_script_output
[2020-03-26T16:02:39.929 CET] [debug] <<< testapi::type_string(string="(echo 4V9zF; bash -eox pipefail /tmp/script4V9zF.sh ; echo SCRIPT_FINISHED4V9zF-\$?-) | tee /dev/hvc0\n", max_interval=250, wait_screen_changes=0, wait_still_screen=0, timeout=30, similarity_level=47)
[2020-03-26T16:02:51.402 CET] [debug] tests/console/curl_https.pm:29 called (eval) -> tests/console/curl_https.pm:30 called testapi::validate_script_output
[2020-03-26T16:02:51.403 CET] [debug] <<< testapi::wait_serial(quiet=undef, regexp="SCRIPT_FINISHED4V9zF-\d+-", timeout=30, expect_not_found=0, no_regex=0, buffer_size=undef, record_output=1)
[2020-03-26T16:03:22.536 CET] [debug] >>> testapi::wait_serial: SCRIPT_FINISHED4V9zF-\d+-: fail
[2020-03-26T16:03:22.537 CET] [debug] curl -f -v https://eu.httpbin.org/get failed: script timeout: curl -f -v https://httpbin.org/get 2>&1 at /usr/lib/os-autoinst/testapi.pm line 1104.
[2020-03-26T16:03:22.537 CET] [debug] Maybe the network is busy. Retry: 1 of 7
[2020-03-26T16:03:22.537 CET] [debug] tests/console/curl_https.pm:29 called (eval) -> tests/console/curl_https.pm:30 called testapi::validate_script_output
[2020-03-26T16:03:22.537 CET] [debug] <<< testapi::type_string(string="cat - > /tmp/script4V9zF.sh;\n", max_interval=250, wait_screen_changes=0, wait_still_screen=0, timeout=30, similarity_level=47)
[2020-03-26T16:03:25.606 CET] [debug] tests/console/curl_https.pm:29 called (eval) -> tests/console/curl_https.pm:30 called testapi::validate_script_output
[2020-03-26T16:03:25.607 CET] [debug] <<< testapi::type_string(string="curl -f -v https://httpbin.org/get 2>&1\n", max_interval=250, wait_screen_changes=0, wait_still_screen=0, timeout=30, similarity_level=47)
[2020-03-26T16:03:29.782 CET] [debug] tests/console/curl_https.pm:29 called (eval) -> tests/console/curl_https.pm:30 called testapi::validate_script_output
[2020-03-26T16:03:29.782 CET] [debug] <<< testapi::send_key(key="ctrl-d", wait_screen_change=0, do_wait=0)
[2020-03-26T16:03:30.383 CET] [debug] tests/console/curl_https.pm:29 called (eval) -> tests/console/curl_https.pm:30 called testapi::validate_script_output
[2020-03-26T16:03:30.383 CET] [debug] <<< testapi::type_string(string="(echo 4V9zF; bash -eox pipefail /tmp/script4V9zF.sh ; echo SCRIPT_FINISHED4V9zF-\$?-) | tee /dev/hvc0\n", max_interval=250, wait_screen_changes=0, wait_still_screen=0, timeout=30, similarity_level=47)
[2020-03-26T16:03:41.870 CET] [debug] tests/console/curl_https.pm:29 called (eval) -> tests/console/curl_https.pm:30 called testapi::validate_script_output
[2020-03-26T16:03:41.870 CET] [debug] <<< testapi::wait_serial(quiet=undef, timeout=30, regexp="SCRIPT_FINISHED4V9zF-\d+-", record_output=1, buffer_size=undef, expect_not_found=0, no_regex=0)
[2020-03-26T16:04:12.999 CET] [debug] >>> testapi::wait_serial: SCRIPT_FINISHED4V9zF-\d+-: fail
[2020-03-26T16:04:13.000 CET] [debug] curl -f -v https://eu.httpbin.org/get failed: script timeout: curl -f -v https://httpbin.org/get 2>&1 at /usr/lib/os-autoinst/testapi.pm line 1104.
Updated by hjluo about 4 years ago
- % Done changed from 70 to 80
now we use the script_run to avoid the console output which may be caused by the valide_script_output. with new fix it will works even
we hit the /dev/hvc0 permission denied.
http://149.44.176.58/tests/4072395#step/curl_https/8 ==>permission denied
http://149.44.176.58/tests/4072613#step/curl_https/2 ==>no permission denied
Updated by coolgw about 4 years ago
also happen on curl
https://openqa.nue.suse.com/tests/4103400#
Updated by hjluo about 4 years ago
Oliver asked for a change request
'There are already many test modules that call ensure_serialdev_permissions, not every unrelated test module should do the same. In your test run already https://openqa.nue.suse.com/tests/4079380#step/system_prepare/8 does the necessary parts. If something has gone missing than this is most likely about either loosing the group for the user or using a different user. This should be fixed rather than putting architecture specific single fixes in arbitrary test modules.'
we need check where we 'loosing the group for the user or using a different user' after the system_prepare.
Updated by hjluo about 4 years ago
clone and run without all service_check to see if our services have missed up the permission on ppc64le.
https://openqa.nue.suse.com/tests/4114809.
this case passed without /dev/hvc0 permission denied when select_console 'user-console'.
Updated by hjluo about 4 years ago
clone without install_service and check_upgraded_service and it passed without permission denied issue in /dev/hvc0, so we think the
issue was triggered by install_service and check_upgraded_service, then we re-clone it to just include install_service to see the result
the job is: http://openqa.suse.de/t4116232 and it passed.
Updated by hjluo about 4 years ago
run with both install_service and check_upgraded_service added https://openqa.nue.suse.com/tests/4116299
result curl_https and matab failed with /dev/hvc0 permission denied.
Updated by hjluo about 4 years ago
run with delete select_console 'root-console' in apparmor.pm in service check.
https://openqa.nue.suse.com/tests/4116300 FAILED.
Updated by hjluo about 4 years ago
run with delete of ensure_serialdev_permissions in keymap_or_locale.pm http://openqa.suse.de/t4117050, still failed.
Updated by hjluo about 4 years ago
Run with remove select_console 'root-console' in check_upgraded_service and INCLUDE_SERVICES=apache2 http://openqa.suse.de/t4122658
Updated by hjluo about 4 years ago
run with exclude_services=kdump http://openqa.suse.de/t4122649 PASSED,
rerun another one with EXCLUDE_SERVICES=kdump http://openqa.suse.de/t4123407 PASSED
so the kdump in check_upgraded_service was the culprit.¶
EXCLUDE_SERVICES=yast2-nfs-server,kdump http://openqa.suse.de/t4123042 PASSED
RUN with fix in lib/kdump_utils.pm http://openqa.suse.de/t4123495 failed at system_prepare.
http://openqa.suse.de/t4127984 FAILED
Updated by hjluo about 4 years ago
Now rerun with fix in kdump
http://openqa.nue.suse.com/t4159973
http://openqa.nue.suse.com/t4160279 PASSED which restore the serialdev_permission after panic reboot.
check_function();
if ($stage ne 'before') {
ensure_serialdev_permissions;
}
Updated by hjluo about 4 years ago
AS Gaowei suggested to run just kdump in service_check to verify that kdump is the culprit of this mess.
barry:~/:[0]# /usr/share/openqa/script/clone_job.pl --from http://openqa.suse.de --host http://openqa.suse.de 4152563 --apikey XXXX --apisecret XXXX _GROUP=0 --skip-download INCLUDE_SERVICES=kdump
Created job #4159854: sle-15-SP2-Regression-on-Migration-from-SLE15-SPX-to-SLE15-SP2-ppc64le-Build181.4-offline_sles15sp1_pscc_lp-basesys-srv-desk-dev-contm-lgm-py2-tsm-wsm_all_full@ppc64le -> http://openqa.suse.de/t4159854
barry:~/:[0]# /usr/share/openqa/script/clone_job.pl --from http://openqa.suse.de --host http://openqa.suse.de 4154324 --apikey XXXX --apisecret XXXXX _GROUP=0 --skip-download INCLUDE_SERVICES=kdump
Created job #4159855: sle-15-SP2-Regression-on-Migration-from-SLE15-SPX-to-SLE15-SP2-ppc64le-Build181.4-offline_sles15_pscc_lp-basesys-srv-desk-dev-contm-lgm-wsm_all_full@ppc64le -> http://openqa.suse.de/t4159855
and the results VERIFIED that kdump is the culprit for the permission denied for /dev/hvc0
http://149.44.176.58/tests/4159854#step/curl_https/5
http://149.44.176.58/tests/4159855#step/curl_https/5
Updated by hjluo about 4 years ago
rerun with fix for kdump after 2nd reboot.
http://openqa.nue.suse.com/t4165103
http://openqa.nue.suse.com/t4165104
Updated by hjluo about 4 years ago
rerun with debugging code:
- script_run 'ls -l /dev/hvc*'; check_function(); +
- if ($stage ne 'before') {
- script_run 'ls -l /dev/hvc*';
- select_console 'user-console';
http://openqa.nue.suse.com/t4165135
http://openqa.nue.suse.com/t4165136
you can see from http://149.44.176.58/tests/4165136#step/check_upgraded_service/5, ->right owner
before the system panic which was issued by kdump, before system crash it's owner is right.
but after the reboot the owner back to root. which caused select_console 'user-console' failed.
this debugging code reproduced the issue.
http://149.44.176.58/tests/4165136#step/check_upgraded_service/31 -->root /dev/hvc0
http://149.44.176.58/tests/4165136#step/check_upgraded_service/37 -->select_console 'user-console'
Updated by hjluo almost 4 years ago
Updated by hjluo almost 4 years ago
- Status changed from In Progress to Resolved
- % Done changed from 80 to 100
PR merged.