action #104304: Crosscheck results of https://github.com/os-autoinst/os-autoinst#verifying-a-runtime-environment on arm-1/2/3 vs. arm-4/5 to find out if arm-4/5 are "typing stable" size:M - openQA Infrastructure (public) - openSUSE Project Management Tool

action #104304

closed

openQA Project (public) - coordination #101048: [epic] Investigate and fix higher instability of openqaworker-arm-4/5 vs. arm-1/2/3

Crosscheck results of https://github.com/os-autoinst/os-autoinst#verifying-a-runtime-environment on arm-1/2/3 vs. arm-4/5 to find out if arm-4/5 are "typing stable" size:M

Added by okurz over 3 years ago. Updated over 3 years ago.

Status: Resolved
Priority: High
Assignee: mkittler
Category:
Target version: openQA Project (public) - Ready
Start date: 2021-12-22
Due date:
% Done:
Estimated time:

Description

Acceptance criteria

AC1: Reliable results exist from all of arm-1/2/3 and arm-4/5 from https://github.com/os-autoinst/os-autoinst#verifying-a-runtime-environment, both with and without KVM

Suggestions

Run https://github.com/os-autoinst/os-autoinst#verifying-a-runtime-environment, e.g. using for i in 1 2 3 4 5; do ssh openqaworker-arm-$i "…"; done, and compare results
Try to extend the command with access to KVM, e.g. podman -v /dev/kvm … and patch out https://github.com/os-autoinst/os-autoinst/blob/master/t/99-full-stack.t#L38 "QEMU_NO_KVM" : "1", on the fly, e.g. using sed -i '/QEMU_NO_KVM/d' /opt/t/99-full-stack.t on the podman command line. Compare results to the above w/o KVM
As necessary run multiple times, e.g. use https://github.com/okurz/scripts/blob/master/count_fail_ratio in combination with the above commands
Make sense of all the results: what do they mean regarding typing stability of arm-4+5 :)
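The "run multiple times" suggestion can be sketched as a small helper that repeats a command and counts failures. This is only an illustration in plain POSIX shell, not the linked count_fail_ratio script; the function name fail_ratio and its output format are assumptions made here.

```shell
# Hypothetical helper: run a command RUNS times and report how often it failed.
# Usage: fail_ratio RUNS COMMAND [ARGS...]
fail_ratio() {
    runs=$1
    shift
    fails=0
    i=1
    while [ "$i" -le "$runs" ]; do
        # Count every non-zero exit status as one failure.
        "$@" >/dev/null 2>&1 || fails=$((fails + 1))
        i=$((i + 1))
    done
    echo "fails: $fails of $runs"
}
```

On a worker, COMMAND would be the podman invocation from the os-autoinst README; the same count per host then allows a direct comparison between arm-1/2/3 and arm-4/5.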


Updated by okurz over 3 years ago

Project changed from openQA Project (public) to openQA Infrastructure (public)
Target version set to Ready


Updated by okurz over 3 years ago

Subject changed from Crosscheck results of https://github.com/os-autoinst/os-autoinst#verifying-a-runtime-environment on arm-1/2/3 vs. arm-4/5 to find out if arm-4/5 are "typing stable" to Crosscheck results of https://github.com/os-autoinst/os-autoinst#verifying-a-runtime-environment on arm-1/2/3 vs. arm-4/5 to find out if arm-4/5 are "typing stable" size:M
Description updated (diff)
Status changed from New to Workable


Updated by mkittler over 3 years ago

Assignee set to mkittler


Updated by mkittler over 3 years ago

Status changed from Workable to In Progress

I've not come very far:

martchus@openqa:~> sudo salt 'openqaworker-arm*' cmd.run 'bash -c "ok=0; for i in {1..5}; do echo \"i: \$i\" && podman run --pull=always --rm -it --entrypoint \"\" registry.opensuse.org/devel/openqa/containers/os-autoinst_dev:latest /bin/sh -c \"git -C /opt clone --depth 1 https://github.com/os-autoinst/os-autoinst && make -C /opt/os-autoinst/ test-perl-testsuite TESTS=t/99-full-stack.t\" && ok=\$((ok + 1)); done; echo \"ok: \$ok of 5\" "'
openqaworker-arm-2.suse.de:
    i: 1
    Trying to pull registry.opensuse.org/devel/openqa/containers/os-autoinst_dev:latest...
    Getting image source signatures
    Copying blob sha256:32d063c8d4f5f084c9410a80820cbdbcea5072ad6960ccf906d74eb4c910b485
    Copying blob sha256:0c9b09d8f0b99b7a3c2d3f9eb86e60098ec27e3f3a9f76ee12cfe66f8569db6a
    Copying config sha256:1dc3c70f60a100e83d2ae29414f4ade5f268cf4771ec2fe9307b55ac477979c9
    Writing manifest to image destination
    Storing signatures
    standard_init_linux.go:228: exec user process caused: exec format error

It works locally with Docker. I suppose the podman setup is broken on those workers.
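An "exec format error" usually means the binaries in the image were built for a different CPU architecture than the host. A quick way to rule that out is to compare the host architecture with the one recorded in the image metadata. The sketch below assumes the common mapping between uname -m values and image architecture names; the helper names oci_arch and arch_matches are made up for this example.

```shell
# Map `uname -m` output to the architecture name used in container image metadata.
oci_arch() {
    case "$1" in
        x86_64) echo amd64 ;;
        aarch64) echo arm64 ;;
        *) echo "$1" ;;
    esac
}

# Succeed if the host architecture ($1, as printed by `uname -m`)
# matches the image architecture ($2, e.g. "arm64").
arch_matches() {
    host_arch=$(oci_arch "$1")
    [ "$host_arch" = "$2" ]
}

# Possible usage on a worker (requires podman and an already pulled image):
#   arch_matches "$(uname -m)" "$(podman image inspect --format '{{.Architecture}}' IMAGE)"
```

If the check fails on the ARM workers, the image rather than the podman setup is the more likely culprit.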


Updated by openqa_review over 3 years ago

Due date set to 2022-01-25

Setting due date based on mean cycle time of SUSE QE Tools


Updated by mkittler over 3 years ago

The problem was that we ceased to build the container image for aarch64. I re-enabled the build and with the current image it generally runs. However, I've run into the next issue:

# [2022-01-11T11:42:55.496453Z] [debug] <<< testapi::assert_script_run(cmd="wget -q http://10.0.2.2:15223/nOlEDGIpPA/files/modified.xml", fail_message="", quiet=undef, timeout=90)
# [2022-01-11T11:42:55.496999Z] [debug] tests/modify_and_upload_file.pm:33 called testapi::assert_script_run
# [2022-01-11T11:42:55.497679Z] [debug] <<< testapi::type_string(string="wget -q http://10.0.2.2:15223/nOlEDGIpPA/files/modified.xml", max_interval=250, wait_screen_changes=0, wait_still_screen=0, timeout=30, similarity_level=47)
# [2022-01-11T11:42:57.796942Z] [debug] tests/modify_and_upload_file.pm:33 called testapi::assert_script_run
# [2022-01-11T11:42:57.797923Z] [debug] <<< testapi::type_string(string="; echo 8I~_t-\$?- > /dev/ttyS0\n", max_interval=250, wait_screen_changes=0, wait_still_screen=0, timeout=30, similarity_level=47)
# [2022-01-11T11:42:59.034733Z] [debug] tests/modify_and_upload_file.pm:33 called testapi::assert_script_run
# [2022-01-11T11:42:59.035608Z] [debug] <<< testapi::wait_serial(regexp=qr/8I~_t-\d+-/u, record_output=undef, buffer_size=undef, timeout=90, quiet=undef, no_regex=0, expect_not_found=0)
# [2022-01-11T11:43:01.330864Z] [debug] >>> testapi::wait_serial: (?^u:8I~_t-\d+-): ok
# [2022-01-11T11:43:01.620512Z] [info] ::: basetest::runtest: # Test died: command 'wget -q http://10.0.2.2:15223/nOlEDGIpPA/files/modified.xml' failed at /opt/os-autoinst/t/data//tests/tests/modify_and_upload_file.pm line 33.

This exact same error happens consistently on all ARM workers I've tested so far (but does not happen locally or on an x86_64 worker¹).

¹ In the good case the test log looks like this:

[2022-01-11T11:26:37.058618Z] [debug] ||| starting modify_and_upload_file tests/modify_and_upload_file.pm
[2022-01-11T11:26:37.221007Z] [debug] tests/modify_and_upload_file.pm:24 called testapi::get_test_data
[2022-01-11T11:26:37.221396Z] [debug] <<< testapi::get_test_data(path="/opt/os-autoinst/t/data//tests/data/autoinst.xml")
[2022-01-11T11:26:37.227357Z] [debug] tests/modify_and_upload_file.pm:26 called testapi::type_string
[2022-01-11T11:26:37.228021Z] [debug] <<< testapi::type_string(string="echo get_test_data returned expected file\n", max_interval=250, wait_screen_changes=0, wait_still_screen=0, timeout=30, similarity_level=47)
[2022-01-11T11:26:38.707440Z] [debug] tests/modify_and_upload_file.pm:31 called testapi::save_tmp_file
[2022-01-11T11:26:38.707822Z] [debug] <<< testapi::save_tmp_file(path="modified.xml")
[2022-01-11T11:26:38.708649Z] [debug] tests/modify_and_upload_file.pm:33 called testapi::assert_script_run
[2022-01-11T11:26:38.709230Z] [debug] <<< testapi::assert_script_run(cmd="wget -q http://10.0.2.2:15223/brRAHMeVWk/files/modified.xml", quiet=undef, timeout=90, fail_message="")
[2022-01-11T11:26:38.709738Z] [debug] tests/modify_and_upload_file.pm:33 called testapi::assert_script_run
[2022-01-11T11:26:38.710503Z] [debug] <<< testapi::type_string(string="wget -q http://10.0.2.2:15223/brRAHMeVWk/files/modified.xml", max_interval=250, wait_screen_changes=0, wait_still_screen=0, timeout=30, similarity_level=47)
[2022-01-11T11:26:40.976618Z] [debug] tests/modify_and_upload_file.pm:33 called testapi::assert_script_run
[2022-01-11T11:26:40.977477Z] [debug] <<< testapi::type_string(string="; echo mzNDB-\$?- > /dev/ttyS0\n", max_interval=250, wait_screen_changes=0, wait_still_screen=0, timeout=30, similarity_level=47)
[2022-01-11T11:26:42.173453Z] [debug] tests/modify_and_upload_file.pm:33 called testapi::assert_script_run
[2022-01-11T11:26:42.173889Z] [debug] <<< testapi::wait_serial(regexp=qr/mzNDB-\d+-/u, no_regex=0, timeout=90, quiet=undef, record_output=undef, buffer_size=undef, expect_not_found=0)
[2022-01-11T11:26:44.275461Z] [debug] >>> testapi::wait_serial: (?^u:mzNDB-\d+-): ok
[2022-01-11T11:26:44.275689Z] [debug] tests/modify_and_upload_file.pm:34 called testapi::script_run
[2022-01-11T11:26:44.275916Z] [debug] <<< testapi::script_run(cmd="echo '72d2c15cb10535f36862d7d2eecc8a79  modified.xml' > modified.md5", timeout=30, quiet=undef, output="", die_on_timeout=-1)
[2022-01-11T11:26:44.276071Z] [debug] tests/modify_and_upload_file.pm:34 called testapi::script_run
[2022-01-11T11:26:44.276295Z] [debug] <<< testapi::type_string(string="echo '72d2c15cb10535f36862d7d2eecc8a79  modified.xml' > modified.md5", max_interval=250, wait_screen_changes=0, wait_still_screen=0, timeout=30, similarity_level=47)
[2022-01-11T11:26:46.705473Z] [debug] tests/modify_and_upload_file.pm:34 called testapi::script_run
[2022-01-11T11:26:46.706233Z] [debug] <<< testapi::type_string(string="; echo ~liyM-\$?- > /dev/ttyS0\n", max_interval=250, wait_screen_changes=0, wait_still_screen=0, timeout=30, similarity_level=47)
[2022-01-11T11:26:47.868134Z] [debug] tests/modify_and_upload_file.pm:34 called testapi::script_run
[2022-01-11T11:26:47.868980Z] [debug] <<< testapi::wait_serial(no_regex=0, regexp=qr/~liyM-\d+-/u, quiet=undef, timeout=30, record_output=undef, expect_not_found=0, buffer_size=undef)
[2022-01-11T11:26:48.961609Z] [debug] >>> testapi::wait_serial: (?^u:~liyM-\d+-): ok
[2022-01-11T11:26:48.961772Z] [warn] !!! testapi::script_run: DEPRECATED call of script_run() in tests/modify_and_upload_file.pm:34 add `die_on_timeout => ?` to avoid this warning
[2022-01-11T11:26:48.961938Z] [debug] tests/modify_and_upload_file.pm:35 called testapi::assert_script_run
[2022-01-11T11:26:48.962134Z] [debug] <<< testapi::assert_script_run(cmd="md5sum -c modified.md5", quiet=undef, timeout=90, fail_message="")
[2022-01-11T11:26:48.962285Z] [debug] tests/modify_and_upload_file.pm:35 called testapi::assert_script_run
[2022-01-11T11:26:48.962461Z] [debug] <<< testapi::type_string(string="md5sum -c modified.md5", max_interval=250, wait_screen_changes=0, wait_still_screen=0, timeout=30, similarity_level=47)
[2022-01-11T11:26:49.725770Z] [debug] tests/modify_and_upload_file.pm:35 called testapi::assert_script_run
[2022-01-11T11:26:49.726529Z] [debug] <<< testapi::type_string(string="; echo ib~GQ-\$?- > /dev/ttyS0\n", max_interval=250, wait_screen_changes=0, wait_still_screen=0, timeout=30, similarity_level=47)
[2022-01-11T11:26:50.912502Z] [debug] tests/modify_and_upload_file.pm:35 called testapi::assert_script_run
[2022-01-11T11:26:50.912916Z] [debug] <<< testapi::wait_serial(record_output=undef, expect_not_found=0, buffer_size=undef, regexp=qr/ib~GQ-\d+-/u, no_regex=0, quiet=undef, timeout=90)
[2022-01-11T11:26:52.013581Z] [debug] >>> testapi::wait_serial: (?^u:ib~GQ-\d+-): ok
[2022-01-11T11:26:52.013763Z] [debug] tests/modify_and_upload_file.pm:37 called testapi::type_string
[2022-01-11T11:26:52.013987Z] [debug] <<< testapi::type_string(string="echo save_tmp_file returned expected file\n", max_interval=250, wait_screen_changes=0, wait_still_screen=0, timeout=30, similarity_level=47)
[2022-01-11T11:26:53.521354Z] [debug] ||| finished modify_and_upload_file tests (runtime: 16 s)

The errors

[2022-01-11T11:26:21.791426Z] [warn] !!! consoles::VNC::login: Error connecting to VNC server <novnc.nowhere:5901>: IO::Socket::INET: Bad hostname 'novnc.nowhere'

in the previous test module are also happening in the good case.
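Note that in the failing log wait_serial still reports "ok" right before the test dies. Judging from the logged calls, assert_script_run types the command followed by "; echo MARKER-$?- > /dev/ttyS0" and waits for MARKER-\d+- on the serial console, so a match only shows that the command finished. Presumably the number between the dashes is the exit status, and a non-zero value is what makes the test die. A minimal sketch of that parsing step, with a hypothetical helper name and a made-up status value (the log above does not show the actual exit status of wget):

```shell
# Hypothetical helper: extract the exit status from a serial line such as "mzNDB-0-".
# $1 is the random marker prefix, $2 is the line read from the serial console.
exit_status_from_serial() {
    marker=$1
    line=$2
    rest=${line#"$marker"-}   # strip the leading "MARKER-"
    echo "${rest%-}"          # strip the trailing "-"
}

# Good case from the log above: marker "mzNDB", status 0.
exit_status_from_serial 'mzNDB' 'mzNDB-0-'
# Failing case: marker "8I~_t"; the status 4 is invented for illustration.
exit_status_from_serial '8I~_t' '8I~_t-4-'
```

So the failure on arm-1/2/3 points at wget itself returning non-zero inside the guest, not at a typing or serial problem.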


Updated by mkittler over 3 years ago

The test actually passes on arm 4 and 5 (and only fails on arm 1, 2 and 3). The results are at least consistent within each host (I've run the test at least 5 times on each host).


Updated by mkittler over 3 years ago

Try to extend the command with access to KVM…

I don't think that will lead anywhere because the image we're using in the fullstack test is i386, which we really need to emulate on aarch64. Also note that "QEMU_NO_KVM" : "1" is no longer present in the test settings, so there's nothing to patch out.


Updated by mkittler over 3 years ago

Out of curiosity I've just tried adding -v /dev/kvm to podman args which in turn leads to -enable-kvm being added to QEMU args which leads to the error qemu-system-i386: invalid accelerator kvm.
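The error is expected: KVM can only accelerate guests that match the host CPU architecture (or a compatible one such as i386 on x86_64), so qemu-system-i386 on an aarch64 host has no kvm accelerator. A simplified sketch of that rule follows; the function name kvm_possible is made up here, and the table covers only the architectures mentioned in this ticket.

```shell
# Hypothetical helper: succeed if a guest of architecture $2 can use KVM
# on a host of architecture $1 (simplified, only covers cases from this ticket).
kvm_possible() {
    case "$1:$2" in
        x86_64:x86_64 | x86_64:i386 | aarch64:aarch64) return 0 ;;
        *) return 1 ;;
    esac
}

# On a real worker one would additionally check that /dev/kvm exists, e.g.:
#   kvm_possible "$(uname -m)" i386 && [ -e /dev/kvm ] && echo "KVM usable"
```

With an i386 test image, this check fails on all ARM workers, which is why the fullstack test can only run there with pure emulation.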


#10

Updated by mkittler over 3 years ago

I give up on figuring out why it isn't working on arm-1, 2 and 3. We mainly want to find out whether the typing problems we see on arm-4 and 5 in production can be reproduced with the os-autoinst fullstack test. So far I couldn't reproduce a single occurrence. I'll let it run in a loop 10 more times on each host¹. As stated before, using KVM isn't possible with this approach. (There are ARM ports of tinycore, but using them is likely out of scope for this issue.)

¹ Still couldn't reproduce it.


#11

Updated by mkittler over 3 years ago

Status changed from In Progress to Feedback

Still not reproducible after 60 more runs. I suppose running the fullstack test is not a way to reproduce typing issues.


#12

Updated by mkittler over 3 years ago

Status changed from Feedback to Resolved

I'd like to conclude this ticket. I cannot exclude that the typing issues happen more often when KVM is enabled. That actually seems likely considering the typing issues could not be reproduced without KVM. Maybe it isn't actually KVM itself but simply the faster execution it provides.


#13

Updated by okurz over 3 years ago

Please provide an update in the parent epic on what this means and what we should do as the next step.


#14

Updated by okurz over 3 years ago

Due date deleted (~~2022-01-25~~)


#15

Updated by okurz over 3 years ago

Priority changed from Normal to High

