action #45650
[functional][u][aarch64] test fails in first_boot because of username letters not written in capital case during installation user_settings (closed)
Description
Observation
openQA test in scenario opensuse-Tumbleweed-DVD-aarch64-create_hdd_gnome@aarch64 fails in first_boot
Reproducible
Fails since (at least) Build 20181231
Expected result
Last good: 20181224 (or more recent)
Further details
Always latest result in this scenario: latest
The actual problem happens earlier, in the installer, where the "M" and "W" in the username are not typed in upper case: https://openqa.opensuse.org/tests/822544#step/user_settings/2
This is probably related to the worker (running Leap 15.0), since a local test with os-autoinst on an up-to-date Tumbleweed aarch64 host works fine.
Updated by okurz almost 6 years ago
- File snapper_diff_109..115 added
See the attached diff between the root filesystem snapshot "109" (2018-12-23) and the current one. The complete list of package changes between the oldest still available snapshot, 105 (2018-12-19), and the current one is:
# diff -Naur <(rpm -r /.snapshots/105/snapshot/ -qa) <(rpm -qa) | grep '^[+-]' | sort
+++ /dev/fd/62 2019-01-02 17:07:26.626769544 +0100
--- /dev/fd/63 2019-01-02 17:07:26.626769544 +0100
+dracut-044.1-lp150.14.12.1.aarch64
-dracut-044.1-lp150.14.9.1.aarch64
-git-core-2.16.4-lp150.2.6.1.aarch64
+git-core-2.16.4-lp150.2.9.1.aarch64
-git-gui-2.16.4-lp150.2.6.1.aarch64
+git-gui-2.16.4-lp150.2.9.1.aarch64
-gitk-2.16.4-lp150.2.6.1.aarch64
+gitk-2.16.4-lp150.2.9.1.aarch64
+grub2-2.02-lp150.13.10.1.aarch64
-grub2-2.02-lp150.13.7.1.aarch64
+grub2-arm64-efi-2.02-lp150.13.10.1.aarch64
-grub2-arm64-efi-2.02-lp150.13.7.1.aarch64
+grub2-snapper-plugin-2.02-lp150.13.10.1.noarch
-grub2-snapper-plugin-2.02-lp150.13.7.1.noarch
+grub2-systemd-sleep-plugin-2.02-lp150.13.10.1.noarch
-grub2-systemd-sleep-plugin-2.02-lp150.13.7.1.noarch
-libbluetooth3-5.48-lp150.4.3.1.aarch64
+libbluetooth3-5.48-lp150.4.6.1.aarch64
-libfreebl3-3.36.6-lp150.2.6.1.aarch64
+libfreebl3-3.40.1-lp150.2.10.2.aarch64
-libgcrypt20-1.8.2-lp150.5.3.1.aarch64
+libgcrypt20-1.8.2-lp150.5.6.1.aarch64
-libhogweed4-3.4-lp150.2.3.aarch64
+libhogweed4-3.4-lp150.3.3.1.aarch64
-libmtp9-1.1.15-lp150.1.2.aarch64
+libmtp9-1.1.16-lp150.2.3.1.aarch64
-libmtp-udev-1.1.15-lp150.1.2.aarch64
+libmtp-udev-1.1.16-lp150.2.3.1.aarch64
-libnettle6-3.4-lp150.2.3.aarch64
+libnettle6-3.4-lp150.3.3.1.aarch64
-libsoftokn3-3.36.6-lp150.2.6.1.aarch64
+libsoftokn3-3.40.1-lp150.2.10.2.aarch64
-mozilla-nspr-4.19-lp150.1.2.aarch64
+mozilla-nspr-4.20-lp150.2.3.1.aarch64
-mozilla-nss-3.36.6-lp150.2.6.1.aarch64
+mozilla-nss-3.40.1-lp150.2.10.2.aarch64
-mozilla-nss-certs-3.36.6-lp150.2.6.1.aarch64
+mozilla-nss-certs-3.40.1-lp150.2.10.2.aarch64
-nfs-client-2.1.1-lp150.4.3.1.aarch64
+nfs-client-2.1.1-lp150.4.6.1.aarch64
-openQA-client-4.6.1545139030.66c0b50f-lp150.1023.1.noarch
+openQA-client-4.6.1545406149.53968c1e-lp150.1054.1.noarch
-openQA-common-4.6.1545139030.66c0b50f-lp150.1023.1.noarch
+openQA-common-4.6.1545406149.53968c1e-lp150.1054.1.noarch
-openQA-worker-4.6.1545139030.66c0b50f-lp150.1023.1.noarch
+openQA-worker-4.6.1545406149.53968c1e-lp150.1054.1.noarch
-os-autoinst-4.5.1544691921.44e93d8d-lp150.8.1.aarch64
+os-autoinst-4.5.1545369866.fc084a6a-lp150.16.1.aarch64
-ovmf-2017+git1510945757.b2662641d5-lp150.4.6.1.aarch64
+ovmf-2017+git1510945757.b2662641d5-lp150.4.9.1.aarch64
-perl-5.26.1-lp150.6.3.1.aarch64
+perl-5.26.1-lp150.6.6.1.aarch64
-perl-base-5.26.1-lp150.6.3.1.aarch64
+perl-base-5.26.1-lp150.6.6.1.aarch64
-perl-Bootloader-0.921-lp150.3.3.1.aarch64
+perl-Bootloader-0.923-lp150.3.6.1.aarch64
-polkit-default-privs-13.2-lp150.8.6.1.noarch
+polkit-default-privs-13.2-lp150.8.9.1.noarch
-psmisc-23.0-lp150.4.3.1.aarch64
+psmisc-23.0-lp150.4.6.1.aarch64
-psmisc-lang-23.0-lp150.4.3.1.noarch
+psmisc-lang-23.0-lp150.4.6.1.noarch
-qemu-ovmf-x86_64-2017+git1510945757.b2662641d5-lp150.4.6.1.noarch
+qemu-ovmf-x86_64-2017+git1510945757.b2662641d5-lp150.4.9.1.noarch
-qemu-uefi-aarch64-2017+git1510945757.b2662641d5-lp150.4.6.1.noarch
+qemu-uefi-aarch64-2017+git1510945757.b2662641d5-lp150.4.9.1.noarch
-suse-module-tools-15.0.1-lp150.2.3.1.aarch64
+suse-module-tools-15.0.2-lp150.2.6.1.aarch64
From this list, I suspect one of the following to be the culprit:
os-autoinst-4.5.1545369866.fc084a6a-lp150.16.1.aarch64
ovmf-2017+git1510945757.b2662641d5-lp150.4.9.1.aarch64
qemu-ovmf-x86_64-2017+git1510945757.b2662641d5-lp150.4.9.1.noarch
qemu-uefi-aarch64-2017+git1510945757.b2662641d5-lp150.4.9.1.noarch
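The raw diff above lists removed and added versions as separate lines. A minimal POSIX shell sketch that pairs them up per package, assuming both lists were generated with rpm's query format '%{NAME}.%{ARCH} %{VERSION}-%{RELEASE}\n'; the helper name 'pkg_changes' is hypothetical and not part of any openSUSE tool:

```shell
# Hypothetical helper: compare two package lists with one
# "NAME.ARCH VERSION-RELEASE" entry per line, e.g. produced by
#   rpm -r /.snapshots/105/snapshot/ -qa --qf '%{NAME}.%{ARCH} %{VERSION}-%{RELEASE}\n' > old.txt
#   rpm -qa --qf '%{NAME}.%{ARCH} %{VERSION}-%{RELEASE}\n' > new.txt
# and print "name: old -> new" for every changed, added or removed package.
# Caveats: the first file must not be empty (NR == FNR trick), and packages
# installed in several versions (e.g. gpg-pubkey) collapse to one entry.
pkg_changes() {
  LC_ALL=C awk '
    NF < 2 { next }                      # skip blank or malformed lines
    NR == FNR { old[$1] = $2; next }     # first file: old versions
    { cur[$1] = $2 }                     # second file: current versions
    END {
      for (p in old)
        if (!(p in cur)) print p ": " old[p] " -> (none)"
      for (p in cur) {
        o = (p in old) ? old[p] : "(none)"
        if (o != cur[p]) print p ": " o " -> " cur[p]
      }
    }' "$1" "$2" | LC_ALL=C sort
}
```

Usage: pkg_changes old.txt new.txt prints lines such as "mozilla-nss.aarch64: 3.36.6-lp150.2.6.1 -> 3.40.1-lp150.2.10.2", sorted by package name.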
Maybe you can use os-autoinst in a local container of openSUSE Leap 15.0 with the above package versions to see if the original problem can be reproduced?
Updated by ggardet_arm almost 6 years ago
Test run on 2018-12-26 was fine: https://openqa.opensuse.org/tests/821310#step/user_settings/2
So, the problem is probably in the very last update. What is the timestamp of the last snapshot (115)?
I will try to reproduce locally with a 15.0 system.
Updated by ggardet_arm almost 6 years ago
I restarted the gnome test from snapshot 20181224 and it passes just fine: https://openqa.opensuse.org/tests/822688#step/user_settings/2
Updated by ggardet_arm almost 6 years ago
And now, it passes again: https://openqa.opensuse.org/tests/822692#step/user_settings/2
Has the machine been updated, or did something else change since the last failure?
Updated by okurz almost 6 years ago
- Subject changed from [aarch64] test fails in first_boot to [aarch64] test fails in first_boot because of username letters not written in capital case during installation user_settings
Yes, the worker is auto-updated every night depending on update availability.
# diff -Naur <(rpm -r /.snapshots/115/snapshot/ -qa) <(rpm -qa) | grep '^[+-]' | sort
-openQA-client-4.6.1545406149.53968c1e-lp150.1054.1.noarch
+openQA-client-4.6.1545406149.53968c1e-lp150.1055.1.noarch
-openQA-common-4.6.1545406149.53968c1e-lp150.1054.1.noarch
+openQA-common-4.6.1545406149.53968c1e-lp150.1055.1.noarch
-openQA-worker-4.6.1545406149.53968c1e-lp150.1054.1.noarch
+openQA-worker-4.6.1545406149.53968c1e-lp150.1055.1.noarch
That is just a rebuild of the openQA packages without any software changes in openQA itself, so I doubt the package changes really make a difference. If you like, I recommend conducting a statistical investigation (https://progress.opensuse.org/projects/openqatests/wiki/Wiki#Statistical-investigation) by triggering a bigger set of tests to cross-check exactly the originally reported symptoms. This week might be a good time anyway, as the resources on o3 are mostly underused ;)
Regarding testing "typing stability" in general: I discussed this today with lnussel and dimstar, and we raised the idea of adding better tests for the VNC typing emulation to OBS packages, e.g. in the os-autoinst unit tests or in qemu.
Updated by ggardet_arm almost 6 years ago
I know it is updated each night (if updates are available), but it started to work yesterday afternoon, so before the auto-update...
Updated by ggardet_arm almost 6 years ago
Results of 20 runs (with QEMU_COMPRESS_LEVEL=6, QEMU_COMPRESS_THREADS=6 and VNC_TYPING_LIMIT=10):
- succeeded: only 7 times (out of 20)
- first_boot error (due to this current problem in user settings in installation): 7 times (out of 20)
- problems due to missed key(s): 6 times (out of 20)
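Counting both failure classes as failures, this run failed 13 of 20 times (7 + 6), i.e. 65%. A small sketch for tallying such batches, assuming one job result per line on stdin (for example copied from the openQA test overview). The helper name 'tally_failures' is hypothetical, and so is the exact set of openQA result names treated as "no verdict":

```shell
# Hypothetical helper: read one openQA job result per line on stdin and print
# "<failed>/<counted> failed (<pct>%)". Jobs without a verdict (assumed here
# to be "obsoleted", "user_cancelled" and "skipped") are left out of the
# denominator; everything except "passed"/"softfailed" counts as a failure.
tally_failures() {
  awk '
    NF == 0 { next }
    $1 == "obsoleted" || $1 == "user_cancelled" || $1 == "skipped" { next }
    { total++ }
    $1 != "passed" && $1 != "softfailed" { failed++ }
    END {
      if (total == 0) { print "no completed jobs"; exit }
      printf "%d/%d failed (%.0f%%)\n", failed, total, 100 * failed / total
    }'
}
```

For the run above, 7 "passed" lines plus 13 "failed" lines print "13/20 failed (65%)".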
Updated by okurz almost 6 years ago
- Subject changed from [aarch64] test fails in first_boot because of username letters not written in capital case during installation user_settings to [functional][u][aarch64] test fails in first_boot because of username letters not written in capital case during installation user_settings
- Status changed from New to In Progress
- Assignee set to okurz
- Priority changed from Normal to High
- Target version set to Milestone 22
[03/01/2019 15:16:28] <guillaume_g> okurz: how are upper cases handled in qemu/vnc for openQA? Is it a shift+key? If so, it could be just the same 'missed keys' problem.
[03/01/2019 15:18:03] <okurz> guillaume_g: yes, it's a combination of keys and it could be that we just "miss" the shift. If you take a look at progress tickets you should be able to find others. The issue seems to be most severe on virtio+wayland.
[03/01/2019 15:18:48] <okurz> guillaume_g: did you crosscheck to run about 20 jobs of the previous snapshot to find out if it's worker or product changes?
[03/01/2019 15:20:00] <guillaume_g> okurz: no, currently I scheduled 20 jobs for latest snapshot only. I will do it for previous snapshot right now.
[03/01/2019 15:22:43] <guillaume_g> okurz: gnome tests encounter more errors than other tests. Not sure whether wayland is used or not.
[03/01/2019 15:24:36] <okurz> guillaume_g: I think it's not using wayland here, in https://openqa.opensuse.org/tests/822800#step/xorg_vt/3 it says "X" and not "Xwayland"
[03/01/2019 15:26:57] <guillaume_g> okurz: Indeed
[03/01/2019 15:27:49] <guillaume_g> okurz: kde uses X too: https://openqa.opensuse.org/tests/822330#step/xorg_vt/3
[03/01/2019 15:28:01] <okurz> yes it does
[03/01/2019 15:32:38] <guillaume_g> okurz: previous snapshot (1224) testing: https://openqa.opensuse.org/tests/overview?build=poo45650_investigation_snap1224&distri=opensuse&version=Tumbleweed
[03/01/2019 15:37:23] <guillaume_g> okurz: do we have the same 'missed keys' errors on x86 or ppc?
[03/01/2019 15:44:06] <okurz> in principle yes but I do not know how the probability of appearance compares
[03/01/2019 15:45:32] <guillaume_g> okurz: it seems that latest x86 snapshot has also lots of errors ;)
[03/01/2019 15:45:56] <okurz> same symptoms? I'm not sure
[03/01/2019 15:48:49] <guillaume_g> okurz: At least here: https://openqa.opensuse.org/tests/822879#step/user_settings/2 but as auto-login works on x86, it can be hidden
[03/01/2019 15:51:40] <guillaume_g> okurz: qemu update: https://build.opensuse.org/package/rdiff/openSUSE:Leap:15.0:Update/qemu?linkrev=base&rev=5 seems to be a good candidate though
[03/01/2019 15:56:39] <guillaume_g> or not. The char retry patches are for serial, which is not used here.
[03/01/2019 16:18:47] <okurz> guillaume_g: https://openqa.opensuse.org/tests/823125#step/zypper_ref/8 fails because the repo is most likely already pruned. You may want to skip these test modules with "SKIP_MODULES=…" or we just regard the jobs failing here as "passed" because the installation at least passed as well as first_boot
[03/01/2019 16:31:15] <guillaume_g> okurz: yes, I will just ignore this failure
[03/01/2019 16:32:07] <guillaume_g> but it fails later with keys missed: https://openqa.opensuse.org/tests/823123#step/prepare_system_for_update_tests/2
[03/01/2019 17:02:13] <okurz> guillaume_g: so this is supporting the hypothesis that the updates on the worker caused this problem. I can try to revert to an older root filesystem snapshot on the worker host, should I try that and maybe go back to the oldest snapshot, 2018-12-19T00:14:59?
[03/01/2019 17:09:55] <guillaume_g> OK, yeah, they are all failing so far! So please try to revert to the oldest snapshot.
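The chat above suspects that an upper-case letter is typed as a key combination with Shift, so a lost Shift event produces the lower-case letter instead. A deliberately simplified model of that, not the actual os-autoinst VNC code; the helper 'key_events', the DROP_SHIFT switch and the use of the X keysym name Shift_L are illustrative assumptions:

```shell
# Simplified, hypothetical model: turn a string into key down/up events the
# way a VNC client might. An upper-case letter becomes Shift down, letter
# down/up, Shift up. DROP_SHIFT=1 simulates the guest missing the Shift
# events, which leaves only the plain (lower-case) letter.
key_events() {
  printf '%s\n' "$1" | fold -w 1 | while IFS= read -r c; do
    [ -n "$c" ] || continue
    case $c in
      [[:upper:]])
        lower=$(printf '%s' "$c" | tr '[:upper:]' '[:lower:]')
        [ "${DROP_SHIFT:-0}" = 1 ] || echo "down Shift_L"
        echo "down $lower"
        echo "up $lower"
        [ "${DROP_SHIFT:-0}" = 1 ] || echo "up Shift_L"
        ;;
      *)
        echo "down $c"
        echo "up $c"
        ;;
    esac
  done
}
```

With all events delivered, "M" depends on four events arriving in order; if only the Shift pair is lost, the guest receives exactly the events for "m", matching the symptom in user_settings.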
On the aarch64 machine:
# snapper list -a
Type | # | Pre # | Date | User | Cleanup | Description | Userdata
-------+-----+-------+-------------------------+------+---------+-------------------------+--------------
single | 0 | | | root | | current |
single | 105 | | 2018-12-19T00:14:59 CET | root | number | Snapshot Update of #104 | important=yes
single | 106 | | 2018-12-20T00:40:39 CET | root | number | Snapshot Update of #105 | important=yes
single | 107 | | 2018-12-21T00:17:16 CET | root | number | Snapshot Update of #106 | important=yes
single | 108 | | 2018-12-22T00:52:40 CET | root | number | Snapshot Update of #107 | important=yes
single | 109 | | 2018-12-23T00:53:52 CET | root | number | Snapshot Update of #108 | important=yes
single | 110 | | 2018-12-25T01:18:09 CET | root | number | Snapshot Update of #109 | important=yes
single | 111 | | 2018-12-26T01:57:43 CET | root | number | Snapshot Update of #110 | important=yes
single | 112 | | 2018-12-28T00:34:33 CET | root | number | Snapshot Update of #111 | important=yes
single | 113 | | 2018-12-29T00:34:04 CET | root | number | Snapshot Update of #112 | important=yes
single | 114 | | 2018-12-30T01:51:22 CET | root | number | Snapshot Update of #113 | important=yes
single | 115 | | 2019-01-01T01:45:06 CET | root | | Snapshot Update of #114 |
single | 116 | | 2019-01-03T01:14:56 CET | root | | Snapshot Update of #115 |
# transactional-update rollback 105
transactional-update 2.11 started
Options: rollback 105
Separate /var detected.
Separate /etc detected.
Rollback to snapshot 105...
Please reboot to finish rollback!
# reboot
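Picking the rollback target can be scripted. A sketch, assuming the '|'-separated table layout of 'snapper list' shown above; the helper 'oldest_snapshot' is hypothetical:

```shell
# Hypothetical helper: read `snapper list` table output on stdin and print
# the number of the oldest snapshot that can be rolled back to, i.e. the
# smallest snapshot number other than 0 (0 is the running system, "current").
oldest_snapshot() {
  awk -F'|' '
    $2 ~ /^[[:space:]]*[0-9]+[[:space:]]*$/ {   # data rows only
      n = $2 + 0
      if (n != 0 && (min == "" || n < min)) min = n
    }
    END { if (min != "") print min }'
}
```

For the table above, snapper list -a | oldest_snapshot prints 105, which could then be passed to transactional-update rollback, followed by a reboot as shown.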
After reboot:
# zypper info openQA-worker
Loading repository data...
Reading installed packages...
Information for package openQA-worker:
--------------------------------------
Repository : Providing openQA dependencies (openSUSE_Leap_15.0)
Name : openQA-worker
Version : 4.6.1545406149.53968c1e-lp150.1055.1
Arch : noarch
Vendor : obs://build.opensuse.org/devel:openQA
Installed Size : 13.1 KiB
Installed : Yes
Status : out-of-date (version 4.6.1545139030.66c0b50f-lp150.1023.1 installed)
Source package : openQA-4.6.1545406149.53968c1e-lp150.1055.1.src
Summary : The openQA worker
Description :
The openQA worker manages test engine (provided by os-autoinst package).
So I assume we are on the old version again. I am not sure what will happen during the next update window, whether that snapshot will be preserved or the system will auto-update to the most recent state again, but either way we should have enough time to run tests and gather statistics. Some jobs have already started again.
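Whether the rollback is still in effect can be read from the "Status" line of 'zypper info'. A sketch that extracts the installed version from output laid out as above; the helper 'installed_version' is hypothetical, and on the machine itself rpm -q --qf '%{VERSION}-%{RELEASE}\n' openQA-worker reports the same thing directly:

```shell
# Hypothetical helper: read `zypper info <pkg>` output on stdin and print the
# installed version. For "out-of-date" packages the installed version is in
# the "(version X installed)" part of the Status line; otherwise the Version
# field is assumed to be the installed one.
installed_version() {
  awk -F' : ' '
    $1 ~ /^Version/ { version = $2 }
    $1 ~ /^Status/ && match($2, /\(version [^ ]+ installed\)/) {
      # strip the 9 characters "(version " and the 11 characters " installed)"
      installed = substr($2, RSTART + 9, RLENGTH - 20)
    }
    END { print (installed != "" ? installed : version) }'
}
```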
Triggered tests again with:
for i in {001..100}; do openqa-clone-job --from https://openqa.opensuse.org --host https://openqa.opensuse.org --skip-download --skip-chained-deps 822796 TEST=ggardet_poo45650_$i _GROUP="Development Tumbleweed" BUILD="poo45650_investigation_aarch64_worker_snapshot_105_2018-12-19T00:14:59"; done
Updated by okurz almost 6 years ago
- Status changed from In Progress to Feedback
With the tests running on the old worker packages, we have already found two cases of mistyping: https://openqa.opensuse.org/tests/823316#step/user_settings/2 and https://openqa.opensuse.org/tests/823318#step/user_settings/2 (see the "m." and "w").
Created a new needle "inst-userinfostyped-99p_match-20190103" with a slightly lower match area and a 99% match level, which should prevent false matches like https://openqa.opensuse.org/tests/823316#step/user_settings/2 and https://openqa.opensuse.org/tests/823318#step/user_settings/2
Proposed deletion of the old one in https://github.com/os-autoinst/os-autoinst-needles-opensuse/pull/492
@waitfor results of https://openqa.opensuse.org/tests/overview?build=poo45650_investigation_aarch64_worker_snapshot_105_2018-12-19T00:14:59&distri=opensuse&version=Tumbleweed&groupid=38 and the next update window of the worker host, to see whether it has any effect.
Updated by okurz almost 6 years ago
- Assignee changed from okurz to ggardet_arm
Overnight the worker was automatically updated again, so the rollback was only temporary. 53/100 jobs have been obsoleted, probably due to a more recent openSUSE Tumbleweed snapshot. Many jobs failed. The most recent job was https://openqa.opensuse.org/tests/823366, which failed at 00:18 UTC, well before the automatic worker update, meaning that all jobs were executed with the old state of the worker packages. 26/39 failed (67%), so I reject the hypothesis that any package updates installed on the worker since 2018-12-19 caused a regression.
back to @ggardet_arm for the next ideas ;)
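To put the 26/39 (67%) on the old worker packages next to the 13/20 (65%, 7 + 6 failures) seen on the updated worker, a rough two-proportion z-test can be used. This is a sketch based on the normal approximation, which is crude for samples this small, and it assumes the two batches are comparable even though they used different builds and settings; 'two_prop_z' is a hypothetical helper:

```shell
# Rough two-proportion z-test (normal approximation), a sketch only:
#   p = (f1 + f2) / (n1 + n2)                       pooled failure rate
#   z = (f1/n1 - f2/n2) / sqrt(p * (1 - p) * (1/n1 + 1/n2))
# |z| < 1.96 means the difference is not significant at the 5% level.
two_prop_z() {   # $1 = failed1, $2 = total1, $3 = failed2, $4 = total2
  awk -v f1="$1" -v n1="$2" -v f2="$3" -v n2="$4" 'BEGIN {
    p = (f1 + f2) / (n1 + n2)
    se = sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    if (se == 0) { print "undefined"; exit }   # all passed or all failed
    printf "%.2f\n", (f1 / n1 - f2 / n2) / se
  }'
}
```

two_prop_z 13 20 26 39 prints -0.13; |z| is far below 1.96, so the two failure rates are statistically indistinguishable, which is consistent with rejecting the worker-update hypothesis.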
Updated by okurz almost 6 years ago
- Copied to action #45713: [functional][u] test for nested virtualization / qemu / typing over VNC added
Updated by ggardet_arm almost 6 years ago
Setting 'QEMUMACHINE=virt,gic-version=host' to use the host GIC version seems to mitigate the problem in 'user_settings', but the test still fails later in prepare_system_for_update_tests, where 'echo ' becomes 'ech o'. See: https://openqa.opensuse.org/tests/overview?build=poo45650_investigation_gic_version&distri=opensuse&version=Tumbleweed
A lower VNC_TYPING_LIMIT improves things too, but I still need to find a value that avoids the problem without making typing too slow: 10 triggers the problem, 1 seems fine but is very slow.
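The speed side of this trade-off can be estimated. A back-of-the-envelope sketch, ASSUMING VNC_TYPING_LIMIT caps typing at that many characters per second (verify against the os-autoinst variable documentation); 'typing_seconds' and the 60-character example are hypothetical:

```shell
# Back-of-the-envelope estimate. ASSUMPTION: VNC_TYPING_LIMIT is a
# characters-per-second cap, so typing n characters takes at least n / limit
# seconds.
typing_seconds() {   # $1 = number of characters, $2 = VNC_TYPING_LIMIT
  awk -v n="$1" -v limit="$2" 'BEGIN { printf "%.1f\n", n / limit }'
}

# Example: minimum time to type a 60-character command line per limit value.
for limit in 1 2 5 10; do
  echo "VNC_TYPING_LIMIT=$limit: $(typing_seconds 60 "$limit") s"
done
```

Under that assumption, going from 10 to 1 stretches a 60-character command from 6.0 s to 60.0 s, which is why intermediate values are worth testing.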
Updated by okurz almost 6 years ago
The idea was raised to compare against the state of the openqa.suse.de aarch64 workers, as the package updates should be equivalent. However, we have to keep in mind that the SLE workers are based on SLE12SP3. The latest deployment with package updates was on 2018-12-10 (according to /var/log/zypp/history) and the latest SLE15SP1 tests were conducted 17 days ago. I suggest waiting for the next deployment with package updates and then comparing the systems.
ggardet_arm wrote:
I will try to reproduce locally with a 15.0 system.
Any results?
Updated by ggardet_arm almost 6 years ago
okurz wrote:
ggardet_arm wrote:
I will try to reproduce locally with a 15.0 system.
Any results?
No, I had problems setting up a working 15.0 chroot with qemu (because the noarch qemu-ipxe package is missing from the 15.0 aarch64 repo). Now it is fixed and tests are running. :)
Updated by ggardet_arm almost 6 years ago
I managed to reproduce this kind of failure locally (missing key), but not this specific failure.
BTW, setting 'QEMUMACHINE=virt,gic-version=host' to use the host GIC version instead of the default GICv2 improves things a lot (the aarch64 machine has been updated accordingly).
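To rerun a batch of jobs with this setting, the openqa-clone-job invocation used earlier in this ticket can be parameterized. A sketch; the helper name 'clone_with_host_gic', the TEST naming scheme and the OPENQA_CLONE dry-run switch are assumptions, while the BUILD name matches the investigation linked above:

```shell
# Hypothetical helper: clone an existing job N times with the host GIC version
# forced, using only options already used in this ticket.
# Set OPENQA_CLONE=echo for a dry run that only prints the commands.
clone_with_host_gic() {   # $1 = job id, $2 = number of clones
  _i=1
  while [ "$_i" -le "$2" ]; do
    ${OPENQA_CLONE:-openqa-clone-job} --from https://openqa.opensuse.org \
      --host https://openqa.opensuse.org --skip-download --skip-chained-deps "$1" \
      QEMUMACHINE=virt,gic-version=host \
      TEST="poo45650_gic_host_$_i" BUILD=poo45650_investigation_gic_version
    _i=$((_i + 1))
  done
}
```

For example, OPENQA_CLONE=echo clone_with_host_gic 822796 20 prints the 20 commands without scheduling anything.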
If I copy a huge file over the network to the worker while it is running tests, I get 'cat /etc/reeeeeeeeeeeeesolv.conf', with lots of 'e'.
Updated by ggardet_arm almost 6 years ago
Note that the aarch64 kernel has CONFIG_PREEMPT_NONE=y instead of CONFIG_PREEMPT=y (unlike most architectures), so it is less responsive, which could cause this kind of failure.
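To check which preemption model a given kernel was built with, its config can be inspected. A sketch, assuming the config is available as a file such as /boot/config-$(uname -r) (kernels built with CONFIG_IKCONFIG_PROC also expose it via zcat /proc/config.gz); 'preempt_model' is a hypothetical helper:

```shell
# Hypothetical helper: print the preemption model enabled in a kernel config
# file, i.e. whichever of CONFIG_PREEMPT_NONE, CONFIG_PREEMPT_VOLUNTARY or
# CONFIG_PREEMPT is set to "y". Related options such as CONFIG_PREEMPT_COUNT
# are deliberately not matched.
preempt_model() {   # $1 = kernel config file
  grep -E '^CONFIG_PREEMPT(_NONE|_VOLUNTARY)?=y$' "$1" | sed 's/=y$//'
}
```

Usage: preempt_model "/boot/config-$(uname -r)" prints CONFIG_PREEMPT_NONE on a kernel configured like the aarch64 one described above.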
Updated by mgriessmeier almost 6 years ago
- Related to action #46190: [functional][u] test fails in user_settings - mistyping in Username (lowercase instead of uppercase) added
Updated by mgriessmeier almost 6 years ago
This also happens on TW on 32-bit, see https://progress.opensuse.org/issues/46190 for details.
Updated by okurz almost 6 years ago
- Target version changed from Milestone 22 to Milestone 23
@ggardet_arm after a month, how would you describe the current state?
Updated by ggardet_arm almost 6 years ago
okurz wrote:
@ggardet_arm after a month, how would you describe the current state?
This problem does not occur very often (at all?) these days on o3. And the problem is now caught earlier, in user_settings, which avoids a useless installation if it occurs.
Updated by okurz almost 6 years ago
alright. So what do you suggest to do to continue with this ticket?
Updated by ggardet_arm almost 6 years ago
- Status changed from Feedback to Resolved
okurz wrote:
alright. So what do you suggest to do to continue with this ticket?
Set to resolved.
Updated by okurz almost 5 years ago
This is an autogenerated message for openQA integration by the openqa_review script:
This bug is still referenced in a failing openQA test: btrfs_libstorage-ng
https://openqa.suse.de/tests/3953797
To prevent further reminder comments one of the following options should be followed:
- The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
- The openQA job group is moved to "Released"
- The label in the openQA scenario is removed