action #65750

closed

[17/04/2020 07:56:17] <DimStar> okurz: Martchus_ would appreciate if you could have a look at tw snapshot 0415... dozens of incomplete tests (attempting to retrigger complains about missing assets)

Added by okurz about 4 years ago. Updated about 4 years ago.

Status: Resolved
Priority: High
Assignee:
Category: Regressions/Crashes
Target version: -
Start date: 2020-04-17
Due date:
% Done: 0%
Estimated time:

Description

Observation

From IRC:

[17/04/2020 07:56:17] <DimStar> okurz: Martchus_ would appreciate if you could have a look at tw snapshot 0415... dozens of incomplete tests (attempting to retrigger complains about missing assets)
[17/04/2020 07:57:33] <DimStar> opensuse-42.1-x86_64-Updates-20170213-1-kde@.qcow2"  looks like a strange name... missing machine type
[17/04/2020 08:14:04] <okurz> DimStar: the HDD name should be opensuse-42.1-x86_64-Updates-20170213-1-gnome@64bit_cirrus.qcow2 
[17/04/2020 08:14:29] <okurz> either settings changed or it was a code change in openQA regarding setting evaluation
[17/04/2020 08:15:11] <okurz> I would have suspected https://github.com/os-autoinst/openQA/pull/2931 but then I don't see why this only affected later jobs
[17/04/2020 08:15:12] <|Anna|> Github project os-autoinst/openQA pull request#2931: "Extract duplicated code from generating job settings", created on 2020-04-14, status: closed on 2020-04-15, https://github.com/os-autoinst/openQA/pull/2931
[17/04/2020 08:15:38] <okurz> the test suite in https://openqa.opensuse.org/admin/test_suites looks fine
[17/04/2020 08:18:52] <okurz> I will see if I can do a quick crosscheck with the previous openQA version
[17/04/2020 08:21:16] <DimStar> okurz: thanks for looking into that
[17/04/2020 08:23:26] <okurz> ok, seems like it was the version from the day before. I have reverted the change. Based on the commands available in https://openqa.opensuse.org/admin/obs_rsync/openSUSE:Factory:ToTest%7Cbase you can either select individual tests or we retrigger the complete medium
[17/04/2020 08:27:57] <DimStar> okurz: retriggering the test won't be enough, right? they need to be rescheduled? it's about 25% of the tests only; so the whole product seems a bit of a waste
[17/04/2020 08:31:44] <okurz> I will do my best to select the incomplete test and only reschedule those.
[17/04/2020 08:32:16] <okurz> It's basically copy the command available from https://openqa.opensuse.org/admin/obs_rsync/openSUSE:Factory:ToTest%7Cbase/runs/.run_last/download/openqa.cmd as linked from the URL I gave you, then add TEST="list of all affected tests"
[17/04/2020 08:33:50] <DimStar> ok - the NET can be scheduled completely - there 100% is 'cancelled' (17 minutes ago)

What I did:

 1003  2020-04-17 06:17:47 ls -ltra /var/cache/zypp/packages/devel_openQA/noarch/
 1004  2020-04-17 06:19:10 zypper in --oldpackage /var/cache/zypp/packages/devel_openQA/noarch/openQA*-4.6.1586954096.7160d88d9-lp151.2537.1.noarch.rpm

and then rescheduled as user geekotest, based on the command from https://openqa.opensuse.org/admin/obs_rsync/openSUSE:Factory:ToTest%7Cbase/runs/.run_last/download/openqa.cmd:

/usr/share/openqa/script/client isos post --host localhost  ARCH=x86_64  ASSET_256=openSUSE-Tumbleweed-NET-x86_64-Snapshot20200415-Media.iso.sha256  BUILD=20200415  CHECKSUM_ISO=$(cut -b-64 /var/lib/openqa/factory/other/openSUSE-Tumbleweed-NET-x86_64-Snapshot20200415-Media.iso.sha256 | grep -E '[0-9a-f]{5,40}' | head -n1)  DISTRI=opensuse  FLAVOR=NET  FULLURL=1  ISO=openSUSE-Tumbleweed-NET-x86_64-Snapshot20200415-Media.iso  MIRROR_HTTP=http://openqa.opensuse.org/assets/repo/openSUSE-Tumbleweed-oss-i586-x86_64-Snapshot20200415  MIRROR_HTTPS=https://openqa.opensuse.org/assets/repo/openSUSE-Tumbleweed-oss-i586-x86_64-Snapshot20200415  MIRROR_PREFIX=http://openqa.opensuse.org/assets/repo  REPO_0=openSUSE-Tumbleweed-oss-i586-x86_64-Snapshot20200415  REPO_1=openSUSE-Tumbleweed-oss-i586-x86_64-Snapshot20200415-debuginfo  REPO_2=openSUSE-Tumbleweed-oss-i586-x86_64-Snapshot20200415-source  REPO_3=openSUSE-Tumbleweed-non-oss-i586-x86_64-Snapshot20200415  REPO_NON_OSS=openSUSE-Tumbleweed-non-oss-i586-x86_64-Snapshot20200415  REPO_OSS=openSUSE-Tumbleweed-oss-i586-x86_64-Snapshot20200415  REPO_OSS_DEBUGINFO=openSUSE-Tumbleweed-oss-i586-x86_64-Snapshot20200415-debuginfo  REPO_OSS_DEBUGINFO_PACKAGES='java*,kernel-default-debug*,kernel-default-base-debug*,mraa-debug*'  REPO_OSS_SOURCE=openSUSE-Tumbleweed-oss-i586-x86_64-Snapshot20200415-source  REPO_OSS_SOURCE_PACKAGES='coreutils*,yast2-network*'  SUSEMIRROR=http://openqa.opensuse.org/assets/repo/openSUSE-Tumbleweed-oss-i586-x86_64-Snapshot20200415  VERSION=Tumbleweed  _OBSOLETE=1 TEST=zdup-Leap-42.1-gnome

This did not help; I needed to go back one package version further:

 1006  2020-04-17 06:20:43 zypper in --oldpackage /var/cache/zypp/packages/devel_openQA/noarch/openQA*-4.6.1586881438.7d89e5b7c-lp151.2531.1.noarch.rpm
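Picking the package "one further back" can be made less error-prone by listing the cached builds in order. This is a minimal sketch and the helper name is hypothetical. It assumes that the middle version component (e.g. 1586954096) is a commit timestamp and that the trailing hex string (e.g. 7160d88d9) is the abbreviated git commit. That reading matches the commit range 7d89e5b7c..7160d88d9 examined in #1 below.

```shell
# Hypothetical helper (not part of openQA or zypper): read openQA RPM file
# names on stdin and print "<timestamp> <commit>" per build, oldest first.
# Assumes names like openQA-client-4.6.1586954096.7160d88d9-lp151.2537.1.noarch.rpm
list_openqa_builds() {
    # Keep only matching names, extract the two version components, then sort
    # numerically; subpackages of one build share a version, so -u collapses them.
    sed -n -E 's/^openQA.*-[0-9]+\.[0-9]+\.([0-9]+)\.([0-9a-f]+)-.*$/\1 \2/p' |
        sort -n -u
}
```

For example, ls /var/cache/zypp/packages/devel_openQA/noarch/ | list_openqa_builds would print one line per cached build. Each line is a candidate for zypper in --oldpackage. With GNU date, date -u -d @1586954096 +%F prints 2020-04-15.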

This yields tests like https://openqa.opensuse.org/tests/1238216#settings where HDD_1 is again correctly "opensuse-42.1-x86_64-Updates-20170213-1-gnome@64bit_cirrus.qcow2".
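The broken name from the IRC log (...-kde@.qcow2) has nothing between "@" and ".qcow2". In other words, the machine part expanded to an empty string. This is a minimal sketch of that failure mode and of a check that would flag it. It assumes the test suite defines HDD_1 with a %MACHINE% placeholder that openQA substitutes when generating job settings; the ticket does not show the test suite definition. Both function names are hypothetical.

```shell
# Hypothetical helpers illustrating the failure mode; this is not openQA code.

# Substitute %MACHINE% in an HDD template (assumed template format).
# Note: a machine name containing "/" or "&" would need escaping for sed.
expand_hdd() {
    template=$1 machine=$2
    printf '%s\n' "$template" | sed "s/%MACHINE%/$machine/g"
}

# Succeed only if the name has a non-empty machine part before ".qcow2".
hdd_has_machine() {
    case $1 in
        *@.qcow2) return 1 ;;   # "@" directly followed by the extension
        *@?*.qcow2) return 0 ;;
        *) return 1 ;;
    esac
}
```

Running hdd_has_machine over the HDD_1 values of newly created jobs would have flagged the affected jobs before they became incomplete.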


Related issues 1 (0 open, 1 closed)

Related to openQA Project - action #63883: openqa-clone-job does not support removing an unuseful setting (Resolved, Xiaojing_liu, 2020-02-27)

#1

Updated by okurz about 4 years ago

  • Status changed from New to In Progress

I identified https://github.com/os-autoinst/openQA/pull/2931 as the culprit after crosschecking the git log:

$ git log1 --no-merges 7d89e5b7c..7160d88d9
21261208c (okurz/enhance/click_element_ok) t: Add test description strings for all 'click_element_ok' calls
cbee4e8fa (okurz/feature/tools) docs: Add description of folder structure
118a1dd5a Separate all "scripts" to be packaged from development "tools"
afe9be553 (okurz/enhance/full_stack) t: Allow to set custom test output message on wait_for_ajax calls
3004b8364 t: Reduce sleep time in "schedule_one_job" to save testing time
e7943c9ca t: Extract method "find_status_text" for full stack utils
07cba7417 (Amrysliu/refactor_generate_job_settings) Extract duplicated code from generating job settings
b7fd462a3 (okurz/enhance/jobs_simplify) Simplify Schema::Results::Jobs (map)
aa3f05431 Simplify Schema::Results::Jobs overview preparation
2b7f4b6d1 Simplify Schema::Results::Jobs with early returns
bdb8eaf89 Extract Schema::Results::Jobs logging method
e1ebb9864 Simplify Schema::Results::Jobs "add/remove_result_dir_prefix"
96fea65fd Simplify Schema::Results::Jobs "delete" method
8be8bebc8 t: Add simple test for job name/label/scenario

Apart from changes that only affect tests, this range mainly contains https://github.com/os-autoinst/openQA/pull/2900 and https://github.com/os-autoinst/openQA/pull/2931. I created patches with git format-patch 7d89e5b7c..7160d88d9 and applied only the ones corresponding to https://github.com/os-autoinst/openQA/pull/2900; the created jobs were still fine.
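The crosscheck of applying only some commits of the range on top of the last good commit can be scripted. This is a minimal sketch using git format-patch and git am. The function name and the selection by patch number are illustrative choices, not what was run here. format-patch numbers the oldest commit 0001, which is the reverse of the git log output above.

```shell
# Hypothetical helper: check out <good> (detached HEAD) and apply only the
# selected patches from <good>..<bad>, chosen by their format-patch number.
# Usage: apply_patch_subset <good> <bad> <number>...
apply_patch_subset() {
    good=$1 bad=$2
    shift 2
    dir=$(mktemp -d) || return 1
    # One patch file per non-merge commit, numbered from the oldest (0001).
    git format-patch --quiet -o "$dir" "$good..$bad" || return 1
    git checkout --quiet --detach "$good" || return 1
    for n in "$@"; do
        # Patch files are named NNNN-<subject>.patch.
        for p in "$dir"/$(printf '%04d' "$n")-*.patch; do
            [ -e "$p" ] || { echo "no patch numbered $n" >&2; return 1; }
            git am --quiet "$p" || return 1
        done
    done
    rm -rf "$dir"
}
```

The function leaves the repository on a detached HEAD, so the result can be tested (e.g. by posting jobs) and discarded afterwards with a plain git checkout of the original branch.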

I created https://github.com/os-autoinst/openQA/pull/2950 as a revert and double-checked it using patches created from all commits. Applying only https://github.com/os-autoinst/openQA/pull/2931 on top of the last good version still reproduces the problem. Using the updated packages with only 2931 reverted fixes it. As a crosscheck, applying only 2900 on top of the last good version does not cause a problem. I merged the revert and will now try to clean up the jobs on o3. For example, what I did:

git diff --no-merges 7d89e5b7c..7160d88d9 lib/ > revert_2900_2931_only_lib.patch

I copied the patch over to o3, applied it in reverse and restarted the services:

patch -d /usr/share/openqa/ -R -p1 </tmp/revert_2900_2931_only_lib.patch
systemctl restart openqa-webui openqa-scheduler openqa-websockets

Then I crosschecked by posting jobs and checking the HDD_1 setting of the generated jobs.
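Before touching a live installation, the reverse patch can first be checked with the --dry-run option of GNU patch. This avoids a half-applied hotpatch if some hunks no longer match. This is a minimal sketch; the function name is hypothetical, and the service restart from above is only shown as a comment.

```shell
# Hypothetical helper: revert <patch> in <dir> only if it reverses cleanly.
revert_patch_checked() {
    dir=$1 patchfile=$2
    # --dry-run reports what would happen without modifying any file.
    patch -d "$dir" -R -p1 --dry-run < "$patchfile" >/dev/null || return 1
    patch -d "$dir" -R -p1 < "$patchfile" >/dev/null
    # On o3 this would be followed by:
    # systemctl restart openqa-webui openqa-scheduler openqa-websockets
}
```

Usage would be revert_patch_checked /usr/share/openqa/ /tmp/revert_2900_2931_only_lib.patch.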

Now, how did I retrigger? First, in SQL:

select test from jobs where result='incomplete' and flavor ~ 'NET' and t_finished > '2020-04-16 23:00';
select test from jobs where result='incomplete' and flavor ~ 'DVD' and t_finished > '2020-04-16 23:00';

For both lists I copy-pasted the output into a local shell and parsed it with echo … | sort | uniq | sort | uniq | tr "\n" ',' | sed 's/, /,/g'. I passed the result as TEST=<…>, combined with the commands from https://openqa.opensuse.org/admin/obs_rsync/openSUSE:Factory:ToTest%7Cbase/runs/.run_last/download/openqa.cmd, to reschedule all incomplete Tumbleweed jobs accordingly.
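The list handling above can be done in one step. sort -u removes duplicates, and paste joins the lines with commas without leaving the trailing comma that tr produces. This is a minimal sketch with a hypothetical function name. Feeding it directly from psql -At (unaligned, tuples-only output) instead of copy-pasting is an assumption, and so is the database name "openqa".

```shell
# Hypothetical helper: turn one test name per line (stdin) into a TEST=
# argument with unique, comma-separated names.
test_arg() {
    # Trim surrounding blanks left by copy-pasting and skip empty lines.
    names=$(sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' -e '/^$/d' | sort -u | paste -sd, -)
    printf 'TEST=%s\n' "$names"
}
# Possible input (assumed database name "openqa"):
# psql -At -c "select test from jobs where result='incomplete' and flavor ~ 'NET' and t_finished > '2020-04-16 23:00'" openqa | test_arg
```

The printed TEST=... argument can then be appended to the command from openqa.cmd, as in the isos post call above.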

Monitoring the results on https://openqa.opensuse.org/tests/overview?result=none&result=incomplete&result=skipped&result=obsoleted&result=parallel_failed&result=parallel_restarted&result=user_cancelled&result=user_restarted&result=timeout_exceeded&arch=&machine=&modules=&distri=opensuse&groupid=1&version=Tumbleweed#

#2

Updated by okurz about 4 years ago

  • Related to action #63883: openqa-clone-job does not support removing an unuseful setting added

#3

Updated by okurz about 4 years ago

  • Status changed from In Progress to Feedback
  • Priority changed from Urgent to High

With o3 hotpatched, the faulty jobs retriggered correctly and the revert of the culprit PR merged, I will keep monitoring but am already reducing the priority.

#4

Updated by okurz about 4 years ago

  • Status changed from Feedback to Resolved

Everything fine
