action #64938

'+ISO=' in test suite breaks a number of tests

Added by ggardet_arm 2 months ago. Updated 2 months ago.

Status: Resolved
Priority: High
Assignee:
Category: Concrete Bugs
Target version: -
Start date: 2020-03-27
Due date:
% Done: 0%
Estimated time:
Difficulty:
Duration:

Description

'+ISO=' has been added to a number of test suites, but it breaks a number of tests:

The error log is:

[2020-03-27T13:53:49.151 CET] [debug] running /usr/bin/qemu-img info --output=json /var/lib/openqa/pool/13/openqa1-opensuse
[2020-03-27T13:53:49.175 CET] [debug] qemu-img: Could not open '/var/lib/openqa/pool/13/openqa1-opensuse': A regular file was expected by the 'file' driver, but something else was given
[2020-03-27T13:53:49.175 CET] [debug] Backend process died, backend errors are reported below in the following lines:
malformed JSON string, neither tag, array, object, number, string or atom, at character offset 0 (before "qemu-img: Could not ...") at /usr/lib/perl5/vendor_perl/5.26.1/Mojo/JSON.pm line 39.
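
The last line of the log can be read as a plain parse failure: the backend asked qemu-img for JSON, got an error message instead, and the decoder rejected it at the very first character. The following is only a minimal Python sketch of that situation, not the os-autoinst code (which is Perl and uses Mojo::JSON); the helper name `parse_qemu_img_info` is hypothetical and the sample text is copied from the log above.

```python
import json

# Sample text copied from the log above: what qemu-img printed instead of JSON.
qemu_img_output = (
    "qemu-img: Could not open '/var/lib/openqa/pool/13/openqa1-opensuse': "
    "A regular file was expected by the 'file' driver, but something else was given"
)


def parse_qemu_img_info(raw):
    """Hypothetical helper: decode qemu-img output, or return None if it is not JSON."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError as err:
        # err.pos is the character offset at which decoding failed.
        print(f"not JSON, decoding failed at offset {err.pos}: {raw[:40]}...")
        return None


result = parse_qemu_img_info(qemu_img_output)
```

Handling the decode error explicitly would only make the message clearer; the underlying problem remains that ISO points at something that is not a regular file.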

vars.json shows:

"ISO" : "/var/lib/openqa/pool/6/openqa1-opensuse"

Related issues

Related to openQA Tests - action #59394: [functional][u] Overwrite empty ISO variable everywhere where not needed, i.e. `+ISO=`, to prevent useless ISO downloading and storage (Workable, 2019-11-13)

Related to openQA Project - action #63565: The extra setting is added to the new job when cloning a job (Resolved, 2020-02-19)

History

#1 Updated by ggardet_arm 2 months ago

  • Description updated (diff)

#2 Updated by ggardet_arm 2 months ago

  • Related to action #59394: [functional][u] Overwrite empty ISO variable everywhere where not needed, i.e. `+ISO=`, to prevent useless ISO downloading and storage added

#3 Updated by okurz 2 months ago

  • Project changed from openQA Tests to openQA Project
  • Category set to Concrete Bugs
  • Status changed from New to In Progress
  • Assignee set to okurz
  • Priority changed from Normal to High

I will look into this. I suspect a regression from https://github.com/os-autoinst/openQA/pull/2861

#4 Updated by okurz 2 months ago

  • Status changed from In Progress to Feedback

Reproduced with https://openqa.opensuse.org/tests/1215844 . I did snapper rollback 585 on aarch64, rebooted and retriggered; https://openqa.opensuse.org/tests/1215847 passed. The hypothesis of a regression due to https://github.com/os-autoinst/openQA/pull/2861 is accepted, and the revert https://github.com/os-autoinst/openQA/pull/2874 is prepared and merged. Waiting for fixed packages.

#5 Updated by okurz 2 months ago

  • Status changed from Feedback to Resolved

Fixed packages are deployed on all o3 workers.

#7 Updated by Xiaojing_liu 2 months ago

  • Status changed from Resolved to Feedback

I compared the os-autoinst.log of a successful job with that of a failed job. In both jobs, +ISO was turned into ISO= in the job settings. The difference is that, in the failed job, ISO was rewritten to /var/lib/openqa/pool/8/openqa1-opensuse, and this rewrite seems to have been done while caching assets. In the successful job, with ISO=, there is no log message about downloading an ISO in os-autoinst, but the failed job still downloads the ISO. I also ran some tests in my local environment (with the cache service disabled): with ISO=, the job fails too, because the function locate_local_assets in isotovideo.pm rewrites ISO to a directory under /var/lib/openqa/pool.
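
The failure mode described in this comment, an empty asset name ending up as a directory path, can be illustrated with a small Python sketch. This is not the code of locate_local_assets (which lives in isotovideo.pm and is Perl); the helpers `resolve_asset` and `resolve_asset_guarded` are hypothetical and only show why joining a base directory with an empty name yields the directory itself, and how treating an empty value as "no asset" avoids that.

```python
import os
import tempfile


def resolve_asset(base_dir, asset_name):
    """Hypothetical naive resolver: always joins, so "" maps to base_dir itself."""
    return os.path.normpath(os.path.join(base_dir, asset_name))


def resolve_asset_guarded(base_dir, asset_name):
    """Hypothetical guarded resolver: an empty name means no asset was requested."""
    if not asset_name:
        return None
    return os.path.normpath(os.path.join(base_dir, asset_name))


# A temporary directory stands in for the worker pool directory (assumption).
with tempfile.TemporaryDirectory() as base_dir:
    naive = resolve_asset(base_dir, "")
    naive_is_directory = os.path.isdir(naive)  # the "ISO" is a directory
    guarded = resolve_asset_guarded(base_dir, "")  # no path is produced at all
```

With the naive variant, a later tool that expects a regular file (such as qemu-img in the log from the description) is handed a directory.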

#8 Updated by Xiaojing_liu 2 months ago

I also ran a test in my local environment after reverting PR #2860, and the test passed. It seems that this problem is caused by this modification: https://github.com/os-autoinst/openQA/pull/2860/files#diff-daeb812b7eb46c12f6ec790a2ec2d399L85.

#10 Updated by okurz 2 months ago

  • Related to action #63565: The extra setting is added to the new job when cloning a job added

#11 Updated by okurz 2 months ago

  • Status changed from Feedback to In Progress

Xiaojing_liu, thanks for taking a look. Please handle your follow-up in code in #59394 or #63565 so that this ticket can focus on handling the problems in production with reverts and workarounds.

EDIT: openqa_clone_job_o3 --skip-chained-deps 1218657 ISO=''
Created job #1218661: opensuse-15.2-DVD-ppc64le-Build190.2-boot_to_snapshot@ppc64le -> https://openqa.opensuse.org/t1218661

Also x86_64 is affected.

[30/03/2020 12:24:09] <DimStar> okurz: do you have some ETA for https://progress.opensuse.org/issues/64938 ?
[30/03/2020 12:24:10] <|Anna|> '+ISO=' in test suite breaks a number of tests in openQA Project (action for okurz) [In Progress] Created on: 2020-03-27 | 0% done.
[30/03/2020 12:25:41] <okurz> DimStar: ETA 1-2 days. I do not yet understand why after my work three days ago there should still be problems . At the time I retriggered tests and they were fine. So what's the current impact?
[30/03/2020 12:26:25] <DimStar> okurz: I see two tests in TW incomplete on that - so gnuhealth and boot_to_snapshot are untested for a few days already
[30/03/2020 12:26:39] <DimStar> e.g https://openqa.opensuse.org/tests/1218368
[30/03/2020 12:27:18] <DimStar> [info] [#5429] Downloading "." from "http://openqa1-opensuse/tests/1218368/asset/iso/."
[30/03/2020 12:28:31] <okurz> alright. I cloned them now with the explicit `ISO=''` that should help for the current jobs
openqa-clone-job --skip-chained-deps --within-instance https://openqa.opensuse.org/1218664 ISO=''
openqa-clone-job --skip-chained-deps --within-instance https://openqa.opensuse.org/1218665 ISO=''

Created job #1218681: opensuse-Tumbleweed-DVD-x86_64-Build20200329-boot_to_snapshot@64bit -> https://openqa.opensuse.org/t1218681
Created job #1218682: opensuse-Tumbleweed-DVD-x86_64-Build20200329-gnuhealth@64bit -> https://openqa.opensuse.org/t1218682

both passed

I hotpatched openqaworker7 with https://github.com/os-autoinst/openQA/pull/2877, restarted the only free worker instance (openqa-worker@5) with systemctl, and retriggered a run:

$ openqa-clone-job --skip-chained-deps --within-instance https://openqa.opensuse.org/1218665 TEST=okurz_poo64938_boot_to_snapshot_openqaworker7_hotpatched_openQA_2877 BUILD=poo64938 WORKER_CLASS=openqaworker7

Created job #1218700: opensuse-Tumbleweed-DVD-x86_64-Build20200329-gnuhealth@64bit -> https://openqa.opensuse.org/t1218700

The test is fine, so the fix is helpful. I am not sure about other impacts, but we can move ahead nevertheless. I will merge the PR.

#12 Updated by Xiaojing_liu 2 months ago

okurz wrote:

openqa-clone-job --skip-chained-deps --within-instance https://openqa.opensuse.org/1218664 ISO=''
openqa-clone-job --skip-chained-deps --within-instance https://openqa.opensuse.org/1218665 ISO=''

Created job #1218681: opensuse-Tumbleweed-DVD-x86_64-Build20200329-boot_to_snapshot@64bit -> https://openqa.opensuse.org/t1218681
Created job #1218682: opensuse-Tumbleweed-DVD-x86_64-Build20200329-gnuhealth@64bit -> https://openqa.opensuse.org/t1218682

both passed

When using openqa-clone-job and specifying ISO='', the job's ISO= setting is removed, and then the job passes. This situation is different from that of jobs created by isos post: all the failed jobs have ISO= in their settings. To verify this fix, we should use isos post, or openqa-clone-job without specifying ISO= (unlike the commands okurz gave above).
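
The distinction made here, a setting that is absent versus a setting that is present but empty, can be modelled in a few lines of Python. This is a simplified sketch of the behaviour as described in this comment, not openQA's implementation; `clone_with_overrides` and `describe_iso` are hypothetical helpers and the settings values are examples.

```python
def clone_with_overrides(settings, overrides):
    """Hypothetical model of cloning: an empty override removes the setting."""
    merged = dict(settings)
    for key, value in overrides.items():
        if value == "":
            merged.pop(key, None)  # ISO='' drops the key entirely
        else:
            merged[key] = value
    return merged


def describe_iso(settings):
    """Classify the ISO setting as absent, empty, or set."""
    if "ISO" not in settings:
        return "absent"
    return "empty" if settings["ISO"] == "" else "set"


# Example job as created by isos post: ISO is present but empty.
posted_job = {"TEST": "boot_to_snapshot", "ISO": ""}
# The same job cloned with ISO='': the key is gone.
cloned_job = clone_with_overrides(posted_job, {"ISO": ""})

print(describe_iso(posted_job), describe_iso(cloned_job))
```

A check that only looks at the cloned job exercises the "absent" case, so it says nothing about how the worker treats the "empty" case that the failing jobs have.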

#13 Updated by okurz 2 months ago

  • Status changed from In Progress to Feedback

Yes, I am aware that clone-job ISO='' is not the same. I did that as a short-term remedy knowing that the ISO is still there even though not actively used by the test.

I am triggering an unscheduled upgrade to the fixed package with

for i in aarch64 openqaworker1 openqaworker4 openqaworker7 power8 imagetester rebel; do echo $i && ssh root@$i "(transactional-update -n dup || zypper -n dup) && reboot" ; done

and will monitor tests on o3. https://openqa.opensuse.org/tests/1218947 passes fine now.

#14 Updated by okurz 2 months ago

  • Status changed from Feedback to Resolved
