action #67288: test fails in partitioning in dual_windows10 - something must been changed in openqa regarding windows10 image or settings - openQA Tests (public) - openSUSE Project Management Tool

action #67288

closed

test fails in partitioning in dual_windows10 - something must been changed in openqa regarding windows10 image or settings

Added by mlin7442 over 4 years ago. Updated over 4 years ago.

Status: Resolved
Priority: High
Assignee: okurz
Category: Bugs in existing tests
Target version:
Start date: 2020-05-26
Due date: 2020-06-17
% Done:
Estimated time:
Difficulty:

Description

Observation

openQA test in scenario opensuse-Tumbleweed-DVD-x86_64-kde_dual_windows10@uefi_win fails in partitioning

I first filed https://bugzilla.suse.com/show_bug.cgi?id=1172071, but I now suspect this is actually an openQA issue.

I went through the product changes in the last few check-ins and could not find any suspicious change. The test also fails on Leap 15.2, so I believe something has changed in openQA regarding the Windows 10 image or settings. A further clue: I re-triggered a previously successful openQA job and it now fails.

https://openqa.opensuse.org/tests/1278961 is a re-run of the previously successful job; it now fails.

Reproducible

Fails since (at least) Build 20200523

Expected result

Last good: 20200520 (or more recent)

Further details

Always latest result in this scenario: latest

Updated by mlin7442 over 4 years ago

Priority changed from Normal to High

Updated by riafarov over 4 years ago

The problem seems to be in the mechanism that hides the qcow2 from downloading. If I try to download the asset from openQA, I indeed get a 100 KB file, so it seems that openQA cannot download it properly.
o3 contains the correct qcow2. However, the permissions were wrong (owner set to root). I've changed the ownership to geekotest, as it is supposed to be. Let's see if that helps, but I guess it's more complex than this.

Updated by mlin7442 over 4 years ago

Looks like it is still not working: https://openqa.opensuse.org/tests/1291031

Updated by riafarov over 4 years ago

Can someone from tools team comment on this?

Updated by dimstar over 4 years ago

openQA redirects download attempts of the win images to Microsoft, as we can't legally distribute those.

It redirects all users not coming from the worker network (i.e. not 192.168.112.0/24).
So far, all good.
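
The redirect rule described above can be sketched as a simple network membership check. This is only an illustration, not how openQA or its web server actually implements the redirect; the function name `should_redirect` is hypothetical, and only the network 192.168.112.0/24 comes from this comment.

```python
import ipaddress

# Worker network as stated in the comment above.
WORKER_NETWORK = ipaddress.ip_network("192.168.112.0/24")


def should_redirect(client_ip: str) -> bool:
    """Return True when the client is outside the worker network.

    Hypothetical helper: clients outside the worker network would be
    redirected instead of being served the Windows image.
    """
    return ipaddress.ip_address(client_ip) not in WORKER_NETWORK
```

With this rule, a download attempted from a developer machine never receives the real qcow2, which would explain why checking the asset from outside the worker network is misleading.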

Now, though, I did find a problem on ariel: the win qcow image exists in factory/hdd AND factory/hdd/fixed.

in factory/hdd

-rw-r--r-- 1 geekotest nogroup     102228 May 25 13:16 windows-10-x86_64-1903@uefi_win.qcow2

in factory/hdd/fixed/

-rw-r--r-- 1 geekotest   nogroup  5450498048 Sep 26  2019 windows-10-x86_64-1903@uefi_win.qcow2

Clearly, the one in factory/hdd is not correct, but it seems to be preferred over the image in fixed. As a test, I renamed it to windows-10-x86_64-1903@uefi_win.qcow2~ to ignore it for now.

Test run: https://openqa.opensuse.org/tests/1295279 -> passed partitioner

So what remains is to find out where this broken qcow image came from on May 25.
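
A check for this kind of shadowing could look like the sketch below. It assumes the layout described above (an asset directory with a `fixed` subdirectory); the function name `find_shadowing_assets` is hypothetical and not part of openQA.

```python
from pathlib import Path


def find_shadowing_assets(hdd_dir):
    """Yield (name, size_in_hdd, size_in_fixed) for every file that exists
    both in hdd_dir and in hdd_dir/fixed with a different size."""
    hdd = Path(hdd_dir)
    fixed = hdd / "fixed"
    if not fixed.is_dir():
        return
    for fixed_file in sorted(fixed.iterdir()):
        candidate = hdd / fixed_file.name
        if fixed_file.is_file() and candidate.is_file():
            size_hdd = candidate.stat().st_size
            size_fixed = fixed_file.stat().st_size
            if size_hdd != size_fixed:
                yield fixed_file.name, size_hdd, size_fixed


if __name__ == "__main__":
    # The path is an assumption; adjust it to the real asset directory.
    for name, size_hdd, size_fixed in find_shadowing_assets("factory/hdd"):
        print(f"{name}: {size_hdd} bytes in hdd vs {size_fixed} bytes in hdd/fixed")
```

Run against the state described above, this would have flagged the 102228-byte file next to the 5450498048-byte fixed asset.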

Updated by okurz over 4 years ago

Due date set to 2020-06-17
Status changed from New to Feedback
Assignee set to okurz

Unlikely we can find out what caused this. Looking in the database I can find:

openqa=> select jobs.id,t_finished,test from jobs,job_settings where (jobs.test ~ 'windows' and job_settings.job_id = jobs.id and key = 'PUBLISH_HDD_1' and value = 'windows-10-x86_64-1903@uefi_win.qcow2');
   id    |     t_finished      |    test    
---------+---------------------+------------
 1036580 | 2019-09-20 10:57:15 | windows_10
(1 row)

So there is a single job, but it is much older (about the age of the actual fixed asset). Also, https://openqa.opensuse.org/tests/1036580/file/worker-log.txt shows what looks like a "longer" upload, corresponding to a file that is way bigger than 100 KB. So I guess someone made a mistake, triggered one job, maybe aborted it prematurely, etc. Maybe we can just regard it as unlucky timing that caused it to end up in a way that is not completely obvious :D

In hindsight, the wrong permissions might also be a symptom of a "prematurely aborted upload", as it might be that in the correct case the file's ownership would be changed to geekotest. But it could also be someone doing stuff manually. Overall, the story looks related to #67219.

So I think the immediate problem is fixed. I will take the ticket and try to use the opportunity for all of us involved to learn and see how we can improve in the future: maybe not to prevent cases like this, but so that next time we spend less time and effort identifying the root cause.

I have one finding: https://openqa.opensuse.org/tests/1277483 is the first job in the row that failed. maxlin reviewed it and reported the bug on Bugzilla. What could have helped is the initial investigation bisection step to distinguish "1. is it reproducible, 2. does the same test with the test code of 'last good' still work, 3. does the same test with the product state of 'last good' still work". https://gitlab.suse.de/openqa/auto-review/pipelines is set up for that: it triggers automatic investigation jobs for every new failure that does not yet have a comment. There was, however, unfortunate timing, as the pipeline triggers every day at 08:19 CET and maxlin commented at 07:59 CET, so 20 minutes before :D The specific review job in question is https://gitlab.suse.de/openqa/auto-review/-/jobs/210675

Hence I have one simple suggestion: use https://github.com/os-autoinst/scripts/blob/master/openqa-investigate for any new openQA test failure where the root cause is not immediately obvious.
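
To illustrate how the three bisection questions narrow down a cause, here is a minimal sketch of the decision logic. It is not taken from openqa-investigate; the function name, parameters, and result labels are hypothetical and only mirror the three questions quoted above.

```python
def likely_cause(retry_passes, last_good_tests_passes, last_good_build_passes):
    """Classify a failure from the outcome of three investigation runs.

    retry_passes:            the unchanged job passes when triggered again
    last_good_tests_passes:  the job passes with the 'last good' test code
    last_good_build_passes:  the job passes with the 'last good' product state
    """
    if retry_passes:
        return "sporadic failure"
    if last_good_tests_passes and not last_good_build_passes:
        return "test code regression"
    if last_good_build_passes and not last_good_tests_passes:
        return "product regression"
    if not last_good_tests_passes and not last_good_build_passes:
        # Neither older tests nor an older product help: look at assets,
        # settings, or infrastructure instead.
        return "environment or infrastructure"
    return "inconclusive"
```

In this ticket, even the re-run of a previously passing job failed, so under these assumptions `likely_cause(False, False, False)` points at the environment rather than at tests or product.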

I am looking forward to more comments from all of you.

Updated by okurz over 4 years ago

Status changed from Feedback to Resolved

I think the "investigation" route provided by openQA, as well as the automatically triggered investigation jobs, would at least show that there is no relevant difference, which should lead one to the conclusion that it is neither test differences nor product differences. Adding a checksum sounds feasible, as does cross-checking the size of the image. IMHO we should calculate, check, and show the checksum of generated/used assets, especially for "fixed" assets. Recorded the idea in #65271#note-19
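
The checksum and size cross-check could be sketched as follows. This is only an illustration of the idea, not an existing openQA feature; the function names and the choice of SHA-256 are assumptions.

```python
import hashlib


def asset_fingerprint(path, chunk_size=1024 * 1024):
    """Return (size_in_bytes, sha256_hex) for the file at path.

    The file is read in chunks so that multi-gigabyte qcow2 images
    do not have to fit into memory.
    """
    digest = hashlib.sha256()
    size = 0
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
            size += len(chunk)
    return size, digest.hexdigest()


def matches_expected(path, expected_size, expected_sha256):
    """Return True when both size and checksum match the recorded values."""
    return asset_fingerprint(path) == (expected_size, expected_sha256)
```

Recording the fingerprint once for each "fixed" asset and comparing it before a job uses the asset would make a truncated or replaced image like the 100 KB file visible immediately.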
