action #62159
Asset GRU download not done by web UI host if job scheduled by `isos post`, fails to download and then cloned (was: … using the Web UI)
Description
Observation
https://openqa.opensuse.org/tests/1144395 has
but the test fails with
[info] [#192844] Purging "/var/lib/openqa/cache/openqa1-opensuse/openSUSE-Tumbleweed-JeOS.x86_64-old-20200113.qcow2" because the download failed: 404 - Not Found
GRU logs show that no download was attempted at all.
Cloning the job using the Web UI results in the same error 100% reproducible.
However, using just
openqa-clone-job 1144395
results in a working job, https://openqa.opensuse.org/tests/1144396
This is currently the (only) blocker for JeOS zdup tests for TW.
See also https://progress.opensuse.org/issues/57617, which this is a part of...
Steps to reproduce
See #62159#note-14
Updated by mkittler over 4 years ago
- Related to action #57782: retrigger of job with failed gru download task ends up incomplete with 404 on asset, does not retry download added
Updated by mkittler over 4 years ago
The code which parses the settings variables is the same for the ISO post as for the single job post (the `parse_assets_from_settings` method). The code for enqueuing the download jobs is also the same in both cases (the `enqueue_download_jobs` method). Hence the bug must be in the way `enqueue_download_jobs` is called when scheduling the ISO. But both use `create_downloads_list` for this, so I'm not sure what makes the difference here.
Updated by mkittler over 4 years ago
Cloning the job using the Web UI results in the same error 100% reproducible.
I assume you mean clicking the "restart" button on the web UI. Restarting the job via the web UI does not help because openQA's job duplication code simply does not create a new download task. That is a known limitation, see https://progress.opensuse.org/issues/57782#note-2 and https://progress.opensuse.org/issues/57782#note-5. I've also created a draft to prevent jobs with missing assets from being restarted in the first place, which goes in the opposite direction. Maybe we should clarify whether we want to retrigger downloads or not. I've been messing with the related code recently, so I would agree with @coolo's statement from the other issue:
I wouldn't restart the download from retriggering jobs. Everyone who knows the retriggering jobs code will agree :)
Updated by okurz over 4 years ago
- Category set to Feature requests
mkittler wrote:
[…] Maybe we should clarify whether we want to retrigger downloads or not.
I guess it's a reasonable expectation that a "retry" (aka. retrigger) would retry what openQA was asked to do initially, that includes the download of necessary assets.
Updated by favogt over 4 years ago
How is this a feature request? It's quite clearly a bug.
Updated by okurz over 4 years ago
I am following what we defined on https://progress.opensuse.org/projects/openqav3/wiki#ticket-categories . Categorizing it does not have a direct impact on severity or our priority of the issue. To my understanding this never worked. Maybe I am misreading your observation and this is really a regression? In this case could you help us to find any "last good"?
Updated by mkittler over 4 years ago
- Category changed from Feature requests to Regressions/Crashes
It is a bug that GRU didn't download the asset in the first place (regardless of the restart feature which I have only mentioned because @favogt tried to use it as workaround). The problem is also reproducible, e.g. further jobs on o3 of the scenario show the problem again.
Updated by okurz over 4 years ago
@mkittler so do you know since when this regression was introduced then?
Updated by okurz over 4 years ago
- Related to action #62459: Retry on download errors within GRU download tasks added
Updated by favogt over 4 years ago
Any news here? This is also needed by MicroOS tests now.
rbrown added a download cron job to workaround this, but that's not great for multiple reasons.
Updated by mkittler over 4 years ago
After reading the ticket description again I'm not sure anymore what this ticket is about. Is it about
- restarting a job within the web UI or API? That's now actually prevented if there are missing assets (with a force option) as requested by #34783.
- posting "an ISO" via the API? (Likely that's not the case because the job mentioned in the description has not been created by posting an ISO.)
- posting a single job via the API?
If it is option 1, that still leaves the question of how the job was created initially and why the asset download failed in the first place.
If there are more recent examples, can you provide some links?
Updated by favogt over 4 years ago
The job was created by posting an ISO, through the obs_rsync scripts.
Both 1 and 3 should be fine AFAIK, though I haven't tried that again.
As okurz removed the test for some reason, I don't have any recent example. I'll try to create a minimal PoC locally.
Updated by favogt over 4 years ago
favogt wrote:
As okurz removed the test for some reason, I don't have any recent example. I'll try to create a minimal PoC locally.
Done and copied to o3: https://openqa.opensuse.org/tests/1293416#details
To reproduce: openqa-client --host http://openqa.opensuse.org isos post DISTRI=opensuse VERSION=1 FLAVOR=poo62159 ARCH=x86_64 BUILD=1
The created job incompletes, but when cloning it with openqa-clone-job it creates a GRU task and works.
Updated by okurz about 4 years ago
- Description updated (diff)
- Status changed from New to Workable
- Target version set to Ready
Updated by okurz about 4 years ago
- Subject changed from Asset download not done if job scheduled using the Web UI to Asset GRU download not done by web UI host if job scheduled by `isos post` (was: … using the Web UI)
Updated by okurz about 4 years ago
- Subject changed from Asset GRU download not done by web UI host if job scheduled by `isos post` (was: … using the Web UI) to Asset GRU download not done by web UI host if job scheduled by `isos post`, fails to download and then cloned (was: … using the Web UI)
Updated by Xiaojing_liu about 4 years ago
The difference in creating the download list between 'isos post' and 'job create' is that 'isos post' uses the arguments to create the download list, while 'job create' (such as openqa-clone-job) uses the job's settings to create the download list. Maybe this can explain why 'openqa-clone-job' works but 'isos post' does not work.
I am not sure if we should call `create_downloads_list` for every job when doing 'isos post', because many jobs may be created.
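The difference described above can be illustrated with a small Python model. This is only a sketch and not openQA's actual Perl code: `create_downloads_list` here is a simplified stand-in that collects every `*_URL` key from a settings dict, and the test suite settings and the `example.com` URL are made up for illustration.

```python
def create_downloads_list(settings):
    """Simplified stand-in: collect (url, asset name) pairs for every *_URL setting."""
    downloads = []
    for key, url in settings.items():
        if not key.endswith("_URL"):
            continue
        short = key[: -len("_URL")]  # e.g. HDD_1_URL -> HDD_1
        # Use the explicitly given asset name, else fall back to the URL's file name.
        name = settings.get(short) or url.rsplit("/", 1)[-1]
        downloads.append((url, name))
    return downloads


# Hypothetical data: the URL is defined only in the test suite, not in the POST arguments.
post_args = {"DISTRI": "opensuse", "VERSION": "1", "FLAVOR": "poo62159",
             "ARCH": "x86_64", "BUILD": "1"}
test_suite_settings = {"HDD_1_URL": "http://example.com/assets/disk.qcow2"}

# 'isos post' path as described: only the POST arguments are inspected.
from_args = create_downloads_list(post_args)

# 'job create' path as described: the job's final, merged settings are inspected.
job_settings = {**post_args, **test_suite_settings}
from_job = create_downloads_list(job_settings)

print(from_args)  # []
print(from_job)   # [('http://example.com/assets/disk.qcow2', 'disk.qcow2')]
```

In this model the download list built from the POST arguments is empty, which would match the observation that no GRU task is created unless `HDD_1_URL` is passed on the command line.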
@favogt
workaround:
openqa-client --host http://openqa.opensuse.org isos post DISTRI=opensuse VERSION=1 FLAVOR=poo62159 ARCH=x86_64 BUILD=1 HDD_1=openSUSE-Tumbleweed-JeOS.x86_64-kvm-and-xen.qcow2 HDD_1_URL=http://download.opensuse.org/tumbleweed/appliances/openSUSE-Tumbleweed-JeOS.x86_64-kvm-and-xen.qcow2?foobar=1
Updated by okurz about 4 years ago
Xiaojing_liu wrote:
I am not sure if we should call the `create_downloads_list` for every job when doing 'isos post', because there may be many jobs will be created.
I think this should be the same as calling `jobs post … HDD_1_URL=http://download.opensuse.org/my/same/asset.qcow2` 10 times. I suggest to just try this out and see what happens.
I recommend to just try out what happens when we call `create_downloads_list`.
@favogt
workaround:
openqa-client --host http://openqa.opensuse.org isos post DISTRI=opensuse VERSION=1 FLAVOR=poo62159 ARCH=x86_64 BUILD=1 HDD_1=openSUSE-Tumbleweed-JeOS.x86_64-kvm-and-xen.qcow2 HDD_1_URL= http://download.opensuse.org/tumbleweed/appliances/openSUSE-Tumbleweed-JeOS.x86_64-kvm-and-xen.qcow2?foobar=1
I think there are spaces between `HDD_1_URL=` and its arguments which should not be there.
Updated by mkittler about 4 years ago
- Category changed from Regressions/Crashes to Feature requests
@fvogt We came to the conclusion that this issue is actually: Add support for triggering GRU asset downloads via "isos post" when the relevant `_URL` parameters are not directly provided but only pulled from e.g. the test suites table
Is that right? I'm just wondering because in the ticket description this feature request is mixed up with restarting jobs and errors reported from the worker's asset cache.
Updated by Xiaojing_liu about 4 years ago
- Category changed from Feature requests to Regressions/Crashes
okurz wrote:
Xiaojing_liu wrote:
I am not sure if we should call the `create_downloads_list` for every job when doing 'isos post', because there may be many jobs will be created.
I think this should be the same as calling `jobs post … HDD_1_URL=http://download.opensuse.org/my/same/asset.qcow2` 10 times. I suggest to just try this out and see what happens.
I recommend to just try out what happens when we call `create_downloads_list`.
Here is the test result:
Calling `jobs post` 10 times:
# for i in $(seq 10); do openqa-cli api -X post jobs --host http://10.67.19.103 TEST=kde DISTRI=sle MACHINE=64bit; done
The job IDs are 186 ... 195.
We can see that 10 records were created in the database table `gru_tasks`:
id | taskname | args | run_at | priority | t_created | t_updated |
---|---|---|---|---|---|---|
5284 | download_asset | ["http:\/\/download.opensuse.org\/tumbleweed\/appliances\/openSUSE-Tumbleweed-JeOS.x86_64-kvm-and-xen.qcow2","\/var\/lib\/openqa\/share\/factory\/hdd\/openSUSE-Tumbleweed-JeOS.x86_64-kvm-and-xen.qcow2",0] | 2020-08-11 10:31:52 | 20 | 2020-08-11 10:31:52 | 2020-08-11 10:31:52 |
5283 | download_asset | ["http:\/\/download.opensuse.org\/tumbleweed\/appliances\/openSUSE-Tumbleweed-JeOS.x86_64-kvm-and-xen.qcow2","\/var\/lib\/openqa\/share\/factory\/hdd\/openSUSE-Tumbleweed-JeOS.x86_64-kvm-and-xen.qcow2",0] | 2020-08-11 10:31:52 | 20 | 2020-08-11 10:31:52 | 2020-08-11 10:31:52 |
5282 | download_asset | ["http:\/\/download.opensuse.org\/tumbleweed\/appliances\/openSUSE-Tumbleweed-JeOS.x86_64-kvm-and-xen.qcow2","\/var\/lib\/openqa\/share\/factory\/hdd\/openSUSE-Tumbleweed-JeOS.x86_64-kvm-and-xen.qcow2",0] | 2020-08-11 10:31:51 | 20 | 2020-08-11 10:31:51 | 2020-08-11 10:31:51 |
5281 | download_asset | ["http:\/\/download.opensuse.org\/tumbleweed\/appliances\/openSUSE-Tumbleweed-JeOS.x86_64-kvm-and-xen.qcow2","\/var\/lib\/openqa\/share\/factory\/hdd\/openSUSE-Tumbleweed-JeOS.x86_64-kvm-and-xen.qcow2",0] | 2020-08-11 10:31:51 | 20 | 2020-08-11 10:31:51 | 2020-08-11 10:31:51 |
5280 | download_asset | ["http:\/\/download.opensuse.org\/tumbleweed\/appliances\/openSUSE-Tumbleweed-JeOS.x86_64-kvm-and-xen.qcow2","\/var\/lib\/openqa\/share\/factory\/hdd\/openSUSE-Tumbleweed-JeOS.x86_64-kvm-and-xen.qcow2",0] | 2020-08-11 10:31:51 | 20 | 2020-08-11 10:31:51 | 2020-08-11 10:31:51 |
5279 | download_asset | ["http:\/\/download.opensuse.org\/tumbleweed\/appliances\/openSUSE-Tumbleweed-JeOS.x86_64-kvm-and-xen.qcow2","\/var\/lib\/openqa\/share\/factory\/hdd\/openSUSE-Tumbleweed-JeOS.x86_64-kvm-and-xen.qcow2",0] | 2020-08-11 10:31:50 | 20 | 2020-08-11 10:31:50 | 2020-08-11 10:31:50 |
5278 | download_asset | ["http:\/\/download.opensuse.org\/tumbleweed\/appliances\/openSUSE-Tumbleweed-JeOS.x86_64-kvm-and-xen.qcow2","\/var\/lib\/openqa\/share\/factory\/hdd\/openSUSE-Tumbleweed-JeOS.x86_64-kvm-and-xen.qcow2",0] | 2020-08-11 10:31:50 | 20 | 2020-08-11 10:31:50 | 2020-08-11 10:31:50 |
5277 | download_asset | ["http:\/\/download.opensuse.org\/tumbleweed\/appliances\/openSUSE-Tumbleweed-JeOS.x86_64-kvm-and-xen.qcow2","\/var\/lib\/openqa\/share\/factory\/hdd\/openSUSE-Tumbleweed-JeOS.x86_64-kvm-and-xen.qcow2",0] | 2020-08-11 10:31:50 | 20 | 2020-08-11 10:31:50 | 2020-08-11 10:31:50 |
5276 | download_asset | ["http:\/\/download.opensuse.org\/tumbleweed\/appliances\/openSUSE-Tumbleweed-JeOS.x86_64-kvm-and-xen.qcow2","\/var\/lib\/openqa\/share\/factory\/hdd\/openSUSE-Tumbleweed-JeOS.x86_64-kvm-and-xen.qcow2",0] | 2020-08-11 10:31:50 | 20 | 2020-08-11 10:31:50 | 2020-08-11 10:31:50 |
5275 | download_asset | ["http:\/\/download.opensuse.org\/tumbleweed\/appliances\/openSUSE-Tumbleweed-JeOS.x86_64-kvm-and-xen.qcow2","\/var\/lib\/openqa\/share\/factory\/hdd\/openSUSE-Tumbleweed-JeOS.x86_64-kvm-and-xen.qcow2",0] | 2020-08-11 10:31:49 | 20 | 2020-08-11 10:31:49 | 2020-08-11 10:31:49 |
(10 rows)
And the `gru_dependencies` result is:
job_id | gru_task_id |
---|---|
195 | 5284 |
194 | 5283 |
193 | 5282 |
192 | 5281 |
191 | 5280 |
190 | 5279 |
189 | 5278 |
188 | 5277 |
187 | 5276 |
186 | 5275 |
(10 rows)
Updated by Xiaojing_liu about 4 years ago
- Category changed from Regressions/Crashes to Feature requests
Updated by mkittler about 4 years ago
Ok, so that would create multiple asset downloads. Judging by the code in `openQA/lib/OpenQA/Task/Asset/Download.pm`, nothing bad will happen in that case. There's a lock to prevent concurrently downloading the same asset and a check to prevent downloading an existing asset again.
I suppose it would nevertheless be a good idea to de-duplicate the download lists for the jobs created by `isos post` by the download destination to produce less overhead. (E.g. `enqueue_download_jobs` would accept multiple download lists and skip "visited" download destinations.)
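The de-duplication idea could look roughly like the following Python sketch. It is an illustration under stated assumptions, not the openQA implementation: the function borrows the name `enqueue_download_jobs` but its signature, the `enqueue` callback (standing in for creating a GRU task) and the `example.com` URL are hypothetical.

```python
def enqueue_download_jobs(download_lists, enqueue):
    """Enqueue each download destination once across several per-job lists.

    download_lists: mapping of job id -> list of (url, destination) pairs.
    enqueue: callback standing in for creating a GRU task.
    Returns a mapping of destination -> job ids that depend on it.
    """
    jobs_by_destination = {}
    for job_id, downloads in download_lists.items():
        for url, destination in downloads:
            if destination not in jobs_by_destination:
                # First time this destination is seen: create one task for it.
                enqueue(url, destination)
                jobs_by_destination[destination] = []
            jobs_by_destination[destination].append(job_id)
    return jobs_by_destination


url = "http://example.com/assets/disk.qcow2"  # hypothetical URL
dest = "/var/lib/openqa/share/factory/hdd/disk.qcow2"
lists = {job_id: [(url, dest)] for job_id in range(186, 196)}

enqueued = []
deps = enqueue_download_jobs(lists, lambda u, d: enqueued.append((u, d)))

print(len(enqueued))    # 1
print(len(deps[dest]))  # 10
```

With ten jobs referring to the same destination, only one download is enqueued while all ten jobs are still recorded as depending on it.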
Updated by Xiaojing_liu about 4 years ago
- Status changed from Workable to In Progress
- Assignee set to Xiaojing_liu
Updated by favogt about 4 years ago
mkittler wrote:
@fvogt We came to the conclusion that this issue is actually: Add support for triggering GRU asset downloads via "isos post" when the relevant `_URL` parameters are not directly provided but only pulled from e.g. the test suites table
Is that right? I'm just wondering because in the ticket description this feature request is mixed up with restarting jobs and errors reported from the worker's asset cache.
Yes. I don't see how there's any mixup, the errors and observations with job restarting are directly related.
Updated by Xiaojing_liu about 4 years ago
- Status changed from In Progress to Feedback
PR has been merged
Updated by favogt about 4 years ago
- Status changed from Feedback to Resolved
I can confirm that the issue is fixed, thanks!
I created a new test suite and hooked it up, but unfortunately it bumps against max_redirects now. I opened a PR for that: https://github.com/os-autoinst/openQA/pull/3338
Updated by mkittler about 4 years ago
There's just one caveat but for now I wouldn't over-optimize it.
Updated by favogt about 4 years ago
- Status changed from Resolved to Workable
Unfortunately there is an issue with the way this is implemented.
I added a testsuite ("jeos2twnext") which defines HDD_1_URL and linked them to the JeOS medium in the "Development Tumbleweed" group.
When the next snapshot scheduled the JeOS product, all unrelated tests in the main group also failed with the GRU error:
https://openqa.opensuse.org/tests/1375735#step/GRU/1
So it appears like the download jobs are attached to all scheduled jobs and not just the ones which actually need them?
Updated by Xiaojing_liu about 4 years ago
favogt wrote:
Unfortunately there is an issue with the way this is implemented.
I added a testsuite ("jeos2twnext") which defines HDD_1_URL and linked them to the JeOS medium in the "Development Tumbleweed" group.
When the next snapshot scheduled the JeOS product, all unrelated tests in the main group also failed with the GRU error:
https://openqa.opensuse.org/tests/1375735#step/GRU/1
So it appears like the download jobs are attached to all scheduled jobs and not just the ones which actually need them?
Yes, when using `isos post`, the download jobs are attached to all scheduled jobs, even the unrelated ones. This PR does not fix this; I could create a new ticket to track this feature.
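The difference between the reported behaviour and the expected one can be modelled in a few lines of Python. This is a hypothetical sketch, not openQA code: both function names, the `unrelated` test name and the `example.com` URL are invented for illustration.

```python
def attach_to_all(job_settings, downloads):
    """Behaviour reported above: every job depends on every download."""
    return {job_id: list(downloads) for job_id in job_settings}


def attach_to_relevant(job_settings, downloads):
    """Alternative: a job depends only on downloads whose URL appears in its settings."""
    result = {}
    for job_id, settings in job_settings.items():
        urls = {v for k, v in settings.items() if k.endswith("_URL")}
        result[job_id] = [d for d in downloads if d in urls]
    return result


jobs = {
    1: {"TEST": "jeos2twnext", "HDD_1_URL": "http://example.com/assets/disk.qcow2"},
    2: {"TEST": "unrelated"},  # hypothetical job that needs no download
}
downloads = ["http://example.com/assets/disk.qcow2"]

print(attach_to_all(jobs, downloads)[2])       # ['http://example.com/assets/disk.qcow2']
print(attach_to_relevant(jobs, downloads)[2])  # []
```

In the first variant a failed download would also block job 2, which matches the failures seen in unrelated tests; in the second variant only job 1 is affected.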
Updated by Xiaojing_liu about 4 years ago
- Related to action #70687: Download gru is attached to all scheduled jobs when doing 'isos post' added
Updated by Xiaojing_liu about 4 years ago
- Status changed from Workable to Feedback
Updated by okurz about 4 years ago
@Xiaojing_liu I guess by now we know that your change works and all changes left to be done are tracked in #70687, right? If this is true then please resolve this ticket.
Updated by Xiaojing_liu about 4 years ago
- Status changed from Feedback to Resolved
Updated by okurz almost 4 years ago
- Related to action #72142: Avoid problematic symlinking in download assets tasks of the web UI added