action #35299

Potential corrupted file causing installation tests to fail

Added by okurz almost 2 years ago. Updated over 1 year ago.

Status: Resolved
Start date: 20/04/2018
Priority: High
Due date:
Assignee: EDiGiacinto
% Done: 0%
Category: Concrete Bugs
Target version: Done
Difficulty: medium
Duration:

Description

Observation

from IRC:

[20 Apr 2018 19:43:24] <DimStar> okurz: today I see a correlation between all the DVD/*/await_install failures: they all seem to run on ow4; the same tests rerun on ow4 fail, on ow1 seem to pass
[20 Apr 2018 21:47:31] <okurz> I checked the logs on the worker but could not find anything obvious. The only job that showed some perl errors in the journal is about https://openqa.opensuse.org/tests/659929 which is actually running fine it seems and about to complete.
[20 Apr 2018 22:02:06] <DimStar> okurz: yeah.. looking at the stuff, it might be that the DVD was corrupted transferred to OW4 - it fails on installing some packages.
[20 Apr 2018 22:02:26] <DimStar> e.g. https://openqa.opensuse.org/tests/659950#step/await_install/8
[20 Apr 2018 22:03:01] <okurz> hm, you wouldn't be the first one to suggest to have more checksums to check ;)
[20 Apr 2018 22:05:50] <DimStar> looking at caches, that actually makes sense...
[20 Apr 2018 22:06:04] <DimStar> on o4: 4172888508 Apr 20 14:55 openSUSE-Tumbleweed-DVD-x86_64-Snapshot20180419-Media.iso
[20 Apr 2018 22:06:17] <DimStar> on o1: 4621074432 Apr 20 14:52 openSUSE-Tumbleweed-DVD-x86_64-Snapshot20180419-Media.iso

All the worker logs from openqaw4 from the first mention of "openSUSE-Tumbleweed-DVD-x86_64-Snapshot20180419-Media.iso" are included in the attached logfile. Please check the content.
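
The size mismatch in the IRC log (4172888508 bytes on ow4 versus 4621074432 bytes on ow1) is the kind of corruption a basic check on a cached asset could catch. What follows is a minimal Python sketch, not openQA code: `verify_asset` and its parameters are hypothetical names, and it assumes the expected size and, optionally, a SHA-256 digest can be obtained from the server that offers the asset.

```python
import hashlib
import os


def verify_asset(path, expected_size, expected_sha256=None, block_size=1 << 20):
    """Return True if the file at `path` matches the expected size and digest.

    Hypothetical helper for illustration only; not part of openQA.
    """
    # Cheap check first: a truncated download has the wrong size.
    if os.path.getsize(path) != expected_size:
        return False
    if expected_sha256 is None:
        return True
    # Hash block by block so a multi-gigabyte ISO is never held in memory.
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest() == expected_sha256
```

With the numbers above, a check of the ow4 copy against an expected size of 4621074432 would already fail at the size comparison, before any hashing is done.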

journal_corrupted_iso.log.xz (54.5 KB) okurz, 20/04/2018 08:13 pm


Related issues

Related to openQA Project - action #13646: Ensuring asset files integrity (was: "An error occurred d... Workable 09/09/2016
Related to openQA Project - action #35296: Error messages on worker about "Use of uninitialized valu... Rejected 20/04/2018
Related to openQA Project - action #34597: Race condition causing problems with the worker cache Resolved 11/05/2018
Related to openQA Project - action #34591: Corrupt iso download by cache Resolved 09/04/2018

History

#1 Updated by okurz almost 2 years ago

As a workaround, DimStar deleted the file manually from the cache; let's see what this does.

EDIT: it worked.

#2 Updated by okurz almost 2 years ago

  • Related to action #13646: Ensuring asset files integrity (was: "An error occurred during the installation" on images) added

#3 Updated by okurz almost 2 years ago

  • Related to action #35296: Error messages on worker about "Use of uninitialized value $host in hash element at /usr/share/openqa/script/../lib/OpenQA/Worker/Common.pm line 359, <GEN298662> line 4." added

#4 Updated by EDiGiacinto almost 2 years ago

What could actually help here is a multi-chunked download (as we did with uploads): it checks integrity piece by piece instead of at the end of the process.
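
To illustrate the idea of verifying piece by piece, here is a minimal Python sketch. It is not the openQA implementation: `download_in_chunks`, `fetch_chunk` and `chunk_digests` are hypothetical names, and the sketch assumes the server publishes a SHA-256 digest for every chunk.

```python
import hashlib


def download_in_chunks(fetch_chunk, chunk_digests, max_retries=3):
    """Assemble a payload from chunks, verifying each one as it arrives.

    `fetch_chunk(index)` is a hypothetical callable returning the bytes of
    chunk `index`; `chunk_digests` lists the expected SHA-256 hex digest of
    every chunk, assumed to be provided by the server.
    """
    parts = []
    for index, expected in enumerate(chunk_digests):
        for _attempt in range(max_retries):
            data = fetch_chunk(index)
            if hashlib.sha256(data).hexdigest() == expected:
                parts.append(data)
                break  # chunk is good, move on to the next one
        else:
            # Every attempt returned corrupted data: fail early instead of
            # noticing the problem only after the whole transfer.
            raise IOError(f"chunk {index} failed verification {max_retries} times")
    return b"".join(parts)
```

Only the chunk that failed is fetched again, so a single bad piece does not force a new transfer of the whole ISO. The sketch joins the chunks in memory for brevity; for a file of several gigabytes each verified chunk would be written to disk instead.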

#5 Updated by coolo almost 2 years ago

  • Priority changed from Normal to High
  • Target version set to Ready
  • Difficulty set to medium

#6 Updated by okurz almost 2 years ago

EDiGiacinto wrote:

What could actually help here is a multi-chunked download (as we did with uploads): it checks integrity piece by piece instead of at the end of the process.

I don't understand why we do not trust what is offered by existing tools to handle this properly. Are we using the wrong tools? Is Linux broken? ;)

#7 Updated by EDiGiacinto almost 2 years ago

okurz wrote:

EDiGiacinto wrote:

What could actually help here is a multi-chunked download (as we did with uploads): it checks integrity piece by piece instead of at the end of the process.


I don't understand why we do not trust what is offered by existing tools to handle this properly. Are we using the wrong tools? Is Linux broken? ;)

Sorry - what tools are you talking about?

#8 Updated by okurz almost 2 years ago

EDiGiacinto wrote:

Sorry - what tools are you talking about?

rsync, wget, curl, scp, etc.

#9 Updated by szarate almost 2 years ago

  • Target version changed from Ready to Current Sprint

#10 Updated by EDiGiacinto almost 2 years ago

okurz wrote:

EDiGiacinto wrote:

Sorry - what tools are you talking about?


rsync, wget, curl, scp, etc.

Interesting way of seeing things, but I'm sorry, we don't share that approach. So what exactly are you proposing? JFYI, we already had to remove the areas of the codebase that relied on external tools, and I'm afraid you missed the reason why. Do you expect me to script openQA or actually develop it?

#11 Updated by okurz almost 2 years ago

EDiGiacinto wrote:

Interesting way of seeing things, but I'm sorry, we don't share that approach. So what exactly are you proposing?

I am not proposing anything, just asking why we can't use existing tools. There might be many reasons, e.g. that these tools (or perl modules) do not exist or do not fit our needs for whatever reason.

JFYI, we already had to remove the areas of the codebase that relied on external tools, and I'm afraid you missed the reason why.

Yes, seems so.

Do you expect me to script openQA or actually develop it?

I don't think this is a real open question so I guess you do not really expect an answer from me.

#12 Updated by EDiGiacinto almost 2 years ago

  • Related to action #34597: Race condition causing problems with the worker cache added

#13 Updated by mkittler almost 2 years ago

  • Assignee set to EDiGiacinto

#14 Updated by szarate almost 2 years ago

#15 Updated by szarate almost 2 years ago

  • Target version changed from Current Sprint to Ready

#16 Updated by szarate almost 2 years ago

  • Target version changed from Ready to Current Sprint

Sending back to the product backlog for the time being, until we get some feedback on whether it's still happening.

#17 Updated by szarate almost 2 years ago

  • Target version changed from Current Sprint to Ready

:)

#18 Updated by coolo over 1 year ago

  • Target version changed from Ready to Current Sprint

One more issue related to the worker downloading assets

#19 Updated by coolo over 1 year ago

  • Status changed from New to Resolved
  • Target version changed from Current Sprint to Done
