action #52964

s390x builds are triggered twice

Added by okurz 12 months ago. Updated 8 months ago.

Status: Resolved
Priority: Low
Assignee:
Start date: 2019-06-12
Due date:
% Done: 0%
Estimated time:
Duration:

Description

Observation

[12/06/2019 15:29:43] <mgriessmeier> @coolo any special reason why s390 got retriggered?
[12/06/2019 15:33:05] <mgriessmeier> @coolo though I guess it wasn't you, but your account =)

This has happened before. It only became apparent during pending milestone tests, but it may happen every time.

https://openqa.suse.de/tests/2972769#next_previous shows that there was an earlier job within the same build and that 2972769 is a retrigger by slindomansilla. After that, however, https://openqa.suse.de/admin/auditlog shows "about 2 hours ago coolo job_create { "id": 2973121 }", so the user "coolo" triggered the job once more.

Checking the file /var/log/openqa_rsync.log on osd, I find:

Wed Jun 12 09:50:01 CEST 2019
Configured to deprioritize or cancel jobs from previous builds
Syncing 'sle12_sp5
…
{
   "ARCH" : "s390x",
   "BUILD" : "0197",
   "BUILD_HA" : "0074",
   "BUILD_HA_GEO" : "0057",
   "BUILD_SDK" : "0150",
   "BUILD_SLE" : "0197",
   "DISTRI" : "SLE",
   "FLAVOR" : "Server-DVD",
…

The same appears for the other three architectures "aarch64", "ppc64le" and "x86_64".

In between there are many cycles in which no action is taken:

Wed Jun 12 10:15:01 CEST 2019
Wed Jun 12 10:15:01 CEST 2019
Configured to deprioritize or cancel jobs from previous builds
Configured to deprioritize or cancel jobs from previous builds
Wed Jun 12 10:20:01 CEST 2019
…

until

Wed Jun 12 15:10:01 CEST 2019
Wed Jun 12 15:10:01 CEST 2019
Configured to deprioritize or cancel jobs from previous builds
Configured to deprioritize or cancel jobs from previous builds
Syncing 'sle12_sp5'
…
SLE-12-SP5-Server-MINI-ISO-x86_64-Build0197-Media.iso exists, skipped
add_sle_addons: No url found for: 'RT'
add_sle_addons: product: 'SAP'
add_sle_addons: product: 'HPC'
add_sle_addons: update_current_repo: 'HPC'
add_sle_addons: product: 'Live-Patching'
add_sle_addons: update_current_repo: 'Live-Patching'
  dist.suse.de::repos/SUSE:/SLE-12-SP5:/GA:/TEST/images/iso/SLE-12-SP5-Server-DVD-s390x-Build0197-Media1.iso
  -> SLE-12-SP5-Server-DVD-s390x-Build0197-Media1.iso...

sent 20 bytes  received 101 bytes  2.60 bytes/sec
total size is 3,959,422,976  speedup is 32,722,503.93
unsetting unused s390x installation ISO: SLE-12-SP5-Server-DVD-s390x-Build0197-Media1.iso
registering ...
'deprioritizing or cancelling currently running jobs (if any), setting '_DEPRIORITIZEBUILD'
{
   "ARCH" : "s390x",
   "BUILD" : "0197",
   "BUILD_HA" : "0074",
   "BUILD_HA_GEO" : "0057",
   "BUILD_SDK" : "0150",
   "BUILD_SLE" : "0197",
   "DISTRI" : "SLE",
   "FLAVOR" : "Server-DVD",

but only for s390x. So s390x is actually triggered twice for the same build.
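The check done by hand above can be sketched as a small script. This is only a minimal sketch, assuming the log keeps the layout seen in the excerpts (a timestamp line per cycle, followed by settings blocks with "ARCH" and "BUILD" keys). The function name and the shortened sample text are made up for illustration and are not part of the openQA scripts.

```python
import re
from collections import defaultdict

# Timestamp lines as they appear in the excerpts, e.g. "Wed Jun 12 09:50:01 CEST 2019".
TIMESTAMP = re.compile(r"^\w{3} \w{3} +\d{1,2} \d{2}:\d{2}:\d{2} \w+ \d{4}$")
# Only the exact keys "ARCH" and "BUILD"; "BUILD_HA" and friends do not match.
SETTING = re.compile(r'^\s*"(ARCH|BUILD)"\s*:\s*"([^"]*)"')


def find_repeated_triggers(log_text):
    """Return {(arch, build): [timestamps]} for pairs registered more than once."""
    seen = defaultdict(list)
    current_time = None
    arch = None
    for line in log_text.splitlines():
        line = line.rstrip()
        if TIMESTAMP.match(line):
            current_time = line
            continue
        match = SETTING.match(line)
        if not match:
            continue
        key, value = match.groups()
        if key == "ARCH":
            arch = value
        elif arch is not None:
            # A "BUILD" line following an "ARCH" line completes one registration.
            seen[(arch, value)].append(current_time)
            arch = None
    return {pair: times for pair, times in seen.items() if len(times) > 1}


if __name__ == "__main__":
    # Hypothetical, shortened sample in the layout of the excerpts above.
    sample = """Wed Jun 12 09:50:01 CEST 2019
{
   "ARCH" : "s390x",
   "BUILD" : "0197",
}
Wed Jun 12 15:10:01 CEST 2019
{
   "ARCH" : "s390x",
   "BUILD" : "0197",
}
"""
    for (arch, build), times in find_repeated_triggers(sample).items():
        print(arch, build, times)
```

Running it against the real /var/log/openqa_rsync.log would need the file read in first; whether the real log always matches these patterns is an assumption.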

History

#1 Updated by okurz 12 months ago

  • Status changed from New to In Progress
  • Assignee set to okurz

For build 0196 the first trigger point was before "Mon Jun 10 20:15:02 CEST 2019" and the next one after "Wed Jun 12 00:10:01 CEST 2019" so that is at least 4h apart! I guess either the detection of "a new build is published" does not work correctly anymore or something on IBS already triggers stuff twice.

Calling

env rsync_opts="--help" bash -ex /opt/openqa-scripts/openqa-iso-sync-sles sle12_sp5

on osd revealed that actually both "_product" and "000product" do not seem to exist anymore.

EDIT: Maybe they never existed in the ":TEST" subproject.

According to coolo we can try to change that to any package or no package at all for the :TEST sub directory, e.g.

iosc api /build/SUSE:SLE-12-SP5:GA:TEST/_result?code=failed

#2 Updated by okurz 12 months ago

  • Status changed from In Progress to Feedback

https://gitlab.suse.de/openqa/scripts/merge_requests/335 is most probably not the fix to the problem but it might have quite some impact on timing, let's see what happens.

To solve it properly I would use the idea we have had for quite some time: event-based triggering, probably based on
https://metacpan.org/pod/Mojo::RabbitMQ::Client#CONSUMER
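To illustrate why event-based triggering could avoid double triggers, here is a minimal sketch of a handler that schedules at most once per published build. It is not tied to any message bus client. The event fields "project", "arch" and "build" and the schedule callback are hypothetical names chosen for this example, not the payload of a real event.

```python
def make_trigger_handler(schedule):
    """Return a handler that calls schedule() only once per (project, arch, build)."""
    seen = set()

    def handle(event):
        # Hypothetical event layout: a dict with "project", "arch" and "build".
        key = (event["project"], event["arch"], event["build"])
        if key in seen:
            return False  # already scheduled, ignore the repeated event
        seen.add(key)
        schedule(*key)
        return True

    return handle


if __name__ == "__main__":
    handler = make_trigger_handler(lambda *key: print("scheduling", key))
    event = {"project": "SUSE:SLE-12-SP5:GA:TEST", "arch": "s390x", "build": "0197"}
    handler(event)  # schedules
    handler(event)  # ignored
```

The set of seen keys lives in memory only, so a real consumer would need to persist it or ask openQA whether jobs for the build already exist.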

#3 Updated by okurz 11 months ago

  • Priority changed from Normal to Low

Looking at https://openqa.suse.de/tests/latest?test=default&version=12-SP5&machine=s390x-kvm-sle12&flavor=Server-DVD&distri=sle&arch=s390x#next_previous there are still some double triggers, but they seem to happen less often, probably due to my change. Maybe we can live with the current state until the work by "andriinikitin" on the obs sync and trigger plugin is done. I could not find any ticket assigned to him though.

EDIT: 2019-09-23: seems we still have some double triggers, e.g. build 0307

#4 Updated by okurz 8 months ago

  • Assignee changed from okurz to andriinikitin

andriinikitin I guess this could be something for you: What are the plans to use "OBS Sync" for more than just staging? This could potentially solve this problem.

#5 Updated by andriinikitin 8 months ago

  • Assignee changed from andriinikitin to riafarov

riafarov , could you please have a look if this problem may be related to https://gitlab.suse.de/openqa/scripts/commit/c2706ab2df04c36d9171f2da96c9be0f1216bf52 and then maybe assign back to me or okurz

okurz I had drafts for SLE12-SP5, but after some discussions I got the impression that it is not worth migrating it to 'Obs Sync' and that it may stick with rsync.pl until EOL. Do you think it makes sense?

#6 Updated by okurz 8 months ago

  • Status changed from Feedback to Blocked
  • Assignee changed from riafarov to andriinikitin

andriinikitin wrote:

riafarov , could you please have a look if this problem may be related to https://gitlab.suse.de/openqa/scripts/commit/c2706ab2df04c36d9171f2da96c9be0f1216bf52 and then maybe assign back to me or okurz

I can answer this one easily. The problem appeared way before the mentioned git commit so it can not be the cause of problems.

okurz I had drafts for SLE12-SP5, but after some discussions I got the impression that it is not worth migrating it to 'Obs Sync' and that it may stick with rsync.pl until EOL. Do you think it makes sense?

Sure. By now there is not much benefit anymore for SLE12-SP5. However, I am sure that we still have the very same problem for 15-SP2: following https://openqa.suse.de/tests/latest?arch=s390x&distri=sle&flavor=Installer-DVD&machine=s390x-kvm-sle12&test=default&version=15-SP2#next_previous I can find multiple occurrences of jobs for the same builds, which according to https://openqa.suse.de/admin/auditlog are scheduled by "geekotest", so they were not triggered by users. I.e. this ticket is blocked by #56855 if you agree.

#7 Updated by okurz 8 months ago

  • Blocked by action #56855: Move SLE-15-SP2:GA:TEST to ObsRsync Plugin added

#8 Updated by andriinikitin 8 months ago

  • Status changed from Blocked to Workable
  • Assignee changed from andriinikitin to okurz

okurz wrote:

The problem appeared way before the mentioned git commit so it can not be the cause of problems.

But the problem still looks related to that commit, which mentions that the .iso for s390x is removed after rsync. So the next time the wrapper scripts suspect that an rsync is required (and the version hasn't changed), they will ignore all platforms except s390x.
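The suspected mechanism can be modelled in a few lines. This is a toy simulation under stated assumptions, not the logic of the real wrapper scripts: a cycle skips every architecture whose ISO already exists locally, registers the others, and removes the s390x ISO again after the sync. The function and parameter names are hypothetical.

```python
ARCHES = ["aarch64", "ppc64le", "s390x", "x86_64"]


def sync_cycle(local_isos, build, remove_after_sync=("s390x",)):
    """Simulate one sync cycle and return the architectures that get registered."""
    triggered = []
    for arch in ARCHES:
        iso = f"SLE-12-SP5-Server-DVD-{arch}-Build{build}-Media1.iso"
        if iso in local_isos:
            continue  # corresponds to "exists, skipped" in the log
        local_isos.add(iso)
        triggered.append(arch)
        if arch in remove_after_sync:
            # Assumed behaviour: the unused installation ISO is removed again,
            # so the next cycle no longer finds it and syncs it once more.
            local_isos.discard(iso)
    return triggered


if __name__ == "__main__":
    isos = set()
    print(sync_cycle(isos, "0197"))  # first cycle for the build
    print(sync_cycle(isos, "0197"))  # a later cycle for the same build
```

In this model the first cycle registers all four architectures and any later cycle for the same build registers only s390x, which matches the observation in the description. With remove_after_sync=() the second cycle registers nothing.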

Also the problem doesn't seem to happen in 15sp2 after switching to plugin, so I am not sure if anything should be done here.

#9 Updated by andriinikitin 8 months ago

  • Blocked by deleted (action #56855: Move SLE-15-SP2:GA:TEST to ObsRsync Plugin)

#10 Updated by okurz 8 months ago

  • Status changed from Workable to Resolved
  • Assignee changed from okurz to andriinikitin
  • Target version set to Current Sprint

andriinikitin wrote:

Also the problem doesn't seem to happen in 15sp2 after switching to plugin, so I am not sure if anything should be done here.

Agreed. Nothing should be done here. I would say mainly you "solved" it for SLE15SP2 then and for SLE12SP5 we will (out-)live the problem.
