tickets #50852

Why is typhoon2 not accessible anymore?

Added by michel_mno about 1 year ago. Updated 7 months ago.

Status: Closed
Priority: High
Assignee:
Category: OBS
Start date: 2019-04-29
Due date:
% Done: 100%
Estimated time:
Duration:

Description

=== extract irc #opensuse-buildservice
(03:38:03 PM) michel_mno: Hi there, who could have a look to typhoon2 ?

(03:38:03 PM) michel_mno: state seems to be bad as per https://build.opensuse.org/project/monitor/openSUSE:Factory:PowerPC?arch_local=1&defaults=0&repo_images=1&scheduled=1#

History

#1 Updated by michel_mno about 1 year ago

The private flag needs to be removed.

#2 Updated by cboltz about 1 year ago

  • Category set to OBS
  • Assignee set to opensuse-admin-obs
  • Private changed from Yes to No

#3 Updated by michel_mno about 1 year ago

I do not know what has been done in the meantime,
but this afternoon (20190430) typhoon2 is operational again.

#4 Updated by michel_mno about 1 year ago

typhoon2 is not accessible again, as of 20190515.

#5 Updated by oertel about 1 year ago

  • Assignee changed from opensuse-admin-obs to adrianSuSE

no idea what's wrong here. in logfile.badhost it has:

tail *

==> openSUSE:Factory:PowerPC::images::000product:openSUSE-ftp-ftp-ppc64_ppc64le-0ca9d9ad333b73902741885057299df9 <==
empty build log?

==> openSUSE:Factory:PowerPC::images::000product:openSUSE-ftp-ftp-ppc64_ppc64le-3f542e990f8baadc0137c389394b7c42 <==
empty build log?

==> openSUSE:Factory:PowerPC::images::000product:openSUSE-ftp-ftp-ppc64_ppc64le-636bb457235e611222040c6bf253a7ab <==
empty build log?

==> openSUSE:Factory:PowerPC::images::000product:openSUSE-ftp-ftp-ppc64_ppc64le-ded5a75555462923d5b35ce0915667e3 <==
empty build log?

==> openSUSE:Factory:PowerPC::images::000product:openSUSE-ftp-ftp-ppc64_ppc64le-ee2da71835f83f550e25606c167a5664 <==
empty build log?

So we frequently have the situation that a build aborts without a logfile (maybe a manual abort before
all the packages have been fetched?) and the worker is then marked as "bad". Since we only have
two workers for this job, we end up waiting for the only two workers while both of them
are marked as "bad" ...

adrian, mls ?
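
To illustrate the suspected starvation, here is a toy model in Python. It is only a sketch of the behaviour described above, not OBS code: the `Job` class, the `badhosts` set, the function names, and the second worker name `worker_b` are all made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Job:
    name: str
    # Hypothetical stand-in for the per-job "badhost" record.
    badhosts: set = field(default_factory=set)


def eligible_workers(job, workers):
    """Return the workers this job may still be dispatched to."""
    return [w for w in workers if w not in job.badhosts]


def record_build_result(job, worker, log):
    """Mark the worker as bad for this job when the build left no log."""
    if not log:
        job.badhosts.add(worker)


# Only two workers can take this job (worker_b is an invented name).
workers = ["typhoon2", "worker_b"]
job = Job("000product:openSUSE-ftp-ftp-ppc64_ppc64le")

# Two aborted builds with an empty log, one on each worker.
record_build_result(job, "typhoon2", log="")
record_build_result(job, "worker_b", log="")

# No eligible worker is left, so the job stays "scheduled".
remaining = eligible_workers(job, workers)
print(remaining)
```

In this model, wiping the job's `badhosts` set makes both workers eligible again.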

#6 Updated by michel_mno about 1 year ago

Adrian took some action to "wipe the badhost information" yesterday (20190516), and the PowerPC scheduled task was then dispatched correctly.

#7 Updated by michel_mno about 1 year ago

New occurrence on 20190528 :(

#8 Updated by michel_mno about 1 year ago

New occurrences on 20190618 and 20190619. I do not have details of the actions taken, as per the #opensuse-buildservice IRC log:

(03:15:32 PM) michel_mno: Hello adrianS,  seems there is a typhoon2 problem as reported as idle status in monitor page, but a PowerPC task is blocked as scheduled https://build.opensuse.org/project/monitor/openSUSE:Factory:PowerPC?arch_local=1&defaults=0&repo_images=1&scheduled=1#   Could you have a look ?
(03:28:26 PM) mstrigl: michel_mno: it is building now
...
(09:03:56 AM) michel_mno: mstrigl:  thanks for triggering the rebuilding yesterday, but this morning, still have same blocked scheduled task https://build.opensuse.org/project/monitor/openSUSE:Factory:PowerPC?arch_local=1&defaults=0&repo_images=1&scheduled=1 waiting for typhoon2 access (that is idle)
(10:35:36 AM) ismail: michel_mno: we are aware and looking at it atm
(10:48:18 AM) mstrigl: michel_mno: It is building again. I am monitoring this to find the root cause

#9 Updated by mstrigl about 1 year ago

  • Status changed from New to In Progress

The problem seems to be that if typhoon2 is already building a job and gets another job assigned, that second job gets a badhost entry for typhoon2 and then waits forever.
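
A minimal Python sketch of this hypothesis follows. It is an assumption-laden model and not the real OBS dispatcher: the `Worker` class, the `dispatch` function, and the `badhosts` dictionary are hypothetical names chosen for illustration.

```python
class Worker:
    """Toy worker that can build only one job at a time."""

    def __init__(self, name):
        self.name = name
        self.current_job = None

    def assign(self, job_name):
        """Accept the job if idle; return False if already busy."""
        if self.current_job is not None:
            return False
        self.current_job = job_name
        return True


def dispatch(worker, job_name, badhosts):
    """Try to start job_name on worker.

    If the worker is busy, the job (not the worker) gets a badhost
    entry for that worker and remains in the "scheduled" state.
    """
    if not worker.assign(job_name):
        badhosts.setdefault(job_name, set()).add(worker.name)
        return "scheduled"
    return "building"


typhoon2 = Worker("typhoon2")
badhosts = {}

first = dispatch(typhoon2, "job_a", badhosts)   # worker is idle
second = dispatch(typhoon2, "job_b", badhosts)  # worker is busy with job_a
print(first, second, badhosts)
```

Under these assumptions the second job ends up with typhoon2 in its badhost list although the machine itself is healthy, which would match the monitor page showing a scheduled job next to a working worker.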

#10 Updated by michel_mno about 1 year ago

Is there any new status on the typhoon2 problem? mstrigl was waiting to discuss it with mls.

This is a critical problem: it has been blocking the build of the PowerPC TW isos and the openQA tests since the 20190614 snapshot.

#11 Updated by michel_mno about 1 year ago

It seems some actions were taken between 20190626 and 20190628, because the TW PowerPC isos were generated and submitted to openQA as per the o3 status (1). But this Monday (20190701) the scheduled task is blocked again (2) because of the bad typhoon2 state.

What was done last week on typhoon2/OBS, and could it be done again?

(1) https://openqa.opensuse.org/group_overview/4
(2) https://build.opensuse.org/project/monitor/openSUSE:Factory:PowerPC?arch_local=1&defaults=0&repo_images=1&scheduled=1

#12 Updated by michel_mno about 1 year ago

New occurrence on 20190710.

#13 Updated by lrupp 7 months ago

  • Status changed from In Progress to Closed
  • % Done changed from 0 to 100

typhoon2 - like much of the other hardware in OBS - is a very old, unreliable system (not in service). Keeping those machines up and running is quite an effort, and limited manpower means that it might take some time before they get fixed when they have problems.

So please do not rely on the availability of any OBS worker machine at any time. OBS is smart enough to handle broken builds and will reschedule them, if possible.

Hardware donations are welcome - but only enterprise hardware, which is in service... ;-)
