Why is typhoon2 not accessible anymore?

Added by michel_mno about 1 year ago.

=== extract irc #opensuse-buildservice
(03:38:03 PM) michel_mno: Hi there, who could have a look to typhoon2 ?

(03:38:03 PM) michel_mno: state seems to be bad as per


need to remove private flag

I do not know what has been done in the meantime,
but this afternoon (20190430) typhoon2 is operational again.

typhoon2 is not accessible again as of 20190515.

No idea what's wrong here. logfile.badhost contains:

tail *

==> openSUSE:Factory:PowerPC::images::000product:openSUSE-ftp-ftp-ppc64_ppc64le-0ca9d9ad333b73902741885057299df9 <==
empty build log?

==> openSUSE:Factory:PowerPC::images::000product:openSUSE-ftp-ftp-ppc64_ppc64le-3f542e990f8baadc0137c389394b7c42 <==
empty build log?

==> openSUSE:Factory:PowerPC::images::000product:openSUSE-ftp-ftp-ppc64_ppc64le-636bb457235e611222040c6bf253a7ab <==
empty build log?

==> openSUSE:Factory:PowerPC::images::000product:openSUSE-ftp-ftp-ppc64_ppc64le-ded5a75555462923d5b35ce0915667e3 <==
empty build log?

==> openSUSE:Factory:PowerPC::images::000product:openSUSE-ftp-ftp-ppc64_ppc64le-ee2da71835f83f550e25606c167a5664 <==
empty build log?

So we frequently have the situation that the build aborts without a logfile (maybe a manual abort
before all the packages have been fetched?) and the worker is then marked as "bad". Since we only have
two workers for this job, we end up waiting for exactly those two workers, and both of them
are marked as "bad" ...
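
To see at a glance how many badhost entries carry the same message, the `tail *` output above can be summarized with a small script. This is only a sketch: the function name `summarize_badhost_tail` is made up for illustration, and the only thing it assumes is the `==> file <==` header format that `tail` prints when given several files.

```python
import re
from collections import Counter

# `tail` prints this header before each file when given multiple files.
HEADER = re.compile(r"^==> (?P<name>.+) <==$")

def summarize_badhost_tail(text):
    """Group `tail *` output by file and count how often each message occurs.

    Hypothetical helper for triage; it does not talk to OBS in any way.
    """
    messages = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        match = HEADER.match(line)
        if match:
            # Start collecting lines for a new file.
            current = match.group("name")
            messages[current] = []
        elif current is not None and line:
            messages[current].append(line)
    # One counter key per distinct file content.
    return Counter(
        " / ".join(lines) if lines else "(no content)"
        for lines in messages.values()
    )
```

Feeding it the output pasted above should report a single message, "empty build log?", once per listed file, which is consistent with the builds aborting before any log was written.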

adrian, mls ?

Adrian took some action to "wipe the badhost information" yesterday (20190516), and the scheduled PowerPC task was then dispatched correctly.

New occurrence on 20190528 :(

New occurrences on 20190618 and 20190619. I do not have details of the actions taken; this is from the #opensuse-buildservice IRC channel:

(03:15:32 PM) michel_mno: Hello adrianS,  seems there is a typhoon2 problem as reported as idle status in monitor page, but a PowerPC task is blocked as scheduled   Could you have a look ?
(03:28:26 PM) mstrigl: michel_mno: it is building now
(09:03:56 AM) michel_mno: mstrigl:  thanks for triggering the rebuilding yesterday, but this morning, still have same blocked scheduled task waiting for typhoon2 access (that is idle)
(10:35:36 AM) ismail: michel_mno: we are aware and looking at it atm
(10:48:18 AM) mstrigl: michel_mno: It is building again. I am monitoring this to find the root cause

The problem seems to be that when typhoon2 is already building a job and gets another job assigned, the newly assigned job gets a badhost entry for typhoon2 and then waits forever.
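
A toy model of that hypothesis, to make the failure mode concrete. This is not OBS code: the `Job` class, the `dispatch` function and the per-job `badhosts` set are hypothetical names, and the model assumes only what this ticket describes, namely that a job is never sent to a worker it has a badhost entry for.

```python
from dataclasses import dataclass, field

@dataclass
class Job:
    """Hypothetical job record; `badhosts` holds workers this job must avoid."""
    name: str
    badhosts: set = field(default_factory=set)

def dispatch(job, idle_workers):
    """Return the first idle worker the job may use, or None if it must wait.

    Assumption for illustration: a worker listed in job.badhosts is skipped
    even when it is idle.
    """
    for worker in idle_workers:
        if worker not in job.badhosts:
            return worker
    return None
```

With `Job("product", badhosts={"typhoon2"})` and `typhoon2` as the only idle worker, `dispatch` returns `None`: the worker shows as idle in the monitor while the job stays scheduled. Clearing `job.badhosts` makes the same call return `"typhoon2"`, which matches the effect seen after the badhost information was wiped.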

Is there any news on the typhoon2 problem? mstrigl was waiting to discuss it with mls.

This is a critical problem: it has been blocking the PowerPC TW ISO builds and the openQA tests since the 20190614 snapshot.

It seems some action was taken between 20190626 and 20190628, because the TW PowerPC ISOs were generated and submitted to openQA as per the o3 status (1). But this Monday (20190701) the scheduled task is blocked again (2) because of the bad typhoon2 state.

What was done last week on typhoon2/OBS, and could it be done again?


New occurrence on 20190710.

typhoon2, like much of the other hardware in OBS, is a very old, unreliable system (not in service). Keeping those machines up and running takes quite an effort, and limited manpower means that it might take some time before they get fixed when they have problems.

So please do not rely on the availability of any OBS worker machine at any time. OBS is smart enough to handle broken builds and will reschedule them, if possible.

Hardware donations are welcome - but only enterprise hardware, which is in service... ;-)

