action #77209

closed

workers on o3 machine rebel provide no "WORKER_HOSTNAME" value anymore but it shows up in journal of worker service

Added by okurz almost 4 years ago. Updated almost 4 years ago.

Status: Resolved
Priority: Urgent
Category: -
Target version:
Start date: 2020-11-09
Due date:
% Done: 0%
Estimated time:

Description

Observation

For example https://openqa.opensuse.org/admin/workers/384 shows an empty field for "WORKER_HOSTNAME".

Last good: https://openqa.opensuse.org/tests/1464503 from 2020-11-08 22:56Z
First bad: https://openqa.opensuse.org/tests/1464540 from 2020-11-09 04:40Z

What certainly looks related is that
https://github.com/os-autoinst/openQA/pull/3520
was merged just days ago. However, according to grep 'openQA-worker' /var/log/zypp/history
that was already deployed at 2020-11-08 01:25:01, before the last good job, so I see it as unlikely that the issue has been caused directly by that revert.
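
The timing argument can be checked with a short sketch. It assumes the zypp history timestamp is local time in an unknown timezone, so it tries every whole-hour UTC offset; the variable and function names are made up for illustration.

```python
from datetime import datetime, timedelta, timezone

# Timestamps taken from the ticket; the job times are given in UTC ("Z").
last_good = datetime(2020, 11, 8, 22, 56, tzinfo=timezone.utc)
first_bad = datetime(2020, 11, 9, 4, 40, tzinfo=timezone.utc)

# Deployment time from /var/log/zypp/history. Its timezone is not stated
# in the ticket, so it is kept naive here (assumption).
deployed_local = datetime(2020, 11, 8, 1, 25, 1)


def deployed_before_last_good(offset_hours):
    """Interpret the deployment time at the given UTC offset and compare."""
    tz = timezone(timedelta(hours=offset_hours))
    return deployed_local.replace(tzinfo=tz) < last_good


# Whole-hour offsets from UTC-12 to UTC+14 cover the zones in practical use.
always_before = all(deployed_before_last_good(h) for h in range(-12, 15))

# Window in which the regression must have appeared.
regression_window = first_bad - last_good

print(always_before, regression_window)
```

Whatever the host timezone is, the deployment predates a job that still passed, which supports the conclusion that the deployment alone did not trigger the problem.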

Acceptance criteria

  • AC1: The s390x test scenarios are not incompleting anymore due to "no WORKER_CLASS defined" on rebel

Suggestions

  • Compare with other workers
  • Check what could cause this, e.g. in /etc/openqa/workers.ini
  • Also look into #77014

Related issues 3 (1 open, 2 closed)

Related to openQA Project - action #77014: openQA webui entry "Assigned worker" shows ip instead of names as formerly (Status: Workable, Start date: 2020-11-05)

Related to openQA Tests - action #69328: [o3][s390x] Early fail on s390x workers: connection refused (Status: Resolved, Assignee: okurz, Start date: 2020-07-24, Due date: 2020-11-13)

Blocks openQA Tests - action #77116: test fails in bootloader_s390 - ftp installation media directory repo is too long for using in parmfile - linux144, linux145, linux146, linux147 (rebel) (Status: Resolved, Assignee: SLindoMansilla, Start date: 2020-11-08)

Actions #1

Updated by okurz almost 4 years ago

  • Related to action #77014: openQA webui entry "Assigned worker" shows ip instead of names as formerly added
Actions #2

Updated by okurz almost 4 years ago

  • Description updated (diff)
Actions #3

Updated by okurz almost 4 years ago

  • Related to action #69328: [o3][s390x] Early fail on s390x workers: connection refused added
Actions #4

Updated by SLindoMansilla almost 4 years ago

  • Assignee set to SLindoMansilla
Actions #5

Updated by SLindoMansilla almost 4 years ago

  • Blocks action #77116: test fails in bootloader_s390 - ftp installation media directory repo is too long for using in parmfile - linux144, linux145, linux146, linux147 (rebel) added
Actions #6

Updated by SLindoMansilla almost 4 years ago

I noticed something weird: the WORKER_HOSTNAME setting was placed under the webui section of workers.ini, so it was not applied as part of the global section.

But even after fixing that and restarting the workers, only worker 3 receives the WORKER_HOSTNAME setting.

It may be related to the fact that yesterday only linux146 (rebel:3) could access the FTP repo. It could be that the other z/VM guests still need tweaking. I am going to continue investigating.
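
A minimal sketch of why a setting in the wrong INI section is not seen as a global one. It uses Python's configparser purely to illustrate section scoping; the openQA worker itself is written in Perl and its parsing may differ in detail. The section name http://webui.example and the address 192.0.2.10 are hypothetical placeholders, not the real contents of rebel:/etc/openqa/workers.ini.

```python
import configparser


def global_setting(ini_text, key):
    """Return the value of `key` from the [global] section, or None."""
    parser = configparser.ConfigParser(delimiters=("=",))
    parser.optionxform = str  # keep upper-case keys such as WORKER_HOSTNAME
    parser.read_string(ini_text)
    return parser.get("global", key, fallback=None)


# Hypothetical file with the setting misplaced under the webui host section.
MISPLACED = """\
[global]
HOST = http://webui.example

[http://webui.example]
WORKER_HOSTNAME = 192.0.2.10
"""

# Hypothetical file with the setting moved into [global].
FIXED = """\
[global]
HOST = http://webui.example
WORKER_HOSTNAME = 192.0.2.10

[http://webui.example]
"""

print(global_setting(MISPLACED, "WORKER_HOSTNAME"))
print(global_setting(FIXED, "WORKER_HOSTNAME"))
```

In the misplaced variant a lookup in the global section finds nothing, which matches the symptom of an empty "WORKER_HOSTNAME" field.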

Actions #7

Updated by okurz almost 4 years ago

  • Assignee deleted (SLindoMansilla)

OK, Ihno also pointed out that the same IPv4 address for the SUT is reused across the multiple worker instances. The machine "s390x" in https://openqa.opensuse.org/admin/machines had:

S390_NETWORK_PARAMS=OSAHWAddr= OSAMedium=eth InstNetDev=osa OSAInterface=qdio HostIP=192.168.112.10/24 Gateway=192.168.112.254 Nameserver=192.168.112.100 Domain=opensuse.org PortNo=0 Layer2=1 ReadChannel=0.0.0800 WriteChannel=0.0.0801 DataChannel=0.0.0802 Hostname=192.168.112.10

I have changed that now to

S390_NETWORK_PARAMS=OSAHWAddr= OSAMedium=eth InstNetDev=osa OSAInterface=qdio HostIP=192.168.112.@S390_HOST@/24 Hostname=@S390_HOST@ Gateway=192.168.112.254 Nameserver=192.168.112.100 Domain=opensuse.org PortNo=0 Layer2=1 ReadChannel=0.0.0800 WriteChannel=0.0.0801 DataChannel=0.0.0802

similar to what we had on osd.
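
To illustrate the effect of the placeholder, here is a rough sketch of an @NAME@ substitution. It is not openQA's actual expansion code; the expand function is hypothetical, the template is shortened, and the values 144 and 145 are assumed to be per-instance S390_HOST settings matching linux144 and linux145.

```python
import re

# Shortened version of the new machine setting from this comment.
TEMPLATE = (
    "HostIP=192.168.112.@S390_HOST@/24 Hostname=@S390_HOST@ "
    "Gateway=192.168.112.254"
)


def expand(template, settings):
    """Replace each @NAME@ with settings[NAME]; leave unknown names as-is."""
    def lookup(match):
        name = match.group(1)
        return str(settings.get(name, match.group(0)))

    return re.sub(r"@(\w+)@", lookup, template)


# Assumed per-instance values; each worker instance would define its own.
instance_a = expand(TEMPLATE, {"S390_HOST": "144"})
instance_b = expand(TEMPLATE, {"S390_HOST": "145"})

print(instance_a)
print(instance_b)
```

With a fixed HostIP every instance would configure the same SUT address, whereas the placeholder yields one address per instance.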

On o3 I have added corresponding entries in
o3:/etc/hosts:

# s390x SUT IPs, see rebel:/etc/openqa/workers.ini
192.168.112.144  s390linux144
192.168.112.145  s390linux145
192.168.112.146  s390linux146
192.168.112.147  s390linux147
Actions #8

Updated by okurz almost 4 years ago

  • Status changed from Workable to In Progress
  • Assignee set to SLindoMansilla
Actions #9

Updated by SLindoMansilla almost 4 years ago

I have changed the machine name to s390x-zVM-vswitch-l2 on o3 (like on osd) to distinguish it from vswitch-l3, kvm-sle12*, kvm-sle15*

  • Job group template updated
  • workers.ini updated
Actions #10

Updated by okurz almost 4 years ago

  • Subject changed from workers on o3 machine provide no "WORKER_HOSTNAME" value anymore but it shows up in journal of worker service to workers on o3 machine rebel provide no "WORKER_HOSTNAME" value anymore but it shows up in journal of worker service
Actions #11

Updated by SLindoMansilla almost 4 years ago

  • Status changed from In Progress to Resolved

It looks like my change in workers.ini actually fixed it. But the WORKER_HOSTNAME setting on the worker page is not updated until the next job is run.
Good to know :)

I have verified that the missing WORKER_HOSTNAME is no longer an issue.

Actions #12

Updated by okurz almost 4 years ago

SLindoMansilla wrote:

But, the WORKER_HOSTNAME setting in the worker page is not updated until next job is run.

wow! That was the missing part for me :D Thanks a lot.

Actions #13

Updated by livdywan almost 4 years ago

Do we know if #3520 was deployed before you made the .ini file changes? Should we revert the revert?

Actions #14

Updated by SLindoMansilla almost 4 years ago

cdywan wrote:

Do we know if #3520 was deployed before you made the .ini file changes? Should we revert the revert?

In any case, it is now deployed for sure, and it is working. The problem was in workers.ini. I don't think that this PR had anything to do with it.
