action #77209

workers on o3 machine rebel provide no "WORKER_HOSTNAME" value anymore but it shows up in journal of worker service

Added by okurz 2 months ago. Updated 2 months ago.

Status: Resolved
Priority: Urgent
Target version:
Start date: 2020-11-09
Due date:
% Done: 0%
Estimated time:

Description

Observation

For example https://openqa.opensuse.org/admin/workers/384 shows an empty field for "WORKER_HOSTNAME".

Last good: https://openqa.opensuse.org/tests/1464503 from 2020-11-08 22:56Z
First bad: https://openqa.opensuse.org/tests/1464540 from 2020-11-09 04:40Z

What certainly looks related is that
https://github.com/os-autoinst/openQA/pull/3520
was merged just days ago. However, according to grep 'openQA-worker' /var/log/zypp/history, that was already deployed at 2020-11-08 01:25:01, that is, before the last good job above, so I see it as unlikely that the issue was caused directly by that revert.

Acceptance criteria

  • AC1: The s390x test scenarios are not incompleting anymore due to "no WORKER_CLASS defined" on rebel

Suggestions

  • Compare with other workers
  • Check what could cause this, e.g. in /etc/openqa/workers.ini
  • Also look into #77014

Related issues

Related to openQA Project - action #77014: openQA webui entry "Assigned worker" shows ip instead of names as formerly (Status: Workable, Start date: 2020-11-05)

Related to openQA Tests - action #69328: [o3][s390x] Early fail on s390x workers: connection refused (Status: Resolved, Start date: 2020-07-24, Due date: 2020-11-13)

Blocks openQA Tests - action #77116: test fails in bootloader_s390 - ftp installation media directory repo is too long for using in parmfile - linux144, linux145, linux146, linux147 (rebel) (Status: Resolved, Start date: 2020-11-08)

History

#1 Updated by okurz 2 months ago

  • Related to action #77014: openQA webui entry "Assigned worker" shows ip instead of names as formerly added

#2 Updated by okurz 2 months ago

  • Description updated (diff)

#3 Updated by okurz 2 months ago

  • Related to action #69328: [o3][s390x] Early fail on s390x workers: connection refused added

#4 Updated by SLindoMansilla 2 months ago

  • Assignee set to SLindoMansilla

#5 Updated by SLindoMansilla 2 months ago

  • Blocks action #77116: test fails in bootloader_s390 - ftp installation media directory repo is too long for using in parmfile - linux144, linux145, linux146, linux147 (rebel) added

#6 Updated by SLindoMansilla 2 months ago

I noticed something odd: WORKER_HOSTNAME was placed under the webui section, so the setting was not applied to the global section.

But even after fixing that and restarting the workers, only worker 3 receives the WORKER_HOSTNAME setting.

This may be related to the fact that yesterday only linux146 (rebel:3) could access the FTP repo. It could be that the other z/VM guests still need tweaking. I am going to continue investigating.
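A misplaced key like the one described above can be spotted mechanically. The following is a minimal sketch, assuming workers.ini can be read as a plain INI file with Python's configparser. The sample content, the section name http://example-webui, the address 192.0.2.10 and the helper sections_defining are all hypothetical and only illustrate the idea; they are not taken from rebel's real configuration.

```python
import configparser

# Hypothetical workers.ini content: WORKER_HOSTNAME sits in the
# web UI host section instead of [global].
SAMPLE = """\
[global]
HOST = http://example-webui

[1]
WORKER_CLASS = s390x-zVM-vswitch-l2

[http://example-webui]
WORKER_HOSTNAME = 192.0.2.10
"""

def sections_defining(ini_text, key):
    """Return the names of all sections that define `key`."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep option names case-sensitive
    parser.read_string(ini_text)
    return [s for s in parser.sections() if parser.has_option(s, key)]

print(sections_defining(SAMPLE, "WORKER_HOSTNAME"))
```

If the result does not contain "global", the setting lives in a different section than intended.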

#7 Updated by okurz 2 months ago

  • Assignee deleted (SLindoMansilla)

OK, Ihno also pointed out that the IPv4 address for the SUT is reused among the multiple instances. The machine "s390x" in https://openqa.opensuse.org/admin/machines had:

S390_NETWORK_PARAMS=OSAHWAddr= OSAMedium=eth InstNetDev=osa OSAInterface=qdio HostIP=192.168.112.10/24 Gateway=192.168.112.254 Nameserver=192.168.112.100 Domain=opensuse.org PortNo=0 Layer2=1 ReadChannel=0.0.0800 WriteChannel=0.0.0801 DataChannel=0.0.0802 Hostname=192.168.112.10

I have changed that now to

S390_NETWORK_PARAMS=OSAHWAddr= OSAMedium=eth InstNetDev=osa OSAInterface=qdio HostIP=192.168.112.@S390_HOST@/24 Hostname=@S390_HOST@ Gateway=192.168.112.254 Nameserver=192.168.112.100 Domain=opensuse.org PortNo=0 Layer2=1 ReadChannel=0.0.0800 WriteChannel=0.0.0801 DataChannel=0.0.0802

similar to what we had on osd.
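To illustrate why the @S390_HOST@ placeholder gives each instance its own address, here is a minimal sketch of placeholder expansion. It is not openQA's actual implementation; the helper expand is hypothetical, the template is shortened, and the values 144 to 147 for S390_HOST are an assumption derived from the SUT names mentioned in this ticket.

```python
import re

# Shortened version of the new S390_NETWORK_PARAMS value.
TEMPLATE = ("HostIP=192.168.112.@S390_HOST@/24 Hostname=@S390_HOST@ "
            "Gateway=192.168.112.254")

def expand(template, settings):
    """Replace every @NAME@ with settings[NAME]; leave unknown names as-is."""
    def repl(match):
        name = match.group(1)
        return str(settings.get(name, match.group(0)))
    return re.sub(r"@([A-Za-z0-9_]+)@", repl, template)

# Assumed per-instance values, one per worker instance.
for host in (144, 145, 146, 147):
    print(expand(TEMPLATE, {"S390_HOST": host}))
```

With a distinct S390_HOST per worker instance, each job gets a distinct HostIP instead of all sharing 192.168.112.10.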

On o3 I have added corresponding entries to
o3:/etc/hosts:

# s390x SUT IPs, see rebel:/etc/openqa/workers.ini
192.168.112.144  s390linux144
192.168.112.145  s390linux145
192.168.112.146  s390linux146
192.168.112.147  s390linux147
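A quick sanity check of such entries could look like the following sketch. It parses hosts-style lines and verifies that no two names share an IP address, which was the original problem with the reused SUT address. The helper parse_hosts is hypothetical, and the text is embedded as a string rather than read from the real /etc/hosts.

```python
HOSTS = """\
# s390x SUT IPs, see rebel:/etc/openqa/workers.ini
192.168.112.144  s390linux144
192.168.112.145  s390linux145
192.168.112.146  s390linux146
192.168.112.147  s390linux147
"""

def parse_hosts(text):
    """Map each host name to its IP, ignoring comments and blank lines."""
    mapping = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            mapping[name] = ip
    return mapping

entries = parse_hosts(HOSTS)
# Every SUT name must resolve to its own address.
assert len(set(entries.values())) == len(entries)
for name, ip in entries.items():
    print(name, ip)
```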

#8 Updated by okurz 2 months ago

  • Status changed from Workable to In Progress
  • Assignee set to SLindoMansilla

#9 Updated by SLindoMansilla 2 months ago

I have changed the machine name to s390x-zVM-vswitch-l2 on O3 (as on OSD) to distinguish it from vswitch-l3, kvm-sle12*, kvm-sle15*

  • Job group template updated
  • workers.ini updated

#10 Updated by okurz 2 months ago

  • Subject changed from workers on o3 machine provide no "WORKER_HOSTNAME" value anymore but it shows up in journal of worker service to workers on o3 machine rebel provide no "WORKER_HOSTNAME" value anymore but it shows up in journal of worker service

#11 Updated by SLindoMansilla 2 months ago

  • Status changed from In Progress to Resolved

It looks like my change in workers.ini actually fixed it. But, the WORKER_HOSTNAME setting in the worker page is not updated until next job is run.
Good to know :)

I have verified that the missing WORKER_HOSTNAME is no longer an issue.

#12 Updated by okurz 2 months ago

SLindoMansilla wrote:

But, the WORKER_HOSTNAME setting in the worker page is not updated until next job is run.

wow! That was the missing part for me :D Thanks a lot.

#13 Updated by cdywan 2 months ago

Do we know if #3520 was deployed before you made the .ini file changes? Should we revert the revert?

#14 Updated by SLindoMansilla 2 months ago

cdywan wrote:

Do we know if #3520 was deployed before you made the .ini file changes? Should we revert the revert?

In any case, it is now definitely deployed, and it is working. The problem was in the workers.ini. I don't think that this PR had anything to do with it.
