action #153715

closed

coordination #121720: [saga][epic] Migration to QE setup in PRG2+NUE3 while ensuring availability

coordination #153685: [epic] Move from SUSE NUE1 (Maxtorhof) to PRG2e

Move of LSG QE non-openQA PowerPC machine NUE1 to PRG2 - whale

Added by okurz 3 months ago. Updated about 2 months ago.

Status:
Resolved
Priority:
Low
Assignee:
Target version:
Start date:
2024-01-16
Due date:
% Done:

0%

Estimated time:

Description

Acceptance criteria

  • AC1: whale is usable from PRG2

Suggestions


Related issues 2 (1 open, 1 closed)

Copied from QA - action #153709: Move of selected LSG QE machines NUE1 to PRG2e - ada size:M | Resolved | okurz | 2024-01-16

Copied to QA - action #153718: Move of LSG QE non-openQA PowerPC machine NUE1 to PRG2 - haldir size:M | In Progress | nicksinger | 2024-01-16 | 2024-05-01

Actions #1

Updated by okurz 3 months ago

  • Copied from action #153709: Move of selected LSG QE machines NUE1 to PRG2e - ada size:M added
Actions #2

Updated by okurz 3 months ago

  • Status changed from New to Blocked
  • Priority changed from Normal to Low
Actions #3

Updated by okurz 3 months ago

  • Copied to action #153718: Move of LSG QE non-openQA PowerPC machine NUE1 to PRG2 - haldir size:M added
Actions #4

Updated by okurz 2 months ago

https://jira.suse.com/browse/ENGINFRA-3686 was marked as done but I could not reach the machine over IPMI/HMC(?)

I reopened the ticket and added
https://jira.suse.com/browse/ENGINFRA-3686?focusedId=1329265&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-1329265

From qe-jumpy I cannot ping whale.qe-ipmi-ur (192.168.153.118), and with no information about PDU connections in https://racktables.nue.suse.com/index.php?object_id=9594&page=object&tab=default I would not know how to check or power cycle the machine. Can you please ensure that we can reach the machine? Otherwise we cannot work with it remotely.

Actions #6

Updated by okurz 2 months ago

  • Due date set to 2024-03-13
  • Status changed from In Progress to Feedback

pvmctl lpar power-on -i name=whale-1
mkvterm -p whale-1

https://suse.slack.com/archives/C02CANHLANP/p1709126471127139?thread_ts=1708520004.766399&cid=C02CANHLANP

So whale-1 booted up fine and has network, but only with a dynamic DHCP lease. Any hint on the credentials for whale-1 so that I can log in, find the MAC address and debug further?

Actions #7

Updated by okurz about 2 months ago

  • Status changed from Feedback to In Progress

I could log in as root with the old default root password. The system whale-1 has network

2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UNKNOWN group default qlen 1000
    link/ether 76:58:02:7b:d8:f7 brd ff:ff:ff:ff:ff:ff
    altname env2
    inet 10.146.5.173/23 brd 10.146.5.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 2a07:de40:b230:1:9097:659e:5d44:236e/64 scope global temporary dynamic 
       valid_lft 602315sec preferred_lft 83579sec
    inet6 2a07:de40:b230:1:884:9766:e6ed:25bc/64 scope global temporary deprecated dynamic 
       valid_lft 516253sec preferred_lft 0sec
    inet6 2a07:de40:b230:1:3215:2f54:f811:50af/64 scope global temporary deprecated dynamic 
       valid_lft 430191sec preferred_lft 0sec
    inet6 2a07:de40:b230:1:c51d:8bc1:61d3:aec6/64 scope global temporary deprecated dynamic 
       valid_lft 344129sec preferred_lft 0sec
    inet6 2a07:de40:b230:1:6e95:d332:76e3:a0fc/64 scope global temporary deprecated dynamic 
       valid_lft 258067sec preferred_lft 0sec
    inet6 2a07:de40:b230:1:7532:a34b:3cdb:1c70/64 scope global temporary deprecated dynamic 
       valid_lft 172006sec preferred_lft 0sec
    inet6 2a07:de40:b230:1:7458:2ff:fe7b:d8f7/64 scope global dynamic mngtmpaddr 
       valid_lft 2591856sec preferred_lft 604656sec
    inet6 fe80::7458:2ff:fe7b:d8f7/64 scope link 
       valid_lft forever preferred_lft forever

but DNS says it should be 10.145.0.133. Running "wicked ifup all" did not change that. Triggering a reboot. If that does not yield the right address, I will ping IT and ask them to check the DHCP logs on fozziebear for 76:58:02:7b:d8:f7.
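
The mismatch between the leased address and the DNS entry can be checked mechanically. The following is a minimal sketch using Python's standard ipaddress module; the helper name same_network is made up for illustration, and the two addresses are the ones quoted in this comment. It only shows that the expected address is not part of the network the dynamic lease came from. It says nothing about why the DHCP server handed out that lease.

```python
import ipaddress


def same_network(observed_cidr: str, expected_ip: str) -> bool:
    """Return True if expected_ip lies inside the network of observed_cidr.

    Hypothetical helper for illustration, not part of any existing tooling.
    """
    network = ipaddress.ip_interface(observed_cidr).network
    return ipaddress.ip_address(expected_ip) in network


observed = "10.146.5.173/23"  # address seen on eth0 of whale-1
expected = "10.145.0.133"     # address that DNS has for whale-1

# Network that the dynamic lease belongs to
print(ipaddress.ip_interface(observed).network)
# Whether the DNS address could be on that same network
print(same_network(observed, expected))
```

With the values above this should report the network 10.146.4.0/23 and a negative result, i.e. the static address from DNS belongs to a different network than the one the machine is actually attached to.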

Actions #8

Updated by okurz about 2 months ago · Edited

Reboot did not help, same address received.

https://suse.slack.com/archives/C04MDKHQE20/p1709559520923729

As we are still denied access to fozziebear, can someone from IT with access to that machine, which runs DHCP for qe.prg2.suse.org, please check the dhcpd logs for 76:58:02:7b:d8:f7 so that we can find out why the machine whale-1 gets a dynamic lease when it should get 10.145.0.133?
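
For whoever has access to the logs, a small filter along these lines might help narrow them down to the relevant MAC address. This is only a sketch: the function names are hypothetical, the actual log location and line format on fozziebear are unknown here, and the sample lines below are made-up placeholders, not real dhcpd output. The filter matches the MAC regardless of upper or lower case and of ":" versus "-" separators.

```python
import re

# Matches MAC-like tokens such as 76:58:02:7b:d8:f7 or 76-58-02-7B-D8-F7
MAC_TOKEN = re.compile(r"(?:[0-9A-Fa-f]{2}[:-]){5}[0-9A-Fa-f]{2}")


def normalize_mac(mac: str) -> str:
    """Lowercase the MAC and strip separators so different spellings compare equal."""
    return re.sub(r"[^0-9a-f]", "", mac.lower())


def lines_for_mac(lines, mac):
    """Return the lines that mention the given MAC address (hypothetical helper)."""
    wanted = normalize_mac(mac)
    return [
        line
        for line in lines
        if any(normalize_mac(token) == wanted for token in MAC_TOKEN.findall(line))
    ]


# Made-up placeholder lines, NOT real log output
sample = [
    "placeholder entry mentioning 76:58:02:7B:D8:F7",
    "placeholder entry mentioning 00:11:22:33:44:55",
]

for line in lines_for_mac(sample, "76:58:02:7b:d8:f7"):
    print(line)
```

In practice the list of lines would come from reading the log file or journal output instead of the placeholder list.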

Actions #9

Updated by okurz about 2 months ago

  • Status changed from In Progress to Feedback
Actions #10

Updated by okurz about 2 months ago

Also I need to wait for https://suse.slack.com/archives/C029APBKLGK/p1709629124723019?thread_ts=1709625519.090749&cid=C029APBKLGK

(Martin Caj) I can work on fixing it... once I have the fix done I will send you the MR to review, OK?

from #help-it-ama to prevent conflicts on the file.

Actions #11

Updated by okurz about 2 months ago

  • Status changed from Feedback to In Progress

https://gitlab.suse.de/OPS-Service/salt/-/merge_requests/4842 by mcaj was merged, with https://gitlab.suse.de/OPS-Service/salt/-/merge_requests/4799 as a follow-up. Logged in to whale, ran mkvterm -p whale-1 and triggered a reboot. Will check whether it gets the IP address from the static lease after that.

Actions #12

Updated by okurz about 2 months ago

  • Status changed from In Progress to Feedback

Even after the reboot whale-1 got 10.146.5.173/23 again, so back to https://suse.slack.com/archives/C04MDKHQE20/p1709559520923729 in #dct-migration

(Oliver Kurz) As we are still denied access to fozziebear, can someone from IT with access to that machine, which runs DHCP for qe.prg2.suse.org, please check the dhcpd logs for 76:58:02:7b:d8:f7 so that we can find out why the machine whale-1 gets a dynamic lease when it should get 10.145.0.133?

Actions #13

Updated by okurz about 2 months ago

  • Status changed from Feedback to In Progress

Apparently the wrong IP addresses were configured, only valid for PRG2, not PRG2e. mcaj fixed that with https://gitlab.suse.de/OPS-Service/salt/-/merge_requests/4848, testing now.

Actions #14

Updated by okurz about 2 months ago

  • Status changed from In Progress to Feedback

https://suse.slack.com/archives/C02CANHLANP/p1709653369210609

@Martin Pluskal all VMs on whale should be usable now. See https://suse.slack.com/archives/C04MDKHQE20/p1709653283705009?thread_ts=1709559520.923729&cid=C04MDKHQE20 for context. whale-1.qe.prg2.suse.org is already reachable, others after reboot or when the DHCP leases expire. Please verify that all machines on whale can be used as expected or let me know where I can help otherwise.

Actions #15

Updated by okurz about 2 months ago

  • Due date deleted (2024-03-13)
  • Status changed from Feedback to Resolved

No further problems were reported. I assume we are good.
