action #138356

closed

coordination #121720: [saga][epic] Migration to QE setup in PRG2+NUE3 while ensuring availability

coordination #123800: [epic] Provide SUSE QE Tools services running in PRG2 aka. Prg CoLo

coordination #137630: [epic] QE (non-openQA) setup in PRG2

Migration of qam.suse.de to PRG2 size:M

Added by okurz 7 months ago. Updated 6 months ago.

Status: Resolved
Priority: High
Assignee:
Target version:
Start date: 2023-10-23
Due date:
% Done: 0%
Estimated time:
Tags:

Description

Motivation

See parent

Acceptance criteria

  • AC1: Common services supplied from qam.suse.de, including at least teregen as well as dashboard.qam.suse.de, are supplied from PRG2 after migration

Suggestions

  • Announce upfront
  • Follow https://jira.suse.com/browse/ENGINFRA-3071
  • Monitor the situation, in particular gitlab CI pipelines for qem-dashboard and bot-ng
  • Adapt tooling where needed
  • Coordinate to configure firewall as needed

Rollback actions


Related issues 1 (0 open, 1 closed)

Copied to QA - action #139130: Migration of openqa-service to PRG2 size:M (Resolved, okurz)

Actions #1

Updated by okurz 7 months ago

  • Status changed from New to In Progress
  • Priority changed from Normal to High

Met with mmanev about the planned migration of qam2.suse.de. The migration of qam2.suse.de to the PRG2 datacenter is planned for this Wednesday, impacting at least QAM template generation and dashboard.qam.suse.de. Expect the systems to be unavailable during the timeframe 2023-10-25 0700Z-1900Z. The VM has 50G of storage on a single virtual drive, so the migration itself will likely not take too long. As the VM is still in an "old" network zone, I provided rough requirements to mmanev about the necessary inbound and outbound connections. Likely something will be missed and will need to be handled case by case. Expect a Slack thread opened by mmanev for coordination.

Actions #2

Updated by okurz 7 months ago

https://suse.slack.com/archives/C02CANHLANP/p1698066009284279

@here migration of qam2.suse.de to PRG2 datacenter planned this Wednesday impacting at least QAM template generation and dashboard.qam.suse.de . Expect the systems to be unavailable during the timeframe 2023-10-25 0700Z-1900Z. Further details in https://progress.opensuse.org/issues/138356

Actions #3

Updated by openqa_review 7 months ago

  • Due date set to 2023-11-07

Setting due date based on mean cycle time of SUSE QE Tools

Actions #4

Updated by okurz 7 months ago

  • Description updated (diff)

Actions #5

Updated by livdywan 7 months ago

  • Subject changed from Migration of qam.suse.de to PRG2 to Migration of qam.suse.de to PRG2 size:M

Discussed briefly in the estimations. Good as-is.

Actions #6

Updated by okurz 6 months ago

The migration was delayed by SUSE-IT; the new date is being planned in https://suse.slack.com/archives/C04MDKHQE20/p1698395123650769

(Marko Manev) Migration for qam2.suse.de
(Marko Manev) @Oliver Kurz Would it be OK to migrate this VM on Tuesday 31.10.2023? We were short staffed this week, so we could not get to it. We can also aim on Monday, but I would not want to re-schedule again.
(Oliver Kurz) Can we make it 02.11.2023?

Actions #7

Updated by okurz 6 months ago

2023-11-02 was again not possible for SUSE-IT. I suggest following up today in https://suse.slack.com/archives/C04MDKHQE20/p1699002851906079?thread_ts=1698395123.650769&cid=C04MDKHQE20

Actions #8

Updated by okurz 6 months ago

  • Due date changed from 2023-11-07 to 2023-11-17

special hackweek due-date bump

Actions #10

Updated by jbaier_cz 6 months ago

The machine is migrated and DNS is changed: https://gitlab.suse.de/OPS-Service/salt/-/merge_requests/4347. There were some minor issues with accessing some of the external resources (rufus, l3support), but those were solved. I still see an issue with the template generator and will investigate.

Actions #11

Updated by jbaier_cz 6 months ago

And now we are again facing #92686: the other side of the NFS mount needs to be updated.

Actions #12

Updated by jbaier_cz 6 months ago

Issue solved (context for next time: https://suse.slack.com/archives/C029APBKLGK/p1699022608671839). Now we only have problems in the qem-bot pipeline. I guess either the DNS records are not updated or there is another firewall problem.
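
For next time, a minimal sketch of how a CI job could tell a DNS problem from a firewall problem. This is not what was actually run in this ticket. The function name diagnose and the port 443 are assumptions for illustration; only Python's standard socket module is used.

```python
import socket


def diagnose(host, port, timeout=5.0):
    """Return 'dns' if the name does not resolve, 'connect' if it resolves
    but no TCP connection can be established, and 'ok' otherwise."""
    try:
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        # Name resolution failed: stale or missing DNS records.
        return "dns"
    for family, socktype, proto, _canonname, sockaddr in infos:
        try:
            with socket.socket(family, socktype, proto) as sock:
                sock.settimeout(timeout)
                sock.connect(sockaddr)
                return "ok"
        except OSError:
            # Refused, timed out or unreachable: try the next address.
            continue
    # The name resolves but nothing answers: likely a firewall or service issue.
    return "connect"


if __name__ == "__main__":
    # Port 443 is an assumption; use whatever port the pipeline talks to.
    print(diagnose("qam.suse.de", 443))
```

Run from a gitlab CI runner, a result of "dns" would point at the DNS records, while "connect" would point at the firewall or at the service itself.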

Actions #13

Updated by jbaier_cz 6 months ago

  • Description updated (diff)

I am disabling pipeline scheduling for qem-bot; with all those retries and timeouts we might end up with a ton of waiting jobs after the weekend.

Actions #15

Updated by okurz 6 months ago

Thank you for your good work so far. Next steps are pending a reaction from the SUSE-IT firewall admins, i.e. lhaleplidis, to enable communication from the gitlab CI runners to qam.suse.de, as discussed in https://suse.slack.com/archives/C04MDKHQE20/p1699030210024579?thread_ts=1698395123.650769&cid=C04MDKHQE20

(Oliver Kurz) I read up to here, thanks for the work so far. So I understand NFS mounts are fine. And gitlab runners can't reach qam.suse.de yet so that needs firewall enablement?
(Jiri Novak) yes, i pinged lazaros also in pm
(Lazaros Haleplidis) here sorry on a meeting with US, let me read up
(Lazaros Haleplidis) ok, fixed, can you try once more please?
(Jan Baier) so far I do not see a change
(Jan Baier) problem still persists, see https://gitlab.suse.de/jbaier_cz/ci-test/-/jobs/1954839
(Jan Baier) btw. how can we find out which rules are currently in effect? Is there like a repo with configuration?

Actions #16

Updated by jbaier_cz 6 months ago

  • Description updated (diff)

Actions #17

Updated by okurz 6 months ago

  • Copied to action #139130: Migration of openqa-service to PRG2 size:M added

Actions #18

Updated by jbaier_cz 6 months ago

I believe we are good here and we made it just in time before HackWeek :)

Actions #19

Updated by okurz 6 months ago

  • Due date deleted (2023-11-17)
  • Status changed from In Progress to Resolved

Agreed, https://gitlab.suse.de/qa-maintenance/bot-ng/-/jobs looks good and I found no more related alerts.
