action #170473

closed

k2.qe.suse.de not reachable from mania:2 size:S

Added by ph03nix 2 months ago. Updated 14 days ago.

Status:
Resolved
Priority:
Normal
Assignee:
Category:
Support
Start date:
2024-11-28
Due date:
% Done:

0%

Estimated time:
Tags:

Description

Observation

The PostGREST API endpoint at k2.qe.suse.de is not reachable from mania:2, see https://openqa.suse.de/tests/16036779#step/bci_version_check/36

k2.qe.suse.de is a database for collecting various BCI container stats. We need this connection to push container stats while testing. This used to work in the past and still works for all architectures except ppc64.

mania was recently announced to be part of the wireguard-tunnel project for getting CC-compliant zone-cc connectivity to those workers.

Suggestions

  • Look into what was done for #170338 but don't block on it
  • This might need specific firewall rules, e.g. for a specific target host or port
  • There is already https://sd.suse.com/servicedesk/customer/portal/1/SD-174357 but we don't have access -> just ask @ph03nix to give us feedback based on progress in the SD ticket. We don't need to and probably don't want to participate in the ticket itself

Further details


Related issues 3 (0 open, 3 closed)

Related to openQA Infrastructure (public) - action #170338: No monitoring data from OSD since 2024-11-25 1449Z size:M (Resolved, nicksinger, 2024-11-27)
Related to openQA Tests (public) - action #170407: [qe-core][sle15sp7][xen]test fails in bootloader_svirt, nfsmount `openqa.suse.de:/var/lib/openqa/share/factory/hdd/fixed 6427781120 4264771584 2163009536 67% /var/lib/openqa/share/factory/hdd/fixed` seems gone (Resolved, 2024-11-28)
Related to Containers and images - action #173542: [BCI] Re-Enable PowerKVM BCI test runs (Resolved, ph03nix, 2024-12-02)

Actions #1

Updated by nicksinger 2 months ago

mania: https://racktables.suse.de/index.php?page=object&tab=default&object_id=9588 -> fc basement
k2.qe.suse.de: https://racktables.suse.de/index.php?page=object&tab=default&object_id=19927 -> morla cluster, most likely prg2

the assessment that this could be connected to our (partial) wg setup came from me in https://suse.slack.com/archives/C02CANHLANP/p1732797379802919 - it might be totally unrelated but seems very plausible

Actions #2

Updated by ph03nix 2 months ago

nicksinger wrote in #note-1:

k2.qe.suse.de: https://racktables.suse.de/index.php?page=object&tab=default&object_id=19927 -> morla cluster, most likely prg2

Confirmed to be PRG2.

Actions #3

Updated by jbaier_cz 2 months ago

This likely needs the same kind of care as in https://progress.opensuse.org/issues/170338#note-5

Actions #4

Updated by jbaier_cz 2 months ago

  • Related to action #170338: No monitoring data from OSD since 2024-11-25 1449Z size:M added
Actions #5

Updated by okurz 2 months ago

  • Parent task set to #166598
Actions #6

Updated by okurz 2 months ago

  • Tags set to infra, cc, nue2
  • Category set to Regressions/Crashes
  • Target version set to Ready
Actions #8

Updated by dzedro 2 months ago

  • Related to action #170407: [qe-core][sle15sp7][xen]test fails in bootloader_svirt, nfsmount `openqa.suse.de:/var/lib/openqa/share/factory/hdd/fixed 6427781120 4264771584 2163009536 67% /var/lib/openqa/share/factory/hdd/fixed` seems gone added
Actions #9

Updated by dzedro 2 months ago · Edited

Also, unreal* svirt workers via sapworker1.qe.nue2.suse.org can't reach the OSD NFS, see e.g. https://openqa.suse.de/tests/16032904#step/bootloader_svirt/15

Actions #10

Updated by mkittler 2 months ago

  • Subject changed from k2.qe.suse.de not reachable from mania:2 to k2.qe.suse.de not reachable from mania:2 size:S
  • Description updated (diff)
  • Category changed from Regressions/Crashes to Support
  • Status changed from New to Workable
Actions #11

Updated by okurz 2 months ago

  • Status changed from Workable to In Progress
  • Assignee set to okurz
Actions #12

Updated by okurz 2 months ago

  • Due date set to 2024-12-18
  • Status changed from In Progress to Feedback

https://suse.slack.com/archives/C02CANHLANP/p1733323013781639

@Felix Niederwanger hi, in https://progress.opensuse.org/issues/170473 you referenced https://sd.suse.com/servicedesk/customer/portal/1/SD-174357 but we don't have access. Should we proceed to investigate why mania can't reach k2.qe.suse.de or track the ticket and you share with "OSD Admins"?

Actions #13

Updated by okurz 2 months ago

  • Due date deleted (2024-12-18)
  • Status changed from Feedback to Blocked

fniederwanger shared the ticket with us. It's already in progress so we can just block on https://sd.suse.com/servicedesk/customer/portal/1/SD-174357

Actions #14

Updated by ph03nix about 2 months ago

  • Related to action #173542: [BCI] Re-Enable PowerKVM BCI test runs added
Actions #15

Updated by ph03nix about 2 months ago

I can see now that the workers themselves can reach the host in question, however the openQA tests are still failing:

ph03nix@diesel:~> curl -qLf http://k2.qe.suse.de:8080/size >/dev/null && echo "OK"
...
OK
ph03nix@mania:~> curl -qLf http://k2.qe.suse.de:8080/size >/dev/null && echo "OK"
...
OK
ph03nix@petrol:~> curl -qLf http://k2.qe.suse.de:8080/size >/dev/null && echo "OK"
...
OK

Failures fresh from this morning:


Hypothesis: The openQA jobs are not routed over the wireguard tunnel. Asked for help in #eng-testing.
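
One way to test this hypothesis is to run the same check both on the worker host and inside the job's system under test, then compare the results. A minimal sketch, assuming a POSIX shell with curl available in both places; the function name check_endpoint, the timeout values and the --noproxy setting are illustrative choices, not something taken from the openQA setup:

```shell
#!/bin/sh
# check_endpoint URL (hypothetical helper for this ticket)
# Prints "OK URL" if the URL answers with a successful HTTP status,
# otherwise prints "FAIL URL" and returns non-zero.
check_endpoint() {
    url=$1
    # -s: no progress output, -L: follow redirects, -f: fail on HTTP errors.
    # --noproxy '*' bypasses any configured proxy so that the result reflects
    # direct reachability (an assumption about what should be measured here).
    # The timeouts are arbitrary values chosen for this sketch.
    if curl -sLf --noproxy '*' --connect-timeout 5 --max-time 10 \
            -o /dev/null "$url" 2>/dev/null; then
        echo "OK $url"
    else
        echo "FAIL $url"
        return 1
    fi
}
```

For example, check_endpoint http://k2.qe.suse.de:8080/size could be run on mania itself and again from within a job on mania:2. If the host reports OK while the job reports FAIL, that would point at routing from the job's network rather than at the firewall rules.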

Actions #16

Updated by okurz about 2 months ago

  • Status changed from Blocked to Workable
  • Assignee deleted (okurz)

https://sd.suse.com/servicedesk/customer/portal/1/SD-174357 was resolved with "We can confirm that the hosts are reachable from the machines themselves. Remaining issues are due to routing problems in the machines themselves. The firewall rules work however, ticket resolved.". Unassigning.

Actions #17

Updated by livdywan 25 days ago

@ph03nix Can you clarify if you had success fixing the remaining issues? Or do you still need help from us here?

Actions #18

Updated by ph03nix 24 days ago

  • Status changed from Workable to Resolved

livdywan wrote in #note-17:

@ph03nix Can you clarify if you had success fixing the remaining issues? Or do you still need help from us here?

AFAICS all issues have been resolved, thanks for the reminder.

Actions #19

Updated by okurz 14 days ago

  • Assignee set to ph03nix