action #159669

closed

No new openQA data on metrics.opensuse.org since o3 migration to PRG2 size:S

Added by okurz 8 months ago. Updated 4 months ago.

Status: Resolved
Priority: Low
Assignee:
Category: Regressions/Crashes
Start date: 2024-04-26
Due date:
% Done: 0%
Estimated time:

Description

Observation

https://metrics.opensuse.org/d/osrt_openqa/osrt-openqa?orgId=1&from=1682065021494&to=1706492379258 shows that there is no new data since about the time that o3 was migrated to PRG2 as part of #132143

Suggestions

  • Find out again who the admins of that instance are and collaborate with them to fix the problem, possibly @witekbedyk (Witold Bedyk)?
  • Check which ports the firewall needs to allow if any (supposedly just 80 and 443)
  • The influxdb API route https://openqa.opensuse.org/admin/influxdb/jobs is generally available. Ensure that metrics.opensuse.org can read that as well

Related issues 2 (1 open, 1 closed)

Related to openQA Infrastructure (public) - action #132143: Migration of o3 VM to PRG2 - 2023-07-19 size:M (Resolved, nicksinger, 2023-06-29)
Related to openSUSE admin - tickets #123828: metrics.o.o: no OSRT:Review graphs (New, witekbedyk, 2023-01-31)

Actions #1

Updated by okurz 8 months ago

  • Related to action #132143: Migration of o3 VM to PRG2 - 2023-07-19 size:M added
Actions #2

Updated by livdywan 7 months ago

  • Subject changed from Missing openQA data on metrics.opensuse.org since o3 migration to PRG2 to No new openQA data on metrics.opensuse.org since o3 migration to PRG2 size:S
  • Description updated (diff)
  • Status changed from New to Workable
Actions #3

Updated by okurz 6 months ago

  • Target version changed from Ready to Tools - Next
Actions #4

Updated by okurz 6 months ago

  • Parent task changed from #123800 to #162146
Actions #5

Updated by okurz 5 months ago

  • Target version changed from Tools - Next to Ready
Actions #6

Updated by tinita 5 months ago

Actions #7

Updated by livdywan 5 months ago

  • Description updated (diff)
Actions #8

Updated by witekbedyk 5 months ago

I only maintain the access metrics service on metrics.o.o and have never touched the others.

From what I can see in the systemd journal the osrt-metrics-telegraf.service fails when attempting to connect to http://openqa.opensuse.org/admin/influxdb/jobs [1]. Please make sure the service is available.

[1] https://github.com/openSUSE/openSUSE-release-tools/blob/master/metrics/telegraf/openqa.conf#L2

Actions #9

Updated by tinita 5 months ago

Maybe it just needs to use https now?

Actions #10

Updated by witekbedyk 5 months ago

# curl --digest https://openqa.opensuse.org/admin/influxdb/jobs
curl: (7) Failed to connect to openqa.opensuse.org port 443 after 0 ms: Couldn't connect to server
Actions #11

Updated by tinita 5 months ago

Ok. Who can help to make that url available from the telegraf instance then?
It's certainly available from here.

% curl --digest https://openqa.opensuse.org/admin/influxdb/jobs
openqa_jobs,url=https://openqa.opensuse.org blocked=0i,running=11i,scheduled=24i
...
Actions #12

Updated by crameleon 5 months ago

@witekbedyk

fails when attempting

That's a very vague error description; it is better to analyze the issue based on the exact error message.

metrics (metrics.o.o):~ # journalctl --no-pager -n1 -u osrt-metrics-telegraf
Aug 05 16:07:20 metrics osrt-metrics[594]: 2024-08-05T16:07:20Z E! [inputs.http] Error in plugin: [url=http://openqa.opensuse.org/admin/influxdb/jobs]: Get "http://openqa.opensuse.org/admin/influxdb/jobs": dial tcp 192.168.47.13:80: connect: network is unreachable

1) 192.168.47.13 is a private IPv4 address which is not in our address space. Even if it was, I would be surprised if you were able to reach it from an IPv6-only network.

metrics (metrics.o.o):~ # grep openqa /etc/hosts
192.168.47.13           openqa.opensuse.org

2) Not only is the address wrong, there is also an /etc/hosts override which should not be there - it is better practice to use DNS, to avoid receiving outdated IP addresses to begin with.

Destination unreachable: Administratively prohibited

3) After the bogus hosts entry is removed, the above error, or one similar to it, might be presented instead, because the maintainer of the machine never requested access to this resource.
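
A stale override like the one in point 2 can be spotted with a small helper. This is only a minimal sketch: the function name `hosts_overrides` and its argument order are illustrative assumptions, not a tool that exists on the machine.

```shell
# hosts_overrides NAME [FILE]
# Print the address of every active (non-comment) entry in FILE that maps NAME.
# FILE defaults to /etc/hosts. Succeeds only if at least one entry was found.
# The helper name and interface are hypothetical, for illustration only.
hosts_overrides() {
    name="$1"
    file="${2:-/etc/hosts}"
    awk -v name="$name" '
        { sub(/#.*/, "") }              # strip comments before looking at fields
        NF >= 2 {
            for (i = 2; i <= NF; i++)   # field 1 is the address, the rest are names
                if ($i == name) { print $1; found = 1 }
        }
        END { exit !found }
    ' "$file"
}

# Example call on the affected host:
#   hosts_overrides openqa.opensuse.org && echo "name is pinned in /etc/hosts"
```

If the helper prints an address, name lookups that go through the system resolver order (for example `getent hosts`) will usually return that pinned entry before DNS is consulted, depending on the `hosts:` line in /etc/nsswitch.conf.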

Actions #13

Updated by witekbedyk 4 months ago

I've deleted the bogus hosts entry. Now we see the following error message:

Aug 06 08:17:30 metrics osrt-metrics[14767]: 2024-08-06T08:17:30Z E! [inputs.http] Error in plugin: [url=http://openqa.opensuse.org/admin/influxdb/jobs]: Get "http://openqa.opensuse.org/admin/influxdb/jobs": dial tcp [2a07:de40:b251:2:10:150:2:10]:80: connect: permission denied
Actions #14

Updated by okurz 4 months ago

Can you change that to https?

Actions #15

Updated by witekbedyk 4 months ago

I have changed to https:

Aug 06 12:23:30 metrics osrt-metrics[17461]: 2024-08-06T12:23:30Z E! [inputs.http] Error in plugin: [url=https://openqa.opensuse.org/admin/influxdb/jobs]: Get "https://openqa.opensuse.org/admin/influxdb/jobs": dial tcp [2a07:de40:b251:2:10:150:2:10]:443: connect: permission denied
Actions #16

Updated by okurz 4 months ago

The URL and IPv6 address are correct. "permission denied" is unexpected but likely means we reach the right server. I guess as next step one could try to replicate this manually with telegraf accessing the route from elsewhere.

Or we try to tweak the nginx config on o3 in /etc/nginx/vhosts.d/openqa.conf to allow HTTP access to admin/influxdb/jobs

If https
Also TODO: Create PR to update https://github.com/openSUSE/openSUSE-release-tools/blob/master/metrics/telegraf/openqa.conf#L2

Actions #17

Updated by tinita 4 months ago

When searching for this error message, I only find references to docker, some related to influxdb: https://community.influxdata.com/t/got-permission-denied-while-trying-to-connect-to-the-docker-daemon-socket/23353/3
So maybe there is something wrong with the telegraf/influxdb installation.

Actions #18

Updated by tinita 4 months ago

  • Status changed from Workable to Feedback
  • Assignee set to tinita
Actions #19

Updated by tinita 4 months ago

Maybe one of us could also have a look, if we get access to the host where telegraf is running. Would that be possible?

Actions #20

Updated by nicksinger 4 months ago

okurz wrote in #note-16:

Also TODO: Create PR to update https://github.com/openSUSE/openSUSE-release-tools/blob/master/metrics/telegraf/openqa.conf#L2

https://github.com/openSUSE/openSUSE-release-tools/pull/3133

okurz wrote in #note-16:

The URL and IPv6 address are correct. "permission denied" is unexpected but likely means we reach the right server. I guess as next step one could try to replicate this manually with telegraf accessing the route from elsewhere.

I just tried it from my local machine and was able to receive data using http:// as well as https://

workstation ~ » telegraf --config /etc/telegraf/telegraf.conf --test | grep openqa
2024-08-09T09:43:44Z I! Loading config: /etc/telegraf/telegraf.conf
2024-08-09T09:43:44Z I! Starting Telegraf 1.26.3-
2024-08-09T09:43:44Z I! Available plugins: 235 inputs, 9 aggregators, 27 processors, 22 parsers, 57 outputs, 2 secret-stores
2024-08-09T09:43:44Z I! Loaded inputs: cpu http kernel linux_cpu mem net sensors system
2024-08-09T09:43:44Z I! Loaded aggregators:
2024-08-09T09:43:44Z I! Loaded processors: regex strings template (3x)
2024-08-09T09:43:44Z I! Loaded secretstores:
2024-08-09T09:43:44Z W! Outputs are not used in testing mode!
2024-08-09T09:43:44Z I! Tags enabled: host=workstation
> openqa_jobs,host=workstation,url=https://openqa.opensuse.org blocked=2i,running=32i,scheduled=24i 1723196625000000000
> openqa_jobs_by_group,group=Development,host=workstation,url=https://openqa.opensuse.org scheduled=14i 1723196625000000000
> openqa_jobs_by_group,group=No\ Group,host=workstation,url=https://openqa.opensuse.org running=4i 1723196625000000000
> openqa_jobs_by_group,group=Others,host=workstation,url=https://openqa.opensuse.org blocked=2i,running=18i 1723196625000000000
> openqa_jobs_by_group,group=openSUSE\ Leap\ 15.5\ Updates,host=workstation,url=https://openqa.opensuse.org running=1i 1723196625000000000
> openqa_jobs_by_group,group=openSUSE\ Leap\ 15.6\ Maintenance,host=workstation,url=https://openqa.opensuse.org running=9i 1723196625000000000
> openqa_jobs_by_group,group=openSUSE\ Tumbleweed\ AArch64,host=workstation,url=https://openqa.opensuse.org scheduled=10i 1723196625000000000
> openqa_jobs_by_worker,host=workstation,url=https://openqa.opensuse.org,worker=openqaworker-arm21 running=1i 1723196625000000000
> openqa_jobs_by_worker,host=workstation,url=https://openqa.opensuse.org,worker=openqaworker20 running=6i 1723196625000000000
> openqa_jobs_by_worker,host=workstation,url=https://openqa.opensuse.org,worker=openqaworker21 running=3i 1723196625000000000
> openqa_jobs_by_worker,host=workstation,url=https://openqa.opensuse.org,worker=openqaworker22 running=3i 1723196625000000000
> openqa_jobs_by_worker,host=workstation,url=https://openqa.opensuse.org,worker=openqaworker23 running=3i 1723196625000000000
> openqa_jobs_by_worker,host=workstation,url=https://openqa.opensuse.org,worker=openqaworker24 running=5i 1723196625000000000
> openqa_jobs_by_worker,host=workstation,url=https://openqa.opensuse.org,worker=openqaworker25 running=5i 1723196625000000000
> openqa_jobs_by_worker,host=workstation,url=https://openqa.opensuse.org,worker=openqaworker26 running=6i 1723196625000000000
> openqa_jobs_by_arch,arch=aarch64,host=workstation,url=https://openqa.opensuse.org running=1i,scheduled=21i 1723196625000000000
> openqa_jobs_by_arch,arch=arm,host=workstation,url=https://openqa.opensuse.org scheduled=3i 1723196625000000000
> openqa_jobs_by_arch,arch=x86_64,host=workstation,url=https://openqa.opensuse.org blocked=2i,running=31i 1723196625000000000

tinita wrote in #note-19:

Maybe one of us could also have a look, if we get access to the host where telegraf is running. Would that be possible?

I currently also have no further idea what we could do from our side.

Actions #21

Updated by tinita 4 months ago

  • Status changed from Feedback to Blocked
Actions #22

Updated by okurz 4 months ago

blocked on what?

Actions #23

Updated by tinita 4 months ago

We said we don't want tickets in Feedback which we don't work on in the team.
This ticket here is blocked on feedback from someone who has access to the machine and can give us access.
We can't do anything here, so it shouldn't count for our ticket time.

Actions #24

Updated by okurz 4 months ago

  • Due date set to 2024-09-16
  • Status changed from Blocked to Feedback
  • Assignee changed from tinita to okurz

That unfortunately never worked for us in the past. "Blocked" should only be used if we have an external reference that we should look into for the current status. If we don't have that then the ticket should stay in "Feedback" which shows that there is nowhere else to look for the current status or more information.

@witekbedyk I have access to .infra.opensuse.org but couldn't access metrics.infra.opensuse.org. https://racktables.nue.suse.com/index.php?page=object&tab=default&object_id=12045 says the host has 192.168.47.31, which is also what I saved in my /etc/hosts. Can you provide me with current connection details and authentication so that I can log in to the host and try myself?

Actions #25

Updated by witekbedyk 4 months ago

metrics.infra.opensuse.org has IPv6 address 2a07:de40:b27e:1203::141

Actions #26

Updated by witekbedyk 4 months ago · Edited

Please also compare #123828

Actions #27

Updated by okurz 4 months ago

witekbedyk wrote in #note-25:

metrics.infra.opensuse.org has IPv6 address 2a07:de40:b27e:1203::141

Great! I could log in and will investigate.

Actions #28

Updated by okurz 4 months ago · Edited

ok, I updated https://racktables.nue.suse.com/index.php?page=object&tab=default&object_id=12045 with the correct IP addresses then.

okurz@metrics:/home/okurz> curl -6 -vvvv --digest http://openqa.opensuse.org/
*   Trying [2a07:de40:b251:2:10:150:2:10]:80...
* connect to 2a07:de40:b251:2:10:150:2:10 port 80 failed: Permission denied
* Failed to connect to openqa.opensuse.org port 80 after 2 ms: Couldn't connect to server
* Closing connection 0
curl: (7) Failed to connect to openqa.opensuse.org port 80 after 2 ms: Couldn't connect to server
okurz@metrics:/home/okurz> ping -c1 openqa.opensuse.org
PING openqa.opensuse.org(openqa.opensuse.org (2a07:de40:b251:2:10:150:2:10)) 56 data bytes
From 2a07:de40:b27e:1203::3 (2a07:de40:b27e:1203::3) icmp_seq=1 Destination unreachable: Administratively prohibited

I suspect a firewall is giving us an explicit REJECT here. I could not see anything on ariel (aka o3) that should prevent this. I also temporarily disabled the firewall on ariel, but the observation did not change. With my non-root limited account on metrics I could also not see a firewall in effect.

@crameleon as you stated in #159669-12

Destination unreachable: Administratively prohibited
3) After the bogus hosts entry is removed, the above error, or one similar to it, might be presented instead, because the maintainer of the machine never requested access to this resource.

Is there a network-level firewall in effect here? What do you mean by "the maintainer of the machine never requested access to this resource"? Which machine, which resource? If you mean access from metrics.i.o.o to openqa.o.o 80/tcp or 443/tcp, then this statement is wrong: that access was present and working before, hence it had been requested, though obviously many years ago.

@witekbedyk can you provide me root permissions on the machine either with the help of sudo or the root password so that I could install network debug tools, check the firewall, etc.?
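
The failure texts seen so far in this thread point at different layers, and a rough triage could be sketched as follows. The category labels and the function name `classify_connect_error` are illustrative assumptions, and the "closed-port" branch does not occur in this ticket; it is included only for contrast. As far as I know, Linux reports an ICMPv6 "administratively prohibited" reply to a connecting socket as "permission denied", which would fit the ping and curl output above, but treat the mapping as a heuristic.

```shell
# classify_connect_error MESSAGE
# Print a rough hint for a connection failure message.
# Labels are hypothetical: no-route, filtered, closed-port, unknown.
classify_connect_error() {
    # Compare case-insensitively, since curl and Go format errors differently.
    msg=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    case "$msg" in
        *"network is unreachable"*)
            # No usable route, e.g. an IPv4 target from an IPv6-only host.
            echo "no-route" ;;
        *"permission denied"*|*"administratively prohibited"*)
            # Something on the path (or locally) actively rejects the traffic.
            echo "filtered" ;;
        *"connection refused"*)
            # Host reachable, but nothing listens on that port.
            echo "closed-port" ;;
        *)
            echo "unknown" ;;
    esac
}

# Example call with the latest journal line of the service:
#   classify_connect_error "$(journalctl --no-pager -n1 -u osrt-metrics-telegraf)"
```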

Actions #29

Updated by okurz 4 months ago

  • Priority changed from Normal to Low
Actions #30

Updated by crameleon 4 months ago · Edited

Is there a network-level firewall in effect here?

All of infra.opensuse.org is firewalled.

then this statement is wrong as that access was present and possible in before hence the access was requested but obviously it would have been many years ago.

I informed about this change in November: https://lists.opensuse.org/archives/list/heroes@lists.opensuse.org/message/4YV256BAZPZ4ILA3MI76MVO2BSKBB265/. The maintainer of metrics.i.o.o never reached out about requiring any additional access.

Actions #31

Updated by okurz 4 months ago

crameleon wrote in #note-30:

Is there a network-level firewall in effect here?

All of infra.opensuse.org is firewalled.

then this statement is wrong as that access was present and possible in before hence the access was requested but obviously it would have been many years ago.

I informed about this change in November: https://lists.opensuse.org/archives/list/heroes@lists.opensuse.org/message/4YV256BAZPZ4ILA3MI76MVO2BSKBB265/. The maintainer of metrics.i.o.o never reached out about requiring any additional access.

Well, regardless, I am reaching out now :) So can you allow the corresponding access from metrics.infra.opensuse.org to openqa.opensuse.org 443/tcp?

Actions #32

Updated by crameleon 4 months ago · Edited

The reason I did not yet intervene is there being another request related to networking of the metrics.i.o.o machine (https://progress.opensuse.org/issues/123828) lacking maintainer input as well. I prefer the maintainer(s) of a machine to collaborate with others in the admin team to establish all that is needed for their service as opposed to me implementing miscellaneous changes to help users of a service without any acknowledgment of what is really needed by one of the listed maintainers.

That being said, I don't want to block you with politics.

Just to avoid surprises after my implementation: since openQA resides behind a SUSE-side NAT, could you please confirm the IP address I find in DNS (2a07:de40:b251:2:10:150:2:10) is actually the source address of the machine as visible from our openSUSE network?

You can do so using the following on the machine behind openqa.opensuse.org:

curl -6 ip.opensuse.org

Potentially with https://. If I recall correctly, your host is already allowed to reach us over either HTTP or HTTPS on the SUSE side.

Actions #33

Updated by crameleon 4 months ago

Sorry, I mixed up the direction of traffic, I don't need your source address for this. ;-)

Submitted via https://gitlab.infra.opensuse.org/infra/salt/-/merge_requests/2032.
Will update once done (can't assign ticket to me as not part of the project).

Actions #34

Updated by crameleon 4 months ago

Committed as https://progress.opensuse.org/projects/opensuse-admin/repository/salt/revisions/a9ffb36776ac7f00ac5651c05ea2cffe9519f9b2 and tested:

crameleon@metrics:/home/crameleon> curl -sI https://openqa.opensuse.org|head -n3
HTTP/2 200
server: nginx/1.21.5
date: Tue, 20 Aug 2024 19:57:09 GMT
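
The check above confirms that HTTPS to o3 works again. To also confirm that the influxdb route itself returns metrics rather than an error page, a sketch along these lines could be used. The function name `looks_like_openqa_metrics` is an illustrative assumption, and the pattern is deliberately simple: it only accepts lines without escaped spaces in tag values, shaped like the sample output shown in #note-11.

```shell
# looks_like_openqa_metrics
# Read text on stdin and succeed if at least one line looks like InfluxDB
# line protocol for an openqa_* measurement with integer fields, e.g.
#   openqa_jobs,url=https://openqa.opensuse.org blocked=0i,running=11i
# The helper name is hypothetical, for illustration only.
looks_like_openqa_metrics() {
    grep -Eq '^openqa_[a-z_]+,[^ ]+ [a-z_]+=[0-9]+i(,[a-z_]+=[0-9]+i)*( [0-9]+)?$'
}

# Example call from the metrics host, forcing IPv6 as in the earlier tests:
#   curl -6 -sS --max-time 10 https://openqa.opensuse.org/admin/influxdb/jobs \
#       | looks_like_openqa_metrics && echo "route returns metrics"
```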
Actions #35

Updated by tinita 4 months ago

Thank you!

Actions #36

Updated by okurz 4 months ago

  • Due date deleted (2024-09-16)
  • Status changed from Feedback to Resolved