action #102975

Fix missing openqa.o.o data on metrics.o.o size:M

Added by okurz about 2 months ago. Updated about 1 month ago.

Status: Resolved
Priority: Normal
Assignee:
Category: Organisational
Target version:
Start date: 2021-11-24
Due date: 2021-12-10
% Done: 0%
Estimated time:
Difficulty:

Description

Motivation

See https://github.com/openSUSE/openSUSE-release-tools/issues/2672

metrics.o.o shows no current data.

Acceptance criteria

  • AC1: Current data of o3 can be seen in the metrics

Suggestions

  • The API route appears to work, but the graphs contain no data
  • Talk to Witold
  • Become an openSUSE hero, get access to the network and come up with a fix

History

#1 Updated by cdywan about 2 months ago

  • Subject changed from Fix missing openqa.o.o data on metrics.o.o to Fix missing openqa.o.o data on metrics.o.o size:M
  • Description updated (diff)
  • Status changed from New to Workable

#2 Updated by kraih about 2 months ago

  • Assignee set to kraih

#3 Updated by kraih about 2 months ago

Spoke with Witold, he will take a look at the Grafana setup.

#4 Updated by kraih about 2 months ago

  • Status changed from Workable to In Progress

#5 Updated by openqa_review about 2 months ago

  • Due date set to 2021-12-10

Setting due date based on mean cycle time of SUSE QE Tools

#6 Updated by okurz about 2 months ago

  • Status changed from In Progress to Feedback

We are still waiting for Witold to look into this. If there is no update by the start of next week, some of us can try the openSUSE Heroes VPN ourselves for a start.

EDIT: I also tried something myself: I connected to the openSUSE Heroes VPN and could log in to metrics.infra.opensuse.org as "okurz", but to fix systemd services I would need sudo permissions or the root password. So I asked for help in https://matrix.to/#/!WaFljdmUCIDWhFMJfr:libera.chat/$9TX85dwTKhLgAFGo6Hu3-dw0CpaiFJdCtyoa2l7yU68

okurz@metrics:/home/okurz> systemctl --failed
  UNIT                                  LOAD   ACTIVE SUB    DESCRIPTION                                         
● osrt-metrics-access.service           loaded failed failed openSUSE Release Tools: metrics - access logs       
● osrt-metrics@openSUSE:Factory.service loaded failed failed openSUSE Release Tools: metrics for openSUSE:Factory

LOAD   = Reflects whether the unit definition was properly loaded.
ACTIVE = The high-level unit activation state, i.e. generalization of SUB.
SUB    = The low-level unit activation state, values depend on unit type.

2 loaded units listed.
okurz@metrics:/home/okurz> systemctl status osrt-metrics-access.service
● osrt-metrics-access.service - openSUSE Release Tools: metrics - access logs
     Loaded: loaded (/usr/lib/systemd/system/osrt-metrics-access.service; disabled; vendor preset: disabled)
     Active: failed (Result: exit-code) since Fri 2021-12-03 04:27:49 UTC; 6h ago
TriggeredBy: ● osrt-metrics-access.timer
    Process: 2141 ExecStart=/usr/bin/osrt-metrics-access-aggregate (code=exited, status=255/EXCEPTION)
   Main PID: 2141 (code=exited, status=255/EXCEPTION)

Warning: some journal files were not opened due to insufficient permissions.

So two services have failed, but I am not allowed to read the logs.
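
For whoever picks this up with sufficient permissions, the check above could be scripted roughly as follows. This is only a sketch, not something that was run on metrics.infra.opensuse.org: the function names `failed_units` and `show_failed_logs` are made up for illustration, and it assumes the caller is root or in a group that may read the journal (for example `systemd-journal`).

```shell
#!/bin/sh
# Hypothetical helpers, not part of openSUSE-release-tools.

# Read `systemctl --failed` style lines on stdin and print only the unit
# names. A leading "●" marker, as in the output above, is skipped.
failed_units() {
    awk 'NF { if ($1 == "●") print $2; else print $1 }'
}

# Print the last 50 journal lines of every failed unit.
# Assumption: run as root or as a member of a journal-reading group,
# otherwise journalctl will not show the system journal.
show_failed_logs() {
    systemctl --failed --no-legend --plain | failed_units |
    while read -r unit; do
        echo "== $unit"
        journalctl -u "$unit" -n 50 --no-pager
    done
}
```

With sudo access, sourcing the file and calling `show_failed_logs` should print the journal tail for both osrt-metrics units listed above, which is the information missing in this comment.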

#7 Updated by okurz about 2 months ago

kraih over which chat channel can I reach Witold? Is he in https://app.element.io/#/room/#opensuse-admin:libera.chat ?

#8 Updated by kraih about 1 month ago

okurz wrote:

kraih over which chat channel can I reach Witold? Is he in https://app.element.io/#/room/#opensuse-admin:libera.chat ?

You can reach him on Slack. He told me that he won't get around to it before next week, though. In the meantime I'm trying to get some info on how things are set up (so I can maybe take a look before that).

#9 Updated by kraih about 1 month ago

Another interesting tidbit: Witold was not the one who set up metrics.o.o, he has just recently reverse engineered and fixed the access metrics. The machine was originally set up by Jimmy Berry, and unfortunately he did not leave much documentation.

#10 Updated by okurz about 1 month ago

  • Status changed from Feedback to Resolved

So we fixed it together now. I added the "tools-team" from the os-autoinst GitHub organisation to /etc/grafana/grafana.ini so I could log in to https://metrics.opensuse.org using GitHub and look into the panel to understand where it gets its data from. We learned that telegraf is used and that it gets its configuration from /usr/share/openSUSE-release-tools/metrics/telegraf, which is maintained on GitHub, so I created
https://github.com/openSUSE/openSUSE-release-tools/pull/2673
and edited the configuration locally, then restarted osrt-metrics-telegraf.service
