action #102975

closed

Fix missing openqa.o.o data on metrics.o.o size:M

Added by okurz over 2 years ago. Updated over 2 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
Category:
Organisational
Target version:
Start date:
2021-11-24
Due date:
2021-12-10
% Done:

0%

Estimated time:

Description

Motivation

See https://github.com/openSUSE/openSUSE-release-tools/issues/2672

metrics.o.o shows no current data.

Acceptance criteria

  • AC1: Current data of o3 can be seen in the metrics

Suggestions

  • The API route appears to work, but the graphs contain no data
  • Talk to Witold
  • Become an openSUSE hero, get access to the network and come up with a fix
Actions #1

Updated by livdywan over 2 years ago

  • Subject changed from Fix missing openqa.o.o data on metrics.o.o to Fix missing openqa.o.o data on metrics.o.o size:M
  • Description updated (diff)
  • Status changed from New to Workable
Actions #2

Updated by kraih over 2 years ago

  • Assignee set to kraih
Actions #3

Updated by kraih over 2 years ago

Spoke with Witold; he will take a look at the Grafana setup.

Actions #4

Updated by kraih over 2 years ago

  • Status changed from Workable to In Progress
Actions #5

Updated by openqa_review over 2 years ago

  • Due date set to 2021-12-10

Setting due date based on mean cycle time of SUSE QE Tools

Actions #6

Updated by okurz over 2 years ago

  • Status changed from In Progress to Feedback

We are still waiting for Witold to look into this. If there is no update by the start of next week, some of us can try via the openSUSE Heroes VPN ourselves for a start.

EDIT: I also tried something myself: I connected to the openSUSE Heroes VPN and could log in to metrics.infra.opensuse.org as "okurz", but to fix the systemd services I would need sudo permissions or the root password. So I asked for help in https://matrix.to/#/!WaFljdmUCIDWhFMJfr:libera.chat/$9TX85dwTKhLgAFGo6Hu3-dw0CpaiFJdCtyoa2l7yU68

okurz@metrics:/home/okurz> systemctl --failed
  UNIT                                  LOAD   ACTIVE SUB    DESCRIPTION                                         
● osrt-metrics-access.service           loaded failed failed openSUSE Release Tools: metrics - access logs       
● osrt-metrics@openSUSE:Factory.service loaded failed failed openSUSE Release Tools: metrics for openSUSE:Factory

LOAD   = Reflects whether the unit definition was properly loaded.
ACTIVE = The high-level unit activation state, i.e. generalization of SUB.
SUB    = The low-level unit activation state, values depend on unit type.

2 loaded units listed.
okurz@metrics:/home/okurz> systemctl status osrt-metrics-access.service
● osrt-metrics-access.service - openSUSE Release Tools: metrics - access logs
     Loaded: loaded (/usr/lib/systemd/system/osrt-metrics-access.service; disabled; vendor preset: disabled)
     Active: failed (Result: exit-code) since Fri 2021-12-03 04:27:49 UTC; 6h ago
TriggeredBy: ● osrt-metrics-access.timer
    Process: 2141 ExecStart=/usr/bin/osrt-metrics-access-aggregate (code=exited, status=255/EXCEPTION)
   Main PID: 2141 (code=exited, status=255/EXCEPTION)

Warning: some journal files were not opened due to insufficient permissions.

So two services have failed, but I am not allowed to read the logs.
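
For anyone who wants to check for failed units from a script rather than by eye, the table printed by `systemctl --failed` above can be parsed with a few lines of Python. This is only a minimal sketch: the function name `failed_units` is made up for illustration, and it assumes the default column layout shown above (UNIT, LOAD, ACTIVE, SUB, DESCRIPTION) with an optional leading status marker.

```python
from typing import List

# Sample taken from the `systemctl --failed` output pasted above.
SAMPLE = """\
  UNIT                                  LOAD   ACTIVE SUB    DESCRIPTION
● osrt-metrics-access.service           loaded failed failed openSUSE Release Tools: metrics - access logs
● osrt-metrics@openSUSE:Factory.service loaded failed failed openSUSE Release Tools: metrics for openSUSE:Factory

LOAD   = Reflects whether the unit definition was properly loaded.
ACTIVE = The high-level unit activation state, i.e. generalization of SUB.
SUB    = The low-level unit activation state, values depend on unit type.

2 loaded units listed.
"""


def failed_units(systemctl_output: str) -> List[str]:
    """Return the unit names whose ACTIVE column reads "failed".

    Hypothetical helper: assumes the default table layout of
    `systemctl --failed`, as shown in the sample above.
    """
    units = []
    for line in systemctl_output.splitlines():
        fields = line.split()
        # Drop the leading status marker if present (the marker glyph
        # differs between systemd versions).
        if fields and fields[0] in ("●", "×", "*"):
            fields = fields[1:]
        # A unit row reads: UNIT LOAD ACTIVE SUB DESCRIPTION...
        if len(fields) >= 4 and fields[2] == "failed":
            units.append(fields[0])
    return units


if __name__ == "__main__":
    for unit in failed_units(SAMPLE):
        print(unit)
```

Reading the journal of those units still needs elevated permissions, so this only helps with spotting the failures, not with diagnosing them.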

Actions #7

Updated by okurz over 2 years ago

@kraih over which chat channel can I reach Witold? Is he in https://app.element.io/#/room/#opensuse-admin:libera.chat ?

Actions #8

Updated by kraih over 2 years ago

okurz wrote:

@kraih over which chat channel can I reach Witold? Is he in https://app.element.io/#/room/#opensuse-admin:libera.chat ?

You can reach him on Slack. He told me that he won't get around to it before next week though. I'm trying to get some info on how things are set up in the meantime (so I can maybe take a look before that).

Actions #9

Updated by kraih over 2 years ago

Another interesting tidbit: Witold was not the one who set up metrics.o.o; he has just recently reverse engineered and fixed the access metrics. The machine was originally set up by Jimmy Berry, who unfortunately did not leave much documentation.

Actions #10

Updated by okurz over 2 years ago

  • Status changed from Feedback to Resolved

So we fixed it together now. I added the "tools-team" from the os-autoinst GitHub organisation to /etc/grafana/grafana.ini so I could log in via GitHub to https://metrics.opensuse.org and look into the panel to understand where it gets its data from. We learned that telegraf is used and that it gets its configuration from /usr/share/openSUSE-release-tools/metrics/telegraf, which is maintained on GitHub, so I created
https://github.com/openSUSE/openSUSE-release-tools/pull/2673
and edited the configuration locally, then restarted osrt-metrics-telegraf.service
