action #175710

open

coordination #161414: [epic] Improved salt based infrastructure management

OSD openqa.ini is corrupted, invalid characters, again 2025-01-17

Added by okurz 14 days ago. Updated 3 days ago.

Status: Blocked
Priority: Normal
Assignee:
Category: Regressions/Crashes
Start date: 2024-07-10
Due date:
% Done: 0%
Estimated time:

Description

Observation

See #175686-5


Related issues 6 (2 open, 4 closed)

Related to openQA Infrastructure (public) - action #176013: [alert] web UI: Too many Minion job failures alert size:S (Resolved, ybonatakis, 2025-01-23)

Related to openQA Infrastructure (public) - action #175407: salt state for machine monitor.qe.nue2.suse.org was broken for almost 2 months, nothing was alerting us size:S (Resolved, okurz)

Related to openQA Infrastructure (public) - action #176124: OSD influxdb minion route seemingly returns only a very small number of failed minion jobs, not all (Resolved, tinita)

Related to openQA Infrastructure (public) - action #176175: [alert] Grafana failed to start due to corrupted config file (Blocked, okurz, 2025-01-26)

Blocked by openQA Infrastructure (public) - action #176250: file corruption in salt controlled config files size:M (In Progress, okurz, 2025-02-13)

Copied from openQA Infrastructure (public) - action #163790: OSD openqa.ini is corrupted, invalid characters size:M (Resolved, okurz, 2024-07-10)

Actions #1

Updated by okurz 14 days ago

  • Copied from action #163790: OSD openqa.ini is corrupted, invalid characters size:M added
Actions #2

Updated by okurz 14 days ago

  • Target version changed from Ready to Tools - Next
Actions #3

Updated by tinita 8 days ago · Edited

  • Priority changed from Low to High
  • Target version changed from Tools - Next to Ready

While looking into #176013 I noticed that the search on https://openqa.suse.de/minion does not allow searching for obs_rsync* tasks. They are simply missing from the select. (Compare https://openqa.opensuse.org/minion )

I looked on osd if there were any config changes.

The openqa.config:

-rw-r--r-- 1 geekotest root 10243 Jan 22 23:54 openqa.ini

The snapshot from Nov 7 is significantly bigger:

-rw-r--r-- 2 martchus root 14262 Nov  7 15:32 openqa.ini

I'm looking at the diff, but the obs_rsync plugin is configured in both files. The diff consists mostly of comment lines.
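
Since the comparison above is dominated by comment lines, it can help to diff only the effective settings. The following is a minimal sketch, not the method used on OSD: it assumes comments start with `#` or `;`, and the function names and the `snapshot`/`current` labels are made up for illustration.

```python
import difflib


def strip_comments(text):
    """Return the non-blank, non-comment lines of an INI-style text."""
    kept = []
    for line in text.splitlines():
        stripped = line.strip()
        # Assumption: a comment is any line starting with '#' or ';'.
        if stripped and not stripped.startswith(("#", ";")):
            kept.append(stripped)
    return kept


def config_diff(old_text, new_text):
    """Unified diff of two INI texts, ignoring comments and blank lines."""
    return list(
        difflib.unified_diff(
            strip_comments(old_text),
            strip_comments(new_text),
            fromfile="snapshot",
            tofile="current",
            lineterm="",
        )
    )
```

An empty result means both files carry the same settings in the same order, so a size difference like the one above would come from comments or blank lines only.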

Actions #4

Updated by tinita 8 days ago

  • Related to action #176013: [alert] web UI: Too many Minion job failures alert size:S added
Actions #5

Updated by tinita 8 days ago

I just tried to restart the gru service:

Jan 23 00:08:20 openqa systemd[1]: Stopping The openQA daemon for various background tasks like cleanup and saving needles...
Jan 23 00:13:20 openqa systemd[1]: openqa-gru.service: State 'stop-sigterm' timed out. Killing.
Jan 23 00:13:20 openqa systemd[1]: openqa-gru.service: Killing process 13903 (openqa) with signal SIGKILL.
Jan 23 00:13:20 openqa systemd[1]: openqa-gru.service: Killing process 26956 (openqa) with signal SIGKILL.
Jan 23 00:13:20 openqa systemd[1]: openqa-gru.service: Main process exited, code=killed, status=9/KILL
Jan 23 00:13:20 openqa systemd[1]: openqa-gru.service: Failed with result 'timeout'.
Jan 23 00:13:20 openqa systemd[1]: Stopped The openQA daemon for various background tasks like cleanup and saving needles.
Jan 23 00:13:20 openqa systemd[1]: openqa-gru.service: Consumed 20min 30.720s CPU time.
Jan 23 00:13:20 openqa systemd[1]: Started The openQA daemon for various background tasks like cleanup and saving needles.

So it is running, but something went wrong.
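
The log above shows the stop request being escalated to SIGKILL after the stop timeout. As a small sketch of how to read that interval off the journal lines, assuming the `Mon DD HH:MM:SS` prefix shown above and a year supplied by the caller (the journal lines themselves carry none), with hypothetical helper names:

```python
from datetime import datetime


def parse_timestamp(line, year=2025):
    """Parse the leading 'Mon DD HH:MM:SS' of a journal line."""
    stamp = " ".join(line.split()[:3])
    # The year is an assumption passed in by the caller.
    return datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")


def stop_duration_seconds(lines):
    """Seconds from the first 'Stopping' line to the first 'timed out' line."""
    start = next((entry for entry in lines if "Stopping " in entry), None)
    end = next((entry for entry in lines if "timed out" in entry), None)
    if start is None or end is None:
        return None
    return (parse_timestamp(end) - parse_timestamp(start)).total_seconds()
```

For the excerpt above this gives 300 seconds between 00:08:20 and 00:13:20, i.e. the service did not react to SIGTERM for five minutes before systemd killed it.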

Actions #6

Updated by okurz 8 days ago

  • Related to action #175407: salt state for machine monitor.qe.nue2.suse.org was broken for almost 2 months, nothing was alerting us size:S added
Actions #7

Updated by tinita 8 days ago · Edited

tinita wrote in #note-3:

I looked on osd if there were any config changes.

The openqa.config:

-rw-r--r-- 1 geekotest root 10243 Jan 22 23:54 openqa.ini

I had made a local backup of that file. I have now copied it to OSD, into my home directory, as openqa.ini-2025-01-22T23:54.

Actions #8

Updated by nicksinger 7 days ago

I just found https://gitlab.suse.de/openqa/salt-states-openqa/-/jobs/3701756 which also shows a broken file on tumblesle, /etc/zypp/zypp.conf, which looked like:

## Configuration file for software management
## /etc/zypp/zypp.conf
##
## Boolean values are 0 1 yes no on off true false
}

[main]
solver.dupAllowVendorChange = True

I removed the stray "}" at the top. Maybe this is also related to "corrupted files".
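
A stray line like this can be spotted mechanically. The following is a rough sketch, not part of the salt states: it assumes a simple INI dialect where every meaningful line is a comment (`#` or `;`), a `[section]` header, or a `key = value` pair, and the function name is hypothetical.

```python
import re

SECTION = re.compile(r"^\[[^\]]+\]$")
KEY_VALUE = re.compile(r"^[^=\s][^=]*=.*$")


def find_stray_lines(text):
    """Return (line_number, content) for lines that fit no known INI form."""
    stray = []
    for number, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        # Blank lines and comments are always accepted.
        if not stripped or stripped.startswith(("#", ";")):
            continue
        if SECTION.match(stripped) or KEY_VALUE.match(stripped):
            continue
        stray.append((number, stripped))
    return stray
```

Run on the excerpt above, this reports only line 5, the lone "}". Files that use multi-line values would produce false positives with this simple check.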

Actions #9

Updated by tinita 7 days ago

  • Related to action #176124: OSD influxdb minion route seemingly returns only a very small number of failed minion jobs, not all added
Actions #10

Updated by tinita 4 days ago

  • Related to action #176175: [alert] Grafana failed to start due to corrupted config file added
Actions #11

Updated by okurz 3 days ago

  • Copied to action #176250: file corruption in salt controlled config files size:M added
Actions #12

Updated by okurz 3 days ago

  • Status changed from New to Blocked
  • Assignee set to okurz
  • Priority changed from High to Normal

Currently OSD is fine; blocked by #176250.

Actions #13

Updated by okurz 3 days ago

  • Copied to deleted (action #176250: file corruption in salt controlled config files size:M)
Actions #14

Updated by okurz 3 days ago

  • Blocked by action #176250: file corruption in salt controlled config files size:M added