action #175710

closed

coordination #161414: [epic] Improved salt based infrastructure management

OSD openqa.ini is corrupted, invalid characters, again 2025-01-17

Added by okurz 4 months ago. Updated about 1 month ago.

Status: Resolved
Priority: Normal
Assignee:
Category: Regressions/Crashes
Start date: 2024-07-10
Due date:
% Done: 0%
Estimated time:

Description

Observation

See #175686-5


Related issues 6 (0 open, 6 closed)

Related to openQA Infrastructure (public) - action #176013: [alert] web UI: Too many Minion job failures alert size:S (Resolved, ybonatakis, 2025-01-23)
Related to openQA Infrastructure (public) - action #175407: salt state for machine monitor.qe.nue2.suse.org was broken for almost 2 months, nothing was alerting us size:S (Resolved, okurz)
Related to openQA Infrastructure (public) - action #176124: OSD influxdb minion route seemingly returns only a very small number of failed minion jobs, not all (Resolved, tinita)
Related to openQA Infrastructure (public) - action #176175: [alert] Grafana failed to start due to corrupted config file (Resolved, okurz, 2025-01-26)
Blocked by openQA Infrastructure (public) - action #176250: file corruption in salt controlled config files size:M (Resolved, okurz)
Copied from openQA Infrastructure (public) - action #163790: OSD openqa.ini is corrupted, invalid characters size:M (Resolved, okurz, 2024-07-10)

Actions #1

Updated by okurz 4 months ago

  • Copied from action #163790: OSD openqa.ini is corrupted, invalid characters size:M added
Actions #2

Updated by okurz 4 months ago

  • Target version changed from Ready to Tools - Next
Actions #3

Updated by tinita 4 months ago · Edited

  • Priority changed from Low to High
  • Target version changed from Tools - Next to Ready

While looking into #176013 I noticed that the search on https://openqa.suse.de/minion does not allow searching for obs_rsync* tasks. They are simply missing from the select box. (Compare https://openqa.opensuse.org/minion )

I checked on OSD whether there were any config changes.

The current openqa.ini:

-rw-r--r-- 1 geekotest root 10243 Jan 22 23:54 openqa.ini

The snapshot from Nov 7 is significantly bigger:

-rw-r--r-- 2 martchus root 14262 Nov  7 15:32 openqa.ini

I'm looking at the diff, but the obs_rsync plugin is configured in both files. The diff consists mostly of comment lines.
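
Since the diff is mostly comment lines, it may help to compare only the effective settings. The following is a minimal sketch, not something that was run on OSD: the function names `significant_lines` and `config_diff` are made up for illustration, and it assumes that comment lines start with `#` or `;`.

```python
import difflib


def significant_lines(text):
    """Return the non-blank, non-comment lines of an INI text, stripped."""
    lines = []
    for line in text.splitlines():
        stripped = line.strip()
        # Assumption: comments start with '#' or ';' at the beginning of a line.
        if stripped and not stripped.startswith(("#", ";")):
            lines.append(stripped)
    return lines


def config_diff(old_text, new_text):
    """Unified diff of two INI texts that ignores comments and blank lines."""
    return list(
        difflib.unified_diff(
            significant_lines(old_text),
            significant_lines(new_text),
            fromfile="snapshot",
            tofile="current",
            lineterm="",
        )
    )
```

Calling `config_diff(open(snapshot_path).read(), open(current_path).read())` with the two file paths returns an empty list when the files differ only in comments or blank lines.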

Actions #4

Updated by tinita 4 months ago

  • Related to action #176013: [alert] web UI: Too many Minion job failures alert size:S added
Actions #5

Updated by tinita 4 months ago

I just tried to restart the gru service:

Jan 23 00:08:20 openqa systemd[1]: Stopping The openQA daemon for various background tasks like cleanup and saving needles...
Jan 23 00:13:20 openqa systemd[1]: openqa-gru.service: State 'stop-sigterm' timed out. Killing.
Jan 23 00:13:20 openqa systemd[1]: openqa-gru.service: Killing process 13903 (openqa) with signal SIGKILL.
Jan 23 00:13:20 openqa systemd[1]: openqa-gru.service: Killing process 26956 (openqa) with signal SIGKILL.
Jan 23 00:13:20 openqa systemd[1]: openqa-gru.service: Main process exited, code=killed, status=9/KILL
Jan 23 00:13:20 openqa systemd[1]: openqa-gru.service: Failed with result 'timeout'.
Jan 23 00:13:20 openqa systemd[1]: Stopped The openQA daemon for various background tasks like cleanup and saving needles.
Jan 23 00:13:20 openqa systemd[1]: openqa-gru.service: Consumed 20min 30.720s CPU time.
Jan 23 00:13:20 openqa systemd[1]: Started The openQA daemon for various background tasks like cleanup and saving needles.

So the service is running again, but the stop timed out and the processes had to be killed, so something went wrong.

Actions #6

Updated by okurz 4 months ago

  • Related to action #175407: salt state for machine monitor.qe.nue2.suse.org was broken for almost 2 months, nothing was alerting us size:S added
Actions #7

Updated by tinita 4 months ago · Edited

tinita wrote in #note-3:

I looked on osd if there were any config changes.

The openqa.config:

-rw-r--r-- 1 geekotest root 10243 Jan 22 23:54 openqa.ini

I had made a local backup of that file. I have now copied it to my home directory on OSD as openqa.ini-2025-01-22T23:54.

Actions #8

Updated by nicksinger 4 months ago

I just found https://gitlab.suse.de/openqa/salt-states-openqa/-/jobs/3701756, which also shows a broken file on tumblesle: /etc/zypp/zypp.conf looked like this:

## Configuration file for software management
## /etc/zypp/zypp.conf
##
## Boolean values are 0 1 yes no on off true false
}

[main]
solver.dupAllowVendorChange = True

I removed the stray "}" at the top. Maybe this is also related to "corrupted files".
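
A stray line like this can be caught by a simple parse check. The following is only a sketch: it uses Python's `configparser`, whose INI dialect is not necessarily identical to what zypp or openQA accept, and the helper name `ini_parse_error` is hypothetical.

```python
import configparser


def ini_parse_error(text):
    """Return None if text parses as INI, otherwise a short error description."""
    # strict=False tolerates duplicate sections/options; interpolation=None
    # keeps '%' characters in values from being treated specially.
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    try:
        parser.read_string(text)
    except configparser.Error as exc:
        return f"{type(exc).__name__}: {exc}"
    return None
```

For the zypp.conf content quoted above, the "}" before the first section header makes the parser raise `MissingSectionHeaderError`, so the function returns an error description instead of `None`.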

Actions #9

Updated by tinita 4 months ago

  • Related to action #176124: OSD influxdb minion route seemingly returns only a very small number of failed minion jobs, not all added
Actions #10

Updated by tinita 4 months ago

  • Related to action #176175: [alert] Grafana failed to start due to corrupted config file added
Actions #11

Updated by okurz 4 months ago

  • Copied to action #176250: file corruption in salt controlled config files size:M added
Actions #12

Updated by okurz 4 months ago

  • Status changed from New to Blocked
  • Assignee set to okurz
  • Priority changed from High to Normal

OSD is currently fine. Blocked on #176250.

Actions #13

Updated by okurz 4 months ago

  • Copied to deleted (action #176250: file corruption in salt controlled config files size:M)
Actions #14

Updated by okurz 4 months ago

  • Blocked by action #176250: file corruption in salt controlled config files size:M added
Actions #15

Updated by okurz 4 months ago

  • Target version changed from Ready to future
Actions #16

Updated by okurz about 2 months ago

  • Status changed from Blocked to In Progress

I realized that the config is incomplete again, although not corrupted. It looks like manual changes were lost while all salt-controlled settings might still be there. Output of ssh backup-vm.qe.nue2.suse.org 'ls -la /home/rsnapshot/*/openqa.suse.de/etc/openqa/openqa.ini':

-rw-r--r-- 1 martchus root  3411 Mar 20 07:38 /home/rsnapshot/alpha.0/openqa.suse.de/etc/openqa/openqa.ini
-rw-r--r-- 1 martchus root  3411 Mar 20 03:38 /home/rsnapshot/alpha.1/openqa.suse.de/etc/openqa/openqa.ini
-rw-r--r-- 1 martchus root  3411 Mar 19 23:39 /home/rsnapshot/alpha.2/openqa.suse.de/etc/openqa/openqa.ini
-rw-r--r-- 1 martchus root  3411 Mar 19 19:38 /home/rsnapshot/alpha.3/openqa.suse.de/etc/openqa/openqa.ini
-rw-r--r-- 1 martchus root  3411 Mar 19 16:37 /home/rsnapshot/alpha.4/openqa.suse.de/etc/openqa/openqa.ini
-rw-r--r-- 1 martchus root  3411 Mar 19 12:39 /home/rsnapshot/alpha.5/openqa.suse.de/etc/openqa/openqa.ini
-rw-r--r-- 1 martchus root  3411 Mar 19 03:39 /home/rsnapshot/beta.0/openqa.suse.de/etc/openqa/openqa.ini
-rw-r--r-- 1 martchus root  3411 Mar 18 03:38 /home/rsnapshot/beta.1/openqa.suse.de/etc/openqa/openqa.ini
-rw-r--r-- 1 martchus root  3411 Mar 15 20:36 /home/rsnapshot/beta.2/openqa.suse.de/etc/openqa/openqa.ini
-rw-r--r-- 1 martchus root  3411 Mar 15 03:37 /home/rsnapshot/beta.4/openqa.suse.de/etc/openqa/openqa.ini
-rw-r--r-- 1 martchus root  3411 Mar 14 04:37 /home/rsnapshot/beta.5/openqa.suse.de/etc/openqa/openqa.ini
-rw-r--r-- 1 martchus root  3411 Mar 13 04:37 /home/rsnapshot/beta.6/openqa.suse.de/etc/openqa/openqa.ini
-rw-r--r-- 2 martchus root 14285 Feb  3 04:37 /home/rsnapshot/delta.0/openqa.suse.de/etc/openqa/openqa.ini
-rw-r--r-- 2 martchus root 14285 Feb  3 04:37 /home/rsnapshot/delta.1/openqa.suse.de/etc/openqa/openqa.ini
-rw-r--r-- 1 martchus root  3411 Mar  8 04:37 /home/rsnapshot/gamma.0/openqa.suse.de/etc/openqa/openqa.ini
-rw-r--r-- 1 martchus root  3411 Mar  1 04:37 /home/rsnapshot/gamma.1/openqa.suse.de/etc/openqa/openqa.ini
-rw-r--r-- 1 martchus root  3411 Feb 22 04:35 /home/rsnapshot/gamma.2/openqa.suse.de/etc/openqa/openqa.ini
-rw-r--r-- 1 martchus root 14286 Feb 15 04:16 /home/rsnapshot/gamma.3/openqa.suse.de/etc/openqa/openqa.ini

Recovering
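
The size drop from about 14 kB to 3411 bytes in the listing above could also be flagged automatically. This is a rough sketch that parses `ls -la` output; the function names and the 50% threshold are assumptions chosen for illustration.

```python
def parse_ls_line(line):
    """Split one `ls -la` line into (size_in_bytes, path)."""
    # Fields: permissions, links, owner, group, size, month, day, time, path.
    fields = line.split(None, 8)
    return int(fields[4]), fields[8]


def shrunken_snapshots(ls_output, ratio=0.5):
    """Return paths whose size is below `ratio` times the largest size seen."""
    entries = [parse_ls_line(line) for line in ls_output.splitlines() if line.strip()]
    if not entries:
        return []
    largest = max(size for size, _ in entries)
    return [path for size, path in entries if size < largest * ratio]
```

Fed with the listing above, this returns every snapshot path with the 3411-byte file and none of the 14285- or 14286-byte ones.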

Actions #17

Updated by okurz about 2 months ago

  • Status changed from In Progress to Blocked
Actions #18

Updated by okurz about 1 month ago

  • Status changed from Blocked to Resolved
  • Target version changed from future to Ready

With #176250 done, this should be resolved.
