action #175710
closed coordination #161414: [epic] Improved salt based infrastructure management
OSD openqa.ini is corrupted, invalid characters, again 2025-01-17
Updated by okurz 4 months ago
- Copied from action #163790: OSD openqa.ini is corrupted, invalid characters size:M added
Updated by tinita 4 months ago · Edited
- Priority changed from Low to High
- Target version changed from Tools - Next to Ready
While looking into #176013 I noticed that the search on https://openqa.suse.de/minion does not allow searching for obs_rsync* tasks. They are simply missing from the select. (Compare https://openqa.opensuse.org/minion )
I looked on osd if there were any config changes.
The openqa.config:
-rw-r--r-- 1 geekotest root 10243 Jan 22 23:54 openqa.ini
The snapshot from Nov 7 is significantly bigger:
-rw-r--r-- 2 martchus root 14262 Nov 7 15:32 openqa.ini
I'm looking at the diff, but the obs_rsync plugin is configured in both files. The diff consists mostly of comment lines.
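Since the diff is dominated by comment lines, it may help to compare only the effective settings. A minimal sketch with Python's standard configparser, which drops comment lines while parsing; the sample file contents and the helper names load_settings and diff_settings are made up for illustration and are not taken from the real openqa.ini:

```python
import configparser


def load_settings(text):
    """Parse INI text into {(section, key): value}; comment lines are dropped."""
    # interpolation=None keeps '%' literal, strict=False tolerates duplicate keys
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(text)
    return {(s, k): v for s in parser.sections() for k, v in parser.items(s)}


def diff_settings(old_text, new_text):
    """Return (removed, added, changed) setting keys between two INI texts."""
    old, new = load_settings(old_text), load_settings(new_text)
    removed = sorted(k for k in old if k not in new)
    added = sorted(k for k in new if k not in old)
    changed = sorted(k for k in old if k in new and old[k] != new[k])
    return removed, added, changed


# Hypothetical sample contents, only to show the idea
old_ini = """
# a comment that only exists in the older snapshot
[global]
plugins = ObsRsync
# another comment
[obs_rsync]
home = /opt/openqa-trigger-from-ibs
"""
new_ini = """
[global]
plugins = ObsRsync
[obs_rsync]
home = /opt/openqa-trigger-from-ibs
"""

removed, added, changed = diff_settings(old_ini, new_ini)
print("removed:", removed, "added:", added, "changed:", changed)
```

If all three lists are empty, the two files differ only in comments, blank lines, or the order of settings.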
Updated by tinita 4 months ago
- Related to action #176013: [alert] web UI: Too many Minion job failures alert size:S added
Updated by tinita 4 months ago
I just tried to restart the gru service:
Jan 23 00:08:20 openqa systemd[1]: Stopping The openQA daemon for various background tasks like cleanup and saving needles...
Jan 23 00:13:20 openqa systemd[1]: openqa-gru.service: State 'stop-sigterm' timed out. Killing.
Jan 23 00:13:20 openqa systemd[1]: openqa-gru.service: Killing process 13903 (openqa) with signal SIGKILL.
Jan 23 00:13:20 openqa systemd[1]: openqa-gru.service: Killing process 26956 (openqa) with signal SIGKILL.
Jan 23 00:13:20 openqa systemd[1]: openqa-gru.service: Main process exited, code=killed, status=9/KILL
Jan 23 00:13:20 openqa systemd[1]: openqa-gru.service: Failed with result 'timeout'.
Jan 23 00:13:20 openqa systemd[1]: Stopped The openQA daemon for various background tasks like cleanup and saving needles.
Jan 23 00:13:20 openqa systemd[1]: openqa-gru.service: Consumed 20min 30.720s CPU time.
Jan 23 00:13:20 openqa systemd[1]: Started The openQA daemon for various background tasks like cleanup and saving needles.
So it is running again, but stopping timed out after 5 minutes and systemd had to kill the processes with SIGKILL.
Updated by okurz 4 months ago
- Related to action #175407: salt state for machine monitor.qe.nue2.suse.org was broken for almost 2 months, nothing was alerting us size:S added
Updated by tinita 4 months ago · Edited
tinita wrote in #note-3:
I looked on osd if there were any config changes.
The openqa.config:
-rw-r--r-- 1 geekotest root 10243 Jan 22 23:54 openqa.ini
I had made a local backup of that file. I copied that now to osd into my home directory as openqa.ini-2025-01-22T23:54
Updated by nicksinger 4 months ago
I just found https://gitlab.suse.de/openqa/salt-states-openqa/-/jobs/3701756 which also shows a broken file on tumblesle: /etc/zypp/zypp.conf looked like this:
## Configuration file for software management
## /etc/zypp/zypp.conf
##
## Boolean values are 0 1 yes no on off true false
}
[main]
solver.dupAllowVendorChange = True
I removed the stray "}" before the [main] section. Maybe this is also related to the "corrupted files".
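A stray character like this could be caught by a plain syntax check. A rough sketch, assuming the file is close enough to INI syntax for Python's configparser to read it; the helper name ini_parse_error is hypothetical and the sample text is the excerpt quoted above:

```python
import configparser


def ini_parse_error(text):
    """Return the name of the parse error for INI text, or None if it parses."""
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    try:
        parser.read_string(text)
    except configparser.Error as exc:
        return type(exc).__name__
    return None


# Excerpt from the broken zypp.conf shown above
broken = """## Configuration file for software management
## /etc/zypp/zypp.conf
##
## Boolean values are 0 1 yes no on off true false
}
[main]
solver.dupAllowVendorChange = True
"""
# Same text with the stray "}" line removed
fixed = broken.replace("}\n", "", 1)

print("broken:", ini_parse_error(broken))
print("fixed:", ini_parse_error(fixed))
```

This only detects syntax problems. A file that is syntactically valid but has lost settings would still pass.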
Updated by tinita 4 months ago
- Related to action #176124: OSD influxdb minion route seemingly returns only a very small number of failed minion jobs, not all added
Updated by tinita 4 months ago
- Related to action #176175: [alert] Grafana failed to start due to corrupted config file added
Updated by okurz 4 months ago
- Copied to action #176250: file corruption in salt controlled config files size:M added
Updated by okurz 4 months ago
- Copied to deleted (action #176250: file corruption in salt controlled config files size:M)
Updated by okurz 4 months ago
- Blocked by action #176250: file corruption in salt controlled config files size:M added
Updated by okurz about 2 months ago
- Status changed from Blocked to In Progress
I realized that the config is incomplete again, although not corrupted. It looks like manual changes were lost, but all salt-controlled settings might still be there. Output of ssh backup-vm.qe.nue2.suse.org 'ls -la /home/rsnapshot/*/openqa.suse.de/etc/openqa/openqa.ini':
-rw-r--r-- 1 martchus root 3411 Mar 20 07:38 /home/rsnapshot/alpha.0/openqa.suse.de/etc/openqa/openqa.ini
-rw-r--r-- 1 martchus root 3411 Mar 20 03:38 /home/rsnapshot/alpha.1/openqa.suse.de/etc/openqa/openqa.ini
-rw-r--r-- 1 martchus root 3411 Mar 19 23:39 /home/rsnapshot/alpha.2/openqa.suse.de/etc/openqa/openqa.ini
-rw-r--r-- 1 martchus root 3411 Mar 19 19:38 /home/rsnapshot/alpha.3/openqa.suse.de/etc/openqa/openqa.ini
-rw-r--r-- 1 martchus root 3411 Mar 19 16:37 /home/rsnapshot/alpha.4/openqa.suse.de/etc/openqa/openqa.ini
-rw-r--r-- 1 martchus root 3411 Mar 19 12:39 /home/rsnapshot/alpha.5/openqa.suse.de/etc/openqa/openqa.ini
-rw-r--r-- 1 martchus root 3411 Mar 19 03:39 /home/rsnapshot/beta.0/openqa.suse.de/etc/openqa/openqa.ini
-rw-r--r-- 1 martchus root 3411 Mar 18 03:38 /home/rsnapshot/beta.1/openqa.suse.de/etc/openqa/openqa.ini
-rw-r--r-- 1 martchus root 3411 Mar 15 20:36 /home/rsnapshot/beta.2/openqa.suse.de/etc/openqa/openqa.ini
-rw-r--r-- 1 martchus root 3411 Mar 15 03:37 /home/rsnapshot/beta.4/openqa.suse.de/etc/openqa/openqa.ini
-rw-r--r-- 1 martchus root 3411 Mar 14 04:37 /home/rsnapshot/beta.5/openqa.suse.de/etc/openqa/openqa.ini
-rw-r--r-- 1 martchus root 3411 Mar 13 04:37 /home/rsnapshot/beta.6/openqa.suse.de/etc/openqa/openqa.ini
-rw-r--r-- 2 martchus root 14285 Feb 3 04:37 /home/rsnapshot/delta.0/openqa.suse.de/etc/openqa/openqa.ini
-rw-r--r-- 2 martchus root 14285 Feb 3 04:37 /home/rsnapshot/delta.1/openqa.suse.de/etc/openqa/openqa.ini
-rw-r--r-- 1 martchus root 3411 Mar 8 04:37 /home/rsnapshot/gamma.0/openqa.suse.de/etc/openqa/openqa.ini
-rw-r--r-- 1 martchus root 3411 Mar 1 04:37 /home/rsnapshot/gamma.1/openqa.suse.de/etc/openqa/openqa.ini
-rw-r--r-- 1 martchus root 3411 Feb 22 04:35 /home/rsnapshot/gamma.2/openqa.suse.de/etc/openqa/openqa.ini
-rw-r--r-- 1 martchus root 14286 Feb 15 04:16 /home/rsnapshot/gamma.3/openqa.suse.de/etc/openqa/openqa.ini
Recovering
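The size drop in the listing above could also be spotted automatically. A small sketch that parses such ls -la output and reports snapshots whose file is much smaller than the largest copy; the 50% threshold and the helper names snapshot_sizes and shrunken are assumptions for illustration, not an existing check:

```python
def snapshot_sizes(ls_output):
    """Map rsnapshot snapshot name to file size from 'ls -la' output lines."""
    sizes = {}
    for line in ls_output.splitlines():
        fields = line.split()
        if len(fields) < 9:
            continue  # not a regular 'ls -la' entry
        # path looks like /home/rsnapshot/<snapshot>/openqa.suse.de/...
        snapshot = fields[-1].split("/")[3]
        sizes[snapshot] = int(fields[4])
    return sizes


def shrunken(sizes, ratio=0.5):
    """Return snapshots whose file is smaller than ratio * the largest size."""
    largest = max(sizes.values())
    return sorted(name for name, size in sizes.items() if size < largest * ratio)


# Three lines taken from the listing above
listing = """\
-rw-r--r-- 1 martchus root 3411 Mar 20 07:38 /home/rsnapshot/alpha.0/openqa.suse.de/etc/openqa/openqa.ini
-rw-r--r-- 2 martchus root 14285 Feb 3 04:37 /home/rsnapshot/delta.0/openqa.suse.de/etc/openqa/openqa.ini
-rw-r--r-- 1 martchus root 14286 Feb 15 04:16 /home/rsnapshot/gamma.3/openqa.suse.de/etc/openqa/openqa.ini
"""

sizes = snapshot_sizes(listing)
print(shrunken(sizes))
```

With the full listing this would flag every snapshot from Feb 22 onwards, which matches the point where the file went from about 14 KB to 3411 bytes.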
Updated by okurz about 2 months ago
- Status changed from In Progress to Blocked
Updated by okurz about 1 month ago
- Status changed from Blocked to Resolved
- Target version changed from future to Ready
With #176250 this should be resolved.