action #163825

closed

[alert][FIRING:1] Failed systemd services alert session-c69388.scope / suse-build-key-import.service on backup-qam.qe.nue2.suse.org size:S

Added by livdywan 5 months ago. Updated 5 months ago.

Status:
Resolved
Priority:
High
Assignee:
Category:
Regressions/Crashes
Target version:
Start date:
2024-06-29
Due date:
% Done:

0%

Estimated time:

Description

Observation

backup-qam:~ # systemctl status suse-build-key-import.service
× suse-build-key-import.service - Import new SUSE RPM signing keys
     Loaded: loaded (/usr/lib/systemd/system/suse-build-key-import.service; static)
     Active: failed (Result: exit-code) since Fri 2024-07-12 04:00:00 CEST; 10h ago
TriggeredBy: ● suse-build-key-import.timer
  Condition: start condition unmet at Fri 2024-07-12 14:00:00 CEST; 1min 39s ago
             └─ ConditionPathExists=/var/lib/suse-build-key/imported was not met
    Process: 18577 ExecStart=/usr/bin/import-suse-build-key (code=exited, status=2)
   Main PID: 18577 (code=exited, status=2)
        CPU: 37ms

Jul 12 05:00:00 backup-qam systemd[1]: Import new SUSE RPM signing keys was skipped because of an unmet condition check (ConditionPathExists=/var/lib/suse-build-key/imported).

/var/lib/suse-build-key is present, but the sub-folder imported is missing.
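
A minimal sketch for pinning down which component of the ConditionPathExists= path is missing, assuming an absolute path. The helper name first_missing_component is made up for illustration and is not part of suse-build-key or systemd:

```shell
#!/usr/bin/env bash
# Walk an absolute path from the root down and print the first component
# that does not exist. Prints nothing and returns 0 if the whole path exists.
# NOTE: hypothetical helper, only meant for absolute paths.
first_missing_component() {
    local path=$1 current="" part
    local -a parts
    # Split on "/" without triggering glob expansion.
    IFS=/ read -r -a parts <<< "$path"
    for part in "${parts[@]}"; do
        # The leading "/" yields an empty first field; skip it.
        [ -z "$part" ] && continue
        current="$current/$part"
        if [ ! -e "$current" ]; then
            printf '%s\n' "$current"
            return 1
        fi
    done
    return 0
}

# Usage on the affected host (path taken from the unit status above):
# first_missing_component /var/lib/suse-build-key/imported
```

If only the last component is printed, the parent directory survived and just the sub-folder is gone, which matches the observation above.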

Suggestions

  • Try to understand why /var/lib/suse-build-key exists and whether something deleted the imported sub-folder
  • Create an upstream issue, track it, and work around the failure in the meantime, possibly by hammering the restart button

Related issues 1 (0 open, 1 closed)

Copied to openQA Project (public) - action #163852: [alert][FIRING:1] Failed systemd services alert session-c69388.scope / session-c69388.scope on openqa.suse.de (Resolved, nicksinger, 2024-06-29)

Actions #1

Updated by livdywan 5 months ago

$ systemctl --failed                                                      
  UNIT                 LOAD   ACTIVE SUB    DESCRIPTION                                     
● session-c69388.scope loaded failed failed Session c69388 of User postgres
[...]
$ systemctl status session-c69388.scope                                   
× session-c69388.scope - Session c69388 of User postgres                                    
     Loaded: loaded (/run/systemd/transient/session-c69388.scope; transient)                
  Transient: yes                                                                            
     Active: failed (Result: resources)
$ sudo systemctl restart session-c69388.scope
Job failed. See "journalctl -xe" for details.

This is postgres related. No easy restart. Not sure how to deal with this.

Actions #2

Updated by livdywan 5 months ago

  • Status changed from In Progress to New

2024-07-12 10:08:40 backup-qam suse-build-key-import 1

Tried to check this machine but can't log in:

$ ssh backup-vm
ssh: Could not resolve hostname backup-vm: Temporary failure in name resolution

There is also backup.qa.suse.de but I guess that's not it since it has no service called suse-build-key-import.

Actions #3

Updated by nicksinger 5 months ago

  • Status changed from New to In Progress
  • Assignee changed from livdywan to nicksinger

As per Slack, taking over from here.

Actions #4

Updated by nicksinger 5 months ago

  • Subject changed from [alert][FIRING:1] Failed systemd services alert session-c69388.scope / backup-qam to [alert][FIRING:1] Failed systemd services alert session-c69388.scope / suse-build-key-import.service on backup-qam.qe.nue2.suse.org
Actions #5

Updated by nicksinger 5 months ago

  • Copied to action #163852: [alert][FIRING:1] Failed systemd services alert session-c69388.scope / session-c69388.scope on openqa.suse.de added
Actions #6

Updated by nicksinger 5 months ago

  • Description updated (diff)
Actions #7

Updated by nicksinger 5 months ago

rpm -qf /var/lib/suse-build-key/imported shows that the folder should be provided by the rpm "suse-build-key-12.0-150000.8.46.2.noarch". Recreating it manually and then calling systemd-tmpfiles --clean does not delete the directory, so something else must have removed it. Now looking with journalctl --since "Jul 12 05:00:00" for other processes running at that time.

Actions #8

Updated by nicksinger 5 months ago

Actually I have to check one hour earlier and by accident spotted:

Jul 12 04:00:00 backup-qam import-suse-build-key[18577]: /usr/bin/import-suse-build-key: line 34: syntax error near unexpected token `fi'
Jul 12 04:00:00 backup-qam import-suse-build-key[18577]: /usr/bin/import-suse-build-key: line 34: `fi'
Jul 12 04:00:00 backup-qam systemd[1]: suse-build-key-import.service: Main process exited, code=exited, status=2/INVALIDARGUMENT
Jul 12 04:00:00 backup-qam systemd[1]: suse-build-key-import.service: Failed with result 'exit-code'.
Jul 12 04:00:00 backup-qam systemd[1]: Failed to start Import new SUSE RPM signing keys.

This looks like the script self-destructed its own required conditions. I will collect some more details and probably create an upstream issue (not sure yet where to report such issues).
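
The parse error above can be reproduced without running the script, assuming /usr/bin/import-suse-build-key is a bash script (the error format suggests so, but that is an assumption). bash -n only parses the file and executes nothing. script_syntax_ok is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Parse a shell script without executing it. bash -n reports syntax errors
# such as a stray `fi` on stderr and returns a non-zero exit status.
# NOTE: assumes the target is a bash-compatible script.
script_syntax_ok() {
    local script=$1
    bash -n "$script"
}

# Usage on the affected host:
# script_syntax_ok /usr/bin/import-suse-build-key && echo "syntax ok"
```

Running this before and after a manual fix gives a quick check that the script at least parses again before the service is restarted.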

Actions #9

Updated by okurz 5 months ago

  • Subject changed from [alert][FIRING:1] Failed systemd services alert session-c69388.scope / suse-build-key-import.service on backup-qam.qe.nue2.suse.org to [alert][FIRING:1] Failed systemd services alert session-c69388.scope / suse-build-key-import.service on backup-qam.qe.nue2.suse.org size:S
  • Description updated (diff)
Actions #10

Updated by nicksinger 5 months ago

  • Status changed from In Progress to Resolved

The upstream issue is tracked at https://bugzilla.suse.com/show_bug.cgi?id=1227681 and a fix from Marcus Meissner has already been submitted. I just changed the file manually, since the next package update should overwrite it with the upstream patch. Restarting the service after fixing the script worked, so the issue should be resolved.

Actions #11

Updated by nicksinger 5 months ago

  • Status changed from Resolved to Feedback
Actions #12

Updated by okurz 5 months ago

  • Status changed from Feedback to Resolved

nicksinger only reopened this to make our backlog checker happy. We can resolve it, as we have a workaround in place and the final fix is expected to arrive eventually, regardless of when.
