action #163852

closed

[alert][FIRING:1] Failed systemd services alert session-c69388.scope / session-c69388.scope on openqa.suse.de

Added by nicksinger about 1 month ago. Updated about 1 month ago.

Status: Resolved
Priority: High
Assignee:
Category: Regressions/Crashes
Target version:
Start date: 2024-06-29
Due date:
% Done: 0%
Estimated time:

Description

Observation

As suggested in #163825, this is a second ticket covering a different failing service on OSD.

From logs on OSD:

openqa:~ # systemctl --failed
  UNIT                 LOAD   ACTIVE SUB    DESCRIPTION
● session-c69388.scope loaded failed failed Session c69388 of User postgres

LOAD   = Reflects whether the unit definition was properly loaded.
ACTIVE = The high-level unit activation state, i.e. generalization of SUB.
SUB    = The low-level unit activation state, values depend on unit type.
1 loaded units listed.
openqa:~ # systemctl status session-c69388.scope
× session-c69388.scope - Session c69388 of User postgres
     Loaded: loaded (/run/systemd/transient/session-c69388.scope; transient)
  Transient: yes
     Active: failed

Jul 12 01:45:40 openqa systemd[1]: session-c69388.scope: Couldn't move process 3018 to requested cgroup '/user.slice/user-26.slice/session-c69388.scope': No such process
Jul 12 01:45:40 openqa systemd[1]: session-c69388.scope: Failed to add PIDs to scope's control group: No such process
Jul 12 01:45:40 openqa systemd[1]: session-c69388.scope: Failed with result 'resources'.
Jul 12 01:45:40 openqa systemd[1]: Failed to start Session c69388 of User postgres.
Jul 12 10:36:10 openqa systemd[1]: Failed to start Session c69388 of User postgres.

Related issues 1 (0 open, 1 closed)

Copied from openQA Project - action #163825: [alert][FIRING:1] Failed systemd services alert session-c69388.scope / suse-build-key-import.service on backup-qam.qe.nue2.suse.org size:S (Resolved, nicksinger, 2024-06-29)

Actions #1

Updated by nicksinger about 1 month ago

  • Copied from action #163825: [alert][FIRING:1] Failed systemd services alert session-c69388.scope / suse-build-key-import.service on backup-qam.qe.nue2.suse.org size:S added
Actions #2

Updated by nicksinger about 1 month ago

  • Status changed from In Progress to Resolved

I didn't find any useful logs for postgres because "journal has been rotated since unit was started, output may be incomplete.". I assume that some automated task running as this user exited very quickly, which resulted in this issue; that would be consistent with the "No such process" errors in the description. I wouldn't adjust anything now. If we see this issue again, we can consider two possible solutions:

a) Adjust our data collection scripts (https://gitlab.suse.de/openqa/salt-states-openqa/-/blob/master/monitoring/telegraf/scripts/systemd_list_service_by_state_for_telegraf.sh?ref_type=heads#L35-42), for example to ignore failed scope units, if these failures only appear sporadically and don't cause any harm.
b) Research what changed and what is affected by this. Try to improve logging and monitor it for a longer time.
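
For option a), a minimal sketch of what "ignore failed scope units" could look like. This is not the content of the telegraf script linked above; the function name filter_failed_units is hypothetical, and the sketch assumes its input is the output of `systemctl --failed --plain --no-legend`, where the unit name is the first column.

```shell
# Hypothetical helper, not taken from the salt-states-openqa repository.
# Reads a list of failed units on stdin (one unit per line, unit name in
# the first column) and prints only the names of units that are NOT
# scope units.
filter_failed_units() {
    awk '$1 !~ /\.scope$/ { print $1 }'
}

# Possible usage on a live system (assumption, untested against OSD):
#   systemctl --failed --plain --no-legend | filter_failed_units
```

With such a filter, a failed session-c69388.scope would no longer be reported, while failed service units would still be listed. The trade-off is that real problems in scope units would also be hidden, which is why this only makes sense if these failures prove harmless.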

Actions #3

Updated by nicksinger about 1 month ago

Reset the failed state with systemctl reset-failed on OSD.
