action #163852: [alert][FIRING:1] Failed systemd services alert session-c69388.scope / session-c69388.scope on openqa.suse.de - openQA Project (public) - openSUSE Project Management Tool

action #163852

closed

[alert][FIRING:1] Failed systemd services alert session-c69388.scope / session-c69388.scope on openqa.suse.de

Added by nicksinger 9 months ago. Updated 9 months ago.

Status: Resolved
Priority: High
Assignee: nicksinger
Category: Regressions/Crashes
Target version: Ready
Start date: 2024-06-29
Tags: alert, infra, reactive work

Description

Observation

As suggested in #163825, this is a second ticket covering a different failing unit on OSD.

From logs on OSD:

openqa:~ # systemctl --failed
  UNIT                 LOAD   ACTIVE SUB    DESCRIPTION
● session-c69388.scope loaded failed failed Session c69388 of User postgres

LOAD   = Reflects whether the unit definition was properly loaded.
ACTIVE = The high-level unit activation state, i.e. generalization of SUB.
SUB    = The low-level unit activation state, values depend on unit type.
1 loaded units listed.
openqa:~ # systemctl status session-c69388.scope
× session-c69388.scope - Session c69388 of User postgres
     Loaded: loaded (/run/systemd/transient/session-c69388.scope; transient)
  Transient: yes
     Active: failed

Jul 12 01:45:40 openqa systemd[1]: session-c69388.scope: Couldn't move process 3018 to requested cgroup '/user.slice/user-26.slice/session-c69388.scope': No such process
Jul 12 01:45:40 openqa systemd[1]: session-c69388.scope: Failed to add PIDs to scope's control group: No such process
Jul 12 01:45:40 openqa systemd[1]: session-c69388.scope: Failed with result 'resources'.
Jul 12 01:45:40 openqa systemd[1]: Failed to start Session c69388 of User postgres.
Jul 12 10:36:10 openqa systemd[1]: Failed to start Session c69388 of User postgres.
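The cgroup path in the first log line encodes the owning user's UID (user-26.slice, i.e. the postgres user named in the unit description). A minimal sketch of pulling that UID out so the journal can be filtered by it; the helper name uid_from_cgroup is hypothetical, and the year 2024 in the usage comment is an assumption based on the ticket's start date.

```shell
#!/bin/sh
# Hypothetical helper: print the numeric UID encoded in a cgroup path such as
# /user.slice/user-26.slice/session-c69388.scope. Prints nothing if the path
# contains no user-<UID>.slice component.
uid_from_cgroup() {
    printf '%s\n' "$1" | sed -n 's|.*/user-\([0-9][0-9]*\)\.slice.*|\1|p'
}

# Usage on the host (assumes the journal still covers that time window):
#   journalctl _UID="$(uid_from_cgroup /user.slice/user-26.slice/session-c69388.scope)" \
#       --since "2024-07-12 01:40" --until "2024-07-12 01:50"
```

_UID= is a standard journal field match, so this shows messages logged by processes running as that user. It may return little or nothing if the journal has already been rotated.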

Related issues: 1 (0 open, 1 closed)

Updated by nicksinger 9 months ago

Copied from action #163825: [alert][FIRING:1] Failed systemd services alert session-c69388.scope / suse-build-key-import.service on backup-qam.qe.nue2.suse.org size:S added

Updated by nicksinger 9 months ago

Status changed from In Progress to Resolved

I didn't find any useful logs for postgres because the journal output only said "journal has been rotated since unit was started, output may be incomplete". I assume that some automated task running as this user exited very quickly, so its process was already gone when systemd tried to move it into the session scope ("No such process"), which resulted in this failure. I wouldn't adjust anything for now. If we see this issue again, we can consider two possible solutions:

a) Adjust our data collection scripts (https://gitlab.suse.de/openqa/salt-states-openqa/-/blob/master/monitoring/telegraf/scripts/systemd_list_service_by_state_for_telegraf.sh?ref_type=heads#L35-42) to, e.g., ignore failed scope units if these failures only appear sporadically and cause no harm.
b) Research what changed and what is affected by it, improve logging, and monitor the situation over a longer period.
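Option a) boils down to filtering transient *.scope units out of the failed-unit list before it is reported. A minimal sketch, assuming input in the format of `systemctl --failed --no-legend --plain`; the function name filter_failed_units is hypothetical and this is not the content of the linked telegraf script.

```shell
#!/bin/sh
# Hypothetical filter for option a): read failed-unit lines on stdin and print
# only the unit names that are not *.scope units. A leading "●" marker, as
# shown in the interactive output above, is skipped if present.
filter_failed_units() {
    awk '{ unit = ($1 == "●") ? $2 : $1 }
         NF && unit !~ /\.scope$/ { print unit }'
}

# Usage on a systemd host:
#   systemctl --failed --no-legend --plain | filter_failed_units
```

Filtering by unit suffix keeps real service failures visible while dropping session scopes, which is only sensible if those scope failures really are harmless.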

Updated by nicksinger 9 months ago

Reset the failed state on OSD with systemctl reset-failed.
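Without arguments, systemctl reset-failed clears the failed state of all units. A minimal sketch of restricting the reset to named scope units only; the function name reset_failed_scopes and the SYSTEMCTL override (used for a dry run) are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: reset the failed state only for the given *.scope units.
# Set SYSTEMCTL=echo to print the commands instead of running them.
reset_failed_scopes() {
    for unit in "$@"; do
        case $unit in
            *.scope) "${SYSTEMCTL:-systemctl}" reset-failed "$unit" ;;
            *) echo "skipping non-scope unit: $unit" >&2 ;;
        esac
    done
}

# Usage on OSD:
#   reset_failed_scopes session-c69388.scope
```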
