action #163052
closed
jenkins.qa.suse.de no longer reachable via web browser (but responsive to SSH)
0%
Description
Observation
Try and open something like http://jenkins.qa.suse.de/job/gnome_next-openqa and find it says it's refusing to connect.
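The same check can be reproduced from a terminal. This is only a sketch: `describe_curl_exit` and `check_http` are hypothetical helper names, the 10 second timeout is an arbitrary choice, and only a few of curl's documented exit codes are mapped. A browser's "refusing to connect" would most likely show up here as curl exit code 7.

```shell
# Hypothetical helper: translate a few documented curl exit codes into words.
describe_curl_exit() {
    case "$1" in
        0)  echo "HTTP response received" ;;
        6)  echo "could not resolve host" ;;
        7)  echo "failed to connect (e.g. connection refused)" ;;
        28) echo "timed out" ;;
        *)  echo "curl failed with exit code $1" ;;
    esac
}

# Hypothetical helper: probe a URL and report the outcome.
# -sS: no progress meter, but still print errors; -o /dev/null: discard the body.
# Without --fail, curl exits 0 for any HTTP response, including error pages.
check_http() {
    curl -sS -o /dev/null --max-time 10 "$1"
    rc=$?
    echo "$1: $(describe_curl_exit "$rc")"
    return "$rc"
}

# Usage (not executed here):
#   check_http http://jenkins.qa.suse.de/job/gnome_next-openqa
```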
Acceptance criteria
- AC1: jenkins.qa.suse.de is reachable via a web browser
Suggestions
- Try via SSH and check the logs
Rollback actions
- Remove alert silence alertname=Failed systemd services alert (except openqa.suse.de) from https://monitor.qa.suse.de/alerting/silences
Updated by livdywan 5 months ago
- Copied from action #133907: Improve monitoring for http(s?) reachable on jenkins.qa.suse.de size:M added
Updated by okurz 5 months ago · Edited
- Description updated (diff)
- Status changed from New to In Progress
- Assignee set to okurz
Also showing an alert about failed systemd service "apparmor" on that host: https://monitor.qa.suse.de/d/KToPYLEWz/failed-systemd-services?orgId=1&from=1719807966682&to=1719830820875
sudo systemctl --failed
UNIT LOAD ACTIVE SUB DESCRIPTION
● apparmor.service loaded failed failed Load AppArmor profiles
LOAD = Reflects whether the unit definition was properly loaded.
ACTIVE = The high-level unit activation state, i.e. generalization of SUB.
SUB = The low-level unit activation state, values depend on unit type.
1 loaded units listed.
okurz@jenkins:~> sudo systemctl status apparmor
× apparmor.service - Load AppArmor profiles
Loaded: loaded (/usr/lib/systemd/system/apparmor.service; enabled; preset: enabled)
Active: failed (Result: exit-code) since Sun 2024-06-30 03:36:16 CEST; 1 day 10h ago
Main PID: 651 (code=exited, status=1/FAILURE)
CPU: 121ms
Notice: journal has been rotated since unit was started, output may be incomplete.
After restarting it with sudo systemctl restart apparmor:
● apparmor.service - Load AppArmor profiles
Loaded: loaded (/usr/lib/systemd/system/apparmor.service; enabled; preset: enabled)
Active: active (exited) since Mon 2024-07-01 13:44:53 CEST; 8s ago
Process: 10863 ExecStart=/lib/apparmor/apparmor.systemd reload (code=exited, status=0/SUCCESS)
Main PID: 10863 (code=exited, status=0/SUCCESS)
CPU: 41ms
Jul 01 13:44:52 jenkins systemd[1]: Starting Load AppArmor profiles...
Jul 01 13:44:52 jenkins apparmor.systemd[10863]: Restarting AppArmor
Jul 01 13:44:52 jenkins apparmor.systemd[10863]: Reloading AppArmor profiles
Jul 01 13:44:52 jenkins apparmor.systemd[10868]: Warning from stdin (line 1): Cache: failed to add read only location '/usr/share/apparmor/cache', does not contain valid cache direc>
Jul 01 13:44:53 jenkins systemd[1]: Finished Load AppArmor profiles.
# systemctl status jenkins
○ jenkins.service - Jenkins Continuous Integration Server
Loaded: loaded (/usr/lib/systemd/system/jenkins.service; enabled; preset: disabled)
Active: inactive (dead)
Triggered a restart. It takes very long: it failed on the first attempt and succeeded on the second. Found other services also not running, e.g. nginx. Triggered systemctl start default.target. Called zypper -n in java-21-openjdk-headless and rebooted. All good again. apparmor still shows the same warning, so it was a red herring for the original issue.
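For a quick re-check after a reboot, something like the following could list the units that did not come up. It is a sketch only: `report_inactive` is a hypothetical helper name, and the unit names in the usage line are simply the ones mentioned in this ticket. It relies on `systemctl is-active --quiet`, which exits non-zero when a unit is not active.

```shell
# Hypothetical helper: print every given unit that is not active.
# Returns 1 if at least one unit is not active, 0 otherwise.
report_inactive() {
    failed=0
    for unit in "$@"; do
        # A unit in state "active (exited)", like apparmor above, counts as active.
        if ! systemctl is-active --quiet "$unit"; then
            echo "not active: $unit"
            failed=1
        fi
    done
    return "$failed"
}

# Usage (not executed here):
#   report_inactive jenkins nginx apparmor
```

If anything is reported, sudo systemctl --failed and sudo systemctl status on the unit give the details, as in the output above.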
Updated by okurz 5 months ago
- Status changed from In Progress to Resolved
Fixed. GNOME-Next job triggered: https://openqa.opensuse.org/tests/4310149
Rollback action done, silence removed, alert is gone.