action #163052

closed

jenkins.qa.suse.de no longer reachable via web browser (but responsive to SSH)

Added by livdywan 5 months ago. Updated 5 months ago.

Status: Resolved
Priority: High
Assignee:
Category: Regressions/Crashes
Target version:
Start date:
Due date:
% Done: 0%
Estimated time:

Description

Observation

Opening a page such as http://jenkins.qa.suse.de/job/gnome_next-openqa in a web browser fails: the browser reports that the host is refusing to connect.

Acceptance criteria

  • AC1: jenkins.qa.suse.de is reachable via a web browser

Suggestions

  • Try via SSH and check the logs
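
To tell "refusing to connect" apart from DNS or timeout problems, the browser check can be reproduced from a shell. The following is a minimal sketch, assuming curl is installed; the function names classify_curl_exit and check_http are hypothetical and not part of any existing tooling on the host. Note that curl exits with 0 for any HTTP response, so "reachable" here only means the web server answered, not that Jenkins is healthy.

```shell
#!/bin/sh
# Sketch only: function names are illustrative assumptions.

# Map a curl exit status to a short diagnosis.
# 6, 7 and 28 are documented curl exit codes.
classify_curl_exit() {
    case "$1" in
        0)  echo "reachable" ;;
        6)  echo "could not resolve host" ;;
        7)  echo "failed to connect (e.g. connection refused)" ;;
        28) echo "timed out" ;;
        *)  echo "curl failed with exit status $1" ;;
    esac
}

# Probe a URL, print the diagnosis and return curl's exit status.
check_http() {
    curl --silent --show-error --output /dev/null --max-time 10 "$1"
    status=$?
    classify_curl_exit "$status"
    return "$status"
}
```

Example call: check_http http://jenkins.qa.suse.de/job/gnome_next-openqa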

Rollback actions


Related issues 1 (1 open, 0 closed)

Copied from openQA Infrastructure - action #133907: Improve monitoring for http(s?) reachable on jenkins.qa.suse.de size:M (Workable, 2023-08-07)

Actions #1

Updated by livdywan 5 months ago

  • Copied from action #133907: Improve monitoring for http(s?) reachable on jenkins.qa.suse.de size:M added
Actions #2

Updated by okurz 5 months ago · Edited

  • Description updated (diff)
  • Status changed from New to In Progress
  • Assignee set to okurz

Monitoring also shows an alert about the failed systemd service "apparmor" on that host: https://monitor.qa.suse.de/d/KToPYLEWz/failed-systemd-services?orgId=1&from=1719807966682&to=1719830820875

sudo systemctl --failed
  UNIT             LOAD   ACTIVE SUB    DESCRIPTION           
● apparmor.service loaded failed failed Load AppArmor profiles

LOAD   = Reflects whether the unit definition was properly loaded.
ACTIVE = The high-level unit activation state, i.e. generalization of SUB.
SUB    = The low-level unit activation state, values depend on unit type.
1 loaded units listed.
okurz@jenkins:~> sudo systemctl status apparmor
× apparmor.service - Load AppArmor profiles
     Loaded: loaded (/usr/lib/systemd/system/apparmor.service; enabled; preset: enabled)
     Active: failed (Result: exit-code) since Sun 2024-06-30 03:36:16 CEST; 1 day 10h ago
   Main PID: 651 (code=exited, status=1/FAILURE)
        CPU: 121ms

Notice: journal has been rotated since unit was started, output may be incomplete.

After a restart with sudo systemctl restart apparmor:

● apparmor.service - Load AppArmor profiles
     Loaded: loaded (/usr/lib/systemd/system/apparmor.service; enabled; preset: enabled)
     Active: active (exited) since Mon 2024-07-01 13:44:53 CEST; 8s ago
    Process: 10863 ExecStart=/lib/apparmor/apparmor.systemd reload (code=exited, status=0/SUCCESS)
   Main PID: 10863 (code=exited, status=0/SUCCESS)
        CPU: 41ms

Jul 01 13:44:52 jenkins systemd[1]: Starting Load AppArmor profiles...
Jul 01 13:44:52 jenkins apparmor.systemd[10863]: Restarting AppArmor
Jul 01 13:44:52 jenkins apparmor.systemd[10863]: Reloading AppArmor profiles
Jul 01 13:44:52 jenkins apparmor.systemd[10868]: Warning from stdin (line 1): Cache: failed to add read only location '/usr/share/apparmor/cache', does not contain valid cache direc>
Jul 01 13:44:53 jenkins systemd[1]: Finished Load AppArmor profiles.
# systemctl status jenkins
○ jenkins.service - Jenkins Continuous Integration Server
     Loaded: loaded (/usr/lib/systemd/system/jenkins.service; enabled; preset: disabled)
     Active: inactive (dead)
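
For scripting, the failed-unit listing shown earlier can be reduced to bare unit names. This is a small sketch, assuming the input comes from systemctl --failed --no-legend --plain (both are real systemctl options that drop the legend and the leading status bullet); the helper name failed_unit_names is hypothetical.

```shell
#!/bin/sh
# Sketch only: reads `systemctl --failed --no-legend --plain` output on stdin
# and prints the first column (the unit name) of every non-empty line.
failed_unit_names() {
    awk 'NF > 0 { print $1 }'
}
```

Example call: sudo systemctl --failed --no-legend --plain | failed_unit_names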

Triggered a restart of jenkins.service, which takes very long. It failed on the first attempt and succeeded on the second. Found other services also not running, e.g. nginx, so I triggered systemctl start default.target. Then called zypper -n in java-21-openjdk-headless and rebooted. All good again. apparmor still shows the same warning, so it is a red herring for the original issue.
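
Since several services turned out to be down at once, a quick check of the expected units can help after such a recovery. The following is a minimal sketch built on systemctl is-active --quiet, which exits with 0 only for active units. The function name inactive_units and the SYSTEMCTL override variable are assumptions made for illustration, and the list of units to check is up to the operator.

```shell
#!/bin/sh
# Sketch only: names below are illustrative assumptions.

# Command used to query systemd; overridable so the function can be
# exercised on a machine without systemd.
SYSTEMCTL=${SYSTEMCTL:-systemctl}

# Print every unit from the argument list that is not currently active.
inactive_units() {
    for unit in "$@"; do
        "$SYSTEMCTL" is-active --quiet "$unit" || echo "$unit"
    done
}
```

Example call: inactive_units jenkins nginx apparmor (no output means all listed units are active).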

Actions #3

Updated by okurz 5 months ago

  • Status changed from In Progress to Resolved

Fixed. GNOME-Next job triggered: https://openqa.opensuse.org/tests/4310149

Rollback action done, silence removed, alert is gone.
