action #135152

closed

Zabbix agent is not available

Added by livdywan 8 months ago. Updated 8 months ago.

Status: Resolved
Priority: Normal
Assignee: -
Category: -
Target version:
Start date: 2023-07-07
Due date:
% Done: 0%
Estimated time:

Description

Observation

Problem started at 11:47:08 on 2023.09.03 
Problem name: Zabbix agent is not available (or nodata for 30m)
Host: ariel.suse-dmz.opensuse.org

and another version

Problem started at 11:46:51 on 2023.09.03
Problem name: Zabbix agent is not available (or nodata for 30m)
Host: ariel.dmz-prg2.suse.org (over old-ariel)

and also

Problem started at 10:36:51 on 2023.08.26
Problem name: Zabbix agent is not available (or nodata for 30m)
Host: ariel.dmz-prg2.suse.org (over old-ariel)

Running sudo journalctl -u zabbix_agentd only shows entries from August 17, and nothing there indicates that the agent stopped running.

Acceptance criteria

  • AC1: It is understood what was causing the Zabbix agent unavailable alerts

Suggestions

  • Confirm what was causing the Zabbix agent to appear unavailable
  • Consider moving /var/log/zabbix/zabbix_agentd.log into the journal for better discoverability

Out of scope

  • Monitor proxy availability

Related issues 1 (0 open, 1 closed)

Copied from openQA Infrastructure - action #135029: Many unhandled alert messages while users report problems - Resolved - livdywan - 2023-07-07

Actions #1

Updated by livdywan 8 months ago

  • Copied from action #135029: Many unhandled alert messages while users report problems added

Actions #2

Updated by jbaier_cz 8 months ago

Logs are present in /var/log/zabbix/zabbix_agentd.log. We can consider changing that to log into journal (setting LogType=console should do the trick)

Actions #3

Updated by tinita 8 months ago

jbaier_cz wrote in #note-2:

Logs are present in /var/log/zabbix/zabbix_agentd.log. We can consider changing that to log into journal (setting LogType=console should do the trick)

ah!
maybe this is related then:

  1704:20230826:080711.330 active check data upload to [zabbix-proxy-opensuse:10051] started to fail ([connect] cannot connect to [[zabbix-proxy-opensuse]:10051]: [4] Interrupted system call)
  1704:20230826:080739.331 active check configuration update from [zabbix-proxy-opensuse:10051] started to fail (cannot connect to [[zabbix-proxy-opensuse]:10051]: [4] Interrupted system call)

The email was sent at 08:36 UTC.
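
As a rough cross-check of the timing, the sketch below parses the first log line quoted above and measures the gap between that failure and the email. It assumes the agent log timestamps are in UTC (the log itself does not say) and that the email went out at 08:36:00. The helper parse_agent_log_line and the variable names are illustrative and not part of Zabbix.

```python
import re
from datetime import datetime

# Agent log lines look like "PID:YYYYMMDD:HHMMSS.mmm message".
LINE_RE = re.compile(r"^\s*(\d+):(\d{8}):(\d{6}\.\d{3})\s+(.*)$")


def parse_agent_log_line(line):
    """Return (pid, timestamp, message), or None if the line does not match."""
    match = LINE_RE.match(line)
    if match is None:
        return None
    pid, date, time, message = match.groups()
    timestamp = datetime.strptime(date + time, "%Y%m%d%H%M%S.%f")
    return int(pid), timestamp, message


first_failure = (
    "  1704:20230826:080711.330 active check data upload to "
    "[zabbix-proxy-opensuse:10051] started to fail ([connect] cannot connect to "
    "[[zabbix-proxy-opensuse]:10051]: [4] Interrupted system call)"
)

pid, failed_at, message = parse_agent_log_line(first_failure)

# Assumption: the log is in UTC and the email was sent at 08:36:00 UTC.
email_sent_at = datetime(2023, 8, 26, 8, 36, 0)
gap = email_sent_at - failed_at
print(f"pid={pid} first failure at {failed_at}, {gap} before the email")
```

Under those assumptions the gap comes out a little under 29 minutes, which would be roughly in line with the 30m nodata window in the problem name, although it does not by itself confirm the cause.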

Actions #4

Updated by tinita 8 months ago

  • Description updated (diff)

@livdywan could you add timestamps to the examples you gave above?

Actions #5

Updated by tinita 8 months ago

  • Description updated (diff)

Actions #6

Updated by livdywan 8 months ago

  • Description updated (diff)

@livdywan could you add timestamps to the examples you gave above?

Done. And I found a third instance.

Actions #7

Updated by livdywan 8 months ago

  • Description updated (diff)
  • Status changed from New to Resolved

We understand what happened and it's fine
