action #138527 (closed)
Zabbix agent on ariel.dmz-prg2.suse.org reported no data for 30m and there is nothing in the journal size:S
Start date: 2023-07-07
Due date:
% Done: 0%
Estimated time:
Description
Observation
Problem started at 12:50:21 on 2023.10.25
Problem name: Zabbix agent is not available (or nodata for 30m)
Host: ariel.dmz-prg2.suse.org
Severity: Average
Operational data: Up (1)
Original problem ID: 600373209
Checking the journal shows nothing:
sudo journalctl -u zabbix_agentd
-- No entries --
Acceptance criteria
- AC1: It is understood what caused the Zabbix agent to stop reporting data
Suggestions
Updated by tinita 10 months ago
This keeps happening regularly, each time for a short period. Right now everything seems fine again; this occurrence lasted 35 min.
% grep zabbix-proxy.dmz-prg2.suse.org /var/log/zabbix_agentd.log
1641:20231022:033032.672 active check configuration update from [zabbix-proxy.dmz-prg2.suse.org:10051] started to fail (cannot resolve [zabbix-proxy.dmz-prg2.suse.org])
1641:20231022:033132.688 active check configuration update from [zabbix-proxy.dmz-prg2.suse.org:10051] is working again
1641:20231024:200405.516 active check data upload to [zabbix-proxy.dmz-prg2.suse.org:10051] started to fail ([connect] cannot resolve [zabbix-proxy.dmz-prg2.suse.org])
1641:20231024:200546.006 active check configuration update from [zabbix-proxy.dmz-prg2.suse.org:10051] started to fail (cannot resolve [zabbix-proxy.dmz-prg2.suse.org])
1641:20231024:201159.094 active check data upload to [zabbix-proxy.dmz-prg2.suse.org:10051] is working again
1641:20231024:201246.432 active check configuration update from [zabbix-proxy.dmz-prg2.suse.org:10051] is working again
# The alert seems to be about this timeframe:
1641:20231025:102049.982 active check data upload to [zabbix-proxy.dmz-prg2.suse.org:10051] started to fail ([connect] cannot resolve [zabbix-proxy.dmz-prg2.suse.org])
1641:20231025:102052.173 active check configuration update from [zabbix-proxy.dmz-prg2.suse.org:10051] started to fail (cannot resolve [zabbix-proxy.dmz-prg2.suse.org])
1641:20231025:112646.694 active check data upload to [zabbix-proxy.dmz-prg2.suse.org:10051] is working again
1641:20231025:112654.706 active check configuration update from [zabbix-proxy.dmz-prg2.suse.org:10051] is working again
1641:20231025:133801.103 active check data upload to [zabbix-proxy.dmz-prg2.suse.org:10051] started to fail ([connect] cannot connect to [[zabbix-proxy.dmz-prg2.suse.org]:10051]: [4] Interrupted system call)
1641:20231025:133801.105 active check data upload to [zabbix-proxy.dmz-prg2.suse.org:10051] is working again
So apparently the agent intermittently cannot reach the Zabbix proxy; in most of the entries above the name zabbix-proxy.dmz-prg2.suse.org fails to resolve.
Not sure how we could debug or improve this...
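One way to make the pattern easier to see could be to turn the log into a list of outage windows. The following is only a rough sketch, assuming the line layout shown in the excerpt above (`PID:YYYYMMDD:HHMMSS.mmm message`); the function name `outage_windows` and the sample data are made up for illustration and are not part of any Zabbix tooling.

```python
import re
from datetime import datetime

# Assumed layout, taken from the excerpt: PID:YYYYMMDD:HHMMSS.mmm active check <kind> from|to [host:port] <state>
LINE_RE = re.compile(
    r"^\s*\d+:(\d{8}:\d{6}\.\d{3}) active check (.+?) "
    r"(?:from|to) \[[^\]]+\] (started to fail|is working again)"
)


def outage_windows(lines):
    """Pair each 'started to fail' with the next 'is working again' of the same check kind.

    Returns a list of (kind, start, end, duration) tuples in the order the outages ended.
    """
    open_since = {}  # check kind -> timestamp of the first unresolved failure
    windows = []
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # comments and unrelated log lines are skipped
        ts = datetime.strptime(m.group(1), "%Y%m%d:%H%M%S.%f")
        kind, state = m.group(2), m.group(3)
        if state == "started to fail":
            open_since.setdefault(kind, ts)
        elif kind in open_since:
            start = open_since.pop(kind)
            windows.append((kind, start, ts, ts - start))
    return windows


# Sample input copied from the grep output above
sample = [
    "1641:20231025:102049.982 active check data upload to [zabbix-proxy.dmz-prg2.suse.org:10051] started to fail ([connect] cannot resolve [zabbix-proxy.dmz-prg2.suse.org])",
    "1641:20231025:102052.173 active check configuration update from [zabbix-proxy.dmz-prg2.suse.org:10051] started to fail (cannot resolve [zabbix-proxy.dmz-prg2.suse.org])",
    "1641:20231025:112646.694 active check data upload to [zabbix-proxy.dmz-prg2.suse.org:10051] is working again",
    "1641:20231025:112654.706 active check configuration update from [zabbix-proxy.dmz-prg2.suse.org:10051] is working again",
]

for kind, start, end, duration in outage_windows(sample):
    print(f"{kind}: {start} -> {end} ({duration})")
```

Feeding it the whole of /var/log/zabbix_agentd.log instead of the sample would give one row per outage, which could then be compared against the Zabbix alert times and the DNS outage windows.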
Updated by jbaier_cz 10 months ago
- Related to action #138551: DNS outage of 2023-10-25, e.g. Cron <root@openqa-service> (date; fetch_openqa_bugs)> /tmp/fetch_openqa_bugs_osd.log Max retries exceeded with url size:S added
Updated by livdywan 10 months ago
- Subject changed from Zabbix agent on ariel.dmz-prg2.suse.org reported no data for 30m and there is nothing in the journal to Zabbix agent on ariel.dmz-prg2.suse.org reported no data for 30m and there is nothing in the journal size:S
- Status changed from New to In Progress
- Assignee set to livdywan
- Priority changed from Urgent to High
Maybe related to, or the same as, #138551. Also lowering priority as we're not seeing this right now. I'll try to confirm the root cause and monitor the situation going forward.
Updated by livdywan 10 months ago
- Related to action #138545: Munin - minion hook failed - opensuse.org :: openqa.opensuse.org size:S added