action #163928
closed
[alert] Openqa HTTP Response lost on 15-07-24 size:S
Description
Observation
I took a look at the logs, which I attached to the ticket.
I can't spot the actual problem. The system seems to have performed an update, and it recovered after the services were restarted.
The unresponsiveness lasted from 00:42 to 01:05 (>20 min).
Looking at the logs, I see some entries from telegraf:
openqa telegraf[6820]: 2024-07-14T22:54:50Z E! [inputs.http] Error in plugin: [url=https://openqa.suse.de/admin/*]: Get "https://openqa.suse.de/admin/*": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
and many entries like:
Jul 15 00:54:29 openqa openqa[12024]: [debug] [pid:12024] _carry_over_candidate(14928963): ignoring job 14855612 with repeated problem
Jul 15 00:54:29 openqa openqa[12024]: [debug] [pid:12024] _carry_over_candidate(14928963): checking take over from 14834954: _failure_reason=GOOD
Updated by mkittler 11 months ago
- Status changed from New to Rejected
- Assignee set to mkittler
I guess it would have made more sense to add this information to #163592 instead of creating a new ticket. But thanks for your research, I added the information to #163592#note-34. I think we can close this ticket as duplicate, though.
Updated by mkittler 11 months ago
- Is duplicate of action #163592: [alert] (HTTP Response alert Salt tm0h5mf4k) size:M added
Updated by ybonatakis 11 months ago · Edited
Corrected quoting by okurz:
okurz wrote in #note-3:
I have some questions about this ticket which I would like to get answered with you but also with the help from others from the team:
- why did you report this issue when we already have #163592
I followed the recommendation from Liv on Slack, where you participated as well
- why was the alert not disabled for #163592
I feel I can't answer that
- "system seems to perform an update": Where did you see that?
I think the logs show that many workers were updated. I also think there is a cron job which runs every night, no?
- why did you mention the _carry_over_candidate occurrences?
I don't know what _carry_over_candidate does exactly. It just appears a lot near the time frame
Updated by okurz 11 months ago
ybonatakis, your quoting is broken, but I can answer. In the Markdown quoting, make sure to separate each of your lines from the quoted lines with a blank line in between.
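A minimal sketch of that advice, assuming standard Markdown blockquote syntax (`>` at the start of each quoted line); the answers are abbreviated from this ticket and the exact rendering depends on the tracker's Markdown formatter:

```markdown
okurz wrote in #note-3:

> why did you report this issue when we already have #163592

I followed the recommendation from Liv on Slack.

> why was the alert not disabled for #163592

I feel I can't answer that.
```

Without the blank lines, Markdown may treat an answer as a continuation of the quoted paragraph above it, which is how the questions and answers ended up merged.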
- why did you report this issue when we already have #163592
I followed the recommendation from Liv on Slack, where you participated as well
You mean https://suse.slack.com/archives/C02AJ1E568M/p1721029947196429 where you asked about the telegraf error and _carry_over_candidate. You did not mention any useful context and did not mention "HTTP response" or https://stats.openqa-monitor.qa.suse.de/d/WebuiDb/webui-summary?orgId=1&viewPanel=78&from=1720996087861&to=1720999161474
If you had mentioned that, we would certainly have suggested not reporting a separate issue, or creating a new ticket with the relation. Were you aware of the existence of #163592?
- why was the alert not disabled for #163592
I feel I can't answer that
understood. The question goes more to others.
- "system seems to perform an update": Where did you see that?
I think the logs show that many workers were updated. I also think there is a cron job which runs every night, no?
There are many cron jobs running at multiple times of the day.
- why did you mention the _carry_over_candidate occurrences?
I don't know what _carry_over_candidate does exactly. It just appears a lot near the time frame
Yes, but it's not related. That function belongs to the comment carry-over feature, where a comment is carried over to a subsequent failed openQA test.
Updated by okurz 10 months ago
understood. The question goes more to others.
we will handle that part explicitly today in the afternoon.
@ybonatakis can you confirm you got responses?
Updated by livdywan 10 months ago
- Related to action #163775: Conduct "lessons learned" with Five Why analysis about many alerts, e.g. alerts not silenced for known issues size:S added