action #163928
[alert] Openqa HTTP Response lost on 15-07-24 size:S
Description
Observation
I took a look at the logs, which I attached to the ticket.
I can't spot the actual problem. The system seems to have performed an update and recovered after the services were restarted.
The unresponsiveness lasted from 00:42 to 01:05 (>20 min).
Looking at the logs, I see some entries from telegraf:
openqa telegraf[6820]: 2024-07-14T22:54:50Z E! [inputs.http] Error in plugin: [url=https://openqa.suse.de/admin/*]: Get "https://openqa.suse.de/admin/*": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
and many entries like:
Jul 15 00:54:29 openqa openqa[12024]: [debug] [pid:12024] _carry_over_candidate(14928963): ignoring job 14855612 with repeated problem
Jul 15 00:54:29 openqa openqa[12024]: [debug] [pid:12024] _carry_over_candidate(14928963): checking take over from 14834954: _failure_reason=GOOD
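The telegraf entry carries a UTC timestamp (the `Z` suffix), while the journal lines show a wall-clock time without year or zone. A minimal sketch of how the two could be put on one timeline and checked against the 00:42 to 01:05 window follows. It assumes the journal timestamps are local time at UTC+2 and that the year is 2024; neither is stated in the log lines themselves. The helper names `parse_timestamp` and `in_window` are hypothetical.

```python
import re
from datetime import datetime, timedelta, timezone

# Assumption: journal timestamps are local time at UTC+2 and from 2024.
LOCAL_TZ = timezone(timedelta(hours=2))
YEAR = 2024

# Outage window reported in the ticket, interpreted as local time.
WINDOW_START = datetime(2024, 7, 15, 0, 42, tzinfo=LOCAL_TZ)
WINDOW_END = datetime(2024, 7, 15, 1, 5, tzinfo=LOCAL_TZ)

# ISO timestamp with "Z" suffix, as embedded in the telegraf message.
TELEGRAF_RE = re.compile(r"(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})Z")
# Syslog-style prefix such as "Jul 15 00:54:29 ".
JOURNAL_RE = re.compile(r"^(\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) ")


def parse_timestamp(line):
    """Return a timezone-aware datetime for a log line, or None."""
    match = TELEGRAF_RE.search(line)
    if match:
        naive = datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%S")
        return naive.replace(tzinfo=timezone.utc)
    match = JOURNAL_RE.match(line)
    if match:
        naive = datetime.strptime(f"{YEAR} {match.group(1)}", "%Y %b %d %H:%M:%S")
        return naive.replace(tzinfo=LOCAL_TZ)
    return None


def in_window(line):
    """True if the line's timestamp falls inside the outage window."""
    ts = parse_timestamp(line)
    return ts is not None and WINDOW_START <= ts <= WINDOW_END


TELEGRAF_LINE = (
    'openqa telegraf[6820]: 2024-07-14T22:54:50Z E! [inputs.http] Error in plugin: '
    '[url=https://openqa.suse.de/admin/*]: Get "https://openqa.suse.de/admin/*": '
    'context deadline exceeded (Client.Timeout exceeded while awaiting headers)'
)
JOURNAL_LINE = (
    "Jul 15 00:54:29 openqa openqa[12024]: [debug] [pid:12024] "
    "_carry_over_candidate(14928963): ignoring job 14855612 with repeated problem"
)

for sample in (TELEGRAF_LINE, JOURNAL_LINE):
    print(parse_timestamp(sample).astimezone(timezone.utc).isoformat(), in_window(sample))
```

Under these assumptions both excerpts fall inside the reported window, which only shows that they coincide in time, not that they are related to the cause.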
Updated by mkittler 5 months ago
- Status changed from New to Rejected
- Assignee set to mkittler
I guess it would have made more sense to add this information to #163592 instead of creating a new ticket. But thanks for your research, I added the information to #163592#note-34. I think we can close this ticket as duplicate, though.
Updated by mkittler 5 months ago
- Is duplicate of action #163592: [alert] (HTTP Response alert Salt tm0h5mf4k) size:M added
Updated by ybonatakis 5 months ago · Edited
okurz wrote in #note-3:
I have some questions about this ticket which I would like to get answered with you but also with the help from others from the team:
- why did you report this issue when we already have #163592 I followed the recommendation from Liv on Slack, in which you participated as well
- why was the alert not disabled for #163592 I feel I can't answer that
- "system seems to perform an update": Where did you see that? I think the logs show that many workers were updated. I think there is also a cron job which runs every night, no?
- why did you mention the _carry_over_candidate occurrences? I don't know what _carry_over_candidate does exactly. It just appears a lot near the time frame
Corrected quoting by okurz:
okurz wrote in #note-3:
I have some questions about this ticket which I would like to get answered with you but also with the help from others from the team:
- why did you report this issue when we already have #163592
I followed the recommendation from Liv on Slack, in which you participated as well
- why was the alert not disabled for #163592
I feel I can't answer that
- "system seems to perform an update": Where did you see that?
I think the logs show that many workers were updated. I think there is also a cron job which runs every night, no?
- why did you mention the _carry_over_candidate occurrences?
I don't know what _carry_over_candidate does exactly. It just appears a lot near the time frame
Updated by okurz 5 months ago
ybonatakis, your quoting is broken, but I can answer. In the Markdown quoting, make sure to separate each of your lines from the quoted lines with a blank line in between.
- why did you report this issue when we already have #163592 I followed the recommendation from Liv on Slack, in which you participated as well
You mean https://suse.slack.com/archives/C02AJ1E568M/p1721029947196429 where you asked about the telegraf error and _carry_over_candidate. You did not mention any useful context and did not mention "HTTP response" or https://stats.openqa-monitor.qa.suse.de/d/WebuiDb/webui-summary?orgId=1&viewPanel=78&from=1720996087861&to=1720999161474
If you had mentioned that, we would certainly have suggested either not reporting a separate issue or creating the new ticket with the relation. Were you aware of the existence of #163592?
- why was the alert not disabled for #163592 I feel I can't answer that
understood. The question goes more to others.
- "system seems to perform an update": Where did you see that? I think the logs show that many workers were updated. I think there is also a cron job which runs every night, no?
There are many cron jobs running at multiple times of the day.
- why did you mention the _carry_over_candidate occurrences? I don't know what _carry_over_candidate does exactly. It just appears a lot near the time frame
Yes, but it's not related. That function belongs to the comment carry-over feature, where a comment is carried over to a subsequent failed openQA test.
Updated by okurz 5 months ago
understood. The question goes more to others.
we will handle that part explicitly today in the afternoon.
@ybonatakis can you confirm you got responses?
Updated by livdywan 5 months ago
- Related to action #163775: Conduct "lessons learned" with Five Why analysis about many alerts, e.g. alerts not silenced for known issues size:S added