action #106771
closed
imagetester missing in action
Added by favogt almost 3 years ago.
Updated almost 3 years ago.
Description
Observation
imagetester is unreachable over the network.
According to ariel, the last sign of life was
Feb 08 06:44:06 ariel dnsmasq-dhcp[2018]: DHCPACK(eth1) 192.168.112.5 00:25:90:a2:ba:92 imagetester
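For anyone repeating this check, here is a rough sketch of how the last DHCPACK for a host could be pulled out of dnsmasq log lines like the one above. The regular expression is only derived from that single quoted line, and the function name `last_dhcpack` is made up for illustration; other dnsmasq or syslog formats may need adjustments.

```python
import re

# Pattern derived from the one log line quoted in this ticket (an assumption,
# not a complete description of dnsmasq's log format).
DHCPACK_RE = re.compile(
    r"^(?P<timestamp>\w{3}\s+\d+ \d{2}:\d{2}:\d{2}) \S+ dnsmasq-dhcp\[\d+\]: "
    r"DHCPACK\((?P<iface>[^)]+)\) (?P<ip>\S+) (?P<mac>\S+)(?: (?P<host>\S+))?$"
)


def last_dhcpack(lines, hostname):
    """Return the fields of the last DHCPACK line for `hostname`, or None."""
    last = None
    for line in lines:
        match = DHCPACK_RE.match(line.strip())
        if match and match.group("host") == hostname:
            # Later lines overwrite earlier ones, so the final value is the
            # most recent entry in the order the lines were given.
            last = match.groupdict()
    return last
```

The lines could come from a saved log file or from the journal output on ariel; how they are collected is left open here.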
Rollback steps
- Priority changed from Normal to High
- Target version set to Ready
especially important for me is how we missed the machine being down. Well, ok, I know the answer, because we don't have alerting for that :) I guess we could try very simple monitoring with a cron job on ariel pinging machines.
I couldn't revive it via ipmi:
ipmitool -I lanplus -C 3 -H 10.160.65.195 -U … -P … power cycle
Error: Unable to establish IPMI v2 / RMCP+ session
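The "very simple monitoring with a cron job on ariel pinging machines" idea could look roughly like the sketch below. It is only an illustration, not what was actually deployed: the host list, the function names and the use of the Linux `ping` flags `-c` (count) and `-W` (reply timeout in seconds) are assumptions.

```python
import subprocess
import sys

# Hypothetical host list; only "imagetester" is taken from this ticket.
HOSTS = ["imagetester"]


def ping(host, count=1, timeout_s=5):
    """Return True if `host` answers an ICMP echo request (Linux iputils ping)."""
    result = subprocess.run(
        ["ping", "-c", str(count), "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def unreachable_hosts(hosts, check=ping):
    """Return the hosts for which `check` reports failure, in input order."""
    return [host for host in hosts if not check(host)]


def main(hosts=HOSTS, check=ping):
    """Report unreachable hosts on stderr; return 1 if any host is down."""
    down = unreachable_hosts(hosts, check)
    for host in down:
        # cron mails any output to the configured MAILTO address, if set.
        print(f"{host} is unreachable", file=sys.stderr)
    return 1 if down else 0
```

To use it from cron, the script would end with `sys.exit(main())` and be scheduled at whatever interval seems reasonable. The `check` parameter exists only so the logic can be tried out without touching the network.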
- Status changed from New to In Progress
- Assignee set to mkittler
- Status changed from In Progress to Feedback
- Status changed from Feedback to Blocked
As we can reach the EngInfra ticket let's treat this one here as blocked, ok?
- Description updated (diff)
okurz wrote:
especially important for me is how we missed the machine being down. Well, ok, I know the answer, because we don't have alerting for that :) I guess we could try very simple monitoring with a cron job on ariel pinging machines.
That has since been addressed with #106751, although it means that we need to pause the scheduled CI jobs and unpause them again as soon as imagetester is back. I added a section "Rollback steps" to the ticket accordingly.
- Related to action #106751: Update machines and passwords in the monitor-o3 repository added
- Status changed from Blocked to Resolved
SD ticket resolved. The machine is up again. I re-enabled the pipeline and triggered a run, which is green. imagetester is working on openQA jobs again. Should be ok now.