action #106771
imagetester missing in action
Description
Observation
imagetester is unreachable over the network.
According to ariel, the last sign of life was:
Feb 08 06:44:06 ariel dnsmasq-dhcp[2018]: DHCPACK(eth1) 192.168.112.5 00:25:90:a2:ba:92 imagetester
Rollback steps
- Re-enable the monitoring CI pipeline schedule at https://gitlab.suse.de/openqa/monitor-o3/-/pipeline_schedules/58/edit
Updated by okurz almost 3 years ago
- Priority changed from Normal to High
- Target version set to Ready
What matters most to me is how we missed the machine being down. Well, ok, I know the answer: we don't have alerting for that :) I guess we could try very simple monitoring with a cron job on ariel that pings the machines.
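A minimal sketch of what such a cron-driven ping check could look like. This is not the solution that was eventually adopted (see the later comments about the monitor-o3 pipeline). The function name `check_hosts`, the `PING_CMD` variable and the host list passed on the command line are assumptions made for illustration, and the `ping` options assume Linux iputils.

```shell
#!/bin/sh
# Hypothetical reachability check, intended to be called from cron on a host
# such as ariel. PING_CMD is an assumption of this sketch; it defaults to a
# single ping with a 2 second timeout (Linux iputils options).
PING_CMD="${PING_CMD:-ping -c 1 -W 2}"

# Print one line per unreachable host; return 1 if any host did not answer.
check_hosts() {
    failed=0
    for host in "$@"; do
        # $PING_CMD is intentionally unquoted so its options are split into words.
        if ! $PING_CMD "$host" >/dev/null 2>&1; then
            echo "unreachable: $host"
            failed=1
        fi
    done
    return "$failed"
}

# Check the hosts given as arguments, e.g. "imagetester".
check_hosts "$@"
```

If this were saved under a hypothetical path such as /usr/local/bin/check-hosts.sh, a crontab entry like `*/5 * * * * /usr/local/bin/check-hosts.sh imagetester` would run it every five minutes. Because the script only prints when a host is down, cron would send mail only on failures, provided `MAILTO` is configured.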
Updated by mkittler almost 3 years ago
I couldn't revive it via IPMI:
ipmitool -I lanplus -C 3 -H 10.160.65.195 -U … -P … power cycle
Error: Unable to establish IPMI v2 / RMCP+ session
Updated by mkittler almost 3 years ago
- Status changed from New to In Progress
- Assignee set to mkittler
Updated by mkittler almost 3 years ago
- Status changed from In Progress to Feedback
I've created an Infra ticket: https://sd.suse.com/servicedesk/customer/portal/1/SD-76988
Updated by okurz almost 3 years ago
- Status changed from Feedback to Blocked
As we can reach the EngInfra ticket, let's treat this one as blocked, ok?
Updated by okurz almost 3 years ago
- Description updated (diff)
okurz wrote:
What matters most to me is how we missed the machine being down. Well, ok, I know the answer: we don't have alerting for that :) I guess we could try very simple monitoring with a cron job on ariel that pings the machines.
That has meanwhile been addressed with #106751, although it means we need to pause the scheduled CI jobs and unpause them again as soon as imagetester is back. I added a "Rollback steps" section to the ticket accordingly.
Updated by okurz almost 3 years ago
- Related to action #106751: Update machines and passwords in the monitor-o3 repository added
Updated by okurz almost 3 years ago
- Status changed from Blocked to Resolved
SD ticket resolved. The machine is up again. I re-enabled the pipeline and triggered a run, which is green. imagetester is working on openQA jobs again. Should be ok now.
Updated by okurz over 1 year ago
- Related to action #135137: Bring back imagetester size:M added