action #92176

[alert] openqaworker-arm-3 offline and CI pipeline unable to send email but stating "passed"

Added by okurz 5 months ago. Updated 5 months ago.

Status: Resolved
Priority: Normal
Assignee:
Target version:
Start date: 2021-05-05
Due date: 2021-05-21
% Done: 0%
Estimated time:
Description

Observation

https://stats.openqa-monitor.qa.suse.de/d/1bNU0StZz/automatic-actions?orgId=1&editPanel=7&tab=alert shows that the machine openqaworker-arm-3 is offline and https://gitlab.suse.de/openqa/grafana-webhook-actions/-/jobs/415264 is green but shows:

Attempting to reboot openqaworker-arm-3
Error: Unable to establish IPMI v2 / RMCP+ session
/usr/sbin/sendmail: No such file or directory
. . . message not sent.

So there are two problems: the email could not be sent, and that failure did not fail the pipeline.

Acceptance criteria

  • AC1: The pipeline fails if email sending does not work
  • AC2: Email sending works again for the case observed above
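
A minimal sketch of how AC1 could be approached in a shell-based pipeline step: check the exit status of the mail command explicitly and return a non-zero status when it fails. The function name `notify` and the variable `MAIL_CMD` are hypothetical and not taken from the actual pipeline. Whether a given mail client reports a failed delivery through its exit status depends on the client and is not verified here.

```shell
#!/bin/bash
# Hypothetical helper, not the actual pipeline code.
# MAIL_CMD is an assumed variable naming the mail client to call (default: mail).
notify() {
    local subject=$1 body=$2 recipient=$3
    # The status of the pipeline is the status of the mail command (last element).
    if ! printf '%s\n' "$body" | ${MAIL_CMD:-mail} -s "$subject" "$recipient"; then
        echo "ERROR: could not send notification to $recipient" >&2
        return 1
    fi
}
```

Called as `notify "subject" "body" "address" || exit 1`, a failing mail command ends the job script with a non-zero status, which GitLab CI reports as a failed job.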

Related issues

Related to openQA Infrastructure - action #89815: osd-deployment blocked by openqaworker-arm-3 offline and not recovered automatically (Resolved, start date 2021-03-10, due date 2021-04-22)

Related to openQA Infrastructure - action #76876: Find a better (automated) way to inform infra about hanging (arm) workers (Resolved, start date 2020-11-02)

History

#1 Updated by okurz 5 months ago

  • Related to action #89815: osd-deployment blocked by openqaworker-arm-3 offline and not recovered automatically added

#2 Updated by mkittler 5 months ago

  • Assignee set to mkittler

#3 Updated by mkittler 5 months ago

  • Status changed from Workable to In Progress

I've now rebooted the machine manually and it came up normally. Nothing special was required (… chassis power cycle did the trick).

I will look into the problem with the automatic recovery.

#4 Updated by mkittler 5 months ago

It looks like the sendmail binary (which would be provided by the postfix package) is configured as the mail application, but it is missing in the container the jobs run in (registry.opensuse.org/home/okurz/container/containers/tumbleweed:ipmitool-ping-nc-mailx). Apparently the intention is to use mailx, so it should likely be configured explicitly. Maybe the following change to the Dockerfile helps: https://build.opensuse.org/package/view_file/home:mkittler:branches:home:okurz:container/ipmitool-ping-nc-mailx/Dockerfile?expand=1

#5 Updated by okurz 5 months ago

Hm, ok. But previously sending emails was working. Maybe something changed in the package setup. Within osd-deployment we also send emails. AFAIK we use "mutt" for sending emails in these cases, so I suggest using the same here.

#6 Updated by openqa_review 5 months ago

  • Due date set to 2021-05-21

Setting due date based on mean cycle time of SUSE QE Tools

#7 Updated by mkittler 5 months ago

I've tested your container locally (not my version) and it can resolve the mail command. It links to /etc/alternatives/mail which links to /usr/bin/mailx. Maybe this was just a temporary issue which has already been fixed in the current container version? (The image is based on Tumbleweed and Docker says the latest version is only 4 hours old so apparently it is automatically updated.)
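
The check described above can be sketched as a small shell function that follows the symlink chain of a command to the final binary. The function name `resolve_cmd` is hypothetical, and `readlink -f` is assumed to be the GNU coreutils variant.

```shell
#!/bin/bash
# Hypothetical helper: print the final target of a command found in PATH,
# following symlinks such as /etc/alternatives/mail.
resolve_cmd() {
    local path
    path=$(command -v "$1") || { echo "$1: not found in PATH" >&2; return 1; }
    readlink -f "$path"
}
# Example call inside the container: resolve_cmd mail
```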

Where does mutt come into play? Your Dockerfile explicitly installs mailx (and not mutt). The osd-deployment pipeline also uses just the mail command but with a different image (which doesn't seem to install a special mail client).

#8 Updated by okurz 5 months ago

mkittler wrote:

I've tested your container locally (not my version) and it can resolve the mail command. It links to /etc/alternatives/mail which links to /usr/bin/mailx. Maybe this was just a temporary issue which has already been fixed in the current container version? (The image is based on Tumbleweed and Docker says the latest version is only 4 hours old so apparently it is automatically updated.)

Where does mutt come into play? Your Dockerfile explicitly installs mailx (and not mutt). The osd-deployment pipeline also uses just the mail command but with a different image (which doesn't seem to install a special mail client).

Right. I apparently got that confused.

Ok, even assuming that the issue might be fixed again upstream, we should still not ignore errors when a mail could not be sent, right?

#9 Updated by okurz 5 months ago

  • Related to action #76876: Find a better (automated) way to inform infra about hanging (arm) workers added

#10 Updated by okurz 5 months ago

Originally we assumed that email sending worked, as that should have been handled in #76876, but we never verified it. Together with mkittler I tried to get email sending to work without resorting to what we do in e.g. osd-deployment, where we log in over ssh to osd, which is already capable of sending emails itself. This is what we found to be arguably the simplest way to send emails:

zypper -n in msmtp
echo -e "Subject: email from msmtp\n\ntest" | SMTPSERVER=relay.suse.de msmtp --from okurz@suse.de -t okurz@suse.de

Of course, what should still be changed is to use a container image that already provides msmtp and to use variables with defaults instead of hardcoded values.
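
A rough sketch of what "variables with defaults" could look like, assuming msmtp is already installed in the image. The function name `send_mail` and the variables `MAIL_FROM`, `MAIL_TO` and `MSMTP` are hypothetical; the default values are simply the ones from the command above.

```shell
#!/bin/bash
# Hypothetical wrapper around the msmtp call shown above.
# MAIL_FROM, MAIL_TO and MSMTP are assumed names; SMTPSERVER is used as in
# the original command. All can be overridden from the CI configuration.
send_mail() {
    local subject=$1 body=$2
    local from=${MAIL_FROM:-okurz@suse.de}
    local to=${MAIL_TO:-okurz@suse.de}
    # The function returns the exit status of msmtp, so callers can fail on it.
    printf 'Subject: %s\n\n%s\n' "$subject" "$body" |
        SMTPSERVER=${SMTPSERVER:-relay.suse.de} ${MSMTP:-msmtp} --from "$from" -t "$to"
}
```

Because the function returns the status of the msmtp call, `send_mail "subject" "body" || exit 1` lets the pipeline fail when sending does not work.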

#12 Updated by mkittler 5 months ago

  • Status changed from In Progress to Resolved

The SR has been merged and I've been testing whether sending mails works using the msmtp command within the container locally.
