action #106933
closed
Use PSU capabilites to power cycle openqaworker-arm-[1-3] instead of infra tickets size:M
0%
Description
Observation
Today we installed a controllable PSU (qaps06nue.qa.suse.de) into the rack for openqaworker-arm-[1-3]. We should make use of it in our automatic recovery pipeline to power cycle the BMC if it is down.
Here is the mapping for each machine:
ARM1: Plug 1
ARM2: Plug 2+3
ARM3: Plug 4+5
ARM4: Plug 6
ARM6: Plug 7
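The mapping above could be kept as data next to the recovery script. The following is a minimal sketch only: the names `PLUGS` and `plugs_for` are hypothetical, and accepting both the short form (`ARM2`) and a hostname ending in `arm-2` is an assumption about how callers name the machines.

```python
import re

# Plug numbers on qaps06nue.qa.suse.de per ARM machine number,
# copied from the mapping in this ticket.
PLUGS = {1: (1,), 2: (2, 3), 3: (4, 5), 4: (6,), 6: (7,)}


def plugs_for(machine: str) -> tuple[int, ...]:
    """Return the plug numbers for names like 'ARM2' or 'openqaworker-arm-2'.

    Hypothetical helper: the accepted name formats are an assumption.
    """
    match = re.search(r"arm-?(\d+)$", machine, flags=re.IGNORECASE)
    if match is None or int(match.group(1)) not in PLUGS:
        raise ValueError(f"no plug mapping known for {machine!r}")
    return PLUGS[int(match.group(1))]
```

Returning a tuple even for single-plug machines lets callers loop over the result without special-casing the machines that occupy two plugs.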
Suggestions
Research how we can automate the power cycle on the PSU side. The PSUs have a web interface which can be scripted/scraped (no real API AFAIK) and several other access options, e.g. ssh, telnet, ftp.
Ask nsinger for the password if you want to browse through the web UI. Keep in mind to choose a security-conscious option (e.g. an encrypted channel).
Integrate this automation into our pipeline at https://gitlab.suse.de/openqa/grafana-webhook-actions/-/blob/master/ipmi-recover-worker. It should replace the create_ticket() function (https://gitlab.suse.de/openqa/grafana-webhook-actions/-/blob/master/ipmi-recover-worker#L26-28).
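As a rough illustration of what could take the place of create_ticket(), here is a sketch of the power-cycle step itself. It is not the actual pipeline code: `power_cycle` and the `switch` callable are hypothetical, `switch(plug, on)` stands in for whichever transport ends up being used (ssh, scripted web UI, ...), and the 10 second off time is an arbitrary assumption.

```python
import time
from typing import Callable, Iterable


def power_cycle(
    plugs: Iterable[int],
    switch: Callable[[int, bool], None],
    off_seconds: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Switch all given plugs off, wait, then switch them on again.

    `switch(plug, on)` is a placeholder for the real PSU access method.
    """
    plugs = list(plugs)
    # Cut every plug first: a machine fed by two plugs stays powered
    # as long as one of them is still on.
    for plug in plugs:
        switch(plug, False)
    sleep(off_seconds)
    for plug in plugs:
        switch(plug, True)
```

Turning all plugs off before turning any back on matters for the machines that occupy two plugs, since cycling the plugs one after the other would never actually remove power.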
Acceptance criteria
- AC1: Infra tickets are no longer created
- AC2: grafana-webhook-actions uses some API of the PSU to automate the power cycle
Updated by nicksinger almost 3 years ago
- Copied from action #102575: Prevent false-positive ticket reporting for openqaworker-arm-3 added
Updated by nicksinger almost 3 years ago
- Copied from deleted (action #102575: Prevent false-positive ticket reporting for openqaworker-arm-3)
Updated by nicksinger almost 3 years ago
- Related to action #102575: Prevent false-positive ticket reporting for openqaworker-arm-3 added
Updated by nicksinger almost 3 years ago
I think a "High" priority is reasonable because we currently flood infra with mails/tickets and they are already overloaded. I asked them to ignore these tickets for now as we have full access to the system ourselves.
Updated by okurz almost 3 years ago
- Status changed from New to In Progress
- Assignee set to okurz
I am trying with an expect script or something
Updated by nicksinger almost 3 years ago
- Assignee deleted (okurz)
- Target version deleted (Ready)
I just recently scripted closing the FTP port. You could reuse at least the login part:
import requests

# Host of the PSU web interface; assumed here to be the PSU named in this ticket
host = "qaps06nue.qa.suse.de"

s = requests.Session()
login_params = {
    "login_username": "admin",
    "login_password": "<INSERT_PASSWORD_HERE>",
    "submit": "Log On",
}
disable_ftp_params = {
    "ftpPort": "21",
    "submit": "Apply",
}
headers = {"Content-Type": "application/x-www-form-urlencoded"}

# Open the start page before logging in
print("Opening admin page…", end="")
req0 = s.get("http://" + host)
print("Done.")

# Log in by submitting the login form
print("Login…", end="")
req1 = s.post("http://" + host + "/Forms/login1", data=login_params, headers=headers)
print("Done.")

# Submit the FTP settings form
print("Disabling FTP…", end="")
req2 = s.post("http://" + host + "/Forms/ftpserv1", data=disable_ftp_params, headers=headers)
print("Done.")

# Log out again to end the admin session
print("Logout…", end="")
req3 = s.get("http://" + host + "/logout.htm")
print("Done.")
Updated by nicksinger almost 3 years ago
- Assignee set to okurz
- Target version set to Ready
Updated by okurz almost 3 years ago
- Due date set to 2022-03-02
- Status changed from In Progress to Feedback
Thanks. I already have it covered with https://github.com/okurz/scripts/blob/master/control-switched-rack-pdu.exp :) What I have not figured out yet is how to properly control one or two sockets depending on which machine we want to control. Maybe something to hack together :)
The integration into our webhook triggered recovery scripts is included with https://gitlab.suse.de/openqa/grafana-webhook-actions/-/merge_requests/19
I already added PDU_HOSTNAME and PDU_PASSWORD as variables in gitlab CI
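GitLab CI variables are exposed to jobs as environment variables, so a script in the pipeline could pick them up roughly like this. This is only a sketch: the helper name `pdu_credentials` is hypothetical and the actual recovery script may read the variables differently.

```python
import os
from typing import Mapping


def pdu_credentials(env: Mapping[str, str] = os.environ) -> tuple[str, str]:
    """Return (hostname, password) from PDU_HOSTNAME and PDU_PASSWORD.

    Hypothetical helper; fails early when a variable is unset or empty.
    """
    missing = [name for name in ("PDU_HOSTNAME", "PDU_PASSWORD") if not env.get(name)]
    if missing:
        raise RuntimeError("missing CI variables: " + ", ".join(missing))
    return env["PDU_HOSTNAME"], env["PDU_PASSWORD"]
```

Failing early with the names of the missing variables keeps a misconfigured pipeline from attempting a login with empty credentials.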
Updated by livdywan almost 3 years ago
- Subject changed from Use PSU capabilites to power cycle openqaworker-arm-[1-3] instead of infra tickets to Use PSU capabilites to power cycle openqaworker-arm-[1-3] instead of infra tickets size:M
Updated by okurz almost 3 years ago
- Due date deleted (2022-03-02)
- Status changed from Feedback to Resolved
We have seen multiple successful runs of the pipeline since the merge in https://gitlab.suse.de/openqa/grafana-webhook-actions/-/pipelines . As expected, there have likely been no PSU-based recoveries yet, as those happen more rarely.