action #158907
closed
openQA Project - coordination #155485: [saga][epic] Efficient openQA worker pool resource handling in datacenters
coordination #158374: [epic] Prevention of inefficient hardware resource use
Automated check for machines marked as "unused" in racktables but still pingable (as they should not be powered on at all) size:M
Description
Acceptance criteria
- AC1: A regular automated check of all "unused" machines for pingability is conducted
Suggestions
- See what was done in #158383-7
- Request a non-personal IDP account with mailing list usable for CI, e.g. "qe-admins-bot"? Then use a mailing list like "osd-admins+bot"
- Ensure that the bot has access to the data we need
- Find an upstream place
- Make that a periodic check, e.g. scheduled in gitlab CI pipeline schedule every day, e.g. in https://gitlab.suse.de/openqa/scripts-ci/
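A minimal sketch of what such a periodic check could look like, not the script that was eventually merged. It assumes the list of "unused" hostnames has already been fetched from racktables (the fetching step is left out), that the CI runner has a Linux iputils `ping` available, and that a non-zero exit code is enough to make the scheduled job fail. The names `is_pingable`, `find_powered_on` and `check` are hypothetical.

```python
import subprocess
from typing import Callable, Iterable, List


def is_pingable(host: str, timeout_s: int = 2) -> bool:
    """Return True if one ICMP echo request to `host` is answered.

    Assumption: Linux iputils ping, where -c is the packet count and
    -W is the reply timeout in seconds.
    """
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def find_powered_on(
    hosts: Iterable[str], probe: Callable[[str], bool] = is_pingable
) -> List[str]:
    """Return the hosts that answer the probe, keeping the input order."""
    return [host for host in hosts if probe(host)]


def check(hosts: Iterable[str], probe: Callable[[str], bool] = is_pingable) -> int:
    """Print every "unused" host that is still reachable.

    Returns 1 if at least one such host was found, otherwise 0, so the
    value can be used directly as the exit code of a CI job.
    """
    offenders = find_powered_on(hosts, probe)
    for host in offenders:
        print(f'{host} is marked as "unused" but still pingable')
    return 1 if offenders else 0
```

In a CI job this could be called as `sys.exit(check(hostnames))`, with `hostnames` read from whatever the racktables query produces. The `probe` parameter only exists so that the logic can be exercised without sending real pings.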
Updated by okurz 5 months ago
- Copied from action #158383: Crosscheck which machines marked as "unused" in racktables are still pingable (as they should not be powered on at all) size:M added
Updated by nicksinger 5 months ago · Edited
- Status changed from Workable to Feedback
Waiting on the account creation e-mail
Updated by nicksinger 5 months ago
- Status changed from Feedback to Workable
The account mail was apparently stuck in the spam filter for some time but eventually made it through. With the provided credentials, now stored in our password repository, I was able to log into racktables. Next I will try to implement the check in a CI run.
Updated by nicksinger 5 months ago
- Status changed from Workable to In Progress
I already added the script itself to our scripts-repo with https://github.com/os-autoinst/scripts/pull/314
I also added a new schedule to our scripts-ci project on gitlab and created https://gitlab.suse.de/openqa/scripts-ci/-/merge_requests/5 to fetch additional files. A first test in a personal project shows that we now miss some dependencies in the CI run: https://gitlab.suse.de/nicksinger/scripts-ci/-/jobs/2501549
A PR to add these is https://github.com/os-autoinst/scripts/pull/315
Updated by nicksinger 5 months ago
- Status changed from In Progress to Blocked
All MRs and PRs are merged. @okurz and I created a new CA container together after the infra daily to supply the needed internal certificate. A current run can be seen here: https://gitlab.suse.de/nicksinger/scripts-ci/-/jobs/2502360#L96
Waiting for a firewall change now so that our job can access racktables.suse.de: https://sd.suse.com/servicedesk/customer/portal/1/SD-154389
Updated by nicksinger 5 months ago
- Status changed from Blocked to Workable
SD ticket resolved and validated with https://gitlab.suse.de/nicksinger/scripts-ci/-/jobs/2513431#L116 - the script now shows an error while parsing racktables HTTP, which I have to check next
Updated by nicksinger 5 months ago
- Status changed from In Progress to Feedback
It turns out that our password needs escaping, which I have figured out by now. I also created some more MRs/PRs to handle various smaller issues I encountered while debugging this:
Updated by nicksinger 5 months ago
- Status changed from Feedback to Resolved
After several additional adjustments I was finally able to enable a weekly schedule (at 04:00 on Monday) in our scripts-ci pipeline: https://gitlab.suse.de/openqa/scripts-ci/-/jobs/2535545
Feel free to find an "unused" machine, power it on and see if the check fails after the weekend.