coordination #120660
open
[epic] Clean up old kubernetes jobs automatically
Added by ilausuch over 1 year ago.
Updated 9 months ago.
Estimated time: (Total: 0.00 h)
Description
Motivation
We had multiple openQA tests fail in run_container_in_k8s_GCE because of many pending jobs in our Kubernetes cluster. This indicates that part of the cleanup process does not work as intended.
Sometimes the tests do not properly clean up the jobs in the k8s clusters. This leads to a lack of resources and to failing tests.
Acceptance criteria
- AC 1: Check the cleanup process
- AC 2: If this is a situation we cannot prevent, create a method that cleans up old jobs (e.g. older than 1 day) in each run
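A minimal sketch of the age-based selection AC 2 asks for. It assumes the job list is the JSON printed by kubectl get jobs -o json (items carrying metadata.name and an RFC 3339 metadata.creationTimestamp) and uses the 1-day threshold from AC 2. The function names are hypothetical, and the sketch only selects names; the actual deletion is left to the caller.

```python
import json
from datetime import datetime, timedelta, timezone
from typing import List

# Threshold taken from AC 2 ("e.g. >1day"); an assumption, not a fixed rule.
MAX_AGE = timedelta(days=1)


def parse_k8s_timestamp(value: str) -> datetime:
    """Parse an RFC 3339 timestamp such as '2022-11-16T10:00:00Z'."""
    # datetime.fromisoformat() only accepts a trailing 'Z' from Python 3.11 on,
    # so replace it with an explicit UTC offset first.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def select_old_jobs(kubectl_json: str, now: datetime,
                    max_age: timedelta = MAX_AGE) -> List[str]:
    """Return names of jobs created more than max_age before now.

    kubectl_json is assumed to be the output of `kubectl get jobs -o json`.
    """
    items = json.loads(kubectl_json).get("items", [])
    old = []
    for item in items:
        meta = item.get("metadata", {})
        name = meta.get("name")
        created = meta.get("creationTimestamp")
        if not name or not created:
            continue  # skip malformed entries instead of failing the whole run
        if now - parse_k8s_timestamp(created) > max_age:
            old.append(name)
    return old


if __name__ == "__main__":
    sample = json.dumps({"items": [
        {"metadata": {"name": "old-job", "creationTimestamp": "2022-11-14T08:00:00Z"}},
        {"metadata": {"name": "fresh-job", "creationTimestamp": "2022-11-16T09:00:00Z"}},
    ]})
    now = datetime(2022, 11, 16, 10, 0, tzinfo=timezone.utc)
    print(select_old_jobs(sample, now))  # ['old-job']
```

Keeping the selection separate from the deletion makes the age rule easy to check without a cluster; each returned name could then be passed to kubectl delete job.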
- Related to action #120624: test fails in run_container_in_k8s_GCE added
As an interim workaround for issues like this, one can log in to the Google Cloud web UI and delete old and dangling jobs in its Kubernetes view.
- Subject changed from Clean up old jobs automatically to Clean up old kubernetes jobs automatically
- Description updated (diff)
- Assignee deleted (ilausuch)
- Status changed from Workable to In Progress
- Assignee set to ilausuch
- Assignee deleted (ilausuch)
AC 1 is complete.
From the test code's perspective, there is no reason for the cleanup to fail:
- The test sets and stores the job name in $self before launching the job; the job is only created afterwards. Therefore the job name is available to the cleanup if anything goes wrong from that point on
- Both post_run and post_fail also call the cleanup
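The ordering argument above can be sketched as follows. This is a hypothetical Python model, not the actual test: the real test is an openQA Perl module that keeps the name in $self, whereas here an instance attribute plays that role, and the injected run callable stands in for whatever executes shell commands and returns an exit code. The job name format is made up for illustration.

```python
import uuid
from typing import Callable, Optional


class K8sJobTest:
    """Hypothetical model of the test flow: record the name, then create the job."""

    def __init__(self, run: Callable[[str], int]):
        self.run = run  # executes a shell command, returns its exit code
        self.job_name: Optional[str] = None

    def launch(self, image: str) -> None:
        # Store the name *before* creating the job, so the cleanup can
        # find it even if creation or any later step fails.
        self.job_name = f"test-job-{uuid.uuid4().hex[:8]}"
        rc = self.run(f"kubectl create job {self.job_name} --image={image}")
        if rc != 0:
            raise RuntimeError(f"job creation failed with exit code {rc}")

    def cleanup(self) -> None:
        if self.job_name:  # nothing to delete if launch() was never reached
            self.run(f"kubectl delete job {self.job_name}")

    # Both hooks funnel into the same cleanup, mirroring post_run / post_fail.
    def post_run(self) -> None:
        self.cleanup()

    def post_fail(self) -> None:
        self.cleanup()
```

Because the name is assigned before the create command runs, a failure at any later point still leaves post_fail with a concrete job to delete.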
Ideas:
- Use --grace-period=0 --force as flags for the delete process
- Make sure job_name exists before trying to delete the job
- Replace assert_script_run with script_run in the deletion process, so that a failing job deletion does not abort the cleanup; we want to delete the image too
- Check for old jobs and delete them in each run
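The first three ideas above can be combined into a best-effort cleanup, sketched here in Python under stated assumptions: run is a hypothetical callable that executes a shell command and returns its exit code without aborting on failure (the role script_run plays in openQA), and image_cmd is a hypothetical placeholder for whatever command deletes the test image, since the ticket does not say which one is used. Only the kubectl delete flags come from the ideas list.

```python
from typing import Callable, List, Optional

# Flags suggested in the ideas list.
FORCE_FLAGS = "--grace-period=0 --force"


def cleanup(run: Callable[[str], int], job_name: Optional[str],
            image_cmd: str) -> List[str]:
    """Best-effort cleanup; returns the commands that failed.

    A failing step is recorded but does not stop the following steps,
    so the image deletion still runs when the job deletion fails.
    """
    failed = []
    if job_name:  # only try to delete a job whose name we actually know
        cmd = f"kubectl delete job {job_name} {FORCE_FLAGS}"
        if run(cmd) != 0:
            failed.append(cmd)
    if run(image_cmd) != 0:
        failed.append(image_cmd)
    return failed
```

Returning the failed commands instead of raising lets the caller decide whether a leftover resource should only be logged or should still mark the test as failed.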
- Status changed from In Progress to Workable
- Assignee deleted (ilausuch)
- Status changed from Workable to In Progress
- Assignee set to ilausuch
This is solved for EKS. Google and Azure remain; work in progress.
- Tracker changed from action to coordination
- Subject changed from Clean up old kubernetes jobs automatically to [epic] Clean up old kubernetes jobs automatically
Converting this to an epic, as it is an extensive task.
- Description updated (diff)
- Assignee deleted (ilausuch)
This is an epic now, so the work will continue in individual tasks.