coordination #120660
[epic] Clean up old kubernetes jobs automatically
Status: open, 66% done
Description
Motivation
We had multiple openQA tests fail in run_container_in_k8s_GCE because of many pending jobs in our Kubernetes cluster. This means that part of the cleanup process does not work as intended.
Sometimes the tests do not properly clean up their jobs in the k8s clusters. This leads to a lack of resources and to failing tests.
Acceptance criteria
- AC 1: Check the cleanup process
- AC 2: If this is a situation we cannot prevent, create a method to clean up old jobs (e.g. older than 1 day) in each run
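For AC 2, a minimal sketch of the age filter, assuming the list of jobs comes from `kubectl get jobs -o json` (each item carries metadata.name and metadata.creationTimestamp in RFC 3339 UTC form). The function names are hypothetical, and the real test code in os-autoinst-distri-opensuse is Perl, so this only illustrates the logic:

```python
import json
from datetime import datetime, timedelta, timezone


def parse_k8s_time(value):
    # Kubernetes serializes creationTimestamp as e.g. "2022-11-16T10:00:00Z" (UTC)
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def old_job_names(jobs_json, now, max_age=timedelta(days=1)):
    """Return the names of jobs created more than max_age before `now`."""
    data = json.loads(jobs_json)
    cutoff = now - max_age
    names = []
    for item in data.get("items", []):
        meta = item.get("metadata", {})
        created = meta.get("creationTimestamp")
        if created and parse_k8s_time(created) < cutoff:
            names.append(meta["name"])
    return names


def delete_commands(names):
    # One `kubectl delete job NAME` invocation per stale job
    return [["kubectl", "delete", "job", name] for name in names]
```

Passing `now` in explicitly keeps the filter deterministic; in a test run it would simply be `datetime.now(timezone.utc)`.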
Updated by ph03nix over 1 year ago
- Related to action #120624: test fails in run_container_in_k8s_GCE added
Updated by ph03nix over 1 year ago
As an interim workaround for issues like this, one can log in to the Google Cloud web UI and delete old and dangling jobs in its Kubernetes view.
Updated by ph03nix over 1 year ago
- Subject changed from Clean up old jobs automatically to Clean up old kubernetes jobs automatically
Updated by ph03nix over 1 year ago
- Description updated (diff)
- Assignee deleted (ilausuch)
Updated by ilausuch over 1 year ago
- Status changed from Workable to In Progress
- Assignee set to ilausuch
Updated by ilausuch over 1 year ago
- Assignee deleted (ilausuch)
AC 1 complete
From the test code's perspective, there is no reason why the cleanup should fail:
- The test sets and stores the job name in $self before launching the job; only then is the job created. Therefore the cleanup has the job name if anything goes wrong from that point on
- Both post_run and post_fail call the cleanup
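The ordering described above can be sketched as follows. This is a language-neutral illustration in Python, not the actual Perl module: the class name, the injected `run_cmd` callable and the `test-job-` name prefix are all hypothetical. The point is that the name is stored before `kubectl create job` runs, and both hooks share one cleanup path:

```python
import uuid


class K8sJobTest:
    """Hypothetical test module mirroring the "store name first" pattern."""

    def __init__(self, run_cmd):
        # run_cmd(argv) -> exit code; stands in for the openQA script helpers
        self.run_cmd = run_cmd
        self.job_name = None

    def run(self, image):
        # 1. Pick and remember the job name first ...
        self.job_name = "test-job-" + uuid.uuid4().hex[:8]
        # 2. ... only then create the job, so cleanup always knows what to delete
        rc = self.run_cmd(["kubectl", "create", "job", self.job_name, f"--image={image}"])
        if rc != 0:
            raise RuntimeError("job creation failed")

    def cleanup(self):
        if self.job_name:
            self.run_cmd(["kubectl", "delete", "job", self.job_name])

    # Both hooks funnel into the same cleanup, whether the test passed or failed
    def post_run_hook(self):
        self.cleanup()

    def post_fail_hook(self):
        self.cleanup()
```

Even when creation fails halfway, the fail hook still issues a delete for the recorded name.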
Ideas:
- Use --grace-period=0 --force as flags for the delete process
- Make sure the job_name exists before trying to delete the job
- Replace assert_script_run with script_run in the deletion process, so that a failing job deletion does not abort the cleanup: we want to delete the image too
- Check for old jobs and delete them in each run
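The first three ideas can be sketched together, assuming a caller-supplied command for deleting the image (how the image is removed depends on the cloud registry and is left as a hypothetical argument here). `run_all` imitates script_run semantics: every step runs and failures are only collected, unlike assert_script_run, which would stop at the first error:

```python
def cleanup_commands(job_name, delete_image_cmd=None):
    """Build the cleanup steps as a list of argv lists (hypothetical helper)."""
    cmds = []
    if job_name:  # only delete a job whose name we actually know
        cmds.append(["kubectl", "delete", "job", job_name, "--grace-period=0", "--force"])
    if delete_image_cmd:  # registry-specific, supplied by the caller
        cmds.append(delete_image_cmd)
    return cmds


def run_all(cmds, run_cmd):
    """script_run-like: run every step, never stop early, return the failed ones."""
    failed = []
    for argv in cmds:
        if run_cmd(argv) != 0:
            failed.append(argv)
    return failed
```

Because failures are returned instead of raised, a job that is already gone does not prevent the image deletion from running.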
Updated by ilausuch over 1 year ago
https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/15950
This PR is not necessarily a solution to the problem, but the code is better now.
Moving on to the next step.
Updated by ilausuch over 1 year ago
- Status changed from Workable to In Progress
- Assignee set to ilausuch
Updated by ilausuch over 1 year ago
This is solved for EKS. Google and Azure remain; work in progress.
Updated by jlausuch over 1 year ago
- Tracker changed from action to coordination
- Subject changed from Clean up old kubernetes jobs automatically to [epic] Clean up old kubernetes jobs automatically
Moving this to an epic, as it is an extensive task.
Updated by ilausuch over 1 year ago
- Assignee deleted (ilausuch)
This is an epic now, so the work will continue in individual tasks.