coordination #120660

open

[epic] Clean up old kubernetes jobs automatically

Added by ilausuch over 1 year ago. Updated 9 months ago.

Status: In Progress
Priority: Low
Assignee: -
Target version: -
Start date: 2023-01-23
Due date:
% Done: 66%
Estimated time: (Total: 0.00 h)

Description

Motivation

We had multiple openQA tests fail in run_container_in_k8s_GCE because of many pending jobs in our Kubernetes Cluster. This means that part of the cleanup process does not work as intended.

Sometimes the tests do not properly clean up their jobs in the k8s clusters. This leads to a lack of resources and to failing tests.

Acceptance criteria

  • AC 1: Check the cleanup process
  • AC 2: If this is a situation we cannot prevent, create a method to clean up old jobs (e.g. older than 1 day) in each run
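
As a rough illustration of AC 2, the sketch below picks out jobs older than one day from the JSON that `kubectl get jobs -o json` prints. It is not the code used by the tests or by PCW; the function name `find_old_jobs` and the sample job names are hypothetical, and it assumes `creationTimestamp` is given in UTC with second precision.

```python
import json
from datetime import datetime, timedelta, timezone


def find_old_jobs(kubectl_json, now, max_age=timedelta(days=1)):
    """Return the names of jobs created more than max_age before now.

    kubectl_json is assumed to be the output of `kubectl get jobs -o json`.
    find_old_jobs is a hypothetical helper, not part of any existing tool.
    """
    old = []
    for item in json.loads(kubectl_json).get("items", []):
        meta = item["metadata"]
        # creationTimestamp is assumed to look like 2023-01-23T10:00:00Z (UTC)
        created = datetime.strptime(
            meta["creationTimestamp"], "%Y-%m-%dT%H:%M:%SZ"
        ).replace(tzinfo=timezone.utc)
        if now - created > max_age:
            old.append(meta["name"])
    return old


# Hypothetical sample data standing in for real kubectl output
sample = json.dumps({"items": [
    {"metadata": {"name": "old-job", "creationTimestamp": "2023-01-20T08:00:00Z"}},
    {"metadata": {"name": "fresh-job", "creationTimestamp": "2023-01-23T09:30:00Z"}},
]})
now = datetime(2023, 1, 23, 10, 0, tzinfo=timezone.utc)
print(find_old_jobs(sample, now))  # ['old-job']
```

Passing `now` in explicitly keeps the age check easy to test; a real run would use `datetime.now(timezone.utc)`.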

Subtasks 9 (3 open, 6 closed)

  • action #123499: PCW: Cleanup old jobs in google kubernetes (Resolved, ilausuch, 2023-01-23)
  • action #123502: PCW: Cleanup old jobs in azure kubernetes (Resolved, ilausuch, 2023-01-23)
  • action #123505: Replace kubeconf adquire method on amazon job cleanup (Closed, 2023-01-23)
  • action #123511: Helm chart test is not deleting the namespaces properly (Resolved, pdostal, 2023-01-23)
  • action #123514: [PCW] Cleanup namespaces in amazon kubernetes cluster (Workable, 2023-01-23)
  • action #123517: [PCW] Cleanup namespaces in google kubernetes cluster (Workable, 2023-01-23)
  • action #123520: [PCW] Cleanup namespaces in azure kubernetes cluster (Workable, 2023-01-23)
  • action #123730: PCW: Create a container to clean up the leftovers of the kubernetes clusters (Resolved, ilausuch, 2023-01-27)
  • action #124664: PCW: Move the kubernetes cleanup for jobs in Amazon to the cleanup_k8s script (Resolved, ilausuch, 2023-02-16)

Related issues 1 (0 open, 1 closed)

  • Related to Containers - action #120624: test fails in run_container_in_k8s_GCE (Resolved, ilausuch, 2022-11-16)

Actions #1

Updated by ph03nix over 1 year ago

  • Related to action #120624: test fails in run_container_in_k8s_GCE added
Actions #2

Updated by ph03nix over 1 year ago

As an interim workaround for issues like this, one can log in to the Google Cloud web UI and delete old and dangling jobs in its Kubernetes view.

Actions #3

Updated by ph03nix over 1 year ago

  • Subject changed from Clean up old jobs automatically to Clean up old kubernetes jobs automatically
Actions #4

Updated by ph03nix over 1 year ago

  • Description updated (diff)
  • Assignee deleted (ilausuch)
Actions #5

Updated by ilausuch over 1 year ago

  • Status changed from Workable to In Progress
  • Assignee set to ilausuch
Actions #6

Updated by ilausuch over 1 year ago

  • Assignee deleted (ilausuch)

AC 1 complete.
From the test code perspective, there is no reason why the cleanup should fail:

  • The test sets and stores the job name in $self before launching the job; only after that is the job created. Therefore the job name is available to the cleanup if anything goes wrong from that point on.
  • Also, both post_run and post_fail call the cleanup.

Ideas:

  • Use --grace-period=0 --force as flags for the delete process
  • Make sure job_name exists before trying to delete
  • Replace assert_script_run with script_run in the deletion process, so that a failed job deletion does not abort the cleanup. We want to delete the image too
  • Check for old jobs and delete them in each run
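
The first three ideas can be sketched roughly as follows. This is an illustration in Python rather than the test suite's own code; `build_delete_command` and `delete_job` are hypothetical names, and only the `kubectl delete job` flags mentioned above are used.

```python
import subprocess


def build_delete_command(job_name):
    """Build the kubectl call, or return None when no job name was stored."""
    if not job_name:
        return None
    return ["kubectl", "delete", "job", job_name, "--grace-period=0", "--force"]


def delete_job(job_name, run=subprocess.run):
    """Try to delete the job and report success without raising on failure."""
    cmd = build_delete_command(job_name)
    if cmd is None:
        # Nothing was created, so there is nothing to delete
        return False
    # check=False: a non-zero exit code is returned, not raised, so the
    # caller can carry on with the remaining cleanup (e.g. the image)
    result = run(cmd, check=False)
    return result.returncode == 0
```

The `run` parameter exists only so the sketch can be exercised without a cluster; by default it calls `subprocess.run`.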
Actions #7

Updated by ilausuch over 1 year ago

https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/15950
This PR is not necessarily a solution to the problem, but the code is better now.
Moving on to the next step.

Actions #8

Updated by ph03nix over 1 year ago

  • Assignee set to ilausuch
Actions #9

Updated by ilausuch over 1 year ago

  • Status changed from In Progress to Workable
Actions #10

Updated by ilausuch over 1 year ago

  • Assignee deleted (ilausuch)
Actions #11

Updated by ilausuch over 1 year ago

  • Status changed from Workable to In Progress
  • Assignee set to ilausuch
Actions #13

Updated by ilausuch over 1 year ago

This is solved for EKS. Google and Azure remain. Work in progress.

Actions #14

Updated by jlausuch over 1 year ago

  • Tracker changed from action to coordination
  • Subject changed from Clean up old kubernetes jobs automatically to [epic] Clean up old kubernetes jobs automatically

Moving this to EPIC, as it's an extensive task.

Actions #15

Updated by ilausuch over 1 year ago

  • Description updated (diff)
Actions #16

Updated by ilausuch over 1 year ago

  • Assignee deleted (ilausuch)

This is an epic now, so the work will continue in individual tasks.
