action #45041

closed

Asset cache cleanup still fails if new jobs are created at the same time

Added by mkittler about 6 years ago. Updated about 6 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: -
Target version:
Start date: 2018-12-12
Due date:
% Done: 0%
Estimated time:

Description

The error message is:

"{UNKNOWN}: repo/SLE-15-SP1-Module-Transactional-Server-POOL-aarch64-Build121.1-Media1 was scheduled during cleanup (max job initially 0, now 2322143) at /usr/share/openqa/script/../lib/OpenQA/Schema/ResultSet/Assets.pm line 273. at /usr/share/openqa/script/../lib/OpenQA/Schema/ResultSet/Assets.pm line 278\n"

(from https://openqa.suse.de/minion/jobs?state=failed&offset=0&task=limit_assets)


We previously wrapped the relevant cleanup code in a transaction so that it would not see other changes happening in the meantime. So either the transaction has no effect or there's a bug in the condition for the die.


Related issues 2 (0 open, 2 closed)

Related to openQA Project (public) - action #19672: GRU may delete assets while jobs are registered (Resolved, coolo, 2017-06-08)
Related to openQA Project (public) - action #41483: [tools] medium that should belong to job group was deleted after just 2 minutes making our SLE15 tests in build 47.1 useless ... and more (Resolved, coolo, 2018-09-24)

Actions #1

Updated by mkittler about 6 years ago

  • Description updated (diff)
Actions #2

Updated by mkittler about 6 years ago

Apparently a transaction doesn't help here, as it only protects against seeing incomplete (uncommitted) changes from other transactions*. But the new jobs are likely not added in a transactional way.

I also tested this by adding the following to a controller:

$self->app->schema->txn_do(sub {
    log_debug('starting transaction');
    sleep 10; # in the meantime, insert a new worker via psql
    log_debug('actual lookup: ' . $self->app->schema->resultset('Workers')->count); # the count is incremented
});

This still leaves the question of how to handle it instead. We could make scheduling an ISO a transaction. But in general new jobs can always be added, so the asset cleanup should likely be able to handle that.


* see https://www.postgresql.org/docs/8.3/tutorial-transactions.html, "when multiple transactions are running concurrently, each one should not be able to see the incomplete changes made by others"

Actions #3

Updated by mkittler about 6 years ago

  • Related to action #19672: GRU may delete assets while jobs are registered added
Actions #4

Updated by mkittler about 6 years ago

  • Related to action #41483: [tools] medium that should belong to job group was deleted after just 2 minutes making our SLE15 tests in build 47.1 useless ... and more added
Actions #5

Updated by mkittler about 6 years ago

The database operations for scheduling ISOs are actually already wrapped in a transaction. Maybe the jobs which are added in the meantime are added elsewhere?

Actions #6

Updated by mkittler about 6 years ago

  • Status changed from New to Feedback

It seems that the transaction isolation level is not high enough.

PR: https://github.com/os-autoinst/openQA/pull/1919 (merged, let's see whether this fixes the issue in production)
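
For illustration only, this is roughly what raising the isolation level for a single transaction can look like with plain DBI on PostgreSQL. It is a minimal sketch and not the code from the PR: the helper name `with_repeatable_read` is made up, and it assumes a DBI handle with AutoCommit enabled. With REPEATABLE READ, all queries inside the transaction should see the same snapshot, so rows committed by other sessions in the meantime stay invisible until the transaction ends.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of openQA): runs $code inside a transaction
# whose isolation level is raised to REPEATABLE READ.
# Assumption: $dbh is a DBI handle connected to PostgreSQL with AutoCommit on.
sub with_repeatable_read {
    my ($dbh, $code) = @_;

    $dbh->begin_work;
    my $result;
    my $ok = eval {
        # must be issued before the first query of the transaction
        $dbh->do('SET TRANSACTION ISOLATION LEVEL REPEATABLE READ');
        $result = $code->($dbh);
        $dbh->commit;
        1;
    };
    unless ($ok) {
        my $error = $@ || 'unknown error';
        eval { $dbh->rollback };    # undo partial work, ignore rollback errors
        die $error;                 # re-throw the original error
    }
    return $result;
}

# Usage sketch (the table name is only an assumption for this example):
# my $max_job = with_repeatable_read($dbh, sub {
#     my ($dbh) = @_;
#     my ($first)  = $dbh->selectrow_array('SELECT max(id) FROM jobs');
#     sleep 10;    # another session may commit new jobs here
#     my ($second) = $dbh->selectrow_array('SELECT max(id) FROM jobs');
#     return $second;    # expected to equal $first under REPEATABLE READ
# });
```

At the default isolation level (READ COMMITTED in PostgreSQL) the second query in the usage sketch could return a different value, which matches the behaviour seen in the controller test from comment #2.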

Actions #7

Updated by mkittler about 6 years ago

  • Status changed from Feedback to Resolved

o3 was also affected by the issue and it hasn't occurred since the last deployment. So I assume it works now in production.
