action #45041: Asset cache cleanup still fails if new jobs are created at the same time - openQA Project (public) - openSUSE Project Management Tool

closed

Added by mkittler over 6 years ago. Updated over 6 years ago.

Status: Resolved
Priority: Normal
Assignee: mkittler
Target version: Current Sprint
Start date: 2018-12-12

Description

The error message is:

"{UNKNOWN}: repo/SLE-15-SP1-Module-Transactional-Server-POOL-aarch64-Build121.1-Media1 was scheduled during cleanup (max job initially 0, now 2322143) at /usr/share/openqa/script/../lib/OpenQA/Schema/ResultSet/Assets.pm line 273. at /usr/share/openqa/script/../lib/OpenQA/Schema/ResultSet/Assets.pm line 278\n"

(from https://openqa.suse.de/minion/jobs?state=failed&offset=0&task=limit_assets)

We previously wrapped the cleanup code in question in a transaction so that it would not see changes made by others in the meantime. So either the transaction has no effect or there is a bug in the condition for the die.

Related issues 2 (0 open — 2 closed)

Updated by mkittler over 6 years ago

Description updated

Updated by mkittler over 6 years ago

Apparently a transaction doesn't help here, as it only protects against seeing incomplete (uncommitted) changes from other transactions*. The new jobs, however, are likely not added in a transactional way.

I also tested this by adding the following to a controller:

$self->app->schema->txn_do(sub {
    log_debug('starting transaction');
    sleep 10; # in the meantime, insert a new worker via psql
    log_debug('actual lookup: ' . $self->app->schema->resultset('Workers')->count); # the count is incremented
});

This still leaves the question of how to handle it instead. We could make scheduling an ISO a transaction. But in general new jobs can be added at any time, so the asset cleanup should likely be able to cope with that.

* see https://www.postgresql.org/docs/8.3/tutorial-transactions.html, "when multiple transactions are running concurrently, each one should not be able to see the incomplete changes made by others"
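One way the cleanup might cope with jobs being added in the meantime is to skip any asset whose max job changed since the cleanup started, instead of dying. The following is only a rough sketch of that idea with plain hashes. The function `assets_safe_to_remove` and its arguments are hypothetical and not part of openQA, and the real code in Assets.pm may be structured quite differently.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of openQA): decides which removal candidates
# can still be removed after re-checking their max job ID.
#   $initial:    { asset name => max job ID seen when the cleanup started }
#   $current:    { asset name => max job ID seen right before deletion }
#   $candidates: array ref of asset names selected for removal
sub assets_safe_to_remove {
    my ($initial, $current, $candidates) = @_;
    my (@remove, @skipped);
    for my $asset (@$candidates) {
        my $before = $initial->{$asset} // 0;
        my $now    = $current->{$asset} // 0;
        if ($now != $before) {
            # a job was scheduled during the cleanup: keep the asset and
            # let the next cleanup run reconsider it instead of dying
            push @skipped, $asset;
            next;
        }
        push @remove, $asset;
    }
    return (\@remove, \@skipped);
}

# made-up example data, loosely modelled on the error message above
my ($remove, $skipped) = assets_safe_to_remove(
    {'repo/A' => 0,       'iso/B' => 100},
    {'repo/A' => 2322143, 'iso/B' => 100},
    ['repo/A', 'iso/B'],
);
print "remove: @$remove\n";
print "skipped: @$skipped\n";
```

Note that this only narrows the window: a job could still be scheduled between the re-check and the actual deletion, so it would not replace proper locking or making the scheduling transactional.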

Updated by mkittler over 6 years ago

Added relation: Related to action #19672: GRU may delete assets while jobs are registered

Updated by mkittler over 6 years ago

Added relation: Related to action #41483: [tools] medium that should belong to job group was deleted after just 2 minutes making our SLE15 tests in build 47.1 useless ... and more
