action #45041
closed
Asset cache cleanup still fails if new jobs are created at the same time
Added by mkittler almost 6 years ago.
Updated almost 6 years ago.
Description
The error message is:
"{UNKNOWN}: repo/SLE-15-SP1-Module-Transactional-Server-POOL-aarch64-Build121.1-Media1 was scheduled during cleanup (max job initially 0, now 2322143) at /usr/share/openqa/script/../lib/OpenQA/Schema/ResultSet/Assets.pm line 273. at /usr/share/openqa/script/../lib/OpenQA/Schema/ResultSet/Assets.pm line 278\n"
(from https://openqa.suse.de/minion/jobs?state=failed&offset=0&task=limit_assets)
We previously wrapped the affected cleanup code in a transaction so that it would not see changes made by others in the meantime. So either the transaction has no effect or there is a bug in the condition for the die.
- Description updated
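Apparently a transaction doesn't help here, as it only protects against seeing incomplete (uncommitted) changes from other transactions*. With PostgreSQL's default READ COMMITTED isolation level, every statement inside the transaction still sees rows that other sessions committed before that statement started. On top of that, the new jobs are likely not added in a transactional way.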
I also tested this by adding the following to a controller:
$self->app->schema->txn_do(sub {
    log_debug('starting transaction');
    sleep 10;    # in the meantime, insert a new worker via psql
    # the count below includes the newly inserted worker
    log_debug('actual lookup: ' . $self->app->schema->resultset('Workers')->count);
});
This still leaves the question of how to handle it instead. We could make scheduling an ISO a transaction. But in general new jobs can be added at any time, so the asset cleanup should be able to handle that.
* see https://www.postgresql.org/docs/8.3/tutorial-transactions.html, "when multiple transactions are running concurrently, each one should not be able to see the incomplete changes made by others"
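To illustrate the idea that the cleanup should cope with jobs being added in the meantime, here is a minimal sketch. It is not the code from Assets.pm and not necessarily how this ticket was resolved. It assumes the cleanup records the max job ID per asset when it starts, re-reads it right before deletion, and skips (instead of dying on) any asset whose value changed. The helper name pick_removable_assets and the asset iso/old.iso are made up for the example.

```perl
use strict;
use warnings;

# Hypothetical helper (an assumption for illustration, not openQA's actual API):
# decide which candidate assets may be removed, given the max job ID per asset
# recorded when the cleanup started and the max job ID re-read before deletion.
sub pick_removable_assets {
    my ($initial_max_job, $current_max_job, $candidates) = @_;
    my (@removable, @skipped);
    for my $asset (@$candidates) {
        my $before = $initial_max_job->{$asset} // 0;
        my $now    = $current_max_job->{$asset} // 0;
        if ($now != $before) {
            # A job using this asset was scheduled during the cleanup:
            # keep the asset and let the next cleanup run reconsider it.
            push @skipped, $asset;
            next;
        }
        push @removable, $asset;
    }
    return (\@removable, \@skipped);
}

# Example data modeled on the error message in the description;
# 'iso/old.iso' is a made-up asset that no new job has touched.
my $repo    = 'repo/SLE-15-SP1-Module-Transactional-Server-POOL-aarch64-Build121.1-Media1';
my %initial = ($repo => 0,       'iso/old.iso' => 100);
my %current = ($repo => 2322143, 'iso/old.iso' => 100);

my ($removable, $skipped) = pick_removable_assets(\%initial, \%current, [$repo, 'iso/old.iso']);
print "remove: @$removable\n";
print "skip:   @$skipped\n";
```

With this approach a concurrently scheduled job only postpones the removal of the affected asset to the next limit_assets run, rather than failing the whole Minion job. Whether skipping is acceptable depends on how the asset size limits are enforced, which the sketch does not model.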
- Related to action #19672: GRU may delete assets while jobs are registered added
- Related to action #41483: [tools] medium that should belong to job group was deleted after just 2 minutes making our SLE15 tests in build 47.1 useless ... and more added
The database operations for scheduling ISOs are actually already a transaction. Maybe the jobs which are added in the meantime are added elsewhere?
- Status changed from New to Feedback
- Status changed from Feedback to Resolved
o3 was also affected by the issue, and it hasn't occurred there since the last deployment. So I assume it now works in production.