action #109292
closed
OSD is missing x86_64 jobs duplicate key value violates unique constraint "assets_type_name" in lib/OpenQA/Schema/ResultSet/Assets.pm line 33 within find_or_create
Description
With the last two (117.1, 118.3) or three builds, x86_64 jobs are missing.
The very first time there was a circular dependency issue in one of the job group YAML files, which was found to prevent the scheduling. However, the jobs keep going missing even after the correction, while the scheduling appears to work without problems after manual intervention.
Updated by JERiveraMoya over 2 years ago
https://openqa.suse.de/minion/jobs?id=4181330
4181330 schedule_iso default 15 hours ago
finished a few seconds
---
args:
- scheduled_product_id: 889867
scheduling_params:
ARCH: x86_64
ASSET_256: SLE-15-SP4-Online-x86_64-Build118.3-Media1.iso.sha256
BUILD: '118.3'
BUILD_HA: '118.3'
BUILD_SDK: '118.3'
BUILD_SES: '118.3'
BUILD_SLE: '118.3'
CHECKSUM_ISO: 000f3eef757f334ff367ebd7bd715816be97f4e206ca96cc69dc2791e42c8748
DISTRI: SLE
FLAVOR: Online
ISO: SLE-15-SP4-Online-x86_64-Build118.3-Media1.iso
MIRROR_FTP: ftp://openqa.suse.de/SLE-15-SP4-Online-x86_64-Build118.3-Media1
MIRROR_HTTP: http://openqa.suse.de/assets/repo/SLE-15-SP4-Online-x86_64-Build118.3-Media1
MIRROR_HTTPS: https://openqa.suse.de/assets/repo/SLE-15-SP4-Online-x86_64-Build118.3-Media1
MIRROR_NFS: nfs://openqa.suse.de/var/lib/openqa/share/factory/repo/SLE-15-SP4-Online-x86_64-Build118.3-Media1
MIRROR_SMB: smb://openqa.suse.de/inst/SLE-15-SP4-Online-x86_64-Build118.3-Media1
REPO_0: SLE-15-SP4-Online-x86_64-Build118.3-Media1
REPO_10: SLE-15-SP4-Module-Basesystem-POOL-x86_64-Build118.3-Media2
REPO_11: SLE-15-SP4-Module-Basesystem-POOL-x86_64-Build118.3-Media3
REPO_12: SLE-15-SP4-Module-Desktop-Applications-POOL-x86_64-Build118.3-Media1
REPO_13: SLE-15-SP4-Module-Development-Tools-POOL-x86_64-Build118.3-Media1
REPO_14: SLE-15-SP4-Module-Legacy-POOL-x86_64-Build118.3-Media1
REPO_15: SLE-15-SP4-Module-SAP-Applications-POOL-x86_64-Build118.3-Media1
REPO_16: SLE-15-SP4-Module-Server-Applications-POOL-x86_64-Build118.3-Media1
REPO_17: SLE-15-SP4-Module-Public-Cloud-POOL-x86_64-Build118.3-Media1
REPO_18: SLE-15-SP4-Module-Web-Scripting-POOL-x86_64-Build118.3-Media1
REPO_19: SLE-15-SP4-Module-Containers-POOL-x86_64-Build118.3-Media1
REPO_20: SLE-15-SP4-Module-Live-Patching-POOL-x86_64-Build118.3-Media1
REPO_21: SLE-15-SP4-Module-Transactional-Server-POOL-x86_64-Build118.3-Media1
REPO_22: SLE-15-SP4-Module-Python3-POOL-x86_64-Build118.3-Media1
REPO_23: SLE-15-SP4-Module-Packagehub-Subpackages-POOL-x86_64-Build118.3-Media1
REPO_24: SLE-15-SP4-Module-HPC-POOL-x86_64-Build118.3-Media1
REPO_25: SLE-15-SP4-Module-RT-POOL-x86_64-Build118.3-Media1
REPO_26: SLE-15-SP4-Module-Certifications-POOL-x86_64-Build118.3-Media1
REPO_27: SLE-15-SP4-Product-SLES-POOL-x86_64-Build118.3-Media1
REPO_28: SLE-15-SP4-Product-SLES-POOL-x86_64-Build118.3-Media1.license
REPO_29: SLE-15-SP4-Product-SLED-POOL-x86_64-Build118.3-Media1
REPO_30: SLE-15-SP4-Product-SLED-POOL-x86_64-Build118.3-Media1.license
REPO_31: SLE-15-SP4-Product-SLES_SAP-POOL-x86_64-Build118.3-Media1
REPO_32: SLE-15-SP4-Product-SLES_SAP-POOL-x86_64-Build118.3-Media1.license
REPO_33: SLE-15-SP4-Product-HPC-POOL-x86_64-Build118.3-Media1
REPO_34: SLE-15-SP4-Product-HPC-POOL-x86_64-Build118.3-Media1.license
REPO_35: SLE-15-SP4-Product-HPC-LTSS-POOL-x86_64-Build118.3-Media1
REPO_36: SLE-15-SP4-Product-HPC-ESPOS-POOL-x86_64-Build118.3-Media1
REPO_37: SLE-15-SP4-Product-RT-POOL-x86_64-Build118.3-Media1
REPO_38: SLE-15-SP4-Product-RT-POOL-x86_64-Build118.3-Media1.license
REPO_39: SLE-15-SP4-Product-WE-POOL-x86_64-Build118.3-Media1
REPO_40: SLE-15-SP4-Product-WE-POOL-x86_64-Build118.3-Media1.license
REPO_41: SLE-15-SP4-Product-HA-POOL-x86_64-Build118.3-Media1
REPO_42: SLE-15-SP4-Product-HA-POOL-x86_64-Build118.3-Media1.license
REPO_9: SLE-15-SP4-Module-Basesystem-POOL-x86_64-Build118.3-Media1
REPO_SLE_MODULE_BASESYSTEM: SLE-15-SP4-Module-Basesystem-POOL-x86_64-Build118.3-Media1
REPO_SLE_MODULE_BASESYSTEM_DEBUG: SLE-15-SP4-Module-Basesystem-POOL-x86_64-Build118.3-Media3
REPO_SLE_MODULE_BASESYSTEM_DEBUG_PACKAGES: coreutils*,kernel-default*,selinux*,yast2-network*,yast2-http-server*
REPO_SLE_MODULE_BASESYSTEM_SOURCE: SLE-15-SP4-Module-Basesystem-POOL-x86_64-Build118.3-Media2
REPO_SLE_MODULE_BASESYSTEM_SOURCE_PACKAGES: java*,kernel-default*,selinux*,yast2-network*,yast2-http-server*
REPO_SLE_MODULE_CERTIFICATIONS: SLE-15-SP4-Module-Certifications-POOL-x86_64-Build118.3-Media1
REPO_SLE_MODULE_CONTAINERS: SLE-15-SP4-Module-Containers-POOL-x86_64-Build118.3-Media1
REPO_SLE_MODULE_DESKTOP_APPLICATIONS: SLE-15-SP4-Module-Desktop-Applications-POOL-x86_64-Build118.3-Media1
REPO_SLE_MODULE_DEVELOPMENT_TOOLS: SLE-15-SP4-Module-Development-Tools-POOL-x86_64-Build118.3-Media1
REPO_SLE_MODULE_HPC: SLE-15-SP4-Module-HPC-POOL-x86_64-Build118.3-Media1
REPO_SLE_MODULE_LEGACY: SLE-15-SP4-Module-Legacy-POOL-x86_64-Build118.3-Media1
REPO_SLE_MODULE_LIVE_PATCHING: SLE-15-SP4-Module-Live-Patching-POOL-x86_64-Build118.3-Media1
REPO_SLE_MODULE_PACKAGEHUB_SUBPACKAGES: SLE-15-SP4-Module-Packagehub-Subpackages-POOL-x86_64-Build118.3-Media1
REPO_SLE_MODULE_PUBLIC_CLOUD: SLE-15-SP4-Module-Public-Cloud-POOL-x86_64-Build118.3-Media1
REPO_SLE_MODULE_PYTHON3: SLE-15-SP4-Module-Python3-POOL-x86_64-Build118.3-Media1
REPO_SLE_MODULE_RT: SLE-15-SP4-Module-RT-POOL-x86_64-Build118.3-Media1
REPO_SLE_MODULE_SAP_APPLICATIONS: SLE-15-SP4-Module-SAP-Applications-POOL-x86_64-Build118.3-Media1
REPO_SLE_MODULE_SERVER_APPLICATIONS: SLE-15-SP4-Module-Server-Applications-POOL-x86_64-Build118.3-Media1
REPO_SLE_MODULE_TRANSACTIONAL_SERVER: SLE-15-SP4-Module-Transactional-Server-POOL-x86_64-Build118.3-Media1
REPO_SLE_MODULE_WEB_SCRIPTING: SLE-15-SP4-Module-Web-Scripting-POOL-x86_64-Build118.3-Media1
REPO_SLE_PRODUCT_HA: SLE-15-SP4-Product-HA-POOL-x86_64-Build118.3-Media1
REPO_SLE_PRODUCT_HPC: SLE-15-SP4-Product-HPC-POOL-x86_64-Build118.3-Media1
REPO_SLE_PRODUCT_HPC_ESPOS: SLE-15-SP4-Product-HPC-ESPOS-POOL-x86_64-Build118.3-Media1
REPO_SLE_PRODUCT_HPC_LTSS: SLE-15-SP4-Product-HPC-LTSS-POOL-x86_64-Build118.3-Media1
REPO_SLE_PRODUCT_RT: SLE-15-SP4-Product-RT-POOL-x86_64-Build118.3-Media1
REPO_SLE_PRODUCT_SLED: SLE-15-SP4-Product-SLED-POOL-x86_64-Build118.3-Media1
REPO_SLE_PRODUCT_SLES: SLE-15-SP4-Product-SLES-POOL-x86_64-Build118.3-Media1
REPO_SLE_PRODUCT_SLES_SAP: SLE-15-SP4-Product-SLES_SAP-POOL-x86_64-Build118.3-Media1
REPO_SLE_PRODUCT_WE: SLE-15-SP4-Product-WE-POOL-x86_64-Build118.3-Media1
VERSION: 15-SP4
_DEPRIORITIZEBUILD: '1'
attempts: 1
children: []
created: 2022-03-30T17:03:13.637306Z
delayed: 2022-03-30T17:03:13.637306Z
expires: 2022-03-30T17:13:13.637306Z
finished: 2022-03-30T17:03:45.927807Z
id: 4181330
lax: 0
notes:
gru_id: 31295142
parents: []
priority: 10
queue: default
result:
error: |
DBIx::Class::Storage::DBI::_dbh_execute(): DBI Exception: DBD::Pg::st execute failed: ERROR: duplicate key value violates unique constraint "assets_type_name"
DETAIL: Key (type, name)=(iso, SLE-15-SP4-Online-x86_64-Build118.3-Media1.iso) already exists. [for Statement "INSERT INTO assets (name, t_created, t_updated, type) VALUES (?, ?, ?, ?) RETURNING id" with ParamValues: 1='SLE-15-SP4-Online-x86_64-Build118.3-Media1.iso', 2='2022-03-30 17:03:37', 3='2022-03-30 17:03:37', 4='iso'] at /usr/share/openqa/script/../lib/OpenQA/Schema/ResultSet/Assets.pm line 33
retried: ~
retries: 0
started: 2022-03-30T17:03:37.057702Z
state: finished
task: schedule_iso
time: 2022-03-31T08:01:41.391308Z
worker: 685
Updated by okurz over 2 years ago
- Project changed from 46 to openQA Project
- Subject changed from OSD is missing x86_64 jobs to OSD is missing x86_64 jobs duplicate key value violates unique constraint "assets_type_name" in lib/OpenQA/Schema/ResultSet/Assets.pm line 33 within find_or_create
- Category set to Regressions/Crashes
- Priority changed from Normal to Immediate
- Target version set to Ready
Updated by mkittler over 2 years ago
I've been trying to re-schedule: https://openqa.suse.de/admin/productlog?id=891038
However, that doesn't seem to have worked. At least I couldn't find a corresponding job in the Minion dashboard.
Updated by mkittler over 2 years ago
Seems like that's the reason:
(This showed up on https://openqa.suse.de/admin/productlog?id=889867 after a while when clicking the retry button. However, the fact that re-triggering is broken is likely unrelated.)
Updated by mkittler over 2 years ago
I cannot reproduce the issue with sudo -u geekotest /usr/share/openqa/script/openqa eval -V 'app->schema->resultset("Assets")->register("iso", "SLE-15-SP4-Online-x86_64-Build118.3-Media1.iso", {missing_ok => 1})->id'
on OSD (because that returns the expected ID).
Updated by mkittler over 2 years ago
- Status changed from New to In Progress
- Assignee set to mkittler
Despite the timeout the re-triggering actually worked: https://openqa.suse.de/admin/productlog?id=891038
Apparently we don't re-trigger asynchronously - at least that would explain the absence of a Minion job and the timeout.
The error is actually gone now and jobs have been scheduled. There are still errors regarding dependencies you might want to look into.
Was this the only problematic scheduled product? I suppose I'll have to check for further occurrences. Likely the problem is that the same ISO is used in different scheduled products, the scheduling took place concurrently, and DBIx's find_or_create is not atomic.
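To illustrate why a non-atomic "find, else create" can fail under concurrency, here is a minimal sketch using a filesystem analogy. It is not openQA code and not necessarily how the problem was fixed: a directory stands in for a row in the assets table, and mkdir failing on an existing directory stands in for the unique constraint. The function names register_racy and register_safe are hypothetical.

```shell
# Filesystem analogy (an assumption for illustration, not openQA code):
# a directory plays the role of a database row, and mkdir refusing to
# create an existing directory plays the role of the unique constraint.

# Racy variant: check first, then create. Two processes can both pass the
# check before either has created the entry, so the slower one fails.
register_racy() {
    local entry="$1/$2/$3"
    if [ -d "$entry" ]; then
        echo found
    else
        mkdir -p "$1/$2" && mkdir "$entry" && echo created
    fi
}

# Safer variant: attempt the creation first and treat "already exists"
# as a successful find instead of an error.
register_safe() {
    local entry="$1/$2/$3"
    mkdir -p "$1/$2" || return 1
    if mkdir "$entry" 2>/dev/null; then
        echo created
    elif [ -d "$entry" ]; then
        echo found
    else
        return 1
    fi
}
```

Calling register_safe several times in parallel with the same arguments (a store directory, a type such as iso, and an asset name) yields exactly one "created" and otherwise "found", whereas register_racy can fail in the window between the check and the creation.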
Updated by okurz over 2 years ago
mkittler wrote:
Apparently we don't re-trigger is async
english?
Updated by mkittler over 2 years ago
The following scheduled products had the same error:
- https://openqa.suse.de/admin/productlog?id=890824
- https://openqa.suse.de/admin/productlog?id=890301
- https://openqa.suse.de/admin/productlog?id=890047
- https://openqa.suse.de/admin/productlog?id=889395
- https://openqa.suse.de/admin/productlog?id=888031
- https://openqa.suse.de/admin/productlog?id=883598
- https://openqa.suse.de/admin/productlog?id=883192
I amended the typo in the previous comment.
Updated by mkittler over 2 years ago
By the way, it isn't really a new issue: https://openqa.suse.de/admin/productlog?id=633835 https://openqa.suse.de/admin/productlog?id=517260 https://openqa.suse.de/admin/productlog?id=431568
Updated by mkittler over 2 years ago
- Related to action #35749: Ignore insert errors in limit_assets added
Updated by mkittler over 2 years ago
- Status changed from In Progress to Feedback
Updated by okurz over 2 years ago
- Priority changed from Immediate to High
I guess another problem is that the trigger script does not catch the error in the async output. When switching from synchronous to asynchronous triggering I suggested to anikitin to await a successful confirmation in the asynchronous message, but that has not been done yet. I would appreciate it if you could look into a solution for that as well. At best we avoid polling the openQA API to see if the scheduled product succeeded. I am sure we can find better ideas than that.
Updated by mkittler over 2 years ago
The fix has been deployed on 02.04.22 07:23 CEST on OSD. I'll use select id, t_created, t_updated as error from scheduled_products where results ->> 'error' like '%unique constraint%' order by id desc limit 10;
in a few days to check whether the problem is fixed. So far there's no further occurrence.
I'm afraid polling is the best we can do (without adding/changing existing routes in openQA and risking timeouts), e.g.:
error=$(openqa-cli api --osd --pretty isos/890824 | jq '.results | .error')
[[ $error != null ]] && echo "handle error"
Updated by mkittler over 2 years ago
I did some digging and I suppose the relevant commit/repository/place is https://github.com/os-autoinst/openqa-trigger-from-obs/commit/1d575e09c3b6192196728fee50c983079786671a. Unfortunately that's a Bash-for-loop generated by a Bash script generated by Python code.
I suppose I could let it generate something like:
product_id=$(openqa-cli api -X post isos?async=1 … | jq '.scheduled_product_id')
timeout=${timeout:-480}
poll_interval=${poll_interval:-10}
response=$(openqa-cli api "isos/$product_id")
status=$(echo "$response" | jq -r '.status')
# poll until the scheduled product has been processed or the timeout is exceeded
while [[ $status != scheduled ]]; do
    timeout=$((timeout - poll_interval))
    if [[ $timeout -le 0 ]]; then
        echo "timeout exceeded when waiting for scheduled product $product_id"
        exit 1
    fi
    sleep "$poll_interval"
    response=$(openqa-cli api --osd "isos/$product_id")
    status=$(echo "$response" | jq -r '.status')
done
# report a scheduling error recorded in the results, if any
error=$(echo "$response" | jq '.results | .error')
if [[ $error != null ]]; then
    echo "unable to schedule product $product_id: $error"
    exit 1
fi
Updated by mkittler over 2 years ago
I tried something but it isn't working as expected. Not sure whether I should try harder. Considering all the escaping and no good way of testing it, it doesn't seem a worthwhile improvement. So I'd rather consider this ticket done if the error is fixed on the openQA-side.
Updated by okurz over 2 years ago
Then I suggest we discuss in the team what we can do. Because if you say that https://github.com/os-autoinst/openqa-trigger-from-obs is, in this area, both broken by design and not fixable, then we would have a bigger problem, right? :)
EDIT: Discussed in daily 2022-04-06 and we agreed that we should look into a new subcommand for openqa-cli which does the "trigger asynchronously & await results". Any 3rd party tooling including openqa-trigger-from-obs can use that new command then to report back when the scheduling yields errors.
Updated by mkittler over 2 years ago
Well, I'm at least saying that its design makes it hard to make the iso posting a bit more sophisticated. However, I suppose my second attempt would be creating a distinct script to do the API call (including the waiting) and only invoke that in the generated scripts. But I'm also not sure whether that would be the best way and how to ensure that the wrapper script is present in the context it is called.
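As a rough sketch of what such a distinct helper could look like, the waiting logic can be separated from the API call so that it can be tested without an openQA instance. The function names wait_for_status and product_status and the exit codes are assumptions made for illustration; this is neither the generated code of openqa-trigger-from-obs nor an existing openqa-cli sub-command.

```shell
# Hypothetical helper: run the given status command repeatedly until it
# prints the wanted status.
# Usage: wait_for_status WANTED TIMEOUT_SECONDS INTERVAL_SECONDS COMMAND [ARGS...]
# Returns 0 when the wanted status is seen, 1 on timeout and
# 2 when the status command itself fails.
wait_for_status() {
    local wanted="$1" timeout="$2" interval="$3" waited=0 status
    shift 3
    while true; do
        status=$("$@") || return 2
        [ "$status" = "$wanted" ] && return 0
        [ "$waited" -ge "$timeout" ] && return 1
        sleep "$interval"
        waited=$((waited + interval))
    done
}

# Hypothetical status command for a scheduled product, built from the
# openqa-cli and jq calls shown earlier in this ticket (needs both tools).
product_status() {
    openqa-cli api --osd "isos/$1" | jq -r '.status'
}
```

A generated script would then only need a single line such as wait_for_status scheduled 480 10 product_status "$product_id", followed by the check of .results.error. Passing the status command as arguments keeps the escaping in the generated Bash small and lets the loop be exercised with a stub command.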
Updated by livdywan over 2 years ago
mkittler wrote:
Well, I'm at least saying that its design makes it hard to make the iso posting a bit more sophisticated. However, I suppose my second attempt would be creating a distinct script to do the API call (including the waiting) and only invoke that in the generated scripts. But I'm also not sure whether that would be the best way and how to ensure that the wrapper script is present in the context it is called.
Discussed it briefly. We should have a follow-up ticket for a new CLI command for this.
Updated by mkittler over 2 years ago
- Related to action #109560: Add openqa-cli sub-command for async scheduling and keeping track of the result added
Updated by mkittler over 2 years ago
I've created the ticket and so far select id, t_created, t_updated as error from scheduled_products where t_created >= '2022-04-02' and results ->> 'error' like '%unique constraint%' order by id desc limit 10;
doesn't return any results.
Updated by ybonatakis over 2 years ago
- Status changed from Feedback to Resolved
Resolved as there is no problem with the latest build.