action #109292
OSD is missing x86_64 jobs duplicate key value violates unique constraint "assets_type_name" in lib/OpenQA/Schema/ResultSet/Assets.pm line 33 within find_or_create
Description
For the last two (117.1, 118.3) or three builds, x86_64 jobs have been missing.
The very first time, there was a dependency cycle in one of the job group YAML files, which was found to prevent scheduling. However, jobs keep going missing even after that was corrected, although scheduling appears to work without problems after manual intervention.
Related issues
History
#1
Updated by JERiveraMoya 3 months ago
https://openqa.suse.de/minion/jobs?id=4181330
4181330 schedule_iso default 15 hours ago finished a few seconds
---
args:
- scheduled_product_id: 889867
  scheduling_params:
    ARCH: x86_64
    ASSET_256: SLE-15-SP4-Online-x86_64-Build118.3-Media1.iso.sha256
    BUILD: '118.3'
    BUILD_HA: '118.3'
    BUILD_SDK: '118.3'
    BUILD_SES: '118.3'
    BUILD_SLE: '118.3'
    CHECKSUM_ISO: 000f3eef757f334ff367ebd7bd715816be97f4e206ca96cc69dc2791e42c8748
    DISTRI: SLE
    FLAVOR: Online
    ISO: SLE-15-SP4-Online-x86_64-Build118.3-Media1.iso
    MIRROR_FTP: ftp://openqa.suse.de/SLE-15-SP4-Online-x86_64-Build118.3-Media1
    MIRROR_HTTP: http://openqa.suse.de/assets/repo/SLE-15-SP4-Online-x86_64-Build118.3-Media1
    MIRROR_HTTPS: https://openqa.suse.de/assets/repo/SLE-15-SP4-Online-x86_64-Build118.3-Media1
    MIRROR_NFS: nfs://openqa.suse.de/var/lib/openqa/share/factory/repo/SLE-15-SP4-Online-x86_64-Build118.3-Media1
    MIRROR_SMB: smb://openqa.suse.de/inst/SLE-15-SP4-Online-x86_64-Build118.3-Media1
    REPO_0: SLE-15-SP4-Online-x86_64-Build118.3-Media1
    REPO_10: SLE-15-SP4-Module-Basesystem-POOL-x86_64-Build118.3-Media2
    REPO_11: SLE-15-SP4-Module-Basesystem-POOL-x86_64-Build118.3-Media3
    REPO_12: SLE-15-SP4-Module-Desktop-Applications-POOL-x86_64-Build118.3-Media1
    REPO_13: SLE-15-SP4-Module-Development-Tools-POOL-x86_64-Build118.3-Media1
    REPO_14: SLE-15-SP4-Module-Legacy-POOL-x86_64-Build118.3-Media1
    REPO_15: SLE-15-SP4-Module-SAP-Applications-POOL-x86_64-Build118.3-Media1
    REPO_16: SLE-15-SP4-Module-Server-Applications-POOL-x86_64-Build118.3-Media1
    REPO_17: SLE-15-SP4-Module-Public-Cloud-POOL-x86_64-Build118.3-Media1
    REPO_18: SLE-15-SP4-Module-Web-Scripting-POOL-x86_64-Build118.3-Media1
    REPO_19: SLE-15-SP4-Module-Containers-POOL-x86_64-Build118.3-Media1
    REPO_20: SLE-15-SP4-Module-Live-Patching-POOL-x86_64-Build118.3-Media1
    REPO_21: SLE-15-SP4-Module-Transactional-Server-POOL-x86_64-Build118.3-Media1
    REPO_22: SLE-15-SP4-Module-Python3-POOL-x86_64-Build118.3-Media1
    REPO_23: SLE-15-SP4-Module-Packagehub-Subpackages-POOL-x86_64-Build118.3-Media1
    REPO_24: SLE-15-SP4-Module-HPC-POOL-x86_64-Build118.3-Media1
    REPO_25: SLE-15-SP4-Module-RT-POOL-x86_64-Build118.3-Media1
    REPO_26: SLE-15-SP4-Module-Certifications-POOL-x86_64-Build118.3-Media1
    REPO_27: SLE-15-SP4-Product-SLES-POOL-x86_64-Build118.3-Media1
    REPO_28: SLE-15-SP4-Product-SLES-POOL-x86_64-Build118.3-Media1.license
    REPO_29: SLE-15-SP4-Product-SLED-POOL-x86_64-Build118.3-Media1
    REPO_30: SLE-15-SP4-Product-SLED-POOL-x86_64-Build118.3-Media1.license
    REPO_31: SLE-15-SP4-Product-SLES_SAP-POOL-x86_64-Build118.3-Media1
    REPO_32: SLE-15-SP4-Product-SLES_SAP-POOL-x86_64-Build118.3-Media1.license
    REPO_33: SLE-15-SP4-Product-HPC-POOL-x86_64-Build118.3-Media1
    REPO_34: SLE-15-SP4-Product-HPC-POOL-x86_64-Build118.3-Media1.license
    REPO_35: SLE-15-SP4-Product-HPC-LTSS-POOL-x86_64-Build118.3-Media1
    REPO_36: SLE-15-SP4-Product-HPC-ESPOS-POOL-x86_64-Build118.3-Media1
    REPO_37: SLE-15-SP4-Product-RT-POOL-x86_64-Build118.3-Media1
    REPO_38: SLE-15-SP4-Product-RT-POOL-x86_64-Build118.3-Media1.license
    REPO_39: SLE-15-SP4-Product-WE-POOL-x86_64-Build118.3-Media1
    REPO_40: SLE-15-SP4-Product-WE-POOL-x86_64-Build118.3-Media1.license
    REPO_41: SLE-15-SP4-Product-HA-POOL-x86_64-Build118.3-Media1
    REPO_42: SLE-15-SP4-Product-HA-POOL-x86_64-Build118.3-Media1.license
    REPO_9: SLE-15-SP4-Module-Basesystem-POOL-x86_64-Build118.3-Media1
    REPO_SLE_MODULE_BASESYSTEM: SLE-15-SP4-Module-Basesystem-POOL-x86_64-Build118.3-Media1
    REPO_SLE_MODULE_BASESYSTEM_DEBUG: SLE-15-SP4-Module-Basesystem-POOL-x86_64-Build118.3-Media3
    REPO_SLE_MODULE_BASESYSTEM_DEBUG_PACKAGES: coreutils*,kernel-default*,selinux*,yast2-network*,yast2-http-server*
    REPO_SLE_MODULE_BASESYSTEM_SOURCE: SLE-15-SP4-Module-Basesystem-POOL-x86_64-Build118.3-Media2
    REPO_SLE_MODULE_BASESYSTEM_SOURCE_PACKAGES: java*,kernel-default*,selinux*,yast2-network*,yast2-http-server*
    REPO_SLE_MODULE_CERTIFICATIONS: SLE-15-SP4-Module-Certifications-POOL-x86_64-Build118.3-Media1
    REPO_SLE_MODULE_CONTAINERS: SLE-15-SP4-Module-Containers-POOL-x86_64-Build118.3-Media1
    REPO_SLE_MODULE_DESKTOP_APPLICATIONS: SLE-15-SP4-Module-Desktop-Applications-POOL-x86_64-Build118.3-Media1
    REPO_SLE_MODULE_DEVELOPMENT_TOOLS: SLE-15-SP4-Module-Development-Tools-POOL-x86_64-Build118.3-Media1
    REPO_SLE_MODULE_HPC: SLE-15-SP4-Module-HPC-POOL-x86_64-Build118.3-Media1
    REPO_SLE_MODULE_LEGACY: SLE-15-SP4-Module-Legacy-POOL-x86_64-Build118.3-Media1
    REPO_SLE_MODULE_LIVE_PATCHING: SLE-15-SP4-Module-Live-Patching-POOL-x86_64-Build118.3-Media1
    REPO_SLE_MODULE_PACKAGEHUB_SUBPACKAGES: SLE-15-SP4-Module-Packagehub-Subpackages-POOL-x86_64-Build118.3-Media1
    REPO_SLE_MODULE_PUBLIC_CLOUD: SLE-15-SP4-Module-Public-Cloud-POOL-x86_64-Build118.3-Media1
    REPO_SLE_MODULE_PYTHON3: SLE-15-SP4-Module-Python3-POOL-x86_64-Build118.3-Media1
    REPO_SLE_MODULE_RT: SLE-15-SP4-Module-RT-POOL-x86_64-Build118.3-Media1
    REPO_SLE_MODULE_SAP_APPLICATIONS: SLE-15-SP4-Module-SAP-Applications-POOL-x86_64-Build118.3-Media1
    REPO_SLE_MODULE_SERVER_APPLICATIONS: SLE-15-SP4-Module-Server-Applications-POOL-x86_64-Build118.3-Media1
    REPO_SLE_MODULE_TRANSACTIONAL_SERVER: SLE-15-SP4-Module-Transactional-Server-POOL-x86_64-Build118.3-Media1
    REPO_SLE_MODULE_WEB_SCRIPTING: SLE-15-SP4-Module-Web-Scripting-POOL-x86_64-Build118.3-Media1
    REPO_SLE_PRODUCT_HA: SLE-15-SP4-Product-HA-POOL-x86_64-Build118.3-Media1
    REPO_SLE_PRODUCT_HPC: SLE-15-SP4-Product-HPC-POOL-x86_64-Build118.3-Media1
    REPO_SLE_PRODUCT_HPC_ESPOS: SLE-15-SP4-Product-HPC-ESPOS-POOL-x86_64-Build118.3-Media1
    REPO_SLE_PRODUCT_HPC_LTSS: SLE-15-SP4-Product-HPC-LTSS-POOL-x86_64-Build118.3-Media1
    REPO_SLE_PRODUCT_RT: SLE-15-SP4-Product-RT-POOL-x86_64-Build118.3-Media1
    REPO_SLE_PRODUCT_SLED: SLE-15-SP4-Product-SLED-POOL-x86_64-Build118.3-Media1
    REPO_SLE_PRODUCT_SLES: SLE-15-SP4-Product-SLES-POOL-x86_64-Build118.3-Media1
    REPO_SLE_PRODUCT_SLES_SAP: SLE-15-SP4-Product-SLES_SAP-POOL-x86_64-Build118.3-Media1
    REPO_SLE_PRODUCT_WE: SLE-15-SP4-Product-WE-POOL-x86_64-Build118.3-Media1
    VERSION: 15-SP4
    _DEPRIORITIZEBUILD: '1'
attempts: 1
children: []
created: 2022-03-30T17:03:13.637306Z
delayed: 2022-03-30T17:03:13.637306Z
expires: 2022-03-30T17:13:13.637306Z
finished: 2022-03-30T17:03:45.927807Z
id: 4181330
lax: 0
notes:
  gru_id: 31295142
parents: []
priority: 10
queue: default
result:
  error: |
    DBIx::Class::Storage::DBI::_dbh_execute(): DBI Exception: DBD::Pg::st execute failed: ERROR: duplicate key value violates unique constraint "assets_type_name"
    DETAIL: Key (type, name)=(iso, SLE-15-SP4-Online-x86_64-Build118.3-Media1.iso) already exists. [for Statement "INSERT INTO assets (name, t_created, t_updated, type) VALUES (?, ?, ?, ?) RETURNING id" with ParamValues: 1='SLE-15-SP4-Online-x86_64-Build118.3-Media1.iso', 2='2022-03-30 17:03:37', 3='2022-03-30 17:03:37', 4='iso'] at /usr/share/openqa/script/../lib/OpenQA/Schema/ResultSet/Assets.pm line 33
retried: ~
retries: 0
started: 2022-03-30T17:03:37.057702Z
state: finished
task: schedule_iso
time: 2022-03-31T08:01:41.391308Z
worker: 685
#2
Updated by okurz 3 months ago
- Project changed from SUSE QA to openQA Project
- Subject changed from OSD is missing x86_64 jobs to OSD is missing x86_64 jobs duplicate key value violates unique constraint "assets_type_name" in lib/OpenQA/Schema/ResultSet/Assets.pm line 33 within find_or_create
- Category set to Concrete Bugs
- Priority changed from Normal to Immediate
- Target version set to Ready
#3
Updated by mkittler 3 months ago
I've been trying to re-schedule: https://openqa.suse.de/admin/productlog?id=891038
However, that doesn't seem to have worked. At least I couldn't find a corresponding job in the Minion dashboard.
#4
Updated by mkittler 3 months ago
Seems like that's the reason:
(This showed up on https://openqa.suse.de/admin/productlog?id=889867 after a while when clicking the retry button. However, the fact that re-triggering is broken is likely unrelated.)
#6
Updated by mkittler 3 months ago
- Status changed from New to In Progress
- Assignee set to mkittler
Despite the timeout the re-triggering actually worked: https://openqa.suse.de/admin/productlog?id=891038
Apparently we don't re-trigger asynchronously - at least that would explain the absence of a Minion job and the timeout.
The error is actually gone now and jobs have been scheduled. There are still errors regarding dependencies you might want to look into.
Was this the only problematic scheduled product? I suppose I'll have to check for further occurrences. Likely the problem is that the same ISO is used in different scheduled products, the scheduling took place concurrently, and DBIx's find_or_create is not atomic.
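The suspected race can be illustrated with a small Bash analogy. This is only a sketch and not openQA code: a temporary directory stands in for the assets table, a file name for the unique (type, name) key, and `RACE_HOOK`, `concurrent_insert` and `find_or_create_with_retry` are hypothetical names made up for the illustration. The retry variant shows one possible mitigation; the ticket does not say how the actual fix works.

```shell
#!/usr/bin/env bash
# Analogy only, not openQA code: a directory stands in for the assets table
# and a file name for the unique (type, name) key.
table=$(mktemp -d)

# Stand-in for a second scheduler inserting the same key concurrently.
concurrent_insert() { : > "$table/$1"; }

# Non-atomic find-or-create: the lookup and the insert are separate steps.
find_or_create() {
    local key=$1
    if [[ -e "$table/$key" ]]; then
        echo "found"
        return 0
    fi
    # Window in which another process can insert the same key.
    # RACE_HOOK is a hypothetical hook to make that deterministic.
    if [[ -n "${RACE_HOOK:-}" ]]; then "$RACE_HOOK" "$key"; fi
    # noclobber makes the redirect fail if the file already exists,
    # similar to a unique constraint rejecting a duplicate row.
    if (set -o noclobber; : > "$table/$key") 2>/dev/null; then
        echo "created"
    else
        echo "duplicate key"
        return 1
    fi
}

# One possible mitigation: if the insert fails, look the key up again.
find_or_create_with_retry() {
    local key=$1
    find_or_create "$key" >/dev/null && return 0
    [[ -e "$table/$key" ]]
}
```

With `RACE_HOOK=concurrent_insert` set, `find_or_create` fails with "duplicate key" even though the lookup found nothing a moment earlier, whereas `find_or_create_with_retry` succeeds because the key exists by the time of the second lookup.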
#8
Updated by mkittler 3 months ago
The following scheduled products had the same error:
- https://openqa.suse.de/admin/productlog?id=890824
- https://openqa.suse.de/admin/productlog?id=890301
- https://openqa.suse.de/admin/productlog?id=890047
- https://openqa.suse.de/admin/productlog?id=889395
- https://openqa.suse.de/admin/productlog?id=888031
- https://openqa.suse.de/admin/productlog?id=883598
- https://openqa.suse.de/admin/productlog?id=883192
I amended the typo in the previous comment.
#9
Updated by mkittler 3 months ago
By the way, it isn't really a new issue:
- https://openqa.suse.de/admin/productlog?id=633835
- https://openqa.suse.de/admin/productlog?id=517260
- https://openqa.suse.de/admin/productlog?id=431568
#10
Updated by mkittler 3 months ago
- Related to action #35749: Ignore insert errors in limit_assets added
#12
Updated by okurz 3 months ago
- Priority changed from Immediate to High
I guess another problem is that the trigger script does not catch the error in the async output. When switching from synchronous to asynchronous triggering, I suggested to anikitin that we await a successful confirmation in the asynchronous message, but that was not done yet. I would appreciate it if you could look into a solution for that as well. Ideally we avoid polling the openQA API to see if the scheduled product succeeded. I am sure we can find better ideas than that.
#13
Updated by mkittler 3 months ago
The fix has been deployed on 02.04.22 07:23 CEST on OSD. In a few days I'll use the following query to check whether the problem is fixed:
select id, t_created, t_updated as error from scheduled_products where results ->> 'error' like '%unique constraint%' order by id desc limit 10;
So far there's no further occurrence.
I'm afraid polling is the best we can do (without adding/changing existing routes in openQA and risking timeouts), e.g.:
error=$(openqa-cli api --osd --pretty isos/890824 | jq '.results | .error')
[[ $error != null ]] && echo "handle error"
#14
Updated by mkittler 3 months ago
I did some digging and I suppose the relevant commit/repository/place is https://github.com/os-autoinst/openqa-trigger-from-obs/commit/1d575e09c3b6192196728fee50c983079786671a. Unfortunately that's a Bash-for-loop generated by a Bash script generated by Python code.
I suppose I could let it generate something like:
product_id=$(openqa-cli api -X post isos?async=1 … | jq '.scheduled_product_id')
timeout=${timeout:-480}
poll_interval=${poll_interval:-10}
response=$(openqa-cli api --osd "isos/$product_id")
status=$(echo "$response" | jq -r '.status')
# Poll until the scheduled product has been processed or the timeout is exceeded.
while [[ $status != scheduled ]]; do
    timeout=$((timeout - poll_interval))
    if [[ $timeout -le 0 ]]; then
        echo "timeout exceeded when waiting for scheduled product $product_id"
        exit 1
    fi
    sleep "$poll_interval"
    response=$(openqa-cli api --osd "isos/$product_id")
    status=$(echo "$response" | jq -r '.status')
done
# Report a scheduling error stored in the results of the scheduled product.
error=$(echo "$response" | jq '.results | .error')
if [[ $error != null ]]; then
    echo "unable to schedule product $product_id: $error"
    exit 1
fi
#15
Updated by mkittler 3 months ago
I tried something but it isn't working as expected. Not sure whether I should try harder. Considering all the escaping and no good way of testing it, it doesn't seem a worthwhile improvement. So I'd rather consider this ticket done if the error is fixed on the openQA-side.
#16
Updated by okurz 3 months ago
Then I suggest we discuss in the team what we can do. If you say that https://github.com/os-autoinst/openqa-trigger-from-obs is, in this area, both broken by design and not fixable, then we would have a bigger problem, right? :)
EDIT: Discussed in daily 2022-04-06 and we agreed that we should look into a new subcommand for openqa-cli which does the "trigger asynchronously & await results". Any 3rd party tooling including openqa-trigger-from-obs can use that new command then to report back when the scheduling yields errors.
#17
Updated by mkittler 3 months ago
Well, I'm at least saying that its design makes it hard to make the iso posting a bit more sophisticated. However, I suppose my second attempt would be creating a distinct script to do the API call (including the waiting) and only invoke that in the generated scripts. But I'm also not sure whether that would be the best way and how to ensure that the wrapper script is present in the context it is called.
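A minimal sketch of what such a distinct wrapper could look like, assuming Bash and jq are available where the generated scripts run. The function names `query_product` and `wait_for_scheduled_product`, the return codes and the `.results.error // empty` filter are hypothetical choices for illustration; only the `openqa-cli api --osd "isos/$id"` call and the `status`/`results.error` fields are taken from the earlier comments.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper sketch, not part of openqa-trigger-from-obs.

# Query one field of a scheduled product. This is the only place that talks
# to openQA, so it can be replaced by a stub when testing.
# $1: scheduled product ID, $2: jq filter
query_product() {
    openqa-cli api --osd "isos/$1" | jq -r "$2"
}

# Wait until the scheduled product reaches the status "scheduled".
# $1: scheduled product ID, $2: timeout in seconds, $3: poll interval in seconds
# Returns 0 on success, 1 on a scheduling error, 2 on timeout.
wait_for_scheduled_product() {
    local product_id=$1 timeout=${2:-480} poll_interval=${3:-10}
    local status error
    while true; do
        status=$(query_product "$product_id" '.status')
        [[ $status == scheduled ]] && break
        if (( timeout <= 0 )); then
            echo "timeout exceeded when waiting for scheduled product $product_id" >&2
            return 2
        fi
        sleep "$poll_interval"
        timeout=$((timeout - poll_interval))
    done
    # An empty string means the results contain no error.
    error=$(query_product "$product_id" '.results.error // empty')
    if [[ -n $error ]]; then
        echo "unable to schedule product $product_id: $error" >&2
        return 1
    fi
}
```

A generated script would then only need a single line such as `wait_for_scheduled_product "$product_id" || exit 1` after posting the ISO, which keeps the escaping in the generator to a minimum. How the wrapper is shipped alongside the generated scripts remains the open question.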
#18
Updated by cdywan 3 months ago
mkittler wrote:
Well, I'm at least saying that its design makes it hard to make the iso posting a bit more sophisticated. However, I suppose my second attempt would be creating a distinct script to do the API call (including the waiting) and only invoke that in the generated scripts. But I'm also not sure whether that would be the best way and how to ensure that the wrapper script is present in the context it is called.
Discussed it briefly. We should have a follow-up ticket for a new CLI command for this.
#19
Updated by mkittler 3 months ago
- Related to action #109560: Add openqa-cli sub-command for async scheduling and keeping track of the result added
#21
Updated by ybonatakis 3 months ago
- Status changed from Feedback to Resolved
Resolved as there is no problem with the latest build.