action #109292

closed

OSD is missing x86_64 jobs duplicate key value violates unique constraint "assets_type_name" in lib/OpenQA/Schema/ResultSet/Assets.pm line 33 within find_or_create

Added by ybonatakis almost 3 years ago. Updated over 2 years ago.

Status: Resolved
Priority: High
Assignee:
Category: Regressions/Crashes
Target version:
Start date: 2022-03-31
Due date:
% Done: 0%
Estimated time:

Description

x86_64 jobs are missing for the last two (117.1, 118.3) or three builds.

Initially there was a circular dependency in one of the job group YAML files, which was found to prevent scheduling. However, jobs are still missing even after that was corrected, and scheduling appears to work without problems after manual intervention.


Files

screenshot_20220401_103125.png (49.8 KB), mkittler, 2022-04-01 08:32

Related issues 2 (1 open, 1 closed)

Related to openQA Project (public) - action #35749: Ignore insert errors in limit_assets (Resolved, mkittler, 2018-05-02)
Related to openQA Project (public) - action #109560: Add openqa-cli sub-command for async scheduling and keeping track of the result (New, 2022-04-06)

Actions #1

Updated by JERiveraMoya almost 3 years ago

https://openqa.suse.de/minion/jobs?id=4181330

4181330 schedule_iso    default 15 hours ago    
finished    a few seconds   
---
args:
- scheduled_product_id: 889867
  scheduling_params:
    ARCH: x86_64
    ASSET_256: SLE-15-SP4-Online-x86_64-Build118.3-Media1.iso.sha256
    BUILD: '118.3'
    BUILD_HA: '118.3'
    BUILD_SDK: '118.3'
    BUILD_SES: '118.3'
    BUILD_SLE: '118.3'
    CHECKSUM_ISO: 000f3eef757f334ff367ebd7bd715816be97f4e206ca96cc69dc2791e42c8748
    DISTRI: SLE
    FLAVOR: Online
    ISO: SLE-15-SP4-Online-x86_64-Build118.3-Media1.iso
    MIRROR_FTP: ftp://openqa.suse.de/SLE-15-SP4-Online-x86_64-Build118.3-Media1
    MIRROR_HTTP: http://openqa.suse.de/assets/repo/SLE-15-SP4-Online-x86_64-Build118.3-Media1
    MIRROR_HTTPS: https://openqa.suse.de/assets/repo/SLE-15-SP4-Online-x86_64-Build118.3-Media1
    MIRROR_NFS: nfs://openqa.suse.de/var/lib/openqa/share/factory/repo/SLE-15-SP4-Online-x86_64-Build118.3-Media1
    MIRROR_SMB: smb://openqa.suse.de/inst/SLE-15-SP4-Online-x86_64-Build118.3-Media1
    REPO_0: SLE-15-SP4-Online-x86_64-Build118.3-Media1
    REPO_10: SLE-15-SP4-Module-Basesystem-POOL-x86_64-Build118.3-Media2
    REPO_11: SLE-15-SP4-Module-Basesystem-POOL-x86_64-Build118.3-Media3
    REPO_12: SLE-15-SP4-Module-Desktop-Applications-POOL-x86_64-Build118.3-Media1
    REPO_13: SLE-15-SP4-Module-Development-Tools-POOL-x86_64-Build118.3-Media1
    REPO_14: SLE-15-SP4-Module-Legacy-POOL-x86_64-Build118.3-Media1
    REPO_15: SLE-15-SP4-Module-SAP-Applications-POOL-x86_64-Build118.3-Media1
    REPO_16: SLE-15-SP4-Module-Server-Applications-POOL-x86_64-Build118.3-Media1
    REPO_17: SLE-15-SP4-Module-Public-Cloud-POOL-x86_64-Build118.3-Media1
    REPO_18: SLE-15-SP4-Module-Web-Scripting-POOL-x86_64-Build118.3-Media1
    REPO_19: SLE-15-SP4-Module-Containers-POOL-x86_64-Build118.3-Media1
    REPO_20: SLE-15-SP4-Module-Live-Patching-POOL-x86_64-Build118.3-Media1
    REPO_21: SLE-15-SP4-Module-Transactional-Server-POOL-x86_64-Build118.3-Media1
    REPO_22: SLE-15-SP4-Module-Python3-POOL-x86_64-Build118.3-Media1
    REPO_23: SLE-15-SP4-Module-Packagehub-Subpackages-POOL-x86_64-Build118.3-Media1
    REPO_24: SLE-15-SP4-Module-HPC-POOL-x86_64-Build118.3-Media1
    REPO_25: SLE-15-SP4-Module-RT-POOL-x86_64-Build118.3-Media1
    REPO_26: SLE-15-SP4-Module-Certifications-POOL-x86_64-Build118.3-Media1
    REPO_27: SLE-15-SP4-Product-SLES-POOL-x86_64-Build118.3-Media1
    REPO_28: SLE-15-SP4-Product-SLES-POOL-x86_64-Build118.3-Media1.license
    REPO_29: SLE-15-SP4-Product-SLED-POOL-x86_64-Build118.3-Media1
    REPO_30: SLE-15-SP4-Product-SLED-POOL-x86_64-Build118.3-Media1.license
    REPO_31: SLE-15-SP4-Product-SLES_SAP-POOL-x86_64-Build118.3-Media1
    REPO_32: SLE-15-SP4-Product-SLES_SAP-POOL-x86_64-Build118.3-Media1.license
    REPO_33: SLE-15-SP4-Product-HPC-POOL-x86_64-Build118.3-Media1
    REPO_34: SLE-15-SP4-Product-HPC-POOL-x86_64-Build118.3-Media1.license
    REPO_35: SLE-15-SP4-Product-HPC-LTSS-POOL-x86_64-Build118.3-Media1
    REPO_36: SLE-15-SP4-Product-HPC-ESPOS-POOL-x86_64-Build118.3-Media1
    REPO_37: SLE-15-SP4-Product-RT-POOL-x86_64-Build118.3-Media1
    REPO_38: SLE-15-SP4-Product-RT-POOL-x86_64-Build118.3-Media1.license
    REPO_39: SLE-15-SP4-Product-WE-POOL-x86_64-Build118.3-Media1
    REPO_40: SLE-15-SP4-Product-WE-POOL-x86_64-Build118.3-Media1.license
    REPO_41: SLE-15-SP4-Product-HA-POOL-x86_64-Build118.3-Media1
    REPO_42: SLE-15-SP4-Product-HA-POOL-x86_64-Build118.3-Media1.license
    REPO_9: SLE-15-SP4-Module-Basesystem-POOL-x86_64-Build118.3-Media1
    REPO_SLE_MODULE_BASESYSTEM: SLE-15-SP4-Module-Basesystem-POOL-x86_64-Build118.3-Media1
    REPO_SLE_MODULE_BASESYSTEM_DEBUG: SLE-15-SP4-Module-Basesystem-POOL-x86_64-Build118.3-Media3
    REPO_SLE_MODULE_BASESYSTEM_DEBUG_PACKAGES: coreutils*,kernel-default*,selinux*,yast2-network*,yast2-http-server*
    REPO_SLE_MODULE_BASESYSTEM_SOURCE: SLE-15-SP4-Module-Basesystem-POOL-x86_64-Build118.3-Media2
    REPO_SLE_MODULE_BASESYSTEM_SOURCE_PACKAGES: java*,kernel-default*,selinux*,yast2-network*,yast2-http-server*
    REPO_SLE_MODULE_CERTIFICATIONS: SLE-15-SP4-Module-Certifications-POOL-x86_64-Build118.3-Media1
    REPO_SLE_MODULE_CONTAINERS: SLE-15-SP4-Module-Containers-POOL-x86_64-Build118.3-Media1
    REPO_SLE_MODULE_DESKTOP_APPLICATIONS: SLE-15-SP4-Module-Desktop-Applications-POOL-x86_64-Build118.3-Media1
    REPO_SLE_MODULE_DEVELOPMENT_TOOLS: SLE-15-SP4-Module-Development-Tools-POOL-x86_64-Build118.3-Media1
    REPO_SLE_MODULE_HPC: SLE-15-SP4-Module-HPC-POOL-x86_64-Build118.3-Media1
    REPO_SLE_MODULE_LEGACY: SLE-15-SP4-Module-Legacy-POOL-x86_64-Build118.3-Media1
    REPO_SLE_MODULE_LIVE_PATCHING: SLE-15-SP4-Module-Live-Patching-POOL-x86_64-Build118.3-Media1
    REPO_SLE_MODULE_PACKAGEHUB_SUBPACKAGES: SLE-15-SP4-Module-Packagehub-Subpackages-POOL-x86_64-Build118.3-Media1
    REPO_SLE_MODULE_PUBLIC_CLOUD: SLE-15-SP4-Module-Public-Cloud-POOL-x86_64-Build118.3-Media1
    REPO_SLE_MODULE_PYTHON3: SLE-15-SP4-Module-Python3-POOL-x86_64-Build118.3-Media1
    REPO_SLE_MODULE_RT: SLE-15-SP4-Module-RT-POOL-x86_64-Build118.3-Media1
    REPO_SLE_MODULE_SAP_APPLICATIONS: SLE-15-SP4-Module-SAP-Applications-POOL-x86_64-Build118.3-Media1
    REPO_SLE_MODULE_SERVER_APPLICATIONS: SLE-15-SP4-Module-Server-Applications-POOL-x86_64-Build118.3-Media1
    REPO_SLE_MODULE_TRANSACTIONAL_SERVER: SLE-15-SP4-Module-Transactional-Server-POOL-x86_64-Build118.3-Media1
    REPO_SLE_MODULE_WEB_SCRIPTING: SLE-15-SP4-Module-Web-Scripting-POOL-x86_64-Build118.3-Media1
    REPO_SLE_PRODUCT_HA: SLE-15-SP4-Product-HA-POOL-x86_64-Build118.3-Media1
    REPO_SLE_PRODUCT_HPC: SLE-15-SP4-Product-HPC-POOL-x86_64-Build118.3-Media1
    REPO_SLE_PRODUCT_HPC_ESPOS: SLE-15-SP4-Product-HPC-ESPOS-POOL-x86_64-Build118.3-Media1
    REPO_SLE_PRODUCT_HPC_LTSS: SLE-15-SP4-Product-HPC-LTSS-POOL-x86_64-Build118.3-Media1
    REPO_SLE_PRODUCT_RT: SLE-15-SP4-Product-RT-POOL-x86_64-Build118.3-Media1
    REPO_SLE_PRODUCT_SLED: SLE-15-SP4-Product-SLED-POOL-x86_64-Build118.3-Media1
    REPO_SLE_PRODUCT_SLES: SLE-15-SP4-Product-SLES-POOL-x86_64-Build118.3-Media1
    REPO_SLE_PRODUCT_SLES_SAP: SLE-15-SP4-Product-SLES_SAP-POOL-x86_64-Build118.3-Media1
    REPO_SLE_PRODUCT_WE: SLE-15-SP4-Product-WE-POOL-x86_64-Build118.3-Media1
    VERSION: 15-SP4
    _DEPRIORITIZEBUILD: '1'
attempts: 1
children: []
created: 2022-03-30T17:03:13.637306Z
delayed: 2022-03-30T17:03:13.637306Z
expires: 2022-03-30T17:13:13.637306Z
finished: 2022-03-30T17:03:45.927807Z
id: 4181330
lax: 0
notes:
  gru_id: 31295142
parents: []
priority: 10
queue: default
result:
  error: |
    DBIx::Class::Storage::DBI::_dbh_execute(): DBI Exception: DBD::Pg::st execute failed: ERROR:  duplicate key value violates unique constraint "assets_type_name"
    DETAIL:  Key (type, name)=(iso, SLE-15-SP4-Online-x86_64-Build118.3-Media1.iso) already exists. [for Statement "INSERT INTO assets (name, t_created, t_updated, type) VALUES (?, ?, ?, ?) RETURNING id" with ParamValues: 1='SLE-15-SP4-Online-x86_64-Build118.3-Media1.iso', 2='2022-03-30 17:03:37', 3='2022-03-30 17:03:37', 4='iso'] at /usr/share/openqa/script/../lib/OpenQA/Schema/ResultSet/Assets.pm line 33
retried: ~
retries: 0
started: 2022-03-30T17:03:37.057702Z
state: finished
task: schedule_iso
time: 2022-03-31T08:01:41.391308Z
worker: 685
Actions #2

Updated by okurz over 2 years ago

  • Project changed from 46 to openQA Project (public)
  • Subject changed from OSD is missing x86_64 jobs to OSD is missing x86_64 jobs duplicate key value violates unique constraint "assets_type_name" in lib/OpenQA/Schema/ResultSet/Assets.pm line 33 within find_or_create
  • Category set to Regressions/Crashes
  • Priority changed from Normal to Immediate
  • Target version set to Ready
Actions #3

Updated by mkittler over 2 years ago

I've been trying to re-schedule: https://openqa.suse.de/admin/productlog?id=891038

However, that doesn't seem to have worked. At least I couldn't find a corresponding job in the Minion dashboard.

Actions #4

Updated by mkittler over 2 years ago

Seems like that's the reason:

(This showed up on https://openqa.suse.de/admin/productlog?id=889867 after a while when clicking the retry button. However, the fact that re-triggering is broken is likely unrelated.)

Actions #5

Updated by mkittler over 2 years ago

I cannot reproduce the issue with sudo -u geekotest /usr/share/openqa/script/openqa eval -V 'app->schema->resultset("Assets")->register("iso", "SLE-15-SP4-Online-x86_64-Build118.3-Media1.iso", {missing_ok => 1})->id' on OSD (because that returns the expected ID).

Actions #6

Updated by mkittler over 2 years ago

  • Status changed from New to In Progress
  • Assignee set to mkittler

Despite the timeout the re-triggering actually worked: https://openqa.suse.de/admin/productlog?id=891038

Apparently we don't re-trigger asynchronously - at least that would explain the absence of a Minion job and the timeout.

The error is actually gone now and jobs have been scheduled. There are still errors regarding dependencies you might want to look into.

Was this the only problematic scheduled product? I suppose I'll have to check for further occurrences. Likely the problem is that the same ISO is used in different scheduled products, the scheduling took place concurrently, and DBIx::Class's find_or_create is not atomic.
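
To illustrate the suspected race, here is a minimal sketch in Python with an in-memory SQLite table standing in for the assets table. It is not the openQA code and not the fix that was deployed. The helper names (make_db, naive_find_or_create, register_asset) and the before_insert hook, which simulates a second scheduler inserting between the lookup and the insert, are assumptions made for this example.

```python
import sqlite3

SELECT = "SELECT id FROM assets WHERE type = ? AND name = ?"
INSERT = "INSERT INTO assets (type, name) VALUES (?, ?)"


def make_db():
    """In-memory stand-in for the assets table with its unique (type, name) key."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE assets (id INTEGER PRIMARY KEY, type TEXT, name TEXT,"
        " CONSTRAINT assets_type_name UNIQUE (type, name))"
    )
    return conn


def naive_find_or_create(conn, asset_type, name, before_insert=None):
    """Look up first, insert second: not atomic."""
    row = conn.execute(SELECT, (asset_type, name)).fetchone()
    if row:
        return row[0]
    if before_insert:
        before_insert()  # window in which another scheduler may insert the same key
    with conn:  # commits on success, rolls back on error
        return conn.execute(INSERT, (asset_type, name)).lastrowid


def register_asset(conn, asset_type, name):
    """Insert first; on a unique-constraint violation fall back to the existing row."""
    try:
        with conn:
            return conn.execute(INSERT, (asset_type, name)).lastrowid
    except sqlite3.IntegrityError:
        # Somebody else created the row in the meantime, so it must exist now.
        return conn.execute(SELECT, (asset_type, name)).fetchone()[0]
```

With the naive variant, an insert that lands between the lookup and the INSERT makes the second INSERT fail with a unique-constraint error, which matches the shape of the error in the Minion job above. The second variant treats that error as "the row already exists" and returns the existing ID instead.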

Actions #7

Updated by okurz over 2 years ago

mkittler wrote:

Apparently we don't re-trigger is async

english?

Actions #10

Updated by mkittler over 2 years ago

  • Related to action #35749: Ignore insert errors in limit_assets added
Actions #11

Updated by mkittler over 2 years ago

  • Status changed from In Progress to Feedback
Actions #12

Updated by okurz over 2 years ago

  • Priority changed from Immediate to High

deployed on OSD

I guess another problem is that the trigger script does not catch the error in the async output. When switching from synchronous to asynchronous triggering I suggested to anikitin that we await a successful confirmation in the asynchronous message, but that has not been done yet. I would appreciate it if you could look into a solution for that as well. Ideally we avoid polling the openQA API to see whether the scheduled product succeeded. I am sure we can find better ideas than that.

Actions #13

Updated by mkittler over 2 years ago

The fix has been deployed on 02.04.22 07:23 CEST on OSD. I'll use select id, t_created, t_updated as error from scheduled_products where results ->> 'error' like '%unique constraint%' order by id desc limit 10; in a few days to check whether the problem is fixed. So far there's no further occurrence.

I'm afraid polling is the best we can do (without adding/changing existing routes in openQA and risking timeouts), e.g.:

error=$(openqa-cli api --osd --pretty isos/890824 | jq '.results | .error')
[[ $error != null ]] && echo "handle error"
Actions #14

Updated by mkittler over 2 years ago

I did some digging and I suppose the relevant commit/repository/place is https://github.com/os-autoinst/openqa-trigger-from-obs/commit/1d575e09c3b6192196728fee50c983079786671a. Unfortunately that's a Bash-for-loop generated by a Bash script generated by Python code.

I suppose I could let it generate something like:

product_id=$(openqa-cli api -X post isos?async=1 … | jq '.scheduled_product_id')
timeout=${timeout:-480}
poll_interval=${poll_interval:-10}
response=$(openqa-cli api --osd "isos/$product_id")
status=$(echo "$response" | jq -r '.status')
# poll until the scheduled product has been processed or the timeout is exceeded
while [[ $status != scheduled ]]; do
  timeout=$((timeout - poll_interval))
  if [[ $timeout -le 0 ]]; then
    echo "timeout exceeded when waiting for scheduled product $product_id"
    exit 1
  fi
  sleep "$poll_interval"
  response=$(openqa-cli api --osd "isos/$product_id")
  status=$(echo "$response" | jq -r '.status')
done
# the product has been processed; report a scheduling error if there is one
error=$(echo "$response" | jq '.results | .error')
if [[ $error != null ]]; then
  echo "unable to schedule product $product_id: $error"
  exit 1
fi
Actions #15

Updated by mkittler over 2 years ago

I tried something but it isn't working as expected. Not sure whether I should try harder. Considering all the escaping and no good way of testing it, it doesn't seem a worthwhile improvement. So I'd rather consider this ticket done if the error is fixed on the openQA-side.

Actions #16

Updated by okurz over 2 years ago

Then I suggest we discuss in the team what we can do. If you are saying that https://github.com/os-autoinst/openqa-trigger-from-obs is, in this area, both broken by design and not fixable, then we would have a bigger problem, right? :)

EDIT: Discussed in daily 2022-04-06 and we agreed that we should look into a new subcommand for openqa-cli which does the "trigger asynchronously & await results". Any 3rd party tooling including openqa-trigger-from-obs can use that new command then to report back when the scheduling yields errors.
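
As a rough illustration of what "trigger asynchronously & await results" could look like as reusable logic, here is a minimal Python sketch. It is not the openqa-cli implementation. The function name await_scheduled_product and its fetch/sleep parameters are hypothetical. It only assumes, as in the Bash loop earlier in this ticket, that the scheduled product has a status field that becomes "scheduled" and an optional results.error field.

```python
import time


def await_scheduled_product(fetch, timeout=480, poll_interval=10, sleep=time.sleep):
    """Poll fetch() until the product is scheduled.

    fetch is any callable returning the scheduled product as a dict, e.g. the
    decoded JSON of a GET on isos/<id> (hypothetical wiring, not shown here).
    Returns the scheduling error, or None if there was none.
    Raises TimeoutError if the status never becomes "scheduled" in time.
    """
    waited = 0
    while True:
        product = fetch()
        if product.get("status") == "scheduled":
            # results may be missing or null when nothing went wrong
            return (product.get("results") or {}).get("error")
        if waited >= timeout:
            raise TimeoutError(f"scheduled product not ready after {waited} s")
        sleep(poll_interval)
        waited += poll_interval
```

Passing fetch and sleep in as parameters keeps the waiting logic testable without a running openQA instance. A caller would exit non-zero when the returned error is not None, so that third-party tooling can report scheduling failures.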

Actions #17

Updated by mkittler over 2 years ago

Well, I'm at least saying that its design makes it hard to make the iso posting a bit more sophisticated. However, I suppose my second attempt would be creating a distinct script to do the API call (including the waiting) and only invoke that in the generated scripts. But I'm also not sure whether that would be the best way and how to ensure that the wrapper script is present in the context in which it is called.

Actions #18

Updated by livdywan over 2 years ago

mkittler wrote:

Well, I'm at least saying that its design makes it hard to make the iso posting a bit more sophisticated. However, I suppose my second attempt would be creating a distinct script to do the API call (including the waiting) and only invoke that in the generated scripts. But I'm also not sure whether that would be the best way and how to ensure that the wrapper script is present in the context in which it is called.

Discussed it briefly. We should have a follow-up ticket for a new CLI command for this.

Actions #19

Updated by mkittler over 2 years ago

  • Related to action #109560: Add openqa-cli sub-command for async scheduling and keeping track of the result added
Actions #20

Updated by mkittler over 2 years ago

I've created the ticket and so far select id, t_created, t_updated as error from scheduled_products where t_created >= '2022-04-02' and results ->> 'error' like '%unique constraint%' order by id desc limit 10; doesn't return any results.

Actions #21

Updated by ybonatakis over 2 years ago

  • Status changed from Feedback to Resolved

Resolved as there is no problem with the latest build.
