action #57143

closed

[YAML] Editor does not check if same combination of test suite/arch/flavor/version already used in different job group

Added by asmorodskyi over 4 years ago. Updated over 4 years ago.

Status:
Resolved
Priority:
High
Assignee:
Category:
Regressions/Crashes
Target version:
Start date:
2019-09-20
Due date:
% Done:

0%

Estimated time:

Description

In the legacy Job Group Editor you could not add a second combination of the same test suite + arch + flavor + version, not even in a different job group. In that case you got an SQL error (see the related ticket).

In the YAML Job Group editor you can have the same 'test suite + arch + flavor + version' combination twice in different job groups. E.g. on OSD we currently have:

defaults:
  x86_64:
    machine: 64bit
    priority: 50
products:
  sle-15-SP2-Installer-DVD-x86_64:
    distri: sle
    flavor: Installer-DVD
    version: 15-SP2
scenarios:
  x86_64:
    sle-15-SP2-Installer-DVD-x86_64:
    - wicked_basic_sut
    - wicked_advanced_ref
    - wicked_advanced_sut
    - wicked_startandstop_sut
    - wicked_startandstop_ref
    - wicked_basic_ref

and

defaults:
  aarch64:
    machine: aarch64
    priority: 50
  x86_64:
    machine: 64bit
    priority: 50
products:
  sle-15-SP2-Installer-DVD-aarch64: &products
    distri: sle
    flavor: Installer-DVD
    version: 15-SP2
  sle-15-SP2-Installer-DVD-x86_64:
    *products
scenarios:
  aarch64:
    sle-15-SP2-Installer-DVD-aarch64: &tests
    - wicked_basic_sut: &general_settings
        settings:
          DESKTOP: textmode
          EXTRATEST: wicked
          KEEP_GRUB_TIMEOUT: '1'
          VIDEOMODE: text
          WICKED_TCPDUMP: '1'
          VIRTIO_CONSOLE_NUM: '2'
    - wicked_advanced_ref:
        *general_settings
    - wicked_advanced_sut:
        *general_settings
    - wicked_startandstop_sut:
        *general_settings
    - wicked_startandstop_ref:
        *general_settings
    - wicked_basic_ref:
        *general_settings
    - wicked_aggregate_sut:
        *general_settings
    - wicked_aggregate_ref:
        *general_settings
    - create_hdd_autoyast_wicked:
        settings:
          AUTOYAST: autoyast_sle15/autoyast_wicked_%ARCH%.xml
        priority: 45
  x86_64:
    sle-15-SP2-Installer-DVD-x86_64:
      *tests

in the job groups 117 (SLE 15 Development -> Network) and 262 (SLE 15 -> Network).

This leads to the job being scheduled twice when posting a new ISO. Considering that the job templates actually exist twice, this is expected behavior. The question is whether we want to allow the same 'test suite + arch + flavor + version' combination in different job groups.
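The kind of cross-group check in question could look roughly like the sketch below. It is not the openQA implementation: the function name and the data layout are made up for illustration, the job group definitions are assumed to be already parsed into plain tuples, and the sample rows are a hand-written subset whose assignment to groups 117 and 262 is arbitrary.

```python
from collections import defaultdict


def find_shared_combinations(job_groups):
    """Return combinations that are used by more than one job group.

    job_groups maps a job group id to an iterable of
    (test_suite, arch, flavor, version) tuples (hypothetical layout).
    """
    groups_by_combo = defaultdict(set)
    for group_id, combos in job_groups.items():
        for combo in combos:
            groups_by_combo[combo].add(group_id)
    # Keep only combinations that show up in at least two different groups
    return {
        combo: sorted(group_ids)
        for combo, group_ids in groups_by_combo.items()
        if len(group_ids) > 1
    }


# Illustrative data only, loosely based on the YAML above
job_groups = {
    117: [
        ("wicked_basic_sut", "x86_64", "Installer-DVD", "15-SP2"),
        ("wicked_basic_ref", "x86_64", "Installer-DVD", "15-SP2"),
    ],
    262: [
        ("wicked_basic_sut", "x86_64", "Installer-DVD", "15-SP2"),
        ("wicked_basic_sut", "aarch64", "Installer-DVD", "15-SP2"),
    ],
}

shared = find_shared_combinations(job_groups)
for combo, group_ids in shared.items():
    print(combo, "is used in job groups", group_ids)
```

Every combination reported this way corresponds to a job that would be scheduled once per listed group when a new ISO is posted.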

(Note that the ticket from @asmorodskyi implies that the new editor should check this as the old editor did. I personally would leave this open for discussion.)


Related issues 1 (0 open, 1 closed)

Related to openQA Project - action #15192: [tools]DB exception popup while trying to add Test Suite with same name (Resolved, mkittler, 2016-12-01)

Actions #1

Updated by mkittler over 4 years ago

  • Related to action #15192: [tools]DB exception popup while trying to add Test Suite with same name added
Actions #2

Updated by mkittler over 4 years ago

  • Subject changed from [YAML] Editor does not check if such combination of test suite/arch/flavor/version already in use to [YAML] Editor does not check if same combination of test suite/arch/flavor/version already used in different job group
  • Description updated (diff)
Actions #3

Updated by livdywan over 4 years ago

  • So we have add_unique_constraint([qw(product_id machine_id name test_suite_id)]) in the code right now.
  • When updating job templates we do $schema->resultset('JobTemplates')->find_or_create() which fails on non-unique combinations of the above within the same group.
  • There's no error from the database for different job groups
Actions #4

Updated by livdywan over 4 years ago

  • Status changed from New to In Progress
  • Assignee set to livdywan

I'm investigating this now. If the above doesn't make sense, please ignore. These were just quick notes to keep a record of the investigation.

Actions #6

Updated by livdywan over 4 years ago

  • Status changed from In Progress to Resolved
Actions #7

Updated by livdywan over 4 years ago

The fix I just merged enforces the correct checks for unique combinations across groups.

Note that no immediate changes will result from that when it's deployed, but the editor will require affected groups to be updated the next time they're modified.

Actions #8

Updated by okurz over 4 years ago

  • Category set to Regressions/Crashes
  • Status changed from Resolved to Feedback

It seems that on o3 the migration did not show any problems (at least I am not aware of any), but on OSD we had:

Sep 27 07:34:32 openqa systemd[1]: Started The openQA web UI.
Sep 27 07:34:35 openqa openqa[27056]: failed to run SQL in /usr/share/openqa/script/../dbicdh/PostgreSQL/upgrade/81-82/001-update.sql: DBIx::Class::DeploymentHandler::DeployMethod::SQL::Translator::try {...} (): DBI Exceptio>
Sep 27 07:34:35 openqa openqa[27056]: DETAIL:  Key (product_id, machine_id, name, test_suite_id)=(339, 60, , 1652) already exists. at inline delegation in DBIx::Class::DeploymentHandler for deploy_method->upgrade_single_step>
Sep 27 07:34:35 openqa openqa[27056]:  (running line 'update job_templates set name='' where name is null') at /usr/lib/perl5/vendor_perl/5.26.1/DBIx/Class/DeploymentHandler/DeployMethod/SQL/Translator.pm line 248.
Sep 27 07:34:35 openqa openqa[27056]: DBIx::Class::Storage::TxnScopeGuard::DESTROY(): A DBIx::Class::Storage::TxnScopeGuard went out of scope without explicit commit or error. Rolling back. at /usr/share/openqa/script/openqa>
Sep 27 07:34:35 openqa systemd[1]: openqa-webui.service: Main process exited, code=exited, status=255/n/a
Sep 27 07:34:35 openqa systemd[1]: openqa-webui.service: Unit entered failed state.
Sep 27 07:34:35 openqa systemd[1]: openqa-webui.service: Failed with result 'exit-code'.
Sep 27 07:35:32 openqa systemd[1]: Started The openQA web UI.
Sep 27 07:35:33 openqa systemd[1]: Stopping The openQA web UI...
Sep 27 07:35:33 openqa systemd[1]: Stopped The openQA web UI.
Sep 27 07:35:33 openqa systemd[1]: Started The openQA web UI.
Sep 27 07:35:35 openqa openqa[27391]: failed to run SQL in /usr/share/openqa/script/../dbicdh/PostgreSQL/upgrade/81-82/002-auto.sql: DBIx::Class::DeploymentHandler::DeployMethod::SQL::Translator::try {...} (): DBI Exception:>
Sep 27 07:35:35 openqa openqa[27391]:  (running line 'ALTER TABLE job_templates ALTER COLUMN name SET NOT NULL') at /usr/lib/perl5/vendor_perl/5.26.1/DBIx/Class/DeploymentHandler/DeployMethod/SQL/Translator.pm line 248.
Sep 27 07:35:35 openqa openqa[27391]: DBIx::Class::Storage::TxnScopeGuard::DESTROY(): A DBIx::Class::Storage::TxnScopeGuard went out of scope without explicit commit or error. Rolling back. at /usr/share/openqa/script/openqa>
Sep 27 07:35:35 openqa openqa[27391]: DBIx::Class::Storage::TxnScopeGuard::DESTROY(): A DBIx::Class::Storage::TxnScopeGuard went out of scope without explicit commit or error. Rolling back. at /usr/share/openqa/script/openqa>
Sep 27 07:35:35 openqa systemd[1]: openqa-webui.service: Main process exited, code=exited, status=255/n/a

I manually disabled the two files to bring the web UI up again. Please also provide a fix for the running instance on OSD.
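One possible reading of the log, stated here as an assumption and not something confirmed in this ticket: rows with a NULL name never collide under a unique constraint, because SQL treats NULLs as distinct from each other, so the duplicates only surface once the migration rewrites NULL to ''. The sketch below reproduces that effect with Python's sqlite3 module standing in for PostgreSQL. The table is reduced to the columns named in the constraint plus a group_id column, and the group ids 1 and 2 are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Reduced, hypothetical version of the job_templates table
conn.execute(
    """
    CREATE TABLE job_templates (
        id INTEGER PRIMARY KEY,
        group_id INTEGER NOT NULL,
        product_id INTEGER NOT NULL,
        machine_id INTEGER NOT NULL,
        name TEXT,
        test_suite_id INTEGER NOT NULL,
        UNIQUE (product_id, machine_id, name, test_suite_id)
    )
    """
)

# Same product/machine/test suite in two groups, both with name NULL.
# The unique constraint accepts both rows because NULLs count as distinct.
rows = [(1, 339, 60, None, 1652), (2, 339, 60, None, 1652)]
conn.executemany(
    "INSERT INTO job_templates"
    " (group_id, product_id, machine_id, name, test_suite_id)"
    " VALUES (?, ?, ?, ?, ?)",
    rows,
)
inserted = conn.execute("SELECT COUNT(*) FROM job_templates").fetchone()[0]

# Same statement as in the first failing migration step
try:
    conn.execute("UPDATE job_templates SET name = '' WHERE name IS NULL")
    migration_error = None
except sqlite3.IntegrityError as exc:
    migration_error = str(exc)

print("rows inserted:", inserted)
print("migration error:", migration_error)
```

If this reading is right, the second failure (SET NOT NULL) would simply follow from the first step having been rolled back, which left the NULL names in place.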

Actions #9

Updated by coolo over 4 years ago

  • Assignee changed from livdywan to okurz

There is no code fix to be provided. The admin of the site will have to sort out the duplicates and decide which one wins. And as you decided to deploy on a Friday morning, that would be you.

Actions #10

Updated by coolo over 4 years ago

On a Friday morning you're reportedly on vacation - I forgot to add

Actions #11

Updated by okurz over 4 years ago

Half-day vacation, same as for the whole week as well as next week. Thank you for encouraging the team to take more responsibility ;)

Actions #12

Updated by okurz over 4 years ago

  • Status changed from Feedback to Resolved

coolo wrote:

On a Friday morning you're reportedly on vacation - I forgot to add

Half-day vacation, same as for the whole week as well as next week. Thank you for encouraging the team to take more responsibility ;)

Note that neither the ticket nor the PR states that the final version did not include an automatic remedy. Also, I would have expected the openQA instance to have been updated accordingly upfront.

I checked the database manually and resolved the duplicate job templates in job groups. These were mainly some lvm and cryptlvm scenarios that were defined both in the production YaST job groups and in test development, so I simply deleted them from the test development job groups. Then I ran the content of the migration scripts again to apply the changes.
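For other admins facing the same cleanup, a query along these lines can list the colliding rows before the migration is retried. This is only a sketch: it uses Python's sqlite3 with a reduced, hypothetical job_templates table and made-up group ids, and it is not the query that was run on OSD. GROUP_CONCAT is the SQLite spelling; on PostgreSQL the equivalent aggregate is string_agg.

```python
import sqlite3

# Rows that would collide once NULL names are rewritten to ''
DUPLICATES_QUERY = """
    SELECT product_id, machine_id, test_suite_id,
           COUNT(*) AS copies,
           GROUP_CONCAT(group_id) AS group_ids
    FROM job_templates
    WHERE name IS NULL OR name = ''
    GROUP BY product_id, machine_id, test_suite_id
    HAVING COUNT(*) > 1
"""


def find_duplicates(conn):
    """Return one row per duplicated product/machine/test suite combination."""
    return conn.execute(DUPLICATES_QUERY).fetchall()


conn = sqlite3.connect(":memory:")
# Reduced, hypothetical schema; the real table has more columns
conn.execute(
    """
    CREATE TABLE job_templates (
        id INTEGER PRIMARY KEY,
        group_id INTEGER NOT NULL,
        product_id INTEGER NOT NULL,
        machine_id INTEGER NOT NULL,
        name TEXT,
        test_suite_id INTEGER NOT NULL
    )
    """
)
conn.executemany(
    "INSERT INTO job_templates"
    " (group_id, product_id, machine_id, name, test_suite_id)"
    " VALUES (?, ?, ?, ?, ?)",
    [
        (1, 339, 60, None, 1652),  # duplicated in group 2
        (2, 339, 60, None, 1652),
        (2, 339, 60, None, 1700),  # only defined once
    ],
)

duplicates = find_duplicates(conn)
for product_id, machine_id, test_suite_id, copies, group_ids in duplicates:
    print(product_id, machine_id, test_suite_id, "appears", copies, "times in groups", group_ids)
```

Which copy to delete remains a manual decision, as in the cleanup described above.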

Actions #13

Updated by mkittler over 4 years ago

  • Category deleted (Regressions/Crashes)
  • Status changed from Resolved to In Progress
  • Assignee changed from okurz to livdywan

okurz wrote:

Note that neither the ticket nor the PR states that the final version did not include an automatic remedy.

@okurz I mentioned the problem: "I'm wondering whether it is worth/required to add a migration for detecting job templates which are wrongly "shared" between job groups." (https://github.com/os-autoinst/openQA/pull/2345#pullrequestreview-292266649-body-html)

okurz wrote:

I checked the database manually and resolved the duplicate job templates in job groups. These were mainly some lvm and cryptlvm scenarios that were defined both in the production YaST job groups and in test development, so I simply deleted them from the test development job groups. Then I ran the content of the migration scripts again to apply the changes.

Thanks. I guess then we can save the work of implementing an automatic database migration for this.


This migration should have been tested with recent OSD data (like almost every migration).

Actions #14

Updated by okurz over 4 years ago

  • Category set to Regressions/Crashes
  • Status changed from In Progress to Resolved
  • Target version set to Current Sprint

@mkittler I think you have reopened the ticket by mistake because you even removed the category.
