action #57143

[YAML] Editor does not check if same combination of test suite/arch/flavor/version already used in different job group

Added by asmorodskyi 10 months ago. Updated 10 months ago.

Status: Resolved
Priority: High
Assignee:
Category: Concrete Bugs
Target version:
Start date: 2019-09-20
Due date:
% Done: 0%
Estimated time:
Difficulty:
Duration:

Description

In the legacy Job Group Editor you could not add a second combination of the same test suite + arch + flavor + version. This was not possible even if the job group was different; in that case you got an SQL error (see the related ticket).

In the YAML Job Group editor you can have the same 'test suite + arch + flavor + version' combination twice in different job groups. E.g. on OSD we currently have:

defaults:
  x86_64:
    machine: 64bit
    priority: 50
products:
  sle-15-SP2-Installer-DVD-x86_64:
    distri: sle
    flavor: Installer-DVD
    version: 15-SP2
scenarios:
  x86_64:
    sle-15-SP2-Installer-DVD-x86_64:
    - wicked_basic_sut
    - wicked_advanced_ref
    - wicked_advanced_sut
    - wicked_startandstop_sut
    - wicked_startandstop_ref
    - wicked_basic_ref

and

defaults:
  aarch64:
    machine: aarch64
    priority: 50
  x86_64:
    machine: 64bit
    priority: 50
products:
  sle-15-SP2-Installer-DVD-aarch64: &products
    distri: sle
    flavor: Installer-DVD
    version: 15-SP2
  sle-15-SP2-Installer-DVD-x86_64:
    *products
scenarios:
  aarch64:
    sle-15-SP2-Installer-DVD-aarch64: &tests
    - wicked_basic_sut: &general_settings
        settings:
          DESKTOP: textmode
          EXTRATEST: wicked
          KEEP_GRUB_TIMEOUT: '1'
          VIDEOMODE: text
          WICKED_TCPDUMP: '1'
          VIRTIO_CONSOLE_NUM: '2'
    - wicked_advanced_ref:
        *general_settings
    - wicked_advanced_sut:
        *general_settings
    - wicked_startandstop_sut:
        *general_settings
    - wicked_startandstop_ref:
        *general_settings
    - wicked_basic_ref:
        *general_settings
    - wicked_aggregate_sut:
        *general_settings
    - wicked_aggregate_ref:
        *general_settings
    - create_hdd_autoyast_wicked:
        settings:
          AUTOYAST: autoyast_sle15/autoyast_wicked_%ARCH%.xml
        priority: 45
  x86_64:
    sle-15-SP2-Installer-DVD-x86_64:
      *tests

in the job groups 117 (SLE 15 Development -> Network) and 262 (SLE 15 -> Network).

This leads to the job being scheduled twice when posting a new ISO. Considering that the job templates actually exist twice, this is expected behavior. The question is whether we want to allow the same 'test suite + arch + flavor + version' combination in different job groups.
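To make the overlap concrete, here is a minimal sketch of how such shared combinations could be detected from two already-parsed templates. It is not openQA code: the function names `combinations` and `shared_combinations` are made up for illustration, the templates are trimmed versions of the ones quoted above, and which template belongs to group 117 and which to 262 is an arbitrary assumption, since the ticket does not say.

```python
from typing import Any, Dict, List, Set, Tuple

Combo = Tuple[str, str, str, str]  # (test suite, arch, flavor, version)


def combinations(template: Dict[str, Any]) -> Set[Combo]:
    """Collect every combination that one parsed job group template defines."""
    found: Set[Combo] = set()
    for arch, products in template["scenarios"].items():
        for product_name, entries in products.items():
            product = template["products"][product_name]
            for entry in entries:
                # An entry is either a bare test suite name or a
                # single-key mapping of name -> per-job settings.
                suite = entry if isinstance(entry, str) else next(iter(entry))
                found.add((suite, arch, product["flavor"], product["version"]))
    return found


def shared_combinations(groups: Dict[int, Dict[str, Any]]) -> Dict[Combo, List[int]]:
    """Map each combination used by more than one group to those group IDs."""
    owners: Dict[Combo, List[int]] = {}
    for group_id, template in groups.items():
        for combo in combinations(template):
            owners.setdefault(combo, []).append(group_id)
    return {c: sorted(ids) for c, ids in owners.items() if len(ids) > 1}


# Trimmed-down templates; the group ID assignment is an assumption.
product = {"distri": "sle", "flavor": "Installer-DVD", "version": "15-SP2"}
first_template = {
    "products": {"sle-15-SP2-Installer-DVD-x86_64": product},
    "scenarios": {"x86_64": {"sle-15-SP2-Installer-DVD-x86_64": [
        "wicked_basic_sut", "wicked_basic_ref"]}},
}
general_settings = {"settings": {"DESKTOP": "textmode", "EXTRATEST": "wicked"}}
second_template = {
    "products": {"sle-15-SP2-Installer-DVD-aarch64": product,
                 "sle-15-SP2-Installer-DVD-x86_64": product},
    "scenarios": {
        "aarch64": {"sle-15-SP2-Installer-DVD-aarch64": [
            {"wicked_basic_sut": general_settings}]},
        "x86_64": {"sle-15-SP2-Installer-DVD-x86_64": [
            {"wicked_basic_sut": general_settings}]},
    },
}

duplicates = shared_combinations({117: first_template, 262: second_template})
for combo, group_ids in sorted(duplicates.items()):
    print(combo, "is defined in job groups", group_ids)
```

In this trimmed example only `wicked_basic_sut` on x86_64 is reported, because `wicked_basic_ref` appears in just one of the two templates and the aarch64 job has no counterpart.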

(Note that the ticket from asmorodskyi implies that the new editor should check this as the old editor did. I personally would leave this open for discussion.)


Related issues

Related to openQA Project - action #15192: [tools]DB exception popup while trying to add Test Suite with same name (Resolved, 2016-12-01)

History

#1 Updated by mkittler 10 months ago

  • Related to action #15192: [tools]DB exception popup while trying to add Test Suite with same name added

#2 Updated by mkittler 10 months ago

  • Subject changed from [YAML] Editor does not check if such combination of test suite/arch/flavor/version already in use to [YAML] Editor does not check if same combination of test suite/arch/flavor/version already used in different job group
  • Description updated (diff)

#3 Updated by cdywan 10 months ago

  • So we have add_unique_constraint([qw(product_id machine_id name test_suite_id)]) in the code right now.
  • When updating job templates we do $schema->resultset('JobTemplates')->find_or_create() which fails on non-unique combinations of the above within the same group.
  • There's no error from the database for different job groups

#4 Updated by cdywan 10 months ago

  • Status changed from New to In Progress
  • Assignee set to cdywan

I'm investigating this now. If the above doesn't make sense, please ignore. These were just quick notes to keep a record of the investigation.

#6 Updated by cdywan 10 months ago

  • Status changed from In Progress to Resolved

#7 Updated by cdywan 10 months ago

The fix I just merged enforces the correct checks for unique combinations across groups.

Note that no immediate changes will result from that when it's deployed, but the editor will require affected groups to be updated the next time they're modified.

#8 Updated by okurz 10 months ago

  • Category set to Concrete Bugs
  • Status changed from Resolved to Feedback

It seems that on o3 the migration did not show any problems (at least I am not aware of any and have not seen any), but on OSD we had:

Sep 27 07:34:32 openqa systemd[1]: Started The openQA web UI.
Sep 27 07:34:35 openqa openqa[27056]: failed to run SQL in /usr/share/openqa/script/../dbicdh/PostgreSQL/upgrade/81-82/001-update.sql: DBIx::Class::DeploymentHandler::DeployMethod::SQL::Translator::try {...} (): DBI Exceptio>
Sep 27 07:34:35 openqa openqa[27056]: DETAIL:  Key (product_id, machine_id, name, test_suite_id)=(339, 60, , 1652) already exists. at inline delegation in DBIx::Class::DeploymentHandler for deploy_method->upgrade_single_step>
Sep 27 07:34:35 openqa openqa[27056]:  (running line 'update job_templates set name='' where name is null') at /usr/lib/perl5/vendor_perl/5.26.1/DBIx/Class/DeploymentHandler/DeployMethod/SQL/Translator.pm line 248.
Sep 27 07:34:35 openqa openqa[27056]: DBIx::Class::Storage::TxnScopeGuard::DESTROY(): A DBIx::Class::Storage::TxnScopeGuard went out of scope without explicit commit or error. Rolling back. at /usr/share/openqa/script/openqa>
Sep 27 07:34:35 openqa systemd[1]: openqa-webui.service: Main process exited, code=exited, status=255/n/a
Sep 27 07:34:35 openqa systemd[1]: openqa-webui.service: Unit entered failed state.
Sep 27 07:34:35 openqa systemd[1]: openqa-webui.service: Failed with result 'exit-code'.
Sep 27 07:35:32 openqa systemd[1]: Started The openQA web UI.
Sep 27 07:35:33 openqa systemd[1]: Stopping The openQA web UI...
Sep 27 07:35:33 openqa systemd[1]: Stopped The openQA web UI.
Sep 27 07:35:33 openqa systemd[1]: Started The openQA web UI.
Sep 27 07:35:35 openqa openqa[27391]: failed to run SQL in /usr/share/openqa/script/../dbicdh/PostgreSQL/upgrade/81-82/002-auto.sql: DBIx::Class::DeploymentHandler::DeployMethod::SQL::Translator::try {...} (): DBI Exception:>
Sep 27 07:35:35 openqa openqa[27391]:  (running line 'ALTER TABLE job_templates ALTER COLUMN name SET NOT NULL') at /usr/lib/perl5/vendor_perl/5.26.1/DBIx/Class/DeploymentHandler/DeployMethod/SQL/Translator.pm line 248.
Sep 27 07:35:35 openqa openqa[27391]: DBIx::Class::Storage::TxnScopeGuard::DESTROY(): A DBIx::Class::Storage::TxnScopeGuard went out of scope without explicit commit or error. Rolling back. at /usr/share/openqa/script/openqa>
Sep 27 07:35:35 openqa openqa[27391]: DBIx::Class::Storage::TxnScopeGuard::DESTROY(): A DBIx::Class::Storage::TxnScopeGuard went out of scope without explicit commit or error. Rolling back. at /usr/share/openqa/script/openqa>
Sep 27 07:35:35 openqa systemd[1]: openqa-webui.service: Main process exited, code=exited, status=255/n/a

I manually disabled the two files to bring the web UI up again. Please also provide a fix for the running instance on OSD.
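One plausible reading of the first failure in the log: while `name` was still NULL, the unique constraint on (product_id, machine_id, name, test_suite_id) did not treat two otherwise identical rows as equal, because NULL values count as distinct in a unique constraint. Setting `name` to '' makes such rows collide. The sketch below reproduces that effect with SQLite standing in for PostgreSQL. The table is a simplified, hypothetical version of `job_templates`, the group IDs 1 and 2 are invented, and only the key values are taken from the log line.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE job_templates (
        id INTEGER PRIMARY KEY,
        group_id INTEGER NOT NULL,
        product_id INTEGER NOT NULL,
        machine_id INTEGER NOT NULL,
        name TEXT,                       -- still nullable before the migration
        test_suite_id INTEGER NOT NULL,
        UNIQUE (product_id, machine_id, name, test_suite_id)
    )
""")

insert = (
    "INSERT INTO job_templates "
    "(group_id, product_id, machine_id, name, test_suite_id) "
    "VALUES (?, 339, 60, NULL, 1652)"
)
conn.execute(insert, (1,))
conn.execute(insert, (2,))  # accepted: the two NULL names are not considered equal
rows_before = conn.execute("SELECT COUNT(*) FROM job_templates").fetchone()[0]

try:
    # Same statement as the failing line of the upgrade script in the log.
    conn.execute("UPDATE job_templates SET name = '' WHERE name IS NULL")
    migration_error = None
except sqlite3.IntegrityError as exc:
    migration_error = str(exc)

print("rows inserted:", rows_before)
print("migration error:", migration_error)
```

Both inserts succeed, and only the update fails. That matches the observation in note #3 that the database raised no error for the same combination in different job groups.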

#9 Updated by coolo 10 months ago

  • Assignee changed from cdywan to okurz

There is no code fix to be provided: the admin of the site will have to sort out the duplicates and decide which one wins. And as you decided to deploy on a Friday morning, that would be you.

#10 Updated by coolo 10 months ago

On a friday morning you're reportedly on vacation - I forgot to add

#11 Updated by okurz 10 months ago

half-day vacation, same as for the whole week as well as next week. Thank you for encouraging the team to take more responsibility ;)

#12 Updated by okurz 10 months ago

  • Status changed from Feedback to Resolved

coolo wrote:

On a friday morning you're reportedly on vacation - I forgot to add

half-day vacation, same as for the whole week as well as next week. Thank you for encouraging the team to take more responsibility ;)

Note that neither the ticket nor the PR state that the final version did not include an automatic remedy. Also I would have expected that the openQA instance would have been updated accordingly upfront.

I checked the database manually and prevented duplicate job templates in job groups, mainly some lvm, cryptlvm scenarios which were defined both in the production YaST job groups as well as test development so I simply deleted them from the test development job groups. Then I called the content of the migration scripts again to apply the changes.
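The manual cleanup described above could look roughly like the following sketch. It is not the set of statements that was actually run on OSD: the schema is a simplified, hypothetical `job_templates` table in SQLite, group 1 stands for a production job group and group 2 for a test development group, and all IDs are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE job_templates (
        id INTEGER PRIMARY KEY,
        group_id INTEGER NOT NULL,
        product_id INTEGER NOT NULL,
        machine_id INTEGER NOT NULL,
        name TEXT,
        test_suite_id INTEGER NOT NULL,
        UNIQUE (product_id, machine_id, name, test_suite_id)
    );
    INSERT INTO job_templates
        (group_id, product_id, machine_id, name, test_suite_id) VALUES
        (1, 339, 60, NULL, 1652),
        (2, 339, 60, NULL, 1652),
        (1, 339, 60, NULL, 1700);
""")

# Step 1: list combinations that exist more than once with a NULL name.
duplicates = conn.execute("""
    SELECT product_id, machine_id, test_suite_id, COUNT(*)
    FROM job_templates
    WHERE name IS NULL
    GROUP BY product_id, machine_id, test_suite_id
    HAVING COUNT(*) > 1
""").fetchall()

# Step 2: drop the copies from the group that should lose (here: group 2).
removed = conn.execute("""
    DELETE FROM job_templates
    WHERE group_id = ?
      AND name IS NULL
      AND EXISTS (
          SELECT 1 FROM job_templates AS other
          WHERE other.group_id <> job_templates.group_id
            AND other.name IS NULL
            AND other.product_id = job_templates.product_id
            AND other.machine_id = job_templates.machine_id
            AND other.test_suite_id = job_templates.test_suite_id)
""", (2,)).rowcount

# Step 3: re-run the statement from the upgrade script; it no longer collides.
conn.execute("UPDATE job_templates SET name = '' WHERE name IS NULL")
remaining_null = conn.execute(
    "SELECT COUNT(*) FROM job_templates WHERE name IS NULL").fetchone()[0]

print("duplicate keys:", duplicates)
print("rows removed:", removed, "rows still NULL:", remaining_null)
```

Which group keeps a template is a decision for the instance admin, as noted in #9; the sketch simply hard-codes that choice.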

#13 Updated by mkittler 10 months ago

  • Category deleted (Concrete Bugs)
  • Status changed from Resolved to In Progress
  • Assignee changed from okurz to cdywan

Note that neither the ticket nor the PR state that the final version did not include an automatic remedy.

okurz I mentioned the problem: "I'm wondering whether it is worth/required to add a migration for detecting job templates which are wrongly "shared" between job groups." (https://github.com/os-autoinst/openQA/pull/2345#pullrequestreview-292266649-body-html)

I checked the database manually and prevented duplicate job templates in job groups, mainly some lvm, cryptlvm scenarios which were defined both in the production YaST job groups as well as test development so I simply deleted them from the test development job groups. Then I called the content of the migration scripts again to apply the changes.

Thanks. I guess then we can save the work of implementing an automatic database migration for this.


This migration should have been tested with recent OSD data (like almost every migration).

#14 Updated by okurz 10 months ago

  • Category set to Concrete Bugs
  • Status changed from In Progress to Resolved
  • Target version set to Current Sprint

mkittler, I think you reopened the ticket by mistake, because you even removed the category.
