action #120939

closed

[alert] Pipeline for scheduling incidents runs into timeout size:M

Added by mkittler over 1 year ago. Updated over 1 year ago.

Status: Resolved
Priority: Urgent
Assignee:
Category: -
Target version:
Start date: 2022-11-24
Due date: 2022-12-13
% Done: 0%
Estimated time:
Tags:

Description

Observation

The pipeline has already timed out twice. The logs look like this:

…
INFO: Triggering {'api': 'api/incident_settings', 'qem': {'incident': 26739, 'arch': 'x86_64', 'flavor': 'Azure-SAP-BYOS-Incidents-saptune', 'version': '15-SP1', 'withAggregate': False, 'settings': {'DISTRI': 'sle', 'VERSION': '15-SP1', 'ARCH': 'x86_64', 'FLAVOR': 'Azure-SAP-BYOS-Incidents-saptune', '_ONLY_OBSOLETE_SAME_BUILD': '1', '_OBSOLETE': '1', 'INCIDENT_ID': 26739, '__CI_JOB_URL': 'https://gitlab.suse.de/qa-maintenance/bot-ng/-/jobs/1255284', 'BUILD': ':26739:nodejs10', 'RRID': 'SUSE:Maintenance:26739:284015', 'REPOHASH': 1667984242, 'OS_TEST_ISSUES': '26739', 'INCIDENT_REPO': 'http://download.suse.de/ibs/SUSE:/Maintenance:/26739/SUSE_Updates_SLE-Product-SLES_SAP_15-SP1_x86_64', '_PRIORITY': 60, '__SMELT_INCIDENT_URL': 'https://smelt.suse.de/incident/26739', '__DASHBOARD_INCIDENT_URL': 'https://dashboard.qam.suse.de/incident/26739'}}, 'openqa': {'DISTRI': 'sle', 'VERSION': '15-SP1', 'ARCH': 'x86_64', 'FLAVOR': 'Azure-SAP-BYOS-Incidents-saptune', '_ONLY_OBSOLETE_SAME_BUILD': '1', '_OBSOLETE': '1', 'INCIDENT_ID': 26739, '__CI_JOB_URL': 'https://gitlab.suse.de/qa-maintenance/bot-ng/-/jobs/1255284', 'BUILD': ':26739:nodejs10', 'RRID': 'SUSE:Maintenance:26739:284015', 'REPOHASH': 1667984242, 'OS_TEST_ISSUES': '26739', 'INCIDENT_REPO': 'http://download.suse.de/ibs/SUSE:/Maintenance:/26739/SUSE_Updates_SLE-Product-SLES_SAP_15-SP1_x86_64', '_PRIORITY': 60, '__SMELT_INCIDENT_URL': 'https://smelt.suse.de/incident/26739', '__DASHBOARD_INCIDENT_URL': 'https://dashboard.qam.suse.de/incident/26739'}}
INFO: openqa-cli api --host https://openqa.suse.de -X post isos DISTRI=sle VERSION=15-SP1 ARCH=x86_64 FLAVOR=Azure-SAP-BYOS-Incidents-saptune _ONLY_OBSOLETE_SAME_BUILD=1 _OBSOLETE=1 INCIDENT_ID=26739 __CI_JOB_URL=https://gitlab.suse.de/qa-maintenance/bot-ng/-/jobs/1255284 BUILD=:26739:nodejs10 RRID=SUSE:Maintenance:26739:284015 REPOHASH=1667984242 OS_TEST_ISSUES=26739 INCIDENT_REPO=http://download.suse.de/ibs/SUSE:/Maintenance:/26739/SUSE_Updates_SLE-Product-SLES_SAP_15-SP1_x86_64 _PRIORITY=60 __SMELT_INCIDENT_URL=https://smelt.suse.de/incident/26739 __DASHBOARD_INCIDENT_URL=https://dashboard.qam.suse.de/incident/26739
ERROR: Job failed: execution took longer than 1h0m0s seconds

(see https://gitlab.suse.de/qa-maintenance/bot-ng/-/jobs/1255284)

Acceptance criteria

  • AC1: Pipeline does not fail anymore

Rollback steps

Suggestions

  • The issue is still occurring
  • Log the response from openQA if the isos post returns a 404
  • Track down what is causing the delay, e.g. OBS or, perhaps less likely, openQA, which returns the 404
  • Ensure we can see timestamps in the logs
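
A minimal sketch of how the last three suggestions could fit together, assuming the bot is written in Python. The names make_logger, schedule_isos and the post callable are hypothetical and are not taken from the bot's code. The post callable stands in for whatever client actually talks to openQA and is assumed to return the HTTP status and the response body.

```python
import logging
import time
from typing import Callable, Mapping, Tuple

# Timestamped format so gaps between log lines show where the time is spent.
LOG_FORMAT = "%(asctime)s %(levelname)s: %(message)s"

# Hypothetical transport: takes the settings, returns (HTTP status, response body).
PostFn = Callable[[Mapping[str, object]], Tuple[int, str]]


def make_logger(name: str = "bot-sketch") -> logging.Logger:
    """Return a logger whose records carry a timestamp."""
    logger = logging.getLogger(name)
    if not logger.handlers:
        handler = logging.StreamHandler()
        handler.setFormatter(logging.Formatter(LOG_FORMAT))
        logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger


def schedule_isos(
    settings: Mapping[str, object], post: PostFn, logger: logging.Logger
) -> int:
    """Post settings to the isos route and log the duration and any 404 body."""
    started = time.monotonic()
    status, body = post(settings)
    elapsed = time.monotonic() - started
    logger.info(
        "isos post for incident %s took %.1fs (HTTP %d)",
        settings.get("INCIDENT_ID"),
        elapsed,
        status,
    )
    if status == 404:
        # Keep the response body: it usually explains why nothing was scheduled.
        logger.error("openQA returned 404 for isos post: %s", body)
    return status
```

With the per-request duration in the log, a slow openQA response can be told apart from time spent elsewhere, e.g. waiting on OBS, before the 1h job limit is hit.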

Related issues 1 (0 open, 1 closed)

Related to QA - action #107923: qem-bot: Ignore not-ok openQA jobs for specific incident based on openQA job comment size:M (Resolved, jbaier_cz)
