Project

General

Profile

Actions

action #133457

closed

salt-states-openqa gitlab CI pipeline aborted with error after 2h of execution size:M

Added by livdywan over 1 year ago. Updated over 1 year ago.

Status:
Resolved
Priority:
High
Assignee:
Start date:
Due date:
% Done:

0%

Estimated time:

Description

Observation

https://gitlab.suse.de/openqa/salt-states-openqa/-/jobs/1714239

  Name: /etc/systemd/system/auto-update.service - Function: file.managed - Result: Clean - Started: 21:29:26.689214 - Duration: 359.255 ms
  Name: service.systemctl_reload - Function: module.run - Result: Clean - Started: 21:29:27.053802 - Duration: 0.018 ms
  Name: auto-upgrade.service - Function: service.dead - Result: Clean - Started: 21:29:27.054218 - Duration: 61.444 ms
  Name: auto-upgrade.timer - Function: service.dead - Result: Clean - Started: 21:29:27.116368 - Duration: 82.058 ms
  Name: auto-update.timer - Function: service.running - Result: Clean - Started: 21:29:27.203488 - Duration: 255.774 ms
Summary for openqa.suse.de
--------------
Succeeded: 345 (changed=30)
Failed:      0
--------------
Total states run:     345
Total run time:   383.468 s.++ echo -n .
++ true
++ sleep 1
.++ echo -n .

[...]
++ true
++ sleep 1
.++ echo -n .
++ true
++ sleep 1
ERROR: Job failed: execution took longer than 2h0m0s seconds
 took longer than 2h0m0s seconds

Acceptance criteria

  • AC1: jobs commonly don't run into the 2h gitlab CI timeout
  • AC2: We can identify the faulty salt minion (because very likely it's one of those being stuck)

Suggestions

  • look up an older ticket and read what we did there about this
  • check if there are actually artifacts uploaded or not
  • check if machines can be reached over salt
  • check usual runtimes of salt state apply
  • try if it is reproducible
  • research upstream if there is anything better we can do to prevent to run into the seemingly hardcoded gitlab 2h timeout
  • run the internal command of salt apply with a timeout well below 2h, e.g. in https://gitlab.suse.de/openqa/salt-states-openqa/-/blob/master/deploy.yml#L43 just prepend "timeout 1h …"

Related issues 5 (3 open2 closed)

Related to openQA Infrastructure (public) - action #119479: openqABot pipeline failed after runner getting stuck for 1h0m0s size:MWorkable2022-10-21

Actions
Related to openQA Infrastructure (public) - action #133793: salt-pillars-openqa failing to apply within 2h and it is not clear which minion(s) are missing size:MResolvedokurz2023-08-04

Actions
Related to openQA Project (public) - action #134810: [tools] GitlabCI deploy on salt-states-openqa took too much time Rejectedlivdywan2023-08-30

Actions
Related to openQA Infrastructure (public) - action #134819: Errors in salt minion and master log on osdNew2023-08-30

Actions
Copied from QA (public) - action #123894: qem-bot+openqa-bot gitlab CI pipeline aborted with error after 1h of executionNew2023-02-02

Actions
Actions

Also available in: Atom PDF