action #75454: sometimes clone job is incomplete because of api failure - openQA Project (public) - openSUSE Project Management Tool

action #75454

closed

sometimes clone job is incomplete because of api failure

Added by zluo over 4 years ago. Updated over 4 years ago.

Status:

Resolved

Priority:

Normal

Assignee:

okurz

Category:

Support

Target version:

Ready

Start date:

2020-10-28

Description

Please see the following job:
http://10.162.23.47/tests/8409

Result: incomplete finished 7 minutes ago ( 00:50 minutes )
Reason: api failure
Clone of 8408 Cloned as 8410
Scheduled product: job has not been created by posting an ISO (but possibly the original job)
Assigned worker: couperin:1

This happens sometimes, and the job only states "api failure" without any logs or further information.

Retrying the clone job works in most cases.

Files

openqa-worker@1.log.gz (11.1 MB)

zluo, 2020-10-29 07:45

Related issues 1 (1 open — 0 closed)

Updated by okurz over 4 years ago

Subject changed from [tools] sometimes clone job is incomplete because of api failure to sometimes clone job is incomplete because of api failure
Category set to Support
Status changed from New to Feedback
Assignee set to okurz
Target version set to Ready

As the job has no logs uploaded, can you please provide the logs from the systemd worker service? E.g. log into couperin over ssh and call journalctl -u openqa-worker@1.
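The full journal of a long-running worker can be very large, so it can help to limit the request to a short window around the failed job. The following is a minimal sketch of building such a call, not part of openQA itself: the helper name journal_command, the window size, and the example timestamp are assumptions for illustration. Only the standard journalctl options -u, --since, --until and --no-pager are used.

```python
import shlex
from datetime import datetime, timedelta


def journal_command(unit, around, minutes=5):
    """Build a journalctl call limited to +/- `minutes` around a point in time.

    `journal_command` is a hypothetical helper for illustration only.
    """
    fmt = "%Y-%m-%d %H:%M:%S"  # timestamp format accepted by --since/--until
    since = (around - timedelta(minutes=minutes)).strftime(fmt)
    until = (around + timedelta(minutes=minutes)).strftime(fmt)
    return [
        "journalctl", "--no-pager",
        "-u", unit,
        "--since", since,
        "--until", until,
    ]


if __name__ == "__main__":
    # Assumed timestamp: replace with the time the job became incomplete.
    failed_at = datetime(2020, 10, 28, 10, 20)
    # Print a copy-and-paste ready command line for the worker host.
    print(shlex.join(journal_command("openqa-worker@1", failed_at)))
```

The printed command can be run on the worker host and its output attached to the ticket instead of the whole journal.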

Updated by zluo over 4 years ago

File openqa-worker@1.log.gz added

Logs of openqa-worker attached, thanks.

Updated by okurz over 4 years ago

Copied to action #76765: job is incomplete with reason just being "api failure" and no logs can be uploaded due to OOM condition on worker, improve reason to point to potential causes added

Updated by okurz over 4 years ago

Status changed from Feedback to Resolved

OK, this is the complete log since 2020-04 (!). It would have been sufficient to copy just the relevant section of the worker journal from the time the job became incomplete. There I see:

Oct 28 10:19:49 couperin worker[4904]: [debug] [pid:4904] REST-API call: POST http://10.162.23.47/api/v1/jobs/8408/status
Oct 28 10:19:49 couperin worker[4904]: [error] [pid:4904] Upload images subprocess error: Can't fork: Cannot allocate memory
Oct 28 10:19:59 couperin worker[4904]: [info] [pid:4904] Test schedule has changed, reloading test_order.json
Oct 28 10:19:59 couperin worker[4904]: [debug] [pid:4904] REST-API call: POST http://10.162.23.47/api/v1/jobs/8408/status
Oct 28 10:19:59 couperin worker[4904]: [error] [pid:4904] Upload images subprocess error: Can't fork: Cannot allocate memory
Oct 28 10:20:09 couperin worker[4904]: [debug] [pid:4904] REST-API call: POST http://10.162.23.47/api/v1/jobs/8408/status
Oct 28 10:20:11 couperin worker[4904]: [error] [pid:4904] Upload images subprocess error: Can't fork: Cannot allocate memory
Oct 28 10:20:21 couperin worker[4904]: [debug] [pid:4904] REST-API call: POST http://10.162.23.47/api/v1/jobs/8408/status
Oct 28 10:20:22 couperin worker[4904]: [error] [pid:4904] Upload images subprocess error: Can't fork: Cannot allocate memory
Oct 28 10:20:32 couperin worker[4904]: [debug] [pid:4904] REST-API call: POST http://10.162.23.47/api/v1/jobs/8408/status

So you exhausted all available memory on your machine. This is something that we cannot prevent. I suggest you either carefully avoid such situations or set up a monitoring alert that informs you when your worker host runs into them. What we can try to improve is to extend the "api failure" reason to point to potential problems that could lead to this symptom. Created #76765 for this minor improvement.
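As an illustration of the kind of monitoring check meant here, the following is a minimal sketch that reads /proc/meminfo on a Linux worker host and warns when little memory is available. It is not an openQA feature. The function names and the 10% threshold are assumptions chosen for the example, and a real setup would more likely hook into an existing monitoring system.

```python
from pathlib import Path


def parse_meminfo(text):
    """Return the numeric fields of /proc/meminfo as a dict (values in kB)."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Key: value" pairs
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[key.strip()] = int(parts[0])
    return values


def memory_is_low(meminfo, min_available_fraction=0.1):
    """True if MemAvailable is below the given fraction of MemTotal.

    The default threshold of 10% is an arbitrary assumption.
    """
    return meminfo["MemAvailable"] < meminfo["MemTotal"] * min_available_fraction


def main(path="/proc/meminfo"):
    source = Path(path)
    if not source.exists():
        print(f"{path} not found, skipping check")
        return
    info = parse_meminfo(source.read_text())
    if memory_is_low(info):
        print(f"WARNING: only {info['MemAvailable']} kB of "
              f"{info['MemTotal']} kB memory available")


if __name__ == "__main__":
    main()
```

Run periodically, for example from a systemd timer or cron, such a check could flag a worker host before fork calls start failing with "Cannot allocate memory".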
