action #75454
closed
sometimes clone job is incomplete because of api failure
Added by zluo about 4 years ago.
Updated about 4 years ago.
Description
Please see the following job:
http://10.162.23.47/tests/8409
Result: incomplete finished 7 minutes ago ( 00:50 minutes )
Reason: api failure
Clone of 8408 Cloned as 8410
Scheduled product: job has not been created by posting an ISO (but possibly the original job)
Assigned worker: couperin:1
This happens occasionally. The job only states "api failure", without any logs or further information.
Retrying the clone job works in most cases.
- Subject changed from [tools] sometimes clone job is incomplete because of api failure to sometimes clone job is incomplete because of api failure
- Category set to Support
- Status changed from New to Feedback
- Assignee set to okurz
- Target version set to Ready
As the job has no logs uploaded, can you please provide the logs from the systemd worker service? E.g. log into couperin over ssh and call journalctl -u openqa-worker@1.
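For reference, a minimal Python sketch of pulling only a bounded time window from the worker journal rather than its whole history. The helper names `journal_command`, `fetch_journal` and `error_lines` are made up for illustration and are not part of openQA. `-u`, `--since`, `--until` and `--no-pager` are standard journalctl options, and the relative window "-1h" to "now" is only an example value.

```python
import subprocess


def journal_command(unit, since, until):
    """Build a journalctl call limited to one unit and one time window."""
    return [
        "journalctl",
        "--no-pager",
        "-u", unit,
        f"--since={since}",
        f"--until={until}",
    ]


def fetch_journal(unit, since, until):
    """Run journalctl on the worker host and return its output as text."""
    result = subprocess.run(
        journal_command(unit, since, until),
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout


def error_lines(journal_text):
    """Keep only the lines that the worker logged at error level."""
    return [line for line in journal_text.splitlines() if "[error]" in line]
```

On the worker host, `print("\n".join(error_lines(fetch_journal("openqa-worker@1", "-1h", "now"))))` would then show only the error-level lines from the last hour, which is usually enough to attach to a ticket.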
logs of openqa-worker, thanks.
- Copied to action #76765: job is incomplete with reason just being "api failure" and no logs can be uploaded due to OOM condition on worker, improve reason to point to potential causes added
- Status changed from Feedback to Resolved
ok, this is the complete log since 2020-04 (!). It would have been sufficient to just copy the relevant section from the worker journal at the time the job incompleted. There I see:
Oct 28 10:19:49 couperin worker[4904]: [debug] [pid:4904] REST-API call: POST http://10.162.23.47/api/v1/jobs/8408/status
Oct 28 10:19:49 couperin worker[4904]: [error] [pid:4904] Upload images subprocess error: Can't fork: Cannot allocate memory
Oct 28 10:19:59 couperin worker[4904]: [info] [pid:4904] Test schedule has changed, reloading test_order.json
Oct 28 10:19:59 couperin worker[4904]: [debug] [pid:4904] REST-API call: POST http://10.162.23.47/api/v1/jobs/8408/status
Oct 28 10:19:59 couperin worker[4904]: [error] [pid:4904] Upload images subprocess error: Can't fork: Cannot allocate memory
Oct 28 10:20:09 couperin worker[4904]: [debug] [pid:4904] REST-API call: POST http://10.162.23.47/api/v1/jobs/8408/status
Oct 28 10:20:11 couperin worker[4904]: [error] [pid:4904] Upload images subprocess error: Can't fork: Cannot allocate memory
Oct 28 10:20:21 couperin worker[4904]: [debug] [pid:4904] REST-API call: POST http://10.162.23.47/api/v1/jobs/8408/status
Oct 28 10:20:22 couperin worker[4904]: [error] [pid:4904] Upload images subprocess error: Can't fork: Cannot allocate memory
Oct 28 10:20:32 couperin worker[4904]: [debug] [pid:4904] REST-API call: POST http://10.162.23.47/api/v1/jobs/8408/status
so you exhausted all available memory on your machine. This is something that we cannot prevent. I suggest you either carefully avoid such situations or set up a monitoring alert that informs you when your worker host runs into conditions like this. What we can try to improve is to extend the "api failure" reason to point to potential problems that could lead to this symptom. Created #76765 for this minor improvement.
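As a rough illustration of the kind of monitoring check meant here, the following Python sketch reads `/proc/meminfo` on the worker host and reports when available memory falls below a threshold. The function names and the 10% threshold are assumptions chosen for the example, not an openQA feature or a recommended value. A real setup would more likely hook this into whatever monitoring system is already in place.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo content into a dict of field name -> integer value."""
    fields = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            # Most values are in kB; a few counters carry no unit at all.
            fields[name.strip()] = int(parts[0])
    return fields


def low_memory(fields, min_available_fraction=0.1):
    """Return True if MemAvailable is below the given fraction of MemTotal."""
    return fields["MemAvailable"] / fields["MemTotal"] < min_available_fraction


def check_host(path="/proc/meminfo", min_available_fraction=0.1):
    """Read the live memory counters and print a warning when memory is low."""
    with open(path) as handle:
        fields = parse_meminfo(handle.read())
    if low_memory(fields, min_available_fraction):
        print("WARNING: worker host is low on memory, jobs may fail to fork")
        return False
    return True
```

Calling `check_host()` periodically, for example from a systemd timer or cron job, would flag the condition before the worker starts failing with "Cannot allocate memory".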