action #56447

closed

openqaworker-arm-2 is out-of-space on / (was: openQA on osd fails with empty logs)

Added by JERiveraMoya about 5 years ago. Updated about 5 years ago.

Status: Resolved
Priority: Urgent
Assignee:
Category: Regressions/Crashes
Target version:
Start date: 2019-07-31
Due date:
% Done: 0%
Estimated time:


Related issues 3 (0 open, 3 closed)

Related to openQA Project - action #55328: job is considered incomplete by openQA but worker still pushes updates so that "job is not considered dead" (Resolved, kraih, 2019-07-31)
Related to openQA Infrastructure - action #41882: all arm worker die after some time (Resolved, okurz, 2018-10-02)
Related to openQA Infrastructure - action #54128: [tools] openqaworker-arm-3 is broken (Resolved, okurz, 2019-07-11)

Actions #1

Updated by JERiveraMoya about 5 years ago

  • Copied from action #54902: openQA on osd fails at "incomplete" status when uploading, "502 response: Proxy Error" added
Actions #2

Updated by JERiveraMoya about 5 years ago

  • Copied from deleted (action #54902: openQA on osd fails at "incomplete" status when uploading, "502 response: Proxy Error")
Actions #3

Updated by okurz about 5 years ago

  • Related to action #55328: job is considered incomplete by openQA but worker still pushes updates so that "job is not considered dead" added
Actions #4

Updated by okurz about 5 years ago

  • Status changed from New to Workable
  • Assignee set to kraih
  • Priority changed from Normal to High
  • Target version set to Current Sprint

As this is very likely caused by https://github.com/os-autoinst/openQA/pull/2270, I am assigning this to you to crosscheck.

Actions #5

Updated by okurz about 5 years ago

  • Status changed from Workable to New
  • Assignee deleted (kraih)
  • Target version deleted (Current Sprint)

Sorry, I was wrong. The PR is not yet deployed.

Actions #6

Updated by okurz about 5 years ago

  • Subject changed from "openQA on osd fails with empty logs" to "openqaworker-arm-2 is out-of-space on / (was: openQA on osd fails with empty logs)"
  • Status changed from New to In Progress
  • Assignee set to okurz
  • Priority changed from High to Urgent
  • Target version set to Current Sprint
Actions #7

Updated by okurz about 5 years ago

  • Related to action #41882: all arm worker die after some time added
Actions #8

Updated by okurz about 5 years ago

  • Related to action #54128: [tools] openqaworker-arm-3 is broken added
Actions #9

Updated by okurz about 5 years ago

I stopped salt-minion and openqa-worker.target. It looks like /var/lib/openqa/pool is on the same partition as /. I don't know what changed or how it looked before. The pool should probably be on NVMe as well. According to "systemctl cat openqa_nvme_prepare.service", the service creates the pool but does not seem to do anything with it. This looks similar to #53261, only about "pool" instead of "cache". Could it be that we deleted the "pool" symlink by mistake and should use a bind mount here as well? This should probably be done properly with salt.

  1. change to a bind mount for all
  2. add that to salt
  3. apply the same to openqaworker-arm-3 and monitor all three

The first two are done with https://gitlab.suse.de/openqa/salt-states-openqa/merge_requests/160
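
For anyone who wants to re-check the diagnosis above on a worker, here is a minimal sketch in Python. It is not part of the salt change. It compares device IDs to see whether the pool directory sits on the same filesystem as /, and it reports the free space there. The helper names are made up for illustration, and the path is the one mentioned in this comment. On setups such as btrfs subvolumes the device IDs can differ even though the space is shared, so treat the result as a hint only.

```python
import os
import shutil


def on_same_filesystem(path_a: str, path_b: str) -> bool:
    """Return True if both paths report the same device ID (st_dev)."""
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev


def free_fraction(path: str) -> float:
    """Return the free share (0.0 to 1.0) of the filesystem holding path."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total


if __name__ == "__main__":
    pool = "/var/lib/openqa/pool"  # path from the comment above
    if os.path.exists(pool):
        if on_same_filesystem(pool, "/"):
            print("pool shares a filesystem with /, so jobs can fill up /")
        else:
            print("pool is on its own filesystem")
        print(f"free space on pool filesystem: {free_fraction(pool):.0%}")
    else:
        print(f"{pool} does not exist on this host")
```

If the pool is bind-mounted from the NVMe filesystem as proposed, the first check would be expected to report a separate filesystem.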

Actions #10

Updated by okurz about 5 years ago

Today we have some incompletes on aarch64, but it seems they only come from openqaworker-arm-2. I disabled the worker target on the host and will retrigger the incompletes. They should be picked up by openqaworker-arm-1. See e.g. https://openqa.suse.de/tests/3326179 from https://openqa.suse.de/tests/?&resultfilter=Incomplete

Actions #11

Updated by okurz about 5 years ago

  • Status changed from In Progress to Resolved

All problems are resolved. The NVMe preparation is now done as provided by salt, and a workaround for nscd is applied, see https://gitlab.suse.de/openqa/salt-states-openqa/merge_requests/162 . The worker was able to successfully test build 0307 of SLES12SP5, so we should be good.
