action #56447

openqaworker-arm-2 is out-of-space on / (was: openQA on osd fails with empty logs)

Added by JERiveraMoya 6 months ago. Updated 6 months ago.

Status: Resolved
Start date: 31/07/2019
Priority: Urgent
Due date:
Assignee: okurz
% Done: 0%
Category: Concrete Bugs
Target version: Current Sprint
Difficulty:
Duration:


Related issues

Related to openQA Project - action #55328: job is considered incomplete by openQA but worker still p... Resolved 31/07/2019
Related to openQA Infrastructure - action #41882: all arm worker die after some time Feedback 02/10/2018
Related to openQA Infrastructure - action #54128: [tools] openqaworker-arm-3 is broken Resolved 11/07/2019

History

#1 Updated by JERiveraMoya 6 months ago

  • Copied from action #54902: openQA on osd fails at "incomplete" status when uploading, "502 response: Proxy Error" added

#2 Updated by JERiveraMoya 6 months ago

  • Copied from deleted (action #54902: openQA on osd fails at "incomplete" status when uploading, "502 response: Proxy Error")

#3 Updated by okurz 6 months ago

  • Related to action #55328: job is considered incomplete by openQA but worker still pushes updates so that "job is not considered dead" added

#4 Updated by okurz 6 months ago

  • Status changed from New to Workable
  • Assignee set to kraih
  • Priority changed from Normal to High
  • Target version set to Current Sprint

This is very likely caused by https://github.com/os-autoinst/openQA/pull/2270, so I am assigning it to you to crosscheck.

#5 Updated by okurz 6 months ago

  • Status changed from Workable to New
  • Assignee deleted (kraih)
  • Target version deleted (Current Sprint)

Sorry, I was wrong: the PR is not yet deployed.

#6 Updated by okurz 6 months ago

  • Subject changed from openQA on osd fails with empty logs to openqaworker-arm-2 is out-of-space on / (was: openQA on osd fails with empty logs)
  • Status changed from New to In Progress
  • Assignee set to okurz
  • Priority changed from High to Urgent
  • Target version set to Current Sprint

#7 Updated by okurz 6 months ago

  • Related to action #41882: all arm worker die after some time added

#8 Updated by okurz 6 months ago

  • Related to action #54128: [tools] openqaworker-arm-3 is broken added

#9 Updated by okurz 6 months ago

I stopped salt-minion and openqa-worker.target. It looks like /var/lib/openqa/pool is on the same partition as /. I don't know what changed or how it looked before. The pool should probably be on NVMe as well. According to systemctl cat openqa_nvme_prepare.service, the service creates the pool directory but does not seem to do anything with it. This looks similar to #53261, only about "pool" instead of "cache". Could it be that we deleted the "pool" symlink by mistake and should use a bind mount here as well? This should probably be done properly with salt.

  1. switch to a bind mount for all,
  2. add that to salt,
  3. set up -3 (openqaworker-arm-3) the same way and monitor all three.

Done the first two with https://gitlab.suse.de/openqa/salt-states-openqa/merge_requests/160
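The symptom described in #9 (pool data filling the root partition because the pool directory is no longer redirected to NVMe) can be checked directly by comparing device IDs. A minimal sketch in Python, assuming the default path /var/lib/openqa/pool; the function name pool_layout is hypothetical and not part of openQA or the salt states. Because os.stat follows symlinks, it compares the device of the real target, so it reports the problem whether the pool is a plain directory, a symlink, or a bind mount.

```python
import os


def pool_layout(pool: str = "/var/lib/openqa/pool", root: str = "/") -> dict:
    """Describe how the worker pool directory relates to the root filesystem.

    Hypothetical helper for illustration only; default paths are assumptions.
    """
    return {
        # True if the pool is a symlink (the suspected previous setup)
        "is_symlink": os.path.islink(pool),
        # os.stat follows symlinks, so this compares the device of the real
        # target; True means pool data is written to the same partition as /
        "same_device_as_root": os.stat(pool).st_dev == os.stat(root).st_dev,
    }
```

On an affected worker, pool_layout() would report same_device_as_root as True; after the NVMe bind mount is in place it should be False.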

#10 Updated by okurz 6 months ago

Today we have some incompletes on aarch64, but it seems only on openqaworker-arm-2. I disabled the worker target on that host and will retrigger the incomplete jobs; they should be picked up by openqaworker-arm-1. See e.g. https://openqa.suse.de/tests/3326179 from https://openqa.suse.de/tests/?&resultfilter=Incomplete
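Selecting which jobs to retrigger comes down to filtering incompletes by the host they ran on. A minimal sketch in Python, assuming job records with hypothetical fields id, result and worker (worker in an assumed "host:instance" form) and the openQA REST route POST /api/v1/jobs/<id>/restart; authentication and the actual HTTP request are deliberately left out. With the worker target on openqaworker-arm-2 disabled, the restarted jobs can only be picked up by the remaining aarch64 workers.

```python
from dataclasses import dataclass


@dataclass
class Job:
    # Hypothetical record shape for illustration only
    id: int
    result: str
    worker: str  # assumed "host:instance" form, e.g. "openqaworker-arm-2:3"


def restart_paths(jobs, bad_host: str = "openqaworker-arm-2") -> list:
    """Return API paths for incomplete jobs that ran on bad_host."""
    paths = []
    for job in jobs:
        host = job.worker.split(":", 1)[0]  # drop the worker instance number
        if job.result == "incomplete" and host == bad_host:
            # Assumed openQA route; send as an authenticated POST request
            paths.append(f"/api/v1/jobs/{job.id}/restart")
    return paths
```

Filtering by host rather than retriggering every incomplete avoids restarting jobs that failed for unrelated reasons on other workers.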

#11 Updated by okurz 6 months ago

  • Status changed from In Progress to Resolved

All problems are resolved. The NVMe preparation is now done as provided in salt, and a workaround for nscd is applied, see https://gitlab.suse.de/openqa/salt-states-openqa/merge_requests/162 . The worker successfully tested build 0307 of SLES12SP5, so we should be good.
