action #103953

coordination #64746: [saga][epic] Scale up: Efficient handling of large storage to be able to run current tests efficiently but keep big archives of old results

coordination #80546: [epic] Scale up: Enable to store more results

Use openQA archiving feature on o3 size:S

Added by okurz 7 months ago. Updated 7 months ago.

Status: Resolved
Priority: Low
Assignee:
Category: Feature requests
Target version:
Start date: 2021-12-14
Due date:
% Done: 0%
Estimated time:
Difficulty:

Description

Motivation

See parent #80546. To scale up openQA we want to separate recent openQA results (on expensive, fast storage) from older, important openQA results (on slow, cheap storage). With #91782 we have an archiving feature which we can now use (in conjunction with storage.qa.suse.de as backup target).

Acceptance criteria

  • AC1: results on openQA o3 can be automatically archived to a separate space on o3 itself

Suggestions

  • Create new partition from existing volume group vg0
  • Create a new directory as target for archiving on the new partition
  • Mount that somewhere where openQA can move archived jobs to
  • Enable archiving feature in /etc/openqa/openqa.ini
  • Monitor over some days if that actually works
Screenshot_20211214_162919_archiving_icon.png (13.1 KB): archiving icon from production (okurz, 2021-12-14 15:38)

Related issues

Copied from openQA Project - action #92788: Use openQA archiving feature on osd size:S (Resolved)

History

#1 Updated by okurz 7 months ago

  • Copied from action #92788: Use openQA archiving feature on osd size:S added

#2 Updated by okurz 7 months ago

  • Status changed from New to In Progress
  • Assignee set to okurz
  • Created logical volume and filesystem with the free space in extents as reported by LVM
lvcreate -n archive -l 262143 vg0
mkfs.xfs /dev/vg0/archive
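
The extent count passed to lvcreate above was read from LVM by hand. A minimal sketch of how that number could be queried instead, assuming the standard LVM2 `vgs` tool; `free_extents` is a hypothetical helper name, not something from this ticket:

```shell
#!/bin/sh
# Hypothetical helper: print the number of free physical extents in a
# volume group, with the padding that vgs adds stripped off.
free_extents() {
    vgs --noheadings -o vg_free_count "$1" | tr -d '[:space:]'
}

# Example (as root), equivalent to typing the number manually:
#   lvcreate -n archive -l "$(free_extents vg0)" vg0
# lvcreate also accepts a percentage, which avoids the lookup entirely:
#   lvcreate -n archive -l 100%FREE vg0
```

Either form should end up with the same logical volume as long as nothing else allocates from vg0 in between.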

In /etc/fstab

/dev/vg0/archive /archive xfs defaults,logbufs=8,logbsize=256k,noatime,nodiratime 1 2
/archive /var/lib/openqa/archive bind bind 0 0

then

mkdir -p /archive /var/lib/openqa/archive
mount -a
mkdir -p /var/lib/openqa/archive/testresults
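
Before relying on `mount -a`, it can help to confirm that both fstab entries are really in place. A small sketch under the assumption that the file follows the usual whitespace-separated fstab layout; `fstab_has_mount` is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: succeed if FILE contains a non-comment entry
# whose second field (the mount point) equals MOUNTPOINT.
# Usage: fstab_has_mount FILE MOUNTPOINT
fstab_has_mount() {
    awk -v mp="$2" '
        $1 !~ /^#/ && $2 == mp { found = 1 }
        END { exit !found }
    ' "$1"
}

# Example:
#   fstab_has_mount /etc/fstab /archive && echo "archive entry present"
#   fstab_has_mount /etc/fstab /var/lib/openqa/archive && echo "bind entry present"
```

After mounting, `findmnt /archive` and `findmnt /var/lib/openqa/archive` show what is actually mounted at each path.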

We edited the config and enabled the archiving in /etc/openqa/openqa.ini with the block

[archiving]
# to "${OPENQA_ARCHIVEDIR:-${OPENQA_BASEDIR:-/var/lib}/openqa/archive}/testresults"
archive_preserved_important_jobs = 1

and restarted services

systemctl restart openqa-{webui,gru}
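
To double-check that the setting landed in the intended section of the config file, a rough sketch of an INI lookup follows. It assumes simple `key = value` lines without trailing whitespace; `ini_get` is a hypothetical helper and not part of openQA:

```shell
#!/bin/sh
# Hypothetical helper: print the value of KEY inside [SECTION] of FILE.
# Usage: ini_get FILE SECTION KEY
ini_get() {
    awk -F ' *= *' -v section="[$2]" -v key="$3" '
        /^\[/ { in_section = ($0 == section); next }   # track current section
        in_section && $1 == key { print $2; exit }     # first match wins
    ' "$1"
}

# Example:
#   ini_get /etc/openqa/openqa.ini archiving archive_preserved_important_jobs
```

An empty result would suggest the key is missing, commented out, or placed under a different section.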

https://openqa.opensuse.org/ is still ok. Eventually we should see archiving minion jobs on https://openqa.opensuse.org/minion/jobs

I changed the config once to try to clean up more space, so that the result cleanup actually has something useful to do. Then I ran systemctl restart openqa-gru && systemctl start openqa-enqueue-result-cleanup, which started the cleanup. I am not yet sure whether any archiving tasks were triggered, so check back later to see whether archiving jobs appear on https://openqa.opensuse.org/minion/jobs or data shows up in /archive/testresults/
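
For the "check back later" part, a tiny sketch that counts the top-level entries in the archive target, so a change from zero is easy to spot. `count_entries` is a hypothetical helper, and nothing is assumed here about how openQA lays out subdirectories below that path:

```shell
#!/bin/sh
# Hypothetical helper: print how many entries sit directly inside DIR.
# Usage: count_entries DIR
count_entries() {
    find "$1" -mindepth 1 -maxdepth 1 | wc -l | tr -d '[:space:]'
}

# Example:
#   count_entries /archive/testresults
```

Running it periodically, for instance with `watch`, would show whether archived results start to appear.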

#3 Updated by okurz 7 months ago

The Minion jobs failed because the permissions were not correct, so I ran chown -R geekotest.root /archive/testresults. We also need AppArmor additions -> https://github.com/os-autoinst/openQA/pull/4404

Screenshot_20211214_162919_archiving_icon.png

But screenshots were missing. Checking in the test result directory, I could see that all symlinks for pictures were present, with their targets within /var/lib/openqa/images. That turned out to be just AppArmor as well :facepalm:. Somehow I thought I had found real "missing" image files, but ok :) All good now!

#4 Updated by mkittler 7 months ago

Btw, I've retried the Minion jobs via sudo -u geekotest /usr/share/openqa/script/openqa eval -V 'for (my ($jobs, $job) = app->minion->jobs({states => ["failed"], tasks => ["archive_job_results"]}); $job = $jobs->next; ) { app->minion->job($job->{id})->retry }'.

#5 Updated by okurz 7 months ago

  • Status changed from In Progress to Resolved

https://openqa.opensuse.org/minion/jobs?task=archive_job_results still looks sane. No new archiving jobs have been triggered since yesterday because we currently have enough space for results on o3. I announced the change in https://matrix.to/#/!ilXMcHXPOjTZeauZcg:libera.chat/$xN8T4UODse9fmtJQXJaecwVT9IPfiydpaXztwIZ-4u4 . I tweaked the retention periods only for the Leap job group and adjusted the default so that important results are not stored indefinitely but only for 10 years :) After https://github.com/os-autoinst/openQA/pull/4404 was merged, it was automatically deployed some hours ago. I confirmed that /etc/apparmor.d/usr.share.openqa.script.openqa looks clean and removed /etc/apparmor.d/usr.share.openqa.script.openqa.rpmsave . I monitored /var/log/audit/audit.log and have not seen any further problems so far. Archived jobs like https://openqa.opensuse.org/tests/1941411 look just fine.
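
The audit log monitoring mentioned above could be approximated with a filter like the following sketch. It relies on AppArmor denials being logged with the literal string `apparmor="DENIED"`; `count_apparmor_denials` is a hypothetical helper, and the profile substring to match is an assumption to adapt:

```shell
#!/bin/sh
# Hypothetical helper: count AppArmor denials in LOGFILE whose line
# contains the fixed string SUBSTRING (for example part of a profile name).
# Usage: count_apparmor_denials LOGFILE SUBSTRING
count_apparmor_denials() {
    grep 'apparmor="DENIED"' "$1" | grep -c -F -- "$2"
}

# Example (as root):
#   count_apparmor_denials /var/log/audit/audit.log openqa
```

A count that stays at zero after archiving jobs have run would be consistent with the profile now covering the archive paths.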
