action #154156

closed

coordination #58184: [saga][epic][use case] full version control awareness within openQA

coordination #152847: [epic] version control awareness within openQA for test distributions

[spike][timeboxed:10h] Cache test distributions from git on production size:S

Added by okurz 3 months ago. Updated 3 months ago.

Status: Resolved
Priority: Normal
Assignee:
Category: Feature requests
Target version:
Start date:
Due date:
% Done: 0%
Estimated time:

Description

Motivation

As part of #138029, support for git caching was added to os-autoinst, covering test distributions as well as "wheel" repositories. We can now look into fully enabling git caching of test distributions on a production openQA instance.

Acceptance criteria

  • AC1: At least one test distribution on openqa.opensuse.org uses git caching successfully

Suggestions

  • Read about GIT_CACHE_DIR in https://github.com/os-autoinst/os-autoinst/blob/master/doc/backend_vars.asciidoc and experiment with that in an openQA environment
  • Experiment with how the git cache dir is populated for multiple openQA jobs running in the same environment
  • Try out multiple openQA jobs running in parallel relying on access to GIT_CACHE_DIR
  • Possibly use it directly on o3 and monitor the impact

Out of scope

  • Manage storage capacity long-term / clean-up service
  • Considering how the worker cache service interacts with git caching

Related issues 5 (0 open, 5 closed)

Related to openQA Project - action #154240: Ensure cloning openQA jobs with GIT_CACHE_DIR works in usual use cases (Resolved, mkittler, 2024-01-25)

Related to openQA Infrastructure - action #155104: sh: /usr/bin/du: Permission denied on openqaworker21 (Rejected, 2024-02-07)

Copied from openQA Project - action #138029: [research][timeboxed:10h] How to cache "wheel" repositories which are stored on github size:M (Resolved, mkittler)

Copied to openQA Project - action #154237: [spike][timeboxed:10h] Ensure the worker cache doesn't duplicate git caching of test distributions on o3 size:S (Resolved, mkittler)

Copied to openQA Project - action #154783: [spike][timeboxed:10h] Run os-autoinst-distri-example directly from git and ensure candidate needles show up on the web UI size:S (Resolved, mkittler)

Actions #1

Updated by okurz 3 months ago

  • Copied from action #138029: [research][timeboxed:10h] How to cache "wheel" repositories which are stored on github size:M added
Actions #2

Updated by okurz 3 months ago

  • Copied to action #154237: [spike][timeboxed:10h] Ensure the worker cache doesn't duplicate git caching of test distributions on o3 size:S added
Actions #3

Updated by livdywan 3 months ago

  • Description updated (diff)
Actions #4

Updated by livdywan 3 months ago

  • Subject changed from [spike][timeboxed:10h] Cache test distributions from git on production but also considering the worker cache service to [spike][timeboxed:10h] Cache test distributions from git on production
Actions #5

Updated by okurz 3 months ago

  • Related to action #154240: Ensure cloning openQA jobs with GIT_CACHE_DIR works in usual use cases added
Actions #6

Updated by okurz 3 months ago

  • Status changed from New to Blocked
  • Assignee set to okurz
  • Priority changed from High to Normal

Working on #154240 first, as decided in the estimation call.

Actions #7

Updated by mkittler 3 months ago

With https://github.com/os-autoinst/openQA/pull/5438, #154240 should be done, but for the acceptance tests it would make sense to implement this spike ticket first. (Both tickets are really intertwined and splitting them up was likely not very useful.)

Actions #8

Updated by mkittler 3 months ago

  • Status changed from Blocked to Workable

I have now tested #154240 without a production job (see #154240#note-9), so you can definitely continue with this ticket.

Actions #9

Updated by okurz 3 months ago

  • Status changed from Workable to New
  • Assignee deleted (okurz)
Actions #10

Updated by mkittler 3 months ago

  • Status changed from New to In Progress
  • Assignee set to mkittler
Actions #11

Updated by mkittler 3 months ago · Edited

I enabled Git caching on openqaworker21 via:

mkdir -p /var/lib/openqa/cache/git && chown _openqa-worker:nogroup /var/lib/openqa/cache/git && bash -c "grep -q GIT_CACHE_DIR /etc/openqa/workers.ini || sed -i '/CACHEDIRECTORY/a GIT_CACHE_DIR = /var/lib/openqa/cache/git' /etc/openqa/workers.ini"

and cloned a test job:

openqa-clone-job --skip-chained-deps --within-instance https://openqa.opensuse.org/tests/3902538 _GROUP=0 BUILD+=-test-for-poo-154156 WORKER_CLASS+=,openqaworker21 CASEDIR=https://github.com/os-autoinst/os-autoinst-distri-opensuse.git

1 job has been created:


It seems to work:

[2024-01-30T16:00:20.179490Z] [debug] [pid:71462] Current version is 4.6.1706517008.1bcd6e7 [interface v40]
[2024-01-30T16:00:20.187742Z] [info] [pid:71462] ::: OpenQA::Isotovideo::Utils::clone_git: Cloning git URL 'https://github.com/os-autoinst/os-autoinst-distri-opensuse.git' into '/var/lib/openqa/pool/12'
[2024-01-30T16:00:20.189092Z] [info] [pid:71462] ::: OpenQA::Isotovideo::Utils::_clone_bare_repo: Creating bare repository for caching 'https://github.com/os-autoinst/os-autoinst-distri-opensuse.git' under '/var/lib/openqa/cache/git//os-autoinst/os-autoinst-distri-opensuse.git'
[2024-01-30T16:00:50.256431Z] [debug] [pid:71462] Cloning into bare repository '/var/lib/openqa/cache/git//os-autoinst/os-autoinst-distri-opensuse.git'...

[2024-01-30T16:00:50.256745Z] [info] [pid:71462] ::: OpenQA::Isotovideo::Utils::_fetch_new_refs: Updating Git cache for 'https://github.com/os-autoinst/os-autoinst-distri-opensuse.git' under '/var/lib/openqa/cache/git//os-autoinst/os-autoinst-distri-opensuse.git'
[2024-01-30T16:00:50.811681Z] [debug] [pid:71462] From https://github.com/os-autoinst/os-autoinst-distri-opensuse
    * branch            HEAD       -> FETCH_HEAD

[2024-01-30T16:00:53.372112Z] [debug] [pid:71462] Cloning into 'os-autoinst-distri-opensuse'...

[2024-01-30T16:00:53.382681Z] [debug] [pid:71462] git hash in '/var/lib/openqa/pool/12/os-autoinst-distri-opensuse': b5b0e1cc1ad8e8187bae931f7909527ea8ce5b3e
[2024-01-30T16:00:53.400748Z] [debug] [pid:71462] git url in '/var/lib/openqa/pool/12/os-autoinst-distri-opensuse': /var/lib/openqa/cache/git//os-autoinst/os-autoinst-distri-opensuse.git

I wrote the configuration command so that it can easily be run on all workers. I'm not sure whether it would actually make sense to enable this on all o3 workers at this point, though. Without any cleanup it is probably a bad idea. Maybe that's something I could look into as part of this ticket, as we gave it 10 hours and I have only spent 15 minutes so far :-)
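The `grep -q … || sed -i '/…/a …'` pattern in the command above is what makes it safe to run repeatedly on every worker: the setting is appended after the `CACHEDIRECTORY` line only if no `GIT_CACHE_DIR` line exists yet. A minimal sketch that exercises this on a throwaway file; the sample `workers.ini` content is made up for illustration, and the one-line `a text` form of `sed` assumes GNU sed (as on openSUSE).

```shell
#!/usr/bin/env bash
# Apply the GIT_CACHE_DIR edit from the command above to a throwaway
# workers.ini (made-up sample content) and show that re-running it is a no-op.
set -euo pipefail

ini=$(mktemp)
trap 'rm -f "$ini"' EXIT
cat > "$ini" <<'EOF'
[global]
HOST = http://localhost
CACHEDIRECTORY = /var/lib/openqa/cache
EOF

enable_git_cache() {
    # Only append the setting after the CACHEDIRECTORY line if it is missing.
    grep -q GIT_CACHE_DIR "$1" \
        || sed -i '/CACHEDIRECTORY/a GIT_CACHE_DIR = /var/lib/openqa/cache/git' "$1"
}

enable_git_cache "$ini"
enable_git_cache "$ini"   # second run: grep finds the line, so sed is skipped
cat "$ini"
```

Without the `grep -q` guard, every additional run would append another `GIT_CACHE_DIR` line.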

Actions #12

Updated by okurz 3 months ago

This looks promising:

[2024-01-30T16:00:20.187742Z] [info] [pid:71462] ::: OpenQA::Isotovideo::Utils::clone_git: Cloning git URL 'https://github.com/os-autoinst/os-autoinst-distri-opensuse.git' into '/var/lib/openqa/pool/12'
[2024-01-30T16:00:20.189092Z] [info] [pid:71462] ::: OpenQA::Isotovideo::Utils::_clone_bare_repo: Creating bare repository for caching 'https://github.com/os-autoinst/os-autoinst-distri-opensuse.git' under '/var/lib/openqa/cache/git//os-autoinst/os-autoinst-distri-opensuse.git'
[2024-01-30T16:00:50.256431Z] [debug] [pid:71462] Cloning into bare repository '/var/lib/openqa/cache/git//os-autoinst/os-autoinst-distri-opensuse.git'...

[2024-01-30T16:00:50.256745Z] [info] [pid:71462] ::: OpenQA::Isotovideo::Utils::_fetch_new_refs: Updating Git cache for 'https://github.com/os-autoinst/os-autoinst-distri-opensuse.git' under '/var/lib/openqa/cache/git//os-autoinst/os-autoinst-distri-opensuse.git'
[2024-01-30T16:00:50.811681Z] [debug] [pid:71462] From https://github.com/os-autoinst/os-autoinst-distri-opensuse
    * branch            HEAD       -> FETCH_HEAD

[2024-01-30T16:00:53.372112Z] [debug] [pid:71462] Cloning into 'os-autoinst-distri-opensuse'...

[2024-01-30T16:00:53.382681Z] [debug] [pid:71462] git hash in '/var/lib/openqa/pool/12/os-autoinst-distri-opensuse': b5b0e1cc1ad8e8187bae931f7909527ea8ce5b3e
[2024-01-30T16:00:53.400748Z] [debug] [pid:71462] git url in '/var/lib/openqa/pool/12/os-autoinst-distri-opensuse': /var/lib/openqa/cache/git//os-autoinst/os-autoinst-distri-opensuse.git
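The log shows three steps: a bare repository is created under `GIT_CACHE_DIR` (only needed for the first job), new refs are fetched into it, and the job's working copy in the pool directory is then cloned from that local bare repository. The following is a rough, hypothetical shell approximation of that sequence using plain git commands against a local stand-in for the upstream URL, so no network access is needed. os-autoinst implements this in Perl (`OpenQA::Isotovideo::Utils`), and its exact git invocations may differ; in particular, this sketch mirrors all branches into the cache, whereas the log shows `HEAD` being fetched into `FETCH_HEAD`.

```shell
#!/usr/bin/env bash
# Hypothetical approximation of the caching steps in the log above; this is
# not the os-autoinst implementation. Everything happens in a temp dir.
set -euo pipefail

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

# Local stand-in for the upstream URL (no network access needed).
upstream="$work/upstream/os-autoinst-distri-opensuse"
git init -q "$upstream"
upstream_commit() {
    git -C "$upstream" -c user.name=demo -c user.email=demo@example.com \
        commit -q --allow-empty -m "$1"
}
upstream_commit "first commit"

cache_dir="$work/cache/git"   # plays the role of GIT_CACHE_DIR
pool_dir="$work/pool/12"      # plays the role of the job's pool directory
bare="$cache_dir/os-autoinst/os-autoinst-distri-opensuse.git"
checkout="$pool_dir/os-autoinst-distri-opensuse"

clone_via_cache() {
    # 1. Create the bare cache repository only if it does not exist yet.
    if [ ! -d "$bare" ]; then
        mkdir -p "$(dirname "$bare")"
        git clone -q --bare "$upstream" "$bare"
    fi
    # 2. Bring the cache up to date by mirroring all branches.
    git -C "$bare" fetch -q "$upstream" '+refs/heads/*:refs/heads/*'
    # 3. Clone the job's working copy from the local cache, not the network.
    rm -rf "$checkout"
    mkdir -p "$pool_dir"
    git clone -q "$bare" "$checkout"
}

clone_via_cache                  # first job: creates the cache
upstream_commit "second commit"  # upstream moves on
clone_via_cache                  # next job: only fetches the new commit
git -C "$checkout" log --oneline
```

Only step 2 contacts the upstream, and after the first job it only transfers new objects. This is where the time saving over a full clone per job would come from.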
Actions #13

Updated by tinita 3 months ago

One thing to note:
https://openqa.opensuse.org/tests/3903427/file/vars.json

   "TEST_GIT_URL" : "/var/lib/openqa/cache/git//os-autoinst/os-autoinst-distri-opensuse.git",

We might want to make the generation of TEST_GIT_URL a bit more intelligent.
Or we make openqa-investigate more intelligent to figure out the web url for that.
(Also maybe we should try to get rid of the double slash. Just because it hurts my eye ;-)
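The double slash presumably comes from joining `GIT_CACHE_DIR` and the path part of the URL, which already starts with a slash, using another `/`. A minimal sketch of a hypothetical helper, `cache_path_for_url`, that derives the cache location without it. It only handles URLs of the form `scheme://host/path` (optionally with a `#branch` suffix as used in `CASEDIR`), not scp-style `git@host:path` URLs.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not os-autoinst code): map a git URL to a directory
# below GIT_CACHE_DIR without producing the "//" seen in the log.
set -euo pipefail

cache_path_for_url() {
    local cache_dir=$1 url=$2 path
    path=${url#*://}          # drop the scheme, e.g. "https://"
    path=${path%%#*}          # drop an optional "#branch" suffix
    path=${path#*/}           # drop the host, keeping "owner/repo.git"
    cache_dir=${cache_dir%/}  # tolerate a trailing slash in GIT_CACHE_DIR
    printf '%s/%s\n' "$cache_dir" "${path#/}"
}

cache_path_for_url /var/lib/openqa/cache/git \
    https://github.com/os-autoinst/os-autoinst-distri-opensuse.git
# prints /var/lib/openqa/cache/git/os-autoinst/os-autoinst-distri-opensuse.git
```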

Actions #14

Updated by openqa_review 3 months ago

  • Due date set to 2024-02-14

Setting due date based on mean cycle time of SUSE QE Tools

Actions #15

Updated by okurz 3 months ago · Edited

tinita wrote in #note-13:

One thing to note:
https://openqa.opensuse.org/tests/3903427/file/vars.json

   "TEST_GIT_URL" : "/var/lib/openqa/cache/git//os-autoinst/os-autoinst-distri-opensuse.git",

We might want to make the generation of TEST_GIT_URL a bit more intelligent.
Or we make openqa-investigate more intelligent to figure out the web url for that.

How about setting TEST_GIT_ORIG_URL=TEST_GIT_URL if GIT_CACHE_DIR is set?

Or the inverse: never update TEST_GIT_URL, but save an additional TEST_GIT_CACHE_URL if GIT_CACHE_DIR is set.

Actions #16

Updated by mkittler 3 months ago

It looks like TEST_GIT_URL is never read by os-autoinst itself; it is only set. So I guess it makes most sense to keep setting TEST_GIT_URL to the original URL.

Actions #17

Updated by mkittler 3 months ago

This PR will restore the behavior regarding those variables: https://github.com/os-autoinst/os-autoinst/pull/2452

Actions #18

Updated by okurz 3 months ago

  • Subject changed from [spike][timeboxed:10h] Cache test distributions from git on production to [spike][timeboxed:10h] Cache test distributions from git on production size:S
Actions #19

Updated by mkittler 3 months ago

Here's what basic cleanup could look like: https://github.com/os-autoinst/os-autoinst/pull/2453
(No optimizations, such as skipping the cleanup altogether when disk usage is below a certain threshold, have been implemented.)
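For illustration only, a size-bounded cleanup could work roughly like the following hypothetical shell sketch: measure the cache with `du`, then delete the least recently modified repositories until the total fits under a byte limit. This is not the code from the PR above (which is Perl in os-autoinst). Using a repository directory's mtime as a stand-in for "last used" is an assumption and only a rough proxy; a real implementation would better record last use explicitly. The sketch requires GNU coreutils (`du -b`, `stat -c`).

```shell
#!/usr/bin/env bash
# Hypothetical size-bounded cleanup of a git cache directory; NOT the code from
# os-autoinst PR 2453. Assumes repos live in <cache>/<owner>/<repo>.git and
# uses each repo directory's mtime as a rough stand-in for "last used".
set -euo pipefail

cleanup_git_cache() {
    local cache_dir=$1 limit=$2 total repo size
    total=$(du -sb "$cache_dir" | cut -f1)
    # Walk the repos oldest-first and delete until the total fits the limit.
    while [ "$total" -gt "$limit" ] && read -r _ repo; do
        size=$(du -sb "$repo" | cut -f1)
        rm -rf "$repo"
        total=$((total - size))
    done < <(
        for d in "$cache_dir"/*/*.git; do
            if [ -d "$d" ]; then stat -c '%Y %n' "$d"; fi
        done | sort -n
    )
}

# Demo: three ~100 kB "repos" with increasing mtimes and a 250 kB limit.
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
for name in old mid new; do
    mkdir -p "$demo/owner/$name.git"
    head -c 100000 /dev/zero > "$demo/owner/$name.git/objects.pack"
done
touch -t 202401010000 "$demo/owner/old.git"
touch -t 202401020000 "$demo/owner/mid.git"
touch -t 202401030000 "$demo/owner/new.git"
cleanup_git_cache "$demo" 250000
ls "$demo/owner"
```

When the cache is already below the limit, the loop exits immediately. However, the initial `du` over the whole cache still runs every time, which is the kind of cost the optimization mentioned above would avoid.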

Actions #20

Updated by okurz 3 months ago

  • Parent task changed from #108527 to #152847
Actions #21

Updated by okurz 3 months ago

  • Copied to action #154783: [spike][timeboxed:10h] Run os-autoinst-distri-example directly from git and ensure candidate needles show up on the web UI size:S added
Actions #22

Updated by mkittler 3 months ago · Edited

Plan for this ticket:

  1. Do further testing of https://github.com/os-autoinst/os-autoinst/pull/2453 locally (so far I only ran unit tests).
  2. Wait for feedback on https://github.com/os-autoinst/os-autoinst/pull/2453 and possibly implement requested changes.
  3. Wait until https://github.com/os-autoinst/os-autoinst/pull/2453 is deployed.
  4. Enable Git caching on all o3 workers (see #154156#note-11).
  5. Do another round of testing and wait at least 2 days to confirm that nothing broke.

Caching of needles on the web UI side is a whole different story that does not fit into this timeboxed ticket (especially as it doesn't really match this ticket's title and AC). For that we also already have a pending PR (https://github.com/os-autoinst/openQA/pull/5175).

Actions #23

Updated by mkittler 3 months ago

  • Status changed from In Progress to Feedback
Actions #24

Updated by mkittler 3 months ago

  • Related to action #155104: sh: /usr/bin/du: Permission denied on openqaworker21 added
Actions #25

Updated by mkittler 3 months ago

Actions #26

Updated by mkittler 3 months ago · Edited

  • Status changed from Feedback to In Progress

I rebooted openqaworker21 again, enabled the caching and cloned a test job:

openqa-clone-job --skip-chained-deps --within-instance https://openqa.opensuse.org/tests/3922612 _GROUP=0 {TEST,BUILD}+=-test-for-poo-154156 WORKER_CLASS+=,openqaworker21 CASEDIR=https://github.com/os-autoinst/os-autoinst-distri-opensuse.git

1 job has been created:
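The `{TEST,BUILD}+=-test-for-poo-154156` argument in the command above relies on bash brace expansion. The shell turns it into two separate arguments before `openqa-clone-job` sees them, and `openqa-clone-job` then applies each `VAR+=value` by appending to the existing setting. So both TEST and BUILD get the suffix. A quick way to see what the command actually receives, without needing openQA access:

```shell
#!/usr/bin/env bash
set -euo pipefail
# Print each argument bash passes on after brace expansion on its own line.
printf '%s\n' {TEST,BUILD}+=-test-for-poo-154156
# prints:
# TEST+=-test-for-poo-154156
# BUILD+=-test-for-poo-154156
```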

EDIT: It works, I'm going to reboot the other machines and enable caching there as well.

EDIT: I have now enabled it on all workers via the following command (run for each set of hosts):

for i in $hosts; do echo $i && ssh root@$i " mkdir -p /var/lib/openqa/cache/git && chown _openqa-worker:nogroup /var/lib/openqa/cache/git && bash -c \"grep -q GIT_CACHE_DIR /etc/openqa/workers.ini || sed -i '/CACHEDIRECTORY/a GIT_CACHE_DIR = /var/lib/openqa/cache/git' /etc/openqa/workers.ini ; grep -q GIT_CACHE_DIR_LIMIT /etc/openqa/workers.ini || sed -i '/GIT_CACHE_DIR/a GIT_CACHE_DIR_LIMIT = 10737418240' /etc/openqa/workers.ini\" " ; done

The only worker I skipped was openqaworker27 (no openQA worker setup).
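The GIT_CACHE_DIR_LIMIT value in the command above, 10737418240, is exactly 10 GiB (10 × 1024³) when read as a byte count. That the limit is given in bytes is inferred from the value here; the GIT_CACHE_DIR_LIMIT entry in backend_vars.asciidoc is the authoritative reference. Shell arithmetic confirms the number:

```shell
#!/usr/bin/env bash
set -euo pipefail
# 10 GiB expressed in bytes: 10 * 1024^3
limit=$((10 * 1024 * 1024 * 1024))
echo "$limit"   # prints 10737418240
```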

Actions #27

Updated by mkittler 3 months ago

  • Status changed from In Progress to Feedback

Judging by the output of

for i in $hosts; do echo $i && ssh root@$i " cat /var/lib/openqa/cache/git/index.json " ; done

this already works on some hosts.

Actions #28

Updated by mkittler 3 months ago

  • Status changed from Feedback to Resolved

It still looks good and I haven't gotten any further complaints (checked the relevant chat channels).

Actions #29

Updated by okurz 3 months ago

Thank you. That's great. I will look into pulling the corresponding follow-up tasks into our backlog if they are not already present.

Actions #30

Updated by okurz 3 months ago

  • Due date deleted (2024-02-14)