action #60272

coordination #58184: [saga][epic][use case] full version control awareness within openQA, e.g. user forks and branches, fully versioned test schedules and configuration settings

Make fetching custom git repos (e.g. needles) more efficient

Added by tinita over 1 year ago. Updated 9 months ago.

Status:
New
Priority:
Low
Assignee:
-
Category:
Feature requests
Target version:
Start date:
2019-11-26
Due date:
% Done:

0%

Estimated time:
Difficulty:

Description

Cloning a needles repository can take a lot of time and bandwidth, even if we use --depth 1.
It could be made more efficient by having a local mirror/proxy repo.

Checking out branches could be done via git worktree. A worktree will share its .git directory with the original directory, so this would be faster and use less disk space.
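A minimal local sketch of the sharing mentioned above, using throwaway paths (not any real openQA layout): the worktree does not get its own copy of the object store, its .git is just a small file pointing back at the parent repository.

```shell
# Throwaway demo: a worktree shares the parent repo's .git directory.
set -e
cd "$(mktemp -d)"
git init -q repo
cd repo
git -c user.email=demo@example.com -c user.name=demo \
    commit -q --allow-empty -m "initial commit"
git branch needles-branch
git worktree add -q ../wt needles-branch
# The worktree's .git is a one-line file, not a directory:
cat ../wt/.git
```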

History

#1 Updated by okurz over 1 year ago

  • Category set to Feature requests

Do you mean making the cloning of custom git repos more efficient? Because the normal case is that there is just a single working copy on the webUI host, which is updated with git fetch and the like. The content is then either provided to the workers as a shared mount point or synced to each worker with rsync via the cache service.

#2 Updated by tinita over 1 year ago

  • Subject changed from Make fetching needles more efficient to Make fetching (custom) needles more efficient

okurz wrote:

Do you mean to make the custom git repo cloning more efficient?

Yes.

#3 Updated by tinita over 1 year ago

  • Subject changed from Make fetching (custom) needles more efficient to Make fetching custom git repos (e.g. needles) more efficient

#4 Updated by tinita over 1 year ago

Just tested fetching a PR:

cd os-autoinst-needles-opensuse
git fetch -f git@github.com:os-autoinst/os-autoinst-needles-opensuse.git "refs/pull/619/head:PR/619"
git worktree add ../PR-619 PR/619

This took less than 5 seconds.
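For reference, the same fetch-a-ref-then-worktree pattern can be exercised entirely locally; the repo layout and the PR ref below are made up for the demo:

```shell
set -e
cd "$(mktemp -d)"
# Build a local stand-in for the remote, with a fake PR head ref:
git init -q remote
git -C remote -c user.email=demo@example.com -c user.name=demo \
    commit -q --allow-empty -m "base"
git -C remote update-ref refs/pull/619/head HEAD
git clone -q remote work
cd work
# Fetch only the PR ref into a local branch, then check it out as a worktree:
git fetch -q -f ../remote "refs/pull/619/head:PR/619"
git worktree add -q ../PR-619 PR/619
```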

#5 Updated by okurz over 1 year ago

I have an idea that is simple to do in the meantime and provides another benefit: try to check out a git refspec from an already existing git working copy: https://github.com/os-autoinst/os-autoinst/pull/1358 . I am not sure how much this helps us within openQA, though. We commonly use caching, meaning the tests reside within the cache directory, which is common to all worker instances. We don't want to check out anything in there because that would affect the other instances. What we could try instead is to clone locally from cache to pool and then check out.
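A rough sketch of the clone-from-cache-to-pool idea, with made-up paths standing in for the cache and pool directories: local clones hardlink objects, so the pool copy is cheap, and checkouts there never touch the shared cache copy.

```shell
set -e
base=$(mktemp -d)
# Stand-in for the shared cache copy of the tests:
git init -q "$base/cache-tests"
git -C "$base/cache-tests" -c user.email=demo@example.com -c user.name=demo \
    commit -q --allow-empty -m "tests"
rev=$(git -C "$base/cache-tests" rev-parse HEAD)
# Local clone into a per-instance pool dir (objects are hardlinked):
git clone -q "$base/cache-tests" "$base/pool-1-tests"
# Check out the wanted commit in the pool copy only:
git -C "$base/pool-1-tests" checkout -q "$rev"
```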

#6 Updated by tinita over 1 year ago

okurz wrote:

I have an idea that is simple to do in the meantime and provide another benefit: Try to checkout a git refspec from already existing git working copy

But the working copy would still need a git fetch, right?

#7 Updated by okurz over 1 year ago

tinita wrote:

But the working copy would still need a git fetch, right?

Not necessarily. I envision https://github.com/os-autoinst/os-autoinst/pull/1358 to be used when you want to use an older git commit within "master" from the same repo.

#8 Updated by okurz over 1 year ago

I have looked into "git worktree" and I do not see what benefit it provides over local git clones, which use hardlinks by default. I guess the problem for us is basically the same regardless of the approach: how to map remote repositories to local checkouts and have them available on workers as well. One approach I can think of: map remote URLs to corresponding trees in the filesystem, depending on repo and refspec, and always try to clone/fetch locally first before falling back to any remote operation.

This would already help us avoid cloning again and again on the workers and allow showing the corresponding source code. When someone triggers tests using another repo, fork, or branch, we would additionally need to check whether a corresponding sibling exists, clone+fetch first from there as a "local cache", and only fetch what is missing from remote.
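One way the URL-to-filesystem mapping could look, as a hypothetical helper (the naming scheme is purely illustrative, not anything openQA does today):

```shell
# Hypothetical: map a remote URL onto a host/org/repo directory tree,
# which could live under a common local mirror root.
url_to_path() {
  echo "$1" | sed -e 's#^[a-z+]*://##' -e 's#^git@##' -e 's#:#/#' -e 's#\.git$##'
}
url_to_path "git@github.com:os-autoinst/os-autoinst-needles-opensuse.git"
# -> github.com/os-autoinst/os-autoinst-needles-opensuse
url_to_path "https://github.com/os-autoinst/os-autoinst-distri-opensuse.git"
# -> github.com/os-autoinst/os-autoinst-distri-opensuse
```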

A further general challenge on top is that currently workers either use …/share directly or a copy from the openQA cache, which is shared among all worker instances. Somehow I have the feeling we are coming back to what I originally thought some years ago when the "caching" was envisioned: we should just use git to clone from …/share into each worker's pool dir.

#9 Updated by tinita over 1 year ago

One advantage of worktrees is that the repository keeps a list of them, so you get an automatic overview of how many checkouts exist.

Also, using manual clones would still mean doing a fetch in the main clone first and then another fetch in each local clone.
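The bookkeeping mentioned above comes from git worktree list; a throwaway repo shows it (branch and directory names are made up):

```shell
set -e
cd "$(mktemp -d)"
git init -q repo && cd repo
git -c user.email=demo@example.com -c user.name=demo \
    commit -q --allow-empty -m "base"
git branch pr-1 && git branch pr-2
git worktree add -q ../wt-pr-1 pr-1
git worktree add -q ../wt-pr-2 pr-2
# Lists the main checkout plus the two linked worktrees:
git worktree list
```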

I experimented a bit with a normal clone and a worktree:

% git clone git@github.com:os-autoinst/os-autoinst-needles-opensuse --depth 1
% cd os-autoinst-needles-opensuse
% git fetch -f git@github.com:os-autoinst/os-autoinst-needles-opensuse.git "refs/pull/651/head:PR/651" --depth 1
% git fetch -f git@github.com:os-autoinst/os-autoinst-needles-opensuse.git "refs/pull/652/head:PR/652" --depth 1
% git worktree add ../PR-652 PR/652
% cd ..
% git clone os-autoinst-needles-opensuse/.git -b PR/651 PR-651
% du -hs *
1.7G    PR-651
863M    PR-652
1.7G    os-autoinst-needles-opensuse

The worktree command was also a bit faster (4 s vs. 15 s).

I'll repeat the test with a full clone.

Edit: OK, with a full original clone, the local clones end up smaller as well.

2.7G    PR-649 # manual local clone
872M    PR-651 # manual local clone
863M    PR-652 # worktree
870M    os-autoinst-needles-opensuse

#10 Updated by tinita over 1 year ago

okurz wrote:

Map any remote URLs to corresponding trees in the filesystem depending on repo and refspec and always try to clone/fetch locally first before reverting to any remote operation, e.g.
...
Somehow I have the feeling we are coming back to what I originally already thought some years ago when the "caching" was envisioned: We should just use git to clone from …/share into each worker's pool dir.

Regardless of using worktree or not, yes, I think that's necessary.

#11 Updated by tinita over 1 year ago

A disadvantage of a worktree (or a shared .git folder in general) is that if you remove the original repo folder, the git info in the clone/worktree is lost.
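This is easy to reproduce locally (throwaway paths again): once the parent repository is gone, the worktree's .git file points at nothing and git operations in it fail.

```shell
set -e
cd "$(mktemp -d)"
git init -q repo
git -C repo -c user.email=demo@example.com -c user.name=demo \
    commit -q --allow-empty -m "base"
git -C repo branch side
git -C repo worktree add -q ../wt side
git -C wt status -s              # works while the parent repo exists
rm -rf repo                      # remove the original repository...
! git -C wt status 2>/dev/null   # ...and the worktree stops working
```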

#12 Updated by okurz over 1 year ago

I think we can live with that, no problem.

#13 Updated by okurz about 1 year ago

  • Priority changed from Normal to Low
  • Target version set to Ready

By now we have seen that cloning from GitHub every time has actually not been a problem so far. We should still follow up on this, but effectively it has lower priority here.

#14 Updated by okurz 9 months ago

  • Target version changed from Ready to future
