action #60272

coordination #58184: [saga][epic][use case] full version control awareness within openQA, e.g. user forks and branches, fully versioned test schedules and configuration settings

Make fetching custom git repos (e.g. needles) more efficient

Added by tinita over 1 year ago. Updated 9 months ago.

Status:
New
Priority:
Low
Assignee:
-
Category:
Feature requests
Target version:
Start date:
2019-11-26
Due date:
% Done:

0%

Estimated time:
Difficulty:

Description

Cloning a needles repository can take a lot of time and bandwidth, even if we use --depth 1.
It could be made more efficient by having a local mirror/proxy repo.

Checking out branches could be done via git worktree. A worktree will share its .git directory with the original directory, so this would be faster and use less disk space.
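A minimal local sketch of the sharing mentioned above, using throwaway paths (not any real openQA layout): the worktree does not get its own copy of the object store, its .git is just a small file pointing back at the parent repository.

```shell
# Throwaway demo: a worktree shares the parent repo's .git directory.
set -e
cd "$(mktemp -d)"
git init -q repo
cd repo
git -c user.email=demo@example.com -c user.name=demo \
    commit -q --allow-empty -m "initial commit"
git branch needles-branch
git worktree add -q ../wt needles-branch
# The worktree's .git is a one-line file, not a directory:
cat ../wt/.git
```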

History

#1 Updated by okurz over 1 year ago

  • Category set to Feature requests

Do you mean making the cloning of custom git repos more efficient? Because the normal case is that there is just a single working copy on the webUI host, which is updated with git fetch and the like. The content is then either provided to the workers as a shared mount point or synced to each worker with rsync via the cache service.

#2 Updated by tinita over 1 year ago

  • Subject changed from Make fetching needles more efficient to Make fetching (custom) needles more efficient

okurz wrote:

Do you mean to make the custom git repo cloning more efficient?

Yes.

#3 Updated by tinita over 1 year ago

  • Subject changed from Make fetching (custom) needles more efficient to Make fetching custom git repos (e.g. needles) more efficient

#4 Updated by tinita over 1 year ago

Just tested fetching a PR:

cd os-autoinst-needles-opensuse
git fetch -f git@github.com:os-autoinst/os-autoinst-needles-opensuse.git "refs/pull/619/head:PR/619"
git worktree add ../PR-619 PR/619

This took less than 5 seconds.
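For reference, the same fetch-a-ref-then-worktree pattern can be exercised entirely locally; the repo layout and the PR ref below are made up for the demo:

```shell
set -e
cd "$(mktemp -d)"
# Build a local stand-in for the remote, with a fake PR head ref:
git init -q remote
git -C remote -c user.email=demo@example.com -c user.name=demo \
    commit -q --allow-empty -m "base"
git -C remote update-ref refs/pull/619/head HEAD
git clone -q remote work
cd work
# Fetch only the PR ref into a local branch, then check it out as a worktree:
git fetch -q -f ../remote "refs/pull/619/head:PR/619"
git worktree add -q ../PR-619 PR/619
```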

#5 Updated by okurz over 1 year ago

I have an idea that is simple to do in the meantime and provides another benefit: try to check out a git refspec from an already existing git working copy: https://github.com/os-autoinst/os-autoinst/pull/1358 . I am not sure how much this helps us within openQA, though. We commonly use caching, meaning the tests reside within the cache directory, which is common to all worker instances. We don't want to check out anything in there because that would affect the other instances. What we could try instead is to clone locally from cache to pool and then check out.
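A rough sketch of the clone-from-cache-to-pool idea, with made-up paths standing in for the cache and pool directories: local clones hardlink objects, so the pool copy is cheap, and checkouts there never touch the shared cache copy.

```shell
set -e
base=$(mktemp -d)
# Stand-in for the shared cache copy of the tests:
git init -q "$base/cache-tests"
git -C "$base/cache-tests" -c user.email=demo@example.com -c user.name=demo \
    commit -q --allow-empty -m "tests"
rev=$(git -C "$base/cache-tests" rev-parse HEAD)
# Local clone into a per-instance pool dir (objects are hardlinked):
git clone -q "$base/cache-tests" "$base/pool-1-tests"
# Check out the wanted commit in the pool copy only:
git -C "$base/pool-1-tests" checkout -q "$rev"
```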

#6 Updated by tinita over 1 year ago

okurz wrote:

I have an idea that is simple to do in the meantime and provide another benefit: Try to checkout a git refspec from already existing git working copy

But the working copy would still need a git fetch, right?

#7 Updated by okurz over 1 year ago

tinita wrote:

But the working copy would still need a git fetch, right?

Not necessarily. I envision https://github.com/os-autoinst/os-autoinst/pull/1358 to be used when you want to use an older git commit within "master" from the same repo.

#8 Updated by okurz over 1 year ago

I have looked into "git worktree" and I do not see what benefit it provides over local git clones, which use hardlinks by default. I guess the problem for us is basically the same regardless of the approach: how to map remote repositories to local checkouts and have them available on workers as well. One approach I can think of: map remote URLs to corresponding trees in the filesystem, depending on repo and refspec, and always try to clone/fetch locally first before falling back to any remote operation.

This would already help us avoid cloning again and again on the workers and allow showing the corresponding source code. When someone triggers tests using another repo, fork, or branch, we would additionally need to check whether a corresponding sibling exists, clone+fetch first from there as a "local cache", and only fetch what is missing from remote.
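One way the URL-to-filesystem mapping could look, as a hypothetical helper (the naming scheme is purely illustrative, not anything openQA does today):

```shell
# Hypothetical: map a remote URL onto a host/org/repo directory tree,
# which could live under a common local mirror root.
url_to_path() {
  echo "$1" | sed -e 's#^[a-z+]*://##' -e 's#^git@##' -e 's#:#/#' -e 's#\.git$##'
}
url_to_path "git@github.com:os-autoinst/os-autoinst-needles-opensuse.git"
# -> github.com/os-autoinst/os-autoinst-needles-opensuse
url_to_path "https://github.com/os-autoinst/os-autoinst-distri-opensuse.git"
# -> github.com/os-autoinst/os-autoinst-distri-opensuse
```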

A further general challenge on top is that currently workers either use …/share directly or a copy from the openQA cache, which is shared among all worker instances. Somehow I have the feeling we are coming back to what I originally thought some years ago when the "caching" was envisioned: we should just use git to clone from …/share into each worker's pool dir.

#9 Updated by tinita over 1 year ago

One advantage of worktrees is that the repository keeps a list of them, so you get an automatic overview of how many checkouts exist.

Also, using manual clones would still mean doing a fetch in the main clone first and then another fetch in each local clone.
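The bookkeeping mentioned above comes from git worktree list; a throwaway repo shows it (branch and directory names are made up):

```shell
set -e
cd "$(mktemp -d)"
git init -q repo && cd repo
git -c user.email=demo@example.com -c user.name=demo \
    commit -q --allow-empty -m "base"
git branch pr-1 && git branch pr-2
git worktree add -q ../wt-pr-1 pr-1
git worktree add -q ../wt-pr-2 pr-2
# Lists the main checkout plus the two linked worktrees:
git worktree list
```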

I experimented a bit with a normal clone and a worktree:

% git clone git@github.com:os-autoinst/os-autoinst-needles-opensuse --depth 1
% cd os-autoinst-needles-opensuse
% git fetch -f git@github.com:os-autoinst/os-autoinst-needles-opensuse.git "refs/pull/651/head:PR/651" --depth 1
% git fetch -f git@github.com:os-autoinst/os-autoinst-needles-opensuse.git "refs/pull/652/head:PR/652" --depth 1
% git worktree add ../PR-652 PR/652
% cd ..
% git clone os-autoinst-needles-opensuse/.git -b PR/651 PR-651
% du -hs *
1.7G    PR-651
863M    PR-652
1.7G    os-autoinst-needles-opensuse

The worktree command was also a bit faster (4 s vs. 15 s).

I'll repeat the test with a full clone.

Edit: OK, with a full original clone, the local clones end up smaller as well.

2.7G    PR-649 # manual local clone
872M    PR-651 # manual local clone
863M    PR-652 # worktree
870M    os-autoinst-needles-opensuse

#10 Updated by tinita over 1 year ago

okurz wrote:

Map any remote URLs to corresponding trees in the filesystem depending on repo and refspec and always try to clone/fetch locally first before reverting to any remote operation, e.g.
...
Somehow I have the feeling we are coming back to what I originally already thought some years ago when the "caching" was envisioned: We should just use git to clone from …/share into each worker's pool dir.

Regardless of using worktree or not, yes, I think that's necessary.

#11 Updated by tinita over 1 year ago

A disadvantage of a worktree (or a shared .git folder in general) is that if you remove the original repo folder, the git info in the clone/worktree is lost.
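This is easy to reproduce locally (throwaway paths again): once the parent repository is gone, the worktree's .git file points at nothing and git operations in it fail.

```shell
set -e
cd "$(mktemp -d)"
git init -q repo
git -C repo -c user.email=demo@example.com -c user.name=demo \
    commit -q --allow-empty -m "base"
git -C repo branch side
git -C repo worktree add -q ../wt side
git -C wt status -s              # works while the parent repo exists
rm -rf repo                      # remove the original repository...
! git -C wt status 2>/dev/null   # ...and the worktree stops working
```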

#12 Updated by okurz over 1 year ago

I think we can live with that, no problem.

#13 Updated by okurz about 1 year ago

  • Priority changed from Normal to Low
  • Target version set to Ready

By now we have seen that cloning from GitHub every time has actually not been a problem so far. We should still follow up on this, but effectively it has lower priority here.

#14 Updated by okurz 9 months ago

  • Target version changed from Ready to future
