coordination #58184: [saga][epic][use case] full version control awareness within openQA, e.g. user forks and branches, fully versioned test schedules and configuration settings
Make fetching custom git repos (e.g. needles) more efficient
Cloning a needles repository can take a lot of time and traffic, even if we use
It could be made more efficient by having a local mirror/proxy repo.
Checking out branches could be done via git worktree. A worktree will share its .git directory with the original directory, so this would be faster and use less disk space.
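A minimal sketch of that idea, run in a throwaway directory. The repository name "mirror", the branch "feature/foo" and the file name are hypothetical stand-ins, not the real needles setup:

```shell
#!/bin/sh
set -eu
# Throwaway sandbox; every path and branch name below is hypothetical.
work=$(mktemp -d)
cd "$work"

# Stand-in for the local mirror of the needles repository.
git init -q mirror
git -C mirror symbolic-ref HEAD refs/heads/main
echo "needle" > mirror/needle.json
git -C mirror add needle.json
git -C mirror -c user.name=demo -c user.email=demo@example.invalid \
    commit -q -m "initial needles"

# A branch that a job might request.
git -C mirror branch feature/foo

# Check the branch out into its own directory without cloning again.
git -C mirror worktree add "$work/feature-foo" feature/foo >/dev/null 2>&1

# The worktree only gets a small .git *file* pointing back into the mirror;
# objects are not copied.
cat "$work/feature-foo/.git"
```

The object database stays in mirror/.git, so each additional worktree only costs the checked-out files.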
#1 Updated by okurz over 1 year ago
- Category set to Feature requests
Do you mean to make the custom git repo cloning more efficient? Because the normal case is that there is just a single working copy on the webui host which is updated with git fetch and such. The content is then either provided as a shared mount point to workers or synced with rsync to each worker using the cache service.
#5 Updated by okurz over 1 year ago
I have an idea that is simple to do in the meantime and provides another benefit: Try to check out a git refspec from an already existing git working copy: https://github.com/os-autoinst/os-autoinst/pull/1358 . Not sure if this helps us that much within openQA though. We commonly use caching, meaning that we have tests which reside within the cache directory, common for all worker instances. We don't want to check out something in there because this would affect other instances. What we could try to do is to clone locally from cache to pool and then check out.
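The "clone locally from cache to pool, then check out" step could look roughly like this. It is only a sketch under assumptions: the directory names "cache" and "pool-1" and the file main.pm are made up for the demo and do not reflect the real worker layout:

```shell
#!/bin/sh
set -eu
work=$(mktemp -d)
cd "$work"
# Helper so the demo commits work without a global git identity.
g() { git -c user.name=demo -c user.email=demo@example.invalid "$@"; }

# Stand-in for the shared test cache with two commits on master.
git init -q cache
git -C cache symbolic-ref HEAD refs/heads/master
echo v1 > cache/main.pm
g -C cache add main.pm
g -C cache commit -q -m "first"
old=$(git -C cache rev-parse HEAD)
echo v2 > cache/main.pm
g -C cache commit -q -a -m "second"

# Clone from the cache into a per-instance pool directory. For local
# paths git hardlinks the object files where the filesystem allows it.
git clone -q --local cache pool-1

# Check out the older commit only inside the pool copy.
git -C pool-1 checkout -q "$old"

cat pool-1/main.pm   # content of the older commit
cat cache/main.pm    # cache working copy is untouched
```

Because the checkout happens in the pool copy, other worker instances reading the cache are not affected.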
#7 Updated by okurz over 1 year ago
But the working copy would still need a git fetch, right?
Not necessarily. I envision https://github.com/os-autoinst/os-autoinst/pull/1358 to be used when you want to use an older git commit within "master" from the same repo.
#8 Updated by okurz over 1 year ago
I have looked into "git worktree" and I did not understand if it can provide any benefit over using local git clones which use hardlinks by default. I guess the problem for us is basically the same regardless of the approach: How to map remote repositories to local checkouts and have them available on workers as well. One approach I could think of: Map any remote URLs to corresponding trees in the filesystem depending on repo and refspec and always try to clone/fetch locally first before reverting to any remote operation, e.g.
- https://github.com/os-autoinst/os-autoinst-distri-opensuse#master -> /var/lib/openqa/share/tests/github.com/os-autoinst/os-autoinst-distri-opensuse/master
- https://github.com/okurz/os-autoinst-distri-opensuse#feature/foo -> /var/lib/openqa/share/tests/github.com/okurz/os-autoinst-distri-opensuse/feature/foo
- https://github.com/perlpunk/os-autoinst-distri-opensuse#02535deadbeef -> /var/lib/openqa/share/tests/github.com/perlpunk/os-autoinst-distri-opensuse/02535deadbeef
This would already help to not clone again and again on workers and to be able to show the corresponding source code. Now when someone triggers tests using another repo, fork or branch, we would additionally need to check whether a corresponding sibling exists, clone and fetch from there first as a "local cache", and only get what is still needed from the remote.
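The URL-to-directory mapping from the list above can be sketched with plain shell parameter expansion. The function name repo_path, the BASE variable and the fallback to "master" when no "#ref" is given are assumptions for illustration, not existing openQA code:

```shell
#!/bin/sh
set -eu
# Hypothetical base directory, taken from the examples in the list above.
BASE=/var/lib/openqa/share/tests

# Map "https://host/owner/repo#ref" to "$BASE/host/owner/repo/ref".
repo_path() {
    spec=$1
    url=${spec%%#*}          # everything before the first '#'
    ref=${spec#*#}           # everything after the first '#'
    if [ "$ref" = "$spec" ]; then
        ref=master           # assumption: no '#ref' means master
    fi
    url=${url#*://}          # drop the scheme
    url=${url%.git}          # drop an optional .git suffix
    printf '%s/%s/%s\n' "$BASE" "$url" "$ref"
}

repo_path "https://github.com/os-autoinst/os-autoinst-distri-opensuse#master"
# prints /var/lib/openqa/share/tests/github.com/os-autoinst/os-autoinst-distri-opensuse/master
```

A lookup for a sibling checkout could then test whether another ref directory already exists below the same repo path before falling back to the remote.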
Another challenge in general on top is that currently either workers use …/share directly or a copy from openQA cache which is shared for all worker instances. Somehow I have the feeling we are coming back to what I originally already thought some years ago when the "caching" was envisioned: We should just use git to clone from …/share into each worker's pool dir.
#9 Updated by tinita over 1 year ago
One advantage of worktrees is that a list of them is kept in the repo, so you have an automatic overview of how many clones are out there.
Also, having manual clones would still mean to do a fetch in the main clone first, and then doing a fetch in the local clone.
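To illustrate the overview of worktrees mentioned above, here is a small self-contained sketch. The repository "main-clone" and the directories PR-651 and PR-652 are created locally just for the demo; no real pull requests are fetched:

```shell
#!/bin/sh
set -eu
work=$(mktemp -d)
cd "$work"

# Stand-in main clone with one (empty) commit and two local branches.
git init -q main-clone
git -C main-clone symbolic-ref HEAD refs/heads/main
git -C main-clone -c user.name=demo -c user.email=demo@example.invalid \
    commit -q --allow-empty -m "initial"
git -C main-clone branch PR/651
git -C main-clone branch PR/652

# One worktree per branch.
git -C main-clone worktree add "$work/PR-651" PR/651 >/dev/null 2>&1
git -C main-clone worktree add "$work/PR-652" PR/652 >/dev/null 2>&1

# The main clone knows about every worktree; the porcelain format has one
# "worktree <path>" line per checkout, including the main one.
git -C main-clone worktree list --porcelain | grep '^worktree '
count=$(git -C main-clone worktree list --porcelain | grep -c '^worktree ')
echo "$count"   # 3: the main working copy plus two worktrees
```

Plain local clones leave no such record in the repository they were cloned from, so they would have to be tracked separately.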
I experimented a bit with a normal clone and a worktree:
% git clone firstname.lastname@example.org:os-autoinst/os-autoinst-needles-opensuse --depth 1
% cd os-autoinst-needles-opensuse
% git fetch -f email@example.com:os-autoinst/os-autoinst-needles-opensuse.git "refs/pull/651/head:PR/651" --depth 1
% git fetch -f firstname.lastname@example.org:os-autoinst/os-autoinst-needles-opensuse.git "refs/pull/652/head:PR/652" --depth 1
% git worktree add ../PR-652 PR/652
% cd ..
% git clone os-autoinst-needles-opensuse/.git -b PR/651 PR-651
% du -hs *
1.7G PR-651
863M PR-652
1.7G os-autoinst-needles-opensuse
The worktree command was a bit faster (4s vs. 15s).
I'll repeat the test with a full clone.
Edit: ok, if I have a full original clone, then the local clones have smaller sizes also.
2.7G PR-649 # manual local clone
872M PR-651 # manual local clone
863M PR-652 # worktree
870M os-autoinst-needles-opensuse
#10 Updated by tinita over 1 year ago
Map any remote URLs to corresponding trees in the filesystem depending on repo and refspec and always try to clone/fetch locally first before reverting to any remote operation, e.g.
Somehow I have the feeling we are coming back to what I originally already thought some years ago when the "caching" was envisioned: We should just use git to clone from …/share into each worker's pool dir.
Regardless of using worktree or not, yes, I think that's necessary.