Ticket #161411 - Dedicated networks for openSUSE GitHub Runners
Status: open, 40% done
Description
The SUSE Labs department will sponsor an unused old four-node chassis for use as GitHub Runners. The maintenance will be done by me (Enno Gotthold/SchoolGuy) during my work hours. One of the nodes will be used for the Cobbler org, but the other three can be freely integrated into the openSUSE GitHub Org.
As GitHub Runners by design execute untrusted code, they should be isolated as much as possible. I propose a VLAN for each GitHub org (one for Cobbler and one for openSUSE).
The idea is to use https://github.com/actions/actions-runner-controller on top of a k3s cluster to manage the runners. Furthermore, I would like to use MicroOS as the base OS.
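For reference, getting ARC onto a k3s cluster essentially boils down to two Helm chart installs. A rough sketch (namespace/release names are just examples, and the GitHub App secret setup is omitted):

helm install arc \
  --namespace arc-systems --create-namespace \
  oci://ghcr.io/actions/actions-runner-controller-charts/gha-runner-scale-set-controller

helm install arc-runner-set \
  --namespace arc-runners --create-namespace \
  --set githubConfigUrl="https://github.com/openSUSE" \
  --set githubConfigSecret=github-app-secret \
  oci://ghcr.io/actions/actions-runner-controller-charts/gha-runner-scale-set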
The host is not yet configured with a static network configuration. The four nodes each have a dedicated BMC that only has a Java Web Start based UI for machine access.
Updated by crameleon 11 months ago
Hi Enno,
I will try to configure the network soon. From reading your SUSE ticket, I should probably be able to find the physical connections in SUSE RackTables. Is there any networking already configured that I can use to connect to the BMCs, so that I can then set the correct addresses for our management network? If not, I could spawn a temporary DHCP server.
I understand why MicroOS would be a good candidate for this application. However, I had a terrible experience integrating it with our infrastructure in the past. A lot of the Salt states either do not support transactional operation at all, or require dirty hacks. Also, a lot of packages are not included in the base distribution, which required maintaining a separate project with various links: https://build.opensuse.org/project/show/openSUSE:infrastructure:Micro. It eventually led me to move the two servers I tried it with back to Leap and to give up on the effort to make it work.
Hence I suggest making your servers Leap-based as well, but confining the relevant services with systemd hardening and AppArmor.
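To illustrate what I mean by systemd hardening, a minimal sketch of a drop-in for the runner service (the unit name is hypothetical and the directives would need tuning per service), e.g. /etc/systemd/system/github-runner.service.d/hardening.conf:

[Service]
NoNewPrivileges=yes
ProtectSystem=strict
ProtectHome=yes
PrivateTmp=yes
PrivateDevices=yes
RestrictAddressFamilies=AF_INET AF_INET6 AF_UNIX
SystemCallFilter=@system-service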
I have an AutoYaST profile we can use for deployment of the base OS (there's currently no network boot server in our infrastructure since we rarely have new hardware, hence I'd just load it with an image through the BMC, if possible).
The names are fine with me.
Updated by SchoolGuy 11 months ago
crameleon wrote in #note-3:
Hi Enno,
I will try to configure the network soon. From reading your SUSE ticket, I should probably be able to find the physical connections in SUSE RackTables. Is there any networking already configured that I can use to connect to the BMCs, so that I can then set the correct addresses for our management network? If not, I could spawn a temporary DHCP server.
I understand why MicroOS would be a good candidate for this application. However, I had a terrible experience integrating it with our infrastructure in the past. A lot of the Salt states either do not support transactional operation at all, or require dirty hacks. Also, a lot of packages are not included in the base distribution, which required maintaining a separate project with various links: https://build.opensuse.org/project/show/openSUSE:infrastructure:Micro. It eventually led me to move the two servers I tried it with back to Leap and to give up on the effort to make it work.
Hence I suggest making your servers Leap-based as well, but confining the relevant services with systemd hardening and AppArmor.
I have an AutoYaST profile we can use for deployment of the base OS (there's currently no network boot server in our infrastructure since we rarely have new hardware, hence I'd just load it with an image through the BMC, if possible).
The names are fine with me.
Feel free to go ahead with Leap; I just wanted to save myself a bit of maintenance. The BMCs should have DHCP enabled, so spawning a temporary DHCP server should make them accessible. I will give you the username and password via the work messenger.
Updated by SchoolGuy 11 months ago
crameleon wrote in #note-4:
On second thought, I wonder if the names shouldn't be something more generic.
I know we will only use these machines as GitHub runners now, but I have this fear of finding a new purpose for them at some point in the future, making the names no longer fit. ;-)
I have no hard feelings about other names; it was just an idea of mine. I don't know if we have a naming scheme in the openSUSE infra, but if so, feel free to apply it.
Updated by crameleon 11 months ago
Thanks, found the credentials. Will try them soon and let you know.
The naming scheme is sometimes service-related and sometimes just creativity; for physical machines it is usually the latter (as I feel those are more involved to relabel down the line). What about apollo-chassis + apollo0{1,2,3,4}?
Updated by crameleon 11 months ago
- Precedes tickets #161963: Prepare GitHub runner servers added
Updated by crameleon 11 months ago · Edited
- % Done changed from 10 to 20
Created network allocations:
- 2a07:de40:b27e:1207::/64 - Machine network for Cobbler runners
  https://netbox.infra.opensuse.org/ipam/prefixes/35
  with VLAN 1207 openSUSE-GHR-Cobbler
  https://netbox.infra.opensuse.org/ipam/vlans/33
- 2a07:de40:b27e:1208::/64 - Machine network for openSUSE runners
  https://netbox.infra.opensuse.org/ipam/prefixes/36
  with VLAN 1208 openSUSE-GHR-openSUSE
  https://netbox.infra.opensuse.org/ipam/vlans/34
- 2a07:de40:b27e:4003::/64 - K3S Cluster network for Cobbler runners
  https://netbox.infra.opensuse.org/ipam/prefixes/37
- 2a07:de40:b27e:4004::/64 - K3S Service network for Cobbler runners
  https://netbox.infra.opensuse.org/ipam/prefixes/38
- 2a07:de40:b27e:4005::/64 - K3S Cluster network for openSUSE runners
  https://netbox.infra.opensuse.org/ipam/prefixes/39
- 2a07:de40:b27e:4006::/64 - K3S Service network for openSUSE runners
  https://netbox.infra.opensuse.org/ipam/prefixes/40
For configuring K3S networking, https://docs.k3s.io/networking/basic-network-options#single-stack-ipv6-networking should be followed (we don't use router advertisements so the warning is not relevant).
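A rough, untested sketch of what that could look like in /etc/rancher/k3s/config.yaml for the openSUSE runner cluster (the upstream example uses a /56 cluster CIDR and a /112 service CIDR, so the allocated /64s may need to be split or re-sized accordingly):

# /etc/rancher/k3s/config.yaml (sketch, not deployed)
cluster-cidr: "2a07:de40:b27e:4005::/64"
service-cidr: "2a07:de40:b27e:4006::/112"
# leaving flannel-ipv6-masq off (the default) should keep pod source addresses
# routable and therefore filterable on the firewall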
Updated by crameleon 11 months ago
- % Done changed from 20 to 30
Patch for routing configuration and firewall baseline submitted as https://gitlab.infra.opensuse.org/infra/salt/-/merge_requests/1917.
Updated by SchoolGuy 11 days ago
I think that the networking for both the Cobbler and the openSUSE hosts can be identical. The tricky part is that ARC doesn't document what it needs itself, but the Actions themselves could, in theory, access any GitHub-related resource. All IP ranges can be found via the GitHub API and are IPv4 only, AFAIK.
Link: https://api.github.com/meta
Furthermore, to set up k3s, I would like the hosts to be able to access https://get.k3s.io.
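For completeness, those ranges can be pulled from the meta endpoint directly; something along these lines (the jq filter is only an example, the relevant keys are presumably "actions", "api" and "web"):

curl -s https://api.github.com/meta | jq -r '.actions[], .api[], .web[]' | sort -u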
Updated by crameleon 11 days ago
Requested firewall rules implemented via https://gitlab.infra.opensuse.org/infra/salt/-/merge_requests/2421. The "packages" ranges and the other apollo nodes will be added soon.
Updated by SchoolGuy 7 days ago
Either the MR didn't work as intended or the change didn't get deployed.
apollo01 (Cobbler GitHub Runner, K3S):~ # ping docker.io
PING docker.io(2600:1f18:2148:bc01:89b:94df:3759:2fb0 (2600:1f18:2148:bc01:89b:94df:3759:2fb0)) 56 data bytes
From 2a07:de40:b27e:1207::3 (2a07:de40:b27e:1207::3) icmp_seq=1 Destination unreachable: Administratively prohibited
Updated by SchoolGuy 6 days ago
Apparently docker.io just redirects to other registries that are "hidden". This page describes all the URLs that need to be allowed for Docker Desktop: https://docs.docker.com/desktop/setup/allow-list/
I believe we have to whitelist the following URLs:
- https://registry-1.docker.io/
- https://docker-images-prod.6aa30f8b08e16409b46e0173d6de2f56.r2.cloudflarestorage.com/
According to this article we will also need the following URLs: https://support.sonatype.com/hc/en-us/articles/115015442847-Whitelisting-Docker-Hub-Hosts-for-Firewalls-and-HTTP-Proxy-Servers
- https://index.docker.io/
- https://production.cloudflare.docker.com/
- https://dseasb33srnrn.cloudfront.net/
However, according to the Docker forums, we will run into issues with static whitelisting... - https://forums.docker.com/t/docker-registry-public-ip-addresses/10013/2
I don't believe hosting a proxy registry is a good idea, since the effort to maintain it is quite large, IMHO. As our dear friends at Rancher don't build their stuff inside OBS (and it is not viable from an effort perspective to help them), the easiest option is to somehow broaden the scope of the firewall.
Reading up on https://docs.k3s.io/installation/airgap, we could download the images from the releases page and push them to a registry (a rough sketch of what that would involve is below). I must say that doing this feels like a lot of effort in the long term...
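The mirror/airgap route would roughly mean running our own registry and pointing k3s at it via /etc/rancher/k3s/registries.yaml, something like (the registry hostname is purely hypothetical):

mirrors:
  docker.io:
    endpoint:
      - "https://registry.infra.opensuse.org"

plus keeping the mirrored images up to date, which is exactly the maintenance burden I would like to avoid.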
Updated by crameleon 6 days ago
Thanks for the investigation. It's rather unfortunate that this can't be deployed from in-house packages and containers.
Dynamic ACLs are a TODO of mine (also for other services which do not publish static IP addresses but only domain names), but I feel they would not add much reliability here, as the list of URLs you collected from different sources feels rather convoluted.
Given the circumstances, I can agree to allow wider access for the host system, but I do not feel comfortable also having this reflected in the runner containers (otherwise someone could use the CI for arbitrary download jobs). Can we filter host and container traffic separately? With https://progress.opensuse.org/issues/161411#note-10 the containers do have their own network, but the linked article does not quite tell me whether K3S will do NAT or "proper" routing (which we would need in order to see the container network source addresses on the firewall) - https://docs.k3s.io/networking/basic-network-options suggests that with IPv6 it will not do masquerading/NAT by default (which sounds good).
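To illustrate what I mean by filtering separately: assuming pod traffic really leaves the hosts with its 2a07:de40:b27e:4005::/64 source addresses (routed, not NATed), the forward rules could distinguish the two roughly like this (nftables-style sketch for the openSUSE side; the github_ranges set is a placeholder we would have to fill from api.github.com/meta):

ip6 saddr 2a07:de40:b27e:1208::/64 accept comment "runner hosts: wider access"
ip6 saddr 2a07:de40:b27e:4005::/64 ip6 daddr @github_ranges accept comment "runner pods: GitHub only"
ip6 saddr 2a07:de40:b27e:4005::/64 drop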
Updated by SchoolGuy 6 days ago · Edited
Yes, but both the GitHub Actions and the host will need Docker Hub access, as reusable GitHub Actions may use it. So while we can filter separately, it would just move the issue to a later point in time (i.e. once ARC is running).
Edit: The only way to achieve proper separation is to mirror the needed Docker Hub images and block Docker Hub. That would require a dedicated host and dedicated everything else. But I never attempted this, and it would mean that fully qualified registry paths (like with OBS) would fail (in case of Docker Hub-based images).
Updated by crameleon 4 days ago
It is odd for the containers to require access to registries; I would expect the host to pull the images.
Sorry, but I don't have a solution for this at the moment. I don't deem our current network setup ready to face arbitrary and unrestricted internet workloads.
I would consider some alternative container implementation; for GitLab CI, for example, we use Podman.
Updated by SchoolGuy 4 days ago
GitHub ARC is only suitable for k8s. We could instead try using https://fireactions.io/. That would mean that those ephemeral VMs would pull the images. Would that better suit our network infrastructure? It is a young project, but as far as I can tell it is used in production by Hostinger.
Updated by crameleon 4 days ago
That sounds very interesting, though I am a bit biased because I like Firecracker VMs. ;)
I read through the documentation, but after several attempts I still can't quite figure out how the VMs play together with the containers - the runner service is part of the container image, but the host only runs a VM - so does the container then run inside the VM? Do all jobs use a pre-defined image? It seems network namespaces are used for the VMs, but it's not clear how that is reflected in the containers / whether those have separate networking, or whether we end up in the same situation just at a different layer.
Updated by SchoolGuy 2 days ago
My understanding is that the workloads are executed in the VMs. The VM image type is, in this case, an OCI-compliant Docker image (yes, this is a thing: https://github.com/codecrafters-io/oci-image-executor).
So from a network perspective:
- Host has a systemd service that executes Firecracker VMs
- VM executes any workload that the GitHub Action decides to execute.
This means the host pulls the images from ghcr.io, and the ephemeral VMs download and execute any PyPI/Docker Hub/pkgs.go.dev/Rust crates/Node.js packages needed to run the GitHub Actions workflow. So the VMs will get full internet access, and the host will need access to ghcr.io and whatever else the Ansible playbook asks for.
Updated by SchoolGuy 2 days ago · Edited
P.S.: Looking at the Ansible code, they are installing everything from source. While we can change this in the long run, I would still like to deploy the Ansible playbook as-is to make some progress. I would commit to opening an issue and attempting to gain support from them to install as much as possible from openSUSE-packaged RPMs.
Source: https://github.com/hostinger/ansible-collection-fireactions
Updated by crameleon 1 day ago
Thanks for explaining.
So the VMs will get full internet access
Doesn't this put us in the same situation, where the CI workloads get unrestricted access to the internet as a result?
I would commit to opening an issue and attempting to gain support from them to install as much as possible from openSUSE-packaged RPMs.
I'm not sure how much interest upstream would have in working on this, but I would be happy to do the packaging.
Updated by crameleon 1 day ago
It's not so much about what's running in the container or VM; I trust we can implement good means of isolation, especially with the VM approach.
I care more about our current network setup not being ready to deal with arbitrary internet load from strangers.
Someone could either cause technical problems by saturating our uplink with big downloads, or cause us legal problems through problematic downloads.
Allowing outbound traffic only to selected resources on the internet would mitigate these concerns (even if not ruling them out completely).
I do want to implement basic bandwidth throttling and traffic observability in our infrastructure at some point, but it's not there yet.
contributors are vetted by me
It sounds like you would be OK with limiting who is allowed to start pipelines on these runners instead of allowing arbitrary GitHub users to do so. I think this would definitely help until I have better protections in place (once that is the case, I would be happy to relax the restrictions so as to make it less annoying for external contributors).
Would you mind briefly elaborating on how this would look in practice? I guess filtering by organization membership can be considered trustworthy for the ~25 people in Cobbler (https://github.com/orgs/cobbler/people), but I'm not confident about applying the same trust of not accidentally running pipelines on malicious PRs to the ~450 people in openSUSE (https://github.com/orgs/openSUSE/people). Or would it happen on a team or repository level?
Updated by SchoolGuy 1 day ago
The runners in openSUSE would be available for anyone in the org to use, so yes, that is a much higher level of exposure. However, since there seems to be no need for org-wide access, we could create per-repository pools, which would limit it. With Cobbler, as you already guessed, it is under 30 people.
GitHub, by default, requires a maintainer to approve workflow runs: on a PR from someone who hasn't had a PR merged yet, the workflow is not executed automatically. This is the mechanism that is in place, and in my eyes it is enough to protect against abuse.