action #158242
closed
openQA Project (public) - coordination #105624: [saga][epic] Reconsider how openQA handles secrets
openQA Project (public) - coordination #157537: [epic] Secure setup of openQA test machines with secure network+secure authentication
Prevent ssh access to test VMs on svirt hypervisor hosts with firewall size:M
0%
Description
Motivation¶
In https://sd.suse.com/servicedesk/customer/portal/1/SD-150437 we are asked to handle "compromised root passwords in QA segments" including s390zl11…16
Acceptance criteria¶
- AC1: firewall on OSD svirt hosts prevents direct ssh+vnc access from outside, i.e. normal office networks
- AC2: openQA svirt jobs are still able to access ssh+vnc as necessary, e.g. from openQA workers in the same network OR openQA workers on the hypervisor hosts themselves
Suggestions¶
- Take openQA svirt worker instances related to one hypervisor host, e.g. s390zl12, out of production for testing
- Configure a/the firewall on that host to block ssh+vnc to VMs running on that host, e.g. s390kvm080.oqa.prg2.suse.org…s390kvm099.oqa.prg2.suse.org
- Allow traffic from other hosts in oqa.prg2.suse.org
- Ensure that openQA tests still work, e.g. the login to the target SUT VM in "boot_to_desktop". Use for verification
- Ensure that the according firewall config is made boot-persistent and in salt
- Crosscheck with at least one reboot
- Ensure that the solution at least applies to s390kvm080.oqa.prg2.suse.org…s390kvm099.oqa.prg2.suse.org
- Apply the same solution to all other OSD svirt hosts, at least unreal6+openqaw5-xen
- Use at least https://openqa.suse.de/tests/latest?machine=svirt-xen-pv&test=default for verification
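A rough firewalld sketch of what AC1+AC2 could look like on a hypervisor host. This is an assumed starting point, not a verified config: the zone names, the 5900-5999/tcp VNC port range and the 10.145.10.0/24 worker network are assumptions that would need to be checked against the actual network layout.

```shell
# Assumption: ssh+vnc should be dropped in the default (public) zone and
# only allowed from the worker network via the work zone.
firewall-cmd --permanent --zone=public --remove-service=ssh
firewall-cmd --permanent --zone=public --remove-port=5900-5999/tcp
firewall-cmd --permanent --zone=work --add-source=10.145.10.0/24
firewall-cmd --permanent --zone=work --add-service=ssh
firewall-cmd --permanent --zone=work --add-port=5900-5999/tcp
firewall-cmd --reload
```

Using --permanent plus --reload would also cover the boot-persistence requirement; putting the config into salt would still be a separate step.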
Updated by okurz 9 months ago
- Copied from action #157555: [spike][timeboxed:10h][qe-core] Use a different ssh root password for any svirt (s390, x86, etc) installation openQA jobs size:S added
Updated by mkittler 9 months ago · Edited
Take openQA svirt worker instances related to one hypervisor host, e.g. s390zl12, out of production for testing
It looks like login via ssh root@s390zl12.oqa.prg2.suse.org is already not possible using the password in workerconf.sls. The host has PermitRootLogin without-password but also PasswordAuthentication no, so I think things are already configured as expected (especially considering the host is already in salt).
I'm only wondering how openQA tests can access the machine then. Maybe we deployed a key on the relevant worker hosts? I couldn't find such a key and also cannot connect via ssh root@s390zl12.oqa.prg2.suse.org from worker31. Very strange, because tests seem to be able to do so: https://openqa.suse.de/tests/13924867#step/bootloader_zkvm/1
Updated by mkittler 9 months ago
- Assignee deleted (mkittler)
Looks like the following hosts are in salt: s390zl12.oqa.prg2.suse.org s390zl13.oqa.prg2.suse.org
I couldn't find any actual references in workerconf.sls for s390zl11. It only appears in worker classes but isn't actually configured anywhere.
The hosts s390zl14.oqa.prg2.suse.org to s390zl17.oqa.prg2.suse.org (besides 12 and 13) are configured in workerconf.sls but they are not in salt and I cannot log in at all.
I think to work on this ticket we need some kind of reproducer, e.g. "login on host … via … using the too simple password …". Then one could make some firewall changes and verify that the login that was possible before is no longer possible. It would also be good to know how the openQA tests currently manage to log in (so one can ensure this keeps working).
Updated by mkittler 9 months ago
From the SD ticket:
I understand what QA machines are used for, and I understand why you need password auth, it is fair.
Do we really need password auth on the svirt hosts? If I understand correctly (probably not), the s390zl11…16 hosts are svirt hosts, so they are not SUTs themselves. We could simply deploy an SSH key there (and then wouldn't even have to worry about typing issues on the password prompt). Tests would need to be adjusted, though, as they currently expect a password prompt, e.g. the bootloader_zkvm module that runs in e.g. https://openqa.suse.de/tests/latest?arch=s390x&distri=sle&flavor=BCI-Updates&machine=s390x-kvm&test=bci-init_15.6_on_SLES_15-SP4-fips_podman&version=15-SP6.
Updated by okurz 9 months ago
- Description updated (diff)
@mkittler you seem to have misunderstood which systems we are talking about. This ticket is about preventing ssh access to the test VMs running on the hypervisor hosts, not to the hypervisor hosts themselves. Examples of those VMs are s390kvm080.oqa.prg2.suse.org through s390kvm099.oqa.prg2.suse.org; s390zl12+13 are svirt hypervisor hosts. Yes, openQA tests access those svirt hypervisor hosts with password authentication, but that shouldn't be a problem: the passwords used are within openqa/workerconf.sls in the variable "VIRSH_PASSWORD" and the password is not the default os-autoinst testing password anymore.
You referenced https://openqa.suse.de/tests/13924867#step/bootloader_zkvm/1, which shows the successful password authentication into s390zl12; that is not the problem here. In https://openqa.suse.de/tests/13924867#step/boot_to_desktop/13 one can see an authentication sent to the target SUT, but over the PTY exported by the hypervisor host, so again not the problem as it goes via the hypervisor host. https://openqa.suse.de/tests/13924867#step/boot_to_desktop/19 shows the password-based ssh authentication to the target SUT s390kvm083.oqa.prg2.suse.org. That is the access that should be prevented from outside the .oqa.prg2.suse.org network. I updated the description accordingly.
Updated by okurz 9 months ago
- Copied to action #158455: [spike][timeboxed:10h] openQA worker native on s390x added
Updated by mkittler 9 months ago · Edited
My problem with the login was just that my local workerconf.sls was an old version.
So this can be easily tested by spawning a test SUT like this:
openqa-clone-job --skip-chained-deps --within-instance 'https://openqa.suse.de/tests/13913585' _GROUP=0 {BUILD,TEST}+=-test-for-poo158242 PAUSE_AT=system_prepare WORKER_CLASS=s390kvm081
(or use the latest job in the scenario https://openqa.suse.de/tests/latest?arch=s390x&distri=sle&flavor=Online&machine=s390x-kvm&test=extra_tests_bootloader&version=15-SP6)
Then it is possible to login via ssh root@s390kvm081.oqa.prg2.suse.org not only from worker31.oqa.prg2.suse.org but also from my local machine (in VPN).
I'm not sure how to prevent it, though. I experimented with firewalld. The following commands do the right thing but only for the svirt host s390zl12.oqa.prg2.suse.org itself:
firewall-cmd --zone=public --remove-service=ssh
firewall-cmd --zone=work --add-source=10.145.10.0/24
So with this, SSH access to s390zl12.oqa.prg2.suse.org is no longer possible from my machine but remains possible from worker31.oqa.prg2.suse.org and worker32.oqa.prg2.suse.org (and other workers in that IP range, but those two are the relevant ones).
Unfortunately the SUT s390kvm081.oqa.prg2.suse.org remains accessible from everywhere and I haven't found out how to prevent the access yet, probably because bridging is used here and the host firewall cannot intercept the traffic.
To temporarily add/remove workers from production I used the following commands on worker31 and worker32:
systemctl mask --now openqa-worker-auto-restart@{1..5}.service openqa-reload-worker-auto-restart@{1..5}.{service,path}
systemctl unmask --now openqa-worker-auto-restart@{1..5}.service openqa-reload-worker-auto-restart@{1..5}.{service,path}
(I actually kept slot 2 on worker31 alive for my test job.)
Any ideas on how to block the traffic of the VMs despite the bridging? Or maybe it is possible to configure the VMs network interface differently? It is currently configured like this:
# virsh dumpxml 1
…
<interface type='direct'>
<mac address='52:54:00:9d:09:6a'/>
<source dev='vlan2114' mode='bridge'/>
<target dev='macvtap0'/>
<model type='virtio'/>
<alias name='net0'/>
<address type='ccw' cssid='0xfe' ssid='0x0' devno='0x0001'/>
</interface>
…
I restored the previous configuration except for keeping the source in the work zone (which shouldn't make a difference):
s390zl12:/home/martchus # firewall-cmd --get-active-zones
docker
interfaces: docker0
public
interfaces: eth0 vlan0 vlan2114 docker virbr0
work
sources: 10.145.10.0/24
The interface virbr0 is also still shown in the public zone but this didn't make any difference.
Note that the hypervisor can be accessed via https://zhmc2.suse.de (select ZL12 in the table, then click on the tiny arrow icon, then "Recovery -> Integrated ASCII console").
Updated by mkittler 9 months ago
We haven't managed to block the SSH access to the VMs. Even iptables -P INPUT DROP and iptables -P FORWARD DROP didn't make a difference. (We also installed the non-legacy version of iptables and made sure it is being used.)
We also tried to use nft commands directly (like nft add rule ip filter FORWARD ip daddr 10.145.10.0/24 reject, but also on various other chains, ensuring via handle … that the rules are at the right place as necessary).
Useful resources:
- https://firewalld.org/documentation/zone/predefined-zones.html
- https://wiki.archlinux.org/title/Nftables
- https://developers.redhat.com/blog/2020/08/18/iptables-the-two-variants-and-their-relationship-with-nftables#using_iptables_nft
- https://stuffphilwrites.com/wp-content/uploads/2014/09/FW-IDS-iptables-Flowchart-v2019-04-30-1.png
- https://documentation.suse.com/sles/15-SP2/html/SLES-all/cha-libvirt-config-virsh.html
Link to my "test job scenario" (we can simply restart this job to continue): https://openqa.suse.de/tests/latest?arch=s390x&distri=sle&flavor=Online&machine=s390x-kvm&test=extra_tests_bootloader-test-for-poo158242&version=15-SP6
We haven't configured anything persistently so I just rebooted the machine to restore the normal state and unmasked the worker slots again.
Updated by mkittler 9 months ago
We configured a bridge device on the svirt host and adjusted the VM settings to use regular bridge mode instead of macvtap¹. This didn't work: the DHCP server (suttner) seemed to assign the correct IP but the response never reached back to the VM.
(This is how the configuration would look in openQA code: https://github.com/Martchus/os-autoinst-distri-opensuse/tree/zkvm-networking)
Updated by openqa_review 9 months ago
- Due date set to 2024-04-20
Setting due date based on mean cycle time of SUSE QE Tools
Updated by okurz 9 months ago
There is also https://unix.stackexchange.com/questions/499756/how-does-iptable-work-with-linux-bridge/500022#500022 which explains how to explicitly bring layer 2 traffic on a bridge under control of the firewall. So we can look into that.
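A sketch of what the linked answer amounts to (the module name and sysctl keys are the documented bridge-netfilter ones; whether this helps with the macvtap setup on s390zl12 is exactly what would need testing):

```shell
# Pass bridged layer-2 traffic to the IP firewall (bridge netfilter):
modprobe br_netfilter
sysctl -w net.bridge.bridge-nf-call-iptables=1
sysctl -w net.bridge.bridge-nf-call-ip6tables=1
# Persist across reboots (paths are the usual systemd drop-in locations):
echo br_netfilter > /etc/modules-load.d/br_netfilter.conf
printf 'net.bridge.bridge-nf-call-iptables = 1\nnet.bridge.bridge-nf-call-ip6tables = 1\n' > /etc/sysctl.d/90-bridge-nf.conf
```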
Another alternative is to change the network config so that s390x kvm tests work like other svirt tests, e.g. on unreal6 and openqaw5-xen, which seem to use a different network configuration. /etc/libvirt/libxl/openQA-SUT-1.xml says:
<interface type='bridge'>
<mac address='00:16:…'/>
<source bridge='br0'/>
<virtualport type='openvswitch'>
<parameters interfaceid='18…'/>
</virtualport>
<model type='netfront'/>
</interface>
Updated by mkittler 9 months ago · Edited
@mgriessmeier mentioned the following documentation: https://documentation.suse.com/sles/15-SP2/html/SLES-all/cha-libvirt-networks.html#libvirt-networks-virtual
Maybe we can use the same setup as we use for xen hosts (e.g. https://openqa.suse.de/tests/13964449/logfile?filename=autoinst-log.txt):
<interface type='bridge'>
<mac address='00:16:…'/>
<source bridge='br0'/>
<virtualport type='openvswitch'>
<parameters interfaceid='03…'/>
</virtualport>
<target dev='vif100.0'/>
<model type='netfront'/>
</interface>
This is similar to what we tried with the bridge on Friday but maybe the additional Open vSwitch configuration helps.
Maybe we also just needed to enable forwarding. It is enabled on openqaw5-xen.qe.prg2.suse.org where the mentioned bridge configuration is used and works (cat /proc/sys/net/ipv4/ip_forward returns 1). It is not enabled on s390zl12.oqa.prg2.suse.org (the command returns 0). @dheidler Do you think our setup would have worked with forwarding enabled?
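The forwarding difference can be checked and, for a test, enabled like this (persisting via sysctl.d is an assumption about how the host is managed; on salted hosts it should rather go through salt):

```shell
cat /proc/sys/net/ipv4/ip_forward   # reportedly 1 on openqaw5-xen, 0 on s390zl12
sysctl -w net.ipv4.ip_forward=1     # enable until next reboot
echo 'net.ipv4.ip_forward = 1' > /etc/sysctl.d/90-ip-forward.conf  # persist
```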
Updated by mkittler 9 months ago
Additional resources:
- https://wiki.libvirt.org/Net.bridge.bridge-nf-call_and_sysctl.conf.html
- https://www.ibm.com/docs/en/linux-on-systems?topic=recommendations-kvm-host-networking-configuration-choices
- https://virt.kernelnewbies.org/MacVTap
- https://askubuntu.com/questions/383082/whats-the-difference-between-tun-tap-vs-bridgevnet-vs-macvtap-for-virtualiza
Updated by mkittler 9 months ago
- Status changed from In Progress to Feedback
It didn't help to modprobe br_netfilter and modprobe nf_conntrack_bridge (enabling all those settings).
So I tried again with a regular bridge device together with @nicksinger, following https://wiki.libvirt.org/Networking.html#bridged-networking-aka-shared-physical-device and https://documentation.suse.com/sles/15-SP2/html/SLES-all/cha-libvirt-networks.html. However, we still didn't manage to get the response to the VM's DHCP request to actually reach back to the VM. Enabling forwarding didn't help and neither did disabling the firewall completely. We did a lot of tinkering with the VLAN config, because the fact that a VLAN device is used here is probably what makes configuring this host so difficult. However, we were still not able to arrive at a working configuration.
I think at this point we should explore other options.
Updated by okurz 8 months ago
As visible in https://openqa.suse.de/tests/13958029#step/check_network/3, sle-15-SP6-Online-x86_64-Build79.1-default@svirt-xen-pv from the scenario https://openqa.suse.de/tests/latest?test=default&machine=svirt-xen-pv shows that other libvirt-managed VMs also have an IP in IT-managed networks, but with dynamic DHCP leases and possibly without an SSH server enabled. So the network setup looks comparable to s390zl12+13 while the test and backend setup is different. Hence we suggest rejecting this ticket and following up with alternatives as planned in the parent epic.
Updated by dheidler 8 months ago
- Status changed from In Progress to Rejected
Some experiments showed that the existing tests need to reach the SUT from the worker host.
The worker host is in this case not s390zl12.oqa.prg2.suse.org (which is the KVM host) but worker31.oqa.prg2.suse.org.
This means that my idea of simply using a private virsh network on s390zl12 that is NATing into the oqa.prg2.suse.org network won't work.
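For reference, the rejected idea would have been a libvirt NAT network roughly like the following (the network name and addresses here are invented for illustration); it fails the requirement precisely because workers like worker31 could then no longer reach the SUT directly:

```xml
<!-- hypothetical example only; name and addresses are made up -->
<network>
  <name>sut-nat</name>
  <forward mode='nat'/>
  <bridge name='virbr1' stp='on' delay='0'/>
  <ip address='192.168.100.1' netmask='255.255.255.0'>
    <dhcp>
      <range start='192.168.100.10' end='192.168.100.99'/>
    </dhcp>
  </ip>
</network>
```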
The openQA worker software runs on worker31 and not on s390zl12 because, as of now, openQA packages are not built for s390x.
This is about to change in the future, so we might move the openqa-worker software to s390zl12 eventually.
That would allow us to scrap the libvirt setup and move to the qemu backend, similar to what we do for standard x86_64 tests.
Updated by okurz 8 months ago
- Copied to action #159066: network-level firewall preventing direct ssh+vnc access to openQA test VMs size:M added