action #158242

closed

openQA Project - coordination #105624: [saga][epic] Reconsider how openQA handles secrets

openQA Project - coordination #157537: [epic] Secure setup of openQA test machines with secure network+secure authentication

Prevent ssh access to test VMs on svirt hypervisor hosts with firewall size:M

Added by okurz 6 months ago. Updated 4 months ago.

Status: Rejected
Priority: High
Assignee:
Category: Feature requests
Target version:
Start date: 2024-03-28
Due date:
% Done: 0%
Estimated time:

Description

Motivation

In https://sd.suse.com/servicedesk/customer/portal/1/SD-150437 we are asked to handle "compromised root passwords in QA segments" including s390zl11…16

Acceptance criteria

  • AC1: firewall on OSD svirt hosts prevents direct ssh+vnc access from outside, i.e. normal office networks
  • AC2: openQA svirt jobs are still able to access ssh+vnc as necessary, e.g. from openQA workers in the same network OR openQA workers on the hypervisor hosts themselves

Suggestions


Related issues 3 (0 open, 3 closed)

Copied from openQA Tests - action #157555: [spike][timeboxed:10h][qe-core] Use a different ssh root password for any svirt (s390, x86, etc) installation openQA jobs size:S (Rejected, okurz)

Copied to openQA Project - action #158455: [spike][timeboxed:10h] openQA worker native on s390x (Resolved, okurz, 2024-03-28)

Copied to openQA Infrastructure - action #159066: network-level firewall preventing direct ssh+vnc access to openQA test VMs size:M (Resolved, nicksinger, 2024-03-28)

Actions #1

Updated by okurz 6 months ago

  • Copied from action #157555: [spike][timeboxed:10h][qe-core] Use a different ssh root password for any svirt (s390, x86, etc) installation openQA jobs size:S added
Actions #2

Updated by okurz 6 months ago

  • Priority changed from Normal to High
Actions #3

Updated by mkittler 6 months ago

  • Assignee set to mkittler
Actions #4

Updated by mkittler 6 months ago · Edited

Take openQA svirt worker instances related to one hypervisor host, e.g. s390zl12, out of production for testing

It looks like login via ssh root@s390zl12.oqa.prg2.suse.org is already not possible using the password in workerconf.sls. The host has PermitRootLogin without-password but also PasswordAuthentication no so I think things are already configured as expected (especially considering the host is already in salt).
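The two sshd settings mentioned above can be checked mechanically. The following is a minimal sketch, not something taken from the ticket: the helper name `root_password_login` is hypothetical, it only looks at `PermitRootLogin` and `PasswordAuthentication`, and it ignores other mechanisms (for example keyboard-interactive authentication) that could still ask for a password.

```shell
# Read sshd settings (e.g. the output of `sshd -T`, or sshd_config lines)
# from stdin and report whether root can log in with a plain password.
# Hypothetical helper; it only evaluates the two options named above.
root_password_login() {
    awk '
        BEGIN { root = "prohibit-password"; pw = "yes" }  # OpenSSH defaults
        { key = tolower($1); val = tolower($2) }
        key == "permitrootlogin"        { root = val }
        key == "passwordauthentication" { pw = val }
        END {
            if (root == "yes" && pw == "yes") print "possible"
            else print "blocked"
        }'
}

# Usage on the host itself (as root, not executed here):
#   sshd -T | root_password_login
```

With the values quoted above (`PermitRootLogin without-password`, `PasswordAuthentication no`) the helper reports `blocked`.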

I'm only wondering how openQA tests can access the machine then but maybe we deployed a key on the relevant worker hosts? I couldn't find such a key and also cannot connect via ssh root@s390zl12.oqa.prg2.suse.org from worker31. Very strange, because tests seem to be able to do so: https://openqa.suse.de/tests/13924867#step/bootloader_zkvm/1

Actions #5

Updated by mkittler 6 months ago

  • Assignee deleted (mkittler)

Looks like the following hosts are in salt: s390zl12.oqa.prg2.suse.org s390zl13.oqa.prg2.suse.org

I couldn't find any actual references in workerconf.sls for s390zl11. It only appears on worker classes but isn't actually configured anywhere.

The hosts s390zl14.oqa.prg2.suse.org to s390zl17.oqa.prg2.suse.org (besides 12 and 13) are configured in workerconf.sls but they are not in salt and I cannot login at all.

I think to work on this ticket we need some kind of reproducer, e.g. "login on host … via … using the too simple password …". Then one could make some firewall changes and verify that the login that was possible before is no longer possible. It would also be good to know how the openQA tests currently manage to log in (so one can ensure this keeps working).

Actions #6

Updated by mkittler 6 months ago

From the SD ticket:

I understand what QA machines are used for, and I understand why you need password auth, it is fair.

Do we really need password auth on the svirt hosts? If I understand correctly (probably not) the s390zl11…16 hosts are svirt hosts, so they are not SUTs themselves. We could simply deploy an SSH key there (and then wouldn't even have to worry about typing issues on the password prompt). Tests would need to be adjusted, though, as they currently expect a password prompt, e.g. the bootloader_zkvm module that runs in e.g. https://openqa.suse.de/tests/latest?arch=s390x&distri=sle&flavor=BCI-Updates&machine=s390x-kvm&test=bci-init_15.6_on_SLES_15-SP4-fips_podman&version=15-SP6.

Actions #7

Updated by okurz 6 months ago

  • Description updated (diff)

@mkittler you seem to have misunderstood which systems we are talking about. This ticket is about "Prevent ssh access to test VMs" on the hypervisor hosts, not about the hypervisor hosts themselves. Examples of those test VMs are s390kvm080.oqa.prg2.suse.org through s390kvm099.oqa.prg2.suse.org. s390zl12+13 are svirt hypervisor hosts. Yes, openQA tests access those svirt hypervisor hosts with password authentication but that shouldn't be a problem. The passwords used are within openqa/workerconf.sls in the variable "VIRSH_PASSWORD" and the password is not the default os-autoinst testing password anymore.

You referenced https://openqa.suse.de/tests/13924867#step/bootloader_zkvm/1 showing the successful password authentication into s390zl12, not the problem here. One can see an authentication sent to the target SUT but over the PTY exported to the hypervisor host in https://openqa.suse.de/tests/13924867#step/boot_to_desktop/13 but again not the problem as this is over the hypervisor host. https://openqa.suse.de/tests/13924867#step/boot_to_desktop/19 shows the password based ssh authentication to the target SUT s390kvm083.oqa.prg2.suse.org. That's the one that should be prevented from outside the .oqa.prg2.suse.org network. I updated the description accordingly.

Actions #8

Updated by okurz 6 months ago

  • Subject changed from Prevent ssh access to test VMs on svirt hypervisor hosts with firewall to Prevent ssh access to test VMs on svirt hypervisor hosts with firewall size:M
  • Status changed from New to Workable
Actions #9

Updated by mkittler 6 months ago

  • Assignee set to mkittler
Actions #10

Updated by okurz 6 months ago

  • Copied to action #158455: [spike][timeboxed:10h] openQA worker native on s390x added
Actions #11

Updated by mkittler 6 months ago · Edited

My problem with the login was just that my local workerconf.sls was an old version.


So this can be easily tested by spawning a test SUT like this:

openqa-clone-job --skip-chained-deps --within-instance 'https://openqa.suse.de/tests/13913585' _GROUP=0 {BUILD,TEST}+=-test-for-poo158242 PAUSE_AT=system_prepare WORKER_CLASS=s390kvm081

(Alternatively, use the latest job in the scenario https://openqa.suse.de/tests/latest?arch=s390x&distri=sle&flavor=Online&machine=s390x-kvm&test=extra_tests_bootloader&version=15-SP6.)

Then it is possible to login via ssh root@s390kvm081.oqa.prg2.suse.org not only from worker31.oqa.prg2.suse.org but also my local machine (in VPN).

I'm not sure how to prevent it, though. I experimented with firewalld. The following commands do the right thing but only for the svirt host s390zl12.oqa.prg2.suse.org itself:

firewall-cmd --zone=public --remove-service=ssh
firewall-cmd --zone=work --add-source=10.145.10.0/24

So with this SSH access to s390zl12.oqa.prg2.suse.org is no longer possible from my machine but remains possible from worker31.oqa.prg2.suse.org and worker32.oqa.prg2.suse.org (and other workers in that IP range but those two are the relevant ones).
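To predict which clients the source rule for the work zone covers, an address can be tested against 10.145.10.0/24. This is only a sketch under assumptions: the helper names are hypothetical and the example addresses in the usage comment are made up, they are not the real addresses of worker31, worker32 or any office machine.

```shell
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() (
    IFS=.
    set -- $1            # split the address on dots into $1..$4
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
)

# Succeed if address $1 lies inside the CIDR range $2, e.g. 10.145.10.0/24.
ip_in_cidr() (
    addr=$(ip_to_int "$1")
    net=$(ip_to_int "${2%/*}")
    prefix=${2#*/}
    if [ "$prefix" -eq 0 ]; then
        mask=0
    else
        mask=$(( (4294967295 << (32 - prefix)) & 4294967295 ))
    fi
    [ $(( addr & mask )) -eq $(( net & mask )) ]
)

# Usage with made-up addresses:
#   ip_in_cidr 10.145.10.31 10.145.10.0/24 && echo "matched by the work zone source"
#   ip_in_cidr 10.145.11.5  10.145.10.0/24 || echo "falls through to the interface zone"
```

Note that `firewall-cmd` calls without `--permanent` only change the runtime configuration, so the two commands above do not survive a firewalld reload or a reboot.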

Unfortunately the SUT s390kvm081.oqa.prg2.suse.org remains accessible from everywhere and I haven't found out how to prevent the access yet. Probably because bridging is used here so the firewall cannot intercept the traffic.


To temporarily add/remove workers from production I used the following commands on worker31 and worker32:

systemctl mask --now openqa-worker-auto-restart@{1..5}.service openqa-reload-worker-auto-restart@{1..5}.{service,path}
systemctl unmask --now openqa-worker-auto-restart@{1..5}.service openqa-reload-worker-auto-restart@{1..5}.{service,path}

(I actually kept slot 2 on worker31 alive for my test job.)
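The brace expansions above cover all slots 1 to 5. A minimal sketch for building the same unit list while leaving one slot running could look like this; the helper name `worker_units` is hypothetical and not part of openQA.

```shell
# Print the systemd units of worker slots $1..$2, optionally skipping slot $3.
# Hypothetical helper; the unit names are the ones used in the commands above.
worker_units() (
    slot=$1
    while [ "$slot" -le "$2" ]; do
        if [ "$slot" != "${3:-}" ]; then
            echo "openqa-worker-auto-restart@${slot}.service"
            echo "openqa-reload-worker-auto-restart@${slot}.service"
            echo "openqa-reload-worker-auto-restart@${slot}.path"
        fi
        slot=$((slot + 1))
    done
)

# Usage as root on the worker host (not executed here):
#   systemctl mask --now $(worker_units 1 5 2)     # keep slot 2 alive
#   systemctl unmask --now $(worker_units 1 5 2)
```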


Any ideas on how to block the traffic of the VMs despite the bridging? Or maybe it is possible to configure the VMs network interface differently? It is currently configured like this:

# virsh dumpxml 1
…
    <interface type='direct'>
      <mac address='52:54:00:9d:09:6a'/>
      <source dev='vlan2114' mode='bridge'/>
      <target dev='macvtap0'/>
      <model type='virtio'/>
      <alias name='net0'/>
      <address type='ccw' cssid='0xfe' ssid='0x0' devno='0x0001'/>
    </interface>
…
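To see at a glance which interface type each domain on a host uses, the `virsh dumpxml` output can be filtered. A minimal sketch with a hypothetical helper name follows. `type='direct'` is libvirt's macvtap attachment. As far as I know, macvtap traffic is handed to the underlying device without passing through the host's own netfilter chains, which would be consistent with the SUT staying reachable; treat that as an assumption, not a confirmed diagnosis.

```shell
# Print the type attribute of every <interface> element read from stdin.
# Hypothetical helper; expects libvirt domain XML as printed by `virsh dumpxml`.
interface_types() {
    sed -n "s/.*<interface type='\([^']*\)'.*/\1/p"
}

# Usage on the hypervisor host (not executed here):
#   virsh dumpxml 1 | interface_types
```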

I restored the previous configuration except for keeping the source in the work zone (which shouldn't make a difference):

s390zl12:/home/martchus # firewall-cmd --get-active-zones 
docker
  interfaces: docker0
public
  interfaces: eth0 vlan0 vlan2114 docker virbr0
work
  sources: 10.145.10.0/24

The interface virbr0 is also still shown in the public zone but this didn't make any difference.


Note that the hypervisor can be accessed via https://zhmc2.suse.de (select ZL12 in the table, then click on the tiny arrow icon, then "Recovery -> Integrated ASCII console").

Actions #12

Updated by mkittler 5 months ago

We haven't managed to block the SSH access to the VMs. Even iptables -P INPUT DROP and iptables -P FORWARD DROP didn't make a difference. (We also installed the non-legacy version of iptables and made sure it is being used.)

We also tried to use nft commands directly (like nft add rule ip filter FORWARD ip daddr 10.145.10.0/24 reject but also on various other chains and also ensuring via handle … that the rules are at the right place as necessary).


Useful resources:


Link to my "test job scenario" (we can simply restart this job to continue): https://openqa.suse.de/tests/latest?arch=s390x&distri=sle&flavor=Online&machine=s390x-kvm&test=extra_tests_bootloader-test-for-poo158242&version=15-SP6


We haven't configured anything persistently so I just rebooted the machine to restore the normal state and unmasked the worker slots again.

Actions #13

Updated by okurz 5 months ago

  • Status changed from Workable to In Progress
Actions #14

Updated by mkittler 5 months ago

We configured a bridge device on the svirt host and adjusted the VM settings to use regular bridge mode instead of macvtap. This didn't work. The DHCP server (suttner) seemed to assign the correct IP but the response didn't reach back to the VM host.

(This is what the configuration would look like in openQA code: https://github.com/Martchus/os-autoinst-distri-opensuse/tree/zkvm-networking)

Actions #15

Updated by openqa_review 5 months ago

  • Due date set to 2024-04-20

Setting due date based on mean cycle time of SUSE QE Tools

Actions #16

Updated by mkittler 5 months ago · Edited

Actions #17

Updated by okurz 5 months ago

There is also https://unix.stackexchange.com/questions/499756/how-does-iptable-work-with-linux-bridge/500022#500022, which describes how to explicitly bring layer 2 traffic under control of the firewall. So we can look into that.

Another alternative is to change the network config so that s390x kvm tests work like other svirt tests, like on unreal6 and openqaw5-xen, that seem to use a different network configuration. /etc/libvirt/libxl/openQA-SUT-1.xml says:

    <interface type='bridge'>
      <mac address='00:16:…'/>
      <source bridge='br0'/>
      <virtualport type='openvswitch'>
        <parameters interfaceid='18…'/>
      </virtualport>
      <model type='netfront'/>
    </interface>
Actions #18

Updated by mkittler 5 months ago · Edited

@mgriessmeier mentioned the following documentation: https://documentation.suse.com/sles/15-SP2/html/SLES-all/cha-libvirt-networks.html#libvirt-networks-virtual

Maybe we can use the same setup as we use for xen hosts (e.g. https://openqa.suse.de/tests/13964449/logfile?filename=autoinst-log.txt):

      <interface type='bridge'>
        <mac address='00:16:…'/>
        <source bridge='br0'/>
        <virtualport type='openvswitch'>
          <parameters interfaceid='03…'/>
        </virtualport>
        <target dev='vif100.0'/>
        <model type='netfront'/>
      </interface>

This is similar to what we tried with the bridge on Friday but maybe the additional Open vSwitch configuration helps.

Maybe we also just needed to enable forwarding. It is enabled on openqaw5-xen.qe.prg2.suse.org where the mentioned bridge configuration is used and works (cat /proc/sys/net/ipv4/ip_forward returns 1). It is not enabled on s390zl12.oqa.prg2.suse.org (the command returns 0). @dheidler Do you think our setup would have worked with forwarding enabled?

Actions #20

Updated by mkittler 5 months ago

  • Status changed from In Progress to Feedback

It didn't help to modprobe br_netfilter and modprobe nf_conntrack_bridge (enabling all those settings).
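To double-check that the bridge netfilter toggles are really set after loading br_netfilter, the values below /proc/sys/net/bridge can be listed. This is a sketch that assumes these three sysctls are the settings meant above; the helper name is hypothetical. Also note that these toggles only apply to traffic crossing a Linux bridge device, so they would not be expected to affect a macvtap attachment.

```shell
# Print the bridge netfilter toggles found below directory $1
# (default: /proc/sys/net/bridge, which exists once br_netfilter is loaded).
bridge_nf_settings() (
    dir=${1:-/proc/sys/net/bridge}
    for name in bridge-nf-call-iptables bridge-nf-call-ip6tables bridge-nf-call-arptables; do
        if [ -r "$dir/$name" ]; then
            echo "$name=$(cat "$dir/$name")"
        else
            echo "$name=missing"
        fi
    done
)

# Usage on the hypervisor host (not executed here):
#   modprobe br_netfilter && bridge_nf_settings
```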

So I tried again with a regular bridge device together with @nicksinger following https://wiki.libvirt.org/Networking.html#bridged-networking-aka-shared-physical-device and https://documentation.suse.com/sles/15-SP2/html/SLES-all/cha-libvirt-networks.html. However, we still didn't manage to get the response to the VM's DHCP request to actually reach the VM. Enabling forwarding didn't help, and neither did disabling the firewall completely. We did a lot of tinkering with the VLAN config because the fact that a vlan device is used here is probably what makes configuring this host so difficult. However, we were not able to come to a working configuration either.

I think at this point we should explore other options.

Actions #21

Updated by mkittler 5 months ago

  • Assignee deleted (mkittler)
Actions #22

Updated by livdywan 5 months ago

  • Status changed from Feedback to Workable
Actions #23

Updated by okurz 5 months ago

I realized that ssh login to s390zl12 over IPv6 does not work anymore, IPv4 works. The command ssh s390zl12.oqa.prg2.suse.org takes very long until ssh falls back to IPv4.

Actions #24

Updated by mkittler 5 months ago

Fixed after setting net.ipv6.conf.interface.accept_ra to 2. I disabled forwarding again via yast because that's supposedly what made the difference. (Otherwise we didn't change any persistent settings.)
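For context on why the value 2 matters: with the kernel default `accept_ra=1`, router advertisements are ignored as soon as forwarding is enabled on the interface, so the host can lose its autoconfigured IPv6 setup. A small sketch of the documented meanings follows; the helper name is hypothetical and the interface name in the usage comment is only a placeholder.

```shell
# Explain a net.ipv6.conf.<interface>.accept_ra value as described in the
# Linux kernel ip-sysctl documentation. Hypothetical helper.
explain_accept_ra() {
    case "$1" in
        0) echo "never accept router advertisements" ;;
        1) echo "accept router advertisements only while forwarding is disabled" ;;
        2) echo "accept router advertisements even while forwarding is enabled" ;;
        *) echo "unknown value: $1"; return 1 ;;
    esac
}

# Usage (replace <interface> with the real device name; not executed here):
#   explain_accept_ra "$(sysctl -n net.ipv6.conf.<interface>.accept_ra)"
```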

Actions #25

Updated by dheidler 5 months ago

  • Assignee set to dheidler
Actions #27

Updated by okurz 5 months ago

  • Status changed from Workable to In Progress
Actions #28

Updated by okurz 5 months ago

As visible in https://openqa.suse.de/tests/13958029#step/check_network/3 (sle-15-SP6-Online-x86_64-Build79.1-default@svirt-xen-pv from the scenario https://openqa.suse.de/tests/latest?test=default&machine=svirt-xen-pv), other libvirt-managed VMs also have an IP in IT-managed networks, but with dynamic DHCP leases and possibly without an SSH server enabled. So the network setup looks comparable to s390zl12+13 but the test and backend setup is different. Hence we suggest rejecting this ticket and following up with alternatives as planned in the parent epic.

Actions #29

Updated by dheidler 5 months ago

  • Status changed from In Progress to Rejected

Some experiments showed that the existing tests need to reach the SUT from the worker host.
The worker host is in this case not s390zl12.oqa.prg2.suse.org (which is the KVM host) but worker31.oqa.prg2.suse.org.
This means that my idea of simply using a private virsh network on s390zl12 that is NATing into the oqa.prg2.suse.org network won't work.

The openQA worker software runs on worker31 and not on s390zl12 because as of now we are not building openQA packages for s390x.
This is about to change in the future, so we might move the openqa-worker software to s390zl12 eventually.
This would allow us to scrap the libvirt setup and move to the qemu backend, similar to what we do for standard x86_64 tests.

Actions #30

Updated by okurz 5 months ago

  • Copied to action #159066: network-level firewall preventing direct ssh+vnc access to openQA test VMs size:M added
Actions #31

Updated by okurz 4 months ago

  • Due date deleted (2024-04-20)