News

Cumulative update (1 comment)

Added by crameleon 5 months ago

Hello!

It's been a long time since the last news update here. Many things have changed, and there is lots of exciting news which has been partially communicated in emails and meetings, but not in much detail. With this post, I try to clean up some of the news backlog which gathered over a good half of a year.

Data center move

With SUSE moving one of their datacenters (the one which hosted the majority of openSUSE infrastructure) to Prague towards the end of 2023, we got the opportunity to not only move our systems along, but also to introduce fundamental changes which would have otherwise been difficult to implement in the existing environment - both for SUSE, and for openSUSE.

SUSE graciously provided us with not only brand new hardware, but also autonomy over what we do with it. Whereas the physical infrastructure in the old location was entirely operated by SUSE, with openSUSE merely having SSH access to the virtual machines, the new setup allows the openSUSE Heroes to fully manage their hypervisors and networking stack. This freedom allows the team to implement new ideas and react to issues with virtual machines without relying on and waiting for SUSE internal support.

Maintenance

Of course, lots of freedom comes with lots of responsibility. That makes it even more important for our infrastructure to be stable and easy to maintain - after all, the Heroes team is made up of volunteers, and not many of them can be motivated to spend their free time on boring maintenance tasks. Hence we now enforce automatic security updates on all machines using os-update and rebootmgr, with added logic to reduce outages by staggering package updates and reboots of machines in high availability clusters, and to alert administrators about machines which require updates but cannot install them on their own, or which require manual reboot intervention.
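For reference, here is a minimal sketch of what such a setup can look like on a single machine, assuming the stock os-update and rebootmgr units - the strategy and window below are examples, not our production values:

    # enable periodic package updates via the os-update timer
    systemctl enable --now os-update.timer
    # only reboot within a maintenance window instead of immediately
    rebootmgrctl set-strategy maint-window
    # maintenance window as a systemd calendar spec plus a duration
    rebootmgrctl set-window "Thu *-*-* 03:30:00" 1h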

Automation

We have been using Salt as an automation and infrastructure as code solution for a long time already, but we increased the Salt coverage by a large margin. Whereas previously only a comparatively small amount of machine configuration was actively maintained using Salt, now the whole base system (network configuration, repositories, kernel settings, authentication, monitoring, logging, certificates, email, automatic updates) is covered, along with a steadily increasing amount of the various machine specific services (not surprisingly, the large number of services under the opensuse.org umbrella comes with a large number of applications requiring configuration).
Using the infrastructure as code approach has multiple benefits, some of which are:

  • configuration changes are tracked and documented in a Git history
  • configuration is unified across all machines
  • central deployment of changes is possible

To fully use those benefits, it is essential to have large coverage, and to reduce the number of times one needs to manually visit a machine for changes.
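As an illustration of what such Salt-managed configuration looks like, here is a small, hypothetical state (not taken from our salt.git) which installs chrony, manages its configuration file and keeps the service running:

    # hypothetical example state - not copied from our repositories
    chrony:
      pkg.installed: []
      service.running:
        - name: chronyd
        - enable: True
        - watch:
          - file: /etc/chrony.conf

    /etc/chrony.conf:
      file.managed:
        - source: salt://files/chrony.conf
        - user: root
        - group: root
        - mode: '0644'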

We encourage you to check our Git repositories if you are curious (the salt.git repository is now automatically mirrored to code.opensuse.org and GitHub!):
https://github.com/openSUSE/heroes-salt or https://code.opensuse.org/heroes/salt/tree/production
https://github.com/openSUSE/salt-formulas or https://code.opensuse.org/heroes/salt-formulas/tree/main

Monitoring

Monitoring already played an important role in our infrastructure before, but it was time to expand it further. The setup centered in our old data center, based on Nagios and Icinga, served us very well - but after working with it for some time, I gathered the following concerns:

  • the server configuration was not managed by Salt, and there was no existing Salt formula community around it
  • lots of checks were executed on machines through a central agent running as root
  • single failure domain
  • often the need for custom check scripts to cover new applications

The data center move broke the existing setup, which finally answered my question of whether to refactor it or to build something new.
After evaluating multiple solutions, including Nagios again, Munin, and Zabbix, the choice eventually fell on a stack with Prometheus at its heart. Not only do I have existing experience with Prometheus, I also valued:

  • configuration of all server and client components is either through environment variables, or YAML files - perfect for integration into a code based infrastructure without lots of custom templating
  • existing Salt formulas
  • each service being monitored is equipped with a dedicated metrics exporter - for one, this keeps other services actively monitored if a single exporter breaks, but more importantly it allows for deep privilege separation, as most monitoring checks can be performed with only a small set of permissions
  • a large number of applications, especially ones providing web services (of which we run plenty), have built-in support for serving Prometheus compatible metrics - those allow for extremely easy integration with no need to install or write custom monitoring checks or "translation" tools (see the fragment after this list)
  • plenty of existing community supported exporters for applications which do not provide built-in Prometheus metrics - their configuration is usually standardized, making RPM packaging and deployment easy
  • it is easy to add custom checks to collect metrics covered by none of the above
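To give an idea of how little glue this needs, here is an illustrative prometheus.yml fragment scraping a node_exporter - the hostname is a placeholder, not one of our real targets:

    # illustrative fragment - not our actual configuration
    scrape_configs:
      - job_name: 'node'
        static_configs:
          - targets: ['host1.example.org:9100']   # node_exporter default port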

The new setup already gathers a large amount of additional metrics from various services which were not monitored before. For most of them we have already implemented alerting to notice odd behavior early, and added visualization dashboards with graphs - of course because they are pretty, but also to make spotting anomalies at a glance easier.

The alerting rules, which are made of PromQL expressions, can be found in the previously linked repositories under salt/files/prometheus/alerts/. Currently, we split alerts based on their severity into either IRC (#opensuse-admin-alerts - still a bit experimental), email, or dashboard receivers.
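For illustration, a rule in that directory roughly follows this shape (the example below is made up, not copied from the repository):

    groups:
      - name: node
        rules:
          - alert: HostDown
            expr: up == 0
            for: 5m
            labels:
              severity: critical
            annotations:
              summary: '{{ $labels.instance }} could not be scraped for 5 minutes'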

The alerting dashboard (using Karma) - currently only reachable if connected to the openSUSE Heroes VPN:
https://alerts.infra.opensuse.org/

The visualization dashboards in Grafana - accessible to everyone:
https://monitor.opensuse.org/grafana/.

Our monitoring posture is on a good track, but there is still a lot of work to do to cover more services, fine-tune alerts, and update the internal documentation and response procedures.

Security and internal authentication

Previously, we used a FreeIPA based LDAP setup for authenticating Heroes to internal services and machines.
We did not use the majority of the features FreeIPA offered, and no one wanted to maintain a machine not running openSUSE anymore - with Kanidm experience in the team, the decision of what to migrate to was easy.

In the process, we replaced the sssd installations on all machines with kanidm-unixd, and globally enforced public key based authentication for SSH. Not only is the setup simpler and well maintained, we also found improvements with credential caching, allowing logins even during potential network issues on a host.
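For the SSH part, the relevant sshd_config pieces look roughly like the sketch below - this follows the general Kanidm approach; the binary path and our actual Salt-managed settings may differ:

    # disable password logins, keys only
    PasswordAuthentication no
    KbdInteractiveAuthentication no
    # let sshd fetch the user's public keys from kanidm-unixd
    AuthorizedKeysCommand /usr/bin/kanidm_ssh_authorizedkeys %u
    AuthorizedKeysCommandUser nobody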

VPN

The gateway to all of our internal openSUSE infrastructure is an OpenVPN setup. We improved its security by switching to the current best practices for compression.
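For reference, the current best practice is generally to not use VPN-level compression at all (compression enabled the VORACLE attack); in an OpenVPN 2.5+ server configuration that typically looks like the following sketch:

    # no 'comp-lzo' or 'compress' options anymore
    allow-compression no   # OpenVPN 2.5+: refuse compression from clients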

Closing words

Of course, there have been various other changes, some of which are too small to cover here, and some of which were simply forgotten about. You will either find out about them in future news posts, or by inspecting our commit and ticket histories.

Have fun!

Fixed security issue that affected tsp.opensuse.org (1 comment)

Added by hellcp almost 2 years ago

Hi,

We were contacted by Lukas Euler from Positive Security, who informed us that the Travel Support Program (TSP) - the application we use to reimburse the costs of traveling to events where you can promote the project, or which are organized by the project - had a significant security flaw that impacted our and others' production systems. We have since patched the vulnerability, contacted other organizations that also use the software, and spent some time writing a script to parse logs in order to assess the impact. Over the span of the last two years, the flaw has not been abused, outside of a script written by Lukas, which read contents of the production database via brute force.

The what & the how

In essence, the flaw allowed an attacker to check whether any arbitrary string is present in the database, table by table, by looking at the size of the response to a request whose query contains that string. Let me illustrate:

Observe the following requests:

"GET /events.json?q[end_date_gteq]=0&q[name_eq]=Open Mainframe Summit&q[requests_comments_body_cont]=a HTTP/1.1" 200 516
"GET /events.json?q[end_date_gteq]=0&q[name_eq]=Open Mainframe Summit&q[requests_comments_body_cont]=aa HTTP/1.1" 200 12
  • /events.json is our public API endpoint containing all the events ever created on the instance
  • end_date_gteq means: match events whose end_date is greater than or equal to what is after the equals sign
  • name_eq means: match events whose name is equal (not case sensitive) to what is after the equals sign
  • requests_comments_body_cont means: match if any comment body, in any of the requests created for the event, contains what is after the equals sign

As you can see, the two requests have wildly different response sizes (the last number): the one with 516B returned a body containing the details of the event, while the second one with 12B did not, because no comment body in any request associated with that event matched the string 'aa'. Since every reimbursement request has to have an event attached to it, you could find the contents of every comment in the database if you spent the time going through it character by character. Of course, the comments are the least of our worries in an application that deals with addresses and bank information. Bank information is attached to a reimbursement, which is mapped directly onto a request, so that's easy enough to find. You could find users through any object created by them - comments, reimbursements, requests - and from there you have an easy time getting to address information.
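For illustration, this is roughly how such a probe can be measured from the command line - the URL mirrors the log lines above, and since the flaw is fixed it is only of historical interest:

    # -g stops curl from interpreting the [] brackets, -w prints the response body size
    curl -sg -o /dev/null -w '%{size_download}\n' \
      'https://tsp.opensuse.org/events.json?q[end_date_gteq]=0&q[name_eq]=Open%20Mainframe%20Summit&q[requests_comments_body_cont]=a'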

The why

By default, the Ransack library that we use for querying various objects in the frontend allows querying any association of those objects. We didn't limit the scope of associations or columns that could be accessed through it. Since all the objects within the TSP database are associated in some way or another, that gave just about anybody access to the entire database.

Here's some documentation on the subject:
https://activerecord-hackery.github.io/ransack/going-further/associations/
https://activerecord-hackery.github.io/ransack/getting-started/search-matches/

And the way to fix the scopes that ransack can access:
https://activerecord-hackery.github.io/ransack/going-further/other-notes/#authorization-allowlistingdenylisting

How we fixed it:
https://github.com/openSUSE/travel-support-program/commit/d22916275c51500b4004933ff1b0a69bc807b2b7

Github Advisory:
https://github.com/openSUSE/travel-support-program/security/advisories/GHSA-2wwv-c6xh-cf68

Huge thanks to Lukas and the Positive Security team for letting us know about this - we wouldn't have known about it without you, and our data would have been in jeopardy.

Network outage next Thursday, April 21st

Added by lrupp over 2 years ago

SUSE-IT plans a replacement of some network equipment in the Nuremberg data center next Thursday (2022-04-21), between 4 PM and 5 PM CEST. The actual downtime is planned to be up to 15 minutes.

Affected services:

  • OBS
  • Wikis
  • Jitsi
  • Forums
  • Matrix
  • Mailing Lists
  • Etherpad
  • Moodle
  • and some more

Please have a look at our status page for more details.

As usual, the openSUSE heroes can be reached via IRC (irc.opensuse.org/#opensuse-admin) or (with delay) via email (admin@opensuse.org).

Thank you, SUSE QE

Added by lrupp about 3 years ago

Some here might not know it, but some teams from the 'SUSE Quality Engineering Linux Systems Group' use the Redmine installation here at https://progress.opensuse.org/ to track the results of the test automation for openSUSE products. Especially openQA feature requests are tracked and coordinated here.

As the plain Redmine installation does not provide all the wanted features, we have been including the "Redmine Agile plugin" from RedmineUP for a while now. Luckily, the free version of the plugin already provided nearly 90% of the requested additional features, so everybody was happy and we could run this service without problems. But today we got some money to buy the PRO version of the plugin - which we happily did :-)

There is another plugin, named Checklist, for which we also got the go-ahead to order the PRO version. Both plugins are now up and running on our instance here - and all projects can make use of the additional features.

We would like to thank SUSE QE for their sponsoring. And we would also like to thank RedmineUP for providing these (and more) plugins to the community as free and PRO versions. We are happy to be able to donate something back for your work on these plugins. Keep up the good work!

Thank you, SonarSource (1 comment)

Added by lrupp about 3 years ago

There are times when keeping your system up to date does not help you against vulnerabilities. During these times, you want to have your servers and applications hardened as well as possible - including good AppArmor profiles. But even then, something bad can easily happen - and it's very good to see that others take care, especially if these others are professionals who look out for you, even if you did not ask them directly.

Tuesday, 2021-08-31, was such a day for our openSUSE infrastructure status page: SonarSource reported a pre-auth remote code execution vulnerability in the https://status.opensuse.org/api/v1/incidents endpoint to us.

SonarSource, driven by studying and understanding real-world vulnerabilities, tries to help the open-source community secure their projects. They disclosed vulnerabilities in the open-source status page software Cachet - and informed us directly that our running version was vulnerable to CVE-2021-39165. It turned out that the Cachet upstream project is meanwhile considered dead - at least it has been out of support by its original maintainers for a while. It went into this unsupported state unnoticed by us - and potentially also unnoticed by many others. A problem that sadly many other dead open source projects share.

Thankfully, the openSUSE Security team (well known as the first contact for security issues) as well as Christian (one of our glorious openSUSE heroes) reacted quickly and professionally:

  • SonarSource informed our Security team on 2021-08-31, 15:39
  • Our Security team opened a ticket for us just two hours later, at 2021-08-31, 17:08
  • Already one hour later, at 2021-08-31 18:29, Christian deployed a first hot-fix on our instances (note: the original admin of the systems was on vacation)
  • On 2021-08-31 at 23:35, Christian already provided a collection of suspicious requests to the affected URL
  • Meanwhile, a fix had been provided in a forked GitHub repository, which was applied to our installations one day later, on 2021-09-01. This made our installations secure again (cross-checked by our Security Team and SonarSource). A response time of one day, even though the original upstream of the project is no longer available - and the original admin of the system was on vacation! :-)
  • ...and we started a long analysis of the "what" and "when"...
  • In the end, we identified 6 requests from one suspicious IP which we couldn't assign to anyone we know. So we decided to distrust our installations. There might have been a successful attack, even if we could not find any further evidence on the installed system (maybe thanks to the AppArmor profile?) or in the database. BUT: an attacker could have extracted user account data.
  • The user accounts of the Cachet application are only used to inform our users about infrastructure incidents. An attacker might have been able to log in and report fake incidents - or send out emails to those who subscribed to incident reports or updates. Something we don't like to see. Luckily, these accounts are in no way connected to the normal user accounts. They just existed on these systems for exactly one purpose: informing our users.
  • As a result, we informed all users of the status.opensuse.org instances that they should change their password on a new system, set up from scratch. This new system is now deployed and in production, while the image of the old system is still available for further investigation.

Big kudos to Thomas Chauchefoin (SonarSource), Gianluca Gabrielli and Marcus Meissner (openSUSE Security Team) and Christian Boltz (openSUSE Heroes) for all their work, their good cooperation and quick reactions!

Upgrading to the next PostgreSQL version (1 comment)

Added by lrupp over 3 years ago

Time passes by so quickly: we installed our PostgreSQL cluster around 2008. At least, this was the time of the first public MirrorBrain release 2.2, which was the reason to run a PostgreSQL installation for openSUSE. But MirrorBrain (and therefore the PostgreSQL cluster behind it) is way older. So maybe it's fair to say that MirrorBrain started with openSUSE in 2005...?

Anyway: if you maintain a database for such a long time, you don't want to lose data. Downtimes are also not a good idea, but that's why we have a cluster, right?

While the MirrorBrain database is currently still the biggest one (>105GB in size, with ~120 million entries in the table listing the files on the mirrors alone), our newer services like Matrix, Mailman3, Gitlab, Pagure, lnt or Weblate are also not that small any more. Altogether, they currently use 142GB.

We have already upgraded our database multiple times (starting with version 7 back in the past). But this time, we decided to try a major jump from PostgreSQL 11 to 13, without any step in between.

So how do we handle an upgrade of a PostgreSQL database? In general, we just follow the documentation from upstream - only adjusting the values to our local setup:

Local setup details

  • Our configuration files are stored in /etc/postgresql/ - and symlinked into the current data directory. This not only makes it easier for us to include them in a general backup; we also set their file ownership to root:postgres - editable just by root and readable just by the postgres group (file permissions: 0640).
  • Below the generic data directory for PostgreSQL on openSUSE (/var/lib/pgsql), we have "data" directories for each version: data11 for the currently used PostgreSQL 11 version.
  • A symlink /var/lib/pgsql/data points to the currently active database directory (data11 in the beginning)
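Put together, the layout looks roughly like this (illustrative, based on the points above):

    /etc/postgresql/postgresql.conf              # root:postgres, 0640
    /var/lib/pgsql/data -> data11                # symlink to the active data directory
    /var/lib/pgsql/data11/postgresql.conf -> /etc/postgresql/postgresql.conf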

Step-by-Step

Preparation

First, let us set up some shell variables that we will use throughout the steps. As we need these variables multiple times as user 'root' and user 'postgres' later, let's place them into a file that we can refer to (source) later...

    cat > /tmp/postgresql_update << EOL
    export FROM_VERSION=11
    export TO_VERSION=13
    export DATA_BASEDIR="/var/lib/pgsql/"
    export BIN_BASEDIR="/usr/lib/postgresql"
    EOL

Note: you can get DATA_BASEDIR from the currently running postgresql instance with: ps aufx | grep '^postgres.* -D'

Don’t forget to source the file with the variables in the steps below.

Install new RPMs

Install the new binaries in parallel to the old ones (find out which ones you need via rpm or zypper):

    source /tmp/postgresql_update
    zypper in $(rpmqpack | grep "^postgresql${FROM_VERSION}" | sed -e "s|${FROM_VERSION}|${TO_VERSION}|g")

Initialize the new version

Now change into the database directory and create a new sub-directory for the migration:

    su - postgres
    source /tmp/postgresql_update
    cd ${DATA_BASEDIR}
    install -d -m 0700 -o postgres -g postgres data${TO_VERSION}
    cd ${DATA_BASEDIR}/data${TO_VERSION}
    ${BIN_BASEDIR}${TO_VERSION}/bin/initdb .

For the exact parameters for the initdb call, you can search the shell history of the last run of initdb. But we go with the standard setup above.

You should end up in a completely independent, fresh and clean PostgreSQL data directory.

Now back up the new config files and create symlinks to the current ones. It's recommended to diff the old config files against the new ones and to keep a close eye on the logs during the first starts. Worst case: the new server won't start with the old settings at all. But this can be found in the log files.

    su - postgres
    source /tmp/postgresql_update
    cd ${DATA_BASEDIR}/data${TO_VERSION}

    for i in pg_hba.conf pg_ident.conf postgresql.conf postgresql.auto.conf postgresql.replication.conf ; do
     old $i                        # 'old' moves the freshly created file aside with a date suffix
     ln -s /etc/postgresql/$i .
     # diff $i $i-$(date +"%Y%m%d")
    done

Downtime ahead: do the migration

Next step is to finally do the job - this includes a downtime of the database!

    rcpostgresql stop

    su - postgres
    source /tmp/postgresql_update
    pg_upgrade --link             \
     --old-bindir="${BIN_BASEDIR}${FROM_VERSION}/bin"     \
     --new-bindir="${BIN_BASEDIR}${TO_VERSION}/bin"       \
     --old-datadir="${DATA_BASEDIR}/data${FROM_VERSION}/" \
     --new-datadir="${DATA_BASEDIR}/data${TO_VERSION}/"

The --link option is very important, if you want to have a short downtime:

--link                    link instead of copying files to new cluster

In our case, the operation above took ~20 minutes.

Hopefully you end up with something like:

    [...]
    Upgrade Complete
    ----------------
    Optimizer statistics are not transferred by pg_upgrade so,
    once you start the new server, consider running:
        ./analyze_new_cluster.sh

    Running this script will delete the old cluster's data files:
        ./delete_old_cluster.sh

Switch to the new PostgreSQL version

Switch to the new database directory. In our case, we prefer a symlink which points to the right directory:

    source /tmp/postgresql_update
    cd ${DATA_BASEDIR}
    ln -fs data${TO_VERSION} data

As an alternative, you can switch the database directory by editing the configuration in /etc/sysconfig/postgresql:

    source /tmp/postgresql_update
    echo "POSTGRES_DATADIR='${DATA_BASEDIR}/data${TO_VERSION}'" >> /etc/sysconfig/postgresql

(Maybe you want to edit the file directly instead and set the correct values right there.) The prefix in the file should match the ${DATA_BASEDIR} variable.

Start the new server

    systemctl start postgresql

Cleanup

Postgres created some scripts in the folder where pg_upgrade was started. Either execute these scripts directly (as the postgres user) or use the following commands:

    sudo -i -u postgres
    source /tmp/postgresql_update
    ${BIN_BASEDIR}${TO_VERSION}/bin/vacuumdb \
     --all \
     --analyze-in-stages
    ${BIN_BASEDIR}${TO_VERSION}/bin/reindexdb \
     --all \
     --concurrently

Please note that the two commands above have an influence on the performance of your server. While they are running, your database might become less responsive (up to not responsive at all), so you might want to use a maintenance window for them. On the other hand, your new database server will perform much better once you have executed these commands. So don't wait too long.

Check everything

Now might be a perfect time to check monitoring, database access from applications and the like. After that, you can remove the old database directory and uninstall the old binaries, together with an rm /tmp/postgresql_update.
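A possible sketch of that cleanup - double-check that the data symlink points to the new directory first (the ./delete_old_cluster.sh script generated by pg_upgrade covers the data directory part as well):

    source /tmp/postgresql_update
    zypper rm $(rpmqpack | grep "^postgresql${FROM_VERSION}")
    rm -rf ${DATA_BASEDIR}/data${FROM_VERSION}
    rm /tmp/postgresql_update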

But in general, you can mark this migration as finished.

Playing along with NFTables (3 comments)

Added by lrupp almost 4 years ago

By default, openSUSE Leap 15.x uses the firewalld firewall implementation (and the firewalld backend uses iptables under the hood).

But for a while now, openSUSE has also had nftables support available - though neither YaST nor other special tooling is currently configured to directly support it. We have some machines in our infrastructure which are neither straightforward desktop machines nor idle most of the time. So let's see how good we are at trying out and testing new things, and use one of our central administrative machines: the VPN gateway, which gives all openSUSE heroes access to the internal world of the openSUSE infrastructure.

This machine is already a bit special:

  • The "external" interface holds the connection to the internet
  • The "private" interface is inside the openSUSE heroes private network
  • We run OpenVPN with tun devices (one for UDP and one for TCP) to allow the openSUSE heroes to connect via a personal certificate + their user credentials
  • In addition, we run WireGuard to connect the private networks in Provo and Nuremberg (at our sponsors) together
  • And before we forget: our VPN gateway is not only a VPN gateway - it is also used as a gateway to the internet for all internal machines, allowing only 'pre-known traffic' destinations

All this makes the firewall setup a little bit more complicated.

BTW: naming your interfaces by giving them explicit names like "external" or "private", as in our example, has a huge benefit if you play along with services or firewalls. Just have a look at /etc/udev/rules.d/70-persistent-net.rules once your devices are up and rename them according to your needs (you can also use YaST for this). But remember to also check/rename the interfaces in /etc/sysconfig/network/ifcfg-* to use the same name before rebooting your machine. Otherwise you end up with a non-working network setup.
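Such a rule could look like the following sketch (the MAC address is a placeholder):

    # /etc/udev/rules.d/70-persistent-net.rules
    SUBSYSTEM=="net", ACTION=="add", ATTR{address}=="52:54:00:aa:bb:cc", ATTR{type}=="1", NAME="external"

together with renaming the matching /etc/sysconfig/network/ifcfg-eth0 to ifcfg-external, for example.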

Let's have a short look at the area we are talking about:

[Diagram: openSUSE Heroes gateway]

As you hopefully notice, none of the services on the community side is affected. There we have standard (iptables) based firewalls and use proxies to forward user requests to the right server.

On the openSUSE hero side, we exchanged the old SuSEfirewall2 based setup with a new one based on nftables.

There are a couple of reasons that influenced us in switching over to nftables:

  • the old SuSEfirewall2 worked, but generated a huge iptables list on our machine in question
  • using ipsets or variables with SuSEfirewall2 was doable, but not an easy task
  • we ran into some problems with NAT and Masquerading using firewalld as frontend
  • Salt is another interesting field:
    • Salt'ing SuSEfirewall2 by deploying some files on a machine is always possible, but not really straightforward
    • there is no Salt module for SuSEfirewall2 (and there will probably never be one)
    • there are Salt modules for firewalld and nftables, both on nearly the same level
  • nftables has been integrated in the kernel for a while and should replace all the *tables modules in the long term. So why not jump directly to it, as we (as admins) do not use GUI tools like YaST or firewalld-gui anyway?

So what are the major advantages?

  1. Sets are part of the core functionality. You can have sets of ports, interface names, and address ranges. No more ipset. No more multiport. For example: ip daddr { 1.1.1.1, 1.0.0.1 } tcp dport { dns, https } oifname { "external", "wg_vpn1" } accept; - this means you can have very compact rulesets that cover a lot of cases with a few rules.
  2. No more extra rules for logging. Only turn on counter where you need it, e.g.: counter log prefix "[nftables] forward reject " reject
  3. You can cover IPv4 and IPv6 with a single ruleset when using table inet, but you can have per IP protocol tables as well - and sometimes you even need them, e.g. for postrouting (see the sketch below).
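As a small, hypothetical example of such a per-protocol table: an IPv4-only NAT table which masquerades traffic leaving via the uplink interface could look like this:

table ip nat {
    chain postrouting {
        type nat hook postrouting priority srcnat; policy accept;
        # rewrite the source address of traffic leaving via the uplink
        oifname "external" masquerade
    }
}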

Starting from scratch

A very basic /etc/nftables.conf would look something like this

#!/usr/sbin/nft -f

flush ruleset

# This matches IPv4 and IPv6
table inet filter {
    # chain names are up to you.
    # what part of the traffic they cover, 
    # depends on the type line.
    chain input {
        type filter hook input priority 0; policy accept;
    }
    chain forward {
        type filter hook forward priority 0; policy accept;
    }
    chain output {
        type filter hook output priority 0; policy accept;
    }
}

But so far we did not stop or allow any traffic. Well, actually we now let everything in and out, because all chains have the policy accept.

#!/usr/sbin/nft -f

flush ruleset

table inet filter {
    chain base_checks {
        ## another set, this time for connection tracking states.
        # allow established/related connections
        ct state {established, related} accept;

        # early drop of invalid connections
        ct state invalid drop;
    }

    chain input {
        type filter hook input priority 0; policy drop;

        # allow from loopback
        iif "lo" accept;

        jump base_checks;

        # allow icmp and igmp
        ip6 nexthdr ipv6-icmp icmpv6 type { echo-request, echo-reply, packet-too-big, time-exceeded, parameter-problem, destination-unreachable, mld-listener-query, mld-listener-report, mld-listener-reduction, nd-router-solicit, nd-router-advert, nd-neighbor-solicit, nd-neighbor-advert, ind-neighbor-solicit, ind-neighbor-advert, mld2-listener-report } accept;
        ip protocol icmp icmp type { echo-request, echo-reply, destination-unreachable, router-solicitation, router-advertisement, time-exceeded, parameter-problem } accept;
        ip protocol igmp accept;

        # for testing reject with logging
        counter log prefix "[nftables] input reject " reject;
    }
    chain forward {
        type filter hook forward priority 0; policy accept;
    }
    chain output {
        type filter hook output priority 0; policy accept;
    }
}

You can activate the configuration with nft --file nftables.conf, but do NOT do this on a remote machine. It is also a good habit to run nft --check --file nftables.conf before actually loading the file to catch syntax errors.

So what did we change?

  1. most importantly we changed the policy of the chain to drop and added a reject rule at the end. So nothing gets in right now.
  2. We allow all traffic on the localhost interface.
  3. The base_checks chain handles all packets related to established connections. This makes sure that incoming packets for outgoing connections get through.
  4. We allowed important ICMP/IGMP packets. Again, this is using a set and the type names and not some cryptic numbers. YAY for readability.

Now if someone tries to connect to our machine via SSH, we will see:

[nftables] input reject IN=enp1s0 OUT= MAC=52:54:00:4c:51:6c:52:54:00:73:a1:57:08:00 SRC=172.16.16.2 DST=172.16.16.30 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=22652 DF PROTO=TCP SPT=55574 DPT=22 WINDOW=64240 RES=0x00 SYN URGP=0 

and nft list ruleset will show us

counter packets 1 bytes 60 log prefix "[nftables] input reject " reject

So we are secure now. Though maybe allowing SSH back in would be nice. You know just in case.
We have 2 options now. Option 1 would be to insert the following line before our reject line.

tcp dport 22 accept;

But did we mention already that we have sets and that they are great? Especially great if we need the same list of ports/ip ranges/interface names in multiple places?

We have 2 ways to define sets:

define wanted_tcp_ports {
  22,
}

Yes the trailing comma is ok. And it makes adding elements to the list easier. So we do them all the time.
This will change our rule above to

tcp dport $wanted_tcp_ports accept;

If we load the config file and run nft list ruleset, we will see:

tcp dport { 22 } accept

But there is actually a slightly better way to do this:

    set wanted_tcp_ports {
        type inet_service; flags interval;
        elements = {
           ssh
        }
    }

That way our firewall rule becomes:

tcp dport @wanted_tcp_ports accept;

And if we dump our firewall with nft list ruleset afterwards, it will still be shown as @wanted_tcp_ports instead of being replaced with its contents (as happened with the $-variable above).
While this is great already, the 2nd syntax actually has one more advantage.

$ nft add element inet filter wanted_tcp_ports \{ 443 \}

Now our wanted_tcp_ports list will allow port 22 and 443.
This is of course often more useful if we use it with IP addresses.

    set fail2ban_hosts {
        type ipv4_addr; flags interval;
        elements = {
           192.168.0.0/24
        }       
    }

Let us append some elements to that set too.

$ nft add element inet filter fail2ban_hosts \{ 192.168.254.255, 192.168.253.0/24 \}
$ nft list ruleset

... and we get ...

        set fail2ban_hosts {
                type ipv4_addr
                flags interval
                elements = { 192.168.0.0/24, 192.168.253.0/24,
                             192.168.254.255 }
        }

Now we could change fail2ban to append elements to the set instead of creating a new rule for each new machine it wants to block. Fewer rules. Faster processing.

But by reloading the firewall, we dropped port 443 from the port list again. Oops.
Though ... if you are happy with the rules, you can just run

$ nft list ruleset > nftables.conf

When you are using all the sets instead of the variables, all your firewall rules will still look nice.

Our complete firewall now looks like this:

table inet filter {
        set wanted_tcp_ports {
                type inet_service
                flags interval
                elements = { 22, 443 }
        }

        set fail2ban_hosts {
                type ipv4_addr
                flags interval
                elements = { 192.168.0.0/24, 192.168.253.0/24,
                             192.168.254.255 }
        }

        chain base_checks {
                ct state { established, related } accept
                ct state invalid drop
        }

        chain input {
                type filter hook input priority filter; policy drop;
                iif "lo" accept
                jump base_checks
                ip6 nexthdr ipv6-icmp icmpv6 type { destination-unreachable, packet-too-big, time-exceeded, parameter-problem, echo-request, echo-reply, mld-listener-query, mld-listener-report, mld-listener-done, nd-router-solicit, nd-router-advert, nd-neighbor-solicit, nd-neighbor-advert, ind-neighbor-solicit, ind-neighbor-advert, mld2-listener-report } accept
                ip protocol icmp icmp type { echo-reply, destination-unreachable, echo-request, router-advertisement, router-solicitation, time-exceeded, parameter-problem } accept
                ip protocol igmp accept
                tcp dport @wanted_tcp_ports accept
                counter packets 12 bytes 828 log prefix "[nftables] input reject " reject
        }

        chain forward {
                type filter hook forward priority filter; policy accept;
        }

        chain output {
                type filter hook output priority filter; policy accept;
        }
}

For more, see the nftables wiki.

IPv6 support for machines in US region

Added by lrupp almost 4 years ago

Today we reached a new milestone: all openSUSE services around the world now support IPv6 natively. As of today, the last set of machines in Provo is also equipped with IPv6 addresses. IPv6 had been missing for those machines since the renumbering (which was needed because of the carve-out of SUSE from Microfocus). Thanks to one of our providers, who has now reserved and routed a whole /48 IPv6 network for us.

With this, we can also run all our DNS servers with IPv6 - and not only do they have an IPv6 address, all our external DNS entries for the opensuse.org domain should now contain IPv4 and IPv6 addresses as well. Don't worry, you did not miss much: dual-stack (IPv4 and IPv6) has been the case for all services in Germany for a long, long time already - and we even had it for the machines in the US for a long time, before SUSE switched the provider. But this finally brings us to the same level in all locations!

Playing with Etherpad-lite

Added by lrupp almost 4 years ago

When updating to the latest etherpad-lite version 1.8.7 (which includes quite some bug fixes), we also revisited the currently installed plugins and, as usual, updated them to their latest versions as well.

To get some impression of the usage of our Etherpad instance, we also enabled the ether-o-meter plugin, which is now producing some nice live graphs and statistics here: https://etherpad.opensuse.org/metrics

We also enabled some additional styles and hope this makes using Etherpad even more fun and productive for you. If you want to have some more modules enabled (or just want to say "hello"), feel free to contact us - either via an email to admin@opensuse.org or by reaching out to us in the #opensuse-admin channel on irc.opensuse.org.

Upgraded matomo

Added by lrupp almost 4 years ago

As usual, we keep our infrastructure up-to-date. While this is easy for the base system ('zypper patch', you know? ;-)), most of the applications need special handling. Normally, we package them as well in our OBS repositories. But this often means that we need to maintain them on our own. At least packaging them allows us to track them easily and integrates them into the normal workflow. All we have to do to keep them updated on the production machines is a 'zypper up' (which updates all packages with a higher version and/or release number - while a 'zypper patch' only updates packages that have an official patchinfo with them).

Upgrading Matomo from version 3.14.1 to 4.1.1 was not that easy: simply replacing the files in the package was not enough. Upstream changed so much in the database structure that the standard calls in the post-installation script (which normally update the database as well during a package update) were just not sufficient. As this is (hopefully) a one-time effort, we ran some steps manually from the command line, which took ~20 hours. After that, our DB was updated, cleaned up and ready to go again.

Summary: being an openSUSE hero includes not only being an infrastructure guru with automation and scripting skills. Often enough, you need some packaging expertise as well. ...and sometimes even that is not enough!
