2021-11-02 18:00 UTC: openSUSE Heroes meeting November 2021
When: 2021-11-02 18:00 UTC / 19:00 CET
Who: The openSUSE Heroes team and everybody else!
- Questions and answers from the community
- status reports about everything
- review old tickets
- Contributor Agreement
- nala2.infra.opensuse.org (aka mirrordb4.infra.opensuse.org)
Connected status machines to private network¶
status3 and status2 are now connected to the internal network. This should make accessing them easier (incl. Salt).
TODO: re-setup status1.infra.opensuse.org on the q.beyond machine.
Re-enabled DNS CAA¶
Re-enabled DNS CAA for opensuse.org. We had it enabled in the past, but it got lost during the migration from FreeIPA to PowerDNS.
> dig +short -t caa opensuse.org
0 iodef "mailto:firstname.lastname@example.org"
128 issue "letsencrypt.org"
This means that we only trust Let's Encrypt certificates for the opensuse.org domain. We would need to change this if we ever accept certificates from other certificate authorities for any opensuse.org DNS entry.
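If a second CA ever needs to be allowed, the change boils down to one additional issue record in the zone. A sketch of what that could look like (the second CA name is purely a placeholder):

```
; current state, as returned by the dig query above
opensuse.org. IN CAA 128 issue "letsencrypt.org"
opensuse.org. IN CAA 0   iodef "mailto:firstname.lastname@example.org"
; hypothetical: additionally allow a second CA to issue certificates
opensuse.org. IN CAA 128 issue "other-ca.example"
```

Any CA not listed in an issue record must refuse to issue certificates for opensuse.org; the iodef record tells CAs where to report violation attempts.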
Fixed and Upgraded Galera cluster¶
The outage on 2021-10-20 was the result of two problems:
Nodes not in sync¶
- A configuration change blocked the flushing of logs, which significantly impaired the synchronization between the nodes.
- Once this setting was reverted, the nodes began to synchronize again - but with a huge backlog of ~2 months. As we had a DB problem around that time, the config setting might have been a leftover from then.
- Sadly, the nodes were not able to complete their re-syncs successfully. Problem no. 2 might - or might not - have been the reason. This left us with more or less a single node holding current data: galera2.
- It turned out that the (XFS) filesystem on galera2 was broken (maybe also a result of the problem two months earlier?).
As a result, the whole cluster was broken. While we initially tried an open-heart operation, we realized after some time that this would just take ages...
In the end, we decided to restore a database dump that we had extracted (meaning: the dump from the node with the broken filesystem took a while to verify against a dump from another out-of-sync node). But wait: if we needed to start from scratch anyway, why not use the opportunity to do the needed version upgrade of MariaDB?
...and so we ended up:
- installing new RPMs (which were updated and published in that very minute, thanks to Darix)
- bootstrapping one node
- restoring the DB dump
- adding the other two nodes (one after the other)
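The four steps above roughly correspond to the following runbook (hostnames, the dump filename and the service name are assumptions; details depend on the MariaDB/Galera packaging):

```
# on the first node: bootstrap a brand-new cluster
galera_new_cluster

# restore the verified database dump on this node
mysql < /root/galera-restore.sql   # dump filename is an assumption

# on each remaining node, one after the other:
# a normal start makes the node join and copy the full state (SST)
systemctl start mariadb

# verify after each join that the cluster grew as expected
mysql -e "SHOW STATUS LIKE 'wsrep_cluster_size'"
```

Adding the nodes one after the other keeps only one state transfer running at a time, so the donor node is not overloaded.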
The whole incident (from the first report that the DB was down to a working and running cluster) took 8 hours and 30 minutes.
Thanks to everyone involved in this!
openSUSE Infrastructure Contributor Agreement¶
The following text should serve as an entry point for discussions: inside the openSUSE heroes, inside SUSE - and between both.
URLs that might be helpful in the discussions:
At the moment, all work on the openSUSE heroes infrastructure is voluntary. While SUSE provides some resources (especially hardware, storage and network), the majority of the openSUSE infrastructure is by now run by the openSUSE heroes: a group of volunteers.
Even if some of these volunteers are currently SUSE employees, the work they do inside the openSUSE heroes is seen as voluntary work, not mandated by SUSE as a company.
On the other side, SUSE still has some legal responsibility. Even more: SUSE has an environmental, cultural and historical responsibility to take care of the openSUSE community.
Included in this responsibility is the right to decide about the usage and non-usage of the openSUSE infrastructure. But with fewer and fewer SUSE employees involved in the openSUSE infrastructure, there is a high risk that SUSE will somehow lose control over it. At the moment, there is no special agreement/contract between SUSE and the openSUSE heroes to always act in good faith and do nothing problematic with the given permissions. For SUSE employees, there is a signed contract that does not allow them to do anything evil/harmful against SUSE/openSUSE. But today, with community members having access to the MX and DNS servers of openSUSE, there is a rising risk for the company - a risk that can only be addressed by some kind of Contributor Agreement.
We - as openSUSE heroes - should make clear what we expect from SUSE as a company. We should also agree on something like "don't be evil" - for our own safety.
Topics that should be provided by SUSE¶
- handling of critical account data
- handling of GDPR requests
- sponsoring of needed infrastructure
- 24/7 remote-hands, if needed
- fixed contact persons for the openSUSE heroes
- access to the systems providing services driven by openSUSE heroes
- a dedicated account for Amazon Web Services
- hosting of servers in multiple data centers - for redundancy reasons
- a reasonable amount of IPv4 and IPv6 address ranges in the data centers
- access to the DNS Registrar
- backup capabilities
Topics that can be covered by the openSUSE heroes¶
- general setup of the services
- basic operating system maintenance
- private network setups between the data centers
- contact persons for questions from SUSE
Topics that should be provided by the volunteers maintaining the services¶
- general documentation about the services
- general maintenance of the services
- security response for the provided services
- AppArmor, SELinux or similar security hardening of the services
- contact data in case of a service emergency