2019-04-02 heroes meeting

[20:00:37] i'll join later, only just about to eat some pizza.
[20:00:50] will probably be 15min
[20:02:11] no problem ;-)
[20:02:11] okay
[20:02:14] so can we start?
[20:02:33] yes
[20:02:47] the (usual) topics are on https://progress.opensuse.org/issues/48728
[20:03:13] does someone from the community dare to ask any question? ;-)
[20:03:50] I wanted to ask tuanpembual if there is any progress with progress :)
[20:04:54] apparently he's not here
[20:05:40] some days ago he mentioned http://progress.infra.opensuse.org:3000 which is currently a redmine without plugins
[20:05:54] ah cool
[20:06:11] did he mention if the db is local or if he is using our cluster?
[20:06:29] no idea
[20:06:36] okay
[20:08:01] since we already slipped into the status reports, let's continue with them ;-)
[20:08:10] tampakrap: anything from you?
[20:08:25] so thomic was in Prague last week
[20:08:37] we did a huge writeup of all the services that are in the atreju cluster
[20:08:43] opensuse plus suse-dmz
[20:09:08] we even found VMs that are idle, so we removed them after getting approval from the maintainers
[20:09:11] and more are coming
[20:09:42] and also we wrote down a lot of notes and procedures regarding services
[20:09:55] so we reduced the bus factor :)
[20:10:28] Idle VMs are a good point - narwal, narwal2 and narwal3 should be idle since narwal[5-7] are online (assuming they don't do anything I'm not aware of, so please double-check)
[20:10:28] most of the VMs are in salt, we wrote down the few ones that are not
[20:11:05] file a ticket plz so we don't forget
[20:11:08] ok
[20:11:16] that's it pretty much
[20:12:53] will you put the openSUSE part of those notes into a somewhat public space (for example the heroes wiki or into pillar/id/*)?
[20:13:18] sure, some of them were already added to the wiki
[20:14:02] e.g. adding a VPN account
[20:14:22] i'm back
[20:15:09] yes, I've seen that
[20:15:23] for the machine-specific information, I'd prefer pillar/id/* to avoid spreading stuff over multiple places (and to force people to have a checkout of our salt repo *g*)
[20:15:56] (and yes, I know that some VMs already have a (probably outdated) wiki page)
[20:16:05] yes we agree
[20:18:34] pjessen: any status updates from you?
[20:18:48] i'm trying to think if I have anything in particular.
[20:19:20] well, AFAIK you had some "fun" with rsync.o.o in the last days. Is it in sync again?
[20:19:24] no updates. I seem to be spending a lot of time on mirroring lately.
[20:19:37] thomic is really taking care of widehat
[20:19:50] ah, ok
[20:19:57] I think it is up to date now.
[20:20:26] :-)
[20:20:55] it was quite slow, and the 15.0 update issue caused a lot more traffic
[20:21:09] which slowed things down even more.
[20:21:59] indeed, syncing the whole 15.0 update repo to all mirrors (because of the path change) is a good way to generate traffic ;-)
[20:22:22] someone has to test the internet
[20:22:43] lol
[20:23:18] I've also put some of the push mirrors on ipv6, that's working well.
[20:23:25] s/well/fast/
[20:23:46] does v6 have bigger cables? ;-)
[20:24:06] i know it sounds silly, but sometimes you do wonder.
[20:24:38] I've seen strange things more than once, so I'm not too surprised ;-)
[20:25:08] I also have some status updates:
[20:25:45] static.opensuse.org and some other static domains are now served by narwal[5-7] which are fully salted
[20:25:59] there's also a cronjob that pulls the content from github hourly
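A rough idea of what such an hourly pull job could look like - a hypothetical sketch only; the path, user and repository are assumptions, the real script is the one waiting in the merge request mentioned below:

    # /etc/cron.d/static-content (hypothetical example, not the actual job on narwal[5-7])
    0 * * * *  wwwrun  git -C /srv/www/static.opensuse.org pull --quiet --ff-only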
[20:26:21] (well, as soon as someone reviews my merge request to get the script working ;-)
[20:26:42] which merge request?
[20:27:00] it's in my todo to go through all of them this month either way
[20:27:10] I'd have to check, but I'm quite sure you'll find it in the list ;-)
[20:27:15] (it's one of the newest ones)
[20:27:15] okay
[20:28:03] today I installed kernel updates on the machines I maintain, and noticed that several other machines also have pending kernel updates
[20:28:11] so everybody please install the latest kernel ;-)
[20:29:10] oh, and you might want to have a look at https://build.opensuse.org/project/show/home:cboltz:infra (especially the second page)
[20:29:41] checking
[20:29:56] the 1_31 packages?
[20:29:59] yes
[20:30:15] they are the base for updating the wiki
[20:31:24] currently I'm blocked by elasticsearch because MediaWiki expects a specific version, but I'm in contact with Klaus who will help me to package the version we need
[20:32:13] when that is done, I'll need a new VM (water3) because AFAIK you can't run two elasticsearch versions in parallel
[20:32:34] I see
[20:33:37] maybe we should try to create a 15.1 beta JeOS image so that we don't need to update it in a few weeks ;-)
[20:34:07] also true
[20:34:22] ... which reminds me that we have lots of 42.3 VMs that need an upgrade to 15.x
[20:34:36] let's hope we don't have too many Requires: php5 on them ;-)
[20:34:52] which ones are left?
[20:35:21] basically all VMs with a pending kernel update ;-)
[20:36:07] + sarabi + riesling + status* + water (but updating water is superfluous since it will be replaced with water3)
[20:37:45] we also have some SLE11 left, which also need an update
[20:37:50] or better a salted replacement
[20:38:29] (have fun finding out what exactly they do, I'm quite sure their documentation is as old as SLE11, or they are undocumented)
[20:38:36] yes we have them also written down with thomic
[20:38:46] all the SLE11 and all the 42.3 that need upgrade or replacement
[20:39:18] we'll see if you found out all the tasks they run ;-)
[20:41:06] * cboltz expects quite some "oh, that VM also does that?" surprises
[20:41:43] I just did some statistics (thanks to salt grains.get osrelease)
[20:42:10] we have 6 SLE11 left (5 if you ignore the to-be-shutdown narwal)
[20:42:15] 20 VMs run 42.3
[20:42:20] and 13 run 15.0
[20:42:57] does someone else have a status report?
[20:43:26] is anybody else here?
[20:43:45] there are a few more
[20:44:06] the question is if they are online or if they are near their computer ;-)
[20:44:47] that are not on salt
[20:45:23] that makes counting them with grains.get hard ;-)
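For reference, the per-release counts above can be pulled straight out of salt - a quick sketch; it naturally only covers salted minions, which is exactly the limitation just mentioned:

    # count minions per osrelease grain
    salt '*' grains.get osrelease --out=txt | awk '{print $2}' | sort | uniq -c | sort -rn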
[20:45:57] oh, speaking about grains, here's a crazy idea:
[20:46:14] can/should we introduce a grain to indicate that a VM is "reboot-safe"?
[20:46:33] in practice, this would allow everybody to install kernel updates and to reboot the VM
[20:46:44] without a (big) risk of breaking something
[20:46:54] not a bad idea.
[20:47:56] yes I like it
[20:48:09] because not everyone knows all services that are running in a VM
[20:49:13] ok, so how should we call that grain?
[20:49:28] I'd propose something like "reboot_allowed" or "may_reboot"
[20:49:54] with a value of "yes" or "no"
[20:49:59] reboot_adhoc ?
[20:51:14] all of those names are fine to me, better discuss it in an MR though :)
[20:51:31] no
[20:51:35] let's decide on the name _now_
[20:51:42] discussing it in the MR will delay it too much
[20:52:29] My vote: reboot_safe
[20:53:25] nice one
[20:53:44] pjessen, mstroeder - your proposals look better than mine ;-)
[20:53:51] reboot_safe for me as well then
[20:53:57] I'd tend to reboot_safe because it best describes what we want
[20:54:14] actually it was Christian's suggestion. I only replaced - by _ in "reboot-safe".
[20:54:55] hi all
[20:54:59] :-)
[20:55:04] sorry for being late
[20:55:18] hi tuanpembual
[20:55:19] mixed up the time difference
[20:55:21] no problem ;-)
[20:55:30] *scroll up
[20:55:57] Europe will probably stop switching summer/winter time in 2021 ;-)
[20:56:27] back to the grains for a second - looks like we'll use reboot_safe: yes :-)
[20:56:58] update from me.
[20:56:59] you can also add reboot_safe: no if needed, but a) I'd call that a bug and b) please add a comment about the reason
[20:57:35] I also like the comment idea
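A sketch of how the agreed-upon grain could be set and used - illustration only; the minion name is made up and no workflow around the grain was decided in the meeting:

    # mark a VM as safe to reboot at any time (grains.setval writes the grain to /etc/salt/grains)
    salt 'narwal5.infra.opensuse.org' grains.setval reboot_safe yes
    # list everything that may be rebooted without coordination
    salt -G 'reboot_safe:yes' test.ping
    # e.g. reboot those minions after kernel updates have been installed
    salt -G 'reboot_safe:yes' system.reboot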
[20:58:33] I use local db on progress.
[20:58:59] gotta take a break, the cat's brought in a mouse. my wife doesn't like mice.
[20:59:28] pjessen: as long as she likes the mouse next to your computer... ;-)
[21:00:54] trackball .....
[21:01:00] moin
[21:01:18] hi thomic
[21:01:27] sorry. worked like 14h today
[21:01:31] and i'm really tired
[21:01:37] any questions for me?
[21:01:48] do you have any status updates?
[21:02:00] (for example, is rsync.o.o in sync again?)
[21:02:32] well... let's say it like this - as good as it can be
[21:02:42] it gets push from stage.o.o again since last week
[21:02:45] which is a major step
[21:02:55] but I'm still running the full sync over it from stage.o.o
[21:02:55] but
[21:03:07] BuildOPS people kill rsync processes from time-to-time atm
[21:03:18] i'm watching and caring about rsync.o.o almost every day
[21:03:25] since it failed
[21:03:32] as we had several outages since then
[21:03:53] now I consider it to be in a sane state, but the problem is that it only has 2x1GBit Ethernet
[21:04:02] and verrry slow spinning disks
[21:04:11] most of the time it tells me STAT D :D
[21:04:29] ;-)
[21:04:33] what caused the outages?
[21:04:38] but still we are receiving ~150MB/s and sending out ~200-300MB/s
[21:04:44] well ... sometimes disk full
[21:04:50] back then when I exchanged disks
[21:05:01] we had 14GB on pontifex
[21:05:04] now it's around 17
[21:05:09] we have 19TB
[21:05:14] s/GB/TB/g
[21:05:20] so
[21:05:36] we obviously need a solution for this whole mess in first-line monitoring
[21:05:50] If I would have the time at the moment I would try to address this
[21:06:04] but the best would be to have 4-8 servers world wide
[21:06:08] which allow push
[21:06:11] which are managed by us
[21:06:20] and have around 25TB each of storage for now
[21:06:30] which would cost some money
[21:06:38] but would build a reliable network of mirrors
[21:06:53] even if "rsync.o.o" would be down, not the whole world would start crying
[21:07:04] indeed
[21:07:18] if we only would have somebody in the board who could push for that :D
[21:07:50] cboltz: ^^
[21:07:59] actually I just wanted to ask if you can provide a hardware "inventory" and a wishlist of machines that need replacement or upgrades
[21:08:09] the inventory is just waste
[21:08:18] it's 5-10 yrs old hardware out of the suse stock
[21:08:29] but hardware (like RAM and CPU) is not that important
[21:08:41] SSDs instead of HDDs would be nice
[21:08:45] and maybe 10GBit of Ethernet
[21:08:55] bandwidth is the real issue
[21:08:57] but where to get it with transit costs included ;)
[21:09:11] pjessen: well... atm the disks are not even filling up 2x1GBit/s
[21:09:23] really??
[21:09:25] because they are 4TB spinning disks in a RAID5
[21:09:40] I had to go low budget ... for the repair
[21:09:43] as always
[21:09:45] thought - run it from stage.o.o, but with bandwidth restriction.
[21:10:01] wait .. stage.o.o has a proper storage backend :)
[21:10:10] use traffic control to limit bandwidth used by rsync.o.o
[21:10:21] I have one interface for rsync.o.o
[21:10:31] and an IP which is not shown public where stage.o.o pushes
[21:10:49] I already thought about the "public IP only for download"
[21:11:04] but still, given the read/write we get on the disk, we can try to optimize
[21:11:14] but nginx and rsync are fighting against each other
[21:11:20] plus writes from stage.o.o
[21:11:48] so we are peaking around 550MBit/s I guess - which is not that bad for the mixed usage of spinning disks
[21:11:48] yeah, random access on spinning rust doesn't perform well
[21:11:57] syncing from rsync. can be slow, use "spare" bandwidth
[21:12:41] well we could think of disabling rsyncd pub modules for a while on rsync.o.o .. with a proper announcement
[21:12:49] and maybe moving rsyncd to another machine
[21:13:00] but all of this would be "intermediate" solutions
[21:13:13] i would prefer to build a cool solution for the problem
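For what it's worth, the "traffic control" idea mentioned a few lines above could look roughly like this - a sketch only, assuming eth0 is the public interface and rsyncd on its standard port 873; the rates are made-up numbers:

    # cap outgoing rsyncd traffic at ~300 Mbit/s, leave everything else (nginx, stage.o.o push) alone
    tc qdisc add dev eth0 root handle 1: htb default 20
    tc class add dev eth0 parent 1: classid 1:10 htb rate 300mbit ceil 300mbit
    tc class add dev eth0 parent 1: classid 1:20 htb rate 2gbit
    tc filter add dev eth0 parent 1: protocol ip prio 1 u32 match ip sport 873 0xffff flowid 1:10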
[21:13:43] another thought - why aren't these non-public mirrors just rsync'ing from the public mirrors?
[21:15:07] the big ones usually have rsync too, on big pipes.
[21:15:29] s/big/biiiig/
[21:16:00] pjessen: well maybe we should update the wikipage on this
[21:16:04] but in general -
[21:16:10] if we write on our wikipage
[21:16:16] please rsync from $freesponsor
[21:16:26] the free sponsors are not so happy
[21:16:32] with us redirecting the traffic
[21:17:41] obviously we should ask them first ;-) but I'm not sure if http traffic via mirrorbrain differs that much from rsync traffic
[21:18:01] I'm thinking of having https://www.hetzner.de/dedicated-rootserver/sx62
[21:18:05] 4 times those
[21:18:07] 2 in Finland
[21:18:11] 2 in Germany
[21:18:19] with 4x10TB just as RAID0
[21:18:24] if one of them dies - no problem
[21:18:31] each of them has 1 GBit uplink
[21:18:48] enough to have some load balancing handled by mirrorbrain
[21:20:06] Just my 2 cents - let's also look at which problem we are *actually* solving. It's about moving those 200-300MB/sec off to somewhere else, or just reducing the usage to something that can be handled by stage.o.o
[21:20:28] so 1st stage.o.o can't handle it
[21:20:47] after the oss-update disaster
[21:20:57] stage.o.o was not even able to get the push traffic out
[21:21:12] not speaking about rsync-pulls ...
[21:21:14] we can reduce the rsync.o.o traffic to even less.
[21:21:33] well the idea of having rsync.o.o
[21:21:38] is having a first line mirror
[21:21:49] which does not hit downloadcontent (pontifex) directly
[21:21:52] for non-public mirrors
[21:22:05] the main traffic I guess is HTTP
[21:22:13] that ends up on widehat
[21:22:19] so having 4 mirrors
[21:22:26] that can handle this european traffic
[21:22:33] and maybe 2 more in US and Asia
[21:22:36] would be awesome
[21:22:42] because even in disaster situations
[21:22:46] we can fill those first
[21:22:58] and move away traffic from stage.o.o and pontifex again
[21:23:11] that is the most critical situation we have atm
[21:23:19] if we release bigger chunks of updates
[21:23:32] people complain about getting it too late, because widehat is not yet updated
[21:23:39] or fully loaded
[21:23:46] waiting with STAT D
[21:24:11] huh, I missed the majority of the meeting again :/
[21:24:28] any thoughts on pushing something like this forward?
[21:25:09] Agree, it would be a cool setup, but I can't help thinking "over-engineering". For the monthly cost of renting those 4 boxes at Hetzner, I can get a 10Gbit link, maybe two.
[21:25:48] well
[21:25:53] then let's have two and 10G
[21:25:55] okay, consumer SLA, not business
[21:25:59] :) but we need them somewhere
[21:26:06] like with a lot of traffic
[21:26:18] hetzner unfortunately only allows unlimited traffic
[21:26:19] on 1G
[21:30:01] how much bandwidth do we currently use for pontifex and rsync.o.o?
[21:30:18] (a rough number is good enough, no need for details)
[21:30:36] cboltz: i can deliver those numbers tomorrow
[21:30:39] if somebody reminds me
[21:30:40] :)
[21:30:46] ok
[21:30:48] I would have to take a look, I don't have any snmp on pontifex
[21:31:00] well .. we have mrtg there
[21:31:04] on those ports
[21:31:05] ah
[21:31:07] cool
[21:31:15] ** inclusive monthly traffic for servers 10G uplink is 20TB. There is no bandwidth limitation. Overusage will be charged with € 1/TB.
[21:31:20] says hetzner
[21:31:27] just for discussion
[21:31:30] or gimme 5min
[21:31:35] will start my other laptop
[21:32:23] 20 TB/month isn't much when hosting 19 TB
[21:32:30] yay
[21:32:39] i will provide numbers for this as well ...
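A back-of-the-envelope check of those numbers, taking the ~200-300 MB/s egress mentioned earlier for rsync.o.o and rounding to 250 MB/s - a rough illustration only, not a measured figure:

    # 250 MB/s * 86400 s/day * 30 days = 648,000,000 MB, i.e. ~648 TB per month;
    # with 20 TB included and 1 EUR/TB overage, a single box carrying that load
    # would add roughly 630 EUR/month on top of the rent - bandwidth, not disks,
    # is the expensive part.
    echo "$(( 250 * 86400 * 30 / 1000000 )) TB per month at 250 MB/s"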
[21:40:19] guys, if nobody minds, it's enough for me for tonight.
[21:40:32] i will send the links on the ml
[21:40:38] as soon as i published them
[21:40:42] Thanks, I'd like that.
[21:40:56] pjessen: sure, have a nice evening
[21:41:37] alright, nice discussion, talk later.
[21:42:41] so, anything else?
[21:42:59] hi tampakrap
[21:43:04] well, in theory "review of old tickets", but I'd say it's late enough to skip that ;-)
[21:43:13] any question for me?
[21:43:31] hello tuanpembual
[21:43:41] just as a note - we also have some (mostly old) tickets at https://bugzilla.opensuse.org/buglist.cgi?component=Infrastructure&list_id=11646919&product=openSUSE.org&resolution=---
[21:43:44] tuanpembual: did you use local mysql or our cluster?
[21:44:02] I use local mysql
[21:44:14] don't have access to the cluster.
[21:44:29] tuanpembual: okay I'll create a db for you and I'll put the credentials on the VM
[21:44:43] noted.
[21:44:51] cboltz: we should close them and redirect the people to the appropriate ticketing system (admin@o.o or github)
[21:45:38] tampakrap: define "we" ;-)
[21:45:46] What software is used as auth DNS server for opensuse.org?
[21:45:46] * cboltz already closed some of them a few months ago
[21:45:53] you, me, anyone that would like to do it
[21:46:10] I also closed a few last month
[21:46:15] at least the ones assigned to me
[21:46:37] can I ask a question about the new progress.o.o?
[21:46:47] yes, of course
[21:47:08] mstroeder: I don't remember the name, maybe thomic does
[21:47:13] tuanpembual: shoot
[21:47:17] I need a suggestion, which we use for smtp?
[21:47:25] *for mail sender.
[21:47:56] *which one.
[21:48:34] postfix, and we have relay.infra.opensuse.org to do the relay
[21:49:04] can I see the config?
[21:49:43] or a wiki page I can read.
[21:49:50] cboltz: I would close the complaints about font size too, everything that is important is moving into the chameleon theme >:T
[21:50:03] mstroeder: infoblox
[21:50:04] tuanpembual: you should be able to ssh relay.infra.opensuse.org as user and can read the postfix config there
[21:50:10] cboltz: and fonts there are too big
[21:50:12] is what we push against with powerdns
[21:50:33] basically just send the mails to relay.infra.o.o port 25, no auth needed
[21:50:43] noted cboltz
[21:51:09] thomic: So you're using pdns as authoritative DNS server or you plan to use it?
[21:51:15] actually - IIRC our default setup includes postfix which relays to relay.infra.o.o, so sending to localhost 25 should also work
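A minimal sketch of what that looks like on a VM - this is not the configuration of relay.infra.opensuse.org itself, and the recipient address is just a placeholder:

    # point the local postfix at the relay (per the above this should already be the default setup)
    postconf -e 'relayhost = [relay.infra.opensuse.org]:25'
    systemctl reload postfix
    # send a test mail via localhost:25, which then relays through relay.infra.opensuse.org
    echo "test from $(hostname -f)" | mail -s "relay test" someone@example.com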
[21:51:54] lcp: since you are the design expert, feel free to close these bugreports ;-)
[21:52:22] cboltz: well, from my pov the complaint is that fonts are too black and too big >:D
[21:52:58] I also remember complaints about fonts not being big and black enough ;-)
[21:53:00] there is a rule of thumb that to make something visible, you should make stuff over-visible to not strain the eyes
[21:53:26] but on the other side there are people that are so blind they might need that strain to read anything
[21:54:05] you can't win
[21:54:40] I'd argue that you win if you are somewhere in the middle
[21:54:55] and since we got complaints from both directions... ;-)
[21:55:13] mstroeder: pdns has our wrapper scripts.. infoblox is what holds the external zones atm
[21:55:13] nope
[21:55:23] there are no plans to change this yet
[21:55:27] people believe black fonts are the magic cure to all their issues
[21:57:53] tuanpembual: do you want also a dump of the production db?
[22:00:13] thomic: I wondered where to eventually do DNSSEC zone signing. pdns is a hidden primary and infoblox gets updated via zone transfer?
[22:00:53] mstroeder: correct... this infoblox network is run by our "internet provider" atm
[22:00:59] tampakrap: sure
[22:01:02] who owns/administers the domain
[22:01:06] called "microfocus"
[22:01:16] this will change.. but not now
[22:01:20] more like in the mid-future
[22:01:54] It will help me more, thanks tampakrap
[22:02:13] sure, tomorrow you'll have it
[22:06:21] thomic: mid-future, huh?
[22:07:16] cboltz: hm, piwik should be used for forums instead of google analytics
[22:07:43] also would be nice to update piwik to its new name (I don't remember what it was atm :/)
[22:07:54] matomo
[22:08:21] agreed, but getting the forum change done while the forums are hosted in Provo will be funny[tm]
[22:08:31] anything else or can we close the meeting?
[22:09:24] cboltz: yeeeah, I hope moving to discourse and out of provo could happen at the same time
[22:09:43] speaking of which https://github.com/openSUSE/chameleon-discourse/tree/master
[22:10:10] I'm looking forward to the day we do this move :-)
[22:11:09] cboltz: like with everything moved out of provo
[22:12:30] nope from me.
[22:16:30] cboltz / lcp: shall we close the meeting?
[22:16:41] yeah, it's everything I got
[22:18:22] staying up to date, thank you guys
[22:19:38] thank you guys.
[22:19:51] thank you all for joining
[22:20:04] that was my last heroes meeting, I'll be active till the end of the month
[22:20:07] so keep on rocking!
[22:20:14] :'(
[22:20:52] you'll still be allowed to join the meetings ;-)
[22:21:59] true!
[22:22:33] and hopefully Stiopa will too >:D
[22:22:46] polonization of the openSUSE forces >:DDD
[22:23:08] lcp, old times ;-)
[22:28:44] for the last meeting of tampakrap, some Greek trivia I learned today by accident: Aristoteles Onassis was such an extrovert guy that he had his own tie knot http://www.101knots.com/onassis-knot.html :D
[22:28:49] named after him
[22:29:03] not bad...
[22:30:33] didn't like it
[22:31:41] ok
[22:31:42] gn8
[22:31:44] see ya all
[22:31:56] good night!
[22:32:06] hahaha
[22:32:10] good night thomic!