2017-09-24 heroes meeting
[19:58:58] hello
[19:59:03] meeting time?
[19:59:24] in 30 seconds ;-)
[20:01:21] https://progress.opensuse.org/issues/25536
[20:02:46] can anybody "see" me ?
[20:03:26] no ;-)
[20:03:52] sure
[20:04:09] puh - my IRC client seems to have problems... the people list is completely empty for me
[20:04:30] But if at least someone sees me writing, I guess we can start with the meeting :-)
[20:05:00] So giving some initial details:
[20:05:01] we see you yes, and fyi there are 41 people in this channel
[20:05:17] It's planned to shut down all services and machines on Friday afternoon (18:00 CEST), 2017-10-13
[20:05:52] So the maintenance company of the power transformers can start with their work early on Saturday morning
[20:06:16] The hope is that everything goes well and they can finish their work by around 16:00 at the latest
[20:06:32] until then, the whole office building in Nuremberg will have no power
[20:07:12] The SUSE-IT team will shut down the machines on Friday - and will be at the office around 15:00 to get an overview and start with their additional work once the power is back
[20:07:59] That does NOT mean that they will power on all systems immediately, as there is other work scheduled, like moving racks
[20:08:20] but we can expect that the systems will go online again around Saturday evening/night
[20:08:35] So hopefully on Sunday everything should work already
[20:09:02] That's all about the planned schedule so far.
[20:09:04] Any questions ?
[20:09:14] all the times you mention are UTC or CEST?
[20:09:23] CEST
[20:09:51] Join: Ada_Lovelace (~skriesch@p200300CDE3D5D56CE6A471FFFEE23641.dip0.t-ipconnect.de) has joined this channel.
[20:10:08] Hi
[20:10:10] So - if there are no questions about the background, I guess we can start with some initial questions
[20:10:23] Hi Ada_Lovelace, you are here at the right time ;-)
[20:10:39] okay
[20:10:43] hi Ada_Lovelace
[20:10:50] First of all: I guess we should coordinate the communication about the downtime
[20:11:00] hi Ada_Lovelace!
[20:11:05] http://paste.opensuse.org/ae9ef6f6 <-- what you missed
[20:11:39] Ok
[20:11:43] I would love to see some volunteer who can do some (initial and final) announcements and be present on the remaining communication channels
[20:12:07] IMHO there should be something on:
[20:12:24] * news.opensuse.org (article with some details - or a link to the wiki page with details?)
[20:12:35] * Email to announce@opensuse.org
[20:13:05] I could handle that
[20:13:06] s/to /&opensuse-/ please ;-)
[20:13:11] * hint on the most important pages like OBS, wiki - and of course status.o.o
[20:13:39] thanks tampakrap!
[20:13:54] should we discuss the channels to use for the initial announcement ?
[20:14:17] I would suggest etherpad
[20:14:20] Otherwise my next question would be about the "when"
[20:14:43] https://etherpad.opensuse.org/p/NUE_office_downtime ?
[20:14:45] I'd also mail opensuse-project and opensuse-factory, but besides that, your list looks good
[20:14:59] ack
[20:15:34] when: I'd say a week before, so Friday 2017-10-06
[20:16:06] and then tell the people that handle the social media to reproduce the news article the day before the downtime
[20:16:06] ack - but add a 2nd reminder on Friday ?
[20:16:13] you really believe people will remember such an early downtime announcement?
[20:16:38] ack, a reminder on Friday makes lots of sense
[20:17:00] I would say 1 week before and 3 days before
[20:17:01] so first announcement 2017-10-06, second announcement / reminder Thursday 2017-10-12
[20:17:02] cboltz - I would even warn earlier: last time, when download.o.o was down, people were crying because they had planned a release date and could not deliver their packages/images to their customers
[20:17:32] Maybe we could scale the announcement ?
[20:17:42] 2 weeks before on news.opensuse.org
[20:17:56] 1 week before via headers on wiki and OBS ?
[20:18:11] 3 days before via announcement emails on different mailing lists
[20:18:31] two weeks before is 2017-09-29
[20:18:33] on Friday again via email and by setting the topic on IRC ?
[20:19:01] What do you think about such a schedule ?
[20:19:03] yeah makes sense, it is a big downtime that affects many services, so two weeks before is fine by me
[20:19:23] That way we will not repeat our message too often on the same media, but should cover a lot of people
[20:19:39] I would actually send the mail on opensuse-announce both on 2017-09-29 and a reminder 3 and 1 day before
[20:19:49] ^ fine with me
[20:19:53] and maybe even say in the initial announcement that there will be reminders
[20:19:59] fine
[20:20:20] sounds good
[20:21:17] maybe we can re-use the countdown for this ?
[20:21:59] creative idea - why not? ;-)
[20:22:20] I'd say no, the countdown has been used for nice stuff like conferences and releases so far
[20:22:26] it would be confusing to use it for a downtime
[20:23:12] I would say funny idea, but no, like Theo
[20:23:16] I was just thinking about this as the countdown image is included in many other pages as well
[20:23:26] but if you don't like the idea, I'm fine
[20:25:01] oki - I guess we can enhance the etherpad page later, if we find more to add ....
[20:25:42] https://progress.opensuse.org/projects/opensuse-admin/news - I guess this should get a shortened part of the news.o.o article ?
[20:25:43] How about changing the message in opensuse-admin during the downtime?
[20:26:22] what do you want to print there during the downtime ?
[20:27:25] That our servers have a downtime now. We are working on getting them running again.
[20:27:26] Ada_Lovelace: ^^ ?
[20:27:28] I'm not sure, I'm thinking about it
[20:27:37] it will go to planet, but so will the news.o.o article
[20:27:43] Ada_Lovelace: that's something I would expect on all #opensuse- IRC channels
[20:28:14] tampakrap: if you add the news on progress later, it will be like a 2nd reminder on planet.o.o
[20:28:23] * cboltz hopes that someone actually reads the channel topic ;-)
[20:28:30] good idea
[20:28:36] Can you do that? None of us has permissions for that...
[20:28:57] 6/10 together with the other services probably
[20:29:07] @kl_eisbaer
[20:29:52] Ada_Lovelace: if you don't have it, request it ;-)
[20:30:09] per ticket or here?
[20:30:19] I don't have it. ^^
[20:30:23] * cboltz sees an "add news" link in the top-right corner of https://progress.opensuse.org/projects/opensuse-admin/news
[20:30:26] Ada_Lovelace: if in doubt, ask the freenode admins
[20:31:08] for the IRC topic - I'd say we should ask darix
[20:31:11] Mode: ChanServ gives operator status to kl_eisbaer.
[20:31:35] AFAIK he has permissions to change the topic in all opensuse-* channels
[20:31:42] Topic: kl_eisbaer sets the channel topic to "Downtime of NUE office: 2017-10-13 - 2017-10-15 | http://en.opensuse.org/Infrastructure_policy | https://progress.opensuse.org/projects/opensuse-admin/news | report problems to mailto:admin@opensuse.org | https://status.opensuse.org/".
[20:31:58] (and he probably also can give more people permissions to do it)
[20:32:01] Mode: ChanServ removes operator status from kl_eisbaer.
[20:32:37] Then I'll ask darix...
[20:34:01] More about the communication ?
[20:34:55] would be cool if more people can join the IRC channels (and IMHO also the forums) during the downtime, to help our irritated customers
[20:36:31] => NEXT TOPIC: migration of services ?
[20:36:43] status.o.o runs in provo?
[20:36:55] there is already status2.opensuse.org up and running in Provo
[20:37:03] it just doesn't have the current database dump
[20:37:25] okay that's our first then :)
[20:37:32] IMHO we can simply dump the DB from status and load it into status2 then
[20:38:15] provo-mirror.opensuse.org should become a full (means: with mirrorbrain) download.o.o during that time
[20:38:30] that is IMHO the main topic for me after SUSECon
[20:38:45] I'm unsure about:
[20:38:51] * Email (esp. lists.opensuse.org)
[20:38:55] * wikis
[20:39:10] * conncheck
[20:39:30] the other services are IMHO not so important for a single weekend
[20:39:52] but maybe there are volunteers who want to help to set "their" services up in Provo ?
[20:40:08] conncheck is easy, it is a simple static page
[20:40:43] about the other two I am unsure as well
[20:40:51] yes. The question for me is more whether we want to set up a haproxy peer in Provo as well ?
[20:41:03] if we migrate the wiki, I would vote for yes
[20:41:21] Ha!
[20:41:23] and database clusters as well?
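The "dump the DB from status and load it into status2" step discussed above could look roughly like the following. This is a hedged dry-run sketch, not the team's actual procedure: the hostnames come from the log, but the database name `statusdb` and the exact options are placeholders.

```shell
# Sketch: copy the status.o.o database over to status2 in Provo.
# Printed as a dry run here; on the real hosts you would drop the "echo".
# "statusdb" is a placeholder, not the actual database name.
DUMP_CMD='mysqldump --single-transaction statusdb'
LOAD_CMD='mysql statusdb'
echo "ssh status.opensuse.org \"$DUMP_CMD\" | ssh status2.opensuse.org \"$LOAD_CMD\""
```

`--single-transaction` keeps the dump consistent without locking the tables, which matters when the source service stays online while the copy is made.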
[20:41:29] I forgot static.opensuse.org :-)
[20:41:36] for the wikis - _if_ we do it, then I tend to set up a readonly copy that we can simply delete after switching back
[20:41:41] so at least one narwal should be migrated as well
[20:41:44] but given the dependencies, it's still "some" work
[20:41:46] yes the whole static.o.o can be migrated to provo
[20:42:20] I'm pretty sure conncheck is behind static.o.o, let me check
[20:42:49] tampakrap: static.o.o is delivered via haproxy locally
[20:42:55] er, conncheck
[20:43:30] as haproxy can deliver single static pages on its own, this was too easy to leave to another host ;-)
[20:43:42] no, conncheck is delivered directly by haproxy
[20:44:07] okay
[20:44:28] So what about focussing on the four services first - and seeing if someone else might want to work on others ?
[20:44:50] about the wiki: @cboltz what do you need for read-only wikis in Provo ?
[20:45:58] I'd go for one big VM - including mysql, apache and maybe even elasticsearch
[20:46:30] the base setup could be done by salt, so that's easy
[20:46:46] if the search should be usable. ^^
[20:46:56] then rsync the uploaded files there, also easy (just takes time)
[20:47:07] I would say not to migrate the wiki
[20:47:11] it can be down for a weekend
[20:47:47] the most annoying part is creating and importing database dumps, even if it's scriptable, someone has to write that script
[20:49:03] what about machines like scar ?
[20:49:24] is there a host in Provo already to be used ?
[20:49:28] what about freeipa ?
[20:49:34] hmm, or a completely different idea - would it be possible to "copy" a snapshot of the wiki and mysql VMs to provo?
[20:50:08] ("snapshot" as in "disk snapshot with all content")
[20:50:17] freeipa and scar don't need to be migrated
[20:50:25] scar offers vpn for services that will be down
[20:50:36] tampakrap: we need redundancy for freeipa anyway, right?
[20:50:37] ah freeipa maybe, it offers the dns
[20:50:51] freeipa and chip send the dns to microfocus
[20:50:56] so what will happen if they are down?
[20:51:16] tampakrap that depends on the TTL you set in your SOA ? ;-)
[20:51:29] if the TTL is too short, opensuse.org will not be available
[20:51:46] but this is a detail for the week before the downtime
[20:52:13] anyway: we need a redundant freeipa server anyway - or you will lose all the DNS and user data once the machine crashes
[20:52:28] including the possibility to log in and fix the broken machine ;-)
[20:52:45] true
[20:53:31] which brings me - of course - to the question about a master-slave setup for the database server
[20:54:06] with a master-slave, we could live with the delay that happens if you want to sync the machines between Nuremberg and Provo
[20:54:39] the slave will always be some seconds behind - but can (for example) be used to run the DB backups or other analysis on it
[20:55:55] okay agree
[20:56:12] but is this something we need for this downtime?
[20:56:15] if we can bring it to a slave DB in Provo, we might run a read-only wiki all the time
[20:56:37] no, this is just a "if time permits" or "if we have no other ideas" ;-)
[20:57:08] ...but we might indeed think about having machines in Provo as "staging" machines
[20:57:48] that way, we can always play with the latest and greatest stuff there - and if everything works, we can deploy the stuff on the production machines in Nuremberg
[20:58:17] for mysql - ideally we'd first have the separate mysql cluster for openSUSE and then set up a slave in Provo - but that probably conflicts with "if time permits" ;-)
[20:58:22] That would mean that we have as many machines in Provo as in Nuremberg in the end (and especially the storage might become a problem), but IMHO a good idea
[20:58:36] If you want to have Geo High Availability for some machines...
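The master-slave setup discussed above needs little configuration on the MySQL/MariaDB side. A minimal sketch, assuming classic binlog replication; file paths and server-ids are illustrative, not the team's actual config:

```ini
# /etc/my.cnf.d/replication.cnf on the Nuremberg master (sketch)
[mysqld]
server-id = 1
log_bin   = mysql-bin    ; binary log the slave replays

# /etc/my.cnf.d/replication.cnf on the Provo slave (sketch)
[mysqld]
server-id = 2
relay_log = relay-bin
read_only = 1            ; matches the read-only wiki idea above
```

The slave is then attached with `CHANGE MASTER TO MASTER_HOST=..., MASTER_LOG_FILE=..., MASTER_LOG_POS=...;` followed by `START SLAVE;`, and will lag a few seconds behind the master over a WAN link, exactly as noted in the discussion.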
[20:59:10] tampakrap: so what about a haproxy (like anna) in Provo for this downtime already?
[20:59:33] haproxy + static.o.o agreed yes
[20:59:34] this anna could be used to serve conncheck for this downtime, but later can be extended to a setup similar to the one in Nuremberg
[20:59:51] ...and our static machines are already behind the haproxy, right
[21:00:13] any other machines that we define as important for this downtime ?
[21:00:21] => https://etherpad.opensuse.org/p/NUE_office_downtime
[21:01:07] what about connect.o.o?
[21:01:11] gcc, icc, planet, gitlab, salt, osc-collab
[21:01:20] it has the mail aliases, I don't know how they are transferred to the mail server
[21:01:27] the o.o mail aliases I mean
[21:01:33] tampakrap: good point ;-)
[21:01:53] but the script which is transferring them to the mail server will also not run during the downtime
[21:02:07] opensuse.org mail is handled by 42 mx2.suse.de. <--- will this also be down? If yes, question answered ;-)
[21:02:13] and the MX needs to queue the mails anyway
[21:02:51] cboltz: the MX will IMHO run in Prague during that time
[21:02:52] mx will probably move to prague, still in discussion with suse-it
[21:04:50] I'd recommend making /etc/postfix/ read-only - chattr +i is guaranteed to work ;-)
[21:05:00] ok, any other ideas about the downtime ?
[21:05:18] cboltz: I simply leave that to the SUSE-IT specialists ;-)
[21:05:54] I hope that SUSE-IT will power on all the Nuremberg machines again on Saturday evening
[21:06:13] * cboltz wonders what kl_eisbaer calls himself if he talks about "SUSE-IT specialists" as "the others"
[21:06:18] someone needs to move the DNS entries back after that, but this can IMHO be done at any time when the services are back
[21:06:55] cboltz: there are luckily more (and better) experts in SUSE-IT than you are currently aware of
[21:06:58] :D
[21:07:15] I hope so, too. ^^
[21:07:24] ;-)
[21:07:56] But hoping isn't enough...
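The trick of serving conncheck straight from haproxy, mentioned earlier in the meeting, is commonly done with an errorfile in a backend that has no servers: every request "fails" and haproxy answers with the canned file. A sketch for the Provo haproxy; the ACL, backend name, and file path are assumptions:

```
frontend http-in
    bind :80
    acl is_conncheck hdr(host) -i conncheck.opensuse.org
    use_backend bk_conncheck if is_conncheck

backend bk_conncheck
    # No servers listed: each request falls through to the canned
    # response below, so no separate web server is needed.
    errorfile 503 /etc/haproxy/pages/conncheck.http
```

The `.http` file contains a complete raw HTTP response, so it can start with `HTTP/1.0 200 OK` even though it is wired up as the 503 errorfile.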
[21:07:58] How should we inform the people that the downtime is over ?
[21:08:09] news
[21:08:20] via a separate news entry ?
[21:08:25] yes
[21:08:26] or should we update the old one ?
[21:08:34] ok
[21:08:48] No. That is bad! Then you wouldn't know when the end was.
[21:08:55] update the old one
[21:09:07] that's what we did three times so far
[21:09:12] the "old" article is scheduled quite early, so please write a new one
[21:09:16] Ada_Lovelace: why not? on status.o.o we always write an "UPDATE: " line as the first line...
[21:09:32] Ok, if you think so.
[21:09:37] yeah, because adding separate updates is somewhat broken :-/
[21:10:06] but we might combine the new article with some nice "lessons learned" or "see what we have achieved since" or "new machines/services..." news ?
[21:10:50] I hope we won't need the "lessons learned" section ;-)
[21:10:58] As far as I know, we did no news article about our Salt, gitlab, and openVPN - right?
[21:11:01] this is acceptable yes
[21:11:02] That sounds nice. ;)
[21:11:31] why should we announce internal services to people? I doubt they care much about them
[21:11:35] so in that case I am for a completely new article after the downtime, that includes some nice news about these services
[21:11:56] tampakrap because people tend to learn from our setups for their own ones ?
[21:12:20] still we don't need a news article for that, our heroes@ mails are enough imho
[21:12:27] and the osc17 presentation that mentions them
[21:12:52] I would say: especially the way we use the virtualization and subnetting together with SAN and hypervisors is a very interesting topic for many companies
[21:13:23] tampakrap: what about doing something good and talking about it
[21:13:45] be warned that "people tend to learn from our setup" might result in people asking for (read-only) access to our salt code
[21:13:59] tampakrap: if you think you had everything that is nice and important in your presentation, feel free to add a link to it :-)
[21:14:18] cboltz: ...and why should that be a problem?
[21:14:37] I don't have a problem with it ;-)
[21:14:50] but there must be a reason why we have it non-public now, right?
[21:15:13] main reason is that gitlab often has vulnerabilities :)
[21:15:20] for me, the openSUSE heroes should not only be the admins - they should also be the guys people like to ask if they want to offer similar services :-)
[21:15:48] but I guess we are leaving the topic
[21:16:02] did we forget anything ?
[21:16:03] tampakrap: sounds like the boring solution would be to mirror everything to github ;-)
[21:16:10] ah, yes
[21:16:27] who is taking care of the machines that need to be migrated to Provo ?
[21:16:28] will there be error pages for all services we didn't discuss?
[21:17:02] cboltz: could be, if we change the DNS and configure the haproxy in Provo correctly
[21:17:05] (a 503 page isn't nice, but a timeout is worse)
[21:17:46] cboltz: so we need someone who:
[21:17:54] * creates a dump of the current DNS setup
[21:18:08] * changes all (alias) entries to the new haproxy in Provo
[21:18:16] * changes everything back in freeipa once the downtime is over
[21:18:34] ...who currently has access to the DNS ?
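The dump/change/restore steps above hinge on the TTL remark made earlier in the meeting: cached records have to expire before clients follow the Provo addresses. A hypothetical zone excerpt illustrating the usual approach; the record names match the discussion, while the TTL values and the address are documentation placeholders:

```
; Lower the TTL well before 2017-10-13 so resolvers re-query quickly,
; then raise it again after the entries are switched back.
$TTL 300                          ; 5 minutes during the window (instead of e.g. 86400)
migration2   IN  A      192.0.2.10               ; Provo haproxy (placeholder IP)
community    IN  CNAME  migration2.opensuse.org.
```

The TTL must be lowered at least one old-TTL period before the switch, otherwise resolvers may still hold the long-lived records when the addresses change.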
[21:18:35] we can just change migration2.o.o to point to the provo haproxy public ip
[21:18:53] kl_eisbaer: you, me, darix, max afaik, want me to check?
[21:19:01] tampakrap: right, but some services (like community) use different IPs at the moment
[21:19:30] tampakrap: don't forget login2.o.o ;-)
[21:19:39] tampakrap: yes, please
[21:19:43] if we move login and migration2 we have moved 99% of the services
[21:19:47] what is behind community?
[21:20:16] tampakrap: dunno atm. But we might think about cleaning up that IP address anyway
[21:20:29] the public IP of community is handled by haproxy anyway
[21:20:45] planet.opensuse.org is an alias for community.opensuse.org.
[21:20:59] counter also
[21:21:04] so dns access is the four people I mentioned before
[21:21:14] (and I'm just doing some random DNS lookups ;-)
[21:21:26] which reminds me - kl_eisbaer I need access to the planet/counter machine
[21:21:31] to add it to salt
[21:21:54] tampakrap: what's the name of the machine ?
[21:22:03] kruemel ?
[21:22:08] cboltz: file a ticket to me please to move community to migration2
[21:22:17] and then rename migration2 to proxy.o.o
[21:22:19] kl_eisbaer: yes
[21:22:45] need to go afk for around ~15 mins sorry
[21:23:01] to get the full list, search for "community" in the freeipa DNS list - planet.o.o and doc.o.o are the most prominent services
[21:23:02] np - I guess we can call this a meeting anyway
[21:23:43] tampakrap: Sep 24 21:23:25 kruemel salt-minion[24513]: [ERROR ] The Salt Master has cached the public key for this node, this salt minion will wait for 10 seconds before attempting to re-authenticate
[21:23:43] Hint: Some lines were ellipsized, use -l to show in full.
[21:25:13] tampakrap: and your 20,000 ssh keys are now also on the machine
[21:25:29] any questions left ?
[21:26:01] do we need another meeting? if so, when?
[21:26:10] I would say: it depends ;-)
[21:26:22] cboltz: if we have a need, let's meet
[21:26:28] otherwise we have the IRC channel
[21:26:32] ...or the mailing list
[21:26:50] but the most important question for me: can someone attach the meeting log to https://progress.opensuse.org/issues/25536 ?
[21:27:21] I miss that for the regular September meeting anyway
[21:27:28] yes, I can do that
[21:27:36] cboltz: thanks
[21:27:42] in that case I would say: ENDING MEETING
[21:27:45] for the September meeting -
[21:27:50] first we moved it
[21:27:53] thanks to everyone who participated
[21:28:01] and on the second date, even fewer people were there
[21:28:03] thanks, too
[21:28:13] so basically we only discussed that we should stop moving meetings ;-)