communication #53702 » 2019-08-06-heroes-meeting.txt

IRC meeting log - cboltz, 2019-08-06 21:41

 
2019-08-06 Heroes meeting

[19:53:05] <tuanpembual> Hi all
[20:00:53] <cboltz> Hi!
[20:01:15] <cboltz> meeting time - who's here? ;-)
[20:01:26] <tuanpembual> me
[20:01:30] <mstroeder> meinereiner
[20:02:15] <cboltz> :-)
[20:02:43] <cboltz> while maybe a few more people wake up ;-) let's start with our first topic
[20:02:49] <bmwiedemann> Hi
[20:02:59] <cboltz> does someone from the community have a question?
[20:04:23] <cboltz> doesn't look so ;-)
[20:04:47] <cboltz> our next (and catch-all) topic is "status reports about everything"
[20:04:54] <cboltz> who wants to start?
[20:05:37] <tuanpembual> Me.
[20:05:55] <tuanpembual> update for new progress.
[20:06:11] <tuanpembual> *it has been a while with no update.
[20:07:39] <tuanpembual> cboltz suggested that we need an "official" domain for testing progress, so that it gets routed via login2.opensuse.org
[20:08:21] <tuanpembual> domain like "progress-test.opensuse.org"
[20:08:33] <tuanpembual> so who can help me do this?
[20:08:45] <tuanpembual> any idea @cboltz ?
[20:09:04] * tuanpembual will update ticket about this issue.
[20:09:42] <tuanpembual> done from me.
[20:09:44] <cboltz> I'd say/hope that kbabioch_ or mcaj should have enough permissions to do the DNS entries, but they are both away
[20:09:49] <cboltz> so please open a ticket ;-)
[20:10:03] <tuanpembual> noted
[20:11:18] <cboltz> so - who's next?
[20:13:01] <cboltz> nobody? then I'll do a quick report about two recent downtimes
[20:13:05] <bmwiedemann> I'm somewhat worried that heroes are understaffed and tickets remain untouched for too long. I'd like to do more myself, but only got the VPN, no credentials
[20:13:58] <cboltz> sounds like something we should fix ;-)
[20:14:16] <cboltz> what exactly do you need?
[20:14:39] <cboltz> (for example, does the VPN connection work? Can you login as user on all VMs?)
[20:15:19] <bmwiedemann> VPN worked (last time I tried some weeks ago)... didn't try user login (was that documented somewhere?)
[20:16:24] <cboltz> you should be able to ssh to all VMs (see pillar/id/* in the salt repo for a list) as user
[20:16:47] <cboltz> to actually do something, someone needs to give you sudo permissions on specific machines
[20:18:16] <cboltz> that's the quick summary - if you have questions, feel free to ask me ;-)
[20:18:48] <bmwiedemann> OK. ssh works :-)
[20:18:59] <cboltz> :-)
[20:22:03] <cboltz> let's continue while you look around on various VMs ;-)
[20:22:22] <cboltz> as I already mentioned, we had two recent downtimes:
[20:23:02] <cboltz> a) a mailinglist outage - I'm sure you've seen the mail on heroes@ (about two weeks ago)
[20:23:59] <cboltz> b) events.o.o was down for two days (after a reboot which "activated" a new ruby version) - kbabioch_ was able to hotfix it when we noticed the downtime, and in the meantime henne did a proper fix and updated the VM to Leap 15.1
[20:24:46] <cboltz> oh, and I migrated the meeting reminder (currently only used by the board) from an old SLE 11 to pinot.infra.o.o
[20:25:41] <cboltz> that's it from me
[20:25:54] <tuanpembual> ticket update: https://progress.opensuse.org/issues/27720
[20:27:07] <cboltz> thanks!
[20:27:14] <cboltz> does someone else have a status report or something to discuss?
[20:28:45] <mstroeder> Æ-DIR, should I stay or should I go? The test environment is up and running for one year now. But I guess nobody has the time to try it.
[20:29:19] <cboltz> yes, that's probably the problem :-(
[20:30:20] <mcaj> HI
[20:30:29] <cboltz> however, given the "fun" we have with sssd (it seems rebooting a 15.1 VM reliably breaks sudo unless you restart sssd - via the saltmaster, because sudo doesn't work) I'm more than open to switching to Æ-DIR
[20:30:57] <cboltz> hi mcaj
[20:31:24] <mstroeder> For the client side with Æ-DIR it's highly recommended to switch to aehostd (see https://www.ae-dir.com/aehostd.html).
[20:31:24] <cboltz> the last comment in https://progress.opensuse.org/issues/27720 might be something for you (creating a DNS entry)
[20:31:25] <mcaj> Is it a bug or an error in the config files?
[20:32:03] <cboltz> restarting sssd "fixes" it
[20:32:18] <cboltz> so IMHO it shouldn't be an error in the config
[20:32:30] <bmwiedemann> maybe timing issue - something started too early in boot?
[20:32:42] <bmwiedemann> because systemd parallelizes as much as it can
[20:32:56] <bmwiedemann> journal should have some info?
[20:33:04] <cboltz> yes, could be
[20:33:22] <cboltz> I have to admit that I didn't check the boot log and "only" restarted sssd to get it working
[20:33:25] <mcaj> looks like missing dependency for sysctl conf
[20:34:12] <mcaj> I had a similar problem with HAProxy, and that time it helped to update the package to the latest stable version
[20:34:47] <mcaj> but what's strange is that we saw this error on several machines, but not on all of them
[20:35:54] <cboltz> I didn't test everywhere, but on all 15.1 machines where I tested, sudo was broken after reboot, and restarting sssd "fixed" it
[20:36:26] <cboltz> I remember similar problems on 42.3 where sssd even broke ssh login as user
[20:36:41] <cboltz> but not always IIRC
[20:36:56] <cboltz> (now please don't argue that the situation improved on 15.1 because at least user login doesn't break ;-)
[20:37:02] <mstroeder> Hmm, I have customers running different versions of sssd. With years-old 1.9.6 there were several crashes during startup.
[20:38:24] <cboltz> systemctl status sssd always says it's running, even in the cases it breaks sudo or ssh login
[20:39:35] <bmwiedemann> so it does not quit in case of errors
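Since "systemctl status sssd" reports the service as running even when it is broken, a health check has to exercise a lookup instead of trusting the unit state. A minimal Python sketch of that idea follows. The function names and the result strings are made up for illustration, and it only probes NSS user lookups; whether that catches the sudo breakage discussed here is an assumption that would need testing on an affected VM.

```python
import pwd
import subprocess


def service_is_active(unit: str) -> bool:
    """Return True if systemd reports the unit as active."""
    result = subprocess.run(
        ["systemctl", "is-active", "--quiet", unit], check=False
    )
    return result.returncode == 0


def user_resolves(username: str) -> bool:
    """Return True if the user can be resolved through NSS."""
    try:
        pwd.getpwnam(username)
    except KeyError:
        return False
    return True


def diagnose(active: bool, resolves: bool) -> str:
    """Combine the unit state and the lookup result into one verdict."""
    if active and resolves:
        return "ok"
    if active:
        return "active but lookups fail"
    return "not active"
```

On a VM, this could be called as diagnose(service_is_active("sssd"), user_resolves(name)), where name is an account that only exists in the directory behind sssd (a local account such as root would resolve even with sssd broken).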
[20:39:54] <cboltz> if you or someone else wants to debug it - the "future TSP" VM is currently bored and nobody complains if it gets rebooted more often
[20:40:13] <mstroeder> Furthermore there were some bugs in the code using netlink detection. That's why I usually recommend to compile sssd with configure option --with-libnl=no
[20:40:18] <cboltz> for obvious reasons, I'll add your ssh pubkey to /root/.ssh/authorized_keys ;-)
[20:41:19] <mstroeder> Building with --with-libnl=no means that sssd won't detect the network link to determine whether it is in online or offline mode.
[20:42:25] <mstroeder> sssd writes many log messages into various files beneath /var/log/sssd. Can you point me to an affected system?
[20:42:39] <cboltz> a quick look at the sssd.spec (in Factory) shows that it BuildRequires libnl, and that it doesn't use this configure option
[20:43:24] <cboltz> affected systems are probably all running Leap 15.1, for example narwal5, narwal6, narwal7, dale
[20:43:39] <mstroeder> libnl is automatically detected (sssd's configure default for --with-libnl is "auto")
[20:44:33] <cboltz> since /var/log/sssd/ is root-only, I'll copy those logs to /dev/shm on narwal7
[20:45:35] <cboltz> the log should show the problem after the last reboot
[20:55:25] <mstroeder> sssd_nss.log shows "The Data Provider returned an error [org.freedesktop.sssd.Error.DataProvider.Offline]" which could mean that D-BUS is not ready yet during startup?
[20:55:46] <mstroeder> But it seems sssd is running with almost no logging at all.
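The error quoted from sssd_nss.log can be searched for mechanically when comparing several machines. A small Python sketch, assuming the log has been copied somewhere readable; the function name is hypothetical, and only the error identifier itself is taken from the message above.

```python
from typing import Iterable, List, Tuple

# Error identifier as quoted from sssd_nss.log in the discussion.
OFFLINE_MARKER = "org.freedesktop.sssd.Error.DataProvider.Offline"


def find_offline_errors(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line number, text) for every line containing the marker."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if OFFLINE_MARKER in line
    ]
```

Passing an open file object works, because iterating over a file yields its lines; the path to the copied log is left to the caller.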
[20:56:29] <mcaj> To me it looks like a bug in Leap 15.1 ... see our friend's thread at: https://lists.fedorahosted.org/archives/list/sssd-users@lists.fedorahosted.org/thread/OZOHTCTD4CQE6XB6ZQS3YLDYX76DW5XI/
[20:56:38] <heroes-bot> <https://x0.no/4lfjs> (at lists.fedorahosted.org)
[21:01:54] <cboltz> I see a familiar name there ;-) - and yes, it looks like a valid explanation
[21:05:04] <mcaj> we can put a fix in the config file, but the bug should be fixed by openSUSE or, better, upstream
[21:05:52] <cboltz> the sssd config is in pillar/common.sls
[21:06:07] <cboltz> and I'm sure you know how to write a bugreport ;-)
[21:10:19] <mstroeder> You mean I should write a bug report to the sssd bug tracker?
[21:11:04] <cboltz> you are the sssd expert, so assuming you think it's a bug - yes, please ;-)
[21:11:37] <mstroeder> Well, I rather gave up using sssd. ;-)
[21:12:08] <mstroeder> Maybe this is already fixed in sssd 2.x. Its change logs are huge.
[21:13:17] <cboltz> interesting question - Tumbleweed has 2.2.0, but Leap 15.1 is at 1.16.1
[21:13:47] <cboltz> (and I slightly ;-) doubt Leap and SLE want to do a major version upgrade)
[21:14:29] <mcaj> ok, we will report it as a bug. Fine, shall I open the next topic? Leap 42.3 machines (ticket https://progress.opensuse.org/issues/54968)
[21:14:30] <mstroeder> IIRC Leap's sssd package is from SLE15.
[21:15:22] <cboltz> right, and that makes a version upgrade even less likely
[21:15:25] <mstroeder> Anyway, one of the reasons why I sat down implementing aehostd is getting rid of sssd's complexity.
[21:15:41] <mcaj> see https://en.opensuse.org/Lifetime
[21:17:41] <cboltz> mcaj: I'm afraid your list is incomplete - a "salt \* grains.get osrelease" lists some more 42.3 machines
[21:17:41] <mcaj> Our team will try to do some updates soon, but we definitely need the Heroes to help us with it. (team = SUSE Eng-Infra)
[21:18:54] <mcaj> yes, feel free to update the ticket, this was just what I found during the kernel update on the machines ... salt is right ...
[21:19:35] <mcaj> anyway, zypper dup is needed ;)
[21:21:00] <cboltz> ticket updated
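The list of remaining 42.3 machines can be derived from the grain data instead of being collected by hand. A hedged Python sketch: it assumes the result of the "grains.get osrelease" call has already been loaded into a dict of minion name to release (for example from JSON output), and the function name and the minion names in the usage are hypothetical.

```python
from typing import Dict, List


def minions_on_release(osrelease_by_minion: Dict[str, object],
                       release: str) -> List[str]:
    """Return the sorted names of all minions reporting the given release."""
    return sorted(
        name
        for name, value in osrelease_by_minion.items()
        # Compare as strings so "42.3" and 42.3 are treated the same.
        if str(value) == release
    )
```

For example, minions_on_release({"vm-a": "42.3", "vm-b": "15.1"}, "42.3") gives the machines that still need the zypper dup.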
[21:21:37] <bmwiedemann> do zypper repos use $releasever already?
[21:22:29] <cboltz> no, you'll have to adjust them manually (or, better, adjust salt to switch to using $releasever ;-)
[21:23:47] <bmwiedemann> will do
[21:25:02] <cboltz> :-)
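Switching a repo definition to $releasever amounts to replacing the hard-coded version in the URL with the zypper variable. A minimal Python sketch of that substitution; the function name is hypothetical, the URL in the usage note is only an illustration, and this is not the change from the actual salt MR.

```python
def use_releasever(baseurl: str, release: str) -> str:
    """Replace a hard-coded release path component with $releasever."""
    # Only replace the version when it is a full path component,
    # so a matching number elsewhere in the URL stays untouched.
    return baseurl.replace(f"/{release}/", "/$releasever/")
```

With an input such as "http://download.opensuse.org/distribution/leap/42.3/repo/oss/" and release "42.3", the version component becomes $releasever, so the same repo definition keeps working after a distribution upgrade.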
[21:27:21] <cboltz> speaking about salt - can someone who knows postgresql please have a look at https://progress.opensuse.org/issues/54503 ?
[21:27:40] <cboltz> I'd love not to have a highstate "timebomb" ;-)
[21:40:15] <bmwiedemann> made the $releasever salt MR
[21:43:13] <cboltz> any idea why I see "Could not retrieve the pipeline status" in your MR?
[21:43:52] <bmwiedemann> don't know
[21:44:34] <bmwiedemann> maybe the $ sign is special and needs escaping?
[21:49:17] <cboltz> no, AFAIK the $ sign doesn't have a special meaning in sls files
[21:50:04] <kbabioch_> the pipeline error is (probably) not related to $ -> https://gitlab.infra.opensuse.org/bmwiedemann/salt/-/jobs/4456
[21:50:12] <heroes-bot> <https://x0.no/4lfll> (at gitlab.infra.opensuse.org)
[21:50:19] <kbabioch_> "The requested URL returned error: 403"
[21:50:30] <kbabioch_> "remote: You are not allowed to download code from this project."
[21:51:02] <kbabioch_> bmwiedemann: if i see it correctly, you've forked the project ... not sure if this workflow is fully supported with the ci/cd ...
[21:52:47] <bmwiedemann> interesting... I thought that is the standard workflow (unless you want to give everyone write permissions on the main repo)
[21:53:23] * mcaj has 10% of the laptop battery left :(
[21:53:56] <cboltz> bmwiedemann: making your fork more public might help
[21:54:00] <kbabioch_> bmwiedemann: yes, agreed, it's the better way and what we all know from github, etc. ... it's just one thing i've noticed that is different in your merge request compared to many others that do work fine ;-)
[21:54:38] <cboltz> some people use forks and some people create branches in the main repo
[21:54:46] <cboltz> so you'll see both ways depending on who does a MR
[21:55:38] <mcaj> time to go, CU next time
[21:56:02] <cboltz> BTW: I just tested (on a local test VM) and $releasever works as expected
[21:57:18] <bmwiedemann> kbabioch_: settings have it as "Private" and don't allow changing the visibility
[21:59:12] <bmwiedemann> good night to you then. I'll get some sleep, too.