action #96512
closed coordination #96980: [qe-core][samba_adcli][epic] Tracker for samba_adcli failures
[qe-core][sporadic] Re-Enable Active Directory tests
Added by dzedro over 3 years ago. Updated over 1 year ago.
100%
Description
Observation
openQA test in scenario sle-12-SP5-Server-DVD-Incidents-x86_64-mau-extratests2@64bit fails in samba_adcli
Test suite description
Run console tests against aggregated test repo
Reproducible
Fails since (at least) Build :20666:webkit2gtk3 (current job)
Expected result
Last good: :4705:fetchmail (or more recent)
Further details
Always latest result in this scenario: latest
Updated by dzedro over 3 years ago
- Subject changed from [sporadic] test fails in samba_adcli to [qe-core][sporadic] test fails in samba_adcli
Updated by okurz over 3 years ago
- Related to action #96513: [qe-core][sporadic][samba_adcli] wbinfo fails added
Updated by okurz over 3 years ago
- Priority changed from Normal to Urgent
Seeing this in a couple of tests, e.g. https://openqa.suse.de/tests/6640016 and https://openqa.suse.de/tests/6623775 . As it's sporadic I consider it "urgent", because label carry-over does not work. Consider https://github.com/os-autoinst/scripts/blob/master/README.md#auto-review---automatically-detect-known-issues-in-openqa-jobs-label-openqa-jobs-with-ticket-references-and-optionally-retrigger to address the urgency
Updated by okurz over 3 years ago
Proposing removing the test module for now from mau-extratests2 in https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/13072 to address the urgency
Updated by geor over 3 years ago
- Subject changed from [qe-core][sporadic] test fails in samba_adcli to [qe-core][sporadic] test fails in samba_adcli (fails on ssh to domain)
Updated by okurz about 3 years ago
- Priority changed from Urgent to Normal
Discussed in weekly QE sync together. This ticket showed up violating the SLO for urgent tickets. We agreed that the ticket is actually not urgent according to these criteria. See https://progress.opensuse.org/projects/openqatests/wiki#SLOs-service-level-objectives for details
Updated by tjyrinki_suse about 3 years ago
- Status changed from New to Workable
- Start date deleted (2021-08-03)
Updated by tjyrinki_suse almost 3 years ago
- Target version set to QE-Core: Ready
Would be nice to work on samba_ad issues (or at least find out which tasks are relevant at the moment) for the next sprint.
Updated by tjyrinki_suse almost 3 years ago
- Priority changed from Normal to High
Updated by tjyrinki_suse almost 3 years ago
- Target version deleted (QE-Core: Ready)
Updated by tjyrinki_suse almost 3 years ago
- Priority changed from High to Normal
Updated by slo-gin almost 2 years ago
This ticket was set to Normal priority but was not updated within the SLO period. Please consider picking up this ticket or just set the ticket to the next lower priority.
Updated by dzedro almost 2 years ago
- Assignee set to dzedro
- Target version set to QE-Core: Ready
Updated by szarate almost 2 years ago
- Related to action #120859: [qe-core] test fails in samba_adcli - Unschedule test added
Updated by szarate almost 2 years ago
- Sprint set to QE-Core: February Sprint (Feb 08 - Mar 08)
samba_adcli tests are unscheduled for now, as the windows machine needs to be set up again/moved out from phobos.qa.suse.de (which is in the TAM lab).
Updated by szarate almost 2 years ago
- Subject changed from [qe-core][sporadic] test fails in samba_adcli (fails on ssh to domain) to [qe-core][sporadic] Re-Enable Active Directory tests - test fails in samba_adcli (fails on ssh to domain)
You can use qamaster.qa.suse.de to deploy a new Windows VM. I think Marita can provide a valid windows 2019 license key, in case I can't find it.
Updated by ph03nix almost 2 years ago
- Status changed from Workable to In Progress
- Assignee changed from dzedro to ph03nix
Cheekily taking this ticket over :-)
Updated by ph03nix almost 2 years ago
Test was deleted in https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/15968 due to https://progress.opensuse.org/issues/120859
Updated by ph03nix almost 2 years ago
I'm collecting notes and setup instructions on https://confluence.suse.com/display/qasle/AD+configuration+for+testing
Updated by ph03nix almost 2 years ago
- File Win2019_Validation.png added
The provided product key doesn't seem to work (see the attached Win2019_Validation.png).
Updated by ph03nix almost 2 years ago
Managed to install, activate and update the Windows Server installation. Now I'm assigning a static IP address to it.
The hostname will be "methusalix"
https://gitlab.suse.de/qa-sle/qanet-configs/-/merge_requests/48
Updated by ph03nix almost 2 years ago
I'm updating the https://confluence.suse.com/display/qasle/AD+configuration+for+testing along the way to document the installation process.
Updated by szarate almost 2 years ago
GPO enablement PR (needs its own ticket, and somebody to pick it up): https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/16107
Updated by ph03nix almost 2 years ago
I've installed the Windows Server 2019 and documented the process on https://confluence.suse.com/display/qasle/AD+configuration+for+testing
I've tested joining the domain via Windows 10 (OK) and via SLES 15-SP5 (YaST, OK). Next I will check joining the domain via the CLI and how the test setup interacts with it.
Updated by ph03nix almost 2 years ago
szarate wrote:
GPO enablement PR (needs its own ticket, and somebody to pick it up): https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/16107
Let's please do this in a separate ticket, otherwise this grows too much. The new ticket should be blocked by this one and I'm likely to take it afterwards. But please still: New ticket.
Updated by ph03nix almost 2 years ago
- % Done changed from 0 to 30
Right now I'm adapting the old test case to work with the new server. In the process I'm also turning the hard-coded domain values into settings.
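A minimal sketch of the "settings instead of hard-coded values" idea, written in Python instead of the test's own code. The setting names (`AD_HOSTNAME`, `AD_DOMAIN`, `AD_DOMAIN_USER`) and the default user are hypothetical and are not taken from the real test module.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class ADSettings:
    hostname: str  # FQDN of the domain controller
    domain: str    # AD domain, also used as Kerberos realm
    user: str      # account used for the domain join


def load_ad_settings(env=None):
    """Build ADSettings from a mapping (defaults to os.environ).

    All key names here are hypothetical and only illustrate the pattern.
    """
    env = os.environ if env is None else env
    # Fail early and name every missing setting at once
    missing = [key for key in ("AD_HOSTNAME", "AD_DOMAIN") if not env.get(key)]
    if missing:
        raise KeyError("missing required settings: " + ", ".join(missing))
    return ADSettings(
        hostname=env["AD_HOSTNAME"],
        # Kerberos realms are conventionally written in upper case
        domain=env["AD_DOMAIN"].upper(),
        user=env.get("AD_DOMAIN_USER", "Administrator"),
    )
```

Failing on a missing setting keeps a misconfigured job from silently falling back to a server that no longer exists.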
Updated by ph03nix almost 2 years ago
Currently facing the problem that the AD Domain Controller is not reachable for DNS from duck-norris, although ping works:
# host feldspaten.org 10.162.30.119
;; connection timed out; no servers could be reached
# ping -c 2 10.162.30.119
PING 10.162.30.119 (10.162.30.119) 56(84) bytes of data.
64 bytes from 10.162.30.119: icmp_seq=1 ttl=126 time=1.10 ms
64 bytes from 10.162.30.119: icmp_seq=2 ttl=126 time=1.16 ms
--- 10.162.30.119 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1001ms
rtt min/avg/max/mdev = 1.099/1.129/1.160/0.030 ms
However it works from my laptop.
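To tell "host is up" apart from "DNS service answers", a small probe can send one DNS query over UDP and wait for any reply. This is only a sketch under assumptions: the function names are made up for illustration, it speaks plain UDP/53 with a hand-built A-record query, and it does not interpret the answer.

```python
import socket
import struct


def build_dns_query(name, query_id=0x1234):
    """Encode a minimal DNS query for an A record (RFC 1035 wire format)."""
    # Header: id, flags (recursion desired), 1 question, no other records
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    question = qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN
    return header + question


def dns_answers(server, name, timeout=2.0, port=53):
    """Return True if `server` sends a DNS reply over UDP within `timeout`."""
    query = build_dns_query(name)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(query, (server, port))
            reply, _ = sock.recvfrom(512)
        except OSError:  # covers timeouts as well as unreachable errors
            return False
    # A genuine reply echoes the query id in its first two bytes
    return len(reply) >= 2 and reply[:2] == query[:2]
```

Called as `dns_answers("10.162.30.119", "feldspaten.org")`, a `False` result while ping still succeeds would point at filtered UDP/53 or a DNS service that is not answering, and not at routing.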
Updated by ph03nix almost 2 years ago
DNS works from laptop and duck-norris against a new host (automatix) on qa-master (10.162.30.153).
The issue is apparently the AD configuration.
Updated by ph03nix almost 2 years ago
- Related to action #96983: [qe-core][sporadic][samba_adcli] adcli joining domain fails added
Updated by ph03nix almost 2 years ago
After some firewall fiddling, I notice that the openQA host can reach it now.
I added a rule to allow all firewall traffic, so that I can advance for now.
Updated by ph03nix almost 2 years ago
# net ads join --domain METHUSALIX.QA.SUSE.DE -U Administrator -S methusalix.qa.suse.de
Password for [METHUSALIX.QA.SUSE.DE\Administrator]:
Failed to join domain: failed to lookup DC info for domain 'METHUSALIX.QA.SUSE.DE' over rpc: {Operation Failed} The requested operation was unsuccessful
Updated by ph03nix almost 2 years ago
https://duck-norris.qe.suse.de/tests/11940#step/samba_adcli/161
Password for [METHUSALIX0\Administrator]:
Failed to join domain: failed to lookup DC info for domain 'METHUSALIX.QA.SUSE.DE' over rpc: {Device Timeout} The specified I/O operation on %hs was not completed before the time-out period expired.
There is https://www.suse.com/support/kb/doc/?id=000020507 for this, looks like a configuration issue on the Windows Server. I have allowed all ports and services there, so I don't know where this is coming from.
Updated by ph03nix almost 2 years ago
I can join the domain just fine on a Leap 15.4 VM from my Laptop:
# net ads join --domain 'METHUSALIX.QA.SUSE.DE' -U Administrator -S 'methusalix.qa.suse.de' --no-dns-updates
Password for [METHUSALIX0\Administrator]:
Using short domain name -- METHUSALIX0
Joined 'LEAP15-4' to dns domain 'methusalix.qa.suse.de'
Need to check what is blocking the network from the openQA VM to the Domain.
Updated by ph03nix almost 2 years ago
Good, issue can be reproduced on a new VM on ada:
zergling:~ # net ads join --domain 'METHUSALIX.QA.SUSE.DE' -U Administrator -S 'methusalix.qa.suse.de' --no-dns-updates
Password for [METHUSALIX0\Administrator]:
Failed to join domain: failed to lookup DC info for domain 'METHUSALIX.QA.SUSE.DE' over rpc: {Device Timeout} The specified I/O operation on %hs was not completed before the time-out period expired.
Now going to check where the issue comes from.
Updated by ph03nix almost 2 years ago
A tcpdump shows that TCP requests are going out to ports 445 and 139 (as described in https://www.suse.com/support/kb/doc/?id=000020507), but nothing comes back:
09:05:47.579065 IP 10.137.7.199.56258 > 10.162.30.119.445: Flags [S], seq 3725595907, win 64240, options [mss 1460,sackOK,TS val 377180263 ecr 0,nop,wscale 7], length 0
09:05:47.584195 IP 10.137.7.199.48880 > 10.162.30.119.139: Flags [S], seq 3763447905, win 64240, options [mss 1460,sackOK,TS val 377180268 ecr 0,nop,wscale 7], length 0
09:05:48.593784 IP 10.137.7.199.48880 > 10.162.30.119.139: Flags [S], seq 3763447905, win 64240, options [mss 1460,sackOK,TS val 377181278 ecr 0,nop,wscale 7], length 0
09:05:48.593808 IP 10.137.7.199.56258 > 10.162.30.119.445: Flags [S], seq 3725595907, win 64240, options [mss 1460,sackOK,TS val 377181278 ecr 0,nop,wscale 7], length 0
09:05:50.609816 IP 10.137.7.199.56258 > 10.162.30.119.445: Flags [S], seq 3725595907, win 64240, options [mss 1460,sackOK,TS val 377183294 ecr 0,nop,wscale 7], length 0
09:05:50.609820 IP 10.137.7.199.48880 > 10.162.30.119.139: Flags [S], seq 3763447905, win 64240, options [mss 1460,sackOK,TS val 377183294 ecr 0,nop,wscale 7], length 0
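The same symptom (SYN goes out, nothing comes back) can be checked without tcpdump by attempting a TCP connect with a timeout. A sketch, assuming hypothetical names: `probe_tcp_ports` and its `connect` parameter are illustration only, and "filtered" is just the label chosen here for a connect that times out.

```python
import socket


def probe_tcp_ports(host, ports, timeout=3.0, connect=socket.create_connection):
    """Try a TCP connect to each port and classify the outcome.

    `connect` is injectable so the logic can be exercised without a network.
    """
    results = {}
    for port in ports:
        try:
            with connect((host, port), timeout=timeout):
                results[port] = "open"       # three-way handshake completed
        except (socket.timeout, TimeoutError):
            results[port] = "filtered"       # SYN sent, no answer at all
        except ConnectionRefusedError:
            results[port] = "closed"         # peer answered with RST
        except OSError as err:
            results[port] = "error: %s" % err
    return results
```

For the capture above one would expect `probe_tcp_ports("10.162.30.119", [445, 139])` to report both ports as "filtered" from the affected subnet, since the retransmitted SYNs never get an answer.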
Updated by ph03nix almost 2 years ago
I can see on the Domain Controller that DNS and CLDAP (UDP/389) packets are incoming, but I'm missing the TCP packets for ports 445 and 139.
I have firewall rules to allow everything incoming and outgoing on the Domain Controller in place.
Updated by ph03nix almost 2 years ago
A tcpdump between the test host (zergling) and the VM hypervisor host qamaster shows that there are packets being blocked between the subnets:
OUTGOING from zergling (showing only TCP)
09:30:36.657256 IP 10.137.7.199.49328 > 10.162.30.119.445: Flags [S], seq 3452051387, win 64240, options [mss 1460,sackOK,TS val 378669341 ecr 0,nop,wscale 7], length 0
09:30:36.662387 IP 10.137.7.199.37764 > 10.162.30.119.139: Flags [S], seq 2501689049, win 64240, options [mss 1460,sackOK,TS val 378669347 ecr 0,nop,wscale 7], length 0
09:30:37.681754 IP 10.137.7.199.37764 > 10.162.30.119.139: Flags [S], seq 2501689049, win 64240, options [mss 1460,sackOK,TS val 378670366 ecr 0,nop,wscale 7], length 0
09:30:37.681770 IP 10.137.7.199.49328 > 10.162.30.119.445: Flags [S], seq 3452051387, win 64240, options [mss 1460,sackOK,TS val 378670366 ecr 0,nop,wscale 7], length 0
09:30:39.697759 IP 10.137.7.199.49328 > 10.162.30.119.445: Flags [S], seq 3452051387, win 64240, options [mss 1460,sackOK,TS val 378672382 ecr 0,nop,wscale 7], length 0
09:30:39.697762 IP 10.137.7.199.37764 > 10.162.30.119.139: Flags [S], seq 2501689049, win 64240, options [mss 1460,sackOK,TS val 378672382 ecr 0,nop,wscale 7], length 0
INCOMING on qa-master (showing all)
10:30:36.645760 IP 10.137.7.199.58684 > 10.162.30.119.389: UDP, length 103
10:30:36.646472 IP 10.162.30.119.389 > 10.137.7.199.58684: UDP, length 184
All TCP packets to ports 445 and 139 are missing here.
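The comparison of the two captures can be sketched as a set difference over (protocol, destination port). This assumes the default tcpdump text output for IPv4 as shown in the captures in this ticket; the function names are hypothetical.

```python
import re

# Matches the "IP src.port > dst.port: rest" part of a tcpdump text line (IPv4 only)
_LINE = re.compile(
    r"IP (?P<src>\d+\.\d+\.\d+\.\d+)\.(?P<sport>\d+) > "
    r"(?P<dst>\d+\.\d+\.\d+\.\d+)\.(?P<dport>\d+): (?P<rest>.*)"
)


def flows(capture, src, dst):
    """Return the set of (protocol, destination port) seen from src to dst."""
    seen = set()
    for line in capture.splitlines():
        m = _LINE.search(line)
        if not m or m["src"] != src or m["dst"] != dst:
            continue
        if m["rest"].startswith("Flags"):
            proto = "tcp"    # tcpdump prints TCP flags first, e.g. "Flags [S]"
        elif m["rest"].startswith("UDP"):
            proto = "udp"
        else:
            proto = "other"  # decoded payloads are not classified here
        seen.add((proto, int(m["dport"])))
    return seen


def dropped_in_transit(sent_capture, received_capture, src, dst):
    """Flows that left the sender but never showed up on the receiving side."""
    return sorted(flows(sent_capture, src, dst) - flows(received_capture, src, dst))
```

Fed with the zergling capture as `sent_capture` and the qa-master capture as `received_capture`, the result lists TCP/139 and TCP/445, which matches the manual reading above.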
Updated by ph03nix almost 2 years ago
I opened https://sd.suse.com/servicedesk/customer/portal/1/SD-113037 for this issue.
Updated by ph03nix over 1 year ago
Firewall issue has been resolved, I can continue with the test itself now.
Updated by ph03nix over 1 year ago
While working on the test, I noticed that some rather important Kerberos utilities are missing: kdestroy, kpasswd, kswitch. Investigating.
Updated by ph03nix over 1 year ago
Advancing slowly: https://duck-norris.qe.suse.de/tests/12108#
Updated by ph03nix over 1 year ago
The DC password is exposed and that is not nice. I need to move the define_secret_variable function from https://github.com/os-autoinst/os-autoinst-distri-opensuse/blob/master/lib/publiccloud/utils.pm#L177 to lib/utils.pm.
Updated by ph03nix over 1 year ago
define_secret_variable PR: https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/16482
Updated by ph03nix over 1 year ago
New attempt for define_secret_variable: https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/16493
Updated by szarate over 1 year ago
might be useful: https://progress.opensuse.org/issues/117979
Updated by ph03nix over 1 year ago
- Related to tickets #117979: [Regression][AD] AD: Use nautilus to access DFS share Folder added
Updated by ph03nix over 1 year ago
A first prototype works for 15-SP4. Now checking how well it works for other, especially older versions.
Updated by ph03nix over 1 year ago
kinit succeeded but ads_sasl_spnego_gensec_bind(KRB5) failed for ldap/methusalix.qa.suse.de with user[Administrator] realm[METHUSALIX.QA.SUSE.DE]: An invalid parameter was passed to a service or function.
kinit succeeded but ads_sasl_spnego_gensec_bind(KRB5) failed for ldap/methusalix.qa.suse.de with user[SUSETEST$] realm[METHUSALIX.QA.SUSE.DE]: An invalid parameter was passed to a service or function.
https://duck-norris.qe.suse.de/tests/12176#step/samba_adcli/163 It was to be expected that older versions would not work right away.
Updated by ph03nix over 1 year ago
Updated by ph03nix over 1 year ago
AD Secret settings: https://gitlab.suse.de/openqa/salt-pillars-openqa/-/merge_requests/501
Updated by szarate over 1 year ago
- Sprint changed from QE-Core: February Sprint (Feb 08 - Mar 08) to QE-Core: March Sprint (Mar 08 - Apr 05)
Updated by ph03nix over 1 year ago
How long does it take to edit a stupid simple config file? One full day of learning Salt, because that's a requirement for contributing. grml
https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/802
Updated by ph03nix over 1 year ago
dzedro wrote:
@pho3nix is our new Salt expert. 🤩
hides
Updated by ph03nix over 1 year ago
This week I was busy with the manual validation, will continue next week.
Updated by ph03nix over 1 year ago
Back on the task. I checked and the AD Windows machine is updating itself automatically, which is nice :-)
Updated by ph03nix over 1 year ago
PR: https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/16517
The Schedule needs to be modified accordingly in the next step.
Updated by ph03nix over 1 year ago
- % Done changed from 30 to 90
PR merged, let's see if this works in tomorrow's test runs.
Updated by ph03nix over 1 year ago
Reverting due to some issues in https://openqa.suse.de/tests/10738750#
Updated by ph03nix over 1 year ago
Add settings to Single Incident test runs: https://gitlab.suse.de/qa-maintenance/qam-openqa-yml/-/merge_requests/486
Updated by ph03nix over 1 year ago
Re-Introduce Samba AD: https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/16671
Updated by ph03nix over 1 year ago
Updated by ph03nix over 1 year ago
Disable IPv6: https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/16684
And POO for changing this after not being needed anymore: https://progress.opensuse.org/issues/126482
Updated by ph03nix over 1 year ago
Asking our s390x workers to also get access to qa.suse.de, so they can reach the AD: https://sd.suse.com/servicedesk/customer/portal/1/SD-115963
Updated by ph03nix over 1 year ago
I also need to disable samba_adcli for 12-SP2. Yes, 12-SP2, see https://openqa.suse.de/tests/10753217
Updated by ph03nix over 1 year ago
PR: Disable samba_adcli on SLES<12-SP3: https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/16692
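The "SLES<12-SP3" condition boils down to comparing (major, service pack) tuples. The following is a sketch of that comparison only, not the logic from the PR; both function names are hypothetical.

```python
import re


def parse_sle_version(version):
    """Turn '12-SP2' into (12, 2) and '15' into (15, 0)."""
    m = re.fullmatch(r"(\d+)(?:-SP(\d+))?", version.strip(), flags=re.IGNORECASE)
    if not m:
        raise ValueError("unrecognised SLE version: %r" % version)
    # A release without a service pack counts as SP0
    return int(m.group(1)), int(m.group(2) or 0)


def samba_adcli_enabled(version, minimum="12-SP3"):
    """True if the module should be scheduled for this product version."""
    return parse_sle_version(version) >= parse_sle_version(minimum)
```

With this, `samba_adcli_enabled("12-SP2")` is `False`, while `"12-SP3"` and `"15-SP4"` stay enabled.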
Updated by ph03nix over 1 year ago
There is a request in the SD ticket for a test machine, so that they can fix the network issue (DC not reachable for s390x).
Updated by ph03nix over 1 year ago
- Subject changed from [qe-core][sporadic] Re-Enable Active Directory tests - test fails in samba_adcli (fails on ssh to domain) to [qe-core][sporadic] Re-Enable Active Directory tests
Updated by ph03nix over 1 year ago
- Status changed from In Progress to Resolved
- % Done changed from 90 to 100
Marking this as resolved and will fix the firewall issues in https://progress.opensuse.org/issues/126866
Updated by ph03nix over 1 year ago
- Related to action #126866: [qe-core] test fails in samba_adcli on s390x added