action #96512
coordination #96980 (closed): [qe-core][samba_adcli][epic] Tracker for samba_adcli failures
[qe-core][sporadic] Re-Enable Active Directory tests
100%
Description
Observation
openQA test in scenario sle-12-SP5-Server-DVD-Incidents-x86_64-mau-extratests2@64bit fails in samba_adcli
Test suite description
Run console tests against aggregated test repo
Reproducible
Fails since (at least) Build :20666:webkit2gtk3 (current job)
Expected result
Last good: :4705:fetchmail (or more recent)
Further details
Always latest result in this scenario: latest
Files
Updated by dzedro about 2 years ago
- Subject changed from [sporadic] test fails in samba_adcli to [qe-core][sporadic] test fails in samba_adcli
Updated by okurz about 2 years ago
- Related to action #96513: [qe-core][sporadic][samba_adcli] wbinfo fails added
Updated by okurz about 2 years ago
- Priority changed from Normal to Urgent
Seeing this in a couple of tests, e.g. https://openqa.suse.de/tests/6640016 and https://openqa.suse.de/tests/6623775 . As it's sporadic, I consider it "urgent" because label carry-over does not work. Consider https://github.com/os-autoinst/scripts/blob/master/README.md#auto-review---automatically-detect-known-issues-in-openqa-jobs-label-openqa-jobs-with-ticket-references-and-optionally-retrigger to address the urgency
Updated by okurz about 2 years ago
Proposing removing the test module for now from mau-extratests2 in https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/13072 to address the urgency
Updated by geor about 2 years ago
- Subject changed from [qe-core][sporadic] test fails in samba_adcli to [qe-core][sporadic] test fails in samba_adcli (fails on ssh to domain)
Updated by okurz about 2 years ago
- Priority changed from Urgent to Normal
Discussed in weekly QE sync together. This ticket showed up violating the SLO for urgent tickets. We agreed that the ticket is actually not urgent according to these criteria. See https://progress.opensuse.org/projects/openqatests/wiki#SLOs-service-level-objectives for details
Updated by tjyrinki_suse about 2 years ago
- Status changed from New to Workable
- Start date deleted (2021-08-03)
Updated by tjyrinki_suse almost 2 years ago
- Target version set to QE-Core: Ready
Would be nice to work on samba_ad issues (or at least find out which tasks are relevant at the moment) for the next sprint.
Updated by tjyrinki_suse over 1 year ago
- Target version deleted (QE-Core: Ready)
Updated by slo-gin 8 months ago
This ticket was set to Normal priority but was not updated within the SLO period. Please consider picking up this ticket or just set the ticket to the next lower priority.
Updated by szarate 8 months ago
- Related to action #120859: [qe-core] test fails in samba_adcli - Unschedule test added
Updated by szarate 8 months ago
- Subject changed from [qe-core][sporadic] test fails in samba_adcli (fails on ssh to domain) to [qe-core][sporadic] Re-Enable Active Directory tests - test fails in samba_adcli (fails on ssh to domain)
You can use qamaster.qa.suse.de to deploy a new Windows VM. I think Marita can provide a valid windows 2019 license key, in case I can't find it.
Updated by ph03nix 7 months ago
Test was deleted in https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/15968 due to https://progress.opensuse.org/issues/120859
Updated by ph03nix 7 months ago
I'm collecting notes and setup instructions on https://confluence.suse.com/display/qasle/AD+configuration+for+testing
Updated by ph03nix 7 months ago
- File Win2019_Validation.png added
The provided product key doesn't seem to work:
Updated by ph03nix 7 months ago
Managed to install, activate and update the Windows Server installation. Now I'm assigning a static IP address to it.
The hostname will be "methusalix"
https://gitlab.suse.de/qa-sle/qanet-configs/-/merge_requests/48
Updated by ph03nix 7 months ago
I'm updating the https://confluence.suse.com/display/qasle/AD+configuration+for+testing along the way to document the installation process.
Updated by szarate 7 months ago
GPO enablement PR (needs its own ticket, and somebody to pick it up): https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/16107
Updated by ph03nix 7 months ago
I've installed the Windows Server 2019 and documented the process on https://confluence.suse.com/display/qasle/AD+configuration+for+testing
I've tested joining the domain via Windows 10 (OK) and via SLES 15-SP5 (YaST, OK). Next I will check joining the domain via the CLI and how the test setup interacts with it.
Updated by ph03nix 7 months ago
szarate wrote:
GPO enablement PR (needs its own ticket, and somebody to pick it up): https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/16107
Let's please do this in a separate ticket, otherwise this grows too much. The new ticket should be blocked by this one and I'm likely to take it afterwards. But please still: New ticket.
Updated by ph03nix 7 months ago
Currently facing the problem that the AD Domain Controller has network issues when reached from duck-norris: DNS queries against it time out, although it answers pings.
# host feldspaten.org 10.162.30.119
;; connection timed out; no servers could be reached
# ping -c 2 10.162.30.119
PING 10.162.30.119 (10.162.30.119) 56(84) bytes of data.
64 bytes from 10.162.30.119: icmp_seq=1 ttl=126 time=1.10 ms
64 bytes from 10.162.30.119: icmp_seq=2 ttl=126 time=1.16 ms
--- 10.162.30.119 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1001ms
rtt min/avg/max/mdev = 1.099/1.129/1.160/0.030 ms
However, it works from my laptop.
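To tell "the host is reachable" apart from "the DNS service answers", the same query can be sent by hand. The following is only a minimal sketch, assuming plain DNS over UDP port 53; the function names `build_dns_query` and `dns_server_answers` are made up for illustration and are not part of any test module.

```python
import socket
import struct


def build_dns_query(name: str, query_id: int = 0x1234) -> bytes:
    """Build a minimal DNS query for an A record (RFC 1035 wire format)."""
    # Header: ID, flags (recursion desired), 1 question, no other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE=1 (A), QCLASS=1 (IN)
    return header + qname + struct.pack("!HH", 1, 1)


def dns_server_answers(server: str, name: str,
                       timeout: float = 3.0, port: int = 53) -> bool:
    """Return True if the server replies with a packet carrying our query ID."""
    query = build_dns_query(name)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(query, (server, port))
            reply, _ = sock.recvfrom(512)
        except OSError:
            # Timeout or ICMP error: no usable answer from the DNS service.
            return False
    return len(reply) >= 2 and reply[:2] == query[:2]
```

Running `dns_server_answers("10.162.30.119", "feldspaten.org")` on both duck-norris and the laptop would show whether UDP port 53 is filtered on one path only. It does not say where on the path the packets are dropped.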
Updated by ph03nix 7 months ago
- Related to action #96983: [qe-core][sporadic][samba_adcli] adcli joining domain fails added
Updated by ph03nix 7 months ago
# net ads join --domain METHUSALIX.QA.SUSE.DE -U Administrator -S methusalix.qa.suse.de
Password for [METHUSALIX.QA.SUSE.DE\Administrator]:
Failed to join domain: failed to lookup DC info for domain 'METHUSALIX.QA.SUSE.DE' over rpc: {Operation Failed} The requested operation was unsuccessful
Updated by ph03nix 7 months ago
https://duck-norris.qe.suse.de/tests/11940#step/samba_adcli/161
Password for [METHUSALIX0\Administrator]:
Failed to join domain: failed to lookup DC info for domain 'METHUSALIX.QA.SUSE.DE' over rpc: {Device Timeout} The specified I/O operation on %hs was not completed before the time-out period expired.
There is https://www.suse.com/support/kb/doc/?id=000020507 for this, looks like a configuration issue on the Windows Server. I have allowed all ports and services there, so I don't know where this is coming from.
Updated by ph03nix 7 months ago
I can join the domain just fine on a Leap 15.4 VM from my Laptop:
# net ads join --domain 'METHUSALIX.QA.SUSE.DE' -U Administrator -S 'methusalix.qa.suse.de' --no-dns-updates
Password for [METHUSALIX0\Administrator]:
Using short domain name -- METHUSALIX0
Joined 'LEAP15-4' to dns domain 'methusalix.qa.suse.de'
Need to check what is blocking the network from the openQA VM to the Domain.
Updated by ph03nix 7 months ago
Good, issue can be reproduced on a new VM on ada:
zergling:~ # net ads join --domain 'METHUSALIX.QA.SUSE.DE' -U Administrator -S 'methusalix.qa.suse.de' --no-dns-updates
Password for [METHUSALIX0\Administrator]:
Failed to join domain: failed to lookup DC info for domain 'METHUSALIX.QA.SUSE.DE' over rpc: {Device Timeout} The specified I/O operation on %hs was not completed before the time-out period expired.
Now going to check where the issue comes from.
Updated by ph03nix 7 months ago
A tcpdump shows that TCP requests are going out to ports 445 and 139 (as described in https://www.suse.com/support/kb/doc/?id=000020507), but nothing comes back:
09:05:47.579065 IP 10.137.7.199.56258 > 10.162.30.119.445: Flags [S], seq 3725595907, win 64240, options [mss 1460,sackOK,TS val 377180263 ecr 0,nop,wscale 7], length 0
09:05:47.584195 IP 10.137.7.199.48880 > 10.162.30.119.139: Flags [S], seq 3763447905, win 64240, options [mss 1460,sackOK,TS val 377180268 ecr 0,nop,wscale 7], length 0
09:05:48.593784 IP 10.137.7.199.48880 > 10.162.30.119.139: Flags [S], seq 3763447905, win 64240, options [mss 1460,sackOK,TS val 377181278 ecr 0,nop,wscale 7], length 0
09:05:48.593808 IP 10.137.7.199.56258 > 10.162.30.119.445: Flags [S], seq 3725595907, win 64240, options [mss 1460,sackOK,TS val 377181278 ecr 0,nop,wscale 7], length 0
09:05:50.609816 IP 10.137.7.199.56258 > 10.162.30.119.445: Flags [S], seq 3725595907, win 64240, options [mss 1460,sackOK,TS val 377183294 ecr 0,nop,wscale 7], length 0
09:05:50.609820 IP 10.137.7.199.48880 > 10.162.30.119.139: Flags [S], seq 3763447905, win 64240, options [mss 1460,sackOK,TS val 377183294 ecr 0,nop,wscale 7], length 0
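The repeated SYN packets without any reply can also be checked without tcpdump by attempting the TCP handshake directly. A minimal sketch, assuming only that the two SMB ports from the capture (445 and 139) are the ones of interest; `tcp_port_open` and `probe_smb_ports` are illustrative names, not existing helpers.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP handshake to host:port completes within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers dropped SYNs (timeout), refused connections and
        # unreachable hosts alike.
        return False


def probe_smb_ports(host: str, ports=(445, 139), timeout: float = 3.0) -> dict:
    """Map each port to whether the handshake succeeded."""
    return {port: tcp_port_open(host, port, timeout) for port in ports}
```

Calling `probe_smb_ports("10.162.30.119")` from the affected VM and from a machine that can join the domain gives a quick side-by-side comparison. The sketch does not distinguish a silently dropped SYN from an actively refused connection; tcpdump is still needed for that.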
Updated by ph03nix 7 months ago
A tcpdump between the test host (zergling) and the VM hypervisor host qamaster shows that packets are being blocked between the subnets:
OUTGOING from zergling (showing only TCP)
09:30:36.657256 IP 10.137.7.199.49328 > 10.162.30.119.445: Flags [S], seq 3452051387, win 64240, options [mss 1460,sackOK,TS val 378669341 ecr 0,nop,wscale 7], length 0
09:30:36.662387 IP 10.137.7.199.37764 > 10.162.30.119.139: Flags [S], seq 2501689049, win 64240, options [mss 1460,sackOK,TS val 378669347 ecr 0,nop,wscale 7], length 0
09:30:37.681754 IP 10.137.7.199.37764 > 10.162.30.119.139: Flags [S], seq 2501689049, win 64240, options [mss 1460,sackOK,TS val 378670366 ecr 0,nop,wscale 7], length 0
09:30:37.681770 IP 10.137.7.199.49328 > 10.162.30.119.445: Flags [S], seq 3452051387, win 64240, options [mss 1460,sackOK,TS val 378670366 ecr 0,nop,wscale 7], length 0
09:30:39.697759 IP 10.137.7.199.49328 > 10.162.30.119.445: Flags [S], seq 3452051387, win 64240, options [mss 1460,sackOK,TS val 378672382 ecr 0,nop,wscale 7], length 0
09:30:39.697762 IP 10.137.7.199.37764 > 10.162.30.119.139: Flags [S], seq 2501689049, win 64240, options [mss 1460,sackOK,TS val 378672382 ecr 0,nop,wscale 7], length 0
INCOMING on qamaster (showing all)
10:30:36.645760 IP 10.137.7.199.58684 > 10.162.30.119.389: UDP, length 103
10:30:36.646472 IP 10.162.30.119.389 > 10.137.7.199.58684: UDP, length 184
All TCP packets to ports 445 and 139 are missing here; only the UDP traffic on port 389 arrives.
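The comparison of the two captures can be done mechanically. The sketch below assumes tcpdump's default one-line IPv4 output as pasted above and treats lines starting with `Flags` as TCP and lines starting with `UDP` as UDP; `flows` and `missing_on_receiver` are hypothetical helper names.

```python
import re
from collections import Counter

# Matches "<timestamp> IP <src>.<sport> > <dst>.<dport>: <rest>".
LINE_RE = re.compile(
    r"^\S+ IP (?P<src>\d+\.\d+\.\d+\.\d+)\.(?P<sport>\d+) > "
    r"(?P<dst>\d+\.\d+\.\d+\.\d+)\.(?P<dport>\d+): (?P<rest>.*)$"
)


def flows(lines):
    """Count packets per (protocol, destination IP, destination port)."""
    seen = Counter()
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip anything that is not a one-line IPv4 record
        rest = m["rest"]
        if rest.startswith("Flags"):
            proto = "tcp"
        elif rest.startswith("UDP"):
            proto = "udp"
        else:
            proto = "other"
        seen[(proto, m["dst"], int(m["dport"]))] += 1
    return seen


def missing_on_receiver(sent_lines, received_lines):
    """Return flows present in the sender capture but absent on the receiver."""
    sent = flows(sent_lines)
    received = flows(received_lines)
    return sorted(flow for flow in sent if flow not in received)
```

Fed with the zergling capture as `sent_lines` and the qamaster capture as `received_lines`, the result lists the TCP flows to 10.162.30.119 on ports 139 and 445, matching what the manual comparison shows.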
Updated by ph03nix 7 months ago
I opened https://sd.suse.com/servicedesk/customer/portal/1/SD-113037 for this issue.
Updated by ph03nix 7 months ago
Advancing slowly: https://duck-norris.qe.suse.de/tests/12108#
Updated by ph03nix 7 months ago
The DC password is exposed and that is not nice. I need to move the define_secret_variable function from https://github.com/os-autoinst/os-autoinst-distri-opensuse/blob/master/lib/publiccloud/utils.pm#L177 to lib/utils.pm.
Updated by ph03nix 7 months ago
define_secret_variable PR: https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/16482
Updated by ph03nix 7 months ago
New attempt for define_secret_variable: https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/16493
Updated by szarate 7 months ago
might be useful: https://progress.opensuse.org/issues/117979
Updated by ph03nix 7 months ago
- Related to tickets #117979: [Regression][AD] AD: Use nautilus to access DFS share Folder added
Updated by ph03nix 7 months ago
kinit succeeded but ads_sasl_spnego_gensec_bind(KRB5) failed for ldap/methusalix.qa.suse.de with user[Administrator] realm[METHUSALIX.QA.SUSE.DE]: An invalid parameter was passed to a service or function.
kinit succeeded but ads_sasl_spnego_gensec_bind(KRB5) failed for ldap/methusalix.qa.suse.de with user[SUSETEST$] realm[METHUSALIX.QA.SUSE.DE]: An invalid parameter was passed to a service or function.
https://duck-norris.qe.suse.de/tests/12176#step/samba_adcli/163 It was to be expected that older versions would not work right away.
Updated by ph03nix 7 months ago
AD Secret settings: https://gitlab.suse.de/openqa/salt-pillars-openqa/-/merge_requests/501
Updated by ph03nix 7 months ago
How long does it take to edit a stupidly simple config file? One full day of learning salt, because that's a requirement for contributing. grml
https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/802
Updated by ph03nix 6 months ago
PR: https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/16517
The Schedule needs to be modified accordingly in the next step.
Updated by ph03nix 6 months ago
Add settings to Single Incident test runs: https://gitlab.suse.de/qa-maintenance/qam-openqa-yml/-/merge_requests/486
Updated by ph03nix 6 months ago
Re-Introduce Samba AD: https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/16671
Updated by ph03nix 6 months ago
Disable IPv6: https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/16684
And a POO for reverting this once it is no longer needed: https://progress.opensuse.org/issues/126482
Updated by ph03nix 6 months ago
Asking our s390x workers to also get access to qa.suse.de, so they can reach the AD: https://sd.suse.com/servicedesk/customer/portal/1/SD-115963
Updated by ph03nix 6 months ago
I also need to disable samba_adcli for 12-SP2. Yes, 12-SP2, see https://openqa.suse.de/tests/10753217
Updated by ph03nix 6 months ago
PR: Disable samba_adcli on SLES<12-SP3: https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/16692
Updated by ph03nix 6 months ago
- Status changed from In Progress to Resolved
- % Done changed from 90 to 100
Marking this as resolved and will fix the firewall issues in https://progress.opensuse.org/issues/126866
Updated by ph03nix 6 months ago
- Related to action #126866: [qe-core] test fails in samba_adcli on s390x added